
SNOWFLAKE COURSE CONTENT



1. Snowflake Overview and Architecture
   1. Architecture
   2. Connectivity
2. Getting Started with Snowflake
   1. Virtual Warehouses
   2. Databases, Tables, Views, and Data Types
3. Data Loading and Unloading
4. Snowflake Queries
5. Data Sharing and Collaboration
6. Alert Notifications
7. Snowflake Security
8. Data Governance
9. Snowflake Account Management
10. Data Recovery
11. Performance Optimization
SNOWFLAKE OVERVIEW AND ARCHITECTURE
WHAT IS SNOWFLAKE?
◼ Snowflakeʼs Data Cloud is powered by an advanced data platform provided as a self-managed service.
Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more
flexible than traditional offerings.
◼ The Snowflake data platform is not built on any existing database technology or "big data" software platforms
such as Hadoop.
◼ Instead, Snowflake combines a completely new SQL query engine with an innovative architecture natively
designed for the cloud. To the user, Snowflake provides all of the functionality of an enterprise analytic
database, along with many additional special features and unique capabilities.
◼ Snowflake is a true self-managed service, meaning:
• There is no hardware (virtual or physical) to select, install, configure, or manage.
• There is virtually no software to install, configure, or manage.
• Ongoing maintenance, management, upgrades, and tuning are handled by Snowflake.
◼ Snowflake runs completely on cloud infrastructure. All components of Snowflakeʼs service (other than optional
command line clients, drivers, and connectors), run in public cloud infrastructures.
◼ Snowflake uses virtual compute instances for its compute needs and a storage service for persistent storage of
data. Snowflake cannot be run on private cloud infrastructures (on-premises or hosted).
◼ Snowflake is not a packaged software offering that can be installed by a user. Snowflake manages all aspects
of software installation and updates.
SNOWFLAKE ARCHITECTURE
◼ Snowflakeʼs architecture is a hybrid of traditional shared-disk and shared-nothing database architectures.
◼ Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform.
◼ But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally.
◼ This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.
SNOWFLAKE ARCHITECTURE
◼ Database Storage
◼ When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar
format. Snowflake stores this optimized data in cloud storage.
◼ Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata,
statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly
visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
◼ Query Processing
◼ Query execution is performed in the processing layer. Snowflake processes queries using “virtual warehouses”. Each virtual
warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.
◼ Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual
warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
◼ Cloud Services
◼ The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all
of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services
layer also runs on compute instances provisioned by Snowflake from the cloud provider.
◼ Services managed in this layer include:
• Authentication
• Infrastructure management
• Metadata management
• Query parsing and optimization
• Access control
SUPPORTED CLOUD PLATFORMS
◼ Snowflake is provided as a self-managed service that runs completely on cloud infrastructure. This
means that all three layers of Snowflakeʼs architecture (storage, compute, and cloud services) are
deployed and managed entirely on a selected cloud platform.
◼ A Snowflake account can be hosted on any of the following cloud platforms:
• Amazon Web Services (AWS)
• Google Cloud Platform (GCP)
• Microsoft Azure (Azure)
SNOWFLAKE REGIONS
SNOWFLAKE EDITIONS
◼ Standard Edition
◼ Standard Edition is our introductory level offering, providing full, unlimited access to all of Snowflake’s standard features. It
provides a strong balance between features, level of support, and cost.
◼ Enterprise Edition
◼ Enterprise Edition provides all the features and services of Standard Edition, with additional features designed specifically
for the needs of large-scale enterprises and organizations.
◼ Business Critical Edition
◼ Business Critical Edition, formerly known as Enterprise for Sensitive Data (ESD), offers even higher levels of data protection
to support the needs of organizations with extremely sensitive data, particularly PHI data that must comply with HIPAA
and HITRUST CSF regulations.
◼ It includes all the features and services of Enterprise Edition, with the addition of enhanced security and data protection. In
addition, database failover/failback adds support for business continuity and disaster recovery.
◼ Virtual Private Snowflake (VPS)
◼ Virtual Private Snowflake offers our highest level of security for organizations that have the strictest requirements, such as
financial institutions and any other large enterprises that collect, analyze, and share highly sensitive data.
◼ It includes all the features and services of Business Critical Edition, but in a completely separate Snowflake environment,
isolated from all other Snowflake accounts (i.e. VPS accounts do not share any resources with accounts outside the VPS).
However, you may choose to enable data sharing with non-VPS customers.
CONNECTING TO SNOWFLAKE
◼ Snowflake supports multiple ways of connecting to the service:
• A web-based user interface from which all aspects of managing and using Snowflake can be accessed.
• Command line clients (e.g. SnowSQL) which can also access all aspects of managing and using Snowflake.
• ODBC and JDBC drivers that can be used by other applications (e.g. Tableau) to connect to Snowflake.
• Native connectors (e.g. Python, Spark) that can be used to develop applications for connecting to Snowflake.
• Third-party connectors that can be used to connect applications such as ETL tools (e.g. Informatica) and BI tools (e.g.
ThoughtSpot) to Snowflake.
CONNECTING TO SNOWFLAKE – SNOWSIGHT (WEB ACCESS)
CONNECTING TO SNOWFLAKE – SNOWSQL
◼ You can download SnowSQL from https://developers.snowflake.com/snowsql/
◼ Once downloaded, you can configure SnowSQL via its configuration file:
◼ Windows - %USERPROFILE%\.snowsql\
◼ Linux - ~/.snowsql/
◼ Connect using snowsql
◼ Named connection – e.g.
[connections.bwcon]
accountname = RAAGLJS-ZZA75076
username = AZUREBRAINWORKS2023
password = xxxxxxxxxxxxxxxxxxxx
dbname = SNOWFLAKE_SAMPLE_DATA
schemaname = TPCH_SF1
warehousename = COMPUTE_WH
◼ Run command: $ snowsql -c bwcon
◼ Using key-pair authentication and key-pair rotation
◼ Step 1: Generate a private key - openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8
◼ Step 2: Generate a public key - openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
◼ Step 3: Assign the public key to the Snowflake user
ALTER USER jsmith SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';
◼ Run command: snowsql -a RAAGLJS-ZZA75076 -u AZUREBRAINWORKS2023 --private-key-path <path>/rsa_key.p8
◼ Using a proxy server – you can connect to Snowflake through a proxy server by setting the proxy environment variables.
◼ Using SSO – SnowSQL can also authenticate through your identity provider (browser-based SSO).
◼ SnowSQL command to connect:
snowsql -a RAAGLJS-ZZA75076 -u AZUREBRAINWORKS2023 -d SNOWFLAKE_SAMPLE_DATA -s TPCH_SF1 -r SYSADMIN -w COMPUTE_WH
CONNECTING TO SNOWFLAKE – ODBC/JDBC DRIVER
◼ You can download ODBC/JDBC driver from
https://docs.snowflake.com/en/developer-guide/odbc/odbc-download
◼ Install ODBC Driver
◼ Configure ODBC driver
◼ Open the ODBC Data Source Administrator window
◼ Click Add and create a DSN (Data Source Name)
◼ Provide the configuration details and test the connection
◼ Reference – to provide the password in the connection settings:
https://community.snowflake.com/s/article/Error-Snowflake-DSI-20032-Required-setting-PWD-is-not-present-in-the-connection-settings-20032-with-windows-ODBC-snowflake-driver
CONNECTING TO SNOWFLAKE – USING PYTHON
◼ Installing python connector:
◼ pip install snowflake-connector-python
import snowflake.connector


class SnowflakeConnector:
    def __init__(self, account, user, password, warehouse, database, schema, role):
        self.account = account
        self.user = user
        self.password = password
        self.warehouse = warehouse
        self.database = database
        self.schema = schema
        self.role = role

    def connect(self):
        self.connection = snowflake.connector.connect(
            user=self.user,
            password=self.password,
            account=self.account,
            warehouse=self.warehouse,
            database=self.database,
            schema=self.schema,
            role=self.role  # pass the role so it is not silently ignored
        )
        self.cursor = self.connection.cursor()

    def execute_query(self, query):
        self.cursor.execute(query)
        return self.cursor.fetchall()

    def close_connection(self):
        self.cursor.close()
        self.connection.close()


# placeholder connection details - replace with your own account values
sf_connector = SnowflakeConnector(
    account='<account_identifier>',
    user='<user>',
    password='<password>',
    warehouse='COMPUTE_WH',
    database='SNOWFLAKE_SAMPLE_DATA',
    schema='TPCH_SF1',
    role='SYSADMIN'
)
GETTING STARTED WITH SNOWFLAKE
VIRTUAL WAREHOUSE
◼ A virtual warehouse is a cluster of compute resources in Snowflake. Virtual warehouses are available in
two types:
• Standard
• Snowpark-optimized
◼ A warehouse provides the required resources, such as CPU, memory, and temporary storage, to perform
the following operations in a Snowflake session:
◼ Executing SQL SELECT statements that require compute resources (e.g. retrieving rows from tables and views).
◼ Performing DML operations, such as:
◼ Updating rows in tables (DELETE , INSERT , UPDATE).
◼ Loading data into tables (COPY INTO <table>).
◼ Unloading data from tables (COPY INTO <location>).

◼ Properties of a virtual warehouse:
◼ Multi-cluster
◼ Auto-scale
◼ Scaling policy
◼ Auto-suspend (terminate) and auto-resume
◼ Monitoring virtual warehouses
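◼ A minimal sketch tying these properties together (the warehouse name and sizing values are illustrative; the multi-cluster settings require Enterprise Edition or higher):

CREATE WAREHOUSE IF NOT EXISTS demo_wh
  WAREHOUSE_SIZE    = 'XSMALL'
  MIN_CLUSTER_COUNT = 1          -- multi-cluster range
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY    = 'STANDARD'
  AUTO_SUSPEND      = 300        -- suspend after 300 seconds of inactivity
  AUTO_RESUME       = TRUE;      -- resume automatically when a query arrives

-- monitor warehouses
SHOW WAREHOUSES;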


DATABASES, TABLES AND VIEWS - OVERVIEW
◼ All data in Snowflake is maintained in databases. Each database consists of one or more schemas,
which are logical groupings of database objects, such as tables and views. Snowflake does not place
any hard limits on the number of databases, schemas (within a database), or objects (within a schema)
you can create.
◼ Micro-partitions & Data Clustering
◼ All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage. Each
micro-partition contains between 50 MB and 500 MB of uncompressed data (note that the actual size in Snowflake is
smaller because data is always stored compressed). Groups of rows in tables are mapped into individual micro-partitions,
organized in a columnar fashion. This size and structure allows for extremely granular pruning of very large tables, which
can be comprised of millions, or even hundreds of millions, of micro-partitions.
◼ Snowflake stores metadata about all rows stored in a micro-partition, including:
• The range of values for each of the columns in the micro-partition.
• The number of distinct values.
• Additional properties used for both optimization and efficient query processing.


SNOWFLAKE MICRO PARTITIONS
DATABASES, TABLES AND VIEWS - OVERVIEW
Benefits of Micro-partitions
◼ In contrast to traditional static partitioning, Snowflake micro-partitions are derived automatically; they
don’t need to be explicitly defined up-front or maintained by users.
◼ As the name suggests, micro-partitions are small in size (50 to 500 MB, before compression), which
enables extremely efficient DML and fine-grained pruning for faster queries.
◼ Micro-partitions can overlap in their range of values, which, combined with their uniformly small size, helps
prevent skew.
◼ Columns are stored independently within micro-partitions, often referred to as columnar storage. This
enables efficient scanning of individual columns; only the columns referenced by a query are scanned.
◼ Columns are also compressed individually within micro-partitions. Snowflake automatically determines
the most efficient compression algorithm for the columns in each micro-partition.
PARTITION OVERLAP
CLUSTERING KEY – OVERVIEW
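◼ A minimal sketch of defining and inspecting a clustering key, assuming the SUPERSTORE_NEW table created later in this deck:

ALTER TABLE BWDATABASE.BWSCHEMA.SUPERSTORE_NEW CLUSTER BY (ORDERDATE, REGION);

-- report how well the table is clustered on those columns
SELECT SYSTEM$CLUSTERING_INFORMATION('BWDATABASE.BWSCHEMA.SUPERSTORE_NEW', '(ORDERDATE, REGION)');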
TYPES OF TABLES
◼ Temporary Tables: Snowflake supports creating temporary tables for storing non-permanent, transitory
data (e.g. ETL data, session-specific data). Temporary tables only exist within the session in which they
were created and persist only for the remainder of the session. As such, they are not visible to other users
or sessions. Once the session ends, data stored in the table is purged completely from the system and,
therefore, is not recoverable, either by the user who created the table or Snowflake.
◼ After creation, temporary tables cannot be converted to any other table type.
CREATE TEMPORARY TABLE mytemptable (id NUMBER, creation_date DATE);
▪ Transient Table: Snowflake supports creating transient tables that persist until explicitly dropped and are
available to all users with the appropriate privileges. Transient tables are similar to permanent tables with
the key difference that they do not have a Fail-safe period. As a result, transient tables are specifically
designed for transitory data that needs to be maintained beyond each session (in contrast to temporary
tables), but does not need the same level of data protection and recovery provided by permanent tables.
CREATE TRANSIENT TABLE mytranstable (id NUMBER, creation_date DATE);
SNOWFLAKE TABLE COMPARISON
◼ Temporary
• Persistence: remainder of session
• Cloning (source type => target type): Temporary => Temporary, Temporary => Transient
• Time Travel retention period: 0 or 1 day (default is 1)
• Fail-safe period: 0 days
◼ Transient
• Persistence: until explicitly dropped
• Cloning (source type => target type): Transient => Temporary, Transient => Transient
• Time Travel retention period: 0 or 1 day (default is 1)
• Fail-safe period: 0 days
◼ Permanent (Standard Edition)
• Persistence: until explicitly dropped
• Cloning (source type => target type): Permanent => Temporary, Permanent => Transient, Permanent => Permanent
• Time Travel retention period: 0 or 1 day (default is 1)
• Fail-safe period: 7 days
◼ Permanent (Enterprise Edition and higher)
• Persistence: until explicitly dropped
• Cloning (source type => target type): Permanent => Temporary, Permanent => Transient, Permanent => Permanent
• Time Travel retention period: 0 to 90 days (default is configurable)
• Fail-safe period: 7 days
SNOWFLAKE VIEW
◼ Types of Views
• Non-materialized views (usually simply referred to as "views")
• Materialized views.

◼ Non-materialized Views
◼ Any query expression that returns a valid result can be used to create a non-materialized view, such as:
◼ Selecting some (or all) columns in a table.
◼ Selecting a specific range of data in table columns.
◼ Joining data from two or more tables.

◼ Materialized Views
◼ A materialized view’s results are stored, almost as though the results were a table. This allows faster access, but requires
storage space and active maintenance, both of which incur additional costs.

◼ Secure Views
◼ Secure views have advantages over standard views, including improved data privacy and data sharing; however, they also
have some performance impacts to take into consideration.
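◼ A minimal sketch of a non-materialized view and a secure view layered on it, assuming the FACT_PAYROLL table created later in this deck:

CREATE OR REPLACE VIEW payroll_summary AS
  SELECT team_manager, SUM(salary) AS total_salary
  FROM fact_payroll
  GROUP BY team_manager;

-- a secure view hides the view definition and unauthorized underlying data
CREATE OR REPLACE SECURE VIEW payroll_summary_secure AS
  SELECT * FROM payroll_summary;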
VIEWS – MATERIALIZED VIEWS
◼ Materialized views require Enterprise Edition.
◼ A materialized view is a pre-computed data set derived from a query specification (the SELECT in the view
definition) and stored for later use. Because the data is pre-computed, querying a materialized view is
faster than executing a query against the base table of the view.
◼ This performance difference can be significant when a query is run frequently or is sufficiently complex.
As a result, materialized views can speed up expensive aggregation, projection, and selection operations,
especially those that run frequently and that run on large data sets.
◼ Materialized views are particularly useful when:
◼ Query results contain a small number of rows and/or columns relative to the base table (the table on which the view is
defined).
◼ Query results contain results that require significant processing, including:
◼ Analysis of semi-structured data.
◼ Aggregates that take a long time to calculate.
◼ The query is on an external table (i.e. data sets stored in files in an external stage), which might have slower performance
compared to querying native database tables.
◼ The view’s base table does not change frequently.
PERFORMANCE COMPARISON OF VIEW
◼ Regular table – supports clustering; uses storage.
◼ Regular view – security benefits; simplifies query logic.
◼ Cached query result – performance benefits. Used only if the data has not changed and if the query only uses deterministic functions (e.g. not CURRENT_DATE).
◼ Materialized view – performance benefits; security benefits; simplifies query logic; supports clustering; uses storage; uses credits for maintenance. Storage and maintenance requirements typically result in increased costs.
◼ External table – data is maintained outside Snowflake and, therefore, does not incur any storage charges within Snowflake.

create materialized view mv4 as
  select column_1, column_2, sum(column_3) from table1 group by column_1, column_2;
SECURE VIEW
VIEW TYPES
SECURED VIEW – ROW LEVEL SECURITY
EXAMPLE OF SECURE VIEW
◼ create table fact_payroll (employee varchar(100), team_manager varchar(100), salary number);

◼ insert into fact_payroll (employee, team_manager, salary)
  values ('Yvonne','Monique',5000),
         ('Mickey','Mickey',9000),
         ('Monique','Monique',8500),
         ('Krista','Monique',560),
         ('Balke','Mickey',7000);

◼ create or replace secure view my_team as
  select * from BWDATABASE.BWSCHEMA.FACT_PAYROLL
  where upper(employee) = current_user() or upper(team_manager) = current_user();
DATA TYPES
◼ Numeric Data Types
• NUMBER
• DECIMAL, NUMERIC – synonymous with NUMBER.
• INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT – synonymous with NUMBER, except precision and scale cannot be specified.
• FLOAT, FLOAT4, FLOAT8
• DOUBLE, DOUBLE PRECISION, REAL – synonymous with FLOAT.
◼ String & Binary Data Types
• VARCHAR
• CHAR, CHARACTER – synonymous with VARCHAR, except the default length is VARCHAR(1).
• STRING – synonymous with VARCHAR.
• TEXT – synonymous with VARCHAR.
• BINARY
• VARBINARY – synonymous with BINARY.
◼ Logical Data Types
• BOOLEAN – currently only supported for accounts provisioned after January 25, 2016.
◼ Date & Time Data Types
• DATE
• DATETIME – alias for TIMESTAMP_NTZ.
• TIME
• TIMESTAMP – alias for one of the TIMESTAMP variations (TIMESTAMP_NTZ by default).
• TIMESTAMP_LTZ – TIMESTAMP with local time zone; the time zone, if provided, is not stored.
• TIMESTAMP_NTZ – TIMESTAMP with no time zone; the time zone, if provided, is not stored.
• TIMESTAMP_TZ – TIMESTAMP with time zone.
◼ Semi-structured Data Types
• VARIANT
• OBJECT
• ARRAY
◼ Geospatial Data Types
• GEOGRAPHY
• GEOMETRY
◼ Vector Data Types
• VECTOR
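◼ A minimal sketch of a table declaration exercising several of these types (the table and column names are illustrative):

CREATE OR REPLACE TABLE type_demo (
  id       NUMBER(10,0),
  price    NUMBER(10,2),
  name     VARCHAR(100),
  active   BOOLEAN,
  created  TIMESTAMP_NTZ,
  payload  VARIANT,       -- semi-structured
  tags     ARRAY,
  location GEOGRAPHY
);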
SNOWFLAKE DATATYPES
◼ Numeric Data Types
• NUMBER (NUMERIC) – variable-precision numeric values, e.g. NUMBER(10, 2) for values like 12345.67
• INTEGER – whole numbers, e.g. INTEGER for values like 42
• FLOAT (DOUBLE) – floating-point numbers, e.g. FLOAT for values like 3.14159
◼ String & Binary Data Types
• STRING (VARCHAR) – variable-length character strings, e.g. VARCHAR(255) for values like 'Hello, World!'
• CHAR (CHARACTER) – fixed-length character strings, e.g. CHAR(10) for values like 'ABCDEFGHIJ'
• BINARY – variable-length binary strings, e.g. BINARY(100) for binary data up to 100 bytes
◼ Logical Data Types
• BOOLEAN – true/false values, e.g. BOOLEAN for values like TRUE or FALSE
◼ Date & Time Data Types
• DATE – calendar dates, e.g. DATE for values like '2023-05-18'
• TIME – time of day, e.g. TIME for values like '13:45:30'
• TIMESTAMP – date and time, e.g. TIMESTAMP for values like '2023-05-18 13:45:30'
• TIMESTAMP_LTZ – timestamp with local time zone, e.g. TIMESTAMP_LTZ for values like '2023-05-18 13:45:30'
• TIMESTAMP_NTZ – timestamp without time zone, e.g. TIMESTAMP_NTZ for values like '2023-05-18 13:45:30'
• TIMESTAMP_TZ – timestamp with a specified time zone, e.g. TIMESTAMP_TZ for values like '2023-05-18 13:45:30+01:00'
◼ Semi-structured Data Types
• VARIANT – flexible data type for semi-structured data, e.g. VARIANT for JSON data like {"name": "John", "age": 30}
• OBJECT – collection of key-value pairs, e.g. OBJECT for values like {"key1": "value1", "key2": "value2"}
• ARRAY – ordered list of elements, e.g. ARRAY for values like [1, 2, 3, 4]
◼ Structured Data Types
• STRUCT (emulated using nested objects) – group of related fields, e.g. STRUCT<name STRING, age INTEGER>
◼ Geospatial Data Types
• GEOGRAPHY – represents geospatial data, e.g. GEOGRAPHY for values like POINT(-122.35 37.55)
◼ Vector Data Types
• VECTOR – array of numeric values (emulated using ARRAY), e.g. VECTOR for values like [1.0, 2.0, 3.0]
◼ Unsupported Data Types
• XML – not natively supported; can be stored as VARIANT, e.g. VARIANT for XML data
• CLOB – not natively supported; can be stored as STRING, e.g. STRING for large text
• BLOB – not natively supported; can be stored as BINARY, e.g. BINARY for binary large objects
◼ Data Type Conversion
• CAST, CONVERT – convert data from one type to another, e.g. CAST('123' AS INTEGER)
• TO_CHAR, TO_DATE, TO_NUMBER – functions to convert between data types, e.g. TO_DATE('2023-05-18', 'YYYY-MM-DD')
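◼ A small illustrative query using the conversion functions listed above:

SELECT
  CAST('123' AS INTEGER)                             AS as_int,
  TO_DATE('2023-05-18', 'YYYY-MM-DD')                AS as_date,
  TO_NUMBER('12345.67', 10, 2)                       AS as_number,
  PARSE_JSON('{"name": "John", "age": 30}'):age::INT AS age_from_json;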
SNOWFLAKE DATA LOAD
BULK LOAD
DATA LOADING TO SNOWFLAKE
◼ Bulk load:
◼ External stages
◼ Internal Stages
◼ User stage
◼ Table stage
◼ Named Stage

◼ Continuous load:
◼ Snowpipe
◼ Snowpipe Streaming
◼ Location of files:
• Local environment – files are first copied ("staged") to an internal (Snowflake) stage, then loaded into a table.
• Amazon S3 – files can be loaded directly from any user-supplied bucket.
• Google Cloud Storage – files can be loaded directly from any user-supplied bucket.
• Microsoft Azure cloud storage (Blob storage, Data Lake Storage Gen2, General-purpose v1, General-purpose v2) – files can be loaded directly from any user-supplied container.
◼ File formats:
• Delimited files (CSV, TSV, etc.) – any valid delimiter is supported; the default is comma (i.e. CSV).
• Semi-structured formats – JSON, Avro, ORC, Parquet, XML (supported as a preview feature).
• Unstructured formats.
◼ File encoding:
• File format-specific – for delimited files (CSV, TSV, etc.), the default character set is UTF-8. For semi-structured file formats (JSON, Avro, etc.), the only supported character set is UTF-8.
BULK LOADING FROM MICROSOFT AZURE
CONFIGURING AN AZURE CONTAINER FOR LOADING DATA
◼ Configure a storage integration:
◼ Step 1 Create a Cloud Storage Integration in Snowflake
◼ Step 2 Grant Snowflake Access to the Storage Locations
◼ Step 3 Create an external stage

◼ Generate a shared access signature (SAS) token:
◼ Step 1: Generate the SAS token
◼ Step 2: Create an external stage
CREATE STORAGE INTEGRATION
◼ Configure a storage integration:
◼ Step 1 Create a Cloud Storage Integration in Snowflake
CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant_id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/<path>/',
                               'azure://<account>.blob.core.windows.net/<container>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/<path>/',
                                 'azure://<account>.blob.core.windows.net/<container>/<path>/') ];
Actual code:
CREATE STORAGE INTEGRATION azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = 'e5b37e62-c6cc-42ca-bd7405b3ddd02784'
  STORAGE_ALLOWED_LOCATIONS = ('azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/')
  -- [ STORAGE_BLOCKED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/<path>/', 'azure://<account>.blob.core.windows.net/<container>/<path>/') ]
  ;
CREATE STORAGE INTEGRATION
◼ Step 2 Grant Snowflake Access to the Storage Locations
CREATE STORAGE INTEGRATION
◼ Step 2 Grant Snowflake Access to the Storage Locations
CREATE STORAGE INTEGRATION
◼ Step 2 Grant Snowflake Access to the Storage Locations
CREATE STORAGE INTEGRATION
GRANT CREATE STAGE ON SCHEMA BWSCHEMA TO ROLE ACCOUNTADMIN;

GRANT USAGE ON INTEGRATION azure_int TO ROLE ACCOUNTADMIN;

create or replace file format csv_superstore
  TYPE = CSV SKIP_HEADER = 1 FIELD_DELIMITER = ','
  TRIM_SPACE = FALSE FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  REPLACE_INVALID_CHARACTERS = TRUE DATE_FORMAT = AUTO
  TIME_FORMAT = AUTO TIMESTAMP_FORMAT = AUTO;

create or replace stage az_stg_superstore
  storage_integration = azure_int
  URL = 'azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/'
  file_format = csv_superstore;

list @az_stg_superstore;

select $1, $2, $3 from @az_stg_superstore (file_format => 'csv_superstore');

create or replace TABLE BWDATABASE.BWSCHEMA.SUPERSTORE_NEW (
  ROWID NUMBER(38,0),
  ORDERID VARCHAR(16777216),
  ORDERDATE DATE, SHIPDATE DATE,
  SHIPMODE VARCHAR(16777216),
  CUSTOMERID VARCHAR(16777216),
  CUSTOMERNAME VARCHAR(16777216),
CREATE STORAGE INTEGRATION
◼ Generate a shared access signature (SAS) token
◼ Step 1: Generate the SAS token
◼ Step 2: Create an external stage
SHARED ACCESS SIGNATURE (SAS) TOKEN
◼ Generate a shared access signature (SAS) token
◼ Step 1: Generate the SAS token
◼ Step 2: Create an external stage
SHARED ACCESS SIGNATURE (SAS) TOKEN
CREATE OR REPLACE STAGE bw_azure_stage_sas
  URL='azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/'
  CREDENTIALS=(AZURE_SAS_TOKEN='sv=2022-11-02&ss=bfqt&srt=co&sp=rwdlacupiytfx&se=2024-06-30T10:26:10Z&st=2024-05-21T02:26:10Z&spr=https&sig=jRhgc2t%2Br9kAkDoJvmqpJgys%2FtELpc3IoQJAK%2BS3wA0%3D')
  ENCRYPTION=(TYPE='NONE')
  FILE_FORMAT = csv_superstore;

list @bw_azure_stage_sas;

delete from superstore_new;

create table superstore_new_sas as select * from superstore_new;

COPY INTO superstore_new_sas from @bw_azure_stage_sas;


LOADING DATA FROM LOCAL FILE SYSTEM
CREATING INTERNAL STAGES – USER STAGE
◼ Staging the Data Files
◼ User Stage
◼ User stages are referenced using @~; e.g. use LIST @~ to list the files in a user stage.
◼ Unlike named stages, user stages cannot be altered or dropped.
◼ User stages do not support setting file format options. Instead, you must specify file format and copy options as part of the
COPY INTO <table> command.
◼ E.g. PUT file://C\data\data.csv @~/staged;
◼ LIST @~;
◼ COPY INTO mytable from @~/staged FILE_FORMAT = FORMAT_NAME = 'my_csv_format');
CREATING INTERNAL STAGES – TABLE STAGE
◼ Staging the Data Files
◼ Table Stage
◼ Table stages have the same name as the table; e.g. a table named mytable has a stage referenced as %mytable.
◼ Unlike named stages, table stages cannot be altered or dropped.
◼ Table stages do not support transforming data while loading it (i.e. using a query as the source for the COPY command).
◼ E.g. PUT file://C\data\data.csv %mytable;
◼ LIST %mytable;
◼ COPY INTO mytable FILE_FORMAT = TYPE  CSV FIELD_DELIMITER = '|' SKIP_HEADER  1;
CREATING INTERNAL STAGES – NAMED STAGE
◼ Staging the Data Files
◼ Named Stage
◼ Users with the appropriate privileges on the stage can load data into any table.
◼ Because the stage is a database object, the security/access rules that apply to all objects apply. The privileges to use a
stage can be granted or revoked from roles. In addition, ownership of the stage can be transferred to another role.
◼ E.g. PUT file://C:\data\data.csv @my_stage;
◼ LIST @my_stage;
◼ COPY INTO mytable from @my_stage;
REMOVING INTERNAL STAGED FILES
◼ Files that were loaded successfully can be deleted from the stage during a load by specifying the PURGE
copy option in the COPY INTO <table> command.
◼ Load files from a table’s stage into the table and purge files after loading. By default, COPY does not purge loaded files from
the location. To purge the files after loading:
◼ Set PURGE = TRUE for the table to specify that all files successfully loaded into the table are purged after loading:
◼ ALTER TABLE mytable SET STAGE_COPY_OPTIONS = (PURGE = TRUE);
◼ COPY INTO mytable;
◼ You can also override any of the copy options directly in the COPY command:
◼ COPY INTO mytable PURGE = TRUE;

◼ After the load completes, use the REMOVE command to remove the files in the stage.
◼ REMOVE @mystage/path1/subpath2;
◼ REMOVE @%orders;
◼ RM @~ pattern='.*jun.*';
DIRECTORY TABLES
◼ A directory table is an implicit object layered on a stage (not a separate database object) and is
conceptually similar to an external table because it stores file-level metadata about the data files in the
stage. A directory table has no grantable privileges of its own.
◼ This example retrieves all metadata columns in a directory table for a stage named mystage:
◼ SELECT * FROM DIRECTORY(@mystage);
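◼ A minimal sketch, assuming a named internal stage, of enabling a directory table and refreshing its metadata before querying it:

CREATE OR REPLACE STAGE mystage
  DIRECTORY = (ENABLE = TRUE);

-- refresh the directory table metadata after files are added or removed
ALTER STAGE mystage REFRESH;

SELECT * FROM DIRECTORY(@mystage);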
DATA UNLOADING – EXTERNAL STAGE AZURE
DATA UNLOADING – EXTERNAL STAGE - AZURE
◼ Unload using Storage integration:
◼ Unload using a stage:
◼ Create storage integration
CREATE or replace STORAGE INTEGRATION azure_int TYPE = EXTERNAL_STAGE STORAGE_PROVIDER = 'AZURE' ENABLED = TRUE AZURE_TENANT_ID =
'fda27961-1c4d-48ca-b563-4da0bc461de1' STORAGE_ALLOWED_LOCATIONS =
('azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/')
◼ Create stage
CREATE OR REPLACE STAGE az_ext_unload_stage_superstore
URL='azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/snowoutput/' STORAGE_INTEGRATION = azure_int FILE_FORMAT =
csv_superstore;
◼ Copy data
COPY INTO @az_ext_unload_stage_superstore from SUPERSTORE_FRMUSERSTG;
◼ Unload without using stage:
◼ Create storage integration
CREATE or replace STORAGE INTEGRATION azure_int TYPE = EXTERNAL_STAGE STORAGE_PROVIDER = 'AZURE' ENABLED = TRUE AZURE_TENANT_ID =
'fda27961-1c4d-48ca-b563-4da0bc461de1' STORAGE_ALLOWED_LOCATIONS =
('azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/')
◼ Copy data
COPY INTO 'azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/snowoutput/d1/' from SUPERSTORE_FRMUSERSTG
storage_integration = azure_int;
◼ Unload using SAS token
◼ Create SAS token – as shown in previous slide
◼ Create stage –
CREATE OR REPLACE STAGE bw_azure_stage_sas
URL='azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/'
CREDENTIALS=(AZURE_SAS_TOKEN='sv=2022-11-02&ss=bfqt&srt=co&sp=rwdlacupiytfx&se=2024-06-30T10:26:10Z&st=2024-05-
21T02:26:10Z&spr=https&sig=jRhgc2t%2Br9kAkDoJvmqpJgys%2FtELpc3IoQJAK%2BS3wA0%3D')
ENCRYPTION=(TYPE='NONE')
FILE_FORMAT = csv_superstore;
◼ Copy data
COPY INTO @ bw_azure_stage_sas from SUPERSTORE_FRMUSERSTG;
DATA UNLOADING – VIA INTERNAL STAGE
DATA UNLOADING – VIA INTERNAL STAGE
◼ Unloading Data to a Named Internal Stage
CREATE OR REPLACE STAGE SUPERSTORE_FRMUSERSTG_UNLD_NMED
FILE_FORMAT = csv_superstore_new;
COPY INTO @SUPERSTORE_FRMUSERSTG_UNLD_NMED/unload/ from SUPERSTORE;
GET @SUPERSTORE_FRMUSERSTG_UNLD_NMED/unload/data_0_0_0.csv.gz file://C:\data\unload;

◼ Unloading data via table stage


◼ COPY INTO @%mytable/unload/ from mytable FILE_FORMAT = (FORMAT_NAME = 'my_csv_unload_format'
COMPRESSION = NONE);
◼ GET @%mytable/unload/data_0_0_0.csv file://C:\data\unload;

◼ Unloading Data to Your User Stage


◼ COPY INTO @~/unload/ from SUPERSTORE_FRMUSERSTG FILE_FORMAT = (FORMAT_NAME = 'csv_superstore'
COMPRESSION = NONE);
◼ GET @~/unload/data_0_0_0.csv file://~/;
SNOWPIPE
◼ Snowpipe enables loading data from files as soon as theyʼre available in a stage. This means you can
load data from files in micro-batches, making it available to users within minutes, rather than manually
executing COPY statements on a schedule to load larger batches.
◼ Different mechanisms for detecting the staged files are available:
◼ Automating Snowpipe using cloud messaging
◼ Calling Snowpipe REST endpoints


SNOWPIPE
SNOWPIPE
SNOWPIPE
-- Step 1: create storage integration
CREATE or replace STORAGE INTEGRATION azure_int_new
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = 'fda27961-1c4d-48ca-b563-4da0bc461de1'
  STORAGE_ALLOWED_LOCATIONS = ('azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/');

DESC STORAGE INTEGRATION azure_int_new;

GRANT CREATE STAGE ON SCHEMA BWSCHEMA TO ROLE ACCOUNTADMIN;
GRANT USAGE ON INTEGRATION azure_int_new TO ROLE ACCOUNTADMIN;

-- Step 2: create file format
create or replace file format csv_superstore
  TYPE=CSV SKIP_HEADER=1 FIELD_DELIMITER=',' TRIM_SPACE=FALSE
  FIELD_OPTIONALLY_ENCLOSED_BY='"' REPLACE_INVALID_CHARACTERS=TRUE
  DATE_FORMAT=AUTO TIME_FORMAT=AUTO TIMESTAMP_FORMAT=AUTO;

-- Step 3: create table structure
create or replace TABLE BWDATABASE.BWSCHEMA.SUPERSTORE_fromsnowpipe (
  ROWID NUMBER(38,0), ORDERID VARCHAR(16777216), ORDERDATE DATE, SHIPDATE DATE,
  SHIPMODE VARCHAR(16777216), CUSTOMERID VARCHAR(16777216), CUSTOMERNAME VARCHAR(16777216),
  SEGMENT VARCHAR(16777216), COUNTRYREGION VARCHAR(16777216), CITY VARCHAR(16777216),
  STATE VARCHAR(16777216), POSTALCODE NUMBER(38,0), REGION VARCHAR(16777216),
  PRODUCTID VARCHAR(16777216), CATEGORY VARCHAR(16777216), SUBCATEGORY VARCHAR(16777216),
  PRODUCTNAME VARCHAR(16777216), SALES NUMBER(38,4), QUANTITY NUMBER(38,0),
  DISCOUNT NUMBER(38,2), PROFIT NUMBER(38,4));

-- Step 4: create stage and notification integration
create or replace stage az_stg_superstore_snowpipe
  storage_integration = azure_int_new
  file_format = csv_superstore;

list @az_stg_superstore_snowpipe;

drop NOTIFICATION INTEGRATION AZ_NOTIFICATION_INT;

CREATE or replace NOTIFICATION INTEGRATION AZ_NOTIFICATION_INT
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://bw2023snowflakedata.queue.core.windows.net/storagequeues'
  AZURE_TENANT_ID = 'fda27961-1c4d-48ca-b563-4da0bc461de1';

desc NOTIFICATION INTEGRATION AZ_NOTIFICATION_INT;

CREATE OR REPLACE STAGE bw_azure_stage_sas
  URL='azure://bw2023snowflakedata.blob.core.windows.net/bw2023snowflakedatacontainer/'
  CREDENTIALS=(AZURE_SAS_TOKEN='sv=2022-11-02&ss=bfqt&srt=co&sp=rwlaciytfx&se=2024-06-29T12:39:17Z&st=2024-05-26T04:39:17Z&spr=https&sig=7JreCeW87wEVBmZdQrcNAJ%2FR9fV%2FJ64L3sV7SYaI5M8%3D')
  ENCRYPTION=(TYPE='NONE')
  FILE_FORMAT = csv_superstore;

-- Step 5: create snowpipe
create pipe az_superstore_snowpipe_new
  auto_ingest = TRUE
  integration = 'AZ_NOTIFICATION_INT'
  as copy into SUPERSTORE_fromsnowpipe from @bw_azure_stage_sas;

-- Step 6: verify the load
select count(*) from SUPERSTORE_fromsnowpipe;
SNOWPIPE - STREAMING
WORKING WITH SEMI-STRUCTURED DATA
◼ Semi-structured Data Types:
◼ The following Snowflake data types can contain semi-structured data:
◼ VARIANT (can contain any other data type). E.g. {"key1": "value1", "key2": "value2"}
◼ ARRAY (can directly contain VARIANT, and thus indirectly contain any other data type, including itself). E.g. [1,2,3]
◼ OBJECT (can directly contain VARIANT, and thus indirectly contain any other data type, including itself). E.g.
{
  "outer_key1": {
    "inner_key1A": "1a",
    "inner_key1B": "1b"
  },
  "outer_key2": {
    "inner_key2": 2
  }
}
◼ Querying Semi-structured Data:
◼ Snowflake supports operators for:
◼ Accessing an element in an array.
◼ Retrieving a specified value from a key-value pair in an OBJECT.
◼ Traversing the levels of a hierarchy stored in a VARIANT.
WORKING WITH SEMI-STRUCTURED DATA
◼ Querying Semi-structured Data:
◼ Snowflake supports operators for:
◼ Accessing an element in an array.
◼ select my_array_column[2] from my_table;
◼ select my_array_column[0][0] from my_table;
◼ select array_slice(my_array_column, 5, 10) from my_table;
◼ Inserting data into an ARRAY column
◼ INSERT INTO array_example (array_column)
◼ SELECT ARRAY_CONSTRUCT(12, 'twelve', NULL);

◼ Retrieving a specified value from a key-value pair in an OBJECT.


◼ select my_variant_column['key1'] from my_table;
◼ Inserting data into an OBJECT column
◼ INSERT INTO object_example (object_column)
◼ SELECT OBJECT_CONSTRUCT('thirteen', 13::VARIANT, 'zero', 0::VARIANT);

◼ Traversing the levels of a hierarchy stored in a VARIANT.


WORKING WITH SEMI-STRUCTURED DATA
◼ Querying Semi-structured Data:
◼ Snowflake supports operators for:
◼ Traversing the levels of a hierarchy stored in a VARIANT.
◼ Using : (colon)
◼ Using Dot Notation
◼ Using Bracket notation
◼ Retrieving single instance of repeating element
◼ Explicitly casting values
◼ Using FLATTEN function

◼ Data inserted using PARSE_JSON
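◼ A minimal sketch pulling these operators together: PARSE_JSON to insert a document and FLATTEN to explode a nested array (the table and key names are illustrative):

CREATE OR REPLACE TABLE json_demo (v VARIANT);

INSERT INTO json_demo
  SELECT PARSE_JSON('{"name": "John", "orders": [{"id": 1}, {"id": 2}]}');

-- traverse with : and bracket notation, cast explicitly, and flatten the array
SELECT v:name::STRING     AS customer_name,
       f.value:id::NUMBER AS order_id
FROM json_demo,
     LATERAL FLATTEN(input => v:orders) f;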


SNOWFLAKE DATA SHARING AND COLLABORATION
DATA SHARING AND COLLABORATION
◼ Secure Data Sharing lets you share selected objects in a database in your account with other Snowflake accounts. You can share the following Snowflake objects:
• Databases
• Tables
• Dynamic tables
• External tables
• Iceberg tables
• Secure views
• Secure materialized views
• Secure user-defined functions (UDFs)
◼ Important: All database objects shared between accounts are read-only (i.e. the objects cannot be
modified or deleted, including adding or modifying table data).
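◼ A minimal sketch of the provider and consumer sides of a share (the database, object, and account names are illustrative):

-- provider side
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = xy12345;

-- consumer side (shared objects are read-only)
CREATE DATABASE sales_shared FROM SHARE <provider_account>.sales_share;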
DATA SHARING AND COLLABORATION
READER ACCOUNTS FOR THIRD-PARTY ACCESS
◼ Data sharing is only supported between Snowflake accounts. As a data provider, you might want to share data with a consumer who does not already have a Snowflake account or is not ready to become a licensed Snowflake customer.
◼ To facilitate sharing data with these consumers, you can create reader accounts. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
◼ Each reader account belongs to the provider account that created it. As a provider, you use shares to share databases with reader accounts; however, a reader account can only consume data from the provider account that created it. Refer to the following diagram:
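◼ A minimal sketch of how a provider creates a reader account (the admin name and password are placeholders):

CREATE MANAGED ACCOUNT reader_acct1
  ADMIN_NAME = reader_admin,
  ADMIN_PASSWORD = '<strong password>',
  TYPE = READER;

SHOW MANAGED ACCOUNTS;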
SHARING DATA SECURELY ACROSS REGIONS AND CLOUD
PLATFORMS
SHARING DATA SECURELY ACROSS REGIONS AND CLOUD
PLATFORMS
ZEROCOPY CLONING
ZEROCOPY CLONING
TYPE OF ENVIRONMENTS IN PROJECT
◼DEV For development, unit testing
◼QA/TEST – for testing – integration testing, component testing.
◼UAT – user acceptance testing, Business user
◼Prod –
◼DR – for recovery or failure
DATA PROTECTION LIFE CYCLE
ZEROCOPY CLONING
AWS
DATA UNLOADING – EXTERNAL STAGE AWS
DATA UNLOADING – EXTERNAL STAGE - AWS
◼ COPY via named stage:
CREATE OR REPLACE STAGE my_ext_unload_stage URL='s3://unload/files/'
STORAGE_INTEGRATION = s3_int
FILE_FORMAT = my_csv_unload_format;
COPY INTO @my_ext_unload_stage/d1 from mytable;

◼ Unloading Data Directly into an S3 Bucket


◼ COPY INTO s3://mybucket/unload/ from mytable storage_integration = s3_int;
DATA UNLOADING – EXTERNAL STAGE
◼ COPY via named stage:
CREATE OR REPLACE STAGE my_ext_unload_stage URL='s3://unload/files/'
STORAGE_INTEGRATION = s3_int
FILE_FORMAT = my_csv_unload_format;
COPY INTO @my_ext_unload_stage/d1 from mytable;

◼ Unloading Data Directly into an S3 Bucket


◼ COPY INTO s3://mybucket/unload/ from mytable storage_integration = s3_int;
DATA UNLOADING – VIA INTERNAL STAGE
DATA UNLOADING – VIA INTERNAL STAGE
◼ Unloading Data to a Named Internal Stage
CREATE OR REPLACE STAGE my_unload_stage
FILE_FORMAT = my_csv_unload_format;
COPY INTO @my_unload_stage/unload/ from mytable;

◼ GET @my_unload_stage/unload/data_0_0_0.csv.gz file://C:\data\unload;

◼ Unloading data via table stage


◼ COPY INTO @%mytable/unload/ from mytable FILE_FORMAT = (FORMAT_NAME = 'my_csv_unload_format'
COMPRESSION = NONE);
◼ GET @%mytable/unload/data_0_0_0.csv file://C:\data\unload;

◼ Unloading Data to Your User Stage


◼ COPY INTO @~/unload/ from mytable FILE_FORMAT = (FORMAT_NAME = 'my_csv_unload_format' COMPRESSION =
NONE);
◼ GET @~/unload/data_0_0_0.csv file://C:\data\unload;
SNOWPIPE
◼ Snowpipe enables loading data from files as soon as theyʼre available in a stage. This means you can
load data from files in micro-batches, making it available to users within minutes, rather than manually
executing COPY statements on a schedule to load larger batches.
◼ Different mechanisms for detecting the staged files are available:
◼ Automating Snowpipe using cloud messaging
◼ Calling Snowpipe REST endpoints


SNOWPIPE
SNOWPIPE - STREAMING
