Snowflake Practice 1


➢ What is Snowflake & Why

1. What is Snowflake & why Snowflake?

• Snowflake is a cloud-based data warehousing solution, founded in 2012.
• Snowflake offers data storage and analytics services.
• It runs on Amazon S3, Microsoft Azure, and Google Cloud Platform.
• Snowflake does not have its own infrastructure.
• It also helps with transformations, creating data pipelines, creating visual dashboards, etc.
• Snowflake provides data recovery, backup, cloning, sharing, and masking.
• Easy integration with data visualization / reporting tools.
• Pay-for-what-you-use model.
2. What is a database?
• A database is a place or medium in which we store data in a systematic and organized manner. A database is usually controlled by a database management system (DBMS).
3. What is a data warehouse?
• A data warehouse is an enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources.
• It uses compute clusters to process queries on the stored data.

➢ Snowflake architecture
I. Database Storage
II. Query Processing
III. Cloud Services.

Database Storage: - Where the actual data is stored.

a) Stores loaded table data and query results.
b) Data is stored in columnar format.
c) Data is stored in micro-partitions.
d) We can define cluster keys on large tables for better performance.

Query Processing: - It is like the CPU that processes all the queries. It contains virtual warehouses.
a) This is the actual processing unit of Snowflake.
b) Snowflake processes queries using "virtual warehouses".
c) Virtual warehouses are considered the muscle of the system.
d) We can scale up and scale down easily.
e) Auto-resume & auto-suspend are available.

Cloud Services: - Handles all kinds of data management - it is like the brain of the system.

a) A collection of services that coordinate activities across Snowflake.
b) Authentication & access control.
c) Infrastructure manager.
d) Optimizer.
e) Metadata manager.
f) Security.
g) Manages all serverless tasks like Snowpipe, Tasks, materialized view maintenance, etc.
Snowflake Architecture Diagram
Interview questions

1. How is the data stored?

Ans - Data is stored in micro-partitions in columnar format.
2. How are queries executed?
Ans - Using virtual warehouses.

➢ Snowflake Virtual Warehouses

1. What is a virtual warehouse (VW)?

A virtual warehouse provides the computational power for Snowflake. Users can choose different sizes based on workload needs and can scale them up or down.
Interview Question-
❖ What virtual warehouses are you using in the project, and what sizes?
Ans - We use different sizes of VW; they differ based on the requirement and environment. For example, in production I use an "L" size VW, in testing a "Small" size, and in dev an "X-Small" size.
❖ What is the size of a warehouse, and how many compute resources does it contain?

"X-Small (XS)" size VW contains 1 EC2 server.

"Small (S)" size VW contains 2 EC2 servers.
"Medium (M)" size VW contains 4 EC2 servers.
"Large (L)" size VW contains 8 EC2 servers.
"X-Large (XL)" size VW contains 16 EC2 servers.
"2X-Large (2XL)" size VW contains 32 EC2 servers.
Like that, sizes go up to 6X-Large.
❖ When are you going to scale up your VW size (scale up or vertical scaling)?

• Whenever the data size or the number of records increases, or the complexity of queries grows and the present VW size is not performing well, we go for the next available size.
(Scale up means - increasing the VW size.)
(Resizing a VW - increase the size of the VW if your queries are taking too long or data loading is slow.) You can resize the VW anytime using the Web UI or the SQL interface, as shown below.
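A minimal sketch of resizing and auto-suspend settings through SQL (the warehouse name is illustrative):

// scale the warehouse up to the next size that fits the workload
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';
// suspend after 5 idle minutes and resume automatically so idle compute is not billed
ALTER WAREHOUSE my_wh SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;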
❖ What is scale out or horizontal scaling?
• Increasing the number of clusters to avoid queries going into the queue.
(Snowflake offers up to 10 clusters per warehouse.)
Multi-cluster WH - (Scale out means - increasing the number of clusters; see the sketch below.)
Note - 1. Whenever the data size increases, we increase the VW size.
2. Whenever queries go into the queue, we increase the number of clusters.
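A hedged sketch of turning a warehouse into a multi-cluster warehouse (names and counts are illustrative; multi-cluster warehouses need Enterprise edition or higher):

// let the warehouse scale out to 3 clusters when queries start queuing
ALTER WAREHOUSE my_wh SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 3 SCALING_POLICY = 'STANDARD';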

➢ Snowflake - Micro-partitions and Data Clustering

1. What are micro-partitions?

• Micro-partitioning is automatically performed on all Snowflake tables.
• Snowflake has implemented a powerful and unique form of partitioning called micro-partitioning.
• Micro-partitions are small in size (50 to 500 MB before compression).
• Snowflake storage is columnar-based and horizontally partitioned.
2. What is clustering?
• Clustering is a key factor in query performance; it reduces the scanning of micro-partitions.
• We can define cluster keys on multiple columns of a SINGLE table, up to 4 columns.
• We can modify the cluster keys based on our requirements; this is called re-clustering.
We have two ways to apply cluster keys:-
1. We can apply the cluster keys at the time of creating the table.

Syntax:-

Create table my_table
(
type number,
name string,
country string,
sale_date date
)
Cluster by (country);
2. Modifying the cluster keys on existing tables using the ALTER command.

Syntax:-

Alter table my_table cluster by (column_names);

3. Choosing the cluster keys

• Columns frequently used in filter conditions (WHERE clause).
• Columns used as join keys.
• Columns frequently used in functions or expressions, like YEAR(sale_date) or SUBSTRING(med_cd, 1, 6).
Note -
✓ Define cluster keys on large tables, not on small tables.
✓ Don't define cluster keys on more than 4 columns.
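To see how well a table is clustered on given columns, Snowflake provides the SYSTEM$CLUSTERING_INFORMATION function; a small illustration (table and column names are assumptions):

// returns clustering depth and overlap statistics for the listed columns
SELECT SYSTEM$CLUSTERING_INFORMATION('my_table', '(country)');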

➢ Snowflake Pricing
Snowflake cost depends on: -
1. Snowflake edition
-Standard - $2.7/credit
-Enterprise - $4/credit
-Business Critical - $5.4/credit
-VPS - depends on the organization
2. Region where the Snowflake account is created
3. Cloud platform where the Snowflake account is hosted
4. Virtual warehouse size.
Types of cost in Snowflake
1. Storage cost
-On-demand storage (postpaid) - you pay as you use / pay as you go
-Capacity or fixed storage (prepaid) - you have to buy the capacity up front
2. Compute cost, which depends on
-Snowflake edition
-Region and cloud provider
-Warehouse size
-Number of clusters

3. Cloud services cost.

How to choose the storage type: -

-When you're not sure about your data size, start with on-demand storage.
-Once you are sure, switch to capacity storage.
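A rough worked example using the figures above (assuming credit consumption follows the server counts listed earlier, i.e. a Large warehouse burns about 8 credits per hour): running a Large warehouse for 2 hours on the Enterprise edition costs roughly 8 credits/hour x 2 hours x $4/credit = $64, plus storage and cloud services charges.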

➢ Snowflake Data Loading


➢ Load Types

1. Bulk loading using the COPY command

2. Continuous loading using Snowpipe.
Bulk loading (when we want past data, e.g. yesterday's data, we go for bulk loading)
- Loading batches of data from files which are already in cloud storage, i.e. external stages.
- We have to create storage integration objects to extract data from cloud storage.

Or
- Copying the data files from a local machine to an internal (Snowflake) stage before loading the data into the table. Using the PUT command we can push the files into the internal stage from our local desktop or from Linux/UNIX servers; from there we can load the data into tables using the COPY command.
- Bulk loading uses a virtual warehouse.
- Uses the COPY command.
Continuous loading using Snowpipe (when we want live or near real-time data we go for Snowpipe). Example - the Cricbuzz app.
- Snowpipe (a serverless data ingestion service) automates loading data into Snowflake from sources like AWS S3, GCP, and Azure Blob Storage. Snowpipe supports continuous, near real-time or micro-batch loading. This eliminates manual data loading and keeps your data up to date.
What is the difference between Snowpipe and Snowpipe Streaming?
- Snowpipe loads data files from cloud storage such as S3, GCP and Azure. Snowpipe Streaming loads data directly from sources via the streaming API and client SDK.

Copy command syntax

Copy into <table_name>

From @<stage>
File_format = (format_name = '<file_format_name>')
Files = ('<filename1>', '<filename2>');
(Or)

Pattern = '.*<file_pattern>.*'
<other optional properties>;

Interview question
1. What is the COPY command?
Ans- It loads data from an external or internal stage into a Snowflake table.
The COPY command supports only the below sources and file formats.
Supported sources
-Local environment (via internal stages)
-AWS S3
-GCP
-Microsoft Azure
Supported formats
-Delimited files (CSV, TSV, etc.)
-JSON, Avro, Parquet, XML and ORC

Other ways to load data are by using ETL tools such as

-Matillion
-DataStage
-Informatica
-HEVO
-Azure Data Factory
-Azure Synapse, etc.
Simple transformations during data loading
• While loading data into a table there are some transformations we can do (see the sketch below):
-Reordering columns
-Column omission
-String operations
-Other functions
-Sequence numbers
-Auto-increment fields
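A minimal sketch of a transformation while loading (the stage, table and column positions are illustrative assumptions):

// reorder columns and apply a string function while copying
COPY INTO customers (id, full_name, country)
FROM (
  SELECT t.$1, UPPER(t.$3), t.$2
  FROM @my_stage t
)
FILE_FORMAT = (TYPE = 'CSV');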

➢ Snowflake External Stages


1. What is a stage?
• A stage is a location of files; it can be external or internal to Snowflake.

2. Types of stages
• External stage (@)
• Internal stage
-User internal stage
-Table internal stage
-Named internal stage
3. What is meant by an external stage?
• An external stage is an external cloud storage location where the data files are stored, e.g.:
-AWS S3
-GCP
-Azure Blob Storage.
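A hedged sketch of creating an external stage on top of a storage integration (integration, bucket and format settings are assumptions):

CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://my-bucket/csv/'
  STORAGE_INTEGRATION = AWS_S3_INTEGRATION
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

// list the files visible through the stage
LIST @my_s3_stage;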

➢ Snowflake Integration with AWS

Steps for AWS integration with Snowflake

-Create an AWS free account

-Select or click on S3
-Click on "Create bucket"
-After creating the bucket, click on the S3 bucket

-Create a folder (e.g. csv, json, etc.)

-Select the folder to which you want to upload files
-Click on "Upload"
-Add the files
-For connecting the Snowflake account to the S3 bucket we need to integrate with the IAM service
-Select the Identity and Access Management (IAM) service
-Select the left-side option called "Roles"
-Click on "Create role"
-Select "AWS account"
-Give access to the S3 bucket

-Give the role a name

-Click on "Create role"
-Click on the created role
-Copy the role ARN
arn:aws:iam::654654607075:role/aws_s3_snowflake_integration

This ARN we give to Snowflake; Snowflake then generates another ARN (an IAM user ARN) which we have to put into the role's trust relationship.
(ARN - Amazon Resource Name, S3 - Simple Storage Service, IAM - Identity and Access Management)
-Go to Snowflake
-Create the storage integration object in Snowflake
-Go back to the S3 bucket and copy the bucket name
-Execute the integration query
Integration syntax:-
create or replace storage integration AWS_S3_INTEGRATION
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::654654607075:role/aws_s3_snowflake_integration'
STORAGE_ALLOWED_LOCATIONS = ('s3://raj143s3test/csv')
COMMENT = 'integration with AWS S3 bucket';
-To get the ARN generated by Snowflake, execute: desc integration AWS_S3_INTEGRATION;

Copy the STORAGE_AWS_IAM_USER_ARN (and STORAGE_AWS_EXTERNAL_ID) values

-Click on "Trust relationships"
-Click on "Edit trust policy" and paste the copied ARN

-Click on "Update policy" (done with the integration).

-Then proceed as usual (create the stage and load with the COPY command).

➢ Snowflake - How to log in to SnowSQL

-Go to Google and search for "connecting to snowsql snowflake", open the Snowflake documentation, and look up the account identifier format for the cloud you signed up with (AWS, GCP, ...).
-Install SnowSQL.
-Get the account locator from the Admin tab (https://oy90717.ap-southeast-1.snowflakecomputing.com) - OY90717

-Account identifier - OY90717.ap-southeast-1

-SnowSQL connect command - snowsql -a <account_identifier> -u <user_name>
-Example: snowsql -a OY90717.ap-southeast-1 -u AREKALLUNAGARAJU

-Go to the command prompt and enter the command.

➢ Snowflake Internal Stages (SnowSQL)

1. Internal stages
- User stage
- Table stage
- Named stage

Note - Why do we use internal stages? When the files are on our local servers (Windows, Linux, UNIX), we use internal stages.

User stage - Notation (@~)

- A user stage is allocated to each user for storing files.
- To store files that are staged and managed by a single user but can be loaded into multiple tables.
- User stages cannot be altered or dropped.
- This option is not appropriate if multiple users require access to the files.

Table stage - Notation (@%)

-A table stage is available for each table created in Snowflake.
-Table stages have the same name as the table.
Example - a table named mytable has a stage referenced as @%mytable.
-To store files that are staged and managed by one or more users but only loaded into a single table.
-Table stages cannot be dropped or altered.

Named internal stage - Notation (@)

-A named internal stage is a database object created in a schema.
-To store files that are staged and managed by one or more users and loaded into one or more tables.
-Create stages using the CREATE STAGE command (similar to external stages).
Syntax for creating a stage - Create or replace stage <database_name>.<schema>.<stage_name>;
2. Staging data files from a local file system

Linux or macOS files using the PUT command

PUT file:///data/data.csv @~/staged;

PUT file:///data/data.csv @%mytable;

PUT file:///data/data.csv @my_stage;

Windows files using the PUT command

PUT file://c:\data\data.csv @~/staged;

PUT file://c:\data\data.csv @%mytable;

PUT file://c:\data\data.csv @my_stage;

List the files

List @~;

List @%mytable;

List @my_stage;

➢ Snowflake - COPY Command Options

Copy command syntax:-

Copy into <table_name>

From @<external_stage>

File_format = (format_name = '<file_format_name>')

Files = ('<file_name1>', '<file_name2>');

COPY command options:-

- VALIDATION_MODE

- RETURN_FAILED_ONLY

- ON_ERROR

- FORCE

- SIZE_LIMIT

- TRUNCATE_COLUMNS

- ENFORCE_LENGTH

- PURGE

- LOAD_UNCERTAIN_FILES

1. VALIDATION_MODE (Important)

Syntax-

Copy into <table_name>

From @<external_stage>

File_format = (format_name = '<file_format_name>')
Files = ('<file_names>')

Validation_mode = RETURN_N_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS;

-Validates the data files instead of loading them into the table.

-RETURN_N_ROWS - displays the first N records and fails at the first error record.

-RETURN_ERRORS - returns all errors found in the specified files.

-RETURN_ALL_ERRORS - returns all errors, including errors from previously loaded files where we used ON_ERROR = CONTINUE.
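A small illustration (stage, file and format names are assumptions) that validates a file without loading it and returns every error found:

COPY INTO customers
FROM @my_stage
FILES = ('customers_2024.csv')
FILE_FORMAT = (FORMAT_NAME = 'my_csv')
VALIDATION_MODE = RETURN_ERRORS;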

2. RETURN_FAILED_ONLY

Syntax:-

COPY INTO <table_name>

FROM @<external_stage>

FILE_FORMAT = (FORMAT_NAME = '<file_format_name>')

FILES = ('<file_names>')

RETURN_FAILED_ONLY = TRUE | FALSE;

- Specifies whether to return only the files that have failed to load in the statement result.
- Default is FALSE.

3. ON_ERROR (Important)

Syntax-

COPY INTO <table_name>

FROM @<stage>

FILE_FORMAT = (FORMAT_NAME = '<file_format_name>')

FILES = ('<file_names>')

ON_ERROR = CONTINUE | SKIP_FILE | SKIP_FILE_<num> | 'SKIP_FILE_<num>%' | ABORT_STATEMENT;

-CONTINUE - skip the error records and load the remaining records.

-SKIP_FILE - skip the files that contain errors.

-SKIP_FILE_<num> - skip a file when the number of error rows found in the file equals or exceeds the specified number.

-'SKIP_FILE_<num>%' - skip a file when the percentage of error rows found in the file exceeds the specified percentage.

-ABORT_STATEMENT (default) - abort the load operation if any error is found in a data file.
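A hedged example (table, pattern and format names are illustrative): load a batch of files and skip any file that has 10 or more bad rows.

COPY INTO customers
FROM @my_stage
PATTERN = '.*customers_.*[.]csv'
FILE_FORMAT = (FORMAT_NAME = 'my_csv')
ON_ERROR = SKIP_FILE_10;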

4. FORCE
Syntax

Copy into <table_name>

From @<stage>

File_format = (format_name = '<file_format_name>')

Files = ('<file_names>')

FORCE = TRUE | FALSE;

-TRUE loads all the files, regardless of whether they have been loaded previously (which can duplicate data).

-Default is FALSE; if we don't specify this property the COPY command will not fail, but it skips loading files that were already loaded.

5. SIZE_LIMIT

Syntax-

Copy into <table_name>

From @<stage>

File_format = (format_name = '<file_format_name>')

Files = ('<file_names>')

Size_limit = <number>;

- Specifies the maximum size, in bytes, of data to be loaded by that command.

- When the threshold is exceeded, the COPY operation stops loading further files.

- In the case of multiple files matching the same pattern, loading stops once the limit is crossed: the file that crosses the threshold is still loaded in full, and the remaining files are skipped.

6. TRUNCATE_COLUMNS or ENFORCE_LENGTH (Important)

Syntax-

Copy into <table_name>

From @<stage>

File_format = (format_name = '<file_format_name>')

Files = ('<file_names>')

Truncate_columns = TRUE | FALSE -- default is FALSE

or Enforce_length = TRUE | FALSE -- default is TRUE;

- TRUNCATE_COLUMNS specifies whether to truncate text strings that exceed the target column length.

- Its default is FALSE, which means that if you don't specify this option and a text string exceeds the target column length, the COPY command will fail (ENFORCE_LENGTH = TRUE behaves the same way).

7. PURGE

Syntax-

COPY INTO <table_name>

FROM @<stage>

FILE_FORMAT = (FORMAT_NAME = '<file_format_name>')
FILES = ('<file_names>')

PURGE = TRUE | FALSE;

-Specifies whether to remove the data files from the stage automatically after the data is loaded successfully.

-Default is FALSE.

8. LOAD_UNCERTAIN_FILES

Syntax-

Copy into <table_name>

From @<stage>

File_format = (format_name = '<file_format_name>')

Files = ('<file_names>')

Load_uncertain_files = TRUE | FALSE;

-Specifies whether to load files for which the load status is unknown. The COPY command skips these files by default.

Note - The load status is unknown if all of the following conditions are true:

-The file's LAST_MODIFIED date is older than 64 days.

-The initial set of data was loaded into the table more than 64 days earlier.
-If the file was already loaded successfully into the table, that load occurred more than 64 days earlier.

➢ Snowflake - Loading semi-structured data - JSON

❖ JSON files are

- Semi-structured

- Can contain arrays

- Can be nested

- Can be a combination of arrays and nested objects

- Can contain millions of records.

❖ Process of loading

-Create a stage object that points to the JSON files in Azure / AWS.

-Create a staging table to store the raw data, with a single VARIANT column.

-Copy the data into the staging table using the COPY command.

-Now parse the raw data based on the JSON file content:

1. Normal JSON

2. JSON with arrays

3. JSON with nested objects

4. JSON with all of these combined

-Load the parsed data into a target table (see the sketch below).
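A minimal sketch of this flow, assuming a stage named @json_stage and illustrative JSON fields:

//staging table with a single VARIANT column
CREATE OR REPLACE TABLE raw_json (v VARIANT);

COPY INTO raw_json
FROM @json_stage
FILE_FORMAT = (TYPE = 'JSON');

//parse simple and nested attributes, and explode an array with FLATTEN
SELECT
  v:id::INT               AS id,
  v:name::STRING          AS name,
  v:address.city::STRING  AS city,
  p.value::STRING         AS phone
FROM raw_json,
     LATERAL FLATTEN(input => v:phones) p;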

➢ Snowflake - Snowpipe working session

❖ Snowpipe syntax :-

Create or replace pipe <pipe_name>

Auto_ingest = TRUE | FALSE

AS

<copy_statement>;
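A hedged example (stage, table and format names are assumptions); AUTO_INGEST = TRUE relies on the cloud event notification described below:

CREATE OR REPLACE PIPE customer_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO customers
FROM @my_s3_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv');

//the notification_channel column (an SQS ARN) is used to configure the S3 event notification
SHOW PIPES;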

❖ Snowpipe allows the below-mentioned DDL commands:-

-CREATE PIPE - to create a pipe

-ALTER PIPE - to alter a pipe, or to pause or resume it

Syntax - Alter pipe <pipe_name> set pipe_execution_paused = true | false;

-DROP PIPE - to drop the pipe

-DESCRIBE PIPE - to describe the pipe properties, e.g. to get the notification channel ARN

-SHOW PIPES - to see all the pipes

Notification setup: -

Select the S3 bucket - select Properties - scroll down to event notifications and create a notification that points to the pipe's queue.

➢ Loading data from Azure Blob Storage

❖ Steps to load data from Azure

-You should have a Snowflake trial account

-You should have an Azure trial account

-Create a storage account and containers in Azure

-Upload the source files to these containers

-Create a storage integration between Snowflake and Azure

-Create stage objects using the storage integration object

-Use COPY commands to extract the data from the files and load it into Snowflake tables

Practical steps:-

-Choose "Storage accounts"

-Create a storage account

-Click on the storage account you created and select "Containers"

-Create a container

-Click on the container you created

-Click on "Upload" and upload the files from the local machine

-Create the storage integration between Azure and Snowflake

-Get the Azure tenant ID from Azure Active Directory (Microsoft Entra ID)

-Get the storage allowed location from the container/folder you want to use: click the three dots on the right side of the container, click "Properties", and in the URL replace "https" with "azure"

-Give consent from Snowflake to Azure by running

"Desc storage integration azure_int;"

Copy the AZURE_CONSENT_URL link, open it in the browser and accept.

-Copy the multi-tenant app name, go to the storage account, select Access control (IAM), select "Role assignments", add the role "Storage Blob Data Contributor", click "Select members", add the multi-tenant app name, and then complete the process as usual.

Syntax-

//create storage integration object

create or replace storage integration azure_int

type = external_stage

storage_provider = 'AZURE'

enabled = true

azure_tenant_id = '<azure_tenant_id>'

storage_allowed_locations = ('azure://<storage_account>.blob.core.windows.net/<container>/<path>/');

➢ Troubleshooting Snowpipe

Check the below three main things.

Step 1 - Check the pipe status

Step 2 - View the copy history for the table

Step 3 - Validate the data files

Step 1 - Check the pipe status

Syntax to check the pipe status -

select system$pipe_status('pipe_name');

Last received message timestamp:-

-Specifies the timestamp of the last event message received from the message queue.

-If the timestamp is earlier than expected, this indicates an issue with the service configuration (e.g. Amazon SQS).

-Verify whether any setting was changed in your service configuration.

Last forwarded message timestamp:-

-Specifies the timestamp of the last "create object" event message that was forwarded to the pipe.

-If this value is not similar to the above timestamp, then there is a mismatch between the cloud storage path where the new data files are created and the path specified in the Snowflake stage object.

-Verify the paths and correct them.

Step 2 - View the copy history

- Copy history shows the history of all file loads and errors, if any.

- View the copy history by using the below query.

Syntax:-

Select * from table(information_schema.copy_history(

table_name => 'table_name',

start_time => <timestamp or expression>

));

Step 3 - Validate the data files.

- If the load operation encounters errors in the data files, the copy history table function describes only the first error encountered in each file.

- To validate the data files, query the VALIDATE_PIPE_LOAD table function.

Syntax:-
Select * from table(information_schema.validate_pipe_load(

pipe_name => 'pipe_name',

start_time => <timestamp or expression> ));

Managing pipes:-

-Use the DESC PIPE <pipe_name> command to see the pipe properties and its COPY command.

-Use the SHOW PIPES command to see all the pipes.

-We can pause / resume pipes with pipe_execution_paused = true | false.

-It is best practice to pause and resume pipes before and after performing the below actions:

1. When modifying the stage object

2. When modifying the file format object the stage is using

3. When modifying the COPY command

- To modify the COPY command, recreating the pipe is the only possible way.

-When you recreate a pipe, all the load history will be dropped.

➢ Unloading data working session

❖ Unloading process :-

- The process for unloading data into files is the same as the loading process, except in reverse.

Step-1

-Use the COPY INTO <location> command to copy the data from a Snowflake database table into one or more files in a Snowflake (internal) or external stage.

Step-2

- Download the files from the stage.

1. From a Snowflake stage, use the GET command to download the data files.

2. From S3, use the interfaces/tools provided by Amazon S3 to get the data files.

3. From Azure, use the interfaces/tools provided by Microsoft Azure to get the data files.

Syntax for unloading: -

Copy into @<stage>

From <table_name>

<options>;

Unloading options: -
-OVERWRITE - TRUE | FALSE - specifies whether to overwrite existing files.

-SINGLE - TRUE | FALSE - specifies whether to generate a single file or multiple files.

-MAX_FILE_SIZE - <num> - maximum file size in bytes.

-INCLUDE_QUERY_ID - TRUE | FALSE - specifies whether to uniquely identify unloaded files by including a universally unique identifier in the file names.

-DETAILED_OUTPUT - TRUE | FALSE - shows the path and name for each file, its size, and the number of rows that were unloaded to the file.
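A hedged end-to-end illustration (stage, table and format details are assumptions):

//unload the table into gzip-compressed CSV files in an internal stage
COPY INTO @my_stage/unload/customers_
FROM customers
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
OVERWRITE = TRUE
MAX_FILE_SIZE = 104857600;

//download the generated files to the local machine with GET (run from SnowSQL)
GET @my_stage/unload/ file:///tmp/exports/;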

➢ Snowflake - Caching

Caching-
-A cache is a temporary storage location that stores copies of files or data so that they can be accessed faster in the future.
-Cache plays a vital role in saving costs and speeding up results.
-Improves query performance.
Types of cache:-
1. Query result cache - query results are stored for up to 24 hours.
2. Local disk cache - available while the virtual warehouse is up and running; it is cleared when we suspend the warehouse.
3. Remote disk - the permanent storage location.
Query result cache:-
-Query results are stored for up to 24 hours.
-When we run a query for the FIRST time it scans the remote disk; when we run the same query a second time it does not scan the permanent storage, it reads only the query result cache and returns the result.
-We must run exactly the same query for the result cache to be used.
Local disk cache :-
- Available while the VW is up and running; it is cleared when we suspend the VW.
- It also works when the queries are not identical; changed queries over the same data can still get results faster.
-On the first run the data is fetched from the remote disk and stored in the warehouse's local SSD memory.
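As an illustration (USE_CACHED_RESULT is a standard session parameter; the table name is an assumption), you can compare timings with and without the result cache:

//first run: scans micro-partitions on remote storage
SELECT country, COUNT(*) FROM customers GROUP BY country;

//running the identical query again is served from the query result cache

//to test without the result cache for the current session
ALTER SESSION SET USE_CACHED_RESULT = FALSE;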

➢ Snowflake - Time Travel & Fail-safe

Time travel:- (is nothing but querying and restoring historical data)

- A feature that allows users to access historical data as of a specific point in time.
- It's useful for recovering data that might have been modified or deleted accidentally.
- No need to enable time travel; it is automatically enabled.
Retention period:-
- Retention is the key component of time travel.
- For the Standard edition the retention period is 1 day; it can be set between 0 and 1 day.
- For Enterprise and higher editions it is up to 90 days; it can be set between 0 and 90 days.
- Default is 1 day; we can set this period at the time of creating the object or alter it later.
- We can change the retention period by using the ALTER command:
- Alter table table_name set data_retention_time_in_days = 30;
- The higher the retention period, the higher the storage cost.
Querying historical data :- 3 ways to query historical data
1. At a specified timestamp
Syntax-

Select * from table_name before (timestamp => 'Fri, 01 May 2015 16:20:00 -0700'::timestamp_tz);

2. At some time ago - e.g. the data as of 5 minutes ago

Syntax-
Select * from table_name at (offset => -60*5);

3. Before executing a particular statement/query (that means we can get back whatever data was present in the table before that statement ran, using its query ID)
Syntax-
Select * from table_name before (statement => '<query_id>');

Fail-safe:- Once the time travel retention period is over, the data is stored in the fail-safe area.

-Fail-safe provides a further 7-day period in which the historical data may be recovered.
-This starts immediately after the retention period is completed.
-We can't query or restore the fail-safe data ourselves.
-For restoring fail-safe data we need to contact the Snowflake support team, and it may take a few hours or a few days.
- Once the fail-safe period is completed there is no other way to restore the data.
➢ Snowflake - Zero-copy cloning
-Snowflake allows you to create clones of tables, schemas and databases in seconds.
- We can maintain multiple copies of the same data with no additional storage cost, so this is called zero-copy cloning.

- Two popular use cases of cloning in real time are: -

1. Cloning prod data into the DEV/TEST environment for unit testing in the lower environments.
2. Taking a backup of data.

Cloning syntax:-

Create or replace table <cloned_table_name> clone <source_table_name>;

-Here the cloned object type can be (see the examples below):

1. Database
2. Schema
3. Table
4. Stage
5. File format
6. Task
7. Stream
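A hedged pair of examples (database, schema and table names are assumptions); the AT clause combines cloning with the time travel feature described above:

//clone a production table into the dev schema for testing
CREATE OR REPLACE TABLE dev_db.public.customers CLONE prod_db.public.customers;

//clone a whole database as a backup, as it existed one hour ago
CREATE DATABASE prod_db_backup CLONE prod_db AT (OFFSET => -3600);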

➢ Snowflake - Table Types

-Snowflake supports three types of tables.

1. Permanent tables - whatever tables we create in Snowflake are permanent until we drop them; the time travel period can be up to 90 days (Enterprise edition and higher) and fail-safe is available.

2. Transient tables - similar to permanent tables, but the maximum retention period is only 1 day and no fail-safe period is available for these tables. Useful when we don't require data protection.
Syntax:- create transient table table_name ...;
3. Temporary tables - these tables are "use and throw away" and are useful for DEV work. The retention period is 1 day with no fail-safe, and once the session ends the table is dropped.
Syntax- create temporary table table_name ...;

Points to remember :-
- We can't convert one table type to another type.
- We can create transient databases and schemas.
- We can create temp tables with the same name as permanent/transient tables. If we query with that table name, the data is fetched from the temp table in that session.
How to find the table type?
-Look at the "kind" field in the SHOW TABLES output.

➢ Snowflake - External Tables

Metadata of external tables :-

VALUE - a VARIANT-type column that represents a single row in the external file.
METADATA$FILENAME:- a pseudo-column that identifies the name of each staged data file included in the external table, including its path in the stage.
METADATA$FILE_ROW_NUMBER:- a pseudo-column that shows the row number for each record in a staged data file.
Creating external tables:-
1. Create a file format object.
2. Create a stage object pointing to the cloud storage location.
3. Create the external table.
Note - If there are multiple files of the same pattern, we can create just one external table and it can expose the data of all the similar files.

External table syntax:-

Create or replace external table <table_name>
(columns)
With location = @<external_stage>
File_format = (format_name = '<file_format_name>');

Example:-
Create or replace external table sample_ext
(id int as (value:c1::int),
name varchar(20) as (value:c2::varchar),
dept int as (value:c3::int))
With location = @mys3stage
File_format = (format_name = 'mys3csv');
➢ Snowflake - Tasks (Scheduling queries)

❖ What is a task:-
-We use tasks for scheduling SQL queries & stored procedures.
-Tasks can be combined with streams for implementing continuous change data capture.
-We can maintain a DAG (Directed Acyclic Graph) of tasks to keep the dependencies between tasks.
-A task requires compute resources to execute its SQL code; we can choose either
1. Snowflake-managed compute resources (serverless), or
2. User-managed compute (a virtual warehouse).

Task syntax:-
Create or replace task <task_name>
Warehouse = <warehouse_name>
Schedule = '<interval or cron>'
After <dependent_task_name>
As
<sql statement>;

❖ Altering tasks:-

Alter task syntax-

Alter task task_dept add after task_emp;
Alter task task_dept remove after task_emp;
Alter task emp_task set schedule = '5 minute';
Alter task emp_task suspend;
Sample task syntax:-
//task to run a query every 10 minutes
Create or replace task <task_name>
Warehouse = sample_wh
Schedule = '10 minute'
As insert into customers (create_date) values (current_timestamp);

//task to call a stored procedure every day at 9:30 UTC

CREATE OR REPLACE TASK CUSTOMER_INSERT
WAREHOUSE = SAMPLE_WH
SCHEDULE = 'USING CRON 30 9 * * * UTC'
AS
CALL PROC_EMPL_DATA_LOAD();
❖ DAG - Directed Acyclic Graph
-To maintain dependencies between tasks.
-A root task followed by child tasks.
-Just schedule the root task; the child tasks will be executed in order.

DAG OF TASKS:-

Task-A
CREATE OR REPLACE TASK TASK_A
WAREHOUSE = SAMPLE_WH
SCHEDULE = 'USING CRON 30 9 * * * UTC'
AS
<SQL QUERY1>;

Task-B
CREATE OR REPLACE TASK TASK_B
WAREHOUSE = SAMPLE_WH
AFTER TASK_A
AS
<SQL QUERY2>;

Task-C
CREATE OR REPLACE TASK TASK_C
WAREHOUSE = SAMPLE_WH
AFTER TASK_A
AS
<SQL QUERY3>;

Task-D
CREATE OR REPLACE TASK TASK_D
WAREHOUSE = SAMPLE_WH
AS
<SQL QUERY4>;

ALTER TASK TASK_D ADD AFTER TASK_B;

ALTER TASK TASK_D ADD AFTER TASK_C;

❖ Task history
-We can check task history from the information schema table function TASK_HISTORY.

//To see all task history with the last executed task first
Select * from table(information_schema.task_history())
Order by scheduled_time desc;

//To see results of a specific task in the last 6 hours

Select * from table(information_schema.task_history(
Scheduled_time_range_start => dateadd('hour', -6, current_timestamp()),
Task_name => 'taskname'));

//To see results in a given time period

Select * from table(information_schema.task_history(
Scheduled_time_range_start => to_timestamp_ltz('2023-12-03 12:00:00.000 -0700'),
Scheduled_time_range_end => to_timestamp_ltz('2023-12-03 13:00:00.000 -0700')));

❖ Troubleshooting tasks
-If your task is not running as per the schedule, check the below things.

Step-1 :- Verify the task status - it should be "started"; if it is in the "suspended" state, resume it by using the ALTER TASK ... RESUME command.
Step-2 :- Check the task history and its execution state. If it is in the "failed" state, take the query ID and check the failure reason.

Step-3 :- Verify the permissions granted to the task owner. The owner should have access to the database, schema, tables and warehouse.
Step-4 :- Verify the condition (only for streams): check system$stream_has_data; the stream may not have data changes to process.

➢ Snowflake Streams

❖ What is a stream?
- A stream object records DML changes made to a table, including inserts, updates and deletes.
- We call this process change data capture (CDC).
- Streams are combined with tasks to set up continuous data pipelines.
- Snowpipe + Stream + Task -> continuous data load

❖ Metadata of streams
-METADATA$ACTION : indicates the DML operation (INSERT, DELETE) recorded.
-METADATA$ISUPDATE :- indicates whether the operation was part of an UPDATE statement; updates to rows in the source object are represented as a pair of DELETE and INSERT records in the stream, with the metadata column METADATA$ISUPDATE set to TRUE.
-METADATA$ROW_ID :- specifies the unique and immutable ID for the row, which can be used to track changes to specific rows over time.
❖ Consuming data from a stream
-We can use a MERGE statement for consuming the changes from a stream and applying them to the target table (see the sketch after these filters).

//To identify INSERT records

Where metadata$action = 'INSERT' and metadata$isupdate = FALSE;

//To identify UPDATE records (an update appears as a DELETE plus an INSERT)

Where metadata$action = 'INSERT' and metadata$isupdate = TRUE;

//To identify DELETE records

Where metadata$action = 'DELETE' and metadata$isupdate = FALSE;
Note - If we want to consume these changes into multiple tables, then we have to create multiple streams.
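A minimal MERGE sketch under assumed names (source table customers, stream customers_stream, target table customers_final); consuming the stream in a successful DML statement advances its offset:

MERGE INTO customers_final t
USING customers_stream s
  ON t.id = s.id
WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' AND s.METADATA$ISUPDATE = FALSE
  THEN DELETE
WHEN MATCHED AND s.METADATA$ACTION = 'INSERT' AND s.METADATA$ISUPDATE = TRUE
  THEN UPDATE SET t.name = s.name, t.country = s.country
WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT' AND s.METADATA$ISUPDATE = FALSE
  THEN INSERT (id, name, country) VALUES (s.id, s.name, s.country);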

❖ Types of streams

1. Standard streams :- A standard stream tracks all DML changes to the source object, including inserts, updates and deletes (including table truncates).
Syntax- Create or replace stream my_stream on table my_table;
2. Append-only streams :- An append-only stream tracks row inserts only. Update and delete operations (including table truncates) are not recorded.
Syntax:- Create or replace stream my_stream on table my_table append_only = true;
3. Insert-only streams:- Supported for external tables only. They track row inserts only; they do not record delete operations.
Syntax:- Create or replace stream my_stream on external table my_table insert_only = true;

➢ Snowflake - Views and Materialized Views

View or normal view :- a kind of virtual table.
-A view allows the result of a query to be accessed as if it were a table. The query is specified in the CREATE VIEW statement. Views serve a variety of purposes, including combining, segregating, and protecting data.
- With normal views, anyone with access can see the view definition.
View syntax:-
Create or replace view view_name
As
Select statement;

Types of views :-
1. Non-materialized views (normal views)
2. Secure views
3. Materialized views.
Secure views:-
-A secure view does not allow unauthorized users to see the definition of the view.
-Users can't see the underlying SQL query.
Advantages of secure views:-
1. Can protect the data by not exposing the view definition to other users.
2. Users cannot see the underlying tables present in our database.
Syntax;-
Create or replace secure view view_name
As
Select statement;
Materialized views :-
- A materialized view stores a precomputed result set.
- Querying a materialized view gives better performance than querying the base table.
- Can be created on a single table only; it can't be built on multiple tables.
- Designed for improved query performance when we are using the same dataset repeatedly.
- Available in the Enterprise edition and higher.
Syntax;-
Create or replace materialized view view_name
As
Sql statement;

When to create materialized views :-

➢ Create materialized views when all of the following are true:
1. The query results from the view don't change often.
2. The results of the view are used often.
3. The query consumes a lot of resources, i.e. it takes a long time to process, such as aggregating data.
➢ Create a regular view when any of the following are true:
1. The results of the view change often.
2. The results are not used often.
3. The query is simple.
4. The query joins multiple tables.
Limitations of materialized views:-
-Can query a single table only.
- Does not support joins, including self-joins.
- Does not support all aggregate and window functions.
-When the base table is altered or dropped, the materialized view is suspended.
- A materialized view cannot query:
1. Another materialized view
2. A normal view
3. A UDF (user-defined function).

➢ Snowflake - Dynamic Data Masking

Column-level security :-
- We can apply masking at the column level by using dynamic data masking, to protect sensitive data like customers' phone numbers, bank balance info, etc.
- Column-level security includes two features
1. Dynamic data masking
2. External tokenization.
Dynamic data masking :- the process of hiding data by masking it with other characters. We can create masking policies to hide the data present in columns.
External tokenization:- the process of hiding sensitive data by replacing it with cipher text (tokens).

❖ Masking policy :-
- A way of hiding sensitive data from unauthorized access.
- Masking policies are schema-level objects.
- The same masking policy can be applied to multiple columns.
❖ Dynamic data masking
- The sensitive data in existing tables is not modified by Snowflake; whenever we run a query, the masking is applied dynamically and the masked data is displayed. This is called dynamic masking.
- Data can be masked partially.
- Unauthorized users can operate on the data, but they can't view it.
- Mostly, masking policies are applied based on roles.

❖ Creating masking policies

- A masking policy can be created by using the CREATE MASKING POLICY command.
//Based on the role
Syntax-
Create masking policy employee_ssn_mask as (ssn string) returns string ->
Case
when current_role() in ('PAYROLL') then ssn
else '****'
end;
//Based on some condition
Syntax-
Create masking policy email_visibility as (email varchar, visibility string) returns varchar ->
Case
When current_role() = 'ADMIN' then email
When visibility = 'PUBLIC' then email
Else '***masked***'
End;
❖ Applying masking policies
- After creating masking policies, we can apply them wherever we have a requirement to protect or hide the data.
- A masking policy is applied only at the column level.
- We can apply the same policy to multiple columns from multiple tables and views.
//setting or applying a masking policy
Alter table table_name modify column column_name
Set masking policy policy_name;

Alter table public.employee modify

column ssn set masking policy employee_ssn_mask,
column email set masking policy email_visibility using (email, visibility);
❖ Removing masking policies
//unsetting or removing a masking policy
Alter table public.employee
Modify column ssn UNSET MASKING POLICY;
Alter table public.employee modify
column ssn UNSET MASKING POLICY,
column email UNSET MASKING POLICY;
❖ Altering and dropping policies
//altering or modifying
Alter masking policy policy_name set body -> <case statement>;
Alter masking policy policy_name rename to new_policy_name;

//dropping
Drop masking policy policy_name;

Limitations:-
1. Before dropping a masking policy, we should unset it from all columns.
2. The data types of the input and output values must be the same.

Interview questions on data masking

1. How will you apply column-level security in Snowflake?
Ans:- By using dynamic data masking we can apply column-level security, where I create masking policies and apply those masking policies to columns of a table or view.
2. How can you create masking policies?
Ans:- For creating a masking policy we have the syntax "create masking policy policy_name"; I have to provide the input parameters and the return type, and then write the CASE condition.
3. How do you apply the masking policy?
Ans:- We have to use the ALTER command: "alter table table_name modify column column_name set masking policy policy_name".

➢ Snowflake - Data sharing

Data sharing: - First we need to create a share object to share the data (see the sketch after this list).
- Data can be shared securely.
- We can share the data with other Snowflake accounts and with non-Snowflake users.
- For non-Snowflake users we have to create a reader account and share the data through it.
- Provider - the account that shares the data by creating a share object.
- Consumer - the account that consumes or uses the shared data.
The below-mentioned objects can be shared with other accounts: -
-Tables
-External tables
-Secure views
-Secure materialized views
-Secure UDFs
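A hedged sketch of the provider-side steps (database, schema, table and consumer account names are assumptions):

CREATE SHARE customer_share;

GRANT USAGE ON DATABASE sales_db TO SHARE customer_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE customer_share;
GRANT SELECT ON TABLE sales_db.public.customers TO SHARE customer_share;

//add the consumer account to the share
ALTER SHARE customer_share ADD ACCOUNTS = consumer_account;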

➢ Snowflake - Stored procedures

Stored procedure: -
- Stored procedures allow you to write procedural code that includes SQL statements, conditional statements, looping statements and cursors.
- From a stored procedure you can return a single value or tabular data.
- Supports branching and looping.
- Can dynamically create a SQL statement and execute it.
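A minimal Snowflake Scripting sketch (procedure and table names are illustrative assumptions):

CREATE OR REPLACE PROCEDURE load_customer_history()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  -- copy the current rows into a history table
  INSERT INTO customers_hist SELECT * FROM customers;
  RETURN 'load completed';
END;
$$;

CALL load_customer_history();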
➢ Snowflake - Data sampling

Data sampling:-
• Selecting some part of the data, or a subset of records, from a table.
• Useful for query building and testing.
• Useful for data analysis or understanding the data.
• Useful in dev environments where we use small warehouses and occupy less storage.
Sampling method types (see the sketch below):-
1. BERNOULLI or ROW - sampling by rows; e.g. 10% of 4 million rows returns roughly 400,000 (4 lakh) rows.
2. SYSTEM or BLOCK - sampling by micro-partitions; e.g. 10% of 600 micro-partitions returns data from about 60 micro-partitions.
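A short illustration (the table name is an assumption):

//row-level sampling: each row has a 10% chance of being returned
SELECT * FROM customers SAMPLE BERNOULLI (10);

//block-level sampling: roughly 10% of the micro-partitions are scanned
SELECT * FROM customers SAMPLE SYSTEM (10);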
