Snowflake Interview Questions
5. I have a stored procedure named test that returns an error flag as true or false. How do you call this stored procedure?
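A stored procedure in Snowflake is invoked with CALL; a minimal sketch, assuming the procedure test takes no arguments:

```sql
-- Call the procedure; its return value appears as the query result
CALL test();

-- The value returned by the most recent CALL can be reused in SQL
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```

The RESULT_SCAN pattern is handy when the returned true/false flag needs to drive a follow-up query.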
6. Which cloud service are you using along with Snowflake?
7. UNION and UNION ALL
Snowflake UNION: combines the result sets of the queries and removes duplicates, if any.
Snowflake UNION ALL: combines the result sets of the queries without removing duplicates, if any.
SELECT C1, C2, C3 FROM TABLE_1 UNION SELECT C1, C2, C3 FROM TABLE_2;
8. I want to read the data from an external table; how will you read it?
SELECT value:"column_name" AS "anything" FROM tablename;
9. Online editor:
https://www.programiz.com/sql/online-compiler/
select c.customer_id, c.first_name, c.last_name, o.item, o.amount, s.status
from Customers c
left join Orders o on c.customer_id = o.customer_id
left join Shippings s on c.customer_id = s.customer;
10. http://teachmehana.com/row-store-vs-column-store/
11. Which Snowflake edition have you worked with: Standard, Enterprise, or Business Critical?
11. [{"Employee_ID":1901,"Employee_Name":"James","salary":"$9000"}]
12. [{"Employee_ID":1901,"Employee_Name":"James","salary":"$9000","expenses":{"Fast_Food":"$200","Hotelling":"$50"}}]
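Assuming the JSON above is loaded into a VARIANT column (a hypothetical table emp with column src), the nested fields can be read with Snowflake's colon and dot notation; a minimal sketch:

```sql
-- Hypothetical table: emp(src VARIANT) holding the JSON array above
SELECT f.value:Employee_ID::INT            AS employee_id,
       f.value:Employee_Name::STRING       AS employee_name,
       f.value:salary::STRING              AS salary,
       f.value:expenses.Fast_Food::STRING  AS fast_food
FROM emp,
     LATERAL FLATTEN(input => src) f;      -- one row per array element
```

FLATTEN unnests the array; the ::TYPE casts convert the VARIANT values into typed columns.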
11. How does caching work when the underlying table gets updated in Snowflake?
Ans: The cache will not be used here; Snowflake reads directly from the database storage layer.
13. When we update or delete records, the changes are made at the storage level, so whenever we run a SELECT query after deleting or updating records, it connects to the database storage layer via the compute layer, and that incurs a charge.
14. Is the cache shared across multiple users? -- Yes
15. For how long is the query result cached? -- 24 hours
16. Are there any charges for storing the cache? -- No, it is in-memory storage.
17. Does AWS have a similar service to Snowflake? Yes, Redshift Spectrum, where we query the data directly from S3.
18. Which SQL version does Snowflake support? It supports the standard SQL version, i.e., ANSI SQL.
19. Which cloud platforms are supported by Snowflake? AWS, Azure, GCP.
20. Which ETL tools are used with Snowflake? AWS Glue, Apache Airflow, Hevo Data, Informatica.
21. In Snowflake, zero-copy cloning is a feature that enables us to create a copy of our tables, schemas, and databases without replicating the actual data. To carry out zero-copy cloning in Snowflake, we use the keyword CLONE.
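A minimal sketch of zero-copy cloning (all object names are illustrative):

```sql
-- Clone a table, a schema, and a database; no data is physically copied
CREATE TABLE    orders_clone  CLONE orders;
CREATE SCHEMA   staging_clone CLONE staging;
CREATE DATABASE analytics_dev CLONE analytics;

-- A clone only starts consuming storage for micro-partitions that
-- diverge from the source after subsequent DML
UPDATE orders_clone
SET    status = 'ARCHIVED'
WHERE  order_date < '2020-01-01';
```

DML on the clone does not affect the source table, and vice versa, which answers most of the clone-related questions later in these notes.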
22. What is Snowflake Time Travel?
Ans: The Snowflake Time Travel feature allows us to access past data at any moment within the specified retention period.
23. Explain Fail-safe in Snowflake.
Ans: Fail-safe is a feature in Snowflake that helps assure data protection. It plays a vital role in the data protection lifecycle of Snowflake, providing seven days of additional storage even after the Time Travel period has ended.
https://snowflakemasters.in/snowflake-interview-questions/
24. Is the data at a Snowflake stage actual data or metadata? For an external stage it is metadata; it references the data present in S3.
25. Does Snowflake use indexes? No, Snowflake does not use traditional indexes; it prunes micro-partitions using metadata and clustering instead (indexes were announced for the new hybrid tables). -- 24-11-2022
26. Is Snowflake OLTP or OLAP? Snowflake is built as an OLAP database system (for analyzing historical data), but depending on usage it can also serve OLTP workloads.
27. How many records are displayed by default when we run a SELECT query? 10,000.
28. https://medium.com/@sanket.prabhu34/commonly-asked-snowflake-interview-questions-e3863732c53f
29. Explain column-level security in Snowflake. -- It is implemented through masking policies.
https://docs.snowflake.com/en/user-guide/security-column-intro.html#what-are-masking-policies
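A minimal sketch of a masking policy (role, table, and column names are illustrative):

```sql
-- Mask email addresses for everyone except a privileged role
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_FULL') THEN val
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to a column
ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;
```

Once attached, the policy is applied at query time, so the underlying data stays unchanged while unauthorized roles only ever see the masked value.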
30. Can we execute stored procedures one after another rather than on a schedule?
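One way to chain procedures in Snowflake is a task graph: a root task runs on a schedule (or is executed manually), and child tasks declared with AFTER run when their predecessor finishes. A sketch with hypothetical procedure and warehouse names:

```sql
CREATE TASK t_load
  WAREHOUSE = my_wh
  SCHEDULE  = 'USING CRON 0 6 * * * UTC'
AS CALL sp_load();

-- Runs only after t_load completes
CREATE TASK t_transform
  WAREHOUSE = my_wh
  AFTER t_load
AS CALL sp_transform();

-- Child tasks must be resumed before the root task
ALTER TASK t_transform RESUME;
ALTER TASK t_load RESUME;
```

Alternatively, a wrapper procedure can simply CALL the procedures in sequence within one session.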
31. https://www.edureka.co/blog/interview-questions/sql-query-interview-questions
32. https://www.interviewbit.com/sql-interview-questions/
33. https://www.kdnuggets.com/2020/11/5-tricky-sql-queries-solved.html
34. 250 SQL Server Interview Questions And Answers For Experienced (codingcompiler.com)
35. Write a query to find the duplicates:
fName lName salary id
Neil lee 2000 1
adam young 5000 2
john meloni 2500 3
adam young 1900 4
john meloni 6500 5
Arnold brent 4500 6
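One common approach, assuming a duplicate means the same fName/lName pair appearing more than once (table name is illustrative):

```sql
SELECT fName, lName, COUNT(*) AS cnt
FROM   employees          -- hypothetical table holding the rows above
GROUP  BY fName, lName
HAVING COUNT(*) > 1;      -- for the sample data: adam/young and john/meloni
```

To see the full duplicate rows rather than just the names, a ROW_NUMBER() window over the same partition works as well.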
36. Difference Between Snowflake Stored Procedure and UDFs - SP vs UDFs - DWgeek.com
37. What are indexes and what are their types? What are clustered and non-clustered indexes, and what is a unique index?
38. Write a query to get 2nd highest salary using common table expression(CTE)
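A sketch of the second-highest-salary query using a CTE (table name is illustrative):

```sql
WITH ranked AS (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM   employees
)
SELECT salary
FROM   ranked
WHERE  rnk = 2;   -- DENSE_RANK handles ties in the top salary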
39. Parsing JSON in Snowflake from an external stage
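A minimal sketch of loading JSON from an external stage into a VARIANT column and then parsing it (stage, table, and field names are illustrative):

```sql
CREATE OR REPLACE TABLE raw_json (src VARIANT);

COPY INTO raw_json
FROM @my_ext_stage/events/
FILE_FORMAT = (TYPE = 'JSON');

-- Parse fields with colon notation and cast as needed
SELECT src:event_id::INT       AS event_id,
       src:payload.user::STRING AS user_name
FROM   raw_json;
```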
--==============================================================================
--18-12-2023
Clone related questions:-
1) What is cloning?
2) How do you clone a table, a schema, and a database?
3) Why is cloning called zero-copy cloning?
4) In which situations have we used cloning in our project?
5) Does a cloned object have its own storage?
6) When we make changes to a cloned object, are they reflected in the base table, and vice versa?
7) A new record is inserted in the cloned table: does it consume storage or not? If it does, where is the data stored?
8) Can we perform DML operations on cloned tables? How does that impact the original tables?
--22-12-2023
What is data caching in Snowflake? What are the types of data caching?
What is the difference between time travel and failsafe?
Explain Snowflake architecture
--03-01-2024
What is the difference between DELETE and TRUNCATE?
What is the difference between EXECUTE AS CALLER and EXECUTE AS OWNER?
How can we manage metadata in Snowflake?
What is the difference between STUFF() and REPLACE()?
How can we check who deleted a particular table in SQL Server?
What is a cluster key in Snowflake?
What is the max file size we can upload into a stage?
What is data ingestion? What are the ingestion techniques?
Suppose we have tables A and B and we apply a CROSS JOIN; what does the result look like?
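For the CROSS JOIN question above: every row of A is paired with every row of B, so the result has rows(A) × rows(B) rows. A sketch with illustrative tables:

```sql
-- If A has 2 rows and B has 3 rows, the result has 6 rows
SELECT a.id, b.code
FROM   a
CROSS JOIN b;
```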
In which case have you used horizontal scaling in your current project?
What is RBAC (role-based access control)?
Data Mart: A data mart is a simple form of data warehouse that is designed to serve a specific
business unit or team. It contains a subset of an organization's data and is optimized for the queries
and reports needed by that particular team.
Data Warehouse: A data warehouse is a large, centralized repository of data that is used for
reporting and data analysis. It is designed to handle the massive volumes of data that businesses
generate, and it is optimized for querying and analysis rather than transaction processing.
Data Lake: A data lake is a large, centralized repository of raw, unstructured data. It is designed to
handle the diverse and ever-growing volume of data that businesses generate, and it is optimized for
storing and processing large volumes of data in its native format.
Time Travel and Syntax: Time travel is a feature in some databases that allows you to query the database as it existed at some point in the past. In Snowflake, you use the AT or BEFORE clause (with TIMESTAMP, OFFSET, or STATEMENT) to query data as it existed at a specific point in time.
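A sketch of the Time Travel clauses (table name, timestamp, and statement ID are illustrative):

```sql
-- As of an absolute timestamp
SELECT * FROM orders AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ);

-- As of one hour ago (offset in seconds)
SELECT * FROM orders AT (OFFSET => -3600);

-- Just before a given statement (e.g., an accidental DELETE)
SELECT * FROM orders BEFORE (STATEMENT => '01a2b3c4-0000-0000-0000-000000000000');
```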
Loading specific columns: To load only 5 columns from a staged file that has 1000 columns, use the COPY INTO command with a SELECT transformation on the stage, listing the positional columns you need. For example:
COPY INTO my_table (col1, col2, col3, col4, col5)
FROM (SELECT $1, $2, $3, $4, $5 FROM @my_stage)
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
Extracting changed records: To extract the changed records (excluding duplicates) between two files received on different days, first load both files into separate tables. Then use set operators such as MINUS (or INTERSECT) to find the records that are in one table but not the other. For example:
SELECT * FROM day2_table
MINUS
SELECT * FROM day1_table;
Transient Table: A transient table persists until it is explicitly dropped and is visible to all users with the appropriate privileges, but it has no Fail-safe period and at most one day of Time Travel, which lowers storage costs. It is useful for transient data that does not need the full data protection lifecycle.
Temporary Table: A temporary table is visible only to the current session and is automatically dropped when the session ends. It is useful for storing intermediate results within a single session.
Usage of transient table: I have used transient tables in Snowflake for storing intermediate results while performing complex data transformations. Because transient tables carry no Fail-safe storage, they reduce storage costs for data that can be regenerated at any time.
Max size of data warehouse: The maximum size of a data warehouse in Snowflake depends on the edition and the underlying cloud infrastructure; storage itself scales effectively without a fixed limit.
Max records of data processed: I have processed up to billions of records in a single query in
Snowflake. The exact number depends on the complexity of the query and the resources available in
the data warehouse.
Secure View: A secure view is a view whose definition is hidden from unauthorized users and for which the optimizer is prevented from exposing underlying data through pushdown optimizations. It provides access to data in a secure manner, allowing users to query the data without granting them direct access to the underlying tables.
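A minimal sketch (view and table names are illustrative):

```sql
-- Expose only non-sensitive columns; the view definition itself
-- is hidden from roles that do not own the view
CREATE SECURE VIEW v_customer_public AS
SELECT customer_id, first_name
FROM   customers;
```

Consumers query v_customer_public like any table, while the base table customers remains inaccessible to them.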
Snowflake being relatively new, most of the interview questions will be based on how you have implemented and used it in your project.
3. Data loading, including the COPY command; how do you create stages, and the various stage types: table/internal/external.
6. Scenarios like how you handle duplicates, as integrity constraints are not enforced in SF.
12. If you have worked on migration, what challenges did you face and how did you overcome them?
19. Snowpipe
20.Data Unloading.
22. How do normal views and materialized views differ in Snowflake?
There can be various scenario-based questions based on what you explain about your project and how knowledgeable the interviewer is.
With Snowflake you need to have basic to intermediate knowledge of cloud technology, and orchestration will be an add-on.
When should you use a star schema and when a snowflake schema for designing?
--2024-04-15
10.) What are aggregate functions and when do we use them? Explain with a few examples.
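Aggregate functions collapse many rows into one value per group; a few common ones in a single sketch (table name is illustrative):

```sql
SELECT department,
       COUNT(*)    AS headcount,
       SUM(salary) AS total_pay,
       AVG(salary) AS avg_pay,
       MAX(salary) AS top_pay
FROM   employees
GROUP  BY department;
```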
25.) What is the difference between UNION and UNION ALL in SQL?
--2024-04-15
1. What is the data flow, and how many layers are in our projects?
3. What are alternative methods for loading data into Snowflake without using JSON functions?
6. What is a stream in Snowflake, and what are the columns present in a stream?
14. How do you move 100 GB of data into SF? Describe the steps you would follow.
15. What is the maximum size of a file that can be loaded into an S3 bucket?
17. How can you create a table in Oracle with a time/travel retention period to go back before
12 days?
19. Have you worked with Snowpipe? If so, describe your experience in creating and using
Snowpipe.
20. Explain the concept of a Merge statement in the context of a relational database.
21. What ETL (Extract, Transform, Load) tool would you recommend for data integration tasks,
and what are the key features that make it suitable?
22. Can you explain the key features and advantages of using Snowflake as a cloud data
warehouse platform?
23. What are the key components and architectural considerations when designing a data
solution using Snowflake as the underlying data warehouse?
24. How does Snowflake handle caching, and what role does it play in optimizing query
performance?
25. What strategies or best practices can be employed to enhance the performance of a
Snowflake data warehouse?
26. What mechanisms does Snowflake provide for ensuring fail-safe operations, especially in the
context of data processing and storage?
27. What steps should be followed when extracting data from Snowflake as a source for a data
integration or ETL process?
28. Can you elaborate on the internal storage architecture of Snowflake and how data is
organized within the platform?
29. What methods or tools can be used to schedule and automate tasks or jobs within
Snowflake?
30. What is normalization in the context of database design, and why is it important?
31. Explain the concept of third normalization in database design and its significance.
32. Query to find the second date from the given dates (2020-01-23, 2020-02-21):
33. Provide a SQL query to find the second date from a set of given dates, considering a specific
condition.
34. How can you extract the fourth character from each value in a specific column (col_a)
containing city names like Chennai, AP, and Mumbai?
35. What steps are involved in creating a table in Snowflake, and how can you insert data from a
file into that table?
36. What are the advantages and use cases for SnowSQL, Snowflake's command-line client?
37. Explain the methods or procedures to recover or retrieve records that have been
accidentally deleted from a Snowflake table.
40. Can we perform any DML in the clone table, and what happens to storage in Snowflake?
44. What is the max size of the VARIANT data type in Snowflake?
47. How will you optimize a query even when it has a cluster key, and the query is still taking
time with a large volume of data in Snowflake?
48. If a file with the same filename gets loaded into S3 in Snowflake, what will happen?
49. If a file with the same filename gets loaded into S3 after deleting the old one, will the
duplicate data get loaded, and how will it load?
61. How do you eliminate entire-row duplicates in a flat file loaded into Snowflake?
62. How do you retain one unique record and delete duplicates in Snowflake tables?
64. What are the limitations we have in using SQL language in Snowflake Java scripts?
65. How to integrate and share data between Unix and Snowflake using SnowSQL on Unix?
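For the MERGE question (item 20 above), a minimal sketch of an upsert (table and column names are illustrative):

```sql
MERGE INTO customers AS tgt
USING staging_customers AS src
  ON  tgt.customer_id = src.customer_id
WHEN MATCHED THEN
  UPDATE SET tgt.email = src.email
WHEN NOT MATCHED THEN
  INSERT (customer_id, email) VALUES (src.customer_id, src.email);
```

MERGE combines UPDATE and INSERT into one statement driven by the join condition, which is why it is the standard pattern for incremental loads.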
Round1:
Interview Questions
4. Difference between Redshift and Snowflake. What difficulties did you face in Redshift compared with Snowflake? (Because I have experience in both.)
8. How did you handle the situation when you were unable to complete a task within the timeline?
10. What is the maximum number of records you have worked with?
11. What are the techniques you used to tune SQL queries?
12. Tell me the list of transformations that you used in your project.
Query section:
2. How did you identify the current month and previous month salaries of employees? Assume we
have a table with dates and salaries.
4. Can you explain how virtual warehouses affect the scalability, performance, and cost management
of data processing tasks?
5. Can you discuss how Snowflake’s compatibility with ANSI SQL standards influences the querying
and data manipulation capabilities?
6. Can you explain Snowflake's approach to data security, specifically its always-on encryption?
7. Can you explain Snowflake's support for both ETL and ELT processes?
8. What are all ETL tools you have used with Snowflake?
9. Can you explain how the advanced feature Snowpipe is used for continuous data ingestion?
15. Can you describe the impact of the different states of virtual warehouses on query performance?
17. How do you build a Snowflake task that calls a Stored Procedure?
18. You have a JSON data column in a table storing customer feedback with specific keys. Write a
query to extract and display the feedback text and timestamp for a specific customer_id.
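For question 18, assuming a hypothetical table feedback(customer_id INT, data VARIANT) where data holds the keys feedback_text and created_at, a sketch:

```sql
SELECT data:feedback_text::STRING       AS feedback_text,
       data:created_at::TIMESTAMP_NTZ   AS created_at
FROM   feedback
WHERE  customer_id = 1042;   -- illustrative customer id
```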
Fact tables and dimension tables are two key components in a dimensional modeling approach used
in data warehousing.
1. Fact Tables:
• Fact tables contain the quantitative data, also known as facts, that are typically
numerical values representing business transactions or events.
• They are usually large tables and store information such as sales amounts, quantities
sold, or revenues.
• Fact tables often have foreign keys that reference the primary keys of dimension
tables, creating relationships between them.
2. Dimension Tables:
• Dimension tables contain descriptive attributes or context for the data stored in the
fact table.
• They provide the necessary context to interpret the data in the fact table.
• Examples of dimension tables include product, customer, time, and location tables.
In summary, fact tables store quantitative data about business processes, while dimension tables
provide the context or descriptive attributes related to that data. They are linked through foreign
key relationships to create a comprehensive and understandable data model for analysis and
reporting purposes.
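The relationship described above can be sketched as a small star schema (all table and column names are illustrative):

```sql
CREATE TABLE dim_product (
    product_id   INT PRIMARY KEY,
    product_name STRING,
    category     STRING
);

CREATE TABLE dim_customer (
    customer_id   INT PRIMARY KEY,
    customer_name STRING,
    region        STRING
);

-- Fact table: numeric measures plus foreign keys into the dimensions
CREATE TABLE fact_sales (
    sale_id     INT,
    product_id  INT REFERENCES dim_product (product_id),
    customer_id INT REFERENCES dim_customer (customer_id),
    sale_date   DATE,
    quantity    INT,
    amount      NUMBER(12, 2)
);
```

Note that Snowflake accepts but does not enforce primary/foreign key constraints; they serve as documentation of the model.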