Most Frequently Asked Azure Data Factory Interview Questions


AZURE | BEGINNER | CLOUD COMPUTING | DATA ENGINEERING | DATA WAREHOUSE | INTERVIEW QUESTIONS

Introduction

Azure Data Factory (ADF) is a cloud-based data ingestion and ETL (Extract, Transform, Load) tool. The
data-driven workflow in ADF orchestrates and automates data movement and data transformation. Azure
Data Factory helps organizations across the globe make critical business decisions by collecting data
from various sources such as e-commerce websites, supply chains, logistics, healthcare, etc., transforming
that data into a usable and trusted resource using operations like filtering, concatenation, and sorting,
and loading that data into a destination store.

(Image source: https://github.com/mspnp/azure-data-factory-sqldw-elt-pipeline)

Learning Objectives:

In this article, we will

1. Understand what Azure Data Factory is.

2. Gain knowledge about different types of activities supported by Azure Data Factory.

3. Look into some scenario-based questions on ADF.

4. Learn how data store credentials can be secured in ADF.

5. Review the various data stores and file formats supported by Azure Data Factory.

This article was published as a part of the Data Science Blogathon.

Table of Contents

1. Q1. What is Azure Data Factory?
2. Q2. How Does Azure Data Factory Make the Process of Creating a Data Pipeline Easy?
3. Q3. What are the Different Types of Activities Supported by Azure Data Factory?
4. Q4. Solve Project Scenario-based Question 1.
5. Q5. What are Annotations in Azure Data Factory?
6. Q6. Solve Project Scenario-based Question 2.
7. Q7. How Can Users Secure Their Data Store Credentials in ADF?
8. Q8. State the Difference Between Pipeline Parameters and Variables in ADF.
9. Q9. Name Some Data Stores and File Formats Supported by Azure Data Factory.
10. Q10. Which Activity of Azure Data Factory can be Used to Copy Data From Azure Blob Storage to Azure SQL?

Q1. What is Azure Data Factory?

ADF is a cloud-based data ingestion and ETL (Extract, Transform, Load) Azure service. ADF helps
organizations across the globe make critical business decisions by building complex ETL processes
and scheduled or event-driven workflows to process data, which can later be used by various reporting
tools for storytelling purposes.

(Image source: docs.microsoft.com)

Q2. How Does Azure Data Factory Make the Process of Creating a Data
Pipeline Easy?

ADF simplifies the creation of data pipelines by providing built-in connectors for data ingestion
and orchestration, a variety of activities for operations such as copying data, for-each loops, and
lookups, tools for validating, publishing, and monitoring pipelines, and support for continuous
integration and continuous deployment (CI/CD).

Q3. What are the Different Types of Activities Supported by Azure Data
Factory?

Below are the different types of activities supported by ADF (a sketch combining control and data movement activities follows the list):

1. Data Movement Activities: Activities used to move data from one data store to another in a data
pipeline are known as data movement activities. For example, the Copy activity can copy data from
Azure Data Lake Storage (ADLS) to Azure SQL Database.
2. Data Transformation Activities: Activities used to perform data transformation in a data pipeline are
known as Data transformation activities. Data Flow Activity, Azure Functions Activity, Databricks Notebook
Activity, etc., are examples of data transformation activities.

3. Control Activities: Activities used to build conditional, sequential, or iterative logic in a data
pipeline are known as control activities. Lookup Activity, Until Activity, ForEach Activity, etc., are examples
of control activities.

(Image source: docs.microsoft.com)
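As a rough illustration of how these categories fit together, below is a minimal, hand-written pipeline definition combining control activities (Lookup, ForEach) with a data movement activity (Copy), expressed as a Python dict that mirrors ADF's pipeline JSON. The dataset names (ConfigDataset, SrcDataset, DstDataset) are hypothetical placeholders, not from any real factory:

import json

# Sketch: a Lookup (control) feeds a ForEach (control) that runs a Copy (data movement).
pipeline = {
    "name": "ExamplePipeline",
    "properties": {
        "activities": [
            {
                # Control activity: read the list of items to process
                "name": "LookupFolders",
                "type": "Lookup",
                "typeProperties": {
                    "source": {"type": "JsonSource"},
                    "dataset": {"referenceName": "ConfigDataset", "type": "DatasetReference"},
                    "firstRowOnly": False,
                },
            },
            {
                # Control activity: iterate over the Lookup output
                "name": "ForEachFolder",
                "type": "ForEach",
                "dependsOn": [{"activity": "LookupFolders", "dependencyConditions": ["Succeeded"]}],
                "typeProperties": {
                    "items": {"value": "@activity('LookupFolders').output.value", "type": "Expression"},
                    "activities": [
                        {
                            # Data movement activity inside the loop
                            "name": "CopyFolder",
                            "type": "Copy",
                            "inputs": [{"referenceName": "SrcDataset", "type": "DatasetReference"}],
                            "outputs": [{"referenceName": "DstDataset", "type": "DatasetReference"}],
                            "typeProperties": {
                                "source": {"type": "DelimitedTextSource"},
                                "sink": {"type": "ParquetSink"},
                            },
                        }
                    ],
                },
            },
        ]
    },
}

print(json.dumps(pipeline, indent=2))  # e.g. to paste into ADF's JSON editor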

Q4. Solve Project Scenario-based Question 1.

Your data team is building an ETL pipeline for a client. You want to generate output files from Azure Data
Factory that are optimized for read-heavy analytical workloads and support a columnar format. What
should be the file format of the output files?

The generated output files should be in Parquet format, as Parquet stores data in columns and is
optimized for read-heavy analytical workloads.
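For instance, a Parquet output dataset pointing at Blob Storage could be declared as in the sketch below, written as a Python dict that mirrors the dataset JSON. The linked service name (BlobStorageLS), container, and folder are made-up placeholders:

import json

parquet_dataset = {
    "name": "OutputParquetDataset",
    "properties": {
        "type": "Parquet",  # columnar format, suited to read-heavy analytics
        "linkedServiceName": {"referenceName": "BlobStorageLS", "type": "LinkedServiceReference"},
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "output",
                "folderPath": "curated",
            },
            "compressionCodec": "snappy",
        },
    },
}

print(json.dumps(parquet_dataset, indent=2))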

Q5. What are Annotations in Azure Data Factory?

Annotations are additional informative tags that help in filtering and searching data factory resources such
as datasets, pipelines, linked services, etc. For example, suppose you are the team lead on a large data
processing project for a client, ABC, whose data factory contains 10 pipelines. To avoid confusion about
the data processing sequence, you can use annotations to label each pipeline with its primary purpose:
ingest, transform, or load. When monitoring pipelines, these annotations are available for searching,
grouping, and filtering.
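In the underlying JSON, annotations are simply an array of strings on the resource's properties. A minimal sketch, with the pipeline name and labels invented for illustration:

import json

pipeline = {
    "name": "IngestOrdersPipeline",
    "properties": {
        "annotations": ["ingest", "client-ABC"],  # searchable, filterable labels
        "activities": [],
    },
}

print(json.dumps(pipeline, indent=2))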

Q6. Solve Project Scenario-based Question 2.

A data science company handles data processing for different clients. Your team is building an ADF
pipeline to move user logs, generated from users' activity on an e-commerce platform, from an
ADLS container to a database inside an Azure Synapse dedicated SQL pool. The user logs are stored in
the container users in the following folder structure: /user/{YYYY}/{MM}/{DD}/{HH}/{mm}

The earliest folder is /user/2021/01/02/00/00. The latest folder is /user/2021/01/17/01/45.

How would you configure the pipeline trigger so that existing data is loaded every 30 minutes,
and up to a two-minute delay in data arrival is added to the time at which the data should have arrived?

We can configure the pipeline with a tumbling window trigger with Recurrence: 30 minutes,
Start time: 2021-01-01T00:00, and Delay: 2 minutes to achieve the above scenario.
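A tumbling window trigger matching this answer could look like the sketch below, a Python dict mirroring the trigger JSON. The pipeline name (LoadUserLogs) and the windowStart/windowEnd parameters are assumptions for illustration:

import json

trigger = {
    "name": "UserLogsTumblingTrigger",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Minute",
            "interval": 30,                       # Recurrence: every 30 minutes
            "startTime": "2021-01-01T00:00:00Z",  # Start time
            "delay": "00:02:00",                  # wait 2 minutes past each window end
            "maxConcurrency": 8,
        },
        "pipeline": {
            "pipelineReference": {"referenceName": "LoadUserLogs", "type": "PipelineReference"},
            "parameters": {
                # window bounds map onto the /user/{YYYY}/{MM}/{DD}/{HH}/{mm} folder path
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime",
            },
        },
    },
}

print(json.dumps(trigger, indent=2))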
Q7. How Can Users Secure Their Data Store Credentials in ADF?

Users can secure their data store credentials in ADF by storing them in Azure Key Vault or encrypting them
with certificates. Azure Key Vault is an Azure service used to securely store API keys, data store
credentials, passwords, etc., to prevent unauthorized access. Developers can easily import or create keys,
authorize users to access the key vault, and configure and manage the keys using Azure Key Vault.
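As a sketch of the Key Vault approach, a linked service can reference a secret rather than embedding the credential, so no password or connection string is stored in the factory definition itself. All names below (key vault, secret, linked services) are placeholders:

import json

key_vault_ls = {
    "name": "MyKeyVaultLS",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {"baseUrl": "https://mykeyvault.vault.azure.net"},
    },
}

# An Azure SQL linked service that pulls its connection string from Key Vault.
sql_ls = {
    "name": "AzureSqlLS",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "MyKeyVaultLS", "type": "LinkedServiceReference"},
                "secretName": "sql-connection-string",
            }
        },
    },
}

print(json.dumps([key_vault_ls, sql_ls], indent=2))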

Q8. State the Difference Between Pipeline Parameters and Variables in ADF.

Pipeline parameters are defined under the "Parameters" tab of the pipeline; their values are supplied
when a run is triggered and cannot be modified while the pipeline is running.

(Image source: learn.microsoft.com)

Pipeline variables can be set and modified during a pipeline run using the Set Variable activity.

(Image source: learn.microsoft.com)
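The distinction is visible in the pipeline JSON itself: parameters are declared once and read via pipeline().parameters, while variables are declared with an initial value and reassigned by a Set Variable activity. A minimal hand-written sketch with invented names:

import json

pipeline = {
    "name": "ParamsVsVariablesDemo",
    "properties": {
        # Parameters: supplied when the run is triggered; fixed for the whole run.
        "parameters": {"sourceFolder": {"type": "String", "defaultValue": "input"}},
        # Variables: mutable during the run via the Set Variable activity.
        "variables": {"processedPath": {"type": "String", "defaultValue": ""}},
        "activities": [
            {
                "name": "RecordPath",
                "type": "SetVariable",
                "typeProperties": {
                    "variableName": "processedPath",
                    "value": {
                        "value": "@concat(pipeline().parameters.sourceFolder, '/done')",
                        "type": "Expression",
                    },
                },
            }
        ],
    },
}

print(json.dumps(pipeline, indent=2))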

Q9. Name Some Data Stores and File Formats Supported by Azure Data
Factory.

Azure Data Factory supports various data stores such as Azure SQL, Azure Storage, Azure Databricks,
HBase, Hive, Impala, MariaDB, Oracle, Cassandra, Amazon S3, MongoDB Atlas, etc. ADF supports various
file formats such as Parquet, Avro, JSON, Delta, Excel, XML, Delimited text format, etc.

Q10. Which Activity of Azure Data Factory can be Used to Copy Data
From Azure Blob Storage to Azure SQL?

The Copy activity can be used in ADF to copy data from Azure Blob Storage to Azure SQL. The Copy
activity moves data between different data stores: it reads data from the source store, performs column
mapping and data compression or decompression based on the input and output dataset formats, and
writes the data into the destination (sink) data store.
(Image source: learn.microsoft.com)
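For reference, such a pipeline can also be created programmatically with the Azure SDK for Python (azure-mgmt-datafactory), following the pattern of Microsoft's Python quickstart. The resource names below are placeholders, the datasets BlobInputDS and SqlOutputDS are assumed to already exist in the factory, and exact model names can vary between SDK versions:

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
)

subscription_id = "<subscription-id>"  # placeholder
rg_name, df_name = "my-rg", "my-adf"   # placeholder resource group / factory name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Copy from Blob Storage (source) to Azure SQL (sink); the referenced
# datasets are assumed to be defined in the factory already.
copy_activity = CopyActivity(
    name="CopyBlobToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="BlobInputDS")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SqlOutputDS")],
    source=BlobSource(),
    sink=AzureSqlSink(),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyBlobToSqlPipeline", pipeline)

run = adf_client.pipelines.create_run(rg_name, df_name, "CopyBlobToSqlPipeline", parameters={})
print(run.run_id)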

Conclusion

Azure Data Factory (ADF) is a cloud-based data ingestion and ETL (Extract, Transform, Load) Azure
service. The data-driven workflow in ADF orchestrates and automates data movement and data
transformation. ADF helps developers build complex ETL processes and scheduled or event-driven
workflows to process data that can later be used by various reporting tools for storytelling purposes.
Below are some key points from the above article:

1. We have seen how ADF makes the process of creating a data pipeline easy.

2. We learned about approaches by which users can secure their data store credentials in ADF.

3. We have seen the differences between pipeline parameters and variables in ADF.

4. We got an understanding of how we can copy data from Azure Blob Storage to Azure SQL using ADF.

5. Apart from this, we also saw some scenario-based questions on ADF.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

Article Url - https://www.analyticsvidhya.com/blog/2023/02/most-frequently-asked-azure-data-factory-interview-questions/

Chaitanya Shah
