Use Delta Lake in Azure Synapse Analytics

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. It adds relational database semantics like tables, queries, and data modification, along with features like ACID transactions, schema enforcement, and support for both batch and streaming data. Delta Lake tables can be queried and modified using Spark SQL and are compatible with other big data technologies.


Perform data engineering with Azure Synapse Apache Spark Pools
Use Delta Lake in Azure Synapse Analytics

© Copyright Microsoft Corporation. All rights reserved.


Agenda
• Analyze data with Apache Spark in Azure Synapse Analytics
• Transform data with Apache Spark in Azure Synapse Analytics
• Use Delta Lake in Azure Synapse Analytics

© Copyright Microsoft Corporation. All rights reserved.


Use Delta Lake in Azure Synapse Analytics

© Copyright Microsoft Corporation. All rights reserved.


What is Delta Lake?
• Delta Lake is an open-source storage layer that brings ACID (atomicity, consistency, isolation, and durability) transactions to Apache Spark and big data workloads.
• Delta Lake adds features such as ACID transactions, schema enforcement, and lineage tracking to data lake storage. These features make a Delta Lake more reliable and easier to manage than a traditional data lake, and also make it a good choice for streaming data applications.
• A Delta Lake is a storage layer designed to run on top of an existing data lake and improve its reliability, security, and performance. Delta Lake supports ACID transactions, scalable metadata, and unified streaming and batch data processing.
• Delta Lake is the default storage format for all operations on Databricks. Unless otherwise specified, all tables on Databricks are Delta tables. Databricks originally developed the Delta Lake protocol and continues to actively contribute to the open-source project.

The current version of Delta Lake included with Azure Synapse has language support for Scala, PySpark, and .NET and is compatible with Linux Foundation Delta Lake.

Learn more >> Delta Lake Official Website
© Copyright Microsoft Corporation. All rights reserved.
What is Delta Lake?
So, Delta Lake is an open-source storage layer that adds relational database semantics to Spark:
• Relational tables that support querying and data modification
• Support for ACID* transactions
• Data versioning and Time Travel
• Support for batch and streaming data
• Standard formats and interoperability
*ACID is an acronym that refers to the set of four key properties that define a transaction:
• Atomicity: Multiple operations can be grouped into a single logical entity. Each statement in a transaction (to read, write, update, or delete data) is treated as a single unit: either the entire statement is executed, or none of it is executed. This property prevents data loss and corruption from occurring if, for example, your streaming data source fails mid-stream.
• Consistency: Ensures that transactions only make changes to tables in predefined, predictable ways. Transactional consistency ensures that corruption or errors in your data do not create unintended consequences for the integrity of your table.
• Isolation: When multiple users are reading and writing from the same table at once, isolation of their transactions ensures that concurrent transactions don't interfere with or affect one another. Each request can proceed as though it were occurring on its own, even though the requests are actually occurring simultaneously.
• Durability: Ensures that changes to your data made by successfully executed transactions will be saved, even in the event of system failure.
If a database operation has the above ACID properties, it can be called an ACID transaction, and data storage systems that apply these operations are called transactional systems.
© Copyright Microsoft Corporation. All rights reserved.

ACID (Atomicity, Consistency, Isolation, Durability) transactions are fundamental concepts in database systems that ensure data integrity and reliability. Here's a simplified example of an ACID transaction:
Let's consider a banking application where a user transfers money from one account to another (a code sketch follows the list below). We'll ensure the transaction adheres to the ACID properties:
1. Atomicity: The entire transaction must complete successfully, or none of it should be applied.
2. Consistency: The database must transition from one consistent state to another after the transaction. For example, the total sum of all account balances must remain the same before and after the transfer.
3. Isolation: Transactions should appear isolated from each other. Even if multiple transactions are executing concurrently, the result should be the same as if they were executed serially.
4. Durability: Once a transaction is committed, its effects are permanent, even in the case of system failure.
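
As a minimal sketch (assuming a hypothetical accounts Delta Lake table at /delta/accounts with account_id and balance columns), a single Delta Lake MERGE applies both sides of the transfer in one atomic commit: either both balances change or neither does.

from delta.tables import DeltaTable

accounts = DeltaTable.forPath(spark, "/delta/accounts")   # hypothetical table path

# Debit account A and credit account B as one atomic Delta Lake commit
transfer = spark.createDataFrame([("A", -100.0), ("B", 100.0)], ["account_id", "delta_amount"])

(accounts.alias("a")
    .merge(transfer.alias("t"), "a.account_id = t.account_id")
    .whenMatchedUpdate(set = { "balance": "a.balance + t.delta_amount" })
    .execute())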

© Copyright Microsoft Corporation. All rights reserved.


[Good to know]
Please note that you don’t need to use Delta Lake to manipulate or query data in Spark using SQL. You can create
tables in the Hive metastore with data files in CSV or Parquet format and query them with SELECT statements.
However, Delta Lake saves the data in Parquet format with additional metadata that enables tracking of transactions
and versioning, providing an experience much more similar to a relational database system like SQL Server. Most
new “data lakehouse” solutions built on Spark are based on Delta Lake, enabling you to combine the flexibility of
file-based data lake storage with the transactional data integrity capabilities of a relational data warehouse.

© Copyright Microsoft Corporation. All rights reserved.


Create Delta Lake tables

Create a Delta Lake table from a dataframe


df = spark.read.load("/data/mydata.csv", format="csv", header=True)

delta_table_path = "/delta/mydata”

df.write.format("delta").save(delta_table_path)

[Good to know]
Delta Lake tables are just like any other metastore tables except that the delta file format is
used to save the data. This results in a folder structure that not only contains the data files, but
also metadata files that enable transactional functionality.
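
As an optional check (a sketch assuming the Synapse mssparkutils helper is available in your notebook), you can list the table folder to see the Parquet data files alongside the _delta_log metadata folder:

from notebookutils import mssparkutils

# List the contents of the Delta Lake table folder
for f in mssparkutils.fs.ls(delta_table_path):
    print(f.name)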

Learn more >> How to create and append to Delta Lake tables with pandas
© Copyright Microsoft Corporation. All rights reserved.
Create Delta Lake tables

Make conditional updates

To update a delta lake table using the API, you create a DeltaTable object based on the path where the data
files are stored, and use its methods to perform data updates, inserts, and deletes.

from delta.tables import *
from pyspark.sql.functions import *

# spark is the active SparkSession (pyspark.sql.session.SparkSession)
# From the previous slides: delta_table_path = "/delta/mydata"
deltaTable = DeltaTable.forPath(spark, delta_table_path)   # deltaTable is a DeltaTable object

# Apply a 10% discount to all rows in the Accessories category
deltaTable.update(
    condition = "Category == 'Accessories'",
    set = { "Price": "Price * 0.9" })

Learn more >> Table deletes, updates, and merges


Delta Lake supports several statements to facilitate deleting data from and updating data in Delta tables.
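
For example, rows can also be removed through the same API (a minimal sketch continuing from the deltaTable object above; the Discontinued column is illustrative):

# Delete all rows for discontinued products
deltaTable.delete(condition = "Discontinued == True")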

© Copyright Microsoft Corporation. All rights reserved.


Create Delta Lake tables
If you want to access the data that you overwrote, you can query a snapshot of the table before you overwrote the first set of
data using the versionAsOf option.

Query a previous version (Time Travel)

# From the previous slides: delta_table_path = "/delta/mydata"
df = spark.read.format("delta").option("versionAsOf", 0).load(delta_table_path)

The "Time Travel" feature of Delta Lake takes advantage of the table metadata, which tracks every transaction against the table. This enables you to retrieve and compare earlier versions of the data, either by version number or by a given point in time.
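
To query by time rather than by version number, you can use the timestampAsOf option instead (a minimal sketch; the timestamp value is illustrative):

# Read the table as it was at a specific point in time
df = spark.read.format("delta").option("timestampAsOf", "2024-01-01").load(delta_table_path)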

© Copyright Microsoft Corporation. All rights reserved.


Create catalog tables

• So far you’ve worked with Delta Lake tables by loading data from the folder containing the Parquet files on which the table is based.

• You can define catalog tables that encapsulate the data and provide a named table entity that you can reference in SQL code.

• Spark supports two kinds of catalog tables for Delta Lake:


• Managed tables
• External tables

© Copyright Microsoft Corporation. All rights reserved.


Create catalog tables
Catalog tables are how the metastore defines relational tables “on top of” file locations.
• They’re not unique to Delta Lake (you can create managed and external tables for Parquet and CSV
formats too)
• [but] increasingly Delta Lake is the preferred way to build relational semantics into a data lakehouse
solution on Spark.

Key differences between managed and external tables

• Managed tables
• Defined without a specific location – files are created in metastore folder [stored in the default
metastore file system location]
• [tightly bound to the files] >> Dropping the table deletes the files
• External tables
• Defined with a specific file location
• Dropping the table does not delete the files
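
If you need to check which kind a given table is, its catalog metadata reports the type (a minimal sketch; MyTable is an illustrative table name):

# The "Type" row in the output shows MANAGED or EXTERNAL
spark.sql("DESCRIBE EXTENDED MyTable").show(truncate=False)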

© Copyright Microsoft Corporation. All rights reserved.


Create catalog tables

df.write.format("delta").option("path","/mydata").saveAsTable("MyExternalTable")

OR

spark.sql("CREATE TABLE MyExternalTable USING DELTA LOCATION '/mydata'")

OR

%%sql
CREATE TABLE MyExternalTable
USING DELTA
LOCATION '/mydata'
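
For comparison, a managed table is created without specifying a path, so its files are stored in the default metastore location (a minimal sketch; MyManagedTable is an illustrative name):

df.write.format("delta").saveAsTable("MyManagedTable")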

© Copyright Microsoft Corporation. All rights reserved.


Use Delta Lake with streaming data

Delta Lake is deeply integrated with Spark Structured Streaming through readStream and writeStream.

Delta Lake overcomes many of the limitations typically associated with streaming systems and files, including:
• Maintaining “exactly-once” processing with more than one stream (or concurrent batch jobs)
• Efficiently discovering which files are new when using files as the source for a stream

The key point is that Spark Structured Streaming provides a way to handle a stream of real-time data by
using Dataframe semantics – essentially you can query a stream of data like a boundless dataframe that is
perpetually receiving new data.

Streaming support in Delta Lake builds on this by enabling you to treat a table as a source of streaming
data for a Spark Structured Streaming dataframe, or as a sink (target) to which a Spark Structured Streaming
dataframe writes its data.

Learn more >> Table streaming reads and writes


© Copyright Microsoft Corporation. All rights reserved.
Use Delta Lake with streaming data
Use Delta Lake table as a streaming source
from pyspark.sql.types import *
from pyspark.sql.functions import *

# Load a streaming dataframe from the Delta Lake table folder
stream_df = spark.readStream.format("delta").option("ignoreChanges", "true").load("/delta/internetorders")

# A streaming dataframe can't be displayed with show(); it must be written to a sink.
# For example, write it to the console to inspect the incoming rows:
stream_df.writeStream.format("console").start()

Use Delta Lake table as a streaming sink


from pyspark.sql.types import *
from pyspark.sql.functions import *

# Define your schema if it's known (rather than relying on Spark to infer it), e.g.:
jsonSchema = StructType([
    StructField("time", TimestampType(), True),
    StructField("id", IntegerType(), True),
    StructField("value", StringType(), True)])

# Read a stream of JSON files from the folder referenced by inputPath
stream_df = spark.readStream.schema(jsonSchema).option("maxFilesPerTrigger", 1).json(inputPath)

table_path = '/delta/devicetable'
checkpoint_path = '/delta/checkpoint'

# Write the stream to a Delta Lake table, tracking progress in the checkpoint folder
delta_stream = stream_df.writeStream.format("delta").option("checkpointLocation", checkpoint_path).start(table_path)
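
While the stream is running, the target Delta Lake table can be queried like any other Delta table, and the streaming query can be stopped when it's no longer needed (a minimal sketch continuing from the code above):

# Read the data that has been written to the sink table so far
df = spark.read.format("delta").load(table_path)
df.show()

# Stop the streaming query when finished
delta_stream.stop()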

© Copyright Microsoft Corporation. All rights reserved.


Learn more >> Structured streaming
Use Delta Lake in a SQL pool
Until now, we’ve focused on working with Delta Lake tables using Spark.

In Azure Synapse Analytics, you can also work with Delta Lake tables using SQL in a SQL pool.

• The OPENROWSET function enables you to read the content of Delta Lake files by providing the URL
to your root folder.

Query delta table files using OPENROWSET


SELECT *
FROM OPENROWSET(
    BULK 'https://mystore.dfs.core.windows.net/files/delta/mytable/',   -- folder location
    FORMAT = 'DELTA'                                                    -- specify the DELTA format
) AS deltadata
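
To make the query easier to reuse, you could wrap it in a view in a serverless SQL database (a minimal sketch; the database name mydb is illustrative and must already exist):

USE mydb;
GO
CREATE VIEW DeltaData AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystore.dfs.core.windows.net/files/delta/mytable/',
    FORMAT = 'DELTA'
) AS deltadata;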

Learn more >> Query Delta Lake files using serverless SQL pool in Azure Synapse Analytics

© Copyright Microsoft Corporation. All rights reserved.


Use Delta Lake in a SQL pool

You can also use native SQL SELECT statements to query Delta Lake tables in the Spark metastore.

• By default, metastore tables are defined in a database named default; but you can create additional
databases in the metastore just as you can in SQL Server and query the tables they contain by using
the specific database name.

Query delta tables in Spark metastore databases


USE default;

SELECT * FROM MyDeltaTable;
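
For a table defined in a different metastore database, switch to that database by name before querying (a minimal sketch; mydb is an illustrative database name):

USE mydb;

SELECT * FROM MyDeltaTable;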

© Copyright Microsoft Corporation. All rights reserved.


Exercise: Use Delta Lake in Azure Synapse Analytics

Use the hosted lab environment provided, or view the lab instructions at the link below:

https://aka.ms/mslearn-delta-lake

© Copyright Microsoft Corporation. All rights reserved.


Knowledge check
Which of the following descriptions best fits Delta Lake?
☐ A Spark API for exporting data from a relational database into CSV files
☐ A relational storage layer for Spark that supports tables based on Parquet files
☐ A synchronization solution that replicates data between SQL pools and Spark pools

You've loaded a Spark dataframe with data that you now want to use in a Delta Lake table. What format should you use to write the dataframe to storage?
☐ CSV
☐ PARQUET
☐ DELTA

What feature of Delta Lake enables you to retrieve data from previous versions of a table?
☐ Spark Structured Streaming
☐ Time Travel
☐ Catalog Tables

© Copyright Microsoft Corporation. All rights reserved.


Further reading

Perform data engineering with Azure Synapse Apache Spark Pools


https://aka.ms/mslearn-spark

© Copyright Microsoft Corporation. All rights reserved.


What is Delta Lake? [A Good Read]

Delta Lake is an open-source data management system that runs on top of Apache Spark. It aims to bring atomicity, consistency, isolation, and durability (ACID) transactions closer to the data lake, which was previously challenging. Delta Lake improves the reliability, consistency, and scalability of data lakes, making them suitable for use cases that were not previously possible. Delta Lake also supports schema enforcement, ensuring your data conforms to pre-defined schemas. This helps maintain data quality and consistency, reducing the risk of errors and inconsistencies in downstream processes.

Delta Lakes vs Data lakehouse >> A Good Read


© Copyright Microsoft Corporation. All rights reserved.
Key differences between Data Lakehouse and Delta Lake
1. Architecture: Data Lakehouse is a hybrid architecture that combines the best of data
lake and data warehouse capabilities. Delta Lake, on the other hand, is a data management
system running on Apache Spark.
2. Reliability: Although data lakes are highly scalable and flexible, they are not known for
their reliability. Delta Lake, on the other hand, adds ACID transactions to data lakes, making
them much more reliable.
3. Consistency: Data lakes are designed to store data in its raw form, which can make it
difficult to ensure consistency. Delta Lake supports schema enforcement, which ensures
your data conforms to a predefined schema, helping you maintain consistency and reduce
the risk of errors.
4. Processing: Data lakehouses provide SQL-based interfaces that simplify data access for analysts and data scientists. Delta Lake, on the other hand, is designed to work with Apache Spark, a powerful processing engine capable of handling large amounts of data and complex analytics workloads.

Delta Lakes vs Data lakehouse >> A Good Read


© Copyright Microsoft Corporation. All rights reserved.
