Incremental Data Ingestion From Files
• 2 mechanisms:
    • COPY INTO: a SQL command
    • Auto Loader: built on Structured Streaming
        • Exactly-once guarantees
        • Fault tolerance
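The idea behind incremental ingestion (load each file once, skip it on later runs) can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: the function name `ingest_new_files`, the JSON checkpoint file, and the list used as a stand-in table are all hypothetical, and this is not how COPY INTO or Auto Loader are implemented. It models skipping already-loaded files only, not crash recovery.

```python
import json
import os
import tempfile


def ingest_new_files(source_dir, checkpoint_file, target):
    """Append rows from files not yet recorded in the checkpoint; return the files loaded."""
    seen = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            seen = set(json.load(f))

    # Only files that are absent from the checkpoint are considered new.
    new_files = sorted(name for name in os.listdir(source_dir) if name not in seen)
    for name in new_files:
        with open(os.path.join(source_dir, name)) as f:
            target.extend(line.strip() for line in f if line.strip())
        seen.add(name)

    # Persist progress so the next run skips everything loaded so far.
    with open(checkpoint_file, "w") as f:
        json.dump(sorted(seen), f)
    return new_files


# Hypothetical demo: a temporary source directory and a list standing in for the table.
source_dir = tempfile.mkdtemp()
checkpoint_file = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
table = []

with open(os.path.join(source_dir, "a.csv"), "w") as f:
    f.write("1\n2\n")
first_run = ingest_new_files(source_dir, checkpoint_file, table)   # loads a.csv
second_run = ingest_new_files(source_dir, checkpoint_file, table)  # nothing new to load

with open(os.path.join(source_dir, "b.csv"), "w") as f:
    f.write("3\n")
third_run = ingest_new_files(source_dir, checkpoint_file, table)   # loads only b.csv
```

Rerunning the function without new files leaves the stand-in table unchanged, which is the behavior the checkpoint directory provides for a real streaming query.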
# Auto Loader: read new files as a stream and write them to a table
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", <source_format>)
    .load("/path/to/files")
  .writeStream
    .option("checkpointLocation", <checkpoint_directory>)  # tracks ingestion progress
    .table(<table_name>)
)
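# Auto Loader with schema inference and schema evolution
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", <source_format>)
    .option("cloudFiles.schemaLocation", <schema_directory>)  # where the inferred schema is stored
    .load("/path/to/files")
  .writeStream
    .option("checkpointLocation", <checkpoint_directory>)
    .option("mergeSchema", "true")  # let the target table's schema evolve with new columns
    .table(<table_name>)
)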
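For comparison with the Auto Loader snippets above, a COPY INTO statement can be sketched as follows. The helper `build_copy_into`, the table name `my_table`, and the `JSON` format are hypothetical values chosen for illustration. Only the basic `COPY INTO ... FROM ... FILEFORMAT = ...` form is shown, and the string is built in plain Python so the example runs without a cluster.

```python
def build_copy_into(table_name: str, source_path: str, file_format: str) -> str:
    """Return a basic COPY INTO statement as a string."""
    return (
        f"COPY INTO {table_name}\n"
        f"FROM '{source_path}'\n"
        f"FILEFORMAT = {file_format}"
    )


# Hypothetical table name and format; the path mirrors the placeholder used above.
statement = build_copy_into("my_table", "/path/to/files", "JSON")
print(statement)

# On a Databricks cluster, the statement would be run with: spark.sql(statement)
```

Because COPY INTO is a SQL command, the same statement can also be run directly in a SQL cell.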
Derar Alhussein © Udemy | Databricks Certified Data Engineer Associate - Preparation
COPY INTO vs. Auto Loader