
Azure Data Factory Monitoring

Azure Data Factory Monitoring Best Practices

• As a best practice for Azure Data Factory monitoring, logs need to be captured systematically.
• By default, Azure keeps pipeline run logs for a maximum of 45 days, so your ADF logs are no longer accessible after that window.
• Configure your diagnostic logs to be sent to a storage account for auditing or manual inspection. You can use the diagnostic settings to specify the retention time in days (a sketch follows this list).
• Configure a Log Analytics workspace to analyze the logs using queries.
• Add the Azure Data Factory Analytics service pack from the Azure Marketplace.
(https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft.azuredatafactoryanalytics?tab=overview)
• It provides a one-click monitoring solution across data factories, with a built-in dashboard for quick access to ADF log metrics.
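A minimal sketch of configuring these diagnostic settings with the Azure Python SDK (azure-mgmt-monitor). The setting name, the 90-day retention value, and all resource IDs are placeholder assumptions; PipelineRuns, TriggerRuns, and ActivityRuns are the standard ADF log categories:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    DiagnosticSettingsResource,
    LogSettings,
    MetricSettings,
    RetentionPolicy,
)

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Placeholder resource IDs -- substitute your own.
adf_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.DataFactory/factories/<factory-name>"
)
storage_id = "<storage-account-resource-id>"
workspace_id = "<log-analytics-workspace-resource-id>"

# Retention applies to the storage account destination; 90 days is an
# assumed value -- set whatever your audit policy requires.
retention = RetentionPolicy(enabled=True, days=90)

client.diagnostic_settings.create_or_update(
    resource_uri=adf_id,
    name="adf-diagnostics",  # assumed setting name
    parameters=DiagnosticSettingsResource(
        storage_account_id=storage_id,   # for auditing / manual inspection
        workspace_id=workspace_id,       # for Log Analytics queries
        logs=[
            LogSettings(category="PipelineRuns", enabled=True, retention_policy=retention),
            LogSettings(category="TriggerRuns", enabled=True, retention_policy=retention),
            LogSettings(category="ActivityRuns", enabled=True, retention_policy=retention),
        ],
        metrics=[MetricSettings(category="AllMetrics", enabled=True)],
    ),
)
```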
Main Types of Alerts

• Metric Alerts
• Metric alert rules specify target criteria for a metric within the resource to be monitored. When the condition is met and the alert rule fires, notifications are sent to an action group.

• A few more attributes of metric alerts (a query sketch follows this list):

• Monitoring and alerting apply to the current point in time.
• Mostly based on performance, usage, and status (Success/Failure/Cancelled) driven metrics.
• Can be checked from the Metrics section in the Azure portal.
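The same numbers behind the portal's Metrics blade can also be retrieved programmatically. A minimal sketch using the azure-monitor-query package; PipelineSucceededRuns and PipelineFailedRuns are standard ADF metrics, and the resource ID is a placeholder:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())

# Placeholder ADF resource ID -- substitute your own.
adf_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.DataFactory/factories/<factory-name>"
)

# Pipeline run counts over the last 24 hours.
response = client.query_resource(
    adf_id,
    metric_names=["PipelineSucceededRuns", "PipelineFailedRuns"],
    timespan=timedelta(hours=24),
    aggregations=["Count"],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.count)
```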
Log Analytics Alerts

• These alerts are triggered by log searches that run automatically at periodic intervals. Advanced alerting for non-fatal errors, warnings, and business logic errors can be created in Azure Monitor and Log Analytics.
• A few more attributes of Log Analytics alerts:
• Provides long-term storage of logs (the default ADF logging period is 45 days), which can enable more sophisticated analytics.
• For example, trend analysis using historical comparisons of pipeline performance and activities.
• Ability to merge various metrics and observe the relationships between them.
• Able to analyze all types of logs, including any custom logs written for a specific business case.
• Can be checked from the Log Analytics workspace in the Azure portal (a log search sketch follows this list).
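A minimal sketch of such a log search using the azure-monitor-query package. It assumes the diagnostic settings write to resource-specific tables (so pipeline runs land in ADFPipelineRun); the workspace GUID is a placeholder:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# Daily failure counts per pipeline -- the kind of historical trending
# that outlives the 45-day ADF run history.
query = """
ADFPipelineRun
| where Status == "Failed"
| summarize FailedRuns = count() by PipelineName, bin(TimeGenerated, 1d)
| order by TimeGenerated desc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-guid>",  # placeholder
    query=query,
    timespan=timedelta(days=30),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```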
Azure Data Factory Alerts

• We can implement native ADF alerts as part of an Azure implementation for a client.
• We can create two alert rules: one to monitor pipeline failures and the other for trigger failures. The pipeline failure rule looks like this (a sketch of creating it follows this list):

• Metric: Failed Pipeline Runs

• Severity: 0 to 4
• Dimension: select the pipelines and failure types to be associated
• Alert Logic: greater than or equal to 1 (threshold count), based on Count aggregation
• Evaluation Period: over the last 1 minute
• Frequency: every 1 minute
• Notification: configure Email/SMS/Voice and use an action group for notifications to be sent
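A minimal sketch of creating that pipeline failure rule with the Azure Python SDK (azure-mgmt-monitor). The rule name, the severity of 1, the action group ID, and the other resource IDs are illustrative assumptions; PipelineFailedRuns is the metric behind failed pipeline runs:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertAction,
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
)

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Placeholder ADF resource ID -- substitute your own.
adf_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.DataFactory/factories/<factory-name>"
)

client.metric_alerts.create_or_update(
    resource_group_name="<rg>",                 # placeholder
    rule_name="adf-failed-pipeline-runs",       # assumed rule name
    parameters=MetricAlertResource(
        location="global",
        description="Fires on any failed ADF pipeline run",
        severity=1,                             # assumed; 0-4 are valid
        enabled=True,
        scopes=[adf_id],
        evaluation_frequency="PT1M",            # check every 1 minute
        window_size="PT1M",                     # over the last 1 minute
        criteria=MetricAlertSingleResourceMultipleMetricCriteria(
            all_of=[
                MetricCriteria(
                    name="FailedRuns",
                    metric_name="PipelineFailedRuns",
                    operator="GreaterThanOrEqual",
                    threshold=1,
                    time_aggregation="Count",
                )
            ]
        ),
        # Action group handles the Email/SMS/Voice notifications.
        actions=[MetricAlertAction(action_group_id="<action-group-resource-id>")],
    ),
)
```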
Approach
• Enabling Azure Monitor plus Log Analytics
• Developing an Azure Data Factory monitoring tool with the SDK (a sketch follows the links below)
• https://docs.microsoft.com/en-us/azure/data-factory/monitor-programmatically
• https://www.bluegranite.com/blog/monitoring-azure-data-factory-v2-using-power-bi
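For the SDK route, a minimal sketch using azure-mgmt-datafactory to query recent pipeline runs programmatically; the subscription, resource group, and factory names are placeholders:

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

now = datetime.now(timezone.utc)

# Pull all pipeline runs updated in the last 24 hours.
runs = client.pipeline_runs.query_by_factory(
    resource_group_name="<rg>",        # placeholder
    factory_name="<factory-name>",     # placeholder
    filter_parameters=RunFilterParameters(
        last_updated_after=now - timedelta(days=1),
        last_updated_before=now,
    ),
)

for run in runs.value:
    print(run.pipeline_name, run.status, run.run_start, run.message)
```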
Custom logs of your data pipelines and how to build a Data Catalog

• The custom log table has the following columns (a record sketch follows this list):
• ID: Id of the log. This is an auto-generated value (see Constraint).
• ID_TRACK_PROCESSING: Id (in the track_processing table) of the table to ingest that triggered the execution of the job.
• SCHEMA_NAME & TABLE_NAME: Schema and name of the table being inserted/processed.
• PRIMARY_KEYS: Populated in case the table has primary keys and these are used to perform the merge.
• STATUS: Process status (Success or Failed).
• RUN_DT: Timestamp of when the job was started.
• TIME_TAKEN: Time needed by the job to finish.
• CREATED_BY_ID: Identifies the tool that created the log (Azure Data Factory in our example).
• CREATED_TS: Timestamp of when the log was created.
• DATABRICKS_JOB_URL: URL at which the code and stages of every step of the execution can be found.
• DATAFACTORY_JOB_URL: URL of the ADF pipeline that identified the job as finished.
• LAST_DSTS: Latest timestamp of the table.
• LIVE_ROWS: Number of rows in the table after the execution of the job.
• REPLICATION_ROWS: Number of rows inserted/processed in the latest execution (if FULL LOAD, equal to LIVE_ROWS).
• COLUMNS: Structure (column names and types) of the table after the ingestion job.
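A minimal sketch of this log record as a Python dataclass, to make the schema concrete. The field names follow the list above; the types (and the unit assumed for TIME_TAKEN) are illustrative assumptions, not taken from the original deck:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class PipelineRunLog:
    """One row of the custom pipeline log table described above."""
    id: int                        # auto-generated log id
    id_track_processing: int       # id (in track_processing) of the triggering table
    schema_name: str
    table_name: str
    primary_keys: Optional[str]    # set when PKs drive the merge
    status: str                    # "Success" or "Failed"
    run_dt: datetime               # job start timestamp
    time_taken: float              # job duration (seconds assumed)
    created_by_id: str             # e.g. "Azure Data Factory"
    created_ts: datetime           # log creation timestamp
    databricks_job_url: str
    datafactory_job_url: str
    last_dsts: datetime            # latest timestamp of the table
    live_rows: int                 # rows in the table after the job
    replication_rows: int          # rows inserted in the latest run
    columns: str                   # column names/types after ingestion
```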
Data Catalog
Pricing
• https://azure.microsoft.com/en-in/pricing/details/monitor/
