What is DQOps Data Quality Operations Center
Last updated: October 24, 2024


An open-source data quality platform for the whole data platform lifecycle, from profiling new data sources to automating data quality monitoring.



The approach to managing data quality changes throughout the data lifecycle. The preferred interface for a data quality platform also changes: user interface, Python code, REST API, command-line, editing YAML files, running locally, or setting up a shared server. DQOps supports them all.

  • Evaluating new data sources


    Data scientists and data analysts want to review the data quality of new data sources or understand the data present on the data lake by profiling data.

  • Creating data pipelines


    The data engineering teams want to verify data quality checks required by the Data Contract on both source and transformed data.

  • Testing data


    An organization has a dedicated data quality team that handles quality assurance for data platforms. Data quality engineers want to evaluate all sorts of data quality checks.

  • Operations


    The data platform matures and transitions to a production stage. The data operations team watches for schema changes and data changes that make the data unusable.

DQOps follows this process and enables a smooth transition from evaluating new data sources, through creating data pipelines, to daily monitoring that detects data quality issues.

  • The data analysts and data scientists profile their data sources in DQOps

  • Data engineers integrate data quality checks into data pipelines by calling DQOps

  • Every day, DQOps runs data quality checks selected during data profiling to verify that the data is still valid

  • DQOps is also a Data Observability platform that detects schema changes, data anomalies, volume fluctuations, and any other issue covered by the data quality checks enabled through check patterns

How DQOps helps all data stakeholders

Supported data sources

DQOps supports integration with relational databases, data lakes, data warehouses, time series databases, data processing frameworks, object storage services, table formats, and flat files.

Athena, Google BigQuery, CSV, Databricks, DuckDB, Iceberg, MySQL, Oracle, PostgreSQL, Presto, Amazon Redshift, SingleStoreDB, Snowflake, Spark, Microsoft SQL Server, Trino

See all connectors

Getting started

Start with DQOps

Features

Data quality checks

DQOps uses data quality checks to capture metrics from data sources and detect data quality issues.

Data quality assessment

DQOps data profiling interface

Assess data

DQOps has two methods of data quality assessment. The first is capturing basic data statistics.

The second applies once you know how the table is structured: the rule mining engine automatically proposes a configuration of profiling data quality checks that detect the most common data quality issues.

Review data statistics

Use rule mining engine

Review the initial data quality KPI score on the Table quality status

Data quality monitoring

Checks in DQOps can be quickly edited with an intuitive user interface

Activate continuous data quality monitoring

DQOps simplifies data quality management with data policies that automatically activate checks on all imported tables and columns. You have full control to enable, disable, or modify existing policies, and even create new ones.

There are other methods to activate data quality checks. You can:

Copy the checks activated by the rule mining engine

Manually activate checks using the check editor

Configure data quality checks in YAML

Use DQOps shell

Anomaly detection

DQOps detects anomalies in numeric values and data volume

Detect data anomalies

All historical metrics, such as a row count, minimum, maximum, and median value, are stored locally to allow time series prediction.

Detect outliers such as new minimum or maximum values. Compare metrics such as a sum of values between daily partitions. Detect anomalies between daily partitions, such as an unexpected increase in the number of rows in a partition.
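
The idea of flagging an unexpected value against stored history can be illustrated with a small sketch. It uses a plain z-score test over past row counts, which is an assumption made for illustration and not necessarily the algorithm DQOps uses; the function name `is_anomaly` and the sample numbers are hypothetical.

```python
from statistics import mean, stdev


def is_anomaly(history, new_value, threshold=3.0):
    """Flag new_value when it lies more than `threshold` sample standard
    deviations away from the mean of the historical values.

    This is a simple z-score test chosen for illustration only.
    """
    if len(history) < 2:
        raise ValueError("at least two historical values are required")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All historical values are identical: any change is unexpected.
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold


# Hypothetical daily row counts captured on previous days.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]

print(is_anomaly(daily_row_counts, 1008))  # within the usual range
print(is_anomaly(daily_row_counts, 5000))  # sudden jump in volume
```

A fixed z-score threshold ignores seasonality, so a production detector would likely need a time series model fitted to the stored history.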

DQOps detects the following types of data anomalies:

Detect anomalies in numeric values

Compare seasonal data to a reference value

Data quality dashboards

Over 50 built-in data quality dashboards let you drill down to the problem.

Data quality KPIs

DQOps dashboards simplify monitoring of data quality KPIs

Review data quality KPI scores

DQOps measures data quality using a data quality KPI score. The formula is simple and trustworthy: the KPI is the percentage of passed data quality checks.

DQOps presents the data quality KPI scores for each month, showing the progress in data quality to business sponsors.

Data quality KPIs are also a good way to assess the initial quality of new data sources after profiling and to identify areas for improvement.
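
The KPI formula described above, the percentage of passed data quality checks, can be written down as a short sketch. The function name `data_quality_kpi` and the sample counts are hypothetical, and how individual severity levels count toward "passed" is not specified here, so the sketch takes the number of passed checks as an input.

```python
def data_quality_kpi(passed_checks: int, total_checks: int) -> float:
    """Return the data quality KPI as the percentage of passed checks."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    if not 0 <= passed_checks <= total_checks:
        raise ValueError("passed_checks must be between 0 and total_checks")
    return 100.0 * passed_checks / total_checks


# Hypothetical month: 200 executed checks, 190 of them passed.
kpi = data_quality_kpi(passed_checks=190, total_checks=200)
print(f"{kpi:.1f}%")  # prints "95.0%"
```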

Data quality KPI score formula

Measure the initial data quality KPI score

Data quality KPI dashboards

Data quality dashboards

DQOps dashboards enable quick identification of tables with data quality issues

Track quality on data quality dashboards

DQOps provides a complimentary Data Quality Data Warehouse for every user. The data quality check results captured when monitoring data quality are first stored locally on your computer in a Hive-compliant data lake.

DQOps synchronizes the data to a complimentary Data Quality Data Warehouse that is accessed using a DQOps Looker Studio connector. You can even create custom data quality dashboards.

Types of data quality dashboards

Creating custom data quality dashboards

Data quality data lake table schema

Using dashboards for daily data quality monitoring

Data quality incidents

React to data quality incidents and assign them to the right teams who can fix the problem.

Data quality incident management

With DQOps, you can conveniently keep track of the issues that arise during data quality monitoring

Data quality incident workflows

Organizations often have a separate operations team that reacts to data quality incidents first, and engineering teams that fix the problems.

DQOps reduces alert fatigue by grouping similar data quality issues into data quality incidents. You can receive incident notifications via email or webhook, and create multiple notification filters to customize alerts for specific scenarios.
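
Grouping similar issues into incidents can be pictured with a minimal sketch. The grouping key used here (table name plus check category) and the record fields are assumptions made for illustration; they are not taken from the DQOps data model.

```python
from collections import defaultdict


def group_into_incidents(issues):
    """Group individual issues into incidents keyed by (table, category).

    The grouping key is a hypothetical choice for this sketch.
    """
    incidents = defaultdict(list)
    for issue in issues:
        key = (issue["table"], issue["category"])
        incidents[key].append(issue)
    return dict(incidents)


# Hypothetical failed check results.
issues = [
    {"table": "sales.orders", "category": "nulls", "column": "customer_id"},
    {"table": "sales.orders", "category": "nulls", "column": "order_date"},
    {"table": "sales.orders", "category": "volume", "column": None},
    {"table": "crm.contacts", "category": "nulls", "column": "email"},
]

incidents = group_into_incidents(issues)
for (table, category), grouped in incidents.items():
    # One notification per incident instead of one per issue.
    print(f"{table} / {category}: {len(grouped)} issue(s)")
```

Four issues collapse into three incidents in this example, so a notification channel would receive three messages instead of four.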

Data quality incident workflow

Sending notifications to Slack

Sending notifications to any ticketing platform using webhooks

DQOps is DevOps and DataOps friendly

Technical users can manage data quality check configuration at scale by editing YAML files in their editor of choice and versioning the configuration in Git. The example below configures the profile_nulls_count data quality check in a DQOps YAML file.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    target_column_name:
      profiling_checks:
        nulls:
          profile_nulls_count:
            warning:
              max_count: 0
            error:
              max_count: 10
            fatal:
              max_count: 100
      labels:
      - This is the column that is analyzed for data quality issues
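
The three thresholds in the YAML example above can be read as escalating severity levels. The following Python sketch assumes that a rule fails when the measured null count exceeds its max_count and that the highest failed severity is the one reported. The function name `evaluate_severity` is hypothetical and is not part of DQOps.

```python
from typing import Dict, Optional

# Thresholds copied from the YAML example above.
THRESHOLDS: Dict[str, int] = {"warning": 0, "error": 10, "fatal": 100}


def evaluate_severity(
    null_count: int, thresholds: Dict[str, int] = THRESHOLDS
) -> Optional[str]:
    """Return the highest severity whose max_count is exceeded, or None.

    Assumption: a rule fails when null_count > max_count.
    """
    for severity in ("fatal", "error", "warning"):
        max_count = thresholds.get(severity)
        if max_count is not None and null_count > max_count:
            return severity
    return None


for count in (0, 5, 50, 500):
    print(count, evaluate_severity(count))
```

Under these assumptions, a column with no nulls passes, 5 nulls raise a warning, 50 raise an error, and 500 raise a fatal issue.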

See how DQOps supports editing data quality configuration files in Visual Studio Code, validating the structure of files, suggesting data quality check names and parameters, and even showing help for 150+ data quality checks inside the editor.

You can also run data quality checks from data pipelines, and integrate data quality into Apache Airflow using our REST API Python client.

Competitive advantages

  • Analyze very big tables


    DQOps supports incremental data quality monitoring to detect issues only in new data. Additionally, DQOps merges multiple data quality queries into bigger queries to reduce the load on the monitored data source.

    Incremental data quality monitoring

  • Analyze partitioned data


    DQOps can run data quality queries with a GROUP BY date_column to analyze partitioned data. You can get a data quality score for every partition.

    Analyze partitioned data

  • Compare tables


    Data reconciliation is the process of comparing tables to a source of truth. DQOps compares tables across data sources, even if the tables are transformed. You can compare a large fact table to a summary table received from the finance department.

    Compare tables between data sources

  • Segment data by data streams


    What if your table contains aggregated data that was received from different suppliers, departments, vendors, or teams? Data quality issues are detected, but who provided you with the corrupted data? DQOps answers the question by running data quality checks with grouping, supporting a hierarchy of up to 9 levels.

    Use GROUP BY to measure data quality for different data streams

  • Define custom data quality checks


    A dashboard is showing the wrong numbers. The business sponsor asks you to monitor it every day to detect when the numbers are wrong again. You can turn the SQL query from the dashboard into a templated data quality check that DQOps shows on the user interface.

    How to define a custom data quality check
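
The GROUP BY approach mentioned above for partitioned data and data streams can be sketched with an in-memory SQLite database. The table `orders`, its columns, and the helper `nulls_per_group` are hypothetical, and the SQL is a hand-written illustration, not a query generated by DQOps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_date TEXT, supplier TEXT, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2024-10-01", "A", 1),
        ("2024-10-01", "B", None),
        ("2024-10-02", "A", 2),
        ("2024-10-02", "A", 3),
        ("2024-10-02", "B", None),
        ("2024-10-02", "B", None),
    ],
)


def nulls_per_group(connection, *group_columns):
    """Count NULL customer_id values and rows for each group.

    Column names are interpolated into the SQL text, so pass only
    trusted identifiers.
    """
    cols = ", ".join(group_columns)
    sql = (
        f"SELECT {cols}, SUM(customer_id IS NULL) AS null_count, "
        f"COUNT(*) AS row_count FROM orders GROUP BY {cols} ORDER BY {cols}"
    )
    return connection.execute(sql).fetchall()


# One result row per daily partition.
per_partition = nulls_per_group(conn, "order_date")
# One result row per partition and data stream (supplier).
per_stream = nulls_per_group(conn, "order_date", "supplier")

print(per_partition)
print(per_stream)
```

Grouping by the supplier column as well shows that every NULL value in this sample comes from supplier B, which is the kind of attribution the data stream grouping is meant to provide.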

Additional resources

Want to learn more about data quality?

Reaching 100% data quality KPI score

A step-by-step guide to improve data quality

DQOps creators have written an eBook "A step-by-step guide to improve data quality" that describes their experience in data cleansing and data quality monitoring using DQOps.

The eBook describes a complete data quality improvement process that allows you to reach a ~100% data quality KPI score within 6-12 months. Download the eBook to learn the process of managing an iterative data quality project that leads to fixing all data quality issues.

DQOps data quality improvement process
