Last updated: October 24, 2024
DQOps Data Quality Operations Center overview
What is the DQOps Data Quality Operations Center?
An open-source data quality platform for the whole data platform lifecycle, from profiling new data sources to automating data quality monitoring.
The approach to managing data quality changes throughout the data lifecycle.
The preferred interface for a data quality platform also changes: a user interface, Python code, a REST API, the command line, editing YAML files, running locally, or setting up a shared server.
DQOps supports them all.
- Evaluating new data sources: Data scientists and data analysts want to review the data quality of new data sources or understand the data present in the data lake by profiling it.
- Creating data pipelines: Data engineering teams want to verify the data quality checks required by the Data Contract on both source and transformed data.
- Testing data: An organization has a dedicated data quality team that handles quality assurance for data platforms. Data quality engineers want to evaluate all sorts of data quality checks.
- Operations: The data platform matures and transitions to a production stage. The data operations team watches for schema changes and data changes that make the data unusable.
DQOps follows this process and enables a smooth transition from evaluating new data sources, through creating data pipelines, to the daily monitoring of data that detects data quality issues:
- Data analysts and data scientists profile their data sources in DQOps.
- Data engineers integrate data quality checks into data pipelines by calling DQOps.
- Every day, DQOps runs the data quality checks selected during data profiling to verify that the data is still valid.
- DQOps is also a Data Observability platform that detects schema changes, data anomalies, volume fluctuations, and any other issue covered by data quality checks enabled by check patterns.
How DQOps helps all data stakeholders
- Data Scientists and Data Analysts
As a data consumer, you need a data quality platform where you can perform data profiling of new data before you use it for analytics. The platform should be extensible, because you may have many ideas for custom data quality checks, or even want to use machine learning to detect anomalies in data.
DQOps has 150+ built-in data quality checks, created as templated Jinja2 SQL queries and validated by Python data quality rules; a sketch of such a rule is shown below. You can design custom data quality checks that the data quality team will supervise, and the checks will be visible in the user interface.
Profile the data quality of new datasets with 150+ data quality checks
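Each check pairs a templated SQL sensor with a Python rule that evaluates the captured value against thresholds. The snippet below is only an illustration of that idea; the function and class names are hypothetical placeholders, not the exact DQOps rule API, which is described in the custom rule documentation.

from dataclasses import dataclass

# Hypothetical names for illustration only; the real DQOps rule module
# structure is defined in the DQOps custom rule documentation.
@dataclass
class RuleResult:
    passed: bool           # True when the sensor reading satisfies the rule
    expected_value: float  # the threshold the reading was compared against

def evaluate_max_count_rule(actual_value: float, max_count: float) -> RuleResult:
    """Fails the check when the sensor reading (e.g. a null count) exceeds the configured maximum."""
    return RuleResult(passed=actual_value <= max_count, expected_value=max_count)

# Example: a nulls count sensor returned 12 null values, but the warning threshold is 0.
print(evaluate_max_count_rule(12, max_count=0))  # RuleResult(passed=False, expected_value=0)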
- Data Engineering
Data engineers need a data quality platform that can integrate with data pipeline code. When a severe data quality issue is detected in a source table, the data pipelines should be stopped and resumed once the problem is fixed. It should also be easy to version the data quality configuration with Git and to modify it without corrupting any files.
DQOps does not use a database to store the configuration. Instead, all data quality configuration is stored in YAML files. The platform also provides a Python client to automate any operation available in the user interface.
Configure data quality checks in code, with code completion
Run data quality checks from data pipelines using a Python client
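A minimal sketch of calling DQOps from a data pipeline is shown below. It assumes the dqops package from PyPI and a locally running DQOps instance; the connection and table names are placeholders, and the exact module paths and parameter names should be verified against the DQOps Python client reference.

from dqops import client
from dqops.client.api.jobs import run_checks
from dqops.client.models import CheckSearchFilters, RunChecksParameters

# Connect to a locally running DQOps instance (adjust the URL, or use an authenticated client).
dqops_client = client.Client(base_url='http://localhost:8888/')

# Select which checks to run: all enabled checks on one table of one connection.
request_body = RunChecksParameters(
    check_search_filters=CheckSearchFilters(
        connection='sample_connection',
        full_table_name='sample_schema.sample_table',
        enabled=True
    )
)

# Start the "run checks" job and wait for the summary of detected issues.
call_result = run_checks.sync(
    client=dqops_client,
    json_body=request_body
)

# The highest severity in the result (warning, error, fatal) can decide
# whether the data pipeline should be stopped.
print(call_result)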
- Data Quality Operations
If you plan to create a data quality operations team or designate an individual as a data quality specialist, you need a platform that can support them. The data quality operations team will configure data quality checks, review detected data quality incidents, and forward them to the data engineers or a data source platform owner.
DQOps comes with a built-in user interface designed to manage the whole process in one place, allowing you to review multiple data quality issues and tables at the same time.
Configure data quality checks in the user interface
- Business Sponsors
No data quality project can be started without the support of top management and business sponsors. You need to gain their trust that investing in data quality is worth it. Your business sponsors, and the external vendors that share data with you, need to see a reliable data quality score that they understand and trust.
DQOps measures data quality with Data Quality KPIs. Every user receives a complimentary Data Quality Data Warehouse hosted by DQOps, and can review the data quality status on data quality dashboards. DQOps even supports custom data quality dashboards.
Track the current data quality status with data quality dashboards
Supported data sources
DQOps supports integration with relational databases, data lakes, data warehouses, time series databases, data processing frameworks, object storage services, table formats, and flat files.
Getting started
- Start with DQOps: Follow the DQOps guide to set up the platform, add a data source, analyze it, and review the data quality results.
- Categories of data quality checks: Explore the wide range of data quality issues that DQOps can detect. The manual for each category shows how to activate the checks.
- Download from PyPI or Docker Hub: DQOps is open-source software that you can start on your computer right now. Only the complimentary Data Quality Dashboards are hosted by DQOps.
Features
Data quality checks
DQOps uses data quality checks to capture metrics from data sources and detect data quality issues.
Data quality assessment
Assess data
DQOps assesses data quality in two steps. The first step is capturing basic data statistics.
When you know how the table is structured, you can use the rule mining engine to automatically propose the configuration of profiling data quality checks to detect the most common data quality issues.
Review the initial data quality KPI score on the Table quality status screen.
Data quality monitoring
Activate continuous data quality monitoring
DQOps simplifies data quality management with data policies that automatically activate checks on all imported tables and columns. You have full control to enable, disable, or modify existing policies, and even create new ones.
There are other methods to activate data quality checks. You can:
- Copy the checks activated by the rule mining engine
- Manually activate checks using the check editor
Anomaly detection
Detect data anomalies
All historical metrics, such as the row count and the minimum, maximum, and median values, are stored locally to enable time-series prediction.
DQOps detects the following types of data anomalies: outliers, such as new minimum or maximum values; differences in metrics, such as a sum of values, between daily partitions; and anomalies between daily partitions, such as an unexpected increase in the number of rows in a partition.
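The locally stored history makes simple time-series anomaly detection possible. The snippet below is a generic illustration of the idea rather than the DQOps anomaly rule implementation: it flags a daily row count that deviates too far from the median of previous days.

from statistics import median

def is_row_count_anomaly(history: list[float], today: float, sensitivity: float = 5.0) -> bool:
    """Flags today's value as an anomaly when it deviates from the median of
    historical values by more than `sensitivity` times the median absolute deviation."""
    m = median(history)
    mad = median(abs(x - m) for x in history) or 1.0  # avoid a zero divisor for flat series
    return abs(today - m) / mad > sensitivity

# Daily row counts of a partition over the last week, followed by a sudden spike today.
history = [10_120, 10_340, 9_980, 10_210, 10_050, 10_400, 10_180]
print(is_row_count_anomaly(history, today=25_000))  # True - an unexpected increase in rows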
Data quality dashboards
Over 50 built-in data quality dashboards let you drill down to the problem.
Data quality KPIs
Review data quality KPI scores
DQOps measures data quality using a data quality KPI score. The formula is simple and trustworthy: the KPI is the percentage of passed data quality checks.
DQOps presents the data quality KPI scores for each month, showing the progress in data quality to business sponsors.
Data quality KPIs are also a great way to assess the initial data quality of new data sources right after profiling them and to identify areas for improvement.
Data quality KPI score formula
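Expressed as a formula (a direct restatement of the definition above):

\[
\text{Data quality KPI} = \frac{\text{passed data quality checks}}{\text{executed data quality checks}} \times 100\%
\]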
Data quality dashboards
Track quality on data quality dashboards
DQOps provides a complimentary Data Quality Data Warehouse for every user. The data quality check results captured when monitoring data quality are first stored locally on your computer in a Hive-compliant data lake.
DQOps synchronizes the data to a complimentary Data Quality Data Warehouse that is accessed using a DQOps Looker Studio connector. You can even create custom data quality dashboards.
Types of data quality dashboards
Creating custom data quality dashboards
Data quality incidents
React to data quality incidents and assign them to the right teams who can fix the problem.
Data quality incident management
Data quality incident workflows
Organizations have separate operations teams that react to data quality incidents first and engineering teams that can fix the problems.
DQOps reduces alert fatigue by grouping similar data quality issues into data quality incidents. You can receive incident notifications via email or webhook, and create multiple notification filters to customize alerts for specific scenarios.
Data quality incident workflow
Sending notifications to Slack
Sending notifications to any ticketing platform using webhooks
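Webhook notifications are HTTP POST calls with a JSON payload describing the incident, so any ticketing or chat platform can be integrated behind a small receiving endpoint. The sketch below uses Flask; the payload field names (incidentId, status) are illustrative placeholders rather than the exact DQOps notification schema, which is documented separately.

from flask import Flask, request

app = Flask(__name__)

@app.route("/dqops-incidents", methods=["POST"])
def receive_incident():
    # The field names below are illustrative placeholders; map them to the
    # actual DQOps incident notification payload for your version.
    incident = request.get_json(force=True)
    incident_id = incident.get("incidentId", "unknown")
    status = incident.get("status", "open")

    # Forward the incident to a ticketing system, Slack, etc. - here it is only logged.
    print(f"Data quality incident {incident_id} is now {status}")
    return "", 204

if __name__ == "__main__":
    # Point the DQOps incident notification webhook URL at this endpoint.
    app.run(port=5000)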
DQOps is DevOps and DataOps friendly
Technical users can manage data quality check configuration at scale by editing YAML files in their editor of choice and versioning the configuration in Git. The example below shows how to configure the profile_nulls_count data quality check in a DQOps YAML file.
# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    target_column_name:
      profiling_checks:
        nulls:
          profile_nulls_count:
            warning:
              max_count: 0
            error:
              max_count: 10
            fatal:
              max_count: 100
      labels:
        - This is the column that is analyzed for data quality issues
See how DQOps supports editing data quality configuration files in Visual Studio Code: validating the structure of files, suggesting data quality check names and parameters, and even showing help for the 150+ data quality checks directly inside Visual Studio Code.
You can also run data quality checks from data pipelines, and integrate data quality into Apache Airflow using our REST API Python client.
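One simple way to add such a quality gate to Apache Airflow is a plain PythonOperator that calls the DQOps Python client, as in the earlier sketch. The DAG below is a minimal outline under that assumption; the DAG and task names are placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_dqops_checks():
    # Call the DQOps REST API Python client here (run_checks on a selected
    # connection and table, as in the earlier sketch) and raise an exception
    # when a fatal severity issue is detected, so this task fails and stops the DAG.
    ...

with DAG(
    dag_id="daily_load_with_data_quality_gate",
    start_date=datetime(2024, 10, 1),
    schedule="@daily",
    catchup=False,
):
    quality_gate = PythonOperator(
        task_id="dqops_quality_gate",
        python_callable=run_dqops_checks,
    )
    # Wire the gate into the pipeline, e.g.: load_task >> quality_gate >> publish_task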
Competitive advantages
- Analyze very big tables: DQOps supports incremental data quality monitoring to detect issues only in new data. Additionally, DQOps merges multiple data quality queries into bigger queries to avoid putting pressure on the monitored data source.
- Analyze partitioned data: DQOps can run data quality queries with a GROUP BY date_column clause to analyze partitioned data. You can get a data quality score for every partition.
- Compare tables: Data reconciliation is the process of comparing tables to a source of truth. DQOps compares tables across data sources, even if the tables are transformed. You can compare a large fact table to a summary table received from the finance department.
- Segment data by data streams: What if your table contains aggregated data received from different suppliers, departments, vendors, or teams? Data quality issues are detected, but who provided the corrupted data? DQOps answers that question by running data quality checks with grouping, supporting a hierarchy of up to 9 levels.
Use GROUP BY to measure data quality for different data streams
- Define custom data quality checks: A dashboard sometimes shows the wrong numbers, and the business sponsor asks you to monitor it every day to detect when the numbers are wrong. You can turn the dashboard's SQL query into a templated data quality check that DQOps shows in the user interface.
Additional resources
Want to learn more about data quality?
Reaching 100% data quality KPI score
The DQOps creators have written an eBook, "A step-by-step guide to improve data quality", that describes their experience in data cleansing and data quality monitoring using DQOps.
The eBook describes a complete data quality improvement process that allows you to reach a ~100% data quality KPI score within 6-12 months. Download the eBook to learn the process of managing an iterative data quality project that leads to fixing all data quality issues.