Last updated: January 14, 2025
DQOps YAML file definitions
The definition of YAML files used by DQOps to configure the data sources, monitored tables, and the configuration of activated data quality checks.
ConnectionYaml
Definition of a data source connection that is covered by data quality checks.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| api_version | DQOps YAML schema version | string | | dqo/v1 | |
| kind | File type | enum | source, table, sensor, provider_sensor, rule, check, settings, file_index, connection_similarity_index, dashboards, default_schedules, default_checks, default_table_checks, default_column_checks, default_notifications | source | |
| spec | Connection specification object with the connection parameters to the data source | ConnectionSpec | | | |
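For orientation, a minimal connection file could look like the sketch below. This is illustrative only: PostgreSQL is used as an example provider, and the host, database, and credential values are placeholders. In the YAML files that DQOps generates, the header keys are typically serialized as apiVersion and kind (corresponding to the api_version and kind properties above), while the spec keys use the snake_case property names documented in the following sections.

```yaml
# Minimal connection file sketch (placeholder values only).
apiVersion: dqo/v1
kind: source
spec:
  provider_type: postgresql
  postgresql:
    host: db.example.com               # placeholder host name
    port: "5432"
    database: analytics                # placeholder database name
    user: ${POSTGRESQL_USER}           # resolved from an environment variable
    password: ${POSTGRESQL_PASSWORD}   # resolved from an environment variable
```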
ConnectionSpec
Data source (connection) specification.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| provider_type | Database provider type (required). | enum | bigquery, clickhouse, databricks, db2, duckdb, hana, mariadb, mysql, oracle, postgresql, presto, questdb, redshift, snowflake, spark, sqlserver, teradata, trino | | |
| bigquery | BigQuery connection parameters. Specify parameters in the bigquery section. | BigQueryParametersSpec | | | |
| snowflake | Snowflake connection parameters. Specify parameters in the snowflake section or set the url (which is the Snowflake JDBC url). | SnowflakeParametersSpec | | | |
| postgresql | PostgreSQL connection parameters. Specify parameters in the postgresql section or set the url (which is the PostgreSQL JDBC url). | PostgresqlParametersSpec | | | |
| duckdb | DuckDB connection parameters. Specify parameters in the duckdb section or set the url (which is the DuckDB JDBC url). | DuckdbParametersSpec | | | |
| redshift | Redshift connection parameters. Specify parameters in the redshift section or set the url (which is the Redshift JDBC url). | RedshiftParametersSpec | | | |
| sqlserver | SQL Server connection parameters. Specify parameters in the sqlserver section or set the url (which is the SQL Server JDBC url). | SqlServerParametersSpec | | | |
| presto | Presto connection parameters. Specify parameters in the presto section or set the url (which is the Presto JDBC url). | PrestoParametersSpec | | | |
| trino | Trino connection parameters. Specify parameters in the trino section or set the url (which is the Trino JDBC url). | TrinoParametersSpec | | | |
| mysql | MySQL connection parameters. Specify parameters in the mysql section or set the url (which is the MySQL JDBC url). | MysqlParametersSpec | | | |
| oracle | Oracle connection parameters. Specify parameters in the oracle section or set the url (which is the Oracle JDBC url). | OracleParametersSpec | | | |
| spark | Spark connection parameters. Specify parameters in the spark section or set the url (which is the Spark JDBC url). | SparkParametersSpec | | | |
| databricks | Databricks connection parameters. Specify parameters in the databricks section or set the url (which is the Databricks JDBC url). | DatabricksParametersSpec | | | |
| hana | HANA connection parameters. Specify parameters in the hana section or set the url (which is the HANA JDBC url). | HanaParametersSpec | | | |
| db2 | DB2 connection parameters. Specify parameters in the db2 section or set the url (which is the DB2 JDBC url). | Db2ParametersSpec | | | |
| mariadb | MariaDB connection parameters. Specify parameters in the mariadb section or set the url (which is the MariaDB JDBC url). | MariaDbParametersSpec | | | |
| clickhouse | ClickHouse connection parameters. Specify parameters in the clickhouse section or set the url (which is the ClickHouse JDBC url). | ClickHouseParametersSpec | | | |
| questdb | QuestDB connection parameters. Specify parameters in the questdb section or set the url (which is the QuestDB JDBC url). | QuestDbParametersSpec | | | |
| teradata | Teradata connection parameters. Specify parameters in the teradata section or set the url (which is the Teradata JDBC url). | TeradataParametersSpec | | | |
| parallel_jobs_limit | The concurrency limit for the maximum number of parallel SQL queries executed on this connection. | integer | | | |
| default_grouping_configuration | Default data grouping configuration for all tables. The configuration may be overridden on the table, column and check level. Data groupings are configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different datasets using separate time series, for example a table contains data from multiple countries and there is a 'country' column used for partitioning; (2) a static dimension is assigned to a table, when the data is partitioned at a table level (similar tables store the same information, but for different countries, etc.). | DataGroupingConfigurationSpec | | | |
| schedules | Configuration of the job scheduler that runs data quality checks. The scheduler configuration is divided into types of checks that have different schedules. | CronSchedulesSpec | | | |
| auto_import_tables | Configuration of a CRON schedule used to automatically import new tables at regular intervals. | AutoImportTablesSpec | | | |
| schedule_on_instance | Limits running scheduled checks (started by a CRON job scheduler) to run only on a named DQOps instance. When this field is empty, data quality checks are run on all DQOps instances. Set a DQOps instance name to run checks on a named instance only. The default name of the DQOps Cloud SaaS instance is "cloud". | string | | | |
| incident_grouping | Configuration of data quality incident grouping. Configures how failed data quality checks are grouped into data quality incidents. | ConnectionIncidentGroupingSpec | | | |
| comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove non-tracked comments). | CommentsListSpec | | | |
| labels | Custom labels that were assigned to the connection. Labels are used for searching for tables when filtered data quality checks are executed. | LabelSetSpec | | | |
| advanced_properties | A dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs; a key/value dictionary. | Dict[string, string] | | | |
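Beyond the provider-specific parameters, the spec section can combine the concurrency, scheduling, and labeling options listed above. A hedged sketch with placeholder values (the provider section is shortened here; labels are shown as a plain list of strings):

```yaml
spec:
  provider_type: postgresql
  postgresql:
    host: db.example.com             # placeholder host
    database: analytics
    user: ${POSTGRESQL_USER}
    password: ${POSTGRESQL_PASSWORD}
  parallel_jobs_limit: 4             # at most 4 parallel SQL queries on this connection
  schedule_on_instance: cloud        # run scheduled checks only on the "cloud" instance
  labels:
    - prod
    - finance
  advanced_properties:
    data_catalog_ref: "dwh/analytics"   # illustrative key/value pair
```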
BigQueryParametersSpec
BigQuery connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| source_project_id | Source GCP project ID. This is the project that has datasets that will be imported. | string | | | |
| jobs_create_project | Configures how to select the project that will be used to start BigQuery jobs and will be used for billing. The user/service identified by the credentials must have bigquery.jobs.create permission in that project. | enum | create_jobs_in_source_project, create_jobs_in_default_project_from_credentials, create_jobs_in_selected_billing_project_id | | |
| billing_project_id | Billing GCP project ID. This is the project used as the default GCP project. The calling user must have a bigquery.jobs.create permission in this project. | string | | | |
| authentication_mode | Authentication mode to the Google Cloud. | enum | google_application_credentials, json_key_content, json_key_path | | |
| json_key_content | JSON key content. Use an environment variable that contains the content of the key as ${KEY_ENV} or a name of a secret in the GCP Secret Manager: ${sm://key-secret-name}. Requires the authentication-mode: json_key_content. | string | | | |
| json_key_path | A path to the JSON key file. Requires the authentication-mode: json_key_path. | string | | | |
| quota_project_id | Quota GCP project ID. | string | | | |
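A hedged sketch of the bigquery section, assuming authentication with a JSON key whose content is supplied through an environment variable; the project IDs are placeholders:

```yaml
spec:
  provider_type: bigquery
  bigquery:
    source_project_id: my-source-project     # placeholder GCP project with the datasets
    jobs_create_project: create_jobs_in_selected_billing_project_id
    billing_project_id: my-billing-project   # placeholder billing project
    authentication_mode: json_key_content
    json_key_content: ${GCP_KEY_JSON}        # key content taken from an environment variable
    quota_project_id: my-billing-project
```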
SnowflakeParametersSpec
Snowflake connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| account | Snowflake account name, e.g. `<account>`, `<account>-<locator>`, `<account>.<region>` or `<account>.<region>.<platform>`. Supports also a ${SNOWFLAKE_ACCOUNT} configuration with a custom environment variable. | string | | | |
| warehouse | Snowflake warehouse name. Supports also a ${SNOWFLAKE_WAREHOUSE} configuration with a custom environment variable. | string | | | |
| database | Snowflake database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Snowflake user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Snowflake database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| role | Snowflake role name. Supports also a ${SNOWFLAKE_ROLE} configuration with a custom environment variable. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
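A hedged Snowflake sketch; the account, warehouse, and role values are placeholders, and the credentials are taken from environment variables:

```yaml
spec:
  provider_type: snowflake
  snowflake:
    account: xy12345.eu-central-1      # placeholder account in the <account>.<region> form
    warehouse: COMPUTE_WH              # placeholder warehouse
    database: ${SNOWFLAKE_DATABASE}
    user: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    role: ANALYST                      # placeholder role
    properties:
      queryTimeout: "120"              # illustrative custom JDBC parameter
```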
PostgresqlParametersSpec
PostgreSQL connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | PostgreSQL host name. Supports also a ${POSTGRESQL_HOST} configuration with a custom environment variable. | string | | | |
| port | PostgreSQL port number. The default port is 5432. Supports also a ${POSTGRESQL_PORT} configuration with a custom environment variable. | string | | | |
| database | PostgreSQL database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | PostgreSQL user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | PostgreSQL database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| options | PostgreSQL connection 'options' initialization parameter. For example, setting this to -c statement_timeout=5min would set the statement timeout parameter for this session to 5 minutes. Supports also a ${POSTGRESQL_OPTIONS} configuration with a custom environment variable. | string | | | |
| sslmode | The 'sslmode' PostgreSQL connection parameter. The default value is 'disable'. | enum | disable, allow, prefer, require, verify-ca, verify-full | | |
| postgresql_engine_type | PostgreSQL engine type. Supports also a ${POSTGRESQL_ENGINE} configuration with a custom environment variable. | enum | postgresql, timescale | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
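A hedged PostgreSQL sketch that also sets the session options and SSL mode described above; the host and database names are placeholders:

```yaml
spec:
  provider_type: postgresql
  postgresql:
    host: pg.example.com                   # placeholder host
    port: "5432"
    database: analytics                    # placeholder database
    user: ${POSTGRESQL_USER}
    password: ${POSTGRESQL_PASSWORD}
    options: "-c statement_timeout=5min"   # session initialization options
    sslmode: require
    postgresql_engine_type: postgresql
```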
DuckdbParametersSpec
DuckDB connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| read_mode | DuckDB read mode. | enum | in_memory, files | | |
| files_format_type | Type of source files format for DuckDB. | enum | csv, json, parquet, avro, iceberg, delta_lake | | |
| database | DuckDB database name for in-memory read mode. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
| csv | CSV file format specification. | CsvFileFormatSpec | | | |
| json | JSON file format specification. | JsonFileFormatSpec | | | |
| parquet | Parquet file format specification. | ParquetFileFormatSpec | | | |
| avro | Avro file format specification. | AvroFileFormatSpec | | | |
| iceberg | Iceberg file format specification. | IcebergFileFormatSpec | | | |
| delta_lake | Delta Lake file format specification. | DeltaLakeFileFormatSpec | | | |
| directories | Virtual schema name to directory mappings. The path must be an absolute path. | Dict[string, string] | | | |
| storage_type | The storage type. | enum | local, s3, azure, gcs | | |
| aws_authentication_mode | The authentication mode for AWS. Supports also a ${DUCKDB_AWS_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | iam, default_credentials | | |
| azure_authentication_mode | The authentication mode for Azure. Supports also a ${DUCKDB_AZURE_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | connection_string, credential_chain, service_principal, default_credentials | | |
| user | DuckDB user name for a remote storage type. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | DuckDB password for a remote storage type. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| region | The region for the storage credentials. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| tenant_id | Azure Tenant ID used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| client_id | Azure Client ID used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| client_secret | Azure Client Secret used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| account_name | Azure Storage Account Name used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
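A hedged DuckDB sketch for the files read mode, reading CSV files from S3; the virtual schema name, bucket path, and region are placeholders:

```yaml
spec:
  provider_type: duckdb
  duckdb:
    read_mode: files
    files_format_type: csv
    storage_type: s3
    aws_authentication_mode: default_credentials
    region: us-east-1                        # placeholder region
    directories:
      landing_zone: s3://my-bucket/landing   # placeholder virtual schema -> path mapping
    csv:
      header: true
      auto_detect: true
```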
CsvFileFormatSpec
CSV file format specification for querying data in CSV format files.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| all_varchar | Option to skip type detection for CSV parsing and assume all columns to be of type VARCHAR. | boolean | | | |
| allow_quoted_nulls | Option to allow the conversion of quoted values to NULL values. | boolean | | | |
| auto_detect | Enables auto detection of CSV parameters. | boolean | | | |
| compression | The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.csv.gz will use gzip, t.csv will use none). Options are none, gzip, zstd. | enum | none, auto, gzip, zstd, snappy, lz4 | | |
| no_compression_extension | Whether the compression extension is present at the end of the file name. | boolean | | | |
| dateformat | Specifies the date format to use when parsing dates. | string | | | |
| decimal_separator | The decimal separator of numbers. | string | | | |
| delim | Specifies the string that separates columns within each row (line) of the file. | string | | | |
| escape | Specifies the string that should appear before a data character sequence that matches the quote value. | string | | | |
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |
| header | Specifies that the file contains a header line with the names of each column in the file. | boolean | | | |
| hive_partitioning | Whether or not to interpret the path as a hive partitioned path. | boolean | | | |
| ignore_errors | Option to ignore any parsing errors encountered and skip the rows with errors. | boolean | | | |
| new_line | Set the new line character(s) in the file. Options are '\r', '\n', or '\r\n'. | enum | cr, lf, crlf | | |
| quote | Specifies the quoting string to be used when a data value is quoted. | string | | | |
| sample_size | The number of sample rows for auto detection of parameters. | long | | | |
| skip | The number of lines at the top of the file to skip. | long | | | |
| timestampformat | Specifies the date format to use when parsing timestamps. | string | | | |
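A hedged sketch of the csv section (nested under the duckdb parameters) for files with a non-default layout: a semicolon delimiter, a European decimal separator, and two preamble lines skipped. The format strings and values are illustrative only:

```yaml
spec:
  duckdb:
    csv:
      header: true
      delim: ";"
      quote: "\""
      decimal_separator: ","
      dateformat: "%Y-%m-%d"   # illustrative date format string
      skip: 2                  # skip two preamble lines at the top of each file
      compression: gzip
      ignore_errors: true
```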
JsonFileFormatSpec
JSON file format specification for querying data in JSON format files.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| auto_detect | Whether to detect the names of the keys and the data types of the values automatically. | boolean | | | |
| compression | The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.json.gz will use gzip, t.json will use none). Options are 'none', 'gzip', 'zstd', and 'auto'. | enum | none, auto, gzip, zstd, snappy, lz4 | | |
| no_compression_extension | Whether the compression extension is present at the end of the file name. | boolean | | | |
| convert_strings_to_integers | Whether strings representing integer values should be converted to a numerical type. | boolean | | | |
| dateformat | Specifies the date format to use when parsing dates. | string | | | |
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |
| format | JSON format. Can be one of ['auto', 'unstructured', 'newline_delimited', 'array']. | enum | auto, unstructured, newline_delimited, array | | |
| hive_partitioning | Whether or not to interpret the path as a hive partitioned path. | boolean | | | |
| ignore_errors | Whether to ignore parse errors (only possible when format is 'newline_delimited'). | boolean | | | |
| maximum_depth | Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types. | long | | | |
| maximum_object_size | The maximum size of a JSON object (in bytes). | long | | | |
| records | Can be one of ['auto', 'true', 'false']. | enum | auto, true, false | | |
| sample_size | The number of sample rows for auto detection of parameters. | long | | | |
| timestampformat | Specifies the date format to use when parsing timestamps. | string | | | |
ParquetFileFormatSpec
Parquet file format specification for querying data in the parquet format files.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| binary_as_string | Parquet files generated by legacy writers do not correctly set the UTF8 flag for strings, causing string columns to be loaded as BLOB instead. Set this to true to load binary columns as strings. | boolean | | | |
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |
| file_row_number | Whether or not to include the file_row_number column. | boolean | | | |
| hive_partitioning | Whether or not to interpret the path as a hive partitioned path. | boolean | | | |
| union_by_name | Whether the columns of multiple schemas should be unified by name, rather than by position. | boolean | | | |
| compression | The compression type for the file. | enum | none, auto, gzip, zstd, snappy, lz4 | | |
| no_compression_extension | Whether the compression extension is present at the end of the file name. | boolean | | | |
AvroFileFormatSpec
Avro file format specification for querying data in Avro format files.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |
IcebergFileFormatSpec
Iceberg file format specification for querying data in Iceberg tables.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| allow_moved_paths | The option ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved. | boolean | | | |
DeltaLakeFileFormatSpec
Delta Lake file format specification for querying data in Delta Lake tables.
RedshiftParametersSpec
Redshift connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | Redshift host name. Supports also a ${REDSHIFT_HOST} configuration with a custom environment variable. | string | | | |
| port | Redshift port number. The default port is 5432. Supports also a ${REDSHIFT_PORT} configuration with a custom environment variable. | string | | | |
| database | Redshift database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| redshift_authentication_mode | The authentication mode for AWS. Supports also a ${REDSHIFT_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | iam, default_credentials, user_password | | |
| user | Redshift user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Redshift database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
SqlServerParametersSpec
Microsoft SQL Server connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | SQL Server host name. Supports also a ${SQLSERVER_HOST} configuration with a custom environment variable. | string | | | |
| port | SQL Server port number. The default port is 1433. Supports also a ${SQLSERVER_PORT} configuration with a custom environment variable. | string | | | |
| database | SQL Server database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | SQL Server user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | SQL Server database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| disable_encryption | Disable SSL encryption parameter. The default value is false. You may need to disable encryption when SQL Server is started in Docker. | boolean | | | |
| authentication_mode | Authentication mode for the SQL Server. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | enum | sql_password, active_directory_password, active_directory_service_principal, active_directory_default | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
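A hedged SQL Server sketch; encryption is disabled here only as an example for a local Docker instance, and all names are placeholders:

```yaml
spec:
  provider_type: sqlserver
  sqlserver:
    host: localhost                   # placeholder host (e.g. a local Docker container)
    port: "1433"
    database: dwh                     # placeholder database
    user: ${SQLSERVER_USER}
    password: ${SQLSERVER_PASSWORD}
    disable_encryption: true          # often needed when SQL Server runs in Docker
    authentication_mode: sql_password
```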
PrestoParametersSpec
Presto connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | Presto host name. Supports also a ${PRESTO_HOST} configuration with a custom environment variable. | string | | | |
| port | Presto port number. The default port is 8080. Supports also a ${PRESTO_PORT} configuration with a custom environment variable. | string | | | |
| database | Presto database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Presto user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Presto database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
TrinoParametersSpec
Trino connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| trino_engine_type | Trino engine type. Supports also a ${TRINO_ENGINE} configuration with a custom environment variable. | enum | trino, athena | | |
| host | Trino host name. Supports also a ${TRINO_HOST} configuration with a custom environment variable. | string | | | |
| port | Trino port number. The default port is 8080. Supports also a ${TRINO_PORT} configuration with a custom environment variable. | string | | | |
| user | Trino user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Trino database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| aws_authentication_mode | The authentication mode for AWS Athena. Supports also a ${ATHENA_AWS_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | iam, default_credentials | | |
| athena_region | The AWS Region where queries will be run. Supports also a ${ATHENA_REGION} configuration with a custom environment variable. | string | | | |
| catalog | The catalog that contains the databases and the tables that will be accessed with the driver. Supports also a ${TRINO_CATALOG} configuration with a custom environment variable. | string | | | |
| athena_work_group | The workgroup in which queries will run. Supports also a ${ATHENA_WORK_GROUP} configuration with a custom environment variable. | string | | | |
| athena_output_location | The location in Amazon S3 where query results will be stored. Supports also a ${ATHENA_OUTPUT_LOCATION} configuration with a custom environment variable. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
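A hedged sketch of the trino section configured for AWS Athena; the region, workgroup, catalog, and output location are placeholders:

```yaml
spec:
  provider_type: trino
  trino:
    trino_engine_type: athena
    aws_authentication_mode: default_credentials
    athena_region: us-east-1                                 # placeholder region
    catalog: awsdatacatalog                                   # placeholder catalog
    athena_work_group: primary                                # placeholder workgroup
    athena_output_location: s3://my-bucket/athena-results/    # placeholder S3 location
```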
MysqlParametersSpec
MySQL connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | MySQL host name. Supports also a ${MYSQL_HOST} configuration with a custom environment variable. | string | | | |
| port | MySQL port number. The default port is 3306. Supports also a ${MYSQL_PORT} configuration with a custom environment variable. | string | | | |
| database | MySQL database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | MySQL user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | MySQL database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| sslmode | SslMode MySQL connection parameter. | enum | DISABLED, PREFERRED, REQUIRED, VERIFY_CA, VERIFY_IDENTITY | | |
| single_store_db_parameters_spec | Single Store DB parameters spec. | SingleStoreDbParametersSpec | | | |
| mysql_engine_type | MySQL engine type. Supports also a ${MYSQL_ENGINE} configuration with a custom environment variable. | enum | mysql, singlestoredb | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
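A hedged MySQL sketch with the SSL mode set explicitly; the host and database names are placeholders:

```yaml
spec:
  provider_type: mysql
  mysql:
    host: mysql.example.com    # placeholder host
    port: "3306"
    database: analytics        # placeholder database
    user: ${MYSQL_USER}
    password: ${MYSQL_PASSWORD}
    sslmode: REQUIRED
    mysql_engine_type: mysql
```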
SingleStoreDbParametersSpec
Single Store DB connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| load_balancing_mode | Failover and load-balancing mode for SingleStoreDB. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | enum | none, sequential, loadbalance | | |
| host_descriptions | SingleStoreDB host descriptions. Supports also a ${SINGLE_STORE_HOST_DESCRIPTIONS} configuration with a custom environment variable. | List[string] | | | |
| schema | SingleStoreDB database/schema name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| use_ssl | Force enables SSL/TLS on the connection. Supports also a ${SINGLE_STORE_USE_SSL} configuration with a custom environment variable. | boolean | | | |
OracleParametersSpec
Oracle connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | Oracle host name. Supports also a ${ORACLE_HOST} configuration with a custom environment variable. | string | | | |
| port | Oracle port number. The default port is 1521. Supports also a ${ORACLE_PORT} configuration with a custom environment variable. | string | | | |
| database | Oracle database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Oracle user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Oracle database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| initialization_sql | Custom SQL that is executed after connecting to Oracle. This SQL script can configure the default language, for example: alter session set NLS_DATE_FORMAT='YYYY-DD-MM HH24:MI:SS' | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
SparkParametersSpec
Apache Spark connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | Spark host name. Supports also a ${SPARK_HOST} configuration with a custom environment variable. | string | | | |
| port | Spark port number. The default port is 10000. Supports also a ${SPARK_PORT} configuration with a custom environment variable. | string | | | |
| user | Spark user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Spark database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
DatabricksParametersSpec
Databricks connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | Databricks host name. Supports also a ${DATABRICKS_HOST} configuration with a custom environment variable. | string | | | |
| port | Databricks port number. The default port is 443. Supports also a ${DATABRICKS_PORT} configuration with a custom environment variable. | string | | | |
| catalog | Databricks catalog name. Supports also a ${DATABRICKS_CATALOG} configuration with a custom environment variable. | string | | | |
| user | (Obsolete) Databricks user name. Supports also a ${DATABRICKS_USER} configuration with a custom environment variable. | string | | | |
| password | (Obsolete) Databricks database password. Supports also a ${DATABRICKS_PASSWORD} configuration with a custom environment variable. | string | | | |
| http_path | Databricks http path to the warehouse. For example: `/sql/1.0/warehouses/<warehouse instance id>`. Supports also a ${DATABRICKS_HTTP_PATH} configuration with a custom environment variable. | string | | | |
| access_token | Databricks access token for the warehouse. Supports also a ${DATABRICKS_ACCESS_TOKEN} configuration with a custom environment variable. | string | | | |
| initialization_sql | Custom SQL that is executed after connecting to Databricks. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
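A hedged Databricks sketch using an access token and the warehouse HTTP path; the host, path, and catalog values are placeholders:

```yaml
spec:
  provider_type: databricks
  databricks:
    host: adb-1234567890123456.7.azuredatabricks.net   # placeholder workspace host
    port: "443"
    catalog: main                                      # placeholder catalog
    http_path: /sql/1.0/warehouses/abc123def456        # placeholder warehouse path
    access_token: ${DATABRICKS_ACCESS_TOKEN}           # token from an environment variable
```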
HanaParametersSpec
SAP HANA connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | Hana host name. Supports also a ${HANA_HOST} configuration with a custom environment variable. | string | | | |
| port | Hana port number. The default port is 30015. Supports also a ${HANA_PORT} configuration with a custom environment variable. | string | | | |
| instance_number | Hana instance number. Supports also a ${HANA_INSTANCE_NUMBER} configuration with a custom environment variable. | string | | | |
| user | Hana user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Hana database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
Db2ParametersSpec
IBM DB2 connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| db2_platform_type | DB2 platform type. Supports also a ${DB2_PLATFORM} configuration with a custom environment variable. | enum | luw, zos | | |
| host | DB2 host name. Supports also a ${DB2_HOST} configuration with a custom environment variable. | string | | | |
| port | DB2 port number. The default port is 50000. Supports also a ${DB2_PORT} configuration with a custom environment variable. | string | | | |
| database | DB2 database name. Supports also a ${DB2_DATABASE} configuration with a custom environment variable. | string | | | |
| user | DB2 user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | DB2 database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
MariaDbParametersSpec
MariaDB connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | MariaDB host name. Supports also a ${MARIADB_HOST} configuration with a custom environment variable. | string | | | |
| port | MariaDB port number. The default port is 3306. Supports also a ${MARIADB_PORT} configuration with a custom environment variable. | string | | | |
| database | MariaDB database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | MariaDB user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | MariaDB database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
ClickHouseParametersSpec
ClickHouse connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | ClickHouse host name. Supports also a ${CLICKHOUSE_HOST} configuration with a custom environment variable. | string | | | |
| port | ClickHouse port number. The default port is 30015. Supports also a ${CLICKHOUSE_PORT} configuration with a custom environment variable. | string | | | |
| database | ClickHouse database name. Supports also a ${CLICKHOUSE_DATABASE_NAME} configuration with a custom environment variable. | string | | | |
| user | ClickHouse user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | ClickHouse database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
QuestDbParametersSpec
QuestDB connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | QuestDB host name. Supports also a ${QUESTDB_HOST} configuration with a custom environment variable. | string | | | |
| port | QuestDB port number. The default port is 8812. Supports also a ${QUESTDB_PORT} configuration with a custom environment variable. | string | | | |
| database | QuestDB database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | QuestDB user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | QuestDB database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
TeradataParametersSpec
Teradata connection parameters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| host | Teradata host name. Supports also a ${TERADATA_HOST} configuration with a custom environment variable. | string | | | |
| port | Teradata port number. The default port is 1025. Supports also a ${TERADATA_PORT} configuration with a custom environment variable. | string | | | |
| user | Teradata user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Teradata database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
DataGroupingConfigurationSpec
Configuration of the data groupings that are used to calculate data quality checks with a GROUP BY clause. Data grouping levels may be hardcoded if we have different (but similar) tables for different business areas (countries, product groups). We can also pull data grouping levels directly from the database if a table has a column that identifies a business area. Data quality results for new groups are dynamically identified in the database by the GROUP BY clause. Sensor values are extracted for each data group separately, and a time series is built for each data group separately.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| level_1 | Data grouping dimension level 1 configuration. | DataGroupingDimensionSpec | | | |
| level_2 | Data grouping dimension level 2 configuration. | DataGroupingDimensionSpec | | | |
| level_3 | Data grouping dimension level 3 configuration. | DataGroupingDimensionSpec | | | |
| level_4 | Data grouping dimension level 4 configuration. | DataGroupingDimensionSpec | | | |
| level_5 | Data grouping dimension level 5 configuration. | DataGroupingDimensionSpec | | | |
| level_6 | Data grouping dimension level 6 configuration. | DataGroupingDimensionSpec | | | |
| level_7 | Data grouping dimension level 7 configuration. | DataGroupingDimensionSpec | | | |
| level_8 | Data grouping dimension level 8 configuration. | DataGroupingDimensionSpec | | | |
| level_9 | Data grouping dimension level 9 configuration. | DataGroupingDimensionSpec | | | |
DataGroupingDimensionSpec
Single data grouping dimension configuration. A data grouping dimension may be configured as a hardcoded value or a mapping to a column.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| source | The source of the data grouping dimension value. The default source of the grouping dimension is a tag. The tag should be assigned when there are many similar tables that store the same data for different areas (countries, etc.). It can be the name of the country if the table or partition stores information for that country. | enum | tag, column_value | | |
| tag | The value assigned to the data quality grouping dimension when the source is 'tag'. Assign a hard-coded (static) value to the data grouping dimension (tag) when there are multiple similar tables storing the same data for different areas (countries, etc.). This can be the name of the country if the table or partition stores information for that country. | string | | | |
| column | Column name that contains a dynamic data grouping dimension value (for dynamic data-driven data groupings). Sensor queries will be extended with a GROUP BY {data group level column name}, and sensors (and alerts) will be calculated for each unique value of the specified column. Also, a separate time series will be tracked for each value. | string | | | |
| name | Data grouping dimension name. | string | | | |
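A hedged sketch of a default_grouping_configuration that combines a static tag (the same value for the whole table) with a dynamic, column-driven dimension; the tag value and column name are placeholders:

```yaml
spec:
  default_grouping_configuration:
    level_1:
      source: tag
      tag: US                # placeholder static dimension value for this table
    level_2:
      source: column_value
      column: country        # placeholder column added to the GROUP BY clause
```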
CronSchedulesSpec
Container of all monitoring schedules (cron expressions) for each type of check. Data quality checks are grouped by type (profiling, whole table checks, time period partitioned checks). Each group of checks can be further divided by time scale (daily, monthly, etc.). Each time scale has a different monitoring schedule used by the job scheduler to run the checks. These schedules are defined in this object.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| profiling | Schedule for running profiling data quality checks. | CronScheduleSpec | | | |
| monitoring_daily | Schedule for running daily monitoring checks. | CronScheduleSpec | | | |
| monitoring_monthly | Schedule for running monthly monitoring checks. | CronScheduleSpec | | | |
| partitioned_daily | Schedule for running daily partitioned checks. | CronScheduleSpec | | | |
| partitioned_monthly | Schedule for running monthly partitioned checks. | CronScheduleSpec | | | |
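A hedged schedules sketch. CronScheduleSpec itself is not documented in this section; the cron_expression field name used below is an assumption based on other DQOps schedule examples, and the cron expressions themselves are placeholders:

```yaml
spec:
  schedules:
    profiling:
      cron_expression: "0 1 * * *"     # assumed field name; run profiling checks at 01:00
    monitoring_daily:
      cron_expression: "0 8 * * *"     # run daily monitoring checks at 08:00
    partitioned_daily:
      cron_expression: "0 3 * * *"     # run daily partitioned checks at 03:00
```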
AutoImportTablesSpec
Specification object configured on a connection that configures how DQOps performs automatic schema import by a CRON scheduler.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| schema_filter | Source schema name filter. Accepts wildcard filters in the form of '*s', 's*' and '*s*' to restrict the import to selected schemas. | string | | | |
| table_name_contains | Source table name filter. It is a table name or a text that must be present inside the table name. | string | | | |
| schedule | Schedule for importing source tables using a CRON scheduler. | CronScheduleSpec | | | |
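A hedged auto_import_tables sketch; the schema filter, table name fragment, and cron expression are placeholders, and cron_expression is again an assumed CronScheduleSpec field name:

```yaml
spec:
  auto_import_tables:
    schema_filter: public*           # placeholder wildcard schema filter
    table_name_contains: fact_       # import only tables whose names contain this text
    schedule:
      cron_expression: "0 6 * * *"   # assumed field name; import new tables daily at 06:00
```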
ConnectionIncidentGroupingSpec
Configuration of data quality incident grouping on a connection level. Defines how similar data quality issues are grouped into incidents.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
grouping_level |
Grouping level of failed data quality checks for creating higher level data quality incidents. The default grouping level is by a table, a data quality dimension and a check category (i.e. a datatype data quality incident detected on a table X in the numeric checks category). | enum | table table_dimension table_dimension_category table_dimension_category_type table_dimension_category_name |
||
minimum_severity |
Minimum severity level of data quality issues that are grouped into incidents. The default minimum severity level is 'warning'. Other supported severity levels are 'error' and 'fatal'. | enum | warning error fatal |
||
divide_by_data_groups |
Create separate data quality incidents for each data group, creating different incidents for different groups of rows. By default, data groups are ignored for grouping data quality issues into data quality incidents. | boolean | |||
max_incident_length_days |
The maximum length of a data quality incident in days. When a new data quality issue is detected after max_incident_length_days days since a similar data quality was first seen, a new data quality incident is created that will capture all following data quality issues for the next max_incident_length_days days. The default value is 60 days. | integer | |||
mute_for_days |
The number of days that all similar data quality issues are muted when a a data quality incident is closed in the 'mute' status. | integer | |||
disabled |
Disables data quality incident creation for failed data quality checks on the data source. | boolean | |||
incident_notification |
Configuration of addresses for new or updated incident notifications. | IncidentNotificationSpec |
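A hedged incident_grouping sketch that raises the grouping granularity, skips warnings, and mutes repeated issues for a week after an incident is muted; the values are illustrative:

```yaml
spec:
  incident_grouping:
    grouping_level: table_dimension_category
    minimum_severity: error          # do not open incidents for warning-level issues
    divide_by_data_groups: true
    max_incident_length_days: 60
    mute_for_days: 7
```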
IncidentNotificationSpec
Configuration of addresses used for notifications about new or updated incidents. Specifies the webhook URLs or email addresses where the notification messages are sent.
The structure of this object is described below
Property name | Description | Data type | Enum values | Default value | Sample values |
---|---|---|---|---|---|
incident_opened_addresses |
Notification address(es) where the notification messages describing new incidents are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | |||
incident_acknowledged_addresses |
Notification address(es) where the notification messages describing acknowledged messages are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | |||
incident_resolved_addresses |
Notification address(es) where the notification messages describing resolved messages are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | |||
incident_muted_addresses |
Notification address(es) where the notification messages describing muted messages are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | |||
filtered_notifications |
Filtered notifications map with filter configuration and notification addresses treated with higher priority than those from the current class. | FilteredNotificationSpecMap |
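A hedged incident_notification sketch (nested under incident_grouping) mixing a webhook and an email address; both addresses are placeholders:

```yaml
spec:
  incident_grouping:
    incident_notification:
      incident_opened_addresses: https://hooks.example.com/dqops-incidents   # placeholder webhook URL
      incident_resolved_addresses: data-team@example.com                     # placeholder email address
```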
FilteredNotificationSpecMap
The map for the filtered notification specification.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| self | | Dict[string, FilteredNotificationSpec] | | | |
FilteredNotificationSpec
A notification with filters that is sent only if the values in the notification message match the filters.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| filter | Notification filter specification for filtering the incident by the values of its fields. | NotificationFilterSpec | | | |
| target | Notification target addresses for each of the statuses. | IncidentNotificationTargetSpec | | | |
| priority | The priority of the notification. Notifications are sent to the first notification target that matches the filters when process_additional_filters is not set. | integer | | | |
| process_additional_filters | Flag to continue processing subsequent notifications. When set to true, the next notification on the list (in priority order) that matches the filter is also sent. | boolean | | | |
| disabled | Flag to turn off the notification filter. | boolean | | | |
| message | Message with the details of the filtered notification, such as purpose explanation, SLA note, etc. | string | | | |
| do_not_create_incidents | Flag to suppress creating incidents that match the filters. | boolean | | | |
NotificationFilterSpec
Filter for filtered notifications.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| connection | Connection name. Supports search patterns in the format: 'source*', '*_prod', 'prefix*suffix'. | string | | | |
| schema | Schema name. This field accepts search patterns in the format: 'schema_name_*', '*_schema', 'prefix*suffix'. | string | | | |
| table | Table name. This field accepts search patterns in the format: 'table_name_*', '*table', 'prefix*suffix'. | string | | | |
| table_priority | Table priority. | integer | | | |
| data_group_name | Data group name. This field accepts search patterns in the format: 'group_name_*', '*group', 'prefix*suffix'. | string | | | |
| quality_dimension | Quality dimension. | string | | | |
| check_category | The target check category, for example: nulls, volume, anomaly. | string | | | |
| check_type | The target type of checks to run. Supported values are profiling, monitoring and partitioned. | enum | profiling, monitoring, partitioned | | |
| check_name | The target check name to run only this named check. Uses the short check name which is the name of the deepest folder in the checks folder. This field supports search patterns such as: 'profiling_*', '*count', 'profiling*_percent'. | string | | | |
| highest_severity | Highest severity. | integer | | | |
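A hedged filtered_notifications sketch that routes high-severity incidents from one schema to a dedicated address; the map key, patterns, and addresses are placeholders (in DQOps severity levels, 3 corresponds to fatal):

```yaml
spec:
  incident_grouping:
    incident_notification:
      incident_opened_addresses: https://hooks.example.com/dqops-incidents   # placeholder default webhook
      filtered_notifications:
        critical_sales_tables:          # arbitrary map key naming this filtered notification
          priority: 1
          filter:
            schema: sales_*             # placeholder schema pattern
            check_type: monitoring
            highest_severity: 3         # match only incidents with fatal severity
          target:
            incident_opened_addresses: oncall-data@example.com   # placeholder email address
          message: Escalated notifications for critical sales tables.
```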
IncidentNotificationTargetSpec
Configuration of addresses used for notifications about new or updated incidents. Specifies the webhook URLs or email addresses where the notification messages are sent.
The structure of this object is described below
| Property name | Description | Data type | Enum values | Default value | Sample values |
|---|---|---|---|---|---|
| incident_opened_addresses | Notification address(es) where the notification messages describing new incidents are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_acknowledged_addresses | Notification address(es) where the notification messages describing acknowledged messages are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_resolved_addresses | Notification address(es) where the notification messages describing resolved messages are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_muted_addresses | Notification address(es) where the notification messages describing muted messages are pushed using a HTTP POST request (for webhook address) or an SMTP (for email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
LabelSetSpec
A collection of unique labels assigned to items (tables, columns, checks) that can be targeted for a data quality check execution.