
Last updated: January 14, 2025

DQOps YAML file definitions

The definitions of the YAML files used by DQOps to configure data sources, monitored tables, and activated data quality checks.

ConnectionYaml

Definition of a data source connection that is covered by data quality checks.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| api_version | DQOps YAML schema version | string | | dqo/v1 | |
| kind | File type | enum | source, table, sensor, provider_sensor, rule, check, settings, file_index, connection_similarity_index, dashboards, default_schedules, default_checks, default_table_checks, default_column_checks, default_notifications | source | |
| spec | Connection specification object with the connection parameters to the data source | ConnectionSpec | | | |
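
For orientation, the sketch below shows a complete, minimal connection file, assuming a PostgreSQL data source; the file path, host and database values are hypothetical placeholders:

```yaml
# Hypothetical file: sources/my_postgres/connection.dqoconnection.yaml
api_version: dqo/v1
kind: source
spec:
  provider_type: postgresql
  postgresql:
    host: db.example.com              # hypothetical host
    port: "5432"
    database: analytics               # hypothetical database
    user: ${POSTGRESQL_USER}          # resolved from an environment variable
    password: ${POSTGRESQL_PASSWORD}
```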

ConnectionSpec

Data source (connection) specification.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| provider_type | Database provider type (required). | enum | bigquery, clickhouse, databricks, db2, duckdb, hana, mariadb, mysql, oracle, postgresql, presto, questdb, redshift, snowflake, spark, sqlserver, teradata, trino | | |
| bigquery | BigQuery connection parameters. Specify parameters in the bigquery section. | BigQueryParametersSpec | | | |
| snowflake | Snowflake connection parameters. Specify parameters in the snowflake section or set the url (which is the Snowflake JDBC url). | SnowflakeParametersSpec | | | |
| postgresql | PostgreSQL connection parameters. Specify parameters in the postgresql section or set the url (which is the PostgreSQL JDBC url). | PostgresqlParametersSpec | | | |
| duckdb | DuckDB connection parameters. Specify parameters in the duckdb section or set the url (which is the DuckDB JDBC url). | DuckdbParametersSpec | | | |
| redshift | Redshift connection parameters. Specify parameters in the redshift section or set the url (which is the Redshift JDBC url). | RedshiftParametersSpec | | | |
| sqlserver | SQL Server connection parameters. Specify parameters in the sqlserver section or set the url (which is the SQL Server JDBC url). | SqlServerParametersSpec | | | |
| presto | Presto connection parameters. Specify parameters in the presto section or set the url (which is the Presto JDBC url). | PrestoParametersSpec | | | |
| trino | Trino connection parameters. Specify parameters in the trino section or set the url (which is the Trino JDBC url). | TrinoParametersSpec | | | |
| mysql | MySQL connection parameters. Specify parameters in the mysql section or set the url (which is the MySQL JDBC url). | MysqlParametersSpec | | | |
| oracle | Oracle connection parameters. Specify parameters in the oracle section or set the url (which is the Oracle JDBC url). | OracleParametersSpec | | | |
| spark | Spark connection parameters. Specify parameters in the spark section or set the url (which is the Spark JDBC url). | SparkParametersSpec | | | |
| databricks | Databricks connection parameters. Specify parameters in the databricks section or set the url (which is the Databricks JDBC url). | DatabricksParametersSpec | | | |
| hana | HANA connection parameters. Specify parameters in the hana section or set the url (which is the HANA JDBC url). | HanaParametersSpec | | | |
| db2 | DB2 connection parameters. Specify parameters in the db2 section or set the url (which is the DB2 JDBC url). | Db2ParametersSpec | | | |
| mariadb | MariaDB connection parameters. Specify parameters in the mariadb section or set the url (which is the MariaDB JDBC url). | MariaDbParametersSpec | | | |
| clickhouse | ClickHouse connection parameters. Specify parameters in the clickhouse section or set the url (which is the ClickHouse JDBC url). | ClickHouseParametersSpec | | | |
| questdb | QuestDB connection parameters. Specify parameters in the questdb section or set the url (which is the QuestDB JDBC url). | QuestDbParametersSpec | | | |
| teradata | Teradata connection parameters. Specify parameters in the teradata section or set the url (which is the Teradata JDBC url). | TeradataParametersSpec | | | |
| parallel_jobs_limit | The concurrency limit for the maximum number of parallel SQL queries executed on this connection. | integer | | | |
| default_grouping_configuration | Default data grouping configuration for all tables. The configuration may be overridden on the table, column and check levels. Data groupings are configured in two cases: (1) the data in the table should be analyzed with a GROUP BY condition, to analyze different datasets using separate time series, for example when a table contains data from multiple countries and there is a 'country' column used for partitioning; (2) a static dimension is assigned to a table, when the data is partitioned at a table level (similar tables store the same information, but for different countries, etc.). | DataGroupingConfigurationSpec | | | |
| schedules | Configuration of the job scheduler that runs data quality checks. The scheduler configuration is divided into types of checks that have different schedules. | CronSchedulesSpec | | | |
| auto_import_tables | Configuration of a CRON schedule used to automatically import new tables at regular intervals. | AutoImportTablesSpec | | | |
| schedule_on_instance | Limits running scheduled checks (started by a CRON job scheduler) to run only on a named DQOps instance. When this field is empty, data quality checks are run on all DQOps instances. Set a DQOps instance name to run checks on a named instance only. The default name of the DQOps Cloud SaaS instance is "cloud". | string | | | |
| incident_grouping | Configuration of data quality incident grouping. Configures how failed data quality checks are grouped into data quality incidents. | ConnectionIncidentGroupingSpec | | | |
| comments | Comments for change tracking. Please put comments in this collection because YAML comments may be removed when the YAML file is modified by the tool (serialization and deserialization will remove untracked comments). | CommentsListSpec | | | |
| labels | Custom labels that were assigned to the connection. Labels are used for searching for tables when filtered data quality checks are executed. | LabelSetSpec | | | |
| advanced_properties | A dictionary of advanced properties that can be used, for example, to support mapping data to data catalogs; a key/value dictionary. | Dict[string, string] | | | |
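
The provider-independent fields of ConnectionSpec combine with any of the provider sections above. A minimal sketch with assumed, illustrative values:

```yaml
spec:
  provider_type: postgresql
  parallel_jobs_limit: 4            # at most 4 parallel SQL queries on this connection
  schedule_on_instance: cloud       # scheduled checks run only on the "cloud" instance
  labels:
    - fact_tables                   # hypothetical label used to filter check execution
  advanced_properties:
    catalog_entry_id: "1234"        # hypothetical key/value pair for catalog mapping
```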

BigQueryParametersSpec

BigQuery connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| source_project_id | Source GCP project ID. This is the project that has the datasets that will be imported. | string | | | |
| jobs_create_project | Configures how to select the project that will be used to start BigQuery jobs and will be used for billing. The user/service identified by the credentials must have the bigquery.jobs.create permission in that project. | enum | create_jobs_in_source_project, create_jobs_in_default_project_from_credentials, create_jobs_in_selected_billing_project_id | | |
| billing_project_id | Billing GCP project ID. This is the project used as the default GCP project. The calling user must have the bigquery.jobs.create permission in this project. | string | | | |
| authentication_mode | Authentication mode to Google Cloud. | enum | google_application_credentials, json_key_content, json_key_path | | |
| json_key_content | JSON key content. Use an environment variable that contains the content of the key as ${KEY_ENV} or a name of a secret in the GCP Secret Manager: ${sm://key-secret-name}. Requires authentication_mode: json_key_content. | string | | | |
| json_key_path | A path to the JSON key file. Requires authentication_mode: json_key_path. | string | | | |
| quota_project_id | Quota GCP project ID. | string | | | |
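
A sketch of a bigquery section that authenticates with a JSON key taken from an environment variable; the project ID is a hypothetical placeholder:

```yaml
spec:
  provider_type: bigquery
  bigquery:
    source_project_id: my-gcp-project          # hypothetical GCP project
    jobs_create_project: create_jobs_in_source_project
    authentication_mode: json_key_content
    json_key_content: ${KEY_ENV}               # key content read from an environment variable
```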

SnowflakeParametersSpec

Snowflake connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| account | Snowflake account name, e.g. <account>, <account>-<locator>, <account>.<region> or <account>.<region>.<platform>. Supports also a ${SNOWFLAKE_ACCOUNT} configuration with a custom environment variable. | string | | | |
| warehouse | Snowflake warehouse name. Supports also a ${SNOWFLAKE_WAREHOUSE} configuration with a custom environment variable. | string | | | |
| database | Snowflake database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Snowflake user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Snowflake database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| role | Snowflake role name. Supports also a ${SNOWFLAKE_ROLE} configuration with a custom environment variable. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
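
A sketch of a snowflake section; the account, warehouse and role names are hypothetical, and credentials are pulled from environment variables:

```yaml
spec:
  provider_type: snowflake
  snowflake:
    account: xy12345.eu-central-1    # hypothetical account identifier
    warehouse: COMPUTE_WH            # hypothetical warehouse
    database: ANALYTICS
    user: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    role: REPORTING                  # hypothetical role
```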

PostgresqlParametersSpec

PostgreSQL connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | PostgreSQL host name. Supports also a ${POSTGRESQL_HOST} configuration with a custom environment variable. | string | | | |
| port | PostgreSQL port number. The default port is 5432. Supports also a ${POSTGRESQL_PORT} configuration with a custom environment variable. | string | | | |
| database | PostgreSQL database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | PostgreSQL user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | PostgreSQL database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| options | PostgreSQL connection 'options' initialization parameter. For example, setting this to -c statement_timeout=5min would set the statement timeout parameter for this session to 5 minutes. Supports also a ${POSTGRESQL_OPTIONS} configuration with a custom environment variable. | string | | | |
| sslmode | The 'sslmode' PostgreSQL connection parameter. The default value is 'disable'. | enum | disable, allow, prefer, require, verify-ca, verify-full | | |
| postgresql_engine_type | PostgreSQL engine type. Supports also a ${POSTGRESQL_ENGINE} configuration with a custom environment variable. | enum | postgresql, timescale | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

DuckdbParametersSpec

DuckDB connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| read_mode | DuckDB read mode. | enum | in_memory, files | | |
| files_format_type | Type of source files format for DuckDB. | enum | csv, json, parquet, avro, iceberg, delta_lake | | |
| database | DuckDB database name for in-memory read mode. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
| csv | CSV file format specification. | CsvFileFormatSpec | | | |
| json | JSON file format specification. | JsonFileFormatSpec | | | |
| parquet | Parquet file format specification. | ParquetFileFormatSpec | | | |
| avro | Avro file format specification. | AvroFileFormatSpec | | | |
| iceberg | Iceberg file format specification. | IcebergFileFormatSpec | | | |
| delta_lake | Delta Lake file format specification. | DeltaLakeFileFormatSpec | | | |
| directories | Virtual schema name to directory mappings. The path must be an absolute path. | Dict[string, string] | | | |
| storage_type | The storage type. | enum | local, s3, azure, gcs | | |
| aws_authentication_mode | The authentication mode for AWS. Supports also a ${DUCKDB_AWS_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | iam, default_credentials | | |
| azure_authentication_mode | The authentication mode for Azure. Supports also a ${DUCKDB_AZURE_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | connection_string, credential_chain, service_principal, default_credentials | | |
| user | DuckDB user name for a remote storage type. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | DuckDB password for a remote storage type. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| region | The region for the storage credentials. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| tenant_id | Azure Tenant ID used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| client_id | Azure Client ID used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| client_secret | Azure Client Secret used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| account_name | Azure Storage Account Name used by DuckDB Secret Manager. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
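
A sketch of a duckdb section that queries local CSV files by mapping a virtual schema name to a directory; the path is a hypothetical example:

```yaml
spec:
  provider_type: duckdb
  duckdb:
    read_mode: files
    files_format_type: csv
    storage_type: local
    directories:
      landing_zone: /data/landing    # virtual schema name -> absolute path (hypothetical)
    csv:
      header: true
      auto_detect: true
```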

CsvFileFormatSpec

CSV file format specification for querying data in CSV format files.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| all_varchar | Option to skip type detection for CSV parsing and assume all columns to be of type VARCHAR. | boolean | | | |
| allow_quoted_nulls | Option to allow the conversion of quoted values to NULL values. | boolean | | | |
| auto_detect | Enables auto detection of CSV parameters. | boolean | | | |
| compression | The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.csv.gz will use gzip, t.csv will use none). Options are none, gzip, zstd. | enum | none, auto, gzip, zstd, snappy, lz4 | | |
| no_compression_extension | Whether the compression extension is present at the end of the file name. | boolean | | | |
| dateformat | Specifies the date format to use when parsing dates. | string | | | |
| decimal_separator | The decimal separator of numbers. | string | | | |
| delim | Specifies the string that separates columns within each row (line) of the file. | string | | | |
| escape | Specifies the string that should appear before a data character sequence that matches the quote value. | string | | | |
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |
| header | Specifies that the file contains a header line with the names of each column in the file. | boolean | | | |
| hive_partitioning | Whether or not to interpret the path as a hive partitioned path. | boolean | | | |
| ignore_errors | Option to ignore any parsing errors encountered - and instead ignore rows with errors. | boolean | | | |
| new_line | Set the new line character(s) in the file. Options are '\r', '\n', or '\r\n'. | enum | cr, lf, crlf | | |
| quote | Specifies the quoting string to be used when a data value is quoted. | string | | | |
| sample_size | The number of sample rows for auto detection of parameters. | long | | | |
| skip | The number of lines at the top of the file to skip. | long | | | |
| timestampformat | Specifies the date format to use when parsing timestamps. | string | | | |
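
When auto detection is not reliable, the parser settings can be pinned explicitly. A sketch of a csv section with illustrative values:

```yaml
csv:
  auto_detect: false
  header: true
  delim: ";"                # semicolon-separated files
  quote: "\""
  dateformat: "%Y-%m-%d"
  compression: gzip         # e.g. for *.csv.gz files, without relying on extension detection
```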

JsonFileFormatSpec

JSON file format specification for querying data in JSON format files.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| auto_detect | Whether to auto-detect the names of the keys and the data types of the values. | boolean | | | |
| compression | The compression type for the file. By default this will be detected automatically from the file extension (e.g., t.json.gz will use gzip, t.json will use none). Options are 'none', 'gzip', 'zstd', and 'auto'. | enum | none, auto, gzip, zstd, snappy, lz4 | | |
| no_compression_extension | Whether the compression extension is present at the end of the file name. | boolean | | | |
| convert_strings_to_integers | Whether strings representing integer values should be converted to a numerical type. | boolean | | | |
| dateformat | Specifies the date format to use when parsing dates. | string | | | |
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |
| format | JSON format. Can be one of ['auto', 'unstructured', 'newline_delimited', 'array']. | enum | auto, unstructured, newline_delimited, array | | |
| hive_partitioning | Whether or not to interpret the path as a hive partitioned path. | boolean | | | |
| ignore_errors | Whether to ignore parse errors (only possible when format is 'newline_delimited'). | boolean | | | |
| maximum_depth | Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types. | long | | | |
| maximum_object_size | The maximum size of a JSON object (in bytes). | long | | | |
| records | Can be one of ['auto', 'true', 'false']. | enum | auto, true, false | | |
| sample_size | The number of sample rows for auto detection of parameters. | long | | | |
| timestampformat | Specifies the date format to use when parsing timestamps. | string | | | |

ParquetFileFormatSpec

Parquet file format specification for querying data in Parquet format files.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| binary_as_string | Parquet files generated by legacy writers do not correctly set the UTF8 flag for strings, causing string columns to be loaded as BLOB instead. Set this to true to load binary columns as strings. | boolean | | | |
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |
| file_row_number | Whether or not to include the file_row_number column. | boolean | | | |
| hive_partitioning | Whether or not to interpret the path as a hive partitioned path. | boolean | | | |
| union_by_name | Whether the columns of multiple schemas should be unified by name, rather than by position. | boolean | | | |
| compression | The compression type for the file. | enum | none, auto, gzip, zstd, snappy, lz4 | | |
| no_compression_extension | Whether the compression extension is present at the end of the file name. | boolean | | | |

AvroFileFormatSpec

Avro file format specification for querying data in Avro format files.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| filename | Whether or not an extra filename column should be included in the result. | boolean | | | |

IcebergFileFormatSpec

Iceberg file format specification for querying data in Iceberg tables.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| allow_moved_paths | The option ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved. | boolean | | | |

DeltaLakeFileFormatSpec

Delta Lake file format specification for querying data in Delta Lake tables.


RedshiftParametersSpec

Redshift connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Redshift host name. Supports also a ${REDSHIFT_HOST} configuration with a custom environment variable. | string | | | |
| port | Redshift port number. The default port is 5439. Supports also a ${REDSHIFT_PORT} configuration with a custom environment variable. | string | | | |
| database | Redshift database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| redshift_authentication_mode | The authentication mode for AWS. Supports also a ${REDSHIFT_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | iam, default_credentials, user_password | | |
| user | Redshift user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Redshift database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
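
A sketch of a redshift section that relies on the AWS default credentials chain instead of a user name and password; the cluster endpoint is hypothetical:

```yaml
spec:
  provider_type: redshift
  redshift:
    host: examplecluster.abc123.eu-west-1.redshift.amazonaws.com  # hypothetical endpoint
    database: analytics
    redshift_authentication_mode: default_credentials
```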

SqlServerParametersSpec

Microsoft SQL Server connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | SQL Server host name. Supports also a ${SQLSERVER_HOST} configuration with a custom environment variable. | string | | | |
| port | SQL Server port number. The default port is 1433. Supports also a ${SQLSERVER_PORT} configuration with a custom environment variable. | string | | | |
| database | SQL Server database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | SQL Server user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | SQL Server database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| disable_encryption | Disable SSL encryption parameter. The default value is false. You may need to disable encryption when SQL Server is started in Docker. | boolean | | | |
| authentication_mode | Authentication mode for SQL Server. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | enum | sql_password, active_directory_password, active_directory_service_principal, active_directory_default | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

PrestoParametersSpec

Presto connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Presto host name. Supports also a ${PRESTO_HOST} configuration with a custom environment variable. | string | | | |
| port | Presto port number. The default port is 8080. Supports also a ${PRESTO_PORT} configuration with a custom environment variable. | string | | | |
| database | Presto database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Presto user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Presto database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

TrinoParametersSpec

Trino connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| trino_engine_type | Trino engine type. Supports also a ${TRINO_ENGINE} configuration with a custom environment variable. | enum | trino, athena | | |
| host | Trino host name. Supports also a ${TRINO_HOST} configuration with a custom environment variable. | string | | | |
| port | Trino port number. The default port is 8080. Supports also a ${TRINO_PORT} configuration with a custom environment variable. | string | | | |
| user | Trino user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Trino database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| aws_authentication_mode | The authentication mode for AWS Athena. Supports also a ${ATHENA_AWS_AUTHENTICATION_MODE} configuration with a custom environment variable. | enum | iam, default_credentials | | |
| athena_region | The AWS Region where queries will be run. Supports also a ${ATHENA_REGION} configuration with a custom environment variable. | string | | | |
| catalog | The catalog that contains the databases and the tables that will be accessed with the driver. Supports also a ${TRINO_CATALOG} configuration with a custom environment variable. | string | | | |
| athena_work_group | The workgroup in which queries will run. Supports also a ${ATHENA_WORK_GROUP} configuration with a custom environment variable. | string | | | |
| athena_output_location | The location in Amazon S3 where query results will be stored. Supports also a ${ATHENA_OUTPUT_LOCATION} configuration with a custom environment variable. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
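
A sketch of a trino section configured for AWS Athena; the region, workgroup and S3 output location are hypothetical:

```yaml
spec:
  provider_type: trino
  trino:
    trino_engine_type: athena
    aws_authentication_mode: default_credentials
    athena_region: eu-west-1                                     # hypothetical region
    catalog: awsdatacatalog
    athena_work_group: primary                                   # hypothetical workgroup
    athena_output_location: s3://example-bucket/athena-results/  # hypothetical bucket
```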

MysqlParametersSpec

MySQL connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | MySQL host name. Supports also a ${MYSQL_HOST} configuration with a custom environment variable. | string | | | |
| port | MySQL port number. The default port is 3306. Supports also a ${MYSQL_PORT} configuration with a custom environment variable. | string | | | |
| database | MySQL database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | MySQL user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | MySQL database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| sslmode | The 'sslMode' MySQL connection parameter. | enum | DISABLED, PREFERRED, REQUIRED, VERIFY_CA, VERIFY_IDENTITY | | |
| single_store_db_parameters_spec | SingleStoreDB parameters spec. | SingleStoreDbParametersSpec | | | |
| mysql_engine_type | MySQL engine type. Supports also a ${MYSQL_ENGINE} configuration with a custom environment variable. | enum | mysql, singlestoredb | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |
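
A sketch of a mysql section with TLS required; the host and credentials come from environment variables, and the database name is hypothetical:

```yaml
spec:
  provider_type: mysql
  mysql:
    host: ${MYSQL_HOST}
    port: "3306"
    database: crm                  # hypothetical database
    user: ${MYSQL_USER}
    password: ${MYSQL_PASSWORD}
    mysql_engine_type: mysql
    sslmode: REQUIRED
```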

SingleStoreDbParametersSpec

SingleStoreDB connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| load_balancing_mode | SingleStoreDB failover and load-balancing mode. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | enum | none, sequential, loadbalance | | |
| host_descriptions | SingleStoreDB host descriptions. Supports also a ${SINGLE_STORE_HOST_DESCRIPTIONS} configuration with a custom environment variable. | List[string] | | | |
| schema | SingleStoreDB database/schema name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| use_ssl | Force-enables SSL/TLS on the connection. Supports also a ${SINGLE_STORE_USE_SSL} configuration with a custom environment variable. | boolean | | | |

OracleParametersSpec

Oracle connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Oracle host name. Supports also a ${ORACLE_HOST} configuration with a custom environment variable. | string | | | |
| port | Oracle port number. The default port is 1521. Supports also a ${ORACLE_PORT} configuration with a custom environment variable. | string | | | |
| database | Oracle database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | Oracle user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Oracle database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| initialization_sql | Custom SQL that is executed after connecting to Oracle. This SQL script can configure the default language, for example: alter session set NLS_DATE_FORMAT='YYYY-DD-MM HH24:MI:SS' | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

SparkParametersSpec

Apache Spark connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Spark host name. Supports also a ${SPARK_HOST} configuration with a custom environment variable. | string | | | |
| port | Spark port number. The default port is 10000. Supports also a ${SPARK_PORT} configuration with a custom environment variable. | string | | | |
| user | Spark user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Spark database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

DatabricksParametersSpec

Databricks connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Databricks host name. Supports also a ${DATABRICKS_HOST} configuration with a custom environment variable. | string | | | |
| port | Databricks port number. The default port is 443. Supports also a ${DATABRICKS_PORT} configuration with a custom environment variable. | string | | | |
| catalog | Databricks catalog name. Supports also a ${DATABRICKS_CATALOG} configuration with a custom environment variable. | string | | | |
| user | (Obsolete) Databricks user name. Supports also a ${DATABRICKS_USER} configuration with a custom environment variable. | string | | | |
| password | (Obsolete) Databricks database password. Supports also a ${DATABRICKS_PASSWORD} configuration with a custom environment variable. | string | | | |
| http_path | Databricks http path to the warehouse. For example: /sql/1.0/warehouses/<warehouse instance id>. Supports also a ${DATABRICKS_HTTP_PATH} configuration with a custom environment variable. | string | | | |
| access_token | Databricks access token for the warehouse. Supports also a ${DATABRICKS_ACCESS_TOKEN} configuration with a custom environment variable. | string | | | |
| initialization_sql | Custom SQL that is executed after connecting to Databricks. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

HanaParametersSpec

SAP HANA connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | HANA host name. Supports also a ${HANA_HOST} configuration with a custom environment variable. | string | | | |
| port | HANA port number. The default port is 30015. Supports also a ${HANA_PORT} configuration with a custom environment variable. | string | | | |
| instance_number | HANA instance number. Supports also a ${HANA_INSTANCE_NUMBER} configuration with a custom environment variable. | string | | | |
| user | HANA user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | HANA database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

Db2ParametersSpec

IBM DB2 connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| db2_platform_type | DB2 platform type. Supports also a ${DB2_PLATFORM} configuration with a custom environment variable. | enum | luw, zos | | |
| host | DB2 host name. Supports also a ${DB2_HOST} configuration with a custom environment variable. | string | | | |
| port | DB2 port number. The default port is 50000. Supports also a ${DB2_PORT} configuration with a custom environment variable. | string | | | |
| database | DB2 database name. Supports also a ${DB2_DATABASE} configuration with a custom environment variable. | string | | | |
| user | DB2 user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | DB2 database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

MariaDbParametersSpec

MariaDB connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | MariaDB host name. Supports also a ${MARIADB_HOST} configuration with a custom environment variable. | string | | | |
| port | MariaDB port number. The default port is 3306. Supports also a ${MARIADB_PORT} configuration with a custom environment variable. | string | | | |
| database | MariaDB database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | MariaDB user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | MariaDB database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

ClickHouseParametersSpec

ClickHouse connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | ClickHouse host name. Supports also a ${CLICKHOUSE_HOST} configuration with a custom environment variable. | string | | | |
| port | ClickHouse port number. The default HTTP interface port is 8123. Supports also a ${CLICKHOUSE_PORT} configuration with a custom environment variable. | string | | | |
| database | ClickHouse database name. Supports also a ${CLICKHOUSE_DATABASE_NAME} configuration with a custom environment variable. | string | | | |
| user | ClickHouse user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | ClickHouse database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

QuestDbParametersSpec

QuestDB connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | QuestDB host name. Supports also a ${QUESTDB_HOST} configuration with a custom environment variable. | string | | | |
| port | QuestDB port number. The default port is 8812. Supports also a ${QUESTDB_PORT} configuration with a custom environment variable. | string | | | |
| database | QuestDB database name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| user | QuestDB user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | QuestDB database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

TeradataParametersSpec

Teradata connection parameters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| host | Teradata host name. Supports also a ${TERADATA_HOST} configuration with a custom environment variable. | string | | | |
| port | Teradata port number. The default port is 1025. Supports also a ${TERADATA_PORT} configuration with a custom environment variable. | string | | | |
| user | Teradata user name. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| password | Teradata database password. The value can be in the ${ENVIRONMENT_VARIABLE_NAME} format to use dynamic substitution. | string | | | |
| properties | A dictionary of custom JDBC parameters that are added to the JDBC connection string, a key/value dictionary. | Dict[string, string] | | | |

DataGroupingConfigurationSpec

Configuration of the data groupings that are used to calculate data quality checks with a GROUP BY clause. Data grouping levels may be hardcoded if we have different (but similar) tables for different business areas (countries, product groups). We can also pull data grouping levels directly from the database if a table has a column that identifies a business area. Data quality results for new groups are dynamically identified in the database by the GROUP BY clause. Sensor values are extracted for each data group separately, and a time series is built for each data group separately.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| level_1 | Data grouping dimension level 1 configuration. | DataGroupingDimensionSpec | | | |
| level_2 | Data grouping dimension level 2 configuration. | DataGroupingDimensionSpec | | | |
| level_3 | Data grouping dimension level 3 configuration. | DataGroupingDimensionSpec | | | |
| level_4 | Data grouping dimension level 4 configuration. | DataGroupingDimensionSpec | | | |
| level_5 | Data grouping dimension level 5 configuration. | DataGroupingDimensionSpec | | | |
| level_6 | Data grouping dimension level 6 configuration. | DataGroupingDimensionSpec | | | |
| level_7 | Data grouping dimension level 7 configuration. | DataGroupingDimensionSpec | | | |
| level_8 | Data grouping dimension level 8 configuration. | DataGroupingDimensionSpec | | | |
| level_9 | Data grouping dimension level 9 configuration. | DataGroupingDimensionSpec | | | |

DataGroupingDimensionSpec

Single data grouping dimension configuration. A data grouping dimension may be configured as a hardcoded value or a mapping to a column.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| source | The source of the data grouping dimension value. The default source of the grouping dimension is a tag. The tag should be assigned when there are many similar tables that store the same data for different areas (countries, etc.). It can be the name of the country if the table or partition stores information for that country. | enum | tag, column_value | | |
| tag | The value assigned to the data quality grouping dimension when the source is 'tag'. Assign a hard-coded (static) value to the data grouping dimension (tag) when there are multiple similar tables storing the same data for different areas (countries, etc.). This can be the name of the country if the table or partition stores information for that country. | string | | | |
| column | Column name that contains a dynamic data grouping dimension value (for dynamic data-driven data groupings). Sensor queries will be extended with a GROUP BY {data group level column name}; sensors (and alerts) will be calculated for each unique value of the specified column. A separate time series will also be tracked for each value. | string | | | |
| name | Data grouping dimension name. | string | | | |
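
A sketch combining both dimension sources in a connection-level default grouping: a hardcoded tag at level 1 and a column-driven dimension at level 2 (the tag and column names are illustrative):

```yaml
default_grouping_configuration:
  level_1:
    source: tag
    tag: us_east               # hypothetical static tag for this data source
  level_2:
    source: column_value
    column: country            # hypothetical column added to the GROUP BY clause
```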

CronSchedulesSpec

Container of all monitoring schedules (cron expressions) for each type of checks. Data quality checks are grouped by type (profiling, whole table checks, time period partitioned checks). Each group of checks can be further divided by time scale (daily, monthly, etc). Each time scale has a different monitoring schedule used by the job scheduler to run the checks. These schedules are defined in this object.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| profiling | Schedule for running profiling data quality checks. | CronScheduleSpec | | | |
| monitoring_daily | Schedule for running daily monitoring checks. | CronScheduleSpec | | | |
| monitoring_monthly | Schedule for running monthly monitoring checks. | CronScheduleSpec | | | |
| partitioned_daily | Schedule for running daily partitioned checks. | CronScheduleSpec | | | |
| partitioned_monthly | Schedule for running monthly partitioned checks. | CronScheduleSpec | | | |
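
CronScheduleSpec is documented separately; assuming it exposes a cron_expression field, a schedules section could look like this sketch:

```yaml
schedules:
  profiling:
    cron_expression: "0 1 * * *"      # profiling checks daily at 1:00 AM
  monitoring_daily:
    cron_expression: "0 2 * * *"      # daily monitoring checks at 2:00 AM
  partitioned_daily:
    cron_expression: "0 3 * * *"      # daily partitioned checks at 3:00 AM
```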

AutoImportTablesSpec

Specification configured on a connection that defines how DQOps performs automatic table import at regular intervals using a CRON scheduler.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| schema_filter | Source schema name filter. Accepts wildcard filters such as 'schema_name_*', '*_schema' and 'prefix*suffix' to restrict import to selected schemas. | string | | | |
| table_name_contains | Source table name filter. It is a table name or a text that must be present inside the table name. | string | | | |
| schedule | Schedule for importing source tables using a CRON scheduler. | CronScheduleSpec | | | |
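
A sketch of an auto_import_tables section; the wildcard filter and CRON expression are illustrative, and cron_expression is an assumed field of CronScheduleSpec (documented separately):

```yaml
auto_import_tables:
  schema_filter: "sales_*"            # hypothetical schema wildcard
  table_name_contains: fact           # only tables whose names contain "fact"
  schedule:
    cron_expression: "0 4 * * *"      # assumed CronScheduleSpec field
```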

ConnectionIncidentGroupingSpec

Configuration of data quality incident grouping on a connection level. Defines how similar data quality issues are grouped into incidents.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| grouping_level | Grouping level of failed data quality checks for creating higher-level data quality incidents. The default grouping level is by table, data quality dimension and check category (i.e. a datatype data quality incident detected on a table X in the numeric checks category). | enum | table, table_dimension, table_dimension_category, table_dimension_category_type, table_dimension_category_name | | |
| minimum_severity | Minimum severity level of data quality issues that are grouped into incidents. The default minimum severity level is 'warning'. Other supported severity levels are 'error' and 'fatal'. | enum | warning, error, fatal | | |
| divide_by_data_groups | Create separate data quality incidents for each data group, creating different incidents for different groups of rows. By default, data groups are ignored for grouping data quality issues into data quality incidents. | boolean | | | |
| max_incident_length_days | The maximum length of a data quality incident in days. When a new data quality issue is detected more than max_incident_length_days days after a similar data quality issue was first seen, a new data quality incident is created that will capture all following data quality issues for the next max_incident_length_days days. The default value is 60 days. | integer | | | |
| mute_for_days | The number of days that all similar data quality issues are muted when a data quality incident is closed in the 'mute' status. | integer | | | |
| disabled | Disables data quality incident creation for failed data quality checks on the data source. | boolean | | | |
| incident_notification | Configuration of addresses for new or updated incident notifications. | IncidentNotificationSpec | | | |
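
A sketch of an incident_grouping section; the webhook URL is a hypothetical placeholder:

```yaml
incident_grouping:
  grouping_level: table_dimension_category
  minimum_severity: warning
  divide_by_data_groups: true
  max_incident_length_days: 60
  mute_for_days: 30
  incident_notification:
    incident_opened_addresses: https://hooks.example.com/dq-incidents  # hypothetical webhook
```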

IncidentNotificationSpec

Configuration of addresses used for notifications of new or updated incidents. Specifies the webhook URLs or email addresses where the notification messages are sent.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| incident_opened_addresses | Notification address(es) where the notification messages describing new incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_acknowledged_addresses | Notification address(es) where the notification messages describing acknowledged incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_resolved_addresses | Notification address(es) where the notification messages describing resolved incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_muted_addresses | Notification address(es) where the notification messages describing muted incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| filtered_notifications | Filtered notifications map with filter configuration and notification addresses treated with higher priority than those from the current class. | FilteredNotificationSpecMap | | | |

FilteredNotificationSpecMap

The map for the filtered notification specification.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| self | | Dict[string, FilteredNotificationSpec] | | | |

FilteredNotificationSpec

Notification with filters that is sent only if the values in the notification message match the filters.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| filter | Notification filter specification for filtering the incident by the values of its fields. | NotificationFilterSpec | | | |
| target | Notification target addresses for each of the statuses. | IncidentNotificationTargetSpec | | | |
| priority | The priority of the notification. Notifications are sent to the first notification target that matches the filters when process_additional_filters is not set. | integer | | | |
| process_additional_filters | Flag to continue processing subsequent notification filters. When set to true, the next notification from the list (in priority order) that matches the filter is also sent. | boolean | | | |
| disabled | Flag to turn off the notification filter. | boolean | | | |
| message | Message with the details of the filtered notification, such as a purpose explanation, an SLA note, etc. | string | | | |
| do_not_create_incidents | Flag to skip creating incidents that match the filters. | boolean | | | |

NotificationFilterSpec

Filter for filtered notifications.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| connection | Connection name. Supports search patterns in the format: 'source*', '*_prod', 'prefix*suffix'. | string | | | |
| schema | Schema name. This field accepts search patterns in the format: 'schema_name_*', '*_schema', 'prefix*suffix'. | string | | | |
| table | Table name. This field accepts search patterns in the format: 'table_name_*', '*table', 'prefix*suffix'. | string | | | |
| table_priority | Table priority. | integer | | | |
| data_group_name | Data group name. This field accepts search patterns in the format: 'group_name_*', '*group', 'prefix*suffix'. | string | | | |
| quality_dimension | Quality dimension. | string | | | |
| check_category | The target check category, for example: nulls, volume, anomaly. | string | | | |
| check_type | The target check type. Supported values are profiling, monitoring and partitioned. | enum | profiling, monitoring, partitioned | | |
| check_name | The target check name. Uses the short check name, which is the name of the deepest folder in the checks folder. This field supports search patterns such as: 'profiling_*', '*count', 'profiling*_percent'. | string | | | |
| highest_severity | Highest severity. | integer | | | |

IncidentNotificationTargetSpec

Configuration of addresses used for notifications of new or updated incidents. Specifies the webhook URLs or email addresses where the notification messages are sent.

The structure of this object is described below

| Property name | Description | Data type | Enum values | Default value | Sample values |
|---------------|-------------|-----------|-------------|---------------|---------------|
| incident_opened_addresses | Notification address(es) where the notification messages describing new incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_acknowledged_addresses | Notification address(es) where the notification messages describing acknowledged incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_resolved_addresses | Notification address(es) where the notification messages describing resolved incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
| incident_muted_addresses | Notification address(es) where the notification messages describing muted incidents are pushed using an HTTP POST request (for a webhook address) or via SMTP (for an email address). The format of the JSON message is documented in the IncidentNotificationMessage object. | string | | | |
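
Putting the notification objects together, the sketch below routes volume incidents on sales schemas to a dedicated email address, while everything else goes to a default webhook; the map key, filter values and addresses are hypothetical:

```yaml
incident_notification:
  incident_opened_addresses: https://hooks.example.com/all-incidents  # hypothetical default
  filtered_notifications:
    volume_on_sales:                  # map key naming this filtered notification
      priority: 1
      filter:
        schema: "sales_*"             # hypothetical schema pattern
        check_category: volume
        check_type: monitoring
      target:
        incident_opened_addresses: dataops@example.com  # hypothetical address
```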

LabelSetSpec

A collection of unique labels assigned to items (tables, columns, checks) that can be targeted for a data quality check execution.









