SQLMesh Utils

This repository contains things that are not included in SQLMesh Core for various reasons but may be useful in very specific cases.

For example, the custom materializations included here will typically do things like relax constraints / guarantees of the SQLMesh Core model kinds to work around environment-specific limitations.

Usage

Install the sqlmesh-utils library into your SQLMesh Python environment:

$ pip install sqlmesh-utils

Custom Materializations

After installing the sqlmesh-utils library above, you can reference the custom materializations via the CUSTOM model kind like so:

MODEL (
    name my_db.my_model,
    kind CUSTOM (
        materialization 'custom_materialization_name',
        materialization_properties (
            config_key = 'config_value'
        )
    )
);

Before using any of these materializations, you should take the time to understand the tradeoffs. There is generally a reason they are not in upstream SQLMesh.

Non-Idempotent Incremental By Time Range

This behaves similar to INCREMENTAL_BY_TIME_RANGE model kind in SQLMesh, but with one important difference - it loads and restates data using a MERGE statement instead of INSERT OVERWRITE or DELETE+INSERT.

The reason you might want to use it is to prevent dirty reads during data restatement on engines that do not support atomically replacing a partition of data, such as Trino on Iceberg / Delta Lake.

Due to the use of a MERGE statement, this materialization type supports upserts only. That is, if records are deleted from the source data, these deletions will not be reflected in the target table. So the downside of using this materialization is that it's possible to end up with ghost records in your target table after a restatement. For this reason, we call it "non-idempotent".

Note

Note that some engines can propagate deletes in a MERGE statement using syntax like WHEN NOT MATCHED [IN SOURCE] THEN DELETE. However, this is not part of ANSI SQL so engines like Trino and Postgres do not implement it.

Usage

This mostly follows the same usage as INCREMENTAL_BY_TIME_RANGE:

MODEL (
    name my_db.my_model,
    kind CUSTOM (
        materialization 'non_idempotent_incremental_by_time_range',
        time_column event_timestamp,
        primary_key (event_id, event_source)
    )
);

SELECT event_id, event_source, event_data, event_timestamp
FROM upstream.table

The properties are as follows:

time_column

This is the column in the dataset that contains the timestamp. It follows the same syntax as upstream INCREMENTAL_BY_TIME_RANGE.

primary_key

This is the column or combination of columns that uniquely identifies a record.

The columns listed here are used in the ON clause of the SQL Merge to join the source and target datasets.

Note that the time_column is not automatically injected into this list (to allow timestamps on records to be updated), so if the time_column does actually form part of the primary key in your dataset then it needs to be added here.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.github		.github
sqlmesh_utils		sqlmesh_utils
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SQLMesh Utils

Usage

Custom Materializations

Non-Idempotent Incremental By Time Range

Usage

time_column

primary_key

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

License

TobikoData/sqlmesh-utils

Folders and files

Latest commit

History

Repository files navigation

SQLMesh Utils

Usage

Custom Materializations

Non-Idempotent Incremental By Time Range

Usage

time_column

primary_key

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Packages