Skip to content

TobikoData/sqlmesh-utils

 
 

Repository files navigation

SQLMesh Utils

This repository contains things that are not included in SQLMesh Core for various reasons but may be useful in very specific cases.

For example, the custom materializations included here will typically do things like relax constraints / guarantees of the SQLMesh Core model kinds to work around environment-specific limitations.

Usage

Install the sqlmesh-utils library into your SQLMesh Python environment:

$ pip install sqlmesh-utils

Custom Materializations

After installing the sqlmesh-utils library above, you can reference the custom materializations via the CUSTOM model kind like so:

MODEL (
    name my_db.my_model,
    kind CUSTOM (
        materialization 'custom_materialization_name',
        materialization_properties (
            config_key = 'config_value'
        )
    )
);

Before using any of these materializations, you should take the time to understand the tradeoffs. There is generally a reason they are not in upstream SQLMesh.

Non-Idempotent Incremental By Time Range

This behaves similar to INCREMENTAL_BY_TIME_RANGE model kind in SQLMesh, but with one important difference - it loads and restates data using a MERGE statement instead of INSERT OVERWRITE or DELETE+INSERT.

The reason you might want to use it is to prevent dirty reads during data restatement on engines that do not support atomically replacing a partition of data, such as Trino on Iceberg / Delta Lake.

Due to the use of a MERGE statement, this materialization type supports upserts only. That is, if records are deleted from the source data, these deletions will not be reflected in the target table. So the downside of using this materialization is that it's possible to end up with ghost records in your target table after a restatement. For this reason, we call it "non-idempotent".

Note

Note that some engines can propagate deletes in a MERGE statement using syntax like WHEN NOT MATCHED [IN SOURCE] THEN DELETE. However, this is not part of ANSI SQL so engines like Trino and Postgres do not implement it.

Usage

This mostly follows the same usage as INCREMENTAL_BY_TIME_RANGE:

MODEL (
    name my_db.my_model,
    kind CUSTOM (
        materialization 'non_idempotent_incremental_by_time_range',
        time_column event_timestamp,
        primary_key (event_id, event_source)
    )
);

SELECT event_id, event_source, event_data, event_timestamp
FROM upstream.table

The properties are as follows:

time_column

This is the column in the dataset that contains the timestamp. It follows the same syntax as upstream INCREMENTAL_BY_TIME_RANGE.

primary_key

This is the column or combination of columns that uniquely identifies a record.

The columns listed here are used in the ON clause of the SQL Merge to join the source and target datasets.

Note that the time_column is not automatically injected into this list (to allow timestamps on records to be updated), so if the time_column does actually form part of the primary key in your dataset then it needs to be added here.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy