Keboola Advanced Training - Public PDF

This document provides an overview of Keboola best practices for data transformation and loading. Some key points include: 1. Use a business data model (BDM) to describe the business entities, properties, and relationships in a technology-agnostic way. This provides a shared understanding and supports reuse. 2. Implement a multi-project architecture with staging (L1) and integration (L2) projects to isolate transformations and merge data. 3. Avoid complex transformations in a single phase. Instead, split work into simple atomic queries and reuse intermediate results. 4. Define input/output mappings carefully to reduce load volumes and enable incremental loads based on change detection. 5. Leverage


Keboola Best Practices

Keboola Training Overview


● BDM Methodology
● Keboola Basic Training
● Advanced Training (Best Practices)
● Generic Extractor Introduction
● Component Creation
● Project Health
● API Overview
[Platform overview diagram: data sources (databases, SaaS apps, advertising / RTB, public APIs) are extracted, transformed, and loaded into analysis-ready outputs, data sandboxes, 3rd-party ML and data science tools, and augmentation services, with an app store catalogue of components and a flexible setup for fast time to value.]
Core Architecture (Project Detail)
● Set of microservices interacting together via the Storage component.
● Each component either writes its result to Storage or loads data from it.
● Storage is a separate layer that may be implemented on various technologies (Snowflake, Redshift, Azure, etc.)
All-in-one cloud environment
Component & Component Configuration
Transformation & Transformation Buckets
Transformation Phases & Dependencies
Multi-Project Architecture
● Each department is not limited to a single project. You can stage your data
pipeline into a set of projects that share data
● Consider them as L1 and L2 transformation projects
○ L1 - Minor cleanup data pipelines, isolated transformations forming single tables
○ L2 - Final data pipelines, possibly merging multiple tables from L1 together to form BDM
objects
Business Data Model (BDM)
● Method of describing a business in the language of data
● Independent of the underlying technology
● Defines and describes the “objects”, “properties” and “values” that are key to
the business operation
● Provides:
○ Extendable data model that supports all current and future data
initiatives
○ Unified terminology and business understanding through data across
departments
○ Precursor of multi-project architecture
BDM example: CRM
Multi-Project Architecture
Transformations
● Transformation Script (SQL, R, Python or
OpenRefine backend) which you can use to
manipulate your data.
● Operate in a completely separate provisioned
workspace created for each transformation.
● Independent of storage backends.
● Versioning, complete history + fast rollback
● Sandboxes
○ Separated environments for each user/project
○ Available for all backends

Limits (Soft)
● Sandbox disk space is limited to 10GB.
● Memory is limited to 8GB.
● Maximum runtime - 6 hrs
● Limits are soft -> it is possible to increase the limits
on request per project.
Avoid ALTER SESSION Statements
● Avoid ALTER SESSION statements within transformations, as they may lead
to unpredictable behaviour. Because all transformations in a phase are executed in a
single Snowflake workspace, ALTER SESSION statements may be executed in
an unpredictable order.
● Also, the loading and unloading sessions are separate from your
transformation/sandbox session, so the format may change unexpectedly.
● Using explicit statements instead of a global SESSION parameter also
leads to better readability -> explicit is always better.
● More info in the docs
Avoid ALTER SESSION Statements
Example:

ALTER SESSION SET TIMESTAMP_OUTPUT_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

Alternative:

SET DEF_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

SELECT TO_CHAR("datetime"::TIMESTAMP, $DEF_TIMESTAMP_FORMAT);
Set Timezone Explicitly
Be careful when working with timestamps. TIMESTAMP casting may convert the date to the local
timezone of the worker. It is always better to convert explicitly to the required
timezone and to use TIMESTAMP_TZ when you need the timezone information.

Dangerous:

ALTER SESSION SET timezone = 'Asia/Bangkok';

Better:

SET DEF_TIMEZONE = 'Asia/Bangkok';

SELECT CONVERT_TIMEZONE($DEF_TIMEZONE, '2019-03-05 12:00:00 +02:00'::TIMESTAMP_TZ);
Avoid SELECT * Statements
● Generic SELECT * statements should be avoided and columns always listed
explicitly.

Reasoning

● Using * statements makes it difficult to control the effects of an upstream
structural change on the input (e.g. a column being added).
● More lines of code are traded for clarity - it is clear directly from the code
which columns are being used.
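A minimal sketch of the difference, using hypothetical table and column names:

-- Avoid: silently changes shape when "orders" gains or loses columns
SELECT * FROM "orders";

-- Prefer: the consumed columns are explicit and reviewable
SELECT
    "id",
    "customer_id",
    "amount",
    "created_at"
FROM "orders";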
Transformation Phases vs. Dependencies
Phases

● Each phase executes its transformation code in a separate workspace
● Phases ensure transformation execution order
● Phases are to be used when you absolutely need to export the result into Storage before
another group of tasks processes the outputs, typically when you are combining different
transformation backends within a transformation bucket, e.g. preparing data sets using SQL so
that they can be processed by a Python script.

Dependencies

● Define the order of transformation execution within a single phase
● Use a single Snowflake workspace
○ Performance advantages - I/O is done only once
○ All objects created within the phase are visible from all transformations involved
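A minimal sketch of sharing objects within a single phase, with hypothetical table names - a table created by one transformation can be read directly by a dependent transformation in the same workspace, without a round trip through Storage:

-- Transformation A (runs first within the phase)
CREATE TABLE "stg_orders_clean" AS
SELECT "id", "customer_id", "amount"::NUMBER(12,2) AS "amount"
FROM "orders"
WHERE "status" <> 'cancelled';

-- Transformation B (depends on A, same phase, same workspace)
CREATE TABLE "out_customer_totals" AS
SELECT "customer_id", SUM("amount") AS "total_amount"
FROM "stg_orders_clean"
GROUP BY "customer_id";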
Transformation Phases vs. Dependencies

Running in workspaces
Phases vs. Dependencies: Best Practice
General rule: we suggest using mainly dependencies in order to ensure all
necessary transformations happen when you trigger a single one manually
within the bucket.

Reasoning

● Using dependencies instead of a phase enables execution parallelisation
and parallel I/O loads, which in turn provides better performance.
● Objects can be shared across transformations, allowing better code
segmentation.
SQL Dep Feature
Keboola Connection provides integration with the SQL Dep service.

This service is very useful for analysing transformation queries; it
allows exploring the relationships between objects (tables, columns,
queries) created within a transformation.
Avoid Complex Nested Queries
● Split complex nested queries into as many atomic pieces as possible.

Reasoning:

● Better code readability -> easier maintenance.
● Complex nested queries may easily exceed the DWH query execution time
limit. Using multiple simple queries helps to avoid such issues.
● You can reuse those “temp” tables in multiple queries within the same phase.

Also read: https://multithreaded.stitchfix.com/blog/2019/05/21/maintainable-etls/
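A minimal sketch of the idea, with hypothetical table names - instead of one deeply nested query, each step becomes its own small table that later queries in the same phase can reuse:

-- Step 1: filter once, keep only the columns you need
CREATE TABLE "tmp_events_filtered" AS
SELECT "user_id", "event_type", "created_at"
FROM "events"
WHERE "created_at"::TIMESTAMP >= '2019-01-01';

-- Step 2: aggregate the intermediate result (reusable by other queries too)
CREATE TABLE "tmp_daily_counts" AS
SELECT "user_id", DATE_TRUNC('day', "created_at"::TIMESTAMP) AS "day", COUNT(*) AS "events"
FROM "tmp_events_filtered"
GROUP BY "user_id", DATE_TRUNC('day', "created_at"::TIMESTAMP);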


Variables
TRUNCATE and INSERT
Case statements -> Mapping table
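One of these techniques, replacing long CASE statements with a mapping table, might look like this sketch (table and column names are hypothetical):

-- Instead of a long, hard-to-maintain CASE expression...
-- CASE "country" WHEN 'CZ' THEN 'EMEA' WHEN 'DE' THEN 'EMEA' WHEN 'US' THEN 'AMER' ... END

-- ...keep the mapping as data and join it:
CREATE TABLE "map_country_region" AS
SELECT * FROM VALUES ('CZ', 'EMEA'), ('DE', 'EMEA'), ('US', 'AMER') AS t ("country", "region");

SELECT o."id", m."region"
FROM "orders" o
LEFT JOIN "map_country_region" m ON o."country" = m."country";

In practice the mapping table would usually live in Storage and arrive through the input mapping, so it can be maintained without touching the SQL.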
Defining Input / Output Mapping
Defining Input / Output Mapping
● Declares what input data from Storage is used within the
transformation and gets copied into the secure, separate workspace where
the transformation runs, and what data is collected back.
● There are several techniques that allow additional control over I/O
mappings and enhance the performance of the I/O stage.

I/O Features:

● Table import mode
● Filters
● Data-types, cleaning
Copy Table Mode
Default mode of loading tables that allows definition
of filters:

Column filter:
choose a subset of columns - reduces load time and
volume transferred

Incremental load:
Define a “Changed in last” interval to load only data that
has been changed in the specified period.

● Not defined on a time dimension within the data
itself. Uses table metadata to bring in only rows
that have been updated.
Filters

When using ‘Copy Table’ mode it is possible to define
additional filters and data-type conversions prior to load.

Data Filter: A simple filter on a particular attribute - a list of
values that should be included / excluded.

Data types: Define the data types of input columns. By default
everything is VARCHAR, but it is possible to do the casting
at the input level.

● Note that using this feature might hide the datatype
conversion from the code and reduce its clarity.
Clone Table Mode
Advanced mode, leveraging the zero-copy clone
functionality of the Snowflake DWH.

● This mode provides a significant performance
boost when loading extremely large tables in full,
or for speeding up transformations with a large
number of tables on the input.
● Note that when using this mode no filters can be
set up and the table is loaded as is from the
Storage.
● Using the `Clone Table` option only provides a
performance enhancement; it does not affect the
credits consumed.
Incremental vs. Full Load
● An incremental data flow processes only data changed since the last successful run
and adds it to the previously processed data.
● A full data flow processes everything every day.
● Keboola behaves as follows:
○ Full load
○ Incremental with PK - Type 1 slowly changing dimension (SCD)
○ Incremental without PK - Type 2 SCD
○ Note: There are ways to have Type 3 SCD, either by utilizing a
transformation or a custom component
■ such as Table Snapshot
■ leochan.event_snapshotting
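As a rough illustration (not Keboola's actual implementation), an incremental load with a primary key behaves approximately like a Snowflake MERGE on that key; table and column names here are hypothetical:

MERGE INTO "customers" AS target
USING "customers_increment" AS source
    ON target."id" = source."id"
WHEN MATCHED THEN UPDATE SET
    "email" = source."email",
    "updated_at" = source."updated_at"
WHEN NOT MATCHED THEN INSERT ("id", "email", "updated_at")
    VALUES (source."id", source."email", source."updated_at");

Without a primary key the increment is simply appended to the existing rows.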
Automatic / Manual Incremental Load
Automatic Incremental Load
- Append all data that has been added or changed since the last successful run. If a
primary key is specified, updates will be applied to rows with matching primary
key values.

Manual Incremental Load
- Append all selected data. If a primary key is specified, updates will be applied to
rows with matching primary key values.

(Writers only at this point)

Demo
Break Time
Orchestration Notifications

Set up notifications of orchestrations to a group chat

a. Full transparency
b. Faster reaction time
Event-triggered Orchestration
Trigger your orchestrations based on source
data changes.

● Cooldown period: limits the number of
triggers within a given period, e.g. max
once per 5 min
● All tables have to be changed for the trigger
to apply

What is a change?

● Any table import, even if it does not
contain any new data.

TIP: Use artificial “state” tables to control the triggers.
These may contain some additional run info.
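A hedged sketch of the “state” table trick (names are hypothetical): the producing pipeline writes a small run-info table at the end of its run, and the downstream orchestration's trigger watches this table instead of the raw source tables.

-- Output-mapped to Storage at the end of the producing pipeline;
-- importing it is the "change" that fires the downstream trigger.
CREATE TABLE "pipeline_state" AS
SELECT
    CURRENT_TIMESTAMP()                        AS "finished_at",
    (SELECT COUNT(*) FROM "orders_increment")  AS "rows_loaded",
    'daily_orders_load'                        AS "pipeline";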
Storage
● Storage is the central KBC component managing everything related to storing data and
accessing it.
● Implemented as a layer on top of various database engines / storage services
○ Snowflake
○ S3
○ Redshift, Azure

Table Storage
● Stores tabular data created by components or table imports
● [Currently] based on Snowflake backend
● Easy backups and restorations - Snapshots, “Timetravel” restores
● No limitation by data types; semi-structured data formats natively supported by Snowflake
(JSON, Avro, ORC, Parquet, or XML)
● Organised into “buckets” and tables; can be shared across projects / organisations
● Can be accessed only via the Storage API or by interacting with components.

File Storage
● Can be used to store an arbitrary file (e.g. R models)
● Every data load is stored in Files before it is processed and pushed into a table.
In Complete Control
● Any interaction with Storage is logged
● Every payload, result of a transformation or component is stored in Files before pushing to Storage
● Every Storage job is linked to the actual component job via RunID (searchable in Jobs tab)
Data Lineage
Storage Columns Descriptions
- Columns in Storage now
support Markdown language
for text descriptions, similar to
other Keboola components

Demo
Data Types
● Everything is treated as a String in Storage
○ More flexibility when loading data
○ A datatype can be defined as metadata on a column in
Storage
○ Datatypes are also transferred from the DB
extractors
● Generally, data types are needed on the output
○ Define datatypes on columns in the output stage -> they
will be transferred to any new writer configuration
○ No need to define them in transformations, except
for some particular cases, e.g. for date operations:
DATEADD(day, 2, "date_col"::DATE).
○ Snowflake infers datatypes, so SUM(value) works
even if it is a String
○ It is recommended to cast explicitly in the code
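A short illustration of the last two points, using a hypothetical table - relying on Snowflake's implicit conversion works, but an explicit cast documents the intent:

-- Works thanks to implicit conversion, but hides the intent:
SELECT SUM("value") FROM "sales";

-- Preferred: cast explicitly in the code
SELECT SUM("value"::NUMBER(12,2)) AS "total_value" FROM "sales";

SELECT DATEADD(day, 2, "date_col"::DATE) AS "date_plus_two" FROM "sales";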
Override Manual Data Types
- Data types set by extractors can now be overridden by the user, by adding a
new row to the “data type” of the column.
- The source data type will still exist for lineage and audit purposes, but the override data
type will take precedence.

Demo
Defined Data Types
- Metadata for data types in Storage is now exposed in the Storage UI
- Setting data types in Storage is only for metadata purposes; the underlying
Keboola Storage data is still stored as VARCHAR
- Impact:
- By setting the data type, your transformations will automatically set the input mapping of
the tables to the selected data type
- If the column contains data type errors, your transformations will fail in the input phase

More info

Demo
Storage Table Operations
Delete full table
● Use with caution! Don’t do this just to replace a
table.
Remove or add primary key
● May cause failure of dependent transformations
(check I/O mappings)
Create column
● May break backward dependencies.
Delete column
● May break forward dependencies.
Change column datatype
● Will be inherited by any new transformation or
writer, make sure the data is valid for the datatype.

NOTE: Always be careful when executing any of these operations and
refer to the Graph view to check any affected dependencies.
Chained Aliases
- An alias can be created from
another alias.
- Also aliases created in Shared
Buckets are propagated to linked
buckets and can be further aliased.
- Previously, this feature did not allow
chained aliases or sharing aliases
across projects
Metadata Layer
Conceptually, these types of metadata are collected or generated:
● The catalog information of datasets, such as schema structure, business description, business
context, dataset business taxonomy, location, responsibility/ownership information, PII tags, etc.
● Operational metadata, which includes job and execution information, such as timestamps,
components reading from / writing into the dataset, etc.
● Lineage information metadata, which is the connection between components (jobs) and datasets.
● Data profiling statistics to reveal the column-level and set-level characteristics of the dataset.
This can be further leveraged for data QA analysis.
Metadata collection and generation:
● Storage API
● System-collected information - operational metadata
● Generated data - Data profiling, data lineage
How to NOT use Keboola Storage
Snowflake, like other analytical databases, is not suitable for the following cases:

● Transactional workloads (OLTP)
● Key-value access with a high request rate
● Blob or document storage
● Over-normalized data

Additionally, no external access to Keboola Storage is allowed, due to the
metadata layer used on top of the DB.
Use Input Mapping as Health Check
Using Sandboxes
● Safe environment for you to explore, analyze and experiment with copies of
selected data
● Place to troubleshoot and develop transformation scripts without
modifying the actual data
● Unique sandbox per user
● Jupyter, SQL, R
● Online interface & Remote connection credentials
Using Sandboxes
Code Templates
- Save a Jupyter notebook in File Storage with a predefined tag.
- If a sandbox is loaded from a transformation, the transformation code will be appended after the template code.
Code Templates (cont.)
- Similar feature to the Jupyter notebook templates, for RStudio
- (_r_sandbox_template_)
- Also the ability to create personal templates

More information:
https://help.keboola.com/manipulation/transformations/sandbox/#code-templates
Markdown Descriptions
● Each component and Storage object can be described with Markdown text
● Important for team work
● Enables “jump-in & fix” work style
● Utilize markdown formatting for text readability (headers, bullet points, etc.)
● Recommended content:
○ Storage: Data structures & types, foreign keys, etc. PII/Personal Data columns,
internal/temp table vs. production
○ Transformations: Function in the dataflow in broad terms, inputs, outputs, TODO section,
unknowns or known quirks to signal possible issues in the future, hardcoded values
○ What happens within the transformation: how you got from A to B and what the logical
steps of the transformation are.
Descriptions in Markdown
Descriptions in Markdown
Versioning
● A configuration change of any
component (that means
extractors, transformation buckets,
writers and orchestrations) -> a new
version of the whole configuration
● Available in the UI and API
● Easy RESTORE or FORK
● Tip: Use FORK functionality to make
a quick copy of
configuration/transformation
bucket to avoid manual work
Versioning Improvements
- Save with descriptions
- Transformations UI
Avoiding Tedious Work
Keboola Logs Everything
● Component and Transformation runs
● Versioning
● Storage events
● Tip: Storage imports
Extensive logging
Every Storage Job contains detail with all information about
the execution.

● Run ID referencing the actual component job
● Link to the affected table
● Timestamps, duration and transfer size
● File ID as a reference to the actual payload in Files
storage
● Type of load: incremental/full
● Total number of rows imported
● Other results or warnings
Data Retention & Table Snapshots
● Table Snapshots
● All changes saved
○ default data retention range is 7 days
● Ability to trigger ad-hoc snapshot
○ Manually
○ API call
● No need for date and user name

● Component Trash
○ Easy configuration restore
Use the Weapon of Choice
Components
Components

● Extractor – allows customers to get data from new sources. It brings in data from external
sources (usually an API) rather than processing input tables.
● Application – further enriches the data or adds value in new ways. It processes input tables stored as CSV
files and generates result tables as CSV files.
● Writer – pushes data into new systems and consumption methods. It does not generate any data in the KBC
project.
● Processor – adjusts the inputs or outputs of other components. It has to be run together with one of the
above components.
● All components are run using the Docker Runner
● All components must adhere to the common interface
● The list of all available components can be seen at https://components.keboola.com/

Generic extractor

● Generic Extractor is a KBC component acting like a customizable HTTP REST client.
● It can be configured to extract data from virtually any API and offers a vast number of configuration options.
● An entirely new extractor for KBC can be built in less than an hour.
Extractors
Extractor types
● Database extractors: SQL Databases and NoSQL MongoDB
● Communication, Social Networks and Marketing and Sales extractors
● Other extractors such as Geocoding-Augmentation or GoogleDrive
● Keboola-provided or 3rd-party

Generic Extractor

● Very flexible; supports various types of authentication, pagination, and nested objects. Fits almost all REST-like
services.
● Supports iterations (run with multiple parameters)
● Functions
○ allow you to add extra flexibility when needed.
○ Can be used in several places of the Generic Extractor configuration to introduce dynamically generated
values instead of those provided statically.
● Incremental load, remembers last state
● Can be set up and used as a standalone component
A custom component
Common interface
● Predefined set of input and output folders for tables and files,

● a configuration file,
○ Custom-defined JSON injected into the Docker container
● environment variables and return values.

Other Features

● Logging, manifest files for working with table and file meta-data
● the OAuth part of the configuration file, and
● actions for quick synchronous tasks.
● Docker Runner provides tools for encryption and OAuth2 authorization.
● Custom Science apps (legacy)
● Generic UI
● Processors
● Easy to set up CI workflows - Travis, Quay, Bitbucket Pipelines
Processors
Processor is a special type of component which may be used before or after
running an arbitrary component (extractor, writer, etc.).

● They perform an atomic action on the data before it is loaded into the
component (writer) or into Storage.
● They may be plugged into any component.
○ Some components provide UI support for processors; otherwise
they need to be added to the configuration via an API call.

A processor would typically be used with a component like the S3 Extractor, which
also uses processors in the background. The main advantage is that you do
not need to write any additional code or customize an existing application.

Use case: Let’s say you receive a CSV report generated by a legacy system via
S3 and the report contains a 10-line header before the actual data =>
such a CSV file is invalid and would fail when loaded directly into
Storage.

Solution: Plug in a skip-lines processor that would skip the first X rows and
convert the CSV file to a valid form.
Integration - technical intro
Integrate KBC with other systems.

● Use KBC just to exchange data (using the Storage API).
● Use KBC as a data-handling backbone for your product.
● Wrap KBC in your own UI for your customers.
● Control the whole data processing pipeline within KBC from the
outside.
● Control any component of KBC programmatically.
Docker runner (behind the scenes)
