Reliable file ingestion depends on validating the schema, constraints, and data quality rules. File ingestion has been around for about as long as most of us have, yet it is still worth showing the stages where data quality validation should be performed. There are three steps at which flat files, such as CSV, JSON, or XML, should be validated:

⚡ Verify the schema and the ability to read the file after copying it to your own raw file location. The file could be truncated (partially uploaded), or it may be missing required columns.

⚡ Run data quality checks that validate constraints, such as value uniqueness, nulls in required columns, and values that cannot be converted to their target data types. This lets you detect and reject files that cannot be loaded into a typed staging table.

⚡ Run additional data quality checks defined by data stewards and business users. If some of these checks fail but the issue is not of critical severity, load the data into the target table anyway.

This process requires one more important component: a job orchestrator that supports restarting a job at the step where it failed. If your data transformation code turns out to be wrong or the data quality checks are too restrictive, you can fix them and retry only the failed step. For the data quality checks, use a data quality platform that is callable from the data pipelines, such as DQOps. Check my profile to learn more.

#dataquality #dataengineering #datagovernance
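A minimal sketch of the three validation stages described in the post, using pandas. The file path, column names, and severity handling are illustrative assumptions, not part of the original post; any real pipeline would draw them from its own ingestion contract.

```python
# Sketch of the three-stage file validation, assuming a CSV file with an
# order_id / customer_id / amount layout (an invented example schema).
import pandas as pd

RAW_FILE = "raw/orders.csv"                               # assumed raw file location
REQUIRED_COLUMNS = {"order_id", "customer_id", "amount"}  # assumed required columns


def stage_1_schema_check(path: str) -> pd.DataFrame:
    """Verify the file is readable and has the required columns."""
    df = pd.read_csv(path)  # may fail outright on a truncated or corrupt upload
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Schema check failed, missing columns: {missing}")
    return df


def stage_2_constraint_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Reject files that cannot be loaded into a typed staging table."""
    if df["order_id"].duplicated().any():
        raise ValueError("Constraint check failed: duplicate order_id values")
    if df["customer_id"].isna().any():
        raise ValueError("Constraint check failed: nulls in required column customer_id")
    # Values not convertible to the target data type make the file unloadable.
    df["amount"] = pd.to_numeric(df["amount"], errors="raise")
    return df


def stage_3_business_checks(df: pd.DataFrame) -> list[str]:
    """Additional steward/business checks; non-critical failures only warn."""
    warnings = []
    if (df["amount"] <= 0).any():
        warnings.append("Non-critical: non-positive amounts found")
    return warnings


if __name__ == "__main__":
    df = stage_1_schema_check(RAW_FILE)
    df = stage_2_constraint_checks(df)
    for warning in stage_3_business_checks(df):
        print(warning)  # log the issue, but still load the data to the target table
    # load df into the target table here
```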
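And a hedged sketch of the orchestration point: each stage runs as its own task, so a failed run can be cleared and re-run from the failing task rather than from the beginning. Airflow (2.x TaskFlow API) is used here only as one example of such an orchestrator; the DAG id, schedule, retry counts, and staging path are assumptions.

```python
# Each validation stage is a separate task, so a failure can be retried or
# restarted at that step without repeating the earlier ones.
import pendulum
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def file_ingestion():

    @task(retries=2)
    def verify_schema() -> str:
        ...  # stage 1: copy the file to the raw location, verify it is readable
        return "staging/orders.parquet"  # assumed staging path

    @task(retries=2)
    def validate_constraints(staged_path: str) -> str:
        ...  # stage 2: uniqueness, nulls, type convertibility; raise to reject the file
        return staged_path

    @task
    def run_quality_checks(staged_path: str) -> None:
        ...  # stage 3: call a data quality platform (e.g. DQOps) from the pipeline

    run_quality_checks(validate_constraints(verify_schema()))


file_ingestion()
```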
Hey Piotr, great insights on file ingestion validation! Your breakdown of the three key steps is spot on. Have you considered leveraging AI-powered anomaly detection algorithms for enhanced data quality assurance? It could be a game-changer in streamlining the process.
Interesting
Founder @ DQOps open-source Data Quality platform | Detect any data quality issue and watch for new issues with Data Observability
The manual for testing CSV files from DQOps is here: https://dqops.com/docs/data-sources/csv/