There are two ways to ensure healthy data: find and fix issues with data quality methods, or prevent them with data integrity constraints. At first glance, you might ask why we spend time on data quality management when we could simply enforce data integrity with database constraints. The problem is that we cannot verify everything with constraints: the expressions they support are limited and cannot reference third-party data sources. Moreover, if there is no one who can correct a bad record before it is saved, a constraint violation means the invalid record is rejected and lost forever. This often happens when we receive a copy of data, such as an old list of products or customers. We have to accept the data as it is, then find and cleanse the issues - that is the purpose of data quality management. #dataquality #dataengineering #datagovernance
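For illustration (not from the original post), here is a minimal Python/pandas sketch of that "find and cleanse" approach: bad rows are flagged and measured instead of being rejected at write time. The column names, reference list, and sample values are hypothetical.

```python
# Minimal sketch of a data quality check (assumes pandas is installed):
# keep every record, then flag rows that fail rules a database CHECK
# constraint could not express, e.g. validation against an external
# reference list.
import pandas as pd

# Hypothetical product list received "as is" from an external source.
products = pd.DataFrame({
    "product_id": [101, 102, 103, 104],
    "country":    ["PL", "DE", "XX", None],    # "XX" and None are invalid
    "price":      [9.99, -1.00, 4.50, 12.00],  # negative price is invalid
})

# Reference data from another system - something the source database's
# constraints could not look up at insert time.
valid_countries = {"PL", "DE", "FR", "US"}

# Evaluate each rule and record which rows violate it.
issues = pd.DataFrame({
    "invalid_country": ~products["country"].isin(valid_countries),
    "negative_price":  products["price"] < 0,
})

flagged = products[issues.any(axis=1)]
print(flagged)        # rows to review and cleanse, not rows that were lost
print(issues.mean())  # per-rule failure rate, a simple data quality metric
```

The point of the sketch: a constraint would have silently dropped rows 102-104, while the quality check keeps them, tells you why they are bad, and gives you a metric to track over time.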
This is why I really like #datavault 's approach of parallel data loading and its pure focus on business keys. It makes it easy to integrate and load data from various sources without constraints, and to measure data quality in a centralized way. That way, strategies can be assessed according to priorities, needs and feasibility.
Solving data instrumentation entirely through AI agents would address the root of the problem. One person among many can make the mistake of not adhering to tracking standards, but AI agents won't - provided the tracking standards cover all, or at least most, potential cases.
Great points, Piotr! It's interesting how often we overlook the role of user training in ensuring data quality. Even with top-notch systems in place, if users aren't aware of best practices or common pitfalls, bad data slips through.
Interesting
Very informative! Thanks for sharing insights and best practices on Data Quality.
Great graphic to explain these concepts!