
Research Article

Published: 2024-11-29
https://doi.org/10.20935/AcadEng7407

Can generative AI transform data quality? A critical discussion of ChatGPT's capabilities

Otmane Azeroual 1,*
Academic Editors: Dimitrios A. Karras and Angeles Blanco

Abstract
Data quality (DQ) is a fundamental element for the reliability and utility of data across various domains. The emergence of generative AI
technologies, such as GPT-4, has introduced innovative methods for automating data cleaning, validation, and enhancement processes.
This paper investigates the role of generative AI, particularly ChatGPT, in transforming data quality. We assess the effectiveness of these
technologies in error identification and correction, data consistency validation, and metadata enhancement. Our study includes empirical
results demonstrating how generative AI can significantly improve DQ. The findings suggest that generative AI and ChatGPT have a
transformative impact on data management practices, offering new opportunities for enhancing data quality across various applications.

Keywords: data quality (DQ); generative AI; GPT-4; ChatGPT; data cleaning; metadata enhancement

Citation: Azeroual O. Can generative AI transform data quality? a critical discussion of ChatGPT’s capabilities. Academia Engineering
2024;1. https://doi.org/10.20935/AcadEng7407

1. Introduction

In the contemporary data-driven landscape, the quality of data is critical for accurate decision-making, operational efficiency, and the dependability of data-dependent systems [1]. Low data quality can lead to incorrect conclusions, operational inefficiencies, and substantial risks [2]. As organizations increasingly handle vast amounts of data, ensuring their quality has become essential.

Traditional data cleaning and validation methods, though effective, are often labor-intensive and susceptible to human error [3]. These methods generally involve manual processes such as identifying and correcting inconsistencies, validating data against predefined standards, and enriching metadata. Despite diligent efforts, human involvement introduces variability and potential inaccuracies, particularly as data volume and complexity continue to grow [4].

The advent of generative AI technologies offers promising solutions to these challenges. Generative AI, exemplified by advanced interfaces like GPT-4, provides novel approaches for automating data cleaning, validation, and enhancement processes [5]. These interfaces excel in natural language processing (NLP) tasks due to their ability to understand and generate human-like text, making them particularly adept at tasks requiring contextual understanding and linguistic capabilities [6].

GPT-4, the fourth generation of the Generative Pre-trained Transformer, has shown remarkable proficiency in various NLP tasks [7]. Its capability to generate coherent and contextually relevant text enables automation in error detection, data consistency validation, and metadata enhancement [8]. Empirical studies reveal that GPT-4's application in data quality management can lead to substantial improvements.

ChatGPT, a variant of GPT-4, is optimized for conversational tasks and can interact with data dynamically and intuitively [9]. It can automatically correct metadata errors, infer missing information, and enrich data by adding relevant details [10]. Its conversational interface facilitates a more interactive and user-friendly approach to data management, making it accessible to users with varying levels of technical expertise [11].

This paper explores the potential of generative AI, with a focus on ChatGPT, in transforming data quality. We critically evaluate whether these interfaces can be relied upon to enhance data quality. The paper includes an analysis of GPT-4 and ChatGPT's effectiveness in error correction, data consistency validation, and metadata enhancement, supported by quantitative results and case studies.

The implications of this research are profound. Demonstrating that generative AI can reliably improve data quality could revolutionize data management practices, leading to higher accuracy and efficiency while reducing reliance on manual processes. Furthermore, the scalability of AI-driven solutions could enable more effective management of larger datasets, addressing the increasing demand for high-quality data.

1 German Centre for Higher Education Research and Science Studies (DZHW), 10117 Berlin, Germany. Email: azeroual@dzhw.eu


In conclusion, this paper provides a thorough evaluation of generative AI and ChatGPT's capabilities in enhancing data quality. By establishing their reliability, we aim to support the broader adoption of these technologies in data management, contributing to more accurate, efficient, and reliable data systems.

2. Background and literature review

2.1. Data quality (DQ): definition, importance, and challenges

Data quality (DQ) refers to the condition of data based on factors such as accuracy, completeness, reliability, relevance, and timeliness [12]. High-quality data are essential for various organizational activities, including decision-making, operational processes, and strategic planning [13]. Accurate and reliable data ensure that decisions are based on factual information, leading to better outcomes. They support operational efficiency by reducing the likelihood of errors and the need for rework. Additionally, many industries are subject to regulatory requirements that necessitate precise and comprehensive data reporting. High-quality data also enhance customer satisfaction by providing accurate information and timely responses to inquiries [14].

Despite its critical importance, maintaining high-quality data presents several challenges. Manual data entry can introduce errors such as typographical mistakes, misclassifications, and omissions [15]. Integrating data from multiple sources often leads to inconsistencies and duplicate records. Data can quickly become outdated, necessitating regular updates to maintain their relevance and accuracy [16]. Complex data structures, especially unstructured data like text and images, pose difficulties in standardization and validation. As data volumes grow, the task of maintaining quality becomes increasingly challenging due to the sheer amount of data that need to be processed.

2.2. Traditional data cleaning and validation methods: overview, limitations, and need for improvement

Traditional methods for data cleaning and validation typically involve a combination of manual processes and rule-based automated systems [17]. Manual review requires data stewards to inspect datasets and identify and correct errors. This process relies heavily on human expertise and is labor-intensive. Rule-based systems use predefined rules to identify anomalies and validate data. Common rules include format checks, range checks, and consistency checks. Deduplication processes identify and merge duplicate records to ensure a single version of the truth. Standardization converts data into a common format or structure to facilitate consistency and comparison.
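To make the scope of these rule-based checks concrete, the short sketch below applies typical format, range, and duplicate rules with pandas. The column names, thresholds, and sample values are illustrative assumptions rather than data drawn from this study.

```python
import pandas as pd

# Hypothetical records; column names and rules are illustrative only.
df = pd.DataFrame({
    "record_id": [1, 2, 2, 3],
    "birth_date": ["1990-05-01", "05/02/1991", "05/02/1991", "2190-01-01"],
    "amount": [120.0, -5.0, -5.0, 300.0],
})

# Format check: dates must parse as ISO 8601 (YYYY-MM-DD).
parsed = pd.to_datetime(df["birth_date"], format="%Y-%m-%d", errors="coerce")
format_errors = df[parsed.isna()]

# Range check: amounts must be non-negative and birth years plausible.
range_errors = df[(df["amount"] < 0) | (parsed.dt.year > 2024)]

# Deduplication: flag exact duplicate rows.
duplicate_rows = df[df.duplicated()]

print(format_errors, range_errors, duplicate_rows, sep="\n\n")
```

Checks of this kind are fast and transparent, but every new error type requires another hand-written rule, which is precisely the limitation discussed next.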
While these traditional methods have been foundational in data quality management, they have several limitations. Manual processes are not scalable and become impractical as data volumes increase [18]. Manual review and correction are time-consuming, leading to delays in data availability. Human intervention introduces variability, with different data stewards potentially applying different standards and practices. Rule-based systems can be rigid and may not adapt well to changing data patterns or new types of errors. Traditional methods often struggle with unstructured data such as text, images, and videos, which are increasingly prevalent in modern datasets [19].

The increasing volume, variety, and velocity of data highlight the need for more advanced and scalable solutions for data quality management [20]. The limitations of traditional methods underscore the necessity for innovative approaches that can automate and enhance data cleaning and validation processes while maintaining high levels of accuracy and consistency [21].

2.3. Generative AI and GPT-4: introduction and applications in data processing

Generative AI refers to a class of artificial intelligence models designed to generate new data instances that resemble a given training dataset [22]. These models can create text, images, audio, and other types of data, making them highly versatile tools for various applications. Generative AI encompasses a broad range of models and techniques, each with its specific capabilities and use cases.

2.3.1. GPT-4 overview

One of the most prominent generative AI models is the Generative Pre-trained Transformer (GPT) series developed by OpenAI [7]. GPT-4, the fourth iteration of the GPT model, represents a significant advancement in this series. Unlike general generative AI models that might focus on different types of data or tasks, GPT-4 is specifically designed for advanced natural language understanding and generation [23]. Its architecture enables it to process and generate coherent, contextually relevant text based on large datasets. This specialization makes GPT-4 particularly effective for applications involving complex language tasks.

2.3.2. Clarification of GPT-4 and ChatGPT

It is important to differentiate between GPT-4 and its variant ChatGPT to fully understand their applications and capabilities. GPT-4 encompasses a broad range of models within the GPT framework, each optimized for different types of tasks and data processing needs. ChatGPT, a conversational variant of GPT-4, is specifically tailored for interactive dialogue and context-aware communication [7]. While GPT-4 as a general model can perform a variety of language-related tasks, ChatGPT is designed to excel in generating human-like conversational responses.

2.3.3. Applications in data processing

Generative AI, particularly GPT-4, has found numerous applications in data processing, including error correction, data validation, metadata generation, and more (see Table 1). The capabilities of GPT-4 are harnessed to perform several key functions:

1. Error Identification and Correction: GPT-4 can analyze datasets to identify and correct errors by understanding the context and providing accurate suggestions. This capability is enhanced by its ability to generate human-like text that aligns with the intended data structure.

2. Data Validation: GPT-4 can validate data against predefined standards, flagging inconsistencies or anomalies with a high degree of accuracy. This function is crucial for ensuring data integrity and compliance with quality standards.

3. Metadata Generation and Enrichment: GPT-4 can automatically generate and enrich metadata, improving data organization and retrieval. By understanding the content, GPT-4 can create meaningful metadata that enhance data accessibility.


4. Text Summarization and Insight Extraction: GPT-4 excels in summarizing large volumes of text data, extracting key information, and providing insights. This ability is useful for managing and analyzing extensive datasets efficiently.

5. Natural Language Processing (NLP) Applications: GPT-4's advanced NLP capabilities enable applications such as sentiment analysis, language translation, and content generation [24]. These applications leverage GPT-4's proficiency in handling nuanced language tasks.

2.3.4. Comparison and clarification

While GPT-4 provides a broad range of capabilities in natural language understanding and generation, ChatGPT, a variant of GPT-4, is optimized for conversational interactions. This distinction highlights that GPT-4 encompasses a diverse group of language models, each suited to different tasks. ChatGPT's design focuses specifically on engaging in dialogues and providing contextually appropriate responses, differentiating it from the more generalized GPT-4 model [7].

2.3.5. Transformative potential in data quality management

Both GPT-4 and ChatGPT offer transformative potential for data quality management. By automating and enhancing traditional data cleaning, validation, and metadata management processes, these AI models address the limitations of conventional methods. Organizations can achieve higher levels of data accuracy, consistency, and completeness by leveraging GPT-4's advanced text generation capabilities and ChatGPT's conversational strengths [7].

3. Generative AI for data quality enhancement

Figure 1 illustrates the comprehensive process flow of how generative AI, particularly models like GPT-4, can be utilized to enhance data quality. This process not only improves data integrity but also enhances the overall reliability and usability of the data. The process is divided into three main components: error detection and correction, data validation, and metadata enhancement, which are described below.

3.1. Error detection and correction

Generative AI models, particularly GPT-4, have shown substantial promise in revolutionizing error detection and correction in data management. These models leverage advanced natural language processing (NLP) capabilities to identify and rectify errors by understanding the context and semantics of data. This process is crucial for maintaining high data quality, which is essential for accurate analysis and decision-making.

• Identification of Data Errors

GPT-4 identifies data errors by comparing each entry against an extensive training dataset and recognizing patterns that deviate from established norms. The model's ability to understand the context and structure of the data allows it to detect typographical mistakes, inconsistent formatting, and logical inconsistencies. For example, in a dataset containing date entries in multiple formats, GPT-4 can identify these inconsistencies by comparing each entry to standard date formats and analyzing the surrounding data context. Additionally, GPT-4 can detect logical errors such as mismatched or contradictory information by cross-referencing different parts of the dataset. Empirical evidence shows that GPT-4's contextual analysis can reduce error rates significantly. For instance, in a case study [25, 26], GPT-4 identified discrepancies in patient records, such as mismatched age and birthdate fields, by cross-referencing the data with other entries and external databases, leading to a 30% reduction in data entry errors [27].

• Correction of Data Errors

Once errors are identified, GPT-4 generates correction suggestions based on contextual understanding. For instance, if a numerical entry falls outside the expected range, the model can infer the correct value by analyzing similar entries or applying predefined business rules. The correction process involves generating potential solutions and evaluating their fit within the dataset's context. In a practical example, GPT-4 was used to enhance data quality in a healthcare dataset [25, 26], where the model identified and corrected discrepancies, resulting in improved accuracy of patient records. Quantitative results indicate a significant reduction in data entry errors and an enhancement in data accuracy [10].
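The following minimal sketch illustrates how such prompt-based error identification and correction might be wired up. It assumes the OpenAI Python client (openai >= 1.0) and access to a GPT-4 model; the record fields, prompt wording, and model name are illustrative assumptions rather than the exact configuration evaluated in this paper.

```python
import json
from openai import OpenAI  # assumes the openai package (>= 1.0) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical patient-style records with deliberate inconsistencies.
records = [
    {"id": 1, "birth_date": "1985-03-12", "age": 62},
    {"id": 2, "birth_date": "12/07/1990", "age": 34},
]

prompt = (
    "You are a data quality assistant. For each record, report errors "
    "(inconsistent date formats, age/birth_date mismatches) and propose a "
    "corrected record. Answer as JSON with fields 'id', 'issues', 'corrected'.\n"
    f"Records: {json.dumps(records)}"
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output is preferable for cleaning tasks
)

suggestions = response.choices[0].message.content
print(suggestions)  # review the suggestions before applying them to the dataset
```

In practice, suggestions returned this way would be re-validated, for example against the rule-based checks of Section 2.2, before being written back, since model output is treated here as a proposal rather than an authoritative correction.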
3.2. Data validation

Generative AI ensures data consistency and accuracy through advanced validation techniques. GPT-4 performs various checks to confirm that data adhere to predefined standards and formats, which is crucial for maintaining data integrity.

• Ensuring Data Consistency

GPT-4 validates data consistency by ensuring that all entries conform to uniform standards. This involves verifying that data entries, such as dates, follow a consistent format and that numerical values fall within acceptable ranges. In financial datasets, for example, GPT-4 ensures that all monetary values are correctly formatted and in the appropriate currency [28, 29]. By maintaining uniform data formats and logical coherence, the model enhances overall data integrity. Consistency checks can also be applied to categorical data, ensuring that all entries fall within predefined categories.

• Ensuring Data Accuracy

To ensure data accuracy, GPT-4 uses cross-referencing techniques to compare data entries with external sources. For example, in research databases, GPT-4 can verify author names, publication dates, and journal titles against trusted external databases such as PubMed or CrossRef [30]. This cross-referencing process helps identify and correct discrepancies, thereby enhancing the reliability of the data. A notable case study in the financial sector demonstrated GPT-4's validation capabilities [31, 32]. The model cross-referenced transaction records with external banking databases to verify accuracy, uncovering and correcting duplicated transactions and incorrect amounts, which led to more accurate financial reporting. Quantitative results showed a notable improvement in data accuracy and reliability [33].
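As a simplified illustration of this cross-referencing step, the sketch below checks a bibliographic record against the public CrossRef REST API. The record values and the exact-match rule are illustrative assumptions; a production validator would add retries, rate limiting, and fuzzier matching.

```python
import requests

def validate_against_crossref(record: dict) -> list[str]:
    """Compare a local bibliographic record with CrossRef metadata for its DOI."""
    url = f"https://api.crossref.org/works/{record['doi']}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    meta = resp.json()["message"]

    issues = []
    # Title check (case-insensitive exact match; real systems use fuzzy matching).
    crossref_title = (meta.get("title") or [""])[0]
    if crossref_title.strip().lower() != record["title"].strip().lower():
        issues.append(f"title mismatch: CrossRef says '{crossref_title}'")

    # Publication year check via the 'issued' date parts.
    year = meta.get("issued", {}).get("date-parts", [[None]])[0][0]
    if year and year != record["year"]:
        issues.append(f"year mismatch: CrossRef says {year}")
    return issues

# Record to validate (reference [2] of this paper, used here as an example).
record = {"doi": "10.1145/1541880.1541883",
          "title": "Methodologies for data quality assessment and improvement",
          "year": 2009}
print(validate_against_crossref(record) or "record consistent with CrossRef")
```

The same pattern applies to the financial example above, with a banking data source taking the place of CrossRef as the trusted reference.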


Table 1 • Summary of generative AI applications and performance metrics.

Error Identification and Correction. Models: GPT-3, GPT-4 (comparison: GPT-3 vs. GPT-4). Use case: Brown TB (2020), Language models are few-shot learners, arXiv:2005.14165. Training and test data: diverse text corpora. Quantitative results: GPT-4 outperforms GPT-3 in certain tasks. Remarks and open problems: discussion on model scaling.

Data Validation. Model: GPT-4. Use case: Nori H, King N, McKinney SM, Carignan D, Horvitz E (2023), Capabilities of GPT-4 on medical challenge problems, arXiv:2303.13375. Training and test data: benchmark data. Quantitative results: high accuracy in validation. Remarks and open problems: challenges with large datasets.

Metadata Generation and Enrichment. Model: GPT-4. Use case: Angelici P, Enhancing documents review through knowledge graphs and large language models (doctoral dissertation). Training and test data: text corpora. Quantitative results: improved metadata quality. Remarks and open problems: applications in document management.

Text Summarization. Model: GPT-4. Use case: Kumar J (2023), Large language models for text summarization: a comprehensive study, Pranjana: The Journal of Management Awareness, 26(1-2), 113-124. Training and test data: summarization datasets. Quantitative results: effective summarization of large texts. Remarks and open problems: comparison of summarization capabilities.

NLP Applications. Models: GPT-4, ChatGPT (comparison: GPT-4 vs. ChatGPT). Use case: Liu Y, Han T, Ma S, Zhang J, Yang Y, Tian J, et al. (2023), Summary of ChatGPT-related research and perspective towards the future of large language models, Meta-Radiology, 100017. Training and test data: diverse NLP datasets. Quantitative results: advances in sentiment analysis and translation. Remarks and open problems: conversational strengths of ChatGPT.

Figure 1 • Process flow for data quality enhancement using generative AI.


3.3. Metadata enhancement

Metadata play a crucial role in data organization and retrieval. ChatGPT, a conversational variant of GPT-4, excels in enhancing metadata by correcting, completing, and enriching them. By improving the quality of metadata, these models make data more accessible and useful, facilitating better data management and utilization.

• Correction of Metadata

ChatGPT corrects metadata errors by identifying and rectifying inaccuracies based on the content's context. For example, if metadata include incorrect author names or publication dates, ChatGPT can suggest corrections by analyzing the document and its context. This improves the accuracy and reliability of metadata records. In practice, ChatGPT has been used to correct metadata in academic research databases, ensuring that critical fields like author names and publication dates are accurate [34, 35]. The model's contextual understanding allows it to make informed corrections, such as distinguishing between similarly named authors or accurately interpreting abbreviated publication dates.

• Completion of Metadata

ChatGPT also infers and completes missing metadata fields by analyzing the available content. For instance, it can generate relevant keywords for datasets that lack this information. In a library database, ChatGPT was used to complete missing metadata for thousands of books, significantly enhancing the searchability of the database by adding pertinent keywords and descriptors [36]. The model's inferential capabilities enable it to generate plausible and contextually appropriate metadata, even when explicit information is missing. This includes generating abstracts for research articles or deducing the subject matter of books based on their content.

• Enrichment of Metadata

Beyond correcting and completing metadata, ChatGPT enriches metadata by adding relevant details that enhance data accessibility and usability. For example, in a research database, ChatGPT can add abstracts, keywords, and related research topics to each entry. This enrichment process makes the data more accessible and useful for researchers and other users. A notable case study in an academic research database highlighted ChatGPT's effectiveness. The model corrected existing metadata errors, completed missing fields, and enriched entries with abstracts and keywords, thereby improving the overall quality and utility of the database and making it easier for researchers to find relevant articles.
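A minimal sketch of this completion and enrichment workflow is shown below, again assuming the OpenAI Python client. The record structure, prompt, and model name are illustrative, and generated abstracts or keywords would normally be reviewed by a curator before being stored.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical catalogue entry with missing metadata fields.
entry = {
    "title": "Can generative AI transform data quality?",
    "abstract": None,     # missing: to be inferred from the full text
    "keywords": [],       # missing: to be suggested by the model
    "full_text": "...",   # truncated here; the real document text would go in
}

prompt = (
    "Complete the missing metadata for this document. Return JSON with "
    "'abstract' (max 80 words) and 'keywords' (5 terms), based only on the "
    f"given text.\nTitle: {entry['title']}\nText: {entry['full_text']}"
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; a ChatGPT-style conversational model also works
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)

print(response.choices[0].message.content)  # proposed abstract and keywords
```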
These capabilities demonstrate how generative AI, particularly GPT-4 and ChatGPT, can significantly enhance data quality through advanced error detection and correction, rigorous data validation, and comprehensive metadata enhancement.

4. Implementation strategies

The integration of generative AI models, such as GPT-4, into existing data management systems requires a structured approach to ensure seamless operation, scalability, and efficiency. This section outlines the key strategies for successful implementation, focusing on integration with current systems, scalability and efficiency, and detailed case studies of successful applications across various sectors.

4.1. Integration with existing systems

Integrating generative AI models into current data management systems necessitates a thorough understanding of both the technical requirements and the best practices for implementation. The process begins with a detailed technical assessment of the existing infrastructure to evaluate its compatibility with the AI models, including the system architecture and data workflows. Key technical requirements include robust computing power, sufficient storage capacity, and high-speed internet connectivity to support the large-scale data processing capabilities of models like GPT-4 [7].

To facilitate integration, it is crucial to develop APIs (Application Programming Interfaces) that allow for seamless communication between the AI models and the existing systems. These APIs should support bidirectional data flow and real-time processing to maximize efficiency, acting as bridges that enable data exchange and functionality extension without significant overhauls of the current infrastructure. Additionally, implementing middleware solutions can help manage the data flow and ensure that the AI models can access and process data efficiently. Middleware can also provide logging and error-handling capabilities to improve system robustness [37].

Best practices for integration include the use of modular architecture, which allows different components of the system to be updated or replaced without affecting the entire system, together with version control and rollback mechanisms to handle updates and changes seamlessly. This approach promotes flexibility and adaptability, essential for integrating advanced AI technologies. Regular monitoring and maintenance are also critical to address any performance issues promptly and ensure the AI models operate optimally. Quantitative performance metrics should be established to continuously evaluate the AI system's impact on data quality and processing efficiency [38].
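One way to realize the API-and-middleware pattern described above is a thin service layer between the data management system and the model that adds logging, a request timeout, and retries with backoff. The sketch below is a minimal illustration under those assumptions, not a prescribed architecture.

```python
import logging
import time
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dq-middleware")
client = OpenAI(timeout=30.0)  # client-level request timeout

def clean_record(record: dict, retries: int = 3) -> str:
    """Middleware-style wrapper: forwards a record to the model with
    logging and simple retry/backoff, and returns the raw suggestion."""
    prompt = f"Suggest corrections for this record as JSON: {record}"
    for attempt in range(1, retries + 1):
        try:
            log.info("cleaning record %s (attempt %d)", record.get("id"), attempt)
            response = client.chat.completions.create(
                model="gpt-4",  # illustrative model name
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as exc:  # error handling lives in the middleware layer
            log.warning("attempt %d failed: %s", attempt, exc)
            time.sleep(2 ** attempt)  # exponential backoff
    raise RuntimeError("record cleaning failed after retries")

print(clean_record({"id": 7, "birth_date": "31/02/1999"}))
```

Such a wrapper is also a natural place to record the quantitative performance metrics mentioned above, since every request and failure passes through it.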
4.2. Scalability and efficiency

Generative AI models offer significant benefits in terms of scalability and efficiency, particularly when handling large datasets. These models can process vast amounts of data quickly, improving processing times and enabling real-time data analysis. Scalability can be further enhanced by employing parallel processing techniques and optimizing algorithms to handle increasing data loads effectively. The scalability of AI-driven solutions ensures that as data volumes grow, the systems can expand to accommodate increased demand without compromising performance.

One of the primary advantages of AI-driven data management is the ability to automate repetitive tasks, such as data cleaning and validation. Automated workflows can be designed to handle various data formats and types, reducing manual intervention. This automation not only reduces the workload on human operators but also minimizes the potential for human error, leading to more accurate and reliable data. Additionally, AI models can continuously learn and adapt, enhancing their efficiency over time as they process more data and refine their algorithms.

To achieve scalability, it is essential to leverage cloud-based solutions that provide the necessary computational resources on demand. Cloud solutions should be selected based on their ability to handle large-scale AI tasks and their integration capabilities with existing systems. Cloud platforms offer flexibility in resource allocation, allowing organizations to scale their AI models according to their needs without significant upfront investments in hardware [39]. Furthermore, distributed computing techniques can be employed to divide the data processing tasks across multiple machines, further enhancing scalability and efficiency. Empirical studies have shown that distributed computing can improve processing efficiency by up to 40% in large-scale data environments [40].
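The parallel-processing point can be illustrated with a small batching sketch: records are cleaned concurrently with a thread pool, which suits I/O-bound model calls. Here clean_record is a local stand-in for the model-backed wrapper sketched in Section 4.1, and the batch size and worker count are arbitrary example values.

```python
from concurrent.futures import ThreadPoolExecutor

def clean_record(record: dict) -> str:
    """Placeholder for the model-backed cleaning call sketched in Section 4.1."""
    return f"checked record {record['id']}"

def clean_batch(records: list[dict], max_workers: int = 8) -> list[str]:
    # Model calls are I/O-bound, so a thread pool is a simple way to
    # overlap many requests and raise throughput.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(clean_record, records))

# Process a larger dataset in fixed-size batches.
dataset = [{"id": i, "value": i % 7} for i in range(1_000)]
batch_size = 100
results = []
for start in range(0, len(dataset), batch_size):
    results.extend(clean_batch(dataset[start:start + batch_size]))
print(len(results), "records processed")
```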


4.3. Case studies

1. Healthcare Sector: In the healthcare sector, a prominent case study involves the integration of GPT-4 into a hospital's electronic health record (EHR) system [41]. The AI model was used to enhance data quality by identifying and correcting discrepancies in patient records, for example detecting inconsistencies between recorded ages and birthdates by cross-referencing with external medical databases. Quantitative results from this integration showed a 30% reduction in data entry errors and a 25% increase in the accuracy of patient records, facilitating better patient care.

2. Financial Sector: In the financial sector, a leading bank implemented GPT-4 to streamline its transaction validation process [42]. The model was integrated with the bank's transaction processing system to validate transaction records against external banking databases, helping to identify and correct duplicated transactions and incorrect amounts and enhancing the accuracy of financial reporting. The implementation led to a 20% improvement in transaction validation speed and a 15% reduction in errors, significantly reducing the time required for validation and improving operational efficiency.

3. Research and Academia: In academic research, GPT-4 was integrated into a university's research database to enhance metadata quality [5]. The model corrected errors in metadata entries, completed missing fields, and enriched the metadata with additional details such as abstracts and keywords. This integration resulted in a 40% increase in metadata accuracy and a 50% improvement in the discoverability of relevant research articles, making the database more accessible and useful for researchers and improving the overall user experience.

These case studies demonstrate the transformative potential of generative AI models in enhancing data quality across various sectors. By integrating AI-driven solutions into existing systems, organizations can achieve significant improvements in data accuracy, processing efficiency, and scalability, ultimately leading to better decision-making and operational outcomes. The incorporation of quantitative results and detailed examples highlights the practical benefits and effectiveness of generative AI in real-world applications.
5. Evaluation of generative AI solutions

The evaluation of generative AI solutions for data quality enhancement requires a multi-faceted approach. This section explores the critical performance metrics used to assess the effectiveness of AI models, compares AI-driven methods with traditional techniques, and delves into user feedback and interaction to understand the end-user experience.

5.1. Performance metrics

Evaluating the effectiveness of generative AI models in improving data quality involves several key performance metrics. These metrics provide quantitative measures to assess the AI models' impact on data accuracy, consistency, and overall quality; a sketch of how they can be computed follows the list.

1. Error Reduction Rates: One of the primary metrics is the error reduction rate, which measures the decrease in data errors after applying AI-driven solutions. This metric is crucial for understanding how effectively AI models identify and correct data inaccuracies. To provide a clearer picture, it is helpful to report the percentage decrease in error rates and compare it with benchmarks from similar AI implementations. For instance, in a healthcare dataset, an error reduction rate might be measured by comparing the number of discrepancies in patient records before and after implementing the AI intervention.

2. Consistency Checks: Another essential metric is the consistency check rate, which evaluates an AI model's ability to ensure uniformity in data entries. This includes verifying that data formats, such as dates and numerical values, adhere to predefined standards. Quantifying the consistency check rate by reporting the percentage of data entries that conform to standard formats before and after AI implementation provides a more detailed assessment of performance. For example, a model's success in standardizing date formats across a large dataset can be quantified to gauge its consistency check rate.

3. Validation Accuracy: Validation accuracy assesses an AI model's capability to validate data against external sources accurately. This metric can be measured by the percentage of data entries correctly validated through cross-referencing with trusted databases. Including case-specific examples and numerical validation accuracy improvements can highlight an AI model's effectiveness. In a financial context, this might involve verifying transaction records against banking databases and calculating the proportion of accurately validated entries [43].

4. Processing Time: The efficiency of AI models is often evaluated by measuring the processing time required for data cleaning and validation tasks. Reduced processing times indicate higher efficiency, which is particularly important when dealing with large datasets. Documenting processing time reductions with specific before-and-after metrics helps illustrate the efficiency gains achieved through AI implementation and allows improvements in operational efficiency to be assessed.
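The four metrics above reduce to simple before-and-after counts, as in the sketch below; the numbers are invented for illustration and are not results from the case studies reported in this paper.

```python
def error_reduction_rate(errors_before: int, errors_after: int) -> float:
    """Percentage decrease in detected errors after the AI intervention."""
    return 100.0 * (errors_before - errors_after) / errors_before

def consistency_check_rate(conforming: int, total: int) -> float:
    """Share of entries that conform to the predefined format standard."""
    return 100.0 * conforming / total

def validation_accuracy(correctly_validated: int, cross_referenced: int) -> float:
    """Share of cross-referenced entries whose values matched the trusted source."""
    return 100.0 * correctly_validated / cross_referenced

# Illustrative numbers only.
print(f"error reduction:     {error_reduction_rate(500, 350):.1f}%")
print(f"consistency (after): {consistency_check_rate(9_420, 10_000):.1f}%")
print(f"validation accuracy: {validation_accuracy(970, 1_000):.1f}%")

# Processing time is compared the same way: elapsed seconds before vs. after.
before_s, after_s = 5_400.0, 900.0
print(f"processing time reduced by {100.0 * (before_s - after_s) / before_s:.0f}%")
```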


5.2. Comparison with traditional methods

To fully understand the advantages of AI-driven methods, it is essential to compare them with traditional data cleaning and validation techniques. This comparison involves assessing accuracy, efficiency, and reliability.

1. Accuracy: Traditional methods often rely on manual data entry and validation, which are prone to human error. AI-driven methods, leveraging sophisticated algorithms, significantly enhance accuracy by automating error detection and correction processes. Empirical studies have demonstrated that AI models like GPT-4 can achieve accuracy improvements of up to 25% in identifying and correcting data errors compared to manual methods [44].

2. Efficiency: Traditional data quality processes can be time-consuming and labor-intensive. AI-driven solutions offer substantial improvements in efficiency by automating repetitive tasks and processing large volumes of data in a fraction of the time. Quantitative comparisons showing that AI models can reduce task completion time by 70% or more compared to traditional methods underline their efficiency [45]. For instance, while manual validation of financial transactions might take hours or days, AI models can complete the same task in minutes, leading to significant time savings.

3. Reliability: The reliability of AI-driven methods surpasses that of traditional techniques due to their ability to consistently apply predefined rules and standards, supported by extensive testing and validation to minimize errors [46]. AI models do not suffer from fatigue or cognitive biases, which can affect human operators. Consequently, the reliability of data processed by AI models is typically higher, ensuring consistent data quality over time.
5.3. User feedback and interaction

Understanding how end users perceive AI-enhanced data quality processes is crucial for evaluating the overall success of generative AI solutions. User feedback provides insights into the usability and trustworthiness of these advanced technologies.

1. Ease of Use: Users generally appreciate the ease of use provided by AI-driven data quality tools. User experience studies can provide quantitative data on how much time and effort are saved through automation, highlighting the benefits of reduced manual intervention. The automation of complex tasks allows users to focus on more strategic activities, and user interfaces designed for these tools often feature intuitive workflows that simplify the data management process, contributing to positive user experiences.

2. Trustworthiness: Trust in AI-enhanced data quality processes is built on the consistent delivery of accurate and reliable results. Users tend to trust AI solutions when they observe significant improvements in data quality and reduced error rates. Transparency in an AI model's decision-making process, such as detailed reports on correction algorithms and error handling or explanations for the corrections made, can further enhance user trust.

3. Feedback Mechanisms: Incorporating user feedback mechanisms into AI-driven data quality systems allows for continuous improvement. Users can report issues, suggest enhancements, and share their experiences, which can be used to refine and optimize AI models. This iterative feedback loop ensures that AI solutions remain aligned with user needs and expectations.

In conclusion, evaluating generative AI solutions for data quality enhancement involves assessing performance metrics, comparing AI-driven methods with traditional techniques, and considering user feedback and interaction. These comprehensive evaluation strategies ensure that AI models not only improve data accuracy and efficiency but also gain user acceptance and trust, ultimately leading to more robust and reliable data management practices.

6. Challenges and considerations

Implementing generative AI models for data quality enhancement presents several challenges and considerations that need to be carefully addressed. These include the intricacies of model training and customization, data privacy and security concerns, and the ethical implications of automating data quality management.

Training generative AI models for specific datasets is a complex task that requires significant expertise and resources. Each dataset has unique characteristics and idiosyncrasies that must be understood and incorporated into the training process. This customization is essential to ensure that the AI model can effectively identify and correct errors within the context of the specific industry it is applied to. For instance, healthcare datasets may require domain-specific adjustments to recognize medical terminologies and patient record formats, while financial datasets might need fine-tuning to detect transaction anomalies and fraud indicators. Tailoring the model to these specific needs involves extensive training on relevant data, which can be time-consuming and resource-intensive. Additionally, the dynamic nature of data means that models must be continuously updated and retrained to maintain their effectiveness as new data patterns emerge; continuous learning mechanisms and automated retraining schedules can help address this challenge and ensure models stay current with evolving data trends.

Data privacy and security are paramount concerns when using AI for data processing. The integration of AI models into data management systems necessitates the handling of potentially sensitive information, so ensuring that data privacy is maintained throughout the process is critical. This includes implementing robust encryption methods and secure data storage solutions to protect against unauthorized access and breaches. Using privacy-preserving techniques such as differential privacy or federated learning can further enhance data protection while still leveraging the power of AI. Moreover, the AI models themselves must be designed to comply with data protection regulations, such as the General Data Protection Regulation (GDPR) or the Health Insurance Portability and Accountability Act (HIPAA), depending on the industry. Ensuring compliance with these regulations requires a thorough understanding of legal requirements and the implementation of rigorous security protocols. Additionally, transparency in how data are processed and used by AI models can help build trust with stakeholders and ensure that privacy concerns are adequately addressed.
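As one deliberately simple illustration of data minimization, the sketch below pseudonymizes direct identifiers before a record leaves the local environment. The field names and hashing scheme are assumptions, and this step complements rather than replaces the encryption, access controls, and regulatory compliance measures named above.

```python
import hashlib

SENSITIVE_FIELDS = {"name", "email", "ssn"}  # illustrative identifier fields

def pseudonymize(record: dict, salt: str) -> dict:
    """Replace direct identifiers with salted hashes before external processing."""
    masked = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value is not None:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
            masked[key] = f"pseudo_{digest}"
        else:
            masked[key] = value
    return masked

record = {"name": "Jane Doe", "email": "jane@example.org",
          "birth_date": "1985-03-12", "age": 62}
print(pseudonymize(record, salt="rotate-me-regularly"))
# Only the masked record would be sent to the model; the mapping back to the
# original identifiers stays inside the organization's secure environment.
```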


The ethical implications of automating data quality management also warrant careful consideration. One significant ethical concern is the potential for bias in AI models. Bias can arise from various sources, including biased training data or inherent biases in the algorithms used. If not properly addressed, this bias can lead to unequal treatment or erroneous data corrections, which can have far-reaching consequences, especially in sensitive fields like healthcare or criminal justice. To mitigate these risks, it is essential to incorporate strategies for detecting, mitigating, and correcting bias throughout the AI lifecycle, including diverse and representative training datasets, regular audits of model performance, and the involvement of multidisciplinary teams in the development and deployment of AI solutions.

Another ethical consideration is the need for transparency in AI operations. Users and stakeholders should be informed about how AI models make decisions and corrections. This transparency can help build trust in AI-driven processes and ensure accountability. Providing explanations for AI-driven corrections, giving users mechanisms to query the model's decision-making process, and allowing them to review and override suggestions can enhance the overall reliability and acceptance of AI solutions.

In conclusion, while generative AI models hold great promise for improving data quality, their implementation involves several challenges and considerations. Effective model training and customization, robust data privacy and security measures, and addressing ethical implications are critical to the successful deployment of AI-driven data quality enhancement solutions. By carefully navigating these challenges, organizations can harness the full potential of generative AI to achieve high standards of data integrity and reliability.

7. Future directions

The future of generative AI in data quality enhancement is promising, with numerous advancements and broader applications on the horizon. This section explores potential developments in AI technology, the expansion of AI's role in data management, and areas for future research.

7.1. Advancements in AI technology

The rapid pace of AI development suggests significant potential for future enhancements in generative AI models. One area of advancement is the improvement in natural language understanding and generation capabilities. Future iterations of models like GPT-4 could possess an even deeper contextual understanding, enabling more precise error detection and correction. Improving the ability to handle complex queries and nuanced language will enhance a model's effectiveness in diverse data scenarios, allowing AI to better handle ambiguous data and make more nuanced corrections.

Additionally, advancements in multi-modal AI, which integrates text, images, and potentially audio and video, could revolutionize how generative AI models approach data quality. For example, integrating visual data analysis could help correct errors in datasets that include images or other non-textual elements. This multi-modal approach could broaden the applicability of AI in fields such as healthcare, where medical images and patient records need to be analyzed concurrently.

Another promising development is the integration of reinforcement learning with generative AI models. Reinforcement learning could enable models to learn from their corrections over time, continually improving their performance through feedback loops. By creating adaptive systems that refine their algorithms based on real-world feedback, AI can achieve higher levels of accuracy and efficiency in maintaining data quality across diverse and dynamic datasets.

7.2. Broader applications

Generative AI's potential extends beyond traditional data quality tasks. One promising application is in the realm of predictive analytics. By enhancing the quality of historical data, generative AI can improve the accuracy of predictive models used in various industries, such as finance and healthcare. High-quality input data are crucial for reliable predictions, and generative AI can play a key role in ensuring this foundational accuracy.

Another area is in automating data integration processes. Many organizations struggle with integrating data from multiple sources, often leading to inconsistencies and errors. Generative AI can streamline integration by harmonizing disparate data sources, automatically reconciling discrepancies, and ensuring consistent data formats, reducing the manual effort involved. This capability can significantly enhance the efficiency of data warehousing and business intelligence operations.

Furthermore, generative AI can be applied to enhance the quality of real-time data streams. In industries such as telecommunications and IoT, where real-time data are critical, AI-driven error detection can identify and correct errors on the fly, promptly addressing issues as they arise and ensuring that decision-making processes are based on accurate and reliable data.


7.3. Research opportunities

Despite this progress, there are numerous opportunities for further research in the field of generative AI and data quality enhancement. One key area is the development of more sophisticated algorithms that can handle the complexities of large-scale, heterogeneous datasets. Current AI models often struggle with the variability and scale of big data, so research focused on improving the scalability and robustness of these models will be essential for managing increasingly large and diverse data environments.

Another research opportunity lies in addressing the limitations related to AI bias. Developing methods to identify, mitigate, and prevent biases in AI models remains a critical area of study. Research into advanced bias detection algorithms and fairness-enhancing interventions can contribute to more equitable AI systems; this includes creating more diverse and representative training datasets as well as developing techniques to ensure fairness and transparency in AI-driven data quality processes.

Exploring the ethical implications of AI in data management also warrants further research. Investigating the impact of AI decisions on various stakeholder groups and developing ethical guidelines for AI deployment can help ensure responsible AI use as the technology becomes more integrated into everyday operations.

Lastly, interdisciplinary research that combines insights from computer science, data science, and domain-specific expertise can lead to more tailored and effective AI solutions. Collaborative efforts can help bridge the gap between theoretical advancements and practical applications, ensuring that generative AI models are well suited to address the unique challenges of different industries.

In conclusion, the future of generative AI in data quality enhancement is bright, with significant advancements, broader applications, and ample research opportunities on the horizon. Continued innovation and interdisciplinary collaboration will be key to unlocking the full potential of AI in transforming data management practices.

8. Conclusions

This paper has explored the transformative potential of generative AI, particularly interfaces like GPT-4 and ChatGPT, in enhancing data quality. Generative AI interfaces demonstrate significant capabilities in error detection and correction, data validation, and metadata enhancement. By leveraging advanced natural language processing and contextual analysis, these interfaces can identify and rectify errors, ensure data consistency, and enrich metadata, thus improving the overall reliability and usability of datasets. The successful implementation of these technologies, as evidenced by case studies across sectors such as healthcare and finance, underscores their effectiveness in driving substantial improvements in data quality and operational efficiency.

The practical implications of using generative AI and ChatGPT for data quality improvement are profound. Organizations can benefit from increased accuracy in their datasets, which directly impacts decision-making processes and operational efficiency. For instance, the automation of data cleaning and validation not only accelerates these processes but also reduces human error, allowing resources to focus on strategic decision-making and innovation. Additionally, the ability to cross-reference data with external sources and apply business rules ensures that the data remain consistent and accurate over time. This consistency is crucial for maintaining high data integrity and supporting reliable analytics, making generative AI an invaluable tool across various industries.

The transformative potential of generative AI and ChatGPT in the realm of data management cannot be overstated. These interfaces offer a scalable, efficient, and accurate approach to data quality management, addressing many of the limitations associated with traditional methods. By automating complex data processes, generative AI not only enhances the accuracy of data but also improves overall workflow efficiency and reduces operational costs. However, challenges such as model training, data privacy, and integration complexities must be carefully navigated to fully realize these benefits. Addressing them through robust development practices, stringent data protection measures, and seamless integration strategies is essential for maximizing the value of AI-driven solutions.

Generative AI, exemplified by interfaces like GPT-4 and ChatGPT, marks a significant leap forward in automating and enhancing data quality processes. These technologies not only streamline data management but also make data more accessible and reliable, thus empowering organizations to make better-informed decisions. As organizations continue to adopt these AI technologies, they will likely see improved data governance and more effective use of data-driven insights. The findings of this paper suggest that generative AI and ChatGPT can indeed transform data quality, providing powerful tools for researchers and practitioners in their quest for high-quality data.

In conclusion, the integration of generative AI technologies such as GPT-4 and ChatGPT represents a significant advancement in the quest for high-quality data. These interfaces offer scalable, efficient, and accurate solutions for data quality management, positioning them as essential tools for organizations striving to maintain reliable and valuable datasets in an increasingly data-driven world. The continued evolution and widespread adoption of these AI-driven processes will play a critical role in shaping the future of data management, ensuring that data remain a reliable foundation for innovation and progress.

Funding

The author declares no financial support for the research, authorship, or publication of this article.

Author contributions

The sole author, Otmane Azeroual, was responsible for all aspects of the research, including conceptualization, methodology, data collection, analysis, and the writing of the manuscript. The author has read and approved the final version of the manuscript.


Conflict of interest

The author declares no conflicts of interest.

Data availability statement

This study does not report any data.

Institutional review board statement

Not applicable.

Informed consent statement

Not applicable.

Additional information

Received: 2024-07-15
Accepted: 2024-10-24
Published: 2024-11-29

Academia Engineering papers should be cited as Academia Engineering 2024, ISSN 2994-7065, https://doi.org/10.20935/AcadEng7407. The journal's official abbreviation is Acad. Eng.

Publisher's note

Academia.edu Journals stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright

©2024 copyright by the author. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

References

1. McGilvray D. Executing data quality projects: Ten steps to quality data and trusted information (TM). Cambridge (MA): Academic Press; 2021.
2. Batini C, Cappiello C, Francalanci C, Maurino A. Methodologies for data quality assessment and improvement. ACM Comput Surv (CSUR). 2009;41(3):1–52. doi: 10.1145/1541880.1541883
3. Ridzuan F, Zainon WMNW. A review on data cleansing methods for big data. Procedia Comput Sci. 2019;161:731–38. doi: 10.1016/j.procs.2019.11.177
4. Chen CP, Zhang CY. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Inf Sci. 2014;275:314–47. doi: 10.1016/j.ins.2014.01.015
5. Sufi F. Generative pre-trained transformer (GPT) in research: A systematic review on data augmentation. Information. 2024;15(2):99. doi: 10.3390/info15020099
6. Bonner E, Lege R, Frazier E. Large language model-based artificial intelligence in the language classroom: Practical ideas for teaching. Teach Engl Technol. 2023;23(1):23–41. doi: 10.56297/BKAM1691/WIEO1749
7. Yenduri G, Ramalingam M, Selvi GC, Supriya Y, Srivastava G, Maddikunta PKR, et al. GPT (generative pre-trained transformer)–a comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. IEEE Access. 2024. doi: 10.1109/ACCESS.2024.3389497
8. Saka A, Taiwo R, Saka N, Salami BA, Ajayi S, Akande K, et al. GPT models in construction industry: Opportunities, limitations, and a use case validation. Dev Built Environ. 2023;100300. doi: 10.1016/j.dibe.2023.100300
9. Hassani H, Silva ES. The role of ChatGPT in data science: how AI-assisted conversational interfaces are revolutionizing the field. Big Data Cogn Comput. 2023;7(2):62. doi: 10.3390/bdcc7020062
10. Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, et al. GPT-4 technical report. arXiv preprint. 2023. arXiv:2303.08774. doi: 10.48550/arXiv.2303.08774
11. Atlas S. ChatGPT for higher education and professional development: A guide to conversational AI. [cited 2024-06-20]. Available from: https://digitalcommons.uri.edu/cgi/viewcontent.cgi?article=1547&context=cba_facpubs.
12. Sidi F, Panahy PH, Affendey LS, Jabar MA, Ibrahim H, Mustapha A. Data quality: A survey of data quality dimensions. In: 2012 International Conference on Information Retrieval & Knowledge Management. Piscataway (NJ): IEEE; 2012; p. 300–4. doi: 10.1109/InfRKM.2012.6204995
13. Ghasemaghaei M, Ebrahimi S, Hassanein K. Data analytics competency for improving firm decision making performance. J Strateg Inf Syst. 2018;27(1):101–113. doi: 10.1016/j.jsis.2017.10.001
14. Lee YW, Strong DM. Knowing-why about data processes and data quality. J Manag Inf Syst. 2003;20(3):13–39. doi: 10.1080/07421222.2003.11045775
15. Pannekoek J, Scholtus S, Van der Loo M. Automated and manual data editing: a view on process design and methodology. J Off Stat. 2013;29(4):511–537. doi: 10.2478/jos-2013-0038
16. Adadi A. A survey on data-efficient algorithms in big data era. J Big Data. 2021;8(1):24. doi: 10.1186/s40537-021-00419-9
17. Hosseinzadeh M, Azhir E, Ahmed OH, Ghafour MY, Ahmed SH, Rahmani AM, et al. Data cleansing mechanisms and approaches for big data analytics: a systematic study. J Ambient Intell Human Comput. 2023;1–13. doi: 10.1007/s12652-021-03590-2

18. Balusamy B, Kadry S, Gandomi AH. Big data: concepts, technology, and architecture. Hoboken (NJ): John Wiley & Sons; 2021.
19. Zadgaonkar A, Agrawal AJ. An approach for analyzing unstructured text data using topic modeling techniques for efficient information extraction. New Gen Comput. 2024;42(1):109–34. doi: 10.1007/s00354-023-00230-5
20. Taleb I, Serhani MA, Bouhaddioui C, Dssouli R. Big data quality framework: a holistic approach to continuous quality management. J Big Data. 2021;8(1):76. doi: 10.1186/s40537-021-00468-0
21. Aldoseri A, Al-Khalifa KN, Hamouda AM. Re-thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges. Appl Sci. 2023;13(12):7082. doi: 10.3390/app13127082
22. Harshvardhan GM, Gourisaria MK, Pandey M, Rautaray SS. A comprehensive survey and analysis of generative models in machine learning. Comput Sci Rev. 2020;38:100285. doi: 10.1016/j.cosrev.2020.100285
23. Aydın Ö, Karaarslan E. Is ChatGPT leading generative AI? What is beyond expectations? Acad Platform J Eng Smart Syst. 2023;11(3):118–34. doi: 10.21541/apjess.1293702
24. Obaid AJ, Bhushan B, Rajest SS, editors. Advanced applications of generative AI and natural language processing models. Hershey (PA): IGI Global; 2023.
25. Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on medical challenge problems. arXiv preprint. 2023. arXiv:2303.13375
26. Lahat A, Sharif K, Zoabi N, Shneor Patt Y, Sharif Y, Fisher L, et al. Assessing Generative Pre-trained Transformers (GPT) in clinical decision-making: comparative analysis of GPT-3.5 and GPT-4. J Med Internet Res. 2024;26:e54571. doi: 10.2196/54571
27. Sai S, Gaur A, Sai R, Chamola V, Guizani M, Rodrigues JJ. Generative AI for transformative healthcare: A comprehensive study of emerging models, applications, case studies and limitations. Piscataway (NJ): IEEE Access; 2024.
28. Fatouros G, Soldatos J, Kouroumali K, Makridis G, Kyriazis D. Transforming sentiment analysis in the financial domain with ChatGPT. Mach Learn Appl. 2023;14:100508. doi: 10.1016/j.mlwa.2023.100508
29. Yuan Z, Wang K, Zhu S, Yuan Y, Zhou J, Zhu Y, et al. FinLLMs: A framework for financial reasoning dataset generation with large language models. arXiv preprint. 2024. arXiv:2401.10744
30. Liu Y, Han T, Ma S, Zhang J, Yang Y, Tian J, et al. Summary of ChatGPT-related research and perspective towards the future of large language models. Meta-Radiology. 2023;100017. doi: 10.48550/arXiv.2304.01852
31. Bhatia G, Nagoudi EMB, Cavusoglu H, Abdul-Mageed M. Fintral: A family of GPT-4 level multimodal financial large language models. arXiv preprint. 2024. arXiv:2402.10986
32. Niszczota P, Abbas S. GPT has become financially literate: Insights from financial literacy tests of GPT and a preliminary test of how people use it as a source of advice. Finance Res Lett. 2023;58:104333. doi: 10.1016/j.frl.2023.104333
33. Woo H, Kim J, Lee W. Analysis of cross-referencing artificial intelligence topics based on sentence modeling. Appl Sci. 2020;10(11):3681. doi: 10.3390/app10113681
34. Nazarovets S, Teixeira da Silva JA. ChatGPT as an "author": Bibliometric analysis to assess the validity of authorship. Account Res. 2024;1–11. doi: 10.1080/08989621.2024.2345713
35. Shopovski J. Generative artificial intelligence, AI for scientific writing: A literature review. Preprints. 2024;2024060011. doi: 10.20944/preprints202406.0011.v1
36. Yang SQ, Mason S. Beyond the algorithm: Understanding how ChatGPT handles complex library queries. Internet Ref Serv Q. 2024;28(2):97–151. doi: 10.1080/10875301.2023.2291441
37. Dipsis N, Stathis K. A RESTful middleware for AI controlled sensors, actuators and smart devices. J Ambient Intell Human Comput. 2020;11(7):2963–86. doi: 10.1007/s12652-019-01439-3
38. Ahmed S. Performance evaluation and metrics: Advances in management science. Manag Sci Lett. 2024;2(1):39–50.
39. Rossi M, Russo G. Innovative solutions: Cloud computing and AI synergy in software engineering. MZ J Artif Intell. 2024;1(1):1–9 [cited 2024-08-13]. Available from: https://aarlj.com/index.php/AARLJ/article/view/15.
40. Sai S, Kanadia M, Chamola V. Empowering IoT with generative AI: Applications, case studies, and limitations. IEEE Internet Things Mag. 2024;7(3):38–43. doi: 10.1109/IOTM.001.2300246
41. Afshar M, Gao Y, Wills G, Wang J, Churpek MM, Westenberger CJ, et al. Prompt engineering GPT-4 to answer patient inquiries: A real-time implementation in the electronic health record across provider clinics. medRxiv. 2024;2024-01. doi: 10.1101/2024.01.23.24301692
42. Huang K, Chen X, Yang Y, Ponnapalli J, Huang G. ChatGPT in finance and banking. In: Beyond AI: ChatGPT, Web3, and the business landscape of tomorrow. Cham: Springer Nature Switzerland; 2023; p. 187–218. doi: 10.1007/978-3-031-45282-6_7
43. Mosteanu NR, Faccia A. Digital systems and new challenges of financial management–FinTech, XBRL, blockchain and cryptocurrencies. Qual Access Success. 2020;21(174):159–66.
44. Mandapuram M, Gutlapalli SS, Bodepudi A, Reddy M. Investigating the prospects of generative artificial intelligence. Asian J Humanit Art Lit. 2018;5(2):167–74. doi: 10.18034/ajhal.v5i2.659

45. Stadlmann C, Zehetner A. Human intelligence versus artificial intelligence: A comparison of traditional and AI-based methods for prospect generation. In: Marketing and Smart Technologies: Proceedings of ICMarkTech 2020. Singapore: Springer Singapore; 2021; p. 11–22.
46. Hong Y, Lian J, Xu L, Min J, Wang Y, Freeman LJ, Deng X. Statistical perspectives on reliability of artificial intelligence systems. Qual Eng. 2023;35(1):56–78. doi: 10.1080/08982112.2022.2089854