
XAI for All: Can Large Language Models Simplify Explainable AI?

Philip Mavrepisa , Georgios Makridisa , Georgios Fatourosa , Vasileios Koukosa , Maria Margarita Separdanib , Dimosthenis Kyriazisa
a University of Piraeus, Department of Digital Systems, Karaoli ke Dimitriou 80, Piraeus, 18534, Attica, Greece
b University of Piraeus, Department of Maritime Studies, Karaoli ke Dimitriou 80, Piraeus, 18534, Attica, Greece

Abstract
The field of Explainable Artificial Intelligence (XAI) often focuses on users with a strong technical background, making it challenging for non-experts to understand XAI methods. This paper presents "x-[plAIn]", a new approach to make XAI more accessible to a wider audience through a custom Large Language Model (LLM), developed using ChatGPT Builder. Our goal was to design a model that can generate clear, concise summaries of various XAI methods, tailored for different audiences, including business professionals and academics. The key feature of our model is its ability to adapt explanations to match each audience group's knowledge level and interests. Beyond this adaptability, our approach offers timely insights, facilitating the decision-making process for end users. Results from our use-case studies show that our model is effective in providing easy-to-understand, audience-specific explanations, regardless of the XAI method used. This adaptability improves the accessibility of XAI, bridging the gap between complex AI technologies and their practical applications. Our findings indicate a promising direction for LLMs in making advanced AI concepts more accessible to a diverse range of users.
Keywords: Explainable AI, Human-Centric Explainable AI, LLM, GPT Builder, Audience Analysis, XAI, AI

1. Introduction

In the contemporary epoch, frequently denoted as the Digital or Information Age, a characteristic feature is the proliferation of sophisticated computational systems generating copious data on a daily basis. This epoch is further defined by the digital metamorphosis occurring within industrial realms, culminating in the advent of the fourth industrial revolution, Industry 4.0 Makridis et al. (2020). The cornerstone of this revolutionary phase is AI, which stands as the pivotal facilitator of the Industry 4.0 paradigm, fostering the development of innovative tools and processes Soldatos and Kyriazis (2021). Simultaneously, there is escalating interest in XAI, which is oriented towards providing intelligible explanations for the inferences and choices formulated by machine learning algorithms.

The pivotal contribution is based on an innovative paradigm in the realm of human-centric XAI. This paper introduces a ground-breaking GPT-based LLM, serving as a versatile interface that empowers end-users to intuitively comprehend and interpret results derived from a multitude of XAI methodologies. This model is characterized by its:

1. Audience-Adaptive Explanations: The core achievement lies in its capability to produce concise, easily digestible summaries of complex XAI methods, specifically tailored to align with the varying expertise levels and interests of diverse audience groups, ranging from business professionals to academic researchers. This customization enhances user engagement and understanding across different sectors.

2. XAI Methodology Agnosticism: A unique attribute of this model is its agnostic approach to XAI methods. This design ensures broad applicability and relevance across a wide spectrum of XAI techniques and knowledge domains, without necessitating specific training or adaptation for each distinct method. This flexibility marks a significant advancement in the field of XAI.

3. Decision-Making Facilitation: The model's capacity to provide timely, clear, and contextually relevant explanations significantly augments decision-making processes for end-users. This aspect is particularly crucial in scenarios where comprehension of AI outputs is essential for critical decision-making but is hindered by the technical complexity of XAI outputs.

4. Empirical Validation through Use-Case Studies: The practical efficacy of this LLM is further underscored by empirical evidence gathered from use-case studies. These studies demonstrate the model's effectiveness in delivering audience-specific explanations that are comprehensible and relevant, thereby validating the model's applicability and impact in real-world scenarios.

In essence, this paper propels the field of XAI towards greater inclusivity and practicality by innovatively merging advanced AI concepts with user-friendly interfaces. This approach not only demystifies XAI for non-experts but also significantly contributes to the broader adoption and understanding of AI technologies in various professional contexts.

The remainder of the paper is organized as follows: Section 2 presents the background and the motivation of our research, while Section 3 delivers the literature review in the areas studied in this paper. Section 4 presents the proposed methodological approach, introduces the overall implementation, and offers details regarding the datasets used and the evaluation procedure. Section 5 dives deeper into the results of the conducted research and the corresponding survey. Section 6 concludes with recommendations for future research and the potential of the current study.

2. Background

Our research's underlying motivation is illuminated through an introduction to the foundational concepts of XAI, the distinction between explainability and interpretability, and the challenges of communicating AI concepts to non-expert audiences.

2.1. eXplainable AI (XAI)

Interpretability or explainability in Machine Learning (ML) models refers to the ability to describe and understand an ML model's workings Choo and Liu (2018). This is particularly vital in Deep Neural Networks (DNN), which are inherently complex and thus perceived as "black boxes" Zahavy et al. (2016). The burgeoning field of research addressing the opacity of these ML "black boxes" is known as XAI Gunning (2016).

Herein, XAI assumes a critical yet sensitive role, acting as a conduit between intricate DL models and those without IT expertise. Consequently, XAI methodologies must be precise and comprehensible to domain experts, fostering a sense of "trust" in real-time settings. Over the past few years, several XAI methods, strategies, and frameworks have emerged. For our research, we categorize XAI methods based on their simplicity, the degree of interpretability, and the dependency level on the analyzed ML/AI model, as illustrated in Figure 1.

Figure 1: Taxonomy of XAI Methods

Moreover, complexity-related methods in XAI can be bifurcated into i) intrinsically explainable (ante-hoc) models, also known as transparent or glass-box approaches, and ii) black-box (post-hoc) models, which necessitate deciphering the reasoning steps behind predictions for explainability purposes. Additionally, these methods can be categorized based on their scope: i) global explainability methods, which scrutinize the algorithm as a whole, including training data and proper algorithm usage, and ii) local explainability, which pertains to the system's ability to elucidate specific decision-making processes.

Lastly, it is crucial to differentiate between model-specific and model-agnostic XAI approaches. The key difference lies in whether the XAI method depends on the underlying ML model or whether it can be universally applied.
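To make these distinctions concrete, the following minimal Python sketch (an editorial illustration, not part of the original study; the dataset and models are arbitrary choices) contrasts an intrinsically explainable, glass-box model, whose fitted coefficients are themselves a global explanation, with a post-hoc, model-agnostic, global explanation of a black-box model that is obtained by only querying its predictions.

# Illustrative sketch: ante-hoc (glass-box) vs. post-hoc, model-agnostic explanation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ante-hoc / glass-box: the coefficients are the (global) explanation.
glass_box = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(dict(zip(X.columns, glass_box.coef_[0].round(3))))

# Post-hoc, model-agnostic, global: permutation importance treats the random
# forest as a black box and only observes how its predictions degrade.
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(black_box, X_test, y_test, n_repeats=10, random_state=0)
print(dict(zip(X.columns, result.importances_mean.round(3))))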

2.2. From explainability to interpretability

In scholarly discussions, a notable discrepancy persists regarding the precise definitions of "explainability" and "interpretability." While these terms are often used interchangeably, some scholars distinguish between them, as noted in Arrieta et al. (2020) and Chakraborty et al. (2017).

This analysis adheres to the differentiation between explainability and interpretability as explicated in Saeed and Omlin (2023). According to this reference, explainability entails the provision of insights tailored to satisfy a specific requirement of a designated audience, while interpretability is concerned with the extent to which these insights are comprehensible and relevant within the framework of the audience's specialized knowledge base.

Explainability is defined by three fundamental elements, as outlined in the aforementioned source: the nature of the insights provided, the specific audience targeted, and the underlying necessity for these insights. These insights emanate from various methodologies in explainability, such as textual descriptions, the significance of features, or localized elucidations, and are intended for a diverse audience including sector-specific professionals, individuals directly impacted by the outcomes of the model, and experts in the field of model development.

In the realm of interpretability, the focus shifts to the congruence and logicality of the explanations with respect to the targeted audience's pre-existing knowledge base. This includes assessing whether the explanations are coherent and meaningful to the audience, whether the audience is capable of employing these explanations in their decision-making processes, and whether the explanations provided offer a rational basis for the decisions made by the model.

2.3. Challenges in Communicating AI Concepts

Communicating the concepts of AI to a broad audience encompasses a multitude of challenges, stemming from the inherently complex and rapidly evolving nature of AI technology. These challenges are amplified when discussing the domain of XAI, where the goal is to make AI decision-making processes transparent and understandable to various stakeholders.

Despite the various available open-source XAI algorithms, such as LIME (Local Interpretable Model-agnostic Explanations) Ribeiro et al. (2016), SHAP (SHapley Additive exPlanations) Lundberg and Lee (2017), and Gradient-weighted Class Activation Mapping (Grad-CAM) Selvaraju et al. (2017), examples of XAI in real-world applications remain scarce. The root cause of this is that state-of-the-art XAI algorithms aim to assist the developer of the AI system rather than the end-user. Developing XAI applications needs human-centered approaches that align technical development with people's explainability needs and define success by human experience, empowerment, and trust. Furthermore, AI algorithms can exhibit various forms of bias Klein (2020), including social, racial, and gender prejudices, which XAI and Exploratory Data Analysis Torralba and Efros (2011) can help to surface. However, implementing bias mitigation and XAI techniques in a larger situational context (i.e., explaining multiple AI models that perform a single task) becomes increasingly more complicated. Cutting-edge XAI approaches are rigorously disconnected, with just a local input view linked with each particular AI model utilized throughout the overall (global) reasoning process Jan et al. (2020). Moreover, existing techniques usually lack reasoning semantics and remain detached from the broader process context.

Other challenges depend significantly on the interface used between the human and the machine/software. An effective HMI should consider various aspects such as the level of autonomy, user expertise, and use case/domain Lim and Dey (2009), as well as security and trust Virtue (2017). Despite the extended research, many works suggest that designers need more guidance in designing interfaces for intelligent systems Baxter (2018) that could be used by the non-IT-savvy public.

2.4. Objectives

This paper aims to operationalize human-centered perspectives in XAI at the conceptual, methodological, and technical levels toward Human-Centred Explainable AI (HC-XAI) models. We enhance cutting-edge XAI approaches for explaining ML models, and models that explain deep neural networks, into HC-XAI models, shaping the final output of black-box models considering the context and biases while allowing feedback and adjustment from the user.

Our research was motivated by two extensive surveys that explored the challenges and preferences in the field of XAI, focusing on two challenging areas of XAI modeling: Time-series Classification Makridis et al. (2023b) and Vessel Route Forecasting (VRF) Makridis et al. (2023a). Both are quite challenging in terms of AI interpretability for various reasons, such as the complexity of quantifying explainability in XAI, highlighting the subjective nature of explainability and the diverse range of stakeholders involved. The results highlight the inherent subjectivity in explainability, with different individuals having varied preferences and understandings. This underscores the need for a flexible approach in designing how explainability is communicated to end users (human-centric XAI). However, a strong preference for visualization techniques was revealed, such as overlaying predicted versus actual trajectories, indicating the importance of visual methods in making explanations understandable. Given the responses to both surveys, it is clear that a description of the visualizations as a complementary tool is desired by non-IT end users.

Based on these findings, we propose an integrative approach that combines the strengths of visual and textual explanations. This approach aims to make XAI results more human-centered by providing user-friendly interfaces such as chatbox-based human-AI interactions and ensuring that the design of explanations and interfaces is user-centric, focusing on the specific needs and preferences of different user groups. This also involves propositions for the decision-making process to offer added value based on the XAI outcomes. This approach not only aims at demystifying AI decisions but also at enriching the user's understanding by providing context-rich, detailed insights into the AI's decision-making process.

3. Literature Review

3.1. Explainable AI: A Technical Overview

As complex predictive models are increasingly integrated into areas traditionally governed by human judgement, there is a growing demand for these models to offer more clarity in how they reach decisions Susnjak (2023). This transparency is vital for building trust and meeting regulatory compliance, especially in international legal contexts where explaining automated decisions affecting people is becoming a legal necessity. According to Wachter et al. (2017), it is also crucial that individuals can challenge decisions made by these systems and understand what changes in their data could lead to different outcomes. Technologies like counterfactuals have been developed to provide insights into the minimal changes needed for these models to produce different predictions.

This need for clarity has given rise to the field of XAI, or Interpretable Machine Learning. This area aims to create methods that make complex predictive models more understandable and tools that explain how these models formulate their conclusions (Molnar et al. (2020)). Additionally, there is growing interest in prescriptive analytics, which focuses on using data to create actionable insights Lepenioti et al. (2020).

From a technical standpoint, model interpretability involves understanding the internal workings of a machine learning model post-training, generally at a broad, global level. Conversely, model explainability delves into understanding the rationale behind a model's prediction for a specific instance, known as local-level explainability. Both are important: interpretability allows institutions to broadly explain how a model works to stakeholders, while local-level explainability facilitates validating specific predictions and providing detailed feedback to those affected, like students identified as at-risk.

In the pursuit of model transparency, tools like SHAP, recognized as a leading visualization technique in XAI, provide insight into both global and local-level transparency Gramegna and Giudici (2021). The Anchors technique (Ribeiro et al. (2018)) offers a high degree of local-level explainability through human-readable, rule-based models. Furthermore, advanced counterfactuals not only enhance predictive analysis but also enable prescriptive suggestions, helping learners understand the changes needed for a different outcome. This study showcases the application of these technologies across various stages of the proposed prescriptive analytics framework.
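To make the idea of counterfactual explanations more tangible, the following sketch (not part of the original study; the dataset, model, and brute-force search strategy are illustrative assumptions) looks for the smallest change to a single feature that flips a classifier's prediction, in the spirit of Wachter et al. (2017).

# Illustrative brute-force counterfactual search: find the smallest shift in one
# feature that flips the model's predicted class for a given instance.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def single_feature_counterfactual(x, feature, steps=200):
    """Scan one feature over its observed range and return the closest value
    that changes the predicted class (None if no flip is found)."""
    original_class = model.predict(x.reshape(1, -1))[0]
    candidates = np.linspace(X[:, feature].min(), X[:, feature].max(), steps)
    # Try candidates in order of distance from the original value, so the first
    # flip found is also the most conservative one.
    for value in sorted(candidates, key=lambda v: abs(v - x[feature])):
        x_cf = x.copy()
        x_cf[feature] = value
        if model.predict(x_cf.reshape(1, -1))[0] != original_class:
            return value
    return None

instance = X[0]
flip_value = single_feature_counterfactual(instance, feature=7)
print(f"Feature 7: {instance[7]:.3f} -> {flip_value} flips the prediction")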
3.2. Language Models in AI

Following the success of GPT, a range of LLMs have been developed, exhibiting impressive capabilities in various Natural Language Processing (NLP) tasks, including those in finance. One standout model in this domain is BloombergGPT, created by Bloomberg's AI team and trained on an extensive collection of financial texts. It has shown exceptional proficiency in financial NLP tasks (Wu et al., 2023). However, as of May 2023, BloombergGPT remains largely for internal use at Bloomberg, lacking a publicly accessible API.

Google's Bard, a key competitor to ChatGPT, is another notable LLM. Powered by Google's LaMDA (Language Model for Dialogue Applications), it merges aspects of BERT and GPT to facilitate engaging, contextually aware conversations (Thoppilan et al., 2022). Like BloombergGPT, Bard also does not offer an open API as of this writing.

BLOOM, an open-source contender to GPT-3 (Scao et al., 2022), has also gained attention in the LLM space. While it is open-source, effectively using BLOOM requires considerable technical know-how and computing power, and it lacks a version fine-tuned for conversational tasks, a feature where models like ChatGPT excel.

Since ChatGPT's introduction, numerous LLMs have emerged targeting specific functions, such as code completion (Dakhel et al., 2023), content generation, and marketing. These models offer specialized utility, expanding the scope and impact of LLMs. ChatGPT continues to lead in the field (JasperAI, 2023), thanks to its open API, extensive training data, and versatility across various tasks. Despite ChatGPT's broad application in fields like healthcare and education (Sallam, 2023), its direct use in financial sentiment analysis is relatively uncharted. Fatouros et al. (2023) presents evidence that ChatGPT, even when applied with zero-shot prompting, can understand complex contexts requiring advanced reasoning capabilities. In addition, MarketSense-AI, a real-world financial application, leverages GPT-4 with Chain-of-Thought (CoT) prompting to effectively explain investment decisions Fatouros et al. (2024).

3.3. Large Language Models in XAI

Significant advancements have been made in AI and transformer-based LLMs, which now exhibit near-human proficiency in text generation and discourse. This progress is largely attributed to their ability to understand long-range dependencies and contextual nuances in texts, thanks to self-attention mechanisms. Models like Google's BERT (Devlin et al. (2018)) and OpenAI's latest GPT series have set new benchmarks in various natural language processing tasks, including text generation (Brown et al. (2020)). OpenAI's most recent development, the ChatGPT model, exemplifies these advancements by effectively translating complex analytical outputs into user-friendly, actionable language, aiding learners and advisors.

In the realm of cybersecurity, HuntGPT utilizes the capabilities of LLMs and XAI to enhance network anomaly detection. It integrates a Random Forest classifier with the KDD99 dataset Stolfo et al. (1999), advanced XAI frameworks, and the power of GPT-3.5 Turbo. HuntGPT not only detects threats with remarkable accuracy but also conveys them in a clear, understandable format, greatly improving decision-making for cybersecurity experts Ali and Kostakos (2023). Meanwhile, Chun and Elkins (2023) delve into the fusion of XAI with Computational Digital Humanities, investigating diachronic text sentiment analysis and narrative generation using advanced LLMs like GPT-4. Additionally, they introduce an innovative XAI grey-box ensemble that combines top-tier model performance with superior interpretability and privacy, underpinned by novel local and global XAI metrics.
4. Methodology

Initially, an assessment of state-of-the-art XAI techniques, such as LIME, SHAP, Grad-CAM, and PDP, was carried out. These methods are then adapted and integrated into the customized LLM infrastructure, focusing on generating natural language explanations delivered via the AI Chat Interface. This integration aims to transform complex XAI visualizations into user-friendly narratives and insights, interpretable by end users.

4.1. Role of GPT-Builder in LLM Development

The development of LLMs such as GPT variants has revolutionized the field of natural language processing (NLP). A critical component in this evolution is the role of tools like GPT-Builder, a sophisticated framework for constructing, fine-tuning, and deploying these advanced models. GPT-Builder serves as a pivotal element in LLM development, offering a blend of user-friendly interfaces and powerful backend processes that streamline the creation and management of these complex models GPT Builder.

GPT-Builder plays an instrumental role in democratizing access to LLM technology. It empowers organizations and individual developers to build custom LLMs tailored to specific needs or domains. This customization is crucial in scenarios where a standard GPT model may not provide optimal performance, such as in specialized professional fields or for languages and dialects with limited representation in mainstream models. GPT-Builder simplifies the process of training these models on niche datasets, making it feasible for non-experts in machine learning to develop highly specialized and effective LLMs.

4.2. Use cases

In our study, we developed a custom GPT model, the x-[plAIn] GPT. The x-[plAIn] model underwent extensive testing across a diverse range of XAI methods and problem definitions. However, for the interactive component involving end users, we focused on five specific use cases presented through a questionnaire. We made a concerted effort to select XAI implementations that spanned various sectors and catered to different levels of technical expertise.

4.2.1. Use Case 1

The first use-case featured in our study was derived from Makridis et al. (2022), which investigated the detection of boar taint. In this research, the authors identified significant factors contributing to the boar-taint phenomenon, employing SHAP values among other methods. This particular implementation of SHAP values was incorporated into our questionnaire.

Figure 2: Plot for Shapley values evaluation in Makridis et al. (2022).

4.2.2. Use Case 2

The second use-case was based on Szczepański et al. (2021), where the authors explored the use of LIME and Anchors (XAI methods) for generating explainable visualizations in the context of fake news detection. This study represented another facet of XAI application, showcasing its utility in media and information analysis.

Figure 3: Plot for LIME evaluation of fake news from Szczepański et al. (2021).

4.2.3. Use Case 3

The third use-case in our study involved the visualizations developed by Feldhus et al. (2023). The authors employed the Integrated Gradients feature attribution method to represent the predictions made by a BERT model. Building on this, they created a model-free and instructed (GPT-3.5) Saliency Map Verbalization (SMV) explaining the prediction representations.

Figure 4: Plot for Saliency Map Verbalization (SMV) from Feldhus et al. (2023).

4.2.4. Use Case 4

The fourth application incorporated XAI techniques as implemented by Moujahid et al. (2022). In their study, Grad-CAM was employed to identify regions of interest pertinent to the prediction of COVID-19 in lung X-ray images, utilizing various network architectures.

Figure 5: Plot for Grad-CAM explanations from Moujahid et al. (2022).

4.2.5. Use Case 5

Lastly, the fifth use-case presented substantial technical complexities. The researchers in Moosbauer et al. (2021) employed Partial Dependence Plots (PDP), an infrequently used method within XAI, for the purpose of hyperparameter optimization. Consequently, they generated and scrutinized plots to exhibit robust and trustworthy Partial Dependence (PD) estimates across an intelligible subset of the hyperparameter space, considering a variety of model parameters.

Figure 6: Plot for Partial Dependence Plots (PDP) from Moosbauer et al. (2021).
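As a point of reference for readers unfamiliar with PDPs, the sketch below (an editorial illustration, not the implementation of Moosbauer et al. (2021), who compute PDPs over a hyperparameter space) shows how a generic partial dependence plot is produced for ordinary input features of a tabular model using scikit-learn.

# Generic partial-dependence sketch on tabular input features (illustrative only).
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Average model response as a function of "bmi" and "bp", marginalizing the rest.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "bp"])
plt.tight_layout()
plt.show()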
4.3. Baseline: XAI approaches

Our innovative approach in XAI integrates outputs from LIME, SHAP, and Grad-CAM into a GPT model, enabling enhanced textual and data analysis from these tools. A minimal usage sketch for the tabular methods follows the list.

• LIME interprets complex models by approximating them locally with simpler models, revealing feature influence on predictions. It is insightful but limited by potential instability and its local focus.

• SHAP, rooted in game theory, assesses feature importance globally, offering consistent and fair interpretations but at the cost of computational complexity and possible non-intuitiveness.

• Grad-CAM identifies important image regions for predictions, improving versatility and accuracy while maintaining intuitiveness.
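The following sketch grounds the first two bullets (it is not the paper's code; the model and dataset are arbitrary choices). It produces a local LIME explanation and SHAP attributions for a tabular classifier; Grad-CAM is omitted because it requires an image model.

# Minimal LIME + SHAP sketch for a tabular classifier (illustrative only).
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# LIME: post-hoc, model-agnostic, local. Fits a simple surrogate around one instance.
lime_explainer = LimeTabularExplainer(
    data.data, feature_names=data.feature_names,
    class_names=data.target_names, mode="classification")
lime_exp = lime_explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5)
print(lime_exp.as_list())  # top features pushing this prediction up or down

# SHAP: additive feature attributions per instance; averaging their magnitudes
# over many instances yields a global view of feature importance.
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(data.data[:100])
print(np.shape(shap_values))  # per-instance, per-feature attributions (per class)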
4.4. LLM Enhanced XAI explainer

In the development of our GPT-based XAI explainer, a rigorous, technical approach was employed. Initially, we utilized a specialized interface for defining the explainer's core objectives, focusing on advanced interpretability for AI decision-making processes. The configuration phase was comprehensive, involving precise customization of the model's parameters, including its naming, operational descriptions, and initialization prompts tailored for nuanced AI explanations. The development of the prompt engineering process is delineated in Table 1. This progression adheres to the prompt-engineering guidelines provided in the official documentation of OpenAI, accessible via this hyperlink. Additionally, acknowledging the paramount significance of causal explanations and insight generation for HC-XAI, we incorporated a CoT approach into our final prompt. Given the absence of universally correct responses for this task, we eschewed the methodology detailed in Wei et al. (2022). Instead, we adopted a more straightforward strategy, integrating the phrase "Let's think step by step." This inclusion has proven to be notably effective, as substantiated by Tan (2023). This technical methodology ensured the creation of a GPT model specifically fine-tuned for the complexities and demands of XAI, enhancing its effectiveness in delivering clear, understandable insights into AI decisions.

Table 1: Gradual Development of ChatGPT Prompts for XAI Method Simplification

P1. Prompt: Provide summaries of insights from XAI methods, focusing on clarity and relevance. Avoid including obvious elements unless specifically requested.
Benefit over previous version: Introduces the basic concept of summarizing XAI insights with a focus on clarity and relevance.

P2. Prompt: Summarize insights from XAI methods like LIME and SHAP. Focus on clarity and relevance, and begin to consider the user's context in your summaries. Exclude obvious elements unless they are explicitly asked for.
Benefit over previous version: Specifies XAI methods (LIME and SHAP) and introduces the concept of tailoring summaries to the user's context.

P3. Prompt: Generate clear and relevant summaries from XAI methods such as LIME and SHAP, tailored to the user's context. Begin to integrate actionability into the insights and ask the user for their expertise level (beginner, intermediate) before responding.
Benefit over previous version: Adds the element of actionability and the need to adjust responses based on the user's expertise level.

P4. Prompt: Provide clear, relevant, and actionable summaries of insights from XAI methods like LIME and SHAP. Tailor the content to the user's expertise level and specific inquiries. Include practical suggestions or conclusions and avoid obvious elements from the input unless requested.
Benefit over previous version: Emphasizes the customization of summaries to specific user inquiries and the inclusion of practical suggestions or conclusions.

P5. Prompt: Objective: Deliver concise summaries of insights from XAI methods like LIME and SHAP, tailored to the user's context and expertise level. Focus on clarity, relevance, actionability, and responsiveness. Ask the user for their expertise level and tailor your response accordingly. Avoid including obvious input elements unless explicitly asked. Provide practical suggestions or conclusions.
Benefit over previous version: Introduces the concept of responsiveness to user-specific inquiries and further emphasizes tailoring content based on expertise.

P6. Prompt: Objective: Provide concise, user-friendly summaries of insights derived from XAI methods. If multiple insights can be drawn from a single input, try to combine them into a larger context. Let's think step by step on how the final insight is reached.
Output Expectations:
Clarity: Deliver straightforward and easily comprehensible summaries.
Relevance: Ensure insights are directly applicable to the user's context or domain.
Actionability: Focus on providing practical suggestions or conclusions.
Responsiveness: Tailor summaries to answer user-specific inquiries based on the XAI analysis.
DO NOT include obvious elements (numbers, text) from the given input unless EXPLICITLY asked.
BEFORE ANSWERING
1. Ask the user for his expertise level (beginner, intermediate).
2. Ask the user's domain of expertise (if none provided assume it aligns with the domain of the input provided).
If the user is intermediate provide information about how he should understand the provided input and then the insight(s).
If the user is beginner DO NOT provide information about the provided input PROVIDE ONLY THE INSIGHT.
Tailor responses to a user's technical and domain expertise.
Provide examples that match the expertise of the user in an analogous manner to explain the insights.
Benefit over previous version: Fully integrates all elements including clarity, relevance, actionability, responsiveness, and user-specific customization, creating a comprehensive and detailed approach to summarizing XAI insights.
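For readers who want to reproduce the overall interaction pattern programmatically, the sketch below is only a rough approximation: x-[plAIn] itself is a custom GPT configured through ChatGPT Builder rather than API code, and the model name, file path, and helper function here are illustrative assumptions. It pairs a P6-style system prompt with an XAI plot and the question used in the study via the OpenAI Chat Completions API.

# Rough approximation (NOT the paper's implementation): send an XAI figure plus a
# P6-style instruction set to a vision-capable chat model and return its summary.
import base64
from openai import OpenAI

SYSTEM_PROMPT = (
    "Objective: Provide concise, user-friendly summaries of insights derived from "
    "XAI methods. If multiple insights can be drawn from a single input try to "
    "combine them into a larger context. Let's think step by step on how the final "
    "insight is reached. Before answering, ask the user for their expertise level "
    "(beginner, intermediate) and domain, and tailor the response accordingly."
)

def explain_xai_figure(image_path: str, model: str = "gpt-4o") -> str:
    """Send an XAI visualization (e.g., a SHAP summary plot) and ask for insights."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    image_b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": [
                {"type": "text", "text": "What are the top insights from this picture?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ]},
        ],
    )
    return response.choices[0].message.content

# Example (hypothetical file name): print(explain_xai_figure("shap_summary.png"))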

4.5. Audience Analysis and Content Customization

This tool primarily serves two key demographics: end-users of XAI methods and AI developers, notably data scientists, who utilize XAI methods for model understanding. The former category, end-users, typically possesses limited technical knowledge but may exhibit considerable domain expertise. Their primary interest lies in deriving insights from the XAI methods, rather than comprehending the technical intricacies of how these methods operate or the underlying model training processes. Conversely, highly technical users, such as AI developers, leverage this tool to gain deeper insights into their models. They focus on understanding the training mechanisms of the models, identifying potential biases highlighted by the XAI methods, and exploring strategies to address these issues.

The design of this tool is interactive and user-centric, enabling it to evaluate the user's proficiency in AI and XAI methodologies, as well as any domain-specific expertise they may possess. Following this assessment, the tool adeptly tailors its responses, adjusting the focus of its answers to align with the user's knowledge level. This approach ensures that the insights provided are not only relevant and actionable but are also derived effectively from the input given by the user.

4.6. Evaluation - Feedback

In our comprehensive study to evaluate the applicability and effectiveness of our GPT-based XAI explainer, we conducted an extensive survey targeting a broad spectrum of professionals. This survey, which can be accessed here, was designed to gather insights into the various aspects of XAI in the context of our GPT model.

Survey Design and Purpose: The survey was meticulously structured to probe into the respondents' understanding and experiences with AI, Machine Learning (ML), and Deep Learning (DL), as well as their exposure to and perceptions of XAI methods. This allowed us to gauge the baseline knowledge of our audience, which is crucial in tailoring the XAI components of our GPT model.

Assessing User Familiarity and Application of AI: One of the key objectives was to understand how familiar the respondents were with AI, especially in the context of using AI for specific tasks. This information is vital to ensure that our GPT XAI explainer is accessible to users with varying levels of AI expertise.

Understanding Preferences in Data Description: The survey extensively examined various applications of XAI methods, including LIME, SHAP, and Grad-CAM, each presented with two distinct descriptions. The first was the original description from the research papers, selectively modified to provide the end user with essential information. In contrast, the second description was generated by the x-[plAIn] GPT model in response to the query, "What are the top insights from this picture?" Notably, in instances where the problem definition was not evident or deducible from the input, the model's query included the specific research problem addressed in the original paper. This approach enabled a comprehensive understanding of user preferences regarding textual explanations, including aspects like structure, length, and formality, allowing for subsequent fine-tuning of the model.

4.7. Limitations

This tool demonstrates a remarkable ability to interpret various outputs from XAI methods, offering insightful and targeted explanations. However, it has been observed that there are instances in which the model mistakenly attempts to explain the provided image rather than focusing on the XAI output. An illustrative case of this behavior can be found in use case #3 (SMV), as detailed in Section 4.2. In this particular example, the model states:

Highlighted High-Impact Negative Phrases:

– The phrases "the worst movie ever produced," "worst plot," "worst acting," and "worst special effects" are strongly emphasized in the saliency map. This implies that these phrases are key elements the model associates with a negative review.

– The recommendation to "light a match and burn the tape" is exceedingly negative, indicating a high level of dissatisfaction.

While the second observation remains accurate and provides valuable insights, it is not directly related to the actual XAI output, since those words are not prominently highlighted by the saliency map.

In general, the model's contextual interpretation can yield generalized insights that may extend beyond the strict confines of the XAI output. Striking the right balance between the precision of the response and the breadth of the insights provided poses a challenge. The most effective approach is to engage an active end-user who employs critical thinking and maintains an open-minded approach to understanding the results.

5. Results and Discussion

In our endeavor to delve into the usability and effectiveness of x-[plAIn], we administered a meticulously crafted survey to an eclectic mix of partners and participants, which can be found here. Drawing from real-world scenarios, we simulated a context where AI models transition from mere decision-support mechanisms to primary decision-makers, emphasizing their paramount need for transparency and trustworthiness. The survey pivoted on two primary axes: gauging participants' baseline familiarity with AI, ML, and DL; and discerning their perception of key data interpretation based on their experience with AI-enhanced decision-making. By gathering feedback through this structured lens, we aimed to carve out a roadmap for the subsequent development and refinement of x-[plAIn], which we plan to offer via the GPT Store.

Through the conducted questionnaire, it was revealed that a significant majority of participants, exceeding 70%, expressed a satisfaction level below 60% concerning their comprehension of AI-based decision models. This is further underscored by the fact that a mere 30% of participants are actively employing XAI methodologies. This finding raises critical questions about the prevalence and perceived efficacy of XAI techniques within the industry. When it comes to scenario-based preferences, over 80% of participants favored x-[plAIn] descriptions in comparison to the conventional descriptions derived from the original papers associated with XAI methods, particularly in decision-making contexts and image comprehension.

The feedback on enhancing x-[plAIn] predominantly revolved around the brevity of responses, given the tendency of GPT models towards verbosity, and the need for tailoring explanations to suit the specific background or domain of the end-user. While the model inherently possesses the capability to customize responses based on domain-specific information, this feature was not fully showcased to the participants due to the absence of domain information in the prompts, which was intentionally omitted to ensure a level playing field in the assessment. This suggests a potential area for refinement in future iterations, where the model's adaptive response generation can be demonstrated more effectively to respondents.
Figure 7: Comparison of acceptability. (a) Acceptability of x-[plAIn] concerning the role of the users. (b) Acceptability of x-[plAIn] concerning the users' usage of XAI.

Figure 8: Description preference based on perceived AI understanding level.

The comparative analysis of the two bar plots reveals insightful trends about the acceptability of x-[plAIn] across different user groups and their engagement with XAI methods.

In Figure 7a, which distinguishes between end users and AI experts, a distinct pattern emerges. End users, presumably less versed in the technical aspects of AI, show varying levels of preference for x-[plAIn] across the different use cases. This variability could indicate a nuanced approach to AI explanations, where the complexity or context of each use case significantly influences their preference. On the other hand, AI experts, with their deeper technical understanding, exhibit a more consistent response pattern across the use cases. Their preferences might reflect a critical evaluation of x-[plAIn] against their advanced understanding of AI processes.

Figure 7b focuses on the usage of XAI methods and also provides compelling insights. Respondents who actively use XAI methods demonstrate a certain level of preference for x-[plAIn], which might suggest that their familiarity with XAI influences their expectations and acceptance of explanatory tools. Conversely, non-XAI users, who might not have a benchmark for comparing such tools, show differing degrees of acceptance of x-[plAIn], potentially guided more by the tool's clarity and usability than by its technical robustness.

Figure 8 demonstrates a notable trend concerning users' preferences in relation to their self-reported comprehension of AI model outputs. This graphical representation indicates a discernible correlation between the level of claimed understanding and the preferences exhibited by respondents. It appears that individuals professing a more profound grasp of AI model outputs tend to require less information, potentially influencing a shift in their preferences. Despite this observed trend, it is noteworthy that x-[plAIn] retains a significant degree of favorability, being the preferred choice in 75% of instances among the cohort exhibiting the highest level of understanding.

6. Conclusion

For future enhancements, it will be crucial to implement features that allow end-users to specify (more strictly) their preference for the level of detail they require. A verbose setting could cater to those looking for in-depth understanding, while a more streamlined option would benefit users seeking brief clarifications. Such a choice empowers users, providing a user-centric approach that accommodates a wide range of use cases from novice inquiries to expert validations.

Additionally, the feedback highlights the importance of considering user experience, particularly for those unfamiliar with the subject matter. Breaking down complex topics into smaller, individually explained segments can significantly enhance comprehension. Conversely, for experienced users, lengthy and information-dense responses may prove unnecessary and time-consuming. To this end, introducing an option to toggle between longer and shorter answer formats, while shifting focus from understanding the XAI methods to the extraction of insights, can be beneficial.

In future work, we aim to investigate a specific characteristic of this tool, identified during the development of x-[plAIn]. This tool holds potential for experienced AI engineers, offering a resource to pinpoint and mitigate potential biases that may emerge at different phases of the model creation pipeline. These phases include data collection, preprocessing, model training, and validation processes. By utilizing this tool, AI professionals can significantly contribute to the cultivation of AI systems that excel not only in technical prowess but also in ethical integrity and social responsibility.

Acknowledgements

The research leading to the results presented in this paper has received funding from the European Union's funded project HumAIne under grant agreement no 101120218.
References

Ali, T., Kostakos, P., 2023. Huntgpt: Integrating machine learning-based anomaly detection and explainable ai with large language models (llms). arXiv preprint arXiv:2309.16021.
Arrieta, A.B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., et al., 2020. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai. Information Fusion 58, 82–115.
Baxter, K., 2018. How to meet user expectations for artificial intelligence. Medium. Retrieved September.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al., 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33, 1877–1901.
Chakraborty, S., Tomsett, R., Raghavendra, R., Harborne, D., Alzantot, M., Cerutti, F., Srivastava, M., Preece, A., Julier, S., Rao, R.M., et al., 2017. Interpretability of deep learning models: A survey of results, in: 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), IEEE. pp. 1–6.
Choo, J., Liu, S., 2018. Visual analytics for explainable deep learning. IEEE Computer Graphics and Applications 38, 84–92.
Chun, J., Elkins, K., 2023. Explainable ai with gpt4 for story analysis and generation: A novel framework for diachronic sentiment analysis. International Journal of Digital Humanities, 1–26.
Dakhel, A.M., Majdinasab, V., Nikanjam, A., Khomh, F., Desmarais, M.C., Jiang, Z.M., 2023. Github copilot ai pair programmer: Asset or liability? Journal of Systems and Software, 111734.
Devlin, J., Chang, M.W., Lee, K., Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fatouros, G., Metaxas, K., Soldatos, J., Kyriazis, D., 2024. Can large language models beat wall street? unveiling the potential of ai in stock selection. arXiv preprint arXiv:2401.03737.
Fatouros, G., Soldatos, J., Kouroumali, K., Makridis, G., Kyriazis, D., 2023. Transforming sentiment analysis in the financial domain with chatgpt. Machine Learning with Applications 14, 100508.
Feldhus, N., Hennig, L., Nasert, M., Ebert, C., Schwarzenberg, R., Möller, S., 2023. Saliency map verbalization: Comparing feature importance representations from model-free and instruction-based methods, in: Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE), pp. 30–46.
Gramegna, A., Giudici, P., 2021. Shap and lime: an evaluation of discriminative power in credit risk. Frontiers in Artificial Intelligence 4, 752558.
Gunning, D., 2016. Explainable artificial intelligence (xai) darpa-baa-16-53. Defense Advanced Research Projects Agency.
Jan, S.T., Ishakian, V., Muthusamy, V., 2020. Ai trust in business processes: the need for process-aware explanations, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 13403–13404.
JasperAI, 2023. The ai in business trend report. URL: https://www.jasper.ai/blog/ai-business-trend-report. Accessed: May 26, 2023.
Klein, A., 2020. Reducing bias in ai-based financial services.
Lepenioti, K., Bousdekis, A., Apostolou, D., Mentzas, G., 2020. Prescriptive analytics: Literature review and research challenges. International Journal of Information Management 50, 57–70.
Lim, B.Y., Dey, A.K., 2009. Assessing demand for intelligibility in context-aware applications, in: Proceedings of the 11th International Conference on Ubiquitous Computing, pp. 195–204.
Lundberg, S.M., Lee, S.I., 2017. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30.
Makridis, G., Fatouros, G., Kiourtis, A., Kotios, D., Koukos, V., Kyriazis, D., Soldatos, J., 2023a. Towards a unified multidimensional explainability metric: Evaluating trustworthiness in ai models, in: 2023 19th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), IEEE. pp. 504–511.
Makridis, G., Fatouros, G., Koukos, V., Kotios, D., Kyriazis, D., Soldatos, I., 2023b. Xai for time-series classification leveraging image highlight methods. arXiv preprint arXiv:2311.17110.
Makridis, G., Heyrman, E., Kotios, D., Mavrepis, P., Callens, B., Van De Vijver, R., Maselyne, J., Aluwé, M., Kyriazis, D., 2022. Evaluating machine learning techniques to define the factors related to boar taint. Livestock Science 264, 105045.
Makridis, G., Kyriazis, D., Plitsos, S., 2020. Predictive maintenance leveraging machine learning for time-series forecasting in the maritime industry, in: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), IEEE. pp. 1–8.
Molnar, C., Casalicchio, G., Bischl, B., 2020. Interpretable machine learning–a brief history, state-of-the-art and challenges, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer. pp. 417–431.
Moosbauer, J., Herbinger, J., Casalicchio, G., Lindauer, M., Bischl, B., 2021. Explaining hyperparameter optimization via partial dependence plots. Advances in Neural Information Processing Systems 34, 2280–2291.
Moujahid, H., Cherradi, B., Al-Sarem, M., Bahatti, L., Eljialy, A.B.A.M.Y., Alsaeedi, A., Saeed, F., 2022. Combining cnn and grad-cam for covid-19 disease prediction and visual explanation. Intelligent Automation & Soft Computing 32.
Ribeiro, M.T., Singh, S., Guestrin, C., 2016. "Why should I trust you?" Explaining the predictions of any classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144.
Ribeiro, M.T., Singh, S., Guestrin, C., 2018. Anchors: High-precision model-agnostic explanations, in: Proceedings of the AAAI Conference on Artificial Intelligence.
Saeed, W., Omlin, C., 2023. Explainable ai (xai): A systematic meta-survey of current challenges and future opportunities. Knowledge-Based Systems 263, 110273.
Sallam, M., 2023. Chatgpt utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns, in: Healthcare, MDPI. p. 887.
Scao, T.L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Castagné, R., Luccioni, A.S., Yvon, F., Gallé, M., et al., 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
Soldatos, J., Kyriazis, D., 2021. Trusted artificial intelligence in manufacturing: A review of the emerging wave of ethical and human centric ai technologies for smart production.
Stolfo, S., Fan, W., Lee, W., Prodromidis, A., Chan, P., 1999. KDD Cup 1999 data. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C51C7N.
Susnjak, T., 2023. Beyond predictive learning analytics modelling and onto explainable artificial intelligence with prescriptive analytics and chatgpt. International Journal of Artificial Intelligence in Education, 1–31.
Szczepański, M., Pawlicki, M., Kozik, R., Choraś, M., 2021. New explainability method for bert-based model in fake news detection. Scientific Reports 11, 23705.
Tan, J.T., 2023. Causal abstraction for chain-of-thought reasoning in arithmetic word problems, in: Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pp. 155–168.
Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H.T., Jin, A., Bos, T., Baker, L., Du, Y., et al., 2022. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239.
Torralba, A., Efros, A.A., 2011. Unbiased look at dataset bias, in: CVPR 2011, IEEE. pp. 1521–1528.
Virtue, E., 2017. Designing with ai. Retrieved July 29, 2022.
Wachter, S., Mittelstadt, B., Russell, C., 2017. Counterfactual explanations without opening the black box: Automated decisions and the gdpr. Harv. JL & Tech. 31, 841.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou, D., et al., 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837.
Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D., Mann, G., 2023. Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564.
Zahavy, T., Ben-Zrihem, N., Mannor, S., 2016. Graying the black box: Understanding dqns, in: International Conference on Machine Learning, PMLR. pp. 1899–1908.
