Kim Et Al., 2024


New Ideas in Psychology 73 (2024) 101074

Contents lists available at ScienceDirect

New Ideas in Psychology


journal homepage: www.elsevier.com/locate/newideapsych

Exploring artificial intelligence approach to art therapy assessment: A case study on the classification and the estimation of psychological state based on a drawing
Seong-in Kim (a), Kee-Eung Kim (b), Seunghwan Song (a, *)
(a) Department of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
(b) Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

ARTICLE INFO

Keywords: Artificial intelligence approach; Deep convolutional neural networks; Art therapy assessment; Stepwise and logistic regressions; Classification and estimation of psychological states

ABSTRACT

Art therapy assessment involves classifying the psychological state of the drawer into several groups (e.g., normal or abnormal) and estimating it numerically (e.g., as a psychological examination score), based on the interpretation of his or her drawing. Building on the qualitative approach to these tasks, a statistical approach relying on various quantitative features of drawings has broadened the scope and methods of analyzing psychological states through drawings. In this paper, we explore an artificial intelligence approach, discuss its superiority over the statistical approach, and identify its limitations. The synergistic effects of an interdisciplinary framework combining qualitative, statistical, and artificial intelligence approaches are expected to make a critical contribution to the development of art therapy assessment.

* Corresponding author. Korea University, 145, Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea.
E-mail address: ss-hwan@korea.ac.kr (S. Song).

https://doi.org/10.1016/j.newideapsych.2024.101074
Received 8 November 2022; Received in revised form 19 November 2023; Accepted 3 January 2024
Available online 9 January 2024
0732-118X/© 2024 Elsevier Ltd. All rights reserved.
S.-i. Kim et al. New Ideas in Psychology 73 (2024) 101074

1. Introduction

Artificial Intelligence (AI) encompasses diverse research domains, including language understanding, voice recognition, image recognition, and knowledge utilization. The expert system, a traditional area of AI that strives to harness and apply expert knowledge, has achieved notable practical successes. However, recent advances in Deep Neural Networks (DNN) have spurred exceptional progress in AI domains such as natural language processing, intelligent robotics, and image understanding. Consequently, DNN has become nearly synonymous with AI. For instance, a DNN outperformed humans in recognizing objects in images (He et al., 2016). In the medical realm, the performance of AI in detecting breast cancer from X-ray images has surpassed that of radiologists (Reod, 2020). Given AI’s remarkable impact on image understanding, its potential influence on art therapy assessment, where drawings serve as the primary medium, is also noteworthy.

The qualitative approach to interpreting drawings and their correlation with psychological states is rooted in the empirical, heuristic, and subjective knowledge of an art therapist. This qualitative knowledge is often derived from the therapist’s individual professional background and experience. Gussak and Nyce (1999) emphasized that art therapy is inherently eclectic in its theory and practice, resisting simplification into a single algorithmic framework. The inherent complexity of art therapy poses challenges for practitioners’ decision-making. In this context, integrating AI-based expert systems could prove invaluable (Giarratano & Riley, 2005). Earlier efforts in this direction include work by Kim, Kim, et al. (2006) and Kim, Ryu, et al. (2006), who introduced general-purpose expert systems to formalize expertise in art therapy. Moreover, Kim et al. (2011) developed a specialized expert system for the Kinetic Family Drawing (KFD) methodology.

Following these approaches, Kim (2008) and Mattson (2010) emphasized the significance of a quantitative approach in art therapy. Subsequent research has focused extensively on the quantitative evaluation and interpretation of drawings (Kim, 2016; Kim, 2017; Mattson, 2010; Mattson, 2012a). Historically, the perspective on interpreting drawings shifted from symbolic elements such as people, eyes, houses, and windows to formal aspects such as the number of colors used, placement, accuracy, and dominant color (Gantt & Tabone, 2003; Kim, 2010). Recent methodologies have used computer techniques to quantify these elements, including basic computer functions (Kim & Hameed, 2009), digital image processing (Kim, 2018; Kim et al., 2012a), and specific computer algorithms (Kim, Kang, & Kim, 2009). In the field of AI, this process of identifying and quantifying drawing elements is termed “feature extraction”. Building on it, statistical regression models using extracted drawing features have been employed to assess psychological states. Kim et al. (2014) categorized
the psychological state of an individual based on their drawings, while Kim, Betts, et al. (2009) numerically gauged dementia levels. Furthermore, Kim, Kang, et al. (2012) evaluated the efficacy of various art therapy assessment tools in determining psychological conditions. These statistical techniques are also termed statistical pattern recognition.

This study focuses on two art therapy assessment tasks: classifying an individual’s psychological state into distinct categories and estimating it as a numerical value through drawing interpretation. All sample drawings used for this AI approach, made with three art therapy tools, the Structured Mandala Coloring (SMC) (Curry & Kasser, 2005), the Person Picking an Apple from a Tree (PPAT) (Gantt, 1990), and the Face Stimulus Assessment (FSA) (Betts, 2003), come from the datasets of Kim, Betts, et al. (2009) and Kim, Kang, et al. (2012). The SMC is a mandala that incorporates a predefined geometric pattern for coloring. The PPAT is a single-picture assessment in which participants draw a person picking an apple from a tree. The FSA offers a standardized outline of a human face for clients to complete (Mattson, 2012b). Details of participant selection and consent procedures can be found in the aforementioned studies. The participating patients (abnormal) were diagnosed by a psychiatrist based on American Psychiatric Association (APA, 2022) criteria, and the non-patients (normal) were hospital employees with no current or past mental illness, as determined by the same psychiatrist based on the Symptom Checklist-90-Revised (Derogatis, 1992).

For the classification task, Kim et al. (2014) used Logistic Regression (LR) models to categorize individuals as either in a normal state or in one of three specific abnormal states (anxiety, depression, or schizophrenia), based on the individual’s SMC. Notably, all 32 features of an SMC, including number of colors used, ratios, accuracy, concentration, and completeness, pertain to color. We denote features in italics. These features can be analyzed automatically and quantitatively by the Computerized Color-Related Elements Art Therapy Evaluation System (C_CREATES) (Kim et al., 2007). For the numerical estimation task, Kim, Kang, et al. (2012) used the Stepwise Regression (SR) model to estimate the Mini-Mental State Examination-Korean (MMSE-K) score (Park & Kwon, 1990) from each of the three art therapy tools above. The MMSE-K consists of twelve questions with a maximum score of 30, covering elements such as time orientation, place orientation, registration, and recall. Scores from 24 to 30 indicate a “definitely normal” cognitive state, 0 to 10 “definite dementia”, and 11 to 23 “potential dementia”.

In this paper, exploring AI’s role in art therapy assessment, we propose DNN for the classification and numerical estimation tasks. By comparing the results of statistical and DNN models, we underscore the potential, limitations, and prerequisites of AI. While the statistical approach elucidates its process transparently, the AI methodology lacks this clarity; this limitation highlights the value of incorporating statistical insights. Realizing the potential of AI in art therapy assessment requires robust data acquisition strategies and specialized researchers. Addressing the unique challenges of art therapy calls for collaboration across art therapy, statistics, and AI.

2. Methods

Samples and tools for classification. Kim et al. (2014) collected a total of 495 SMCs drawn by Group-N, 201 normal persons, and Group-P, 294 psychosis patients. Group-P is further divided into Group-A (100 anxiety patients), Group-D (94 depression patients), and Group-S (100 schizophrenia patients). They classified the drawers into one of two groups in four cases: Group-N vs. Group-P, Group-N vs. Group-A, Group-N vs. Group-D, and Group-N vs. Group-S. Informed consent to use their drawings in academic research papers was obtained from all 495 participants.

The statistical approach described above uses quantitative drawing features, which are extracted and quantified by digital image processing and computer algorithms. For instance, digital image processing blurs and clusters colors, and computer algorithms then assess placement, concentration, and coloring accuracy, either numerically or through grading. The referenced studies employ regression models to discern or predict psychological states from these quantified attributes. Classification performance is measured by accuracy, while numerical estimation performance is measured by the coefficient of determination, R² (0 ≤ R² ≤ 1) (Walpole et al., 2007). Higher values of either metric indicate better performance. However, caution is warranted: a metric computed on the given dataset does not necessarily reflect performance on new data, a phenomenon termed “overfitting”. All statistical models risk overfitting, especially with numerous features; the SR model addresses this by selecting an appropriate number of features. For classifying psychological states, the LR model with an exponential (logistic) regression function is used, while the SR model with a linear function estimates the numerical value of the psychological state (Kutner et al., 2005).

Our AI methodology uses the ResNet (RN) model (He et al., 2016), the winner of the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This model is a deep Convolutional Neural Network (CNN) trained on the ImageNet dataset of 1.28 million images to classify images into 1000 categories. Its distinctive feature is residual learning via skip connections, which mitigates the training difficulties of deeper architectures. We employ the RN model to replicate the earlier experiments, namely the classifications by the LR model (Kim et al., 2014) and the numerical estimations by the SR model (Kim, Kang, et al., 2012), and compare their performances. As illustrated in Fig. 1, we adapt the pre-trained RN model to each dataset. Each drawing is resized to a 224 × 224 RGB image, labeled “Input, 224 × 224 × 3” in Fig. 1, and processed through the “Black-box” representing the RN model. The result is either a classification into one of n groups or a numerical value, labeled “Output n-node”. Within this black box, features distinct from those of the statistical method are extracted across 50 layers and stored in a 1000-node fully connected layer. To evaluate the impact of the quantitative features from previous statistical studies on classification and estimation, an additional layer (Quantitative features layer) is incorporated when required. The RN model’s output is accompanied by a color-coded “Class Activation Map (CAM) output” that visually depicts the significance of different drawing regions for the classification or estimation. As with regression models, overfitting can occur in the RN model: as the number of training iterations increases, accuracy may rise on the training dataset but not necessarily on the validation dataset, which stands in for drawings by new patients. Determining the ideal number of iterations, or epochs, therefore requires monitoring the convergence pattern of accuracy.

Models are trained for 60 epochs with a batch size of 128. To validate the model, we employ 5-fold cross-validation, and the model’s performance is assessed by averaging the accuracies across the five folds. All experiments were conducted in a consistent hardware and software environment to ensure reproducibility and transparency. They were implemented in Python 3 using PyTorch, scikit-learn (sklearn), and pandas. Training was conducted on an Intel Core i9-10900K CPU @ 3.60 GHz with 32 GB RAM and an NVIDIA GeForce RTX 3090 GPU (24 GB).

We compare the RN model’s classification results with those previously reported for the LR model (Kim et al., 2014). The RN model’s two-group classifications correspond to “Output 2-node” in Fig. 1. For a broader analysis, classifications into one of three groups (anxiety, depression, and schizophrenia) and into one of four groups (normal, anxiety, depression, and schizophrenia) are performed by both the LR and RN models and their results are compared. For the RN model, these two classifications correspond to “Output 3-node” and “Output 4-node” in Fig. 1, respectively.

Samples and tools for estimation. Using the SR model to estimate MMSE-K scores (0–30), Kim, Kang, et al. (2012) compared


Fig. 1. CNN model for classification and estimation.

the performances of three art therapy tools: SMC, PPAT, and FSA. Fifty-eight people with known MMSE-K scores each drew a set of three pictures, one with each tool. Their informed consent to use the drawings in academic research papers was obtained. Based on the 58 drawings for each tool, the MMSE-K scores were estimated by the SR model, and the performances of the three tools, measured by R², were compared. In this study, the RN model with a single output node (“Output 1-node” in Fig. 1) is applied to estimate the MMSE-K score and compared with the SR model. In addition, because the dataset is small, the MMSE-K score is divided into two groups, Group-0 (1 to 10) and Group-1 (11 to 30), and classification into one of these two groups is performed by both the LR model and the RN model with “Output 2-node”; their results are compared.

The SR and LR models transparently convey their classification or estimation processes and their statistical reliability. For example, if the number of colors used in a drawing increases by one, the log-odds of being a normal person are estimated to increase by 0.23, with a 95% confidence interval of 0.23 ± 0.11. Conversely, the RN model inherently functions as a black box, offering no explicit decision-making process. Because it relies on raw RGB values without predefined features, its decisions cannot be interpreted directly. Active research to explain this process is therefore being conducted under the topic of eXplainable AI (Adadi & Berrada, 2018). The CAM (Zhou et al., 2016) provides explanations via a heatmap on the input image that shows which parts of the drawing were considered and how important they were for the decision; importance increases from light blue to dark red. In this paper, we employ Grad-CAM (Selvaraju et al., 2017) to derive the CAMs. We show how the statistical (LR and SR models) and AI (RN model) approaches differ in the information they provide about their decision processes.

3. Classification of psychological state into several groups

In this section, we compare the accuracies of the LR and RN models in classifying SMC drawers into Group-N or Group-P, to evaluate the efficacy of the RN model. The same comparison is made for the other classification cases.

3.1. Classification into Group-N or Group-P

Given the values of Number of colors used, Accuracy, and Ratio of yellow of an SMC, the LR model of the statistical approach classifies the person who drew it into Group-N or Group-P as follows (Kim et al., 2014): when the value of

exp(Z) / [1 + exp(Z)]

is less than the cutoff of 0.5, the person is classified into Group-N, and otherwise into Group-P, where

Z = 7.586 − 0.226 × Number of colors used (0.012) − 6.633 × Accuracy (0.000) − 5.252 × Ratio of yellow (0.000).

The values in parentheses are the p-values of the variables. The cutoff of 0.5 can be set differently. The function indicates that the likelihood of classification into Group-P increases as Number of colors used, Accuracy, and Ratio of yellow decrease.

From the “Input 224 × 224 × 3” images of the 495 SMCs, the RN model of the AI approach determines the values of “Output 2-node”. The 5-fold cross-validation indicates that 20 epochs are appropriate for this model. Table 1 summarizes the results of the existing LR model (Kim et al., 2014) and the RN model. The LR model shows an accuracy of 0.8159 in Group-N and 0.8163 in Group-P, with an overall accuracy of 0.8162. The RN model, on the other hand, shows markedly better accuracy in all folds, with 0.8806 in Group-N, 0.9422 in Group-P, and 0.9172 overall. In terms of misclassifications, the LR and RN models misclassify 37 and 24 members of Group-N as Group-P, respectively, 12 of them by both models, and 54 and 17 members of Group-P as Group-N, respectively, 12 of them by both models.

We randomly selected 30 SMCs; their RN model classification results are presented in Fig. 2. Among the 12 SMCs of Group-N (two columns on the left), the 8 SMCs in the upper four rows are correctly classified into Group-N and the 4 SMCs in the bottom two rows are misclassified into Group-P. Among the 18 SMCs of Group-P (6 SMCs each of Group-A, Group-D, and Group-S in the 3rd, 4th, and 5th columns, respectively), the 12 SMCs in the upper four rows are correctly classified into Group-P and the 6 SMCs in the bottom two rows are misclassified into Group-N. Several observers, including the authors, made the following intuitive deductions. In Group-N, there is a discrepancy between the 8 upper-left SMCs, correctly classified, and the 4 lower-left ones, misclassified. In Group-P, there is a discrepancy between the 12 upper-right


Table 1
Comparison of classification accuracies of Group-N and Group-P between the LR and RN models. Each cell gives the accuracy, with the number of correctly classified drawers in parentheses.

                 Statistical    Artificial intelligence (RN model)
                 LR model       Fold-1       Fold-2       Fold-3       Fold-4       Fold-5       Total
                                (40a, 59b)   (40a, 59b)   (40a, 59b)   (40a, 59b)   (41a, 58b)   (201a, 294b)
Group-N (201a)   .8159 (164)    .875 (35)    .800 (32)    .925 (37)    .875 (35)    .927 (38)    .8806 (177)
Group-P (294b)   .8163 (240)    .932 (55)    .949 (56)    .932 (55)    .949 (56)    .948 (55)    .9422 (277)
Total (495)      .8162 (404)    .909 (90)    .889 (88)    .929 (92)    .919 (91)    .939 (93)    .9172 (454)

a Number of non-patients.
b Number of patients.

Fig. 2. Examples of SMC classifications into two groups by the RN model.


SMCs, correctly classified, and the 6 lower-right SMCs, misclassified. There is also a difference between the 8 upper-left SMCs of Group-N and the 12 upper-right SMCs of Group-P, all correctly classified, and between the 4 lower-left SMCs of Group-N and the 6 lower-right SMCs of Group-P, all misclassified. In contrast, there is a similarity between the 8 upper-left SMCs of Group-N, correctly classified, and the 6 lower-right SMCs of Group-P, misclassified.

AI appears to employ judgment standards reminiscent of human intuition. If distinguishing properties exist between Group-N and Group-P drawings and AI detects them accurately, it could achieve perfect classification. However, the AI might fail to recognize such properties, or the drawings might inherently lack them. Thus, AI’s misclassifications may be due either to limitations in its performance or to the nature of drawings as a medium. In Fig. 2, SMCs that the LR model classified differently from the RN model are marked with an asterisk. Investigating these SMCs could yield information on the properties of SMCs.

3.2. Classifications into Group-N or one of Group-A, Group-D, and Group-S

Group-P is divided into 100 anxiety patients (Group-A), 94 depression patients (Group-D), and 100 schizophrenia patients (Group-S). Kim et al. (2014) applied the LR model to two-group classification in three cases: Group-N or Group-A, Group-N or Group-D, and Group-N or Group-S. Classification into Group-N or Group-A follows the earlier procedure, differing only in the function Z:

Z = 4.725 + 9.649 × Ratio of cool colors − 10.285 × Accuracy + 3.581 × Ratio of orange − 8.849 × Ratio of yellow + 8.640 × Ratio of green,

where the confidence intervals and p-values are omitted. Next, the RN model is applied, with the same five-fold cross-validation and epoch determination. The results of the two models are summarized on the left of Table 2; the per-fold results of the RN model are omitted. The results for the other cases, Group-N vs. Group-D and Group-N vs. Group-S, are shown in the middle and on the right of Table 2, respectively.

The LR model classified Group-N and Group-A with accuracies of 0.935 and 0.850, for an overall accuracy of 0.907. In contrast, the RN model yielded accuracies of 0.955 and 0.850, for an overall accuracy of 0.920. In the classification into Group-N or Group-S, the overall accuracy of the LR model is the lowest of the three cases, whereas that of the RN model is the highest, so the difference between the two models is especially large for schizophrenia. These findings suggest that SMCs by schizophrenia patients have distinct characteristics, which the RN model detects proficiently.

Table 2
Classifications between two groups including Group-N (left: Group-N vs. Group-A; middle: Group-N vs. Group-D; right: Group-N vs. Group-S). Each cell gives the accuracy, with the number of correctly classified drawers in parentheses.

                LR           RN                               LR           RN                               LR           RN
Group-N (201)   .935 (188)   .955 (192)       Group-N (201)   .965 (194)   .940 (189)       Group-N (201)   .945 (190)   .990 (199)
Group-A (100)   .850 (85)    .850 (85)        Group-D (94)    .745 (70)    .904 (85)        Group-S (100)   .660 (66)    .980 (98)
Total (301)     .907 (273)   .920 (277)       Total (295)     .895 (264)   .929 (274)       Total (301)     .850 (256)   .987 (297)

3.3. Classifications into three and four groups

Table 3 shows the results of the RN model’s classification into one of three groups (Group-A, Group-D, and Group-S) and into one of four groups (Group-N, Group-A, Group-D, and Group-S). The multinomial LR model of the statistical approach could be applied, but its results are omitted. Intuitively, classification into one of the three groups would be expected to be difficult, because the SMCs of the three psychosis types may differ only slightly, and classification into one of the four groups to be easier, because Group-N may differ considerably from the three psychosis types. As expected, the accuracies of the three- and four-group classifications are 0.721 and 0.754, respectively, which indicates the usefulness of the RN model. As in the two-group classifications, schizophrenia again stands out: the RN model shows the highest accuracy in distinguishing schizophrenia from normal and other psychosis types. The RN model of the AI approach thus indicates that SMCs from schizophrenia patients possess distinct characteristics that set them apart from those of normal, anxiety, and depression individuals, which is a promising subject for further study.

Table 3
Classifications by the RN model among three and four groups. Each cell gives the accuracy, with the number of correctly classified drawers in parentheses.

3 groups                     4 groups
Group           Accuracy     Group           Accuracy
                             Group-N (210)   .890 (187)
Group-A (100)   .600 (60)    Group-A (100)   .530 (53)
Group-D (94)    .660 (62)    Group-D (94)    .532 (50)
Group-S (100)   .900 (90)    Group-S (100)   .830 (83)
Total (294)     .721 (212)   Total (495)     .754 (373)

In summary, while both the RN model of the AI approach and the LR model of the statistical approach are useful for the classification tasks, the RN model clearly outperforms the LR model, especially in distinguishing Group-S from the others.

4. Estimation of the psychological state in a numerical value

We now compare the results of Kim, Kang, et al. (2012), who used the SR model to estimate the psychological state as a numerical value, with those of the RN model. Fifty-eight individuals with known MMSE-K scores each created a set of three drawings with the art therapy tools SMC, PPAT, and FSA. Various features are extracted numerically from these drawings by a quantitative method, and the MMSE-K scores are estimated from them by the SR model. For the SMC, the MMSE-K score is estimated by the following regression function, with the SMC feature Accuracy as the explanatory variable:

MMSE-K score (SMC) = 7.341 + 0.157 × Accuracy.

The p-values are not presented. In this case, R² = 0.494, which is insufficient for practical use; this limitation stems from the small sample size of 58. For the PPAT, Realism and Line quality are selected as explanatory variables:

MMSE-K score (PPAT) = 3.002 + 2.367 × Realism + 1.835 × Line quality.

In this case, R² = 0.591, which is not satisfactory but barely usable. For the FSA, R² = 0.390, which is too low to be usable. Fig. 3 displays the MMSE-K scores and their SR model estimates for the three tools as triangles; in an ideal estimation, all points lie on the 45-degree line.

For the estimation task, the RN model with “Output 1-node” in Fig. 1 can be used. However, the model yielded no meaningful results for any of the three tools (full circles in Fig. 3): in all cases, R² = 0. To supplement this, the RN model is used with an additional layer composed of the quantitative features used in the SR model; its results are shown as empty circles. Although this variant is superior to the image-only RN model for the SMC and FSA, it remains largely ineffective. Furthermore, the results for the PPAT were notably anomalous, with R² < 0. This shows the limitations of deep CNNs, such as the RN model, when dealing with small datasets.

One observation deserves note. In Fig. 3, the three drawings


Fig. 3. Estimation of psychological states by SR, RN, and RN with features in SR.

corresponding to the three circled plots that fall below were drawn by the same person, who has a relatively high MMSE-K score of 23. These drawings are shown at the top of Fig. 4. From them, the RN model estimated MMSE-K scores of 10.3, 10.2, and 10.6, respectively. Another set of three drawings, enclosed by a square and created by an individual with a relatively low MMSE-K score of 12, is depicted at the bottom of Fig. 4; the estimated MMSE-K scores for these are 9.4, 10.0, and 8.24, respectively. These estimates diverge markedly from the SR model results (Kim, Kang, et al., 2012): 18.8, 19.8, and 17.4 for the first person, and 12.7, 17.4, and 16.4 for the second. Notably, the RN model of the AI approach estimated low MMSE-K scores for all three drawings of both people, treating the three pictures in each set consistently, in line with the fact that they were drawn by the same person. This suggests that characteristics unique to each individual may be evident across their set of three drawings. This point is discussed later.

The results indicate that the RN model is ineffective for numerical estimation, and that the SR model performs poorly for the FSA. Instead of using the SR model for numerical estimation, the data can be segmented into intervals and the LR model used for classification. For instance, instead of estimating MMSE-K scores from 0 to 30, the LR model classifies scores into one of two intervals: [0, 10.5) (Group-0) or [10.5, 30] (Group-1). The results are shown in Table 4. The image-only RN model underperformed the LR model, except for the SMC. However, integrating quantitative features improved the RN model’s accuracy. The RN model with quantitative features may thus combine the strengths of the image-based RN model and the quantitative-feature LR model.

5. Discussion

5.1. Comparison of statistical and AI approaches

For small datasets, the statistical approach outperforms the AI method in both classification and estimation. For large datasets, however, the statistical approach is inferior to the AI method. It is difficult to find a general rule for the dataset size at which one approach overtakes the other, because it depends on the kinds of statistical and AI models, the types of drawings, the number of classification groups, and the properties of the estimated variables. Regardless of dataset size, the statistical approach elucidates its process and numerically demonstrates the reliability of its results.

For example, in the classification of MMSE-K scores using 58 SMCs, the LR model showed the lowest accuracy of the three models, 0.914. However, even with a relatively small dataset of 58, the LR model can provide information on

Fig. 4. Two particular cases detected by the RN model.


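Table 4
Effects of adding quantitative features to the AI approach. RN: image-only RN model; RN + Quan.: RN model with the quantitative features layer; LR: logistic regression model. Each cell gives the accuracy, with the number of correctly classified drawers in parentheses.

               SMC                                 PPAT                                FSA
               RN          RN + Quan.  LR          RN          RN + Quan.  LR          RN          RN + Quan.  LR
Group-0 (12)   .667 (8)    1.00 (12)   .667 (8)    .333 (4)    1.00 (12)   .667 (8)    .333 (4)    .750 (9)    .417 (5)
Group-1 (46)   1.00 (46)   .913 (42)   .978 (45)   .957 (44)   .870 (40)   .913 (42)   .978 (45)   .935 (43)   .978 (45)
Total (58)     .931 (54)   .931 (54)   .914 (53)   .828 (48)   .897 (52)   .862 (50)   .845 (49)   .897 (52)   .862 (50)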

the process by which these decisions were made and also on its reliability. The probability of the person with an MMSE-K score of 23 being classified into Group-1 is estimated by substituting the feature accuracy of his or her SMC in Fig. 4 into the function

$$P(\text{Group-1}) = \frac{\exp(Z)}{1 + \exp(Z)},$$

where

$$Z = -1.6057\,[-3.1147,\,-0.0967]\,(0.0370) + 0.0944\,[0.0423,\,0.1465]\,(0.0004) \times \mathrm{Accuracy}.$$

The numbers in square brackets are 95% confidence intervals, and the numbers in round brackets are p values.

The value of the feature accuracy of this SMC is 43.8 (%), thus Z = 2.52902, and the probability of Group-1 is estimated to be 0.926, which is more than 0.5, so the person is correctly classified into Group-1. By contrast, the accuracy of the SMC of a person close to having dementia, with an MMSE-K score of 12, is 33.1, giving Z = 1.51894 and a probability of 0.820, so this person is also correctly classified into Group-1. If the accuracy were as high as 50.0, this probability would be estimated at 0.957, and the person would be classified into Group-1; if the accuracy were as low as 15.0, the probability would be estimated at 0.453, and the person would be classified into Group-0. The cutoff value of the accuracy that distinguishes between the two groups, where Z = 0 and the probability equals 0.5, is 1.6057/0.0944 ≈ 17.0.

The 95% confidence interval for the accuracy coefficient 0.0944 is quite wide: 0.0944 ± 0.0521 = [0.0423, 0.1465]. At the lower confidence limit of 0.0423 (holding the intercept at −1.6057), the probabilities of the two persons above being classified into Group-1 are 0.561 and 0.449, respectively. At the upper confidence limit of 0.1465, the probabilities become 0.992 and 0.962, respectively. The intervals are therefore [0.561, 0.992] for the former and [0.449, 0.962] for the latter. Both intervals are so wide that they do not provide useful information. This is due to the small size of the data; as the data size increases, the confidence interval narrows and the information becomes more useful. In this way, the effect of the accuracy coefficient, 0.0944, on the classification process can be explained numerically and statistically.

The coefficient has other interpretations as well. The ratio of the probability of being Group-1 to the probability of being Group-0 (the odds) is also useful information. As the accuracy increases by 1, the estimate of this ratio is multiplied by exp(0.0944) = 1.099. For example, as the accuracy of the previous SMC increases from 43.8 to 44.8, the estimated odds, p/(1 − p) = exp(Z) = exp(2.52902) ≈ 12.54, increase to 12.54 × 1.099 ≈ 13.78. Further statistical analysis is omitted.

Conversely, the AI method surpasses the statistical approach in classification tasks. Given a sufficiently large dataset, its accuracy appears able to approach the upper limit of what the drawings themselves can reveal.
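The logistic-regression calculations in this walkthrough can be retraced with a minimal Python sketch. It uses only the coefficients and the 95% interval reported above; the function names are our own, and, matching the arithmetic in the text, the confidence-limit probabilities vary only the accuracy coefficient while holding the intercept at −1.6057. It illustrates the fitted equation and is not the study's own code.

```python
import math

# Fitted logistic-regression model as reported in the text:
#   Z = -1.6057 + 0.0944 * Accuracy,   P(Group-1) = exp(Z) / (1 + exp(Z))
INTERCEPT = -1.6057
COEF = 0.0944
COEF_CI = (0.0423, 0.1465)  # reported 95% confidence interval for COEF


def logit(accuracy: float, coef: float = COEF, intercept: float = INTERCEPT) -> float:
    """Linear predictor Z for an SMC accuracy feature given in percent."""
    return intercept + coef * accuracy


def prob_group1(accuracy: float, coef: float = COEF, intercept: float = INTERCEPT) -> float:
    """exp(Z) / (1 + exp(Z)), computed in the equivalent form 1 / (1 + exp(-Z))."""
    z = logit(accuracy, coef, intercept)
    return 1.0 / (1.0 + math.exp(-z))


def classify(accuracy: float) -> int:
    """Group-1 if the estimated probability exceeds 0.5, otherwise Group-0."""
    return 1 if prob_group1(accuracy) > 0.5 else 0


# Cutoff accuracy where Z = 0, i.e., where the probability equals 0.5.
cutoff = -INTERCEPT / COEF

# Factor by which the odds p / (1 - p) = exp(Z) grow per one-point rise in accuracy.
odds_multiplier = math.exp(COEF)

if __name__ == "__main__":
    for acc in (43.8, 33.1, 50.0, 15.0):
        print(f"accuracy={acc:5.1f}  Z={logit(acc):.5f}  "
              f"p={prob_group1(acc):.3f}  group={classify(acc)}")

    # Vary only the coefficient across its 95% CI, holding the intercept fixed,
    # as the interval calculation in the text does.
    for acc in (43.8, 33.1):
        lo, hi = (prob_group1(acc, coef=c) for c in COEF_CI)
        print(f"accuracy={acc}: p between {lo:.3f} and {hi:.3f}")

    print(f"cutoff={cutoff:.1f}  odds multiplier={odds_multiplier:.3f}")
```

Because the odds p/(1 − p) equal exp(Z), a one-point rise in accuracy multiplies them by exp(0.0944) whatever the starting accuracy, which is why the odds multiplier needs no reference value.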

Fig. 5. CAMs of SMC samples of classification types in Fig. 2 and notable SMC, PPAT, and FSA in Fig. 4.
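Fig. 5 displays class activation maps. In the original CAM formulation (Zhou et al., 2016), which applies to networks ending in global average pooling followed by a linear classifier, as ResNet does (He et al., 2016), the map for a class is the classifier-weighted sum of the last convolutional feature maps. The NumPy sketch below assumes that setup; the array shapes, the function names, and the clip-and-rescale step used for display are illustrative assumptions, not details taken from this study.

```python
import numpy as np


def raw_cam(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Class activation map M_c(x, y) = sum_k w_k^c * f_k(x, y).

    features:   (K, H, W) activations of the last convolutional layer for one image.
    fc_weights: (num_classes, K) weights of the linear layer applied after
                global average pooling.
    """
    return np.einsum("k,khw->hw", fc_weights[class_idx], features)


def normalized_cam(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Display version: clip negative evidence and rescale to [0, 1].

    The clipping and rescaling are a common visualization choice, not part of
    the CAM definition itself.
    """
    cam = np.maximum(raw_cam(features, fc_weights, class_idx), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.random((8, 7, 7))      # hypothetical: 8 channels on a 7 x 7 grid
    fc_weights = rng.normal(size=(2, 8))  # hypothetical two-class head
    # Class scores (bias omitted) after global average pooling.
    logits = fc_weights @ features.mean(axis=(1, 2))
    cam = raw_cam(features, fc_weights, class_idx=1)
    # The spatial mean of the raw map reproduces the class score.
    print(np.isclose(cam.mean(), logits[1]))
    print(normalized_cam(features, fc_weights, class_idx=1).round(2))
```

The spatial mean of the unnormalized map equals the class score before the bias term, so the map can be read as distributing the model's evidence over image locations. In practice the (H, W) map is upsampled to the drawing's resolution before being overlaid, a step omitted here.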


In estimation tasks, similar superiority emerged when the AI was combined with quantitative features. The AI approach also produced some reasonable and notable results, for example, classification results acceptable to human common sense, and identifying a drawer by consistently estimating all three of his or her drawings at low values. However, its drawback relative to the statistical method is the difficulty of explaining its decision process, which has prompted various studies, such as those on CAM. Fig. 5 shows the CAMs of four randomly chosen SMCs from Fig. 2 and the six drawings in Fig. 4. Collaboration among art therapists, psychologists, psychiatrists, statisticians, color analysts, and AI experts in reviewing and analyzing the CAMs could offer answers, clues, or inspiration about how the AI reaches these decisions.

For a small dataset of 58 drawings, the AI method's estimation is ineffective: a CNN usually requires large amounts of data, and training it on only 58 drawings yields biased learning. Many studies address this problem; one approach is data augmentation, which can compensate for this weakness of CNNs (Kim et al., 2020), but it was not applied in this study. Instead, for the small-data estimation task, we provided the AI with the quantitative features as additional information. The AI then yielded the 'ridiculous' result of R² < 0. The 'intelligent' statistical model would prevent this by disregarding non-contributory information, that is, by assigning it a coefficient of 0. This example shows the necessity of collaboration between AI and statistical approaches.

In summary, for a small dataset and a complex task (e.g., numerical estimation of MMSE-K using 58 SMC drawings), the statistical approach excels over the AI method. Conversely, for a larger dataset and a simpler task (e.g., psychological state classification with 495 SMC drawings), the AI method surpasses the statistical approach. This finding is consistent with common practice in choosing between deep learning and shallow learning in AI: simple statistical estimation models using well-engineered features perform better than deep learning models in data-scarce settings.

5.2. Practical implications

Assessing mental states based on drawings using statistical and AI approaches could have both positive and negative practical implications. On the positive side, statistical and AI approaches provide a more objective assessment of mental state, which reduces potential human evaluation bias. Both approaches efficiently process large numbers of images, and this scalability can be particularly useful in research, clinical settings, and online mental health support platforms. They can also evaluate mental states without invasive data such as self-reports, thereby safeguarding individual privacy. On the negative side, the use of statistical and AI approaches raises ethical concerns, including consent and data privacy. Furthermore, non-diverse training data may introduce bias, and model reliability is not guaranteed. In addition, user skepticism towards statistical and AI evaluations may affect acceptance and trust.

Variations in participants' gender, age, education, etc. may affect the dependent variables. Participants had an average age of 81.6 years (standard deviation 7.77 years), and 28.6% were male. We omitted this information from the data for training and evaluation because the correlation coefficients between MMSE-K and both age and gender are statistically negligible. However, other factors, e.g., culture, may have a significant relationship with the dependent variable. The samples in this study are confined to a single culture; including participants from different cultures would broaden the applicability of these approaches.

6. Conclusion

6.1. Summary and remarks

In this study, we introduced a deep learning-based AI methodology for art therapy assessments and evaluated its efficacy against a traditional statistical approach. A notable advantage of the AI approach is that it eliminates the need for manual feature engineering. Instead of relying on engineers to handpick and extract pertinent features from drawings via various image processing modules, the AI model autonomously constructs relevant features. This not only streamlines the engineering process but also enables the AI model to outperform its statistical counterpart, as it is not confined to human-selected features. However, a caveat is that the AI model typically requires a more extensive dataset than its statistical counterpart. With limited data, the AI's performance might lag behind that of the statistical approach, as observed in the two art therapy assessment tasks detailed in this paper.

In the classification task using extensive data, the AI approach using the RN model, grounded solely on SMC images, consistently outperformed the LR model of the statistical approach. This was especially evident when differentiating schizophrenia from other forms of psychosis, suggesting that the RN model may have effectively discerned nuanced characteristics unique to schizophrenia. Even with a smaller dataset, augmenting the images with quantitative features resulted in the AI model outperforming the statistical one.

On the other hand, in the numerical estimation of MMSE-K scores using the limited dataset, the AI approach proved to be minimally effective. However, the AI model did yield noteworthy findings when the scores were grouped into categories and it was correspondingly tasked with classification. This suggests potential commonalities in certain features across the three drawings that correlate with the individual drawers. Active research efforts are underway to unpack AI decision-making processes, and these explorations might unearth new insights or inspire fresh perspectives on drawing interpretation. The qualitative information deduced from the subjective knowledge and experience of art therapists, the numerical values given by the mathematical functions of statistics, and the insights of the AI approach complement each other in art therapy assessment. Art therapy assessment will advance through interdisciplinary collaboration across art, psychology, psychiatry, statistics, and AI.

6.2. Limitations of the study

A critical factor behind recent advances in AI using DNNs, where results often surpass human expertise, is the availability of extensive annotated datasets. For instance, Google DeepMind demonstrated an AI capability that outperformed human experts in diagnosing retinal diseases (De Fauw et al., 2018). Similarly, researchers at Stanford University showed AI skin cancer classification matching the expertise of dermatologists (Esteva et al., 2017). These breakthroughs were realized through close collaboration with medical professionals, which facilitated the generation of large annotated datasets for AI training, subsequent analyses of diagnostic results, and strategies for applying the findings effectively. In this context, improving MMSE-K score estimation requires collecting more annotated data. The data size required for satisfactory outcomes varies with the application domain and the specific art therapy tools in use, and must be determined empirically through experimentation. Such scientific investigation would be facilitated by standardized, open datasets spearheaded by the art therapy community. A similar initiative exists in the medical research domain, exemplified by the Medical Information Mart for Intensive Care (MIMIC) datasets (Johnson et al., 2016).

A significant limitation of current DNN models is their lack of transparency in explaining their prediction processes. This stands in


contrast to the statistical approach, which offers clarity regarding its classification or estimation results and their reliability, regardless of the dataset's size. Even with interpretive tools for DNNs, such as CAM, modern AI models often lack a comprehensive explanation of their decision-making mechanisms. Explainable AI research makes addressing this transparency challenge its priority. Significant efforts are underway in this domain, and art therapy assessment stands to gain from advances in this area.

6.3. Directions for future work

This study focused on only three specific psychosis types: anxiety, depression, and schizophrenia, and examined three art therapy assessment tools: the SMC, the PPAT, and the FSA. The scope can be expanded to encompass a wider range of psychological states, such as attention-deficit/hyperactivity disorder, autism, and dementia. Additionally, various other art therapy tools, including the House, Tree, and Person (HTP), Draw A Person (DAP), KFD, and Kinetic School Drawing (KSD), could be integrated. The deep CNN, an instance of DNN, offers potential for interpreting drawings across diverse art therapy mediums. Meanwhile, the expert system, another AI methodology, is well suited to capturing the nuanced, qualitative nature inherent in art therapy. By integrating the qualitative AI expert system, the quantitative deep CNN, and the statistical methods of the SR and LR models, we foresee advances in art therapy.

CRediT authorship contribution statement

Seong-in Kim: Conceptualization, Data curation, Supervision, Writing – original draft. Kee-Eung Kim: Investigation, Validation, Writing – review & editing. Seunghwan Song: Investigation, Visualization, Writing – original draft.

Declaration of competing interest

All authors report no financial interests or potential conflicts of interest.

Data availability

Data will be made available on request.
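The discussion of the MMSE-K estimation task in Section 5 notes that, once given the quantitative features, the CNN produced R² < 0. A minimal Python sketch with hypothetical numbers (not the study's data) shows what such a value means: R² = 1 − SS_res/SS_tot is zero for the constant prediction of the mean and negative for any predictor that does worse than that baseline. A least-squares model with an intercept can always reproduce the mean baseline by giving an unhelpful feature a zero coefficient, so its in-sample R² cannot fall below zero. The helper names are our own.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def ols_in_sample_r2(y, x) -> float:
    """Fit y = b0 + b1 * x by ordinary least squares; return R^2 on the same data."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(y), np.asarray(x, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return r_squared(y, design @ beta)


if __name__ == "__main__":
    y = np.array([10.0, 15.0, 20.0, 25.0, 30.0])  # hypothetical scores, not study data
    print(r_squared(y, np.full_like(y, y.mean())))  # constant mean baseline
    print(r_squared(y, y[::-1]))                    # worse than the mean: negative
    noise = np.random.default_rng(0).normal(size=y.size)  # feature unrelated to y
    # With an intercept, least squares can always fall back on the mean,
    # so the in-sample R^2 is never below zero.
    print(ols_in_sample_r2(y, noise))
```

The guarantee covers in-sample fit only; on held-out drawings any model, statistical or AI, can still score below zero.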

Appendix. List of abbreviations

AI Artificial Intelligence
C_CREATES Computerized Color-Related Elements Art Therapy Evaluation System
CAM Class Activation Map
CNN Convolutional Neural Network
DAP Draw A Person
DNN Deep Neural Networks
FSA Face Stimulus Assessment
Group-A Anxiety patients
Group-D Depression patients
Group-N Normal persons
Group-P Psychosis patients
Group-S Schizophrenia patients
HTP House, Tree, and Person
ILSVRC ImageNet Large Scale Visual Recognition Challenge
KFD Kinetic Family Drawing
KSD Kinetic School Drawing
LR Logistic Regression
MIMIC Medical Information Mart for Intensive Care
MMSE-K Mini-Mental State Examination-Korean
PPAT Person Picking an Apple from a Tree
RN ResNet
Sklearn scikit-learn
SMC Structured Mandala Coloring

References
Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6, 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052
American Psychiatric Association. (2022). Diagnostic and statistical manual of mental disorders (5th ed., text rev.). https://doi.org/10.1176/appi.books.9780890425787
Betts, D. J. (2003). Developing a projective drawing test: Experiences with the Face Stimulus Assessment (FSA). Art Therapy: Journal of the American Art Therapy Association, 20(2), 77–82. https://doi.org/10.1080/07421656.2003.10129393
Curry, N. A., & Kasser, T. (2005). Can coloring mandalas reduce anxiety? Art Therapy: Journal of the American Art Therapy Association, 22(2), 81–85. https://doi.org/10.1080/07421656.2005.10129441
De Fauw, J., Ledsam, J. R., Romera-Paredes, B., Nikolov, S., Tomasev, N., Blackwell, S., Askham, H., Glorot, X., O'Donoghue, B., & Visentin, D. (2018). Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine, 24(9), 1342–1350. https://doi.org/10.1038/s41591-018-0107-6
Derogatis, L. R. (1992). SCL-90-R: Administration, scoring & procedures manual-II for the (revised) version and other instruments of the psychopathology rating scale series (pp. 1–16). Clinical Psychometric Research.
Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118. https://doi.org/10.1038/nature21056
Gantt, L. M. (1990). A validity study of the Formal Elements Art Therapy Scale (FEATS) for diagnostic information in patients' drawings. Unpublished dissertation, University of Pittsburgh.
Gantt, L. M., & Tabone, C. (2003). The Formal Elements Art Therapy Scale and "Draw a Person Picking an Apple from a Tree". In C. A. Malchiodi (Ed.), Handbook of art therapy (pp. 420–427). The Guilford Press.
Giarratano, J., & Riley, G. (2005). Expert systems: Principles and programming (5th ed.). Course Technology.
Gussak, D. E., & Nyce, J. M. (1999). To bridge art therapy and computer technology: The visual toolbox. Art Therapy: Journal of the American Art Therapy Association, 16(4), 194–196. https://doi.org/10.1080/07421656.1999.10129478
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L. W. H., Feng, M., Ghassemi, M., … Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1), 1–9. https://doi.org/10.1038/sdata.2016.35
Kim, S. I. (2008). Commentaries [To the editor]. Art Therapy: Journal of the American Art Therapy Association, 25(1), 41.
Kim, S. I. (2010). A computer system for the analysis of color-related elements in art therapy assessment: Computer Color-Related Elements Art Therapy Evaluation System (C_CREATES). The Arts in Psychotherapy, 37(5), 378–386. https://doi.org/10.1016/j.aip.2010.09.002
Kim, S. I. (2016). Assessments and computer technology. In D. Gussak & M. Rosal (Eds.), The Wiley handbook of art therapy (pp. 587–599). Wiley-Blackwell.
Kim, S. I. (2017). Computational art therapy. Charles C. Thomas.
Kim, S. I. (2018). Computational art therapy in art therapy assessment research. In C. Malchiodi (Ed.), The handbook of art therapy and digital technology (pp. 348–372). Jessica Kingsley Publishers.
Kim, S. I., Bae, J., & Lee, Y. (2007). A computer system to rate the color-related formal elements in art therapy assessments. The Arts in Psychotherapy, 34(3), 223–237. https://doi.org/10.1016/j.aip.2007.02.002
Kim, S. I., Betts, D. J., Kim, H. M., & Kang, H. S. (2009a). Statistical models to estimate level of psychological disorder based on a computer rating system: An application to dementia using structured mandala drawings. The Arts in Psychotherapy, 36(4), 214–221. https://doi.org/10.1016/j.aip.2009.03.002
Kim, J. H., Choo, W., & Song, H. O. (2020). Puzzle mix: Exploiting saliency and local statistics for optimal mixup. In International conference on machine learning (pp. 5275–5285). PMLR. http://proceedings.mlr.press/v119/kim20b.html
Kim, S. I., Ghil, J. H., Choi, E. Y., Kwon, O. S., & Kong, M. (2014). A computer system using a structured mandala to differentiate and identify psychological disorders. The Arts in Psychotherapy, 41(2), 181–186. https://doi.org/10.1016/j.aip.2014.02.003
Kim, S. I., & Hameed, I. A. (2009). A computer system to rate the variety of color in drawing. Art Therapy: Journal of the American Art Therapy Association, 26(2), 73–79. https://doi.org/10.1016/j.aip.2007.02.002
Kim, S. I., Han, J., Kim, Y. H., & Oh, Y. J. (2011). A computer art therapy system for kinetic family drawing (CATS_KFD). The Arts in Psychotherapy, 38(1), 17–28. https://doi.org/10.1016/j.aip.2010.10.002
Kim, S. I., Han, J., & Oh, Y. J. (2012a). A computer art assessment system for the evaluation of space usage in drawings with application to the analysis of its relationship to level of dementia. New Ideas in Psychology, 30(5), 300–307. https://doi.org/10.1016/j.newideapsych.2012.02.002
Kim, S. I., Kang, H. S., Chung, S., & Hong, E. J. (2012). A statistical approach to comparing the effectiveness of several art therapy tools in estimating the level of a psychological state. The Arts in Psychotherapy, 39(5), 397–403. https://doi.org/10.1016/j.aip.2012.07.004
Kim, S. I., Kang, H. S., & Kim, Y. H. (2009b). A computer system for art therapy assessment of elements in structured mandala. The Arts in Psychotherapy, 36(1), 19–28. https://doi.org/10.1016/j.aip.2008.09.002
Kim, S. I., Kim, K. E., Lee, Y., Lee, S. K., & Yoo, S. (2006a). How to make a machine think in art psychotherapy: An expert system's reasoning process. The Arts in Psychotherapy, 33(5), 383–394. https://doi.org/10.1016/j.aip.2006.06.003
Kim, S. I., Ryu, H. J., Hwang, J. O., & Kim, M. S. H. (2006b). An expert system approach to art psychotherapy. The Arts in Psychotherapy, 33(1), 59–75. https://doi.org/10.1016/j.aip.2005.07.004
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). McGraw-Hill.
Mattson, D. C. (2010). Issues in computerized art therapy assessment. The Arts in Psychotherapy, 37, 328–334. https://doi.org/10.1016/j.aip.2010.05.008
Mattson, D. C. (2012a). An introduction to the computerized assessment of art-based instruments. Art Therapy: Journal of the American Art Therapy Association, 29(1), 27–32. https://doi.org/10.1080/07421656.2012.648091
Mattson, D. C. (2012b). Constructing the computer-rated Face Stimulus Assessment-Revised (FSA-R) to assess formal elements of major depressive disorder (MDD). The Arts in Psychotherapy, 39(1), 31–37. https://doi.org/10.1016/j.aip.2011.11.003
Park, J.-H., & Kwon, Y. C. (1990). Modification of the mini-mental state examination for use in the elderly in a non-western society. Part 1. Development of Korean version of mini-mental state examination. International Journal of Geriatric Psychiatry, 5(6), 381–387. https://doi.org/10.1002/gps.930050606
Reod, D. (2020). Google's DeepMind A.I. beats doctors in breast cancer screening trial. Health Science. Retrieved from https://www.cnbc.com/2020/01/02/googles-deepmind-ai-beats-doctors-in-breast-cancer-screening-trial.html (Accessed 18 January 2002).
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618–626). https://doi.org/10.1007/s11263-019-01228-7
Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2007). Probability & statistics for engineers & scientists (9th ed.). Pearson Education, Inc.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2921–2929).
