%0 Journal Article
%@ 1438-8871
%I JMIR Publications
%V 27
%N
%P e60520
%T Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study
%A Kim,Do Hyung
%A Jeong,Joo Won
%A Kang,Dayoung
%A Ahn,Taekyung
%A Hong,Yeonjung
%A Im,Younggon
%A Kim,Jaewon
%A Kim,Min Jung
%A Jang,Dae-Hyun
%+ Department of Rehabilitation Medicine, Incheon St Mary’s Hospital, College of Medicine, The Catholic University of Korea, 22 Banpo-daero, Seocho-gu, Seoul, 06591, Republic of Korea, 82 0322806601, dhjangmd@naver.com
%K speech sound disorder
%K speech recognition software
%K speech articulation tests
%K speech-language pathology
%K child
%D 2025
%7 14.1.2025
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: Speech sound disorders (SSDs) are common communication challenges in children, typically assessed by speech-language pathologists (SLPs) using standardized tools. However, traditional evaluation methods are time-intensive and prone to variability, raising concerns about reliability. Objective: This study aimed to compare the evaluation outcomes of SLPs and an automatic speech recognition (ASR) model using two standardized SSD assessments in South Korea, evaluating the ASR model’s performance. Methods: A fine-tuned wav2vec 2.0 XLS-R model, pretrained on 436,000 hours of adult voice data spanning 128 languages, was used. The model was further trained on 93.6 minutes of children’s voices with articulation errors to improve error detection. Participants included children referred to the Department of Rehabilitation Medicine at a general hospital in Incheon, South Korea, from August 19, 2022, to June 14, 2023. Two standardized assessments—the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP)—were used, with ASR transcriptions compared to SLP transcriptions. Results: This study included 30 children aged 3-7 years who were suspected of having SSDs. The phoneme error rates for the APAC and U-TAP were 8.42% (457/5430) and 8.91% (402/4514), respectively, indicating discrepancies between the ASR model and SLP transcriptions across all phonemes. Consonant error rates were 10.58% (327/3090) and 11.86% (331/2790) for the APAC and U-TAP, respectively. On average, there were 2.60 (SD 1.54) and 3.07 (SD 1.39) discrepancies per child for correctly produced phonemes, and 7.87 (SD 3.66) and 7.57 (SD 4.85) discrepancies per child for incorrectly produced phonemes, based on the APAC and U-TAP, respectively. The correlation between SLPs and the ASR model in terms of the percentage of consonants correct was excellent, with an intraclass correlation coefficient of 0.984 (95% CI 0.953-0.994) and 0.978 (95% CI 0.941-0.990) for the APAC and U-TAP, respectively. The z scores between SLPs and ASR showed more pronounced differences with the APAC than the U-TAP, with 8 individuals showing discrepancies in the APAC compared to 2 in the U-TAP. Conclusions: The results demonstrate the potential of the ASR model in assessing children with SSDs. However, its performance varied based on phoneme or word characteristics, highlighting areas for refinement. Future research should include more diverse speech samples, clinical settings, and speech data to strengthen the model’s refinement and ensure broader clinical applicability.
%M 39576242
%R 10.2196/60520
%U https://www.jmir.org/2025/1/e60520
%U https://doi.org/10.2196/60520
%U http://www.ncbi.nlm.nih.gov/pubmed/39576242