TY  - JOUR
AU  - Kim, Do Hyung
AU  - Jeong, Joo Won
AU  - Kang, Dayoung
AU  - Ahn, Taekyung
AU  - Hong, Yeonjung
AU  - Im, Younggon
AU  - Kim, Jaewon
AU  - Kim, Min Jung
AU  - Jang, Dae-Hyun
PY  - 2025
DA  - 2025/1/14
TI  - Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study
JO  - J Med Internet Res
SP  - e60520
VL  - 27
KW  - speech sound disorder
KW  - speech recognition software
KW  - speech articulation tests
KW  - speech-language pathology
KW  - child
AB  - Background: Speech sound disorders (SSDs) are common communication challenges in children, typically assessed by speech-language pathologists (SLPs) using standardized tools. However, traditional evaluation methods are time-intensive and prone to variability, raising concerns about reliability. Objective: This study aimed to compare the evaluation outcomes of SLPs and an automatic speech recognition (ASR) model using two standardized SSD assessments in South Korea, evaluating the ASR model’s performance. Methods: A fine-tuned wav2vec 2.0 XLS-R model, pretrained on 436,000 hours of adult voice data spanning 128 languages, was used. The model was further trained on 93.6 minutes of children’s voices with articulation errors to improve error detection. Participants included children referred to the Department of Rehabilitation Medicine at a general hospital in Incheon, South Korea, from August 19, 2022, to June 14, 2023. Two standardized assessments, the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP), were used, with ASR transcriptions compared to SLP transcriptions. Results: This study included 30 children aged 3-7 years who were suspected of having SSDs. The phoneme error rates for the APAC and U-TAP were 8.42% (457/5430) and 8.91% (402/4514), respectively, indicating discrepancies between the ASR model and SLP transcriptions across all phonemes. Consonant error rates were 10.58% (327/3090) and 11.86% (331/2790) for the APAC and U-TAP, respectively. On average, there were 2.60 (SD 1.54) and 3.07 (SD 1.39) discrepancies per child for correctly produced phonemes, and 7.87 (SD 3.66) and 7.57 (SD 4.85) discrepancies per child for incorrectly produced phonemes, based on the APAC and U-TAP, respectively. The correlation between SLPs and the ASR model in terms of the percentage of consonants correct was excellent, with intraclass correlation coefficients of 0.984 (95% CI 0.953-0.994) and 0.978 (95% CI 0.941-0.990) for the APAC and U-TAP, respectively. The z scores between SLPs and ASR showed more pronounced differences with the APAC than the U-TAP, with 8 individuals showing discrepancies in the APAC compared to 2 in the U-TAP. Conclusions: The results demonstrate the potential of the ASR model in assessing children with SSDs. However, its performance varied based on phoneme or word characteristics, highlighting areas for refinement. Future research should include more diverse speech samples, clinical settings, and speech data to strengthen the model’s refinement and ensure broader clinical applicability.
SN  - 1438-8871
UR  - https://www.jmir.org/2025/1/e60520
UR  - https://doi.org/10.2196/60520
UR  - http://www.ncbi.nlm.nih.gov/pubmed/39576242
DO  - 10.2196/60520
ID  - info:doi/10.2196/60520
ER  -
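The phoneme and consonant error rates quoted in the abstract are consistent with a simple ratio of ASR-versus-SLP discrepancies to total units. The Python sketch below assumes that definition and recomputes the four percentages from the reported counts. The function name `error_rate` and the dictionary labels are illustrative and do not come from the study.

```python
def error_rate(discrepancies: int, total: int) -> float:
    """Percentage of units where the ASR and SLP transcriptions disagree.

    Assumed definition: 100 * discrepancies / total.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * discrepancies / total


# (discrepancies, total units) exactly as reported in the abstract.
reported = {
    "APAC phonemes": (457, 5430),
    "U-TAP phonemes": (402, 4514),
    "APAC consonants": (327, 3090),
    "U-TAP consonants": (331, 2790),
}

for label, (errors, total) in reported.items():
    print(f"{label}: {error_rate(errors, total):.2f}%")
```

Under this assumption the recomputed values round to the percentages stated in the abstract (8.42%, 8.91%, 10.58%, and 11.86%).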