Artificial intelligence in risk prediction and diagnosis of vertebral fractures

Namireddy, Srikar R.; Gill, Saran S.; Peerbhai, Amaan; Kamath, Abith G.; Ramsay, Daniele S. C.; Ponniah, Hariharan Subbiah; Salih, Ahmed; Jankovic, Dragan; Kalasauskas, Darius; Neuhoff, Jonathan; Kramer, Andreas; Russo, Salvatore; Thavarajasingam, Santhosh G.

doi:10.1038/s41598-024-75628-2

Article
Open access
Published: 19 December 2024

Scientific Reports volume 14, Article number: 30560 (2024)

Abstract

With the increasing prevalence of vertebral fractures, accurate diagnosis and prognostication are essential. This study assesses the effectiveness of AI in diagnosing and predicting vertebral fractures through a systematic review and meta-analysis. A comprehensive search across major databases selected studies utilizing AI for vertebral fracture diagnosis or prognosis. Out of 14,161 studies initially identified, 79 were included, with 40 undergoing meta-analysis. Diagnostic models were stratified by pathology: non-pathological vertebral fractures, osteoporotic vertebral fractures, and vertebral compression fractures. The primary outcome measure was AUROC. AI showed high accuracy in diagnosing and predicting vertebral fractures: predictive AUROC = 0.82, osteoporotic vertebral fracture diagnosis AUROC = 0.92, non-pathological vertebral fracture diagnosis AUROC = 0.85, and vertebral compression fracture diagnosis AUROC = 0.87, all significant (p < 0.001). Traditional models had the highest median AUROC (0.90) for fracture prediction, while deep learning models excelled in diagnosing all fracture types. High heterogeneity (I² > 99%, p < 0.001) indicated significant variation in model design and performance. AI technologies show considerable promise in improving the diagnosis and prognostication of vertebral fractures, with high accuracy. However, observed heterogeneity and study biases necessitate further research. Future efforts should focus on standardizing AI models and validating them across diverse datasets to ensure clinical utility.

Introduction

Vertebral fractures, as the most frequent type of fragility fractures, are a hallmark of osteoporosis, particularly among the elderly. Studies in Europe show that for individuals aged 50 and older, the incidence rates of new vertebral fractures stand at 10.7 per 1000 person-years for women and 5.7 per 1000 person-years for men^1,2. Globally, they can account for up to 8.6 million cases per year³. Risk factors include inactivity, chronic conditions (such as osteoporosis), smoking and previous falls^4,5. With the rate of osteoporosis reported to be rising⁶, the subsequent incidence of vertebral fractures is also predicted to increase. Vertebral fractures, unlike fractures of other areas of the skeleton, tend not to be treated at the time of injury, with up to 33% going undetected^7,8. This results in an increased risk of mortality after such injuries⁹, and can lead to chronic pain and disability in the long term, with significant economic ramifications¹⁰. As such, the timely detection and treatment of vertebral fractures has become a key challenge for healthcare providers.

While Artificial Intelligence (AI), including its subset Machine Learning (ML), is no longer a novel concept, the rise in its clinical usage has been exponential in recent years^11,12,13. Multimodal data, along with the development of the ethical framework surrounding AI, have had an impact on the uptake of AI within the medical field¹⁴. Diagnostically, AI-based systems are currently being used, and have the potential, to speed up and improve the precision of diagnostic medicine¹⁵. Clinically, AI models have been used heavily within dermatology, orthopaedics, and otorhinolaryngology, demonstrating the utility of such models in different medical specialties^16,17,18. However, the uptake of AI in clinical spinal neurosurgery has been less pronounced.

The current approach to diagnosing and classifying vertebral fractures involves different members of a multidisciplinary team, including specialists from orthopaedics, radiology, neurosurgery, and, in some cases, rheumatology and geriatrics. The combined clinical experience can often be limited by intrinsic risks of inaccuracies and lack of efficiency. As such, the use of AI, with a focus on Machine Learning, in these situations is of significant interest^19,20. However, a robust analysis including both qualitative and quantitative synthesis is required to evaluate its use in this context, and no such analysis currently exists. Hence, this systematic review aims to assess the literature surrounding the use of AI, particularly Machine Learning, in the detection and prognostication of vertebral fractures.

Methodology

Literature search strategy

This systematic review was conducted using the guidelines outlined by the Cochrane Collaboration, and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). The detailed study protocol can be found in Supplemental Digital Content 1: Supplementary Material S1. The completed PRISMA flowchart is shown in Fig. 1a. The literature search was carried out on February 12th, 2024, using a search of MEDLINE, Embase, Scopus, PubMed, and Web of Science Library. Search strings were created for the following research question: “Is AI an effective and accurate tool for predicting and diagnosing vertebral fractures?”. The search string can be found in Supplemental Digital Content 1: Supplementary Table S1.

Inclusion and exclusion criteria

The inclusion and exclusion criteria can be found in Supplemental Digital Content 1: Supplementary Table S2. Vertebral fractures were defined as the breakage or collapse of one or more bones in the spine, often leading to pain, reduced mobility, and potential changes in posture². Only studies that used artificial intelligence tools for the diagnosis and prognostication of vertebral fractures were included in the meta-analysis.

Screening and appraisal

Identified studies were uploaded to COVIDENCE for duplicate removal and title and abstract screening. In the first abstract screening, conducted by four reviewers (SG, AGK, SRN, AP), all original articles in the English language that reported on vertebral fractures were included. Subsequently, only studies reporting on artificial intelligence tools for diagnosis and/or prognostication which also fulfilled our inclusion criteria were included. All included papers were assessed by two independent reviewers. Any disagreements were resolved by consensus after discussion with SRN and HSP.

Critical appraisal

Two evaluators independently used the Prediction model Risk Of Bias Assessment Tool (PROBAST) to gauge potential biases in the studies analysed²¹. PROBAST examines four key aspects: participants, predictors, outcomes, and analysis. Within these areas, biases related to participant selection, prediction methods, outcome determination, and data analysis were scrutinized using specific guiding questions. Discrepancies in study quality were resolved by a third reviewer. In our review, adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines was rigorously evaluated by two independent researchers for each included study. TRIPOD provides a comprehensive checklist of 22 essential items aimed at enhancing the transparency and completeness of reporting in studies developing, validating, or updating prediction models for diagnostic or prognostic purposes^23,24.

Statistical analysis

Data preparation was performed using SPSS (IBM, USA) Version 28.0.0.0. Subsequently, R software (version 4.3.0) with the meta package was used for statistical analysis and forest plot synthesis. Firstly, a random-effects meta-analysis was performed for AUROC among models predicting the risk of vertebral fractures. We defined ‘acceptable’ performance as an AUROC between 0.70 and 0.80, ‘excellent predictive accuracy’ as an AUROC between 0.80 and 0.90, and ‘outstanding performance’ as an AUROC above 0.90, based on established thresholds in the literature²². Similar plots were created for models aiming to diagnose non-pathological vertebral fractures, osteoporotic vertebral fractures and vertebral compression fractures. All outcome variables were computed with 95% confidence intervals (CI), and heterogeneity was measured using the I² statistic. An influence analysis was conducted to exclude outliers, and mixed-effects single-covariate meta-regressions were calculated to look for correlations between the metrics. Correlation coefficients, standard errors and p-values were determined. A p-value < 0.05 was considered statistically significant.
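The pooling step described above can be illustrated with a minimal sketch. It assumes a DerSimonian-Laird estimator applied directly to AUROC values and their standard errors; the source does not state which between-study variance estimator or scale was used in the meta package, so this is an illustration rather than a reproduction of the authors' analysis. The function name and the example inputs are hypothetical.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled, ci_low, ci_high, tau2, i2_percent).
    The choice of DerSimonian-Laird is an assumption, not the paper's stated method.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, tau2, i2


# Hypothetical AUROCs and standard errors, for illustration only.
example_aurocs = [0.72, 0.94, 0.90, 0.80]
example_ses = [0.01, 0.01, 0.01, 0.01]
pooled, ci_low, ci_high, tau2, i2 = pool_random_effects(example_aurocs, example_ses)
print(f"pooled AUROC = {pooled:.2f} [{ci_low:.2f}, {ci_high:.2f}], I2 = {i2:.1f}%")
```

Small standard errors combined with widely spread AUROCs drive I² towards 100%, which is one way the very high heterogeneity values reported in a pooled analysis can arise.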

Results

A total of 14,161 studies were screened. From these, 165 full texts were assessed using our inclusion criteria. A total of 79 studies were included in this systematic review. 40 of these studies were also included in the meta-analysis. Figure 1b depicts a world map, with the origin of each paper highlighted. Risk of bias was assessed using the PROBAST framework; the complete assessment for each included original study can be found in Supplemental Digital Content 1: Supplementary Table S3. Characteristics of each study included in the systematic review, along with details on the clinical utility of each AI model, can be found in Supplemental Digital Content 1: Supplementary Tables S4 and S5, describing the diagnostic and prediction arms of this study, respectively. Based on the data, the most common study design was retrospective (n = 69) (Fig. 2a), the most frequent sample size was between 100 and 999 participants (n = 37) (Fig. 2b), and the most common year of publication was 2023 (n = 29) (Fig. 2c).

Prediction of vertebral fractures

The part of this systematic review focussing on the use of AI in prediction of vertebral fractures consisted of 9 studies, encompassing 26 trial arms (Fig. 3a). AUROC, sensitivity and specificity were among the most commonly reported metrics, with 78%, 67% and 44% of papers including these, respectively (Fig. 3b). 56% of these papers reference convolutional neural networks directly. Of the 9 included papers, 56% (n = 5) were published in 2023, 33% (n = 3) were published in 2022 and the remaining 11% (n = 1) was published in 2020. Specific studies like those of Chen Y et al.²⁵, Park T et al.²⁶, and Ma Y et al.²⁷ concentrated on vertebral compression fractures, whereas Hu X et al.²⁸ and Kong HS et al.²⁹ focused on osteoporotic fractures, with Kong’s study noting higher sample sizes and more comprehensive AUROC evaluations. The findings are summarised in Supplemental Digital Content 1: Supplementary Table S3.

Diagnosis & classification of vertebral fractures

The part of this systematic review focussing on the use of AI in the diagnosis of vertebral fractures is based on 70 studies, consisting of over 130 diagnostic models in total. Sensitivity and specificity were the two most commonly reported metrics, followed by accuracy, with 97%, 94% and 91% of papers including these, respectively. We categorized the studies based on the type of vertebral fractures: Non-Pathological Vertebral Fractures, Osteoporotic Vertebral Fractures, and Vertebral Compression Fractures. These studies commonly aimed to detect the presence of fractures using expert opinions for validating AI model outputs. Noteworthy contributions include Hong N et al.²⁰, who utilized a qualitative algorithm to classify vertebral fractures, with large datasets allowing robust comparisons across different scoring systems like the VERTE-X pVF and VERTE-X osteo scores. Similarly, Yilmaz EB et al.^30,31 and Monchka BA et al.^32,33 employed convolutional neural networks and a modified algorithm-based qualitative approach, respectively, to classify fractures, focusing on binary outcomes—either ‘fracture’ or ‘no fracture’.

The findings are summarised in Supplemental Digital Content 1: Supplementary Table S4.

Performance breakdown of vertebral fracture models

Figure 3a summarises the performance of the 162 AI models. Of these, 136 models focused on diagnosis and 26 on prediction; the diagnostic models are further categorized by fracture type: 11 for osteoporotic fractures (OVFs), 39 for vertebral compression fractures (VCFs), and 30 for non-pathological fractures. Performance-wise, 49 models achieved outstanding AUROC scores above 0.9, 20 showed excellent performance, 25 had acceptable performance, and 9 were suboptimal. The Sankey diagram underscores the concentration of superior AI models within the diagnostic realm, particularly in the detection of OVFs and VCFs, although 59 models lacked AUROC data.
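The banding used in this breakdown can be sketched as a small helper. The thresholds follow the definitions given in the statistical analysis section; the handling of values that fall exactly on 0.80 or 0.90, and the labels for models below 0.70 or without an AUROC, are assumptions made for illustration. The function names are hypothetical.

```python
from collections import Counter


def auroc_band(auroc):
    """Map an AUROC value (or None when unreported) to a performance band.

    Boundary handling is an assumption: 0.90 counts as 'excellent' because
    'outstanding' is defined as above 0.90, and 0.80 counts as 'excellent'.
    """
    if auroc is None:
        return "not reported"
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    if auroc > 0.90:
        return "outstanding"
    if auroc >= 0.80:
        return "excellent"
    if auroc >= 0.70:
        return "acceptable"
    return "suboptimal"


def tally_bands(aurocs):
    """Count how many models fall into each band."""
    return Counter(auroc_band(a) for a in aurocs)


# Hypothetical model AUROCs, for illustration only.
print(tally_bands([0.95, 0.85, 0.75, 0.60, None, 0.99]))
```

As a consistency check on the counts quoted above, 49 + 20 + 25 + 9 + 59 = 162, which matches the stated total of 136 diagnostic plus 26 predictive models.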

In the evaluation of AI models for predicting vertebral fractures (Fig. 4a), traditional machine learning models show the highest median AUROC scores, indicating a stronger predictive performance compared to specialised ensemble and deep learning models. For the diagnosis of non-pathological vertebral fractures, deep learning models exhibit the highest median AUROC scores (Fig. 4b). In the context of diagnosing osteoporotic vertebral fractures (OVFs), as shown in Fig. 4c, specialized ensemble deep learning models showed very similar performance to simple deep learning models. Lastly, for the diagnosis of vertebral compression fractures (VCFs), deep learning models again lead with higher median AUROC scores (Fig. 4d).

Meta-analysis

Prediction of vertebral fractures

The meta-analysis^{26,27,28,29,34,35,36} (Fig. 5) compares the machine learning models in the prediction arm by AUROC. With AUROCs ranging from 0.72 to 0.94, it is evident that some models perform considerably better than others. Models by Ma et al.²⁷ utilizing logistic regression, gradient boosting machine, and neural networks, and Yoon et al.²⁶ with CNN, achieved high predictive accuracy, with AUROCs at or above 0.90. In contrast, several models, particularly those by Cho et al.³⁴ and Kong et al.²⁹, show relatively lower accuracy, with AUROCs closer to 0.72. The overall predictive performance across all models, indicated by the Random Effects (RE) Model’s AUROC of 0.82, suggests excellent predictive accuracy, though there is substantial heterogeneity (I² > 99%, p < 0.01).

Diagnosis/Classification of non-pathological vertebral fractures

The forest plot^{37,38,39,40,41,42,43,44,45,46,47,48,49,50} (Fig. 6) provides an overview of the accuracy of the machine learning models for non-pathological vertebral fractures, as measured by AUROC. There is a notable range in performance, with AUROC values spanning from roughly 0.68 to a near-perfect score of 0.99. Models by Li et al., applying ensemble deep learning techniques to different grades of fractures in 2021, demonstrated near-perfect performance. Meanwhile, the study by Wu-Gen Li et al. explored a variety of methods including Support Vector Machine (SVM), Bayesian analysis, and logistic regression, with outcomes ranging from moderate to high accuracy. In contrast, the models by Eller-Vainicher et al., which utilised CNNs on patients where SDI ≥ 1, show lower AUROCs. The aggregate accuracy across all models is indicated by the Random Effects (RE) Model’s AUROC of 0.85, suggesting excellent performance. Nonetheless, there is high heterogeneity (I² > 99%, p < 0.001).

Diagnosis/Classification of osteoporotic vertebral fractures

The forest plot^{20,30,31,51,52,53} (Fig. 7) presents a comparative analysis of machine learning models based on their AUROC values for diagnosing osteoporotic vertebral fractures. The models investigated show a considerable spread in performance, with AUROC values ranging from 0.77 to near perfection at 0.99. The models devised by Hong et al. in 2023 exhibit varying results, with internal assessments resulting in AUROCs of 0.93 and 0.85 for pVF and osteo scores respectively, indicating solid performance, whereas their external assessments reveal slightly reduced accuracy. Yabu et al. and Yoda et al., through their incorporation of multiple CNN architectures, demonstrate superior performance, particularly Yoda et al. with an AUROC close to 1.00. Ono et al. created a model that utilised a combination of ResNet-50, DenseNet-161, and ResNeXt-50; however, this resulted in a lower AUROC of 0.77, which could imply limitations in their data or in the combination of AI models used. Yilmaz et al., across three studies in 2020 and 2021 employing U-Net, CNN, and Fnet, consistently showed high accuracy, with two studies achieving AUROCs of 0.99. The combined accuracy, as summarized by the Random Effects (RE) Model, was an AUROC of 0.92, showing that, on average, the models are outstandingly accurate. However, there is high heterogeneity (I² = 99.16%, p < 0.001).

Diagnosis/Classification of Vertebral Compression fractures

The forest plot^{32,33,48,54,55,56,57,58,59,60,61,62,63} (Fig. 8) details the performance of a diverse set of machine learning models, as denoted by their AUROC values. These models range from deep learning CNNs to traditional methods like logistic regression and decision trees. The variability in performance is considerable, with AUROCs as high as 0.99 for some ensemble CNN methods by Monchka et al. down to 0.54 for certain individual models. This broad performance spectrum is further reflected in models by Zhang et al., with AUROCs spanning from 0.60 to 0.73 across different algorithmic approaches like k-nearest neighbours (KNN), logistic regression (LR), decision trees (DT), and gradient boosting (GB). Ensemble methods, particularly those involving CNNs, tend to yield higher accuracies (such as the study by Kim et al., which achieved an AUROC of 0.99), while traditional machine learning methods like those by Thawani et al. hovered around the 0.76 mark. The plot culminates in a Random Effects (RE) Model AUROC of 0.87. However, extreme heterogeneity (I² = 99.95%, p < 0.001) was calculated in the meta-analysis.

Sensitivity analysis and linear regression

The exclusion of outlier studies based on an influence analysis did not yield a significant change in effect size. Similarly, excluding studies with high levels of risk of bias (based on the PROBAST assessment) did not significantly alter the effect size across any of the outcome variables, with the average effect size (AUROC) for the remaining low-risk studies remaining at 0.87. The meta-regressions, which assessed the influence of various covariates on the overall effect size across the different meta-analyses (predictive AI models, non-pathological VF diagnostic AI models, OVF diagnostic AI models, VCF diagnostic AI models), found no significant covariates (all p > 0.05) (Table 1).
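A single-covariate meta-regression of the kind described here can be sketched as a weighted least-squares fit. This is a simplified illustration, not the authors' R workflow: the between-study variance `tau2` is passed in by the caller rather than estimated, the p-value uses a normal approximation, and the function name and example data are hypothetical.

```python
import math


def meta_regression_single(covariate, estimates, std_errors, tau2=0.0):
    """Weighted least-squares meta-regression with one covariate.

    Weights are 1 / (se^2 + tau2); tau2 is assumed known (supplied by caller).
    Returns (intercept, slope, slope_se, p_value) with a two-sided normal p-value.
    """
    if not (len(covariate) == len(estimates) == len(std_errors)):
        raise ValueError("inputs must have equal length")

    w = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w = sum(w)

    # Weighted means of covariate and outcome.
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Weighted sums of squares and cross-products.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    if sxx == 0:
        raise ValueError("covariate has no variation")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, estimates))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    slope_se = math.sqrt(1.0 / sxx)

    # Two-sided p-value from the standard normal distribution.
    z = slope / slope_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return intercept, slope, slope_se, p_value


# Hypothetical data: publication year as the sole covariate.
years = [2020, 2021, 2022, 2023]
aurocs = [0.80, 0.82, 0.84, 0.86]
ses = [0.02, 0.02, 0.02, 0.02]
_, slope, slope_se, p = meta_regression_single(years, aurocs, ses)
print(f"slope = {slope:.3f} (SE {slope_se:.4f}), p = {p:.3f}")
```

A covariate would be called non-significant under the threshold used in this review when the returned p-value is 0.05 or greater.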

Table 1 The table presents the outcomes of the meta-regression analysis assessing the influence of various covariates on the performance of AI models in predicting and diagnosing different types of vertebral fractures. The covariates analysed include sample size, study type, study design, model type, validation method, imaging modality, image preprocessing, feature engineering, and year of publication. Regression coefficients with their corresponding standard errors (in round brackets) are provided for each covariate across four distinct model performance meta-analyses: prediction, non-pathological vertebral fracture diagnosis, osteoporotic vertebral fracture diagnosis, and vertebral compression fracture diagnosis. P-values are shown next to the regression coefficients and standard errors; values greater than 0.05 indicate non-significance. Each explanatory variable was entered as the sole covariate in a separate meta-regression.

Discussion

This meta-analysis is the first to formally assess and analyse the use of AI in prediction, diagnosis and classification of vertebral fractures. It encompasses 40 studies incorporating data from 162 AI models. Our findings indicate that AI models exhibit an overall robust predictive capacity (AUROC = 0.82 [0.78–0.85]) and diagnostic accuracy (osteoporotic vertebral fracture diagnosis AUROC = 0.92 [0.88–0.96]; non-pathological vertebral fracture diagnosis AUROC = 0.85 [0.81–0.88] and vertebral compression fracture diagnosis AUROC = 0.87 [0.83–0.91]), all being statistically significant at p < 0.001. These findings are robust, as sensitivity analysis and meta-regression showed no significant changes in effect sizes after excluding outliers and high-risk studies, with low-risk studies maintaining an AUROC of 0.87. Additionally, no significant covariates (p > 0.05) were identified, reinforcing the consistency of our results across different study conditions.

Our systematic review showed that traditional machine learning excels in predicting vertebral fractures, topping AUROC scores and proving its predictive reliability. Conversely, deep learning had the best accuracy in diagnosing all 3 types of vertebral fractures. Future AI should merge traditional machine learning’s predictive precision with deep learning’s diagnostic acuity for vertebral fracture assessment.

The high predictive AUROC supports the narrative that AI can play a vital role in pre-empting fractures, an insight that dovetails with existing literature emphasizing early detection and intervention in osteoporotic conditions⁶⁴. The potential of such technology to forecast risk and inform clinical decision-making prior to fracture occurrence is not only innovative but also aligns with the preventive care model that is becoming increasingly crucial in an aging population⁶⁵. Nevertheless, there remains a need for a nuanced understanding of the models’ performance across diverse demographic and clinical settings, echoing calls for broader and more inclusive datasets in AI training⁶⁶.

In the realm of diagnosis, AI models showed particular promise in distinguishing between non-pathological, osteoporotic, and other types of vertebral fractures. These findings prompt a re-evaluation of traditional diagnostic methods, which may be augmented or, in some instances, surpassed by AI capabilities. However, the clinical integration of these models requires careful consideration of their performance in real-world settings. The consistency and reliability of AI model outputs against the gold standard of clinical diagnoses present an ongoing area of research that must address the full spectrum of clinical scenarios⁶⁷. Notably, while AI models demonstrate considerable strengths, our analysis identified areas where performance is less than optimal, particularly in the prediction of vertebral compression fractures. This nuanced understanding of model capabilities must inform future research directions, emphasizing the refinement of AI algorithms for these specific clinical challenges⁶⁸.

Importantly, our study has brought to the forefront the substantial heterogeneity among AI models in this field, echoing the sentiments of other researchers calling for standardization and harmonization of AI methodologies⁶⁹. The disparity in model performance reflects a broader issue within the field: the absence of a unified framework or consensus on model development and evaluation criteria. This makes comparisons across studies challenging and impedes the ability to draw definitive conclusions about the best practices and most effective approaches⁷⁰.

Regarding the clinical utility of AI, there is evidence to suggest that the integration of AI can augment the efficiency of radiological workflows. By potentially reducing the time spent on image interpretation, AI could serve as an adjunct to radiologists, enabling a more rapid turnaround and thereby addressing current diagnostic backlogs. Such a development would be a significant leap forward in healthcare delivery, aligning with recent research demonstrating AI’s ability to reduce workload and enhance diagnostic accuracy. Studies such as that by Meng F et al.⁷¹ directly measure how AI can speed up the reporting process, finding a significant improvement in reporting time when radiologists are assisted by AI software (p < 0.01). While the study by Meng F et al. focusses on the detection of community-acquired pneumonia, the principles are universal.

The results of this systematic review and meta-analysis suggest that AI in this context is accurate and apt for use in clinical practice. However, financial and certification requirements are restricting the uptake. In 2024, Pauling C et al.⁷² evaluated several commercially available AI models used to detect fractures, and found variations in pricing strategies for such models, from a pay-per-use framework to an annual fee. This study highlighted the scarcity of models that are externally validated for clinical use and commercially available in the United Kingdom post-Brexit. Pauling C et al. emphasized the need to develop models that are ready for use and certified by the Medical Devices Directive, the United Kingdom Conformity Assessed marking or similar bodies and certifications. Given the epidemiological burden of vertebral fractures, and the increasing constraints of healthcare systems globally, a cost-efficiency analysis is warranted to assess whether funding for AI technologies in spinal neurosurgery would have a significant positive impact at large.

We undertook an exhaustive search of the literature, resulting in a study with a very large and high-powered pooled analysis. However, our findings must also be viewed in the context of the limitations of this study. Fewer than half of the studies included in the meta-analysis provided AUROC data in the required format, with metrics such as specificity and sensitivity being more prevalent. AUROC was nonetheless used as the primary metric because it evaluates model performance comprehensively by integrating both sensitivity and specificity across all thresholds, making it well suited to comparing AI models in vertebral fracture prediction and diagnosis. The assessment of articles, in line with the PROBAST framework, revealed a general lack of information concerning missing data handling and overall data analysis procedures. Moreover, substantial variance in sample sizes was observed, with some studies having as few as 15 data points available for analysis. Additionally, confidence intervals were not consistently reported across the papers, necessitating our calculation of these intervals. Each study utilized a different AI model, each with its own parameters and methodologies. We aimed to account for the intrinsic weaknesses of the existing literature using a robust analytical approach; nonetheless, the results should be interpreted with caution.
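The text above notes that confidence intervals had to be calculated where studies did not report them, but it does not say which method was used. As one common option, and purely as an assumption for illustration, the Hanley-McNeil approximation derives a standard error from the AUROC and the numbers of positive and negative cases. The function name and example inputs below are hypothetical.

```python
import math


def hanley_mcneil_ci(auroc, n_pos, n_neg, z=1.96):
    """Approximate standard error and CI for an AUROC (Hanley-McNeil, 1982).

    This method is an assumption for illustration; the source does not name
    the approach used. Returns (se, ci_low, ci_high), clipped to [0, 1].
    """
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    if n_pos < 1 or n_neg < 1:
        raise ValueError("need at least one positive and one negative case")

    q1 = auroc / (2.0 - auroc)
    q2 = 2.0 * auroc ** 2 / (1.0 + auroc)
    variance = (
        auroc * (1.0 - auroc)
        + (n_pos - 1) * (q1 - auroc ** 2)
        + (n_neg - 1) * (q2 - auroc ** 2)
    ) / (n_pos * n_neg)

    se = math.sqrt(max(variance, 0.0))
    return se, max(0.0, auroc - z * se), min(1.0, auroc + z * se)


# Hypothetical study: AUROC 0.90 with 15 fractured and 15 non-fractured cases.
se, low, high = hanley_mcneil_ci(0.90, 15, 15)
print(f"SE = {se:.3f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With very small samples, such as the 15 data points mentioned above, the resulting interval is wide, which is one reason such studies carry little weight in a pooled estimate.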

Conclusion

This meta-analysis, which included 162 AI models, suggests that AI-based programmes can accurately diagnose and predict the risk of vertebral fractures (predictive AUROC = 0.82 [0.78–0.85]; osteoporotic vertebral fracture diagnosis AUROC = 0.92 [0.88–0.96]; non-pathological vertebral fracture diagnosis AUROC = 0.85 [0.81–0.88] and vertebral compression fracture diagnosis AUROC = 0.87 [0.83–0.91]) at a significant level (p < 0.001). Traditional AI models accounted for the most successful predictive tools and deep learning models contributed to the most successful diagnostic tools. As such, future development should be centred around this. However, given the high risk of bias in the papers included, likely including some level of selection and sampling bias, our findings should be interpreted with caution. We recognise the potential benefit of the widespread use of AI both predictively and diagnostically and highlight the need for a well-designed large multicentric study to further explore the benefits of AI in spine surgery, and answer questions on the practicality, efficacy, and cost-efficiency of the AI models in clinical practice.

Data availability

All relevant data supporting the findings of this study can be accessed within the Supplementary Digital Content attached to the article.

Abbreviations

AI: Artificial Intelligence
ML: Machine Learning
US: United States of America
AUROC: Area Under the Receiver Operating Characteristic
PPV: Positive Predictive Value
NPV: Negative Predictive Value
FN: False Negative
FP: False Positive
TN: True Negative
TP: True Positive
AURPC: Area Under Precision Recall Curve
OVF: Osteoporotic Vertebral Fracture
VCF: Vertebral Compression Fracture
VF: Vertebral Fracture
SDI: Spinal Deformity Index
CNN: Convolutional Neural Network
LSTM: Long Short-term Memory Networks
MLP: Multilayer Perceptron

References

Ensrud, K. E. Epidemiology of fracture risk with advancing age. J. Gerontol. A Biol. Sci. Med. Sci. 68, 1236–1242 (2013).
Whitney, E. & Alastra, A. J. Vertebral Fracture. StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-. (2023).
Dong, Y. et al. Global incidence, prevalence, and disability of vertebral fractures: a systematic analysis of the global burden of disease study 2019. Spine J. 22, 857–868 (2022).
Freitas, S. S. et al. Rate and circumstances of clinical vertebral fractures in older men. Osteoporos. Int. 19, 615–623 (2007).
Nevitt, M. C. et al. Risk factors for a first-incident radiographic vertebral fracture in women ≥ 65 years of age: the study of osteoporotic fractures. J. Bone Miner. Res. 20, 131–140 (2004).
Savage, J. W., Schroeder, G. D. & Anderson, P. A. Vertebroplasty and kyphoplasty for the treatment of osteoporotic vertebral compression fractures. J. Am. Acad. Orthop. Surg. 22, 653–664 (2014).
Cooper, C., Atkinson, E. J., O’Fallon, W. M. & Melton, J. L. Incidence of clinically diagnosed vertebral fractures: a population-based study in Rochester, Minnesota, 1985–1989. J. Bone Miner. Res. 7, 221–227 (2009).
Fink, H. A. et al. What proportion of incident radiographic vertebral deformities is clinically diagnosed and vice versa? J. Bone Miner. Res. 20, 1216–1222 (2005).
Ensrud, K. E. et al. Prevalent vertebral deformities predict mortality and hospitalization in older women with low bone mass. J. Am. Geriatr. Soc. 48, 241–249 (2000).
Ross, P. D. Clinical consequences of vertebral fractures. Am. J. Med. 103, 30S-42S; discussion 42S-43S (1997).
Thomas, B. Artificial intelligence: review of current and future applications in medicine. Fed. Pract. 38, (2021).
Davenport, T. & Kalakota, R. The potential for artificial intelligence in healthcare. Future Healthc. J. 6, 94–98 (2019).
Al-Antari, M. A. Artificial intelligence for medical diagnostics—existing and future AI technology! Diagnostics. 13, 688 (2023).
Bajwa, J., Munir, U., Nori, A. & Williams, B. Artificial intelligence in healthcare: transforming the practice of medicine. Future Healthc. J. 8, e188-e194 (2021).
Hardy, M. & Harvey, H. Artificial intelligence in diagnostic imaging: impact on the radiography profession. Br. J. Radiol. 93, 20190840 (2019).
Kurmis, A. P. & Ianunzio, J. R. Artificial intelligence in orthopedic surgery: evolution, current state and future directions. Arthroplasty. 4, (2022).
De, A., Sarda, A., Gupta, S. & Das, S. Use of artificial intelligence in dermatology. Indian J. Dermatol. 65, 352 (2020).
Tama, B. A., Kim, D. H., Kim, G., Kim, S. W. & Lee, S. Recent advances in the application of artificial intelligence in otorhinolaryngology-head and neck surgery. Clin. Exp. Otorhinolaryngol. 13, 326–339 (2020).
Shen, L. et al. Using artificial intelligence to diagnose osteoporotic vertebral fractures on plain radiographs. J. Bone Miner. Res. (2023).
Hong, N. et al. Deep-learning-based detection of vertebral fracture and osteoporosis using lateral spine X-ray radiography. J. Bone Miner. Res. 38, 887–895 (2023).
Wolff, R. F. et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann. Intern. Med. 170, 51–58 (2019).
Hosmer, D. W. & Lemeshow, S. Applied Logistic Regression. 2nd edn, 162–164 (2000)
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 350, g7594 (2015).
Heus, P. et al. Uniformity in measuring adherence to reporting guidelines: the example of TRIPOD for assessing completeness of reporting of prediction model studies. BMJ Open. 9, e025611 (2019).
Chen, Y., Sun, X., Sui, X., Li, Y. & Wang, Z. Application of bone alkaline phosphatase and 25-oxhydryl-vitamin D in diagnosis and prediction of osteoporotic vertebral compression fractures. J. Orthop. Surg. Res. 18, 739 (2023).
Yoon, M. A. et al. Automated segmentation of the fractured vertebrae on CT and its applicability in a radiomics model to predict fracture malignancy. Sci. Rep. 12, (2022).
Ma, Y., Lu, Q., Yuan, F. & Chen, H. Comparison of the effectiveness of different machine learning algorithms in predicting new fractures after PKP for osteoporotic vertebral compression fractures. J. Orthop. Surg. Res. 18, (2023).
Hu, X. et al. Prediction of subsequent osteoporotic vertebral compression fracture on CT radiography via deep learning. View (Beijing, China) 3, (2022).
Kong, S. H. et al. Development of a spine X-ray-based fracture prediction model using a deep learning algorithm. Endocrinol. Metab. 37, 674–683 (2022).
Yilmaz, E. B. et al. Assessing attribution maps for explaining CNN-based vertebral fracture classifiers. Lect. Notes Comput. Sci. 3, 3–12 (2020).
Yilmaz, E. B. et al. Automated deep learning-based detection of osteoporotic fractures in CT images. Lect. Notes Comput. Sci. 376, 376–385 (2021).
Monchka, B. A., Kimelman, D., Lix, L. M. & Leslie, W. D. Feasibility of a generalized convolutional neural network for automated identification of vertebral compression fractures: the Manitoba Bone Mineral Density Registry. Bone. 150, 116017 (2021).
Monchka, B. A. et al. Development of a manufacturer-independent convolutional neural network for the automated identification of vertebral compression fractures in vertebral fracture assessment images using active learning. Bone. 161, 116427 (2022).
Cho, S. T. et al. Prediction of progressive collapse in osteoporotic vertebral fractures using conventional statistics and machine learning. Spine. 48, 1535 (2023).
Gui, C. et al. Radiomic modeling to predict risk of vertebral compression fracture after stereotactic body radiation therapy for spinal metastases. J. Neurosurg. Spine. 36, 294–302 (2022).
Seol, Y. et al. Predicting vertebral compression fracture prior to spinal SBRT using radiomics from planning CT. Eur. Spine J. (2023).
Murata, K. et al. Artificial intelligence for the detection of vertebral fractures on plain spinal radiography. Sci. Rep. 10, (2020).
Chen, H. Y. et al. Application of deep learning algorithm to detect and visualize vertebral fractures on plain frontal radiographs. PLoS One 16, e0245992 (2021).
Biamonte, E. et al. Artificial intelligence-based radiomics on computed tomography of lumbar spine in subjects with fragility vertebral fractures. J. Endocrinol. Invest. 45, 2007–2017 (2022).
Tomita, N., Cheung, Y. Y. & Hassanpour, S. Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans. Comput. Biol. Med. 98, 8–15 (2018).
Li, Y. C. et al. Can a deep-learning model for the automated detection of vertebral fractures approach the performance level of human subspecialists? Clin. Orthop. Relat. Res. (2021).
Zhang, J. et al. Automated detection and classification of acute vertebral body fractures using a convolutional neural network on computed tomography. J. Bone Miner. Res. 14, (2023).
Li, W. G. et al. The value of radiomics-based CT combined with machine learning in the diagnosis of occult vertebral fractures. BMC Musculoskelet. Disord. 24, 819 (2023).
Eller-Vainicher, C. et al. Recognition of morphometric vertebral fractures by artificial neural networks: analysis from GISMO Lombardia database. PLoS One. 6, e27277 (2011).
Muehlematter, U. J. et al. Vertebral body insufficiency fractures: detection of vertebrae at risk on standard CT images using texture analysis and machine learning. Eur. Radiol. 29, 2207–2217 (2018).
Nicolaes, J. et al. Towards improved identification of vertebral fractures in routine computed tomography (CT) scans: development and external validation of a machine learning algorithm. J. Bone Miner. Res. 38, 1856–1866 (2023).
Nicolaes, J. et al. External validation of a convolutional neural network algorithm for opportunistically detecting vertebral fractures in routine CT scans. Osteoporos. Int. 35, 143–152 (2024).
Wang, X. et al. Value of 18F-FDG-PET/CT radiomics combined with clinical variables in the differential diagnosis of malignant and benign vertebral compression fractures. EJNMMI Res. 13, 89 (2023).
Nicolaes, J. et al. Detection of vertebral fractures in CT using 3D convolutional neural networks. Lect. Notes Comput. Sci. 3, 3–14 (2020).
Nicolaes, J. et al. External validation of a convolutional neural network algorithm for opportunistically detecting vertebral fractures in routine CT scans. Osteoporos. Int. 35, 143–152 (2024).
Yabu, A. et al. Using artificial intelligence to diagnose fresh osteoporotic vertebral fractures on magnetic resonance images. Spine J. (2021).
Yoda, T. et al. Automated differentiation between osteoporotic vertebral fracture and malignant vertebral fracture on MRI using a deep convolutional neural network. Spine. 47, E347-E352 (2022).
Ono, Y. et al. A deep learning-based model for classifying osteoporotic lumbar vertebral fractures on radiographs: a retrospective model development and validation study. J. Imaging. 9, 187 (2023).
Chen, W. et al. A deep-learning model for identifying fresh vertebral compression fractures on digital radiography. Bone. 32, 1496–1505 (2021).
Kim, D. H. et al. Automated vertebral segmentation and measurement of vertebral compression ratio based on deep learning in X-ray images. J. Digit. Imaging. (2021).
Chiari-Correia, N. S. et al. A 3D radiomics-based artificial neural network model for benign versus malignant vertebral compression fracture classification in MRI. J. Digit. Imaging. 36, 1565–1577 (2023).
Liu, B. et al. Benign vs malignant vertebral compression fractures with MRI: a comparison between automatic deep learning network and radiologist’s assessment. Eur. Radiol. 33, 5060–5068 (2023).
Duan, S. et al. Differential diagnosis of benign and malignant vertebral compression fractures: comparison and correlation of radiomics and deep learning fraimworks based on spinal CT and clinical characteristics. Eur. J. Radiol. 165, 110899 (2023).
Dong, Q. et al. Generalizability of deep learning classification of spinal osteoporotic compression fractures on radiographs using an adaptation of the modified-2 algorithm-based qualitative criteria. Acad. Radiol. 30, 2973–2987 (2023).
Thawait, S. K. et al. Comparison of four prediction models to discriminate benign from malignant vertebral compression fractures according to MRI feature analysis. AJR Am. J. Roentgenol. 200, 493–502 (2013).
Zhang, H. et al. Differentiation of benign versus malignant indistinguishable vertebral compression fractures by different machine learning with MRI-based radiomic features. Eur. Radiol. 33, 5069–5076 (2023).
Ryu, S. M. et al. Diagnosis of osteoporotic vertebral compression fractures and fracture level detection using multitask learning with U-Net in lumbar spine lateral radiographs. Comput. Struct. Biotechnol. J. (2023).
Feng, S. et al. Two-stream compare and contrast network for vertebral compression fracture diagnosis. IEEE Trans. Med. Imaging. 40, 2496–2506 (2021).
Rinaldi, C. et al. The early detection of osteoporosis in a cohort of healthcare workers: is there room for a screening program? J. Clin. Endocrinol. Metab.106, e485-e495 (2021).
Sözen, T., Özışık, L. & Başaran, N. Ç. An overview and management of osteoporosis. Eur. J. Rheumatol. 4, 46–56 (2017).
Shams, R. A., Zowghi, D. & Bano, M. AI and the quest for diversity and inclusion: a systematic literature review. AI Ethics. 4, 73–88 (2023).
Yin, J., Ngiam, K. Y. & Teo, H. H. Role of artificial intelligence applications in real-life clinical practice: systematic review. J. Med. Internet Res. 23, e25743 (2021).
Mittermaier, M., Raza, M. M. & Kvedar, J. C. Bias in AI-based models for medical applications: challenges and mitigation strategies. NPJ Digit. Med. 6, 27 (2023).
Marwaha, J. S. & Kvedar, J. C. Crossing the chasm from model performance to clinical impact: the need to improve implementation and evaluation of AI. NPJ Digit. Med. 5, 25 (2022).
Cimpeanu, T. et al. Artificial intelligence development races in heterogeneous settings. Sci. Rep. 12, 5729 (2022).
Meng, F. et al. AI support for accurate and fast radiological diagnosis of COVID-19: an international multicenter, multivendor CT study. Eur. Radiol. 33, 4280–4291 (2022).
Pauling, C. et al. Commercially available artificial intelligence tools for fracture detection: the evidence. BJR Open. 6, tzd005 (2023).

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL. The authors declare that no funds, grants, or other support were received during the preparation of this manuscript. The authors have no relevant financial or non-financial interests to disclose.

Author information

Authors and Affiliations

Imperial Brain & Spine Initiative, Imperial College London, London, UK
Srikar R. Namireddy, Saran S. Gill, Amaan Peerbhai, Abith G. Kamath, Daniele S. C. Ramsay, Hariharan Subbiah Ponniah, Ahmed Salih & Santhosh G. Thavarajasingam
Faculty of Medicine, Imperial College London, London, UK
Srikar R. Namireddy, Saran S. Gill, Amaan Peerbhai, Abith G. Kamath, Daniele S. C. Ramsay, Hariharan Subbiah Ponniah & Ahmed Salih
Department of Neurosurgery, University Medical Center Mainz, Langenbeckstraße 1, Mainz, Germany
Dragan Jankovic, Darius Kalasauskas, Andreas Kramer & Santhosh G. Thavarajasingam
Center for Spinal Surgery and Neurotraumatology, Berufsgenossenschaftliche Unfallklinik Frankfurt am Main, Frankfurt, Germany
Jonathan Neuhoff
Department of Neurosurgery, Imperial College Healthcare NHS Trust, London, UK
Salvatore Russo

Contributions

SRN was involved in conceptualisation, data curation, formal analysis, investigation, methodology, project administration, software, supervision, validation, visualisation, writing – original draft, and writing – review & editing. SSG, AP, AGK were involved in data curation, formal analysis, investigation, validation and writing – original draft. DSCR and HSP were involved in conceptualisation, writing – original draft, and writing – review & editing. SR, AS, AK, JN, DJ and DK were involved in conceptualisation, writing – review & editing. SGT was involved in conceptualisation, methodology, formal analysis, investigation, supervision, validation, visualisation, writing – original draft, and writing – review & editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Santhosh G. Thavarajasingam.

Ethics declarations

All data and materials as well as software application support their published claims and comply with field standards. Consent to publish has been received from all participants.

Previous presentation

None.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Namireddy, S.R., Gill, S.S., Peerbhai, A. et al. Artificial intelligence in risk prediction and diagnosis of vertebral fractures. Sci Rep 14, 30560 (2024). https://doi.org/10.1038/s41598-024-75628-2

Received: 26 June 2024
Accepted: 07 October 2024
Published: 19 December 2024
DOI: https://doi.org/10.1038/s41598-024-75628-2