
What the study evaluated
The study evaluated whether the Spine Measurement functionality of Carebot AI Bones can automatically measure the Cobb angle and classify scoliosis severity on standing anteroposterior whole-spine X-rays.
A total of 103 radiographs from ten hospitals were analyzed. Two musculoskeletal radiologists independently measured the maximal Cobb angle in each case, and the AI output was compared with both readers.
Study results in clinical practice
The AI produced Cobb-angle measurements that were close to those of the expert radiologists, with an error only slightly larger than the disagreement between the two radiologists themselves. The mean absolute error was 3.89° compared with Radiologist 1 and 3.90° compared with Radiologist 2, while the mean absolute difference between the two radiologists was 3.30°.
The AI also showed strong correlation with both radiologists and comparable agreement in classifying scoliosis severity. In practice, this suggests that automated Cobb-angle measurement can help make scoliosis reporting more consistent, reduce repetitive manual measurements, and support triage in clinical workflows.
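For readers who want to see how the error and correlation figures are defined, the sketch below computes mean absolute error, root mean squared error, and Pearson correlation for paired Cobb-angle measurements. It is a minimal illustration, not the study's analysis code: the function names and the sample angles are hypothetical, and only the standard formulas are assumed.

```python
import math

def mae(a, b):
    """Mean absolute error between two paired lists of angles (degrees)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def rmse(a, b):
    """Root mean squared error between two paired lists of angles (degrees)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def pearson_r(a, b):
    """Pearson correlation coefficient of two paired lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical maximal Cobb angles in degrees; not data from the study.
ai_deg = [12.0, 25.5, 41.0, 8.0, 33.0]
reader_deg = [10.0, 28.0, 38.5, 9.5, 31.0]

print(f"MAE  = {mae(ai_deg, reader_deg):.2f} deg")
print(f"RMSE = {rmse(ai_deg, reader_deg):.2f} deg")
print(f"r    = {pearson_r(ai_deg, reader_deg):.3f}")
```

Because RMSE squares each difference before averaging, it weights large disagreements more heavily than MAE, so it can never be smaller than MAE on the same data.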
Key numbers
Study cohort: 103 standing AP whole-spine radiographs from 10 hospitals
AI vs Radiologist 1: MAE 3.89°, RMSE 4.77°, Pearson r = 0.906
AI vs Radiologist 2: MAE 3.90°, RMSE 5.68°, Pearson r = 0.880
Inter-radiologist comparison: MAE 3.30°, Pearson r = 0.928
Severity grading agreement: Cohen’s κ = 0.51 and 0.64 for AI vs radiologists, compared with κ = 0.59 between radiologists
Patient profile: mean age 18.6 years; 81% of radiographs from patients younger than 20 years
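The severity-grading agreement above is expressed as Cohen's κ, which corrects the raw agreement rate for agreement expected by chance. The following sketch shows one way to compute it for a four-grade scale. Two things are assumptions here and are not stated in this summary: the grade cut-offs (10°, 25°, and 40°, a commonly used convention) and the use of unweighted rather than weighted κ. The sample angles are hypothetical.

```python
from collections import Counter

def severity_grade(cobb_deg, cutoffs=(10.0, 25.0, 40.0)):
    """Map a Cobb angle to a grade 0-3 (none, mild, moderate, severe).

    The cut-offs are an assumed convention, not taken from the study.
    """
    return sum(cobb_deg >= c for c in cutoffs)

def cohen_kappa(grades_1, grades_2):
    """Unweighted Cohen's kappa for two raters grading the same cases."""
    n = len(grades_1)
    observed = sum(a == b for a, b in zip(grades_1, grades_2)) / n
    counts_1, counts_2 = Counter(grades_1), Counter(grades_2)
    # Chance agreement: product of the two raters' marginal proportions per grade.
    expected = sum(counts_1[g] * counts_2[g] for g in counts_1) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both raters use one grade only
    return (observed - expected) / (1.0 - expected)

# Hypothetical maximal Cobb angles in degrees; not data from the study.
ai_deg = [12.0, 25.5, 41.0, 8.0, 33.0, 22.0]
reader_deg = [10.0, 28.0, 38.5, 9.5, 31.0, 26.0]

ai_grades = [severity_grade(x) for x in ai_deg]
reader_grades = [severity_grade(x) for x in reader_deg]
kappa = cohen_kappa(ai_grades, reader_grades)
print(f"kappa = {kappa:.2f}")
```

Note that a measurement difference of only a few degrees can change the grade when the angle lies near a cut-off, which is one reason κ can be moderate even when the angle error is small.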
Abstract
Scoliosis assessment depends on precise Cobb-angle measurement, which guides diagnosis, follow-up, bracing, and surgical referral. Manual measurement is time-consuming and can vary between observers. This retrospective multi-centre study evaluated the Spine Measurement functionality of Carebot AI Bones for fully automated Cobb-angle measurement and scoliosis severity grading on 103 standing anteroposterior whole-spine radiographs from ten hospitals. Two musculoskeletal radiologists independently measured each radiograph and served as reference readers. Agreement between the AI and each radiologist was assessed using Bland-Altman analysis, mean absolute error (MAE), root mean squared error (RMSE), Pearson correlation, and Cohen’s kappa for four-grade severity classification. Compared with Radiologist 1, the AI achieved an MAE of 3.89° and an RMSE of 4.77°, with a bias of 0.70° and limits of agreement from -8.59° to 9.99°. Compared with Radiologist 2, the AI achieved an MAE of 3.90° and an RMSE of 5.68°, with a bias of 2.14° and limits of agreement from -8.23° to 12.50°. Pearson correlations were r = 0.906 and r = 0.880 for the AI versus Radiologist 1 and Radiologist 2, respectively, while the inter-reader correlation was r = 0.928. Cohen’s kappa for severity grading reached 0.51 and 0.64 for the AI versus the radiologists, comparable to the inter-reader agreement of 0.59. The results indicate that Carebot AI Bones can reproduce expert-level Cobb-angle measurements and severity grading across multiple centres, supporting more consistent scoliosis assessment, reporting, and triage.
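The bias and limits of agreement quoted in the abstract come from Bland-Altman analysis. A minimal sketch of the usual calculation is given below, assuming the conventional definition: bias is the mean of the paired differences, and the 95% limits of agreement are the bias plus or minus 1.96 times the sample standard deviation of those differences. The sign convention (AI minus reader), the function name, and the sample angles are assumptions for illustration only.

```python
import statistics

def bland_altman(ai_deg, reader_deg, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Differences are taken as AI minus reader (an assumed convention).
    """
    diffs = [a - r for a, r in zip(ai_deg, reader_deg)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd

# Hypothetical maximal Cobb angles in degrees; not data from the study.
ai_deg = [12.0, 25.5, 41.0, 8.0, 33.0]
reader_deg = [10.0, 28.0, 38.5, 9.5, 31.0]

bias, lower, upper = bland_altman(ai_deg, reader_deg)
print(f"bias = {bias:.2f} deg, limits of agreement = [{lower:.2f}, {upper:.2f}] deg")
```

Under this definition the limits are symmetric around the bias, which matches the values reported for Radiologist 1: 0.70° − 9.29° = −8.59° and 0.70° + 9.29° = 9.99°.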




