What the study evaluated
The study evaluated the diagnostic performance of Carebot AI Bones for fracture detection on musculoskeletal (MSK) X-rays and compared it with that of six radiologists of varying experience in a blinded multi-reader, multi-case (MRMC) design. Retrospective MSK radiographs from routine clinical practice were used, with consensus among three experienced radiologists serving as the ground truth. The aim was to assess whether AI can reduce missed fractures across different anatomical regions.
Study results in clinical practice
Carebot AI Bones achieved high sensitivity for fracture detection and frequently outperformed less experienced radiologists, reducing the risk of missed fractures. Its specificity was lower than that of the radiologists, resulting in more false-positive findings. In clinical practice, this positions the AI as a safety-support tool that complements radiologists, particularly in high-workload or resource-limited settings.
Key numbers
Images analyzed: 448 MSK X-rays with consensus ground truth (of 489 collected)
AI sensitivity: 92%
AI specificity: 90%
Radiologists’ sensitivity range: 66–93%; specificity range: 92–99%
AI negative likelihood ratio (NLR): 0.09 (see the sketch after this list)
Highest AI sensitivity: elbow (100%), hand/wrist (94%)
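For readers who want to sanity-check how the likelihood ratio follows from the headline figures, here is a minimal Python sketch. The confusion-matrix counts are invented for illustration and chosen only to land near the reported operating point (Se ≈ 92%, Sp ≈ 90%) on 448 images; they are not the study's actual case mix.

```python
# Minimal sketch: how sensitivity, specificity and likelihood ratios relate.
# Only the ~0.92 / ~0.90 operating point comes from the study; the counts
# below are illustrative placeholders, not the study's raw data.

def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute Se, Sp, PLR and NLR from confusion-matrix counts."""
    se = tp / (tp + fn)      # sensitivity: fractures correctly flagged
    sp = tn / (tn + fp)      # specificity: non-fractures correctly cleared
    return {
        "sensitivity": se,
        "specificity": sp,
        "PLR": se / (1 - sp),    # positive likelihood ratio
        "NLR": (1 - se) / sp,    # negative likelihood ratio
    }

# Hypothetical counts summing to 448 cases, near the reported operating point.
print(diagnostic_metrics(tp=116, fn=10, tn=289, fp=33))
# -> NLR = (1 - 0.92) / 0.90 ≈ 0.09, matching the value quoted above.
```

A low NLR like this means a negative AI read substantially lowers the post-test probability of a fracture, which is the property that matters for ruling out missed fractures.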
Study abstract
Fracture detection on musculoskeletal (MSK) radiographs is critical for both emergency and routine care, yet diagnostic errors remain common due to high workloads and limited radiological expertise. This study evaluates the diagnostic performance of an artificial intelligence (AI) system (Carebot AI Bones 1.8.10; Carebot s.r.o.) in detecting fractures on MSK X-rays, comparing its performance to six radiologists of varying experience levels in a blinded multi-reader, multi-case (MRMC) study. A total of 489 radiographs were retrospectively analyzed from routine clinical practice, with ground truth established for 448 images through consensus among three experienced radiologists. Diagnostic performance was assessed using sensitivity (Se), specificity (Sp), positive likelihood ratio (PLR), and negative likelihood ratio (NLR), with statistical analysis including McNemar’s test and Holm’s method for multiple comparisons. The AI system achieved a sensitivity of 0.921 (95% CI: 0.846–0.961) and specificity of 0.897 (95% CI: 0.861–0.924). Radiologists’ sensitivity ranged from 0.663 to 0.933, and specificity ranged from 0.916 to 0.989. The AI demonstrated consistently high sensitivity across body parts, particularly for elbow and hand/wrist fractures, often exceeding radiologists’ performance. Specificity was slightly lower but remained within an acceptable range, supporting AI’s potential as a complementary diagnostic tool. These findings highlight the clinical utility of AI in MSK fracture detection, particularly in settings with limited resources or high diagnostic workloads. Future research should validate these results in larger, multicentric studies to ensure broader generalizability and evaluate AI integration in real-world workflows.
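To make the reader comparison concrete, the sketch below shows (in Python, using statsmodels) how McNemar's exact test on paired AI-versus-reader calls could be combined with Holm's method across the six readers, as the abstract describes. The reader labels, the 2×2 tables, and the framing as per-case correct/incorrect decisions are assumptions for illustration only; this is not the study's analysis code or data.

```python
# Illustrative sketch of the paired comparison described in the abstract:
# McNemar's exact test on AI-vs-reader disagreements, with Holm's correction
# across the six readers. All counts below are invented for illustration.
from statsmodels.stats.contingency_tables import mcnemar
from statsmodels.stats.multitest import multipletests

# For each reader, a 2x2 table over the same 448 cases:
# [[both correct, AI correct & reader wrong],
#  [AI wrong & reader correct, both wrong]]
paired_tables = {
    "reader_1": [[400, 25], [10, 13]],
    "reader_2": [[405, 20], [12, 11]],
    "reader_3": [[398, 30], [8, 12]],
    "reader_4": [[410, 15], [14, 9]],
    "reader_5": [[402, 22], [13, 11]],
    "reader_6": [[395, 28], [9, 16]],
}

# McNemar's exact test uses only the discordant cells (off-diagonal counts).
raw_p = [mcnemar(table, exact=True).pvalue for table in paired_tables.values()]

# Holm's step-down adjustment controls the family-wise error rate
# across the six AI-vs-reader comparisons.
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

for name, p, p_adj, sig in zip(paired_tables, raw_p, adj_p, reject):
    print(f"{name}: raw p={p:.4f}, Holm-adjusted p={p_adj:.4f}, reject H0={bool(sig)}")
```

Holm's method keeps the overall chance of a false-positive comparison at the chosen alpha without requiring the six tests to be independent.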