What the study evaluated
The study evaluated the feasibility and generalizability of a YOLO-based deep learning model for musculoskeletal fracture detection across datasets from different sources. The model was trained on annotated MSK X-rays from the MURA dataset and validated on two independent datasets, an external public dataset (FracAtlas) and an internal real-world dataset, to assess how consistently performance transfers between centers.
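As a rough illustration of that train-then-validate workflow, the sketch below uses the Ultralytics YOLO API. The summary does not state which YOLO version, framework, or hyperparameters the authors used, and the dataset YAML file names are placeholders, so this is an assumption-laden sketch rather than the authors' actual pipeline.

```python
# Minimal sketch of the workflow described above.
# Assumptions (not stated in the study summary): an Ultralytics-style YOLO,
# pretrained weights as a starting point, and placeholder dataset YAML paths.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # model size/version is an assumption

# Train on the 720 bounding-box-annotated MSK radiographs (MURA subset).
model.train(data="mura_fractures.yaml", epochs=100, imgsz=640)

# Validate on the two independent test sets: the external FracAtlas sample
# and the internal real-world dataset.
external = model.val(data="fracatlas_sample.yaml")
internal = model.val(data="internal_msk.yaml")
print(external.box.map50, internal.box.map50)
```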
Study results in clinical practice
On the external dataset the model achieved high sensitivity, indicating a strong ability to detect fractures and reduce missed findings. Performance shifted on the internal dataset, with lower sensitivity but higher specificity, highlighting the impact of data heterogeneity and real-world variability. In clinical practice, this profile supports AI as a rule-out and safety-support tool: where the NPV is high, as on the external dataset, a negative result is reassuring, but the far lower NPV on the internal dataset shows that this cannot be assumed for every population, and positive findings always require radiologist confirmation.
Key numbers
Training data: 720 annotated MSK X-rays
Validation datasets:
  Dataset 1 (external): 840 images
  Dataset 2 (internal): 124 images
Sensitivity:
  Dataset 1: 91.0%
  Dataset 2: 62.2%
Specificity:
  Dataset 1: 55.7%
  Dataset 2: 74.0%
Negative predictive value (NPV):
  Dataset 1: 96.8%
  Dataset 2: 56.9%
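The two NPV figures follow directly from the reported sensitivity, specificity, and the fracture prevalence in each validation set (144/840 external, 74/124 internal). A minimal Python check, using only the numbers quoted in this summary (small rounding differences aside):

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_negative_rate = specificity * (1.0 - prevalence)
    false_negative_rate = (1.0 - sensitivity) * prevalence
    return true_negative_rate / (true_negative_rate + false_negative_rate)

# Dataset 1 (external): 144 fractures among 840 images
print(round(npv(0.910, 0.557, 144 / 840), 3))  # 0.968 -> matches the reported 96.8%
# Dataset 2 (internal): 74 fractures among 124 images
print(round(npv(0.622, 0.740, 74 / 124), 3))   # 0.569 -> matches the reported 56.9%
```

The contrast illustrates why the rule-out value of a negative result depends on prevalence as well as on the model itself: the internal dataset is far more fracture-enriched than the external one.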
Abstract
Fractures, often resulting from trauma, overuse, or osteoporosis, pose diagnostic challenges due to their variable clinical manifestations. To address this, we propose a deep learning-based decision support system to enhance the efficacy of fracture detection in radiographic imaging. For our study, we used 720 annotated musculoskeletal (MSK) X-rays from the MURA dataset, augmented with bounding-box-level annotations, to train the YOLO (You Only Look Once) model. The model’s performance was subsequently tested on two datasets, a sample of the FracAtlas dataset (Dataset 1, 840 images; n_normal = 696, n_fracture = 144) and our own internal dataset (Dataset 2, 124 images; n_normal = 50, n_fracture = 74), encompassing a diverse range of MSK radiographs. The results showed a Sensitivity (Se) of 0.910 (95% CI: 0.852–0.946) and Specificity (Sp) of 0.557 (95% CI: 0.520–0.594) on Dataset 1, and a Se of 0.622 (95% CI: 0.508–0.724) and Sp of 0.740 (95% CI: 0.604–0.841) on Dataset 2. This study underscores the promising role of AI in medical imaging, providing a solid foundation for future research and advancements in radiographic diagnostics.
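The abstract does not state how the 95% confidence intervals were computed, but they are consistent with Wilson score intervals over the relevant subgroup sizes (fracture cases for sensitivity, normal cases for specificity); the sketch below rests on that assumption, and last-digit discrepancies come from using the rounded point estimates.

```python
import math

def wilson_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion estimated from n cases."""
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Dataset 1: sensitivity over 144 fracture cases, specificity over 696 normals
print(wilson_ci(0.910, 144))  # ~ (0.852, 0.947) vs reported 0.852-0.946
print(wilson_ci(0.557, 696))  # ~ (0.520, 0.594) vs reported 0.520-0.594
# Dataset 2: sensitivity over 74 fracture cases, specificity over 50 normals
print(wilson_ci(0.622, 74))   # ~ (0.508, 0.724) vs reported 0.508-0.724
print(wilson_ci(0.740, 50))   # ~ (0.604, 0.841) vs reported 0.604-0.841
```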