
What the study evaluated
The study evaluated Carebot AI MMG in a routine mammography cohort where most examinations were negative and suspicious findings were rare.
A total of 338 consecutive screening mammography examinations acquired at AGEL Hospital Nový Jičín in January 2024 were reviewed. After excluding 29 examinations on which the readers did not reach consensus, 309 examinations were included in the final analysis. The reference standard was established by three senior breast radiologists using BI-RADS-aligned categories.
Two predefined endpoints were evaluated: any-lesion detection (Medium or High Risk counted as AI-positive) and suspicious examination identification (only High Risk counted as AI-positive).
Study results in clinical practice
Carebot AI MMG showed strong agreement with expert radiologist consensus for identifying mammography examinations that contain any lesion. For any-lesion detection, the AI achieved sensitivity 0.895 and specificity 0.940, with positive predictive value 0.829 and negative predictive value 0.965.
For suspicious examination identification, the AI correctly flagged 7 of 8 suspicious cases, a sensitivity of 0.875, with specificity 0.857. Because suspicious examinations were rare in this cohort (2.6%), positive predictive value was low at 0.140, while negative predictive value remained very high at 0.996.
In practice, these results support using AI risk outputs as decision-support and prioritization signals rather than as direct confirmation of malignancy. The findings also show why predictive values should be reported in prevalence-representative screening cohorts.
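The dependence of predictive values on prevalence can be illustrated with Bayes' rule. The sketch below is not part of the study: `predictive_values` is a hypothetical helper name, and the 30% prevalence is an arbitrary value chosen to represent an enriched cohort. Only the sensitivity, specificity, and the 8-of-309 prevalence come from the reported results.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Reported suspicious-examination endpoint: Se 0.875, Sp 0.857, 8 of 309 suspicious.
ppv_screen, npv_screen = predictive_values(0.875, 0.857, 8 / 309)

# Hypothetical enriched cohort (30% prevalence is an illustrative assumption).
ppv_enriched, npv_enriched = predictive_values(0.875, 0.857, 0.30)

print(f"screening: PPV {ppv_screen:.3f}, NPV {npv_screen:.3f}")
print(f"enriched:  PPV {ppv_enriched:.3f}, NPV {npv_enriched:.3f}")
```

With the same sensitivity and specificity, the PPV rises substantially once prevalence is artificially raised, which is why predictive values measured in an enriched test set would not transfer to routine screening.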
Key numbers
Initial cohort: 338 screening mammography examinations
Analyzed cohort: 309 examinations (29 excluded for lack of consensus)
Reference standard: consensus of 3 senior breast radiologists
Suspicious finding prevalence: 2.6% (8 of 309)
Endpoint 1 - any-lesion detection: Se 0.895, Sp 0.940, BA 0.917, PPV 0.829, NPV 0.965
Endpoint 2 - suspicious examination identification: Se 0.875, Sp 0.857, BA 0.866, PPV 0.140, NPV 0.996
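The figures above can be cross-checked against the cohort composition. The following is a minimal sketch, assuming 2x2 counts reconstructed from the reported class sizes (233 Normal, 68 Benign/probably benign, 8 Suspicious) and the reported rates. The source does not list the confusion matrices in this form, so the counts are an inference, and `diagnostic_metrics` is a hypothetical helper name.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute standard diagnostic-accuracy metrics from 2x2 counts."""
    se = tp / (tp + fn)                    # sensitivity
    sp = tn / (tn + fp)                    # specificity
    return {
        "Se": se,
        "Sp": sp,
        "Acc": (tp + tn) / (tp + fn + fp + tn),
        "BA": (se + sp) / 2,               # balanced accuracy
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Counts below are reconstructed (assumed), not quoted from the study.
# Endpoint 1: 76 reference-positive (68 benign + 8 suspicious), 233 reference-negative.
endpoint1 = diagnostic_metrics(tp=68, fn=8, fp=14, tn=219)
# Endpoint 2: 8 reference-positive (suspicious), 301 reference-negative.
endpoint2 = diagnostic_metrics(tp=7, fn=1, fp=43, tn=258)

for name, result in (("Endpoint 1", endpoint1), ("Endpoint 2", endpoint2)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumed counts, every metric rounds to the value reported for the corresponding endpoint, so the published figures are internally consistent with a 309-examination cohort.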
Abstract
AI tools for mammography can support lesion detection and risk stratification, but their clinical value depends on performance in routine screening settings where disease prevalence is low and negative examinations predominate. This retrospective diagnostic-accuracy study evaluated Carebot AI MMG on consecutive screening mammography examinations acquired at AGEL Hospital Nový Jičín in January 2024. The reference standard was established by three senior breast radiologists using BI-RADS-aligned consensus categories. Of 338 reviewed examinations, 29 did not achieve consensus and were excluded, leaving 309 examinations for analysis: 233 Normal, 68 Benign/probably benign, and 8 Suspicious. Two predefined endpoints were evaluated. For any-lesion detection using Medium or High Risk as AI-positive, the AI achieved sensitivity 0.895, specificity 0.940, accuracy 0.929, balanced accuracy 0.917, PPV 0.829, and NPV 0.965. For suspicious examination identification using High Risk as AI-positive, the AI achieved sensitivity 0.875, specificity 0.857, accuracy 0.858, balanced accuracy 0.866, PPV 0.140, and NPV 0.996. The results support the use of Carebot AI MMG as decision-support software for prioritizing examinations and identifying studies that may require closer review. The study also highlights that in low-prevalence screening cohorts, AI-positive outputs should be interpreted in workflow context and confirmed by radiologists.



