
What the study evaluated
The study evaluated whether Carebot AI MMG can achieve examination-level diagnostic performance comparable to that of breast radiologists on 2D full-field digital mammography.
A total of 4,729 routine screening examinations were collected from four centres in Slovakia and the Czech Republic. For the primary analysis, a fixed analytic subset of 222 examinations was used, including 48 histology-confirmed malignant cases and 174 non-malignant cases.
Six breast radiologists and the AI software independently classified the anonymized mammography examinations, blinded to clinical information and the reference standard. Performance was assessed at two predefined operating points: high sensitivity for rule-out use and high specificity for rule-in use.
Study results in clinical practice
Carebot AI MMG achieved diagnostic performance comparable to the radiologist benchmark in both operating modes. The high-sensitivity setting prioritized cancer detection, achieving a sensitivity of 0.875 and an NPV of 0.957. The high-specificity setting raised specificity from 0.770 to 0.839, with a sensitivity of 0.833 and an NPV of 0.948.
In both settings, the AI's balanced accuracy was within the predefined non-inferiority margin (δBA = 0.05) of the radiologist benchmark. This supports the potential use of Carebot AI MMG as decision-support software in mammography workflows, with configurable operating points that can be aligned with local priorities for cancer detection, recall management, or workload support.
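The balanced-accuracy comparison can be reproduced from the reported figures. The Python sketch below computes balanced accuracy as the mean of sensitivity and specificity and checks whether the AI point estimate falls less than δBA = 0.05 below the radiologist benchmark. The names `balanced_accuracy` and `within_margin` are hypothetical, introduced only for illustration. A formal non-inferiority test usually compares the lower bound of a confidence interval for the difference with −δ; because this summary does not report those intervals, the sketch checks point estimates only and should not be read as the study's statistical method.

```python
DELTA_BA = 0.05  # predefined non-inferiority margin for balanced accuracy

# Reported AI sensitivity/specificity and radiologist benchmark BA per operating point.
RESULTS = {
    "high-sensitivity": {"se": 0.875, "sp": 0.770, "ba_benchmark": 0.828},
    "high-specificity": {"se": 0.833, "sp": 0.839, "ba_benchmark": 0.823},
}


def balanced_accuracy(se: float, sp: float) -> float:
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (se + sp) / 2


def within_margin(ba_ai: float, ba_benchmark: float, delta: float = DELTA_BA) -> bool:
    """Point-estimate check only (hypothetical helper): the AI BA may fall
    below the benchmark BA by less than delta."""
    return ba_ai - ba_benchmark > -delta


for mode, r in RESULTS.items():
    ba_ai = balanced_accuracy(r["se"], r["sp"])
    diff = ba_ai - r["ba_benchmark"]
    print(f"{mode}: BA_AI={ba_ai:.4f}, difference={diff:+.4f}, "
          f"within margin: {within_margin(ba_ai, r['ba_benchmark'])}")
```

Because the inputs are rounded to three decimals, the computed high-sensitivity BA is 0.8225 rather than the reported 0.823; a difference of this size is consistent with rounding of the inputs.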
Because the analytic subset was enriched for malignant cases (48 of 222 examinations, about 22%), PPV and NPV should be interpreted as descriptive of this study dataset rather than as values expected in population screening, where cancer prevalence is much lower.
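The dependence of predictive values on prevalence follows from Bayes' rule. The sketch below, assuming the reported high-sensitivity operating point (sensitivity 0.875, specificity 0.770), computes PPV and NPV at the analytic-subset prevalence (48/222) and at a hypothetical prevalence of 1% chosen only for illustration, not taken from the study. The function names are illustrative, and PPV is not reported in this summary, so the PPV values serve only to show the direction of change.

```python
def ppv(se: float, sp: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(se: float, sp: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    return true_neg / (true_neg + false_neg)


SE, SP = 0.875, 0.770  # high-sensitivity operating point, as reported
STUDY_PREVALENCE = 48 / 222  # malignant share of the analytic subset (about 0.216)
ILLUSTRATIVE_PREVALENCE = 0.01  # hypothetical low prevalence, for illustration only (not from the study)

for label, prev in [("analytic subset", STUDY_PREVALENCE),
                    ("illustrative", ILLUSTRATIVE_PREVALENCE)]:
    print(f"{label} (prevalence {prev:.3f}): "
          f"PPV={ppv(SE, SP, prev):.3f}, NPV={npv(SE, SP, prev):.3f}")
```

At the subset prevalence the NPV formula reproduces the reported 0.957. At the lower illustrative prevalence PPV falls sharply and NPV rises, which is why the study's predictive values do not transfer directly to screening populations.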
Key numbers
Initial dataset: 4,729 screening mammography examinations
Analytic subset: 222 examinations
Malignant cases: 48 histology-confirmed cancers
Non-malignant cases: 174 examinations
Centres: 4 centres in Slovakia and the Czech Republic
High-sensitivity mode: Se 0.875, Sp 0.770, BA 0.823, NPV 0.957
High-specificity mode: Se 0.833, Sp 0.839, BA 0.836, NPV 0.948
Radiologist benchmark BA: 0.828 in high-sensitivity mode and 0.823 in high-specificity mode
Non-inferiority margin: δBA = 0.05, met in both operating modes
Abstract
Interpretation of mammography examinations is affected by inter-reader variability, which can influence recall decisions and patient management. This multicentre, retrospective, multi-reader diagnostic accuracy study evaluated Carebot AI MMG on 2D full-field digital mammography and compared its standalone performance with that of six breast radiologists. From 4,729 routine screening examinations acquired at four centres in Slovakia and the Czech Republic, a fixed analytic subset of 222 examinations was analyzed, including 48 histology-confirmed malignant cases and 174 non-malignant cases. Six radiologists and the AI software independently classified anonymized mammograms, blinded to clinical information and the reference standard. Performance was evaluated at two predefined operating points: high sensitivity and high specificity. At the high-sensitivity operating point, the AI achieved sensitivity 0.875, specificity 0.770, and balanced accuracy 0.823, compared with a radiologist benchmark balanced accuracy of 0.828. At the high-specificity operating point, the AI achieved sensitivity 0.833, specificity 0.839, and balanced accuracy 0.836, compared with a radiologist benchmark balanced accuracy of 0.823. Negative predictive value exceeded 0.94 in both settings. The results support non-inferior examination-level performance of Carebot AI MMG compared with the radiologist benchmark on this multicentre case set. These findings support the use of Carebot AI MMG as clinically relevant decision-support software for mammography, with operating points that can be adapted to local priorities for cancer detection and recall management.




