Does AI Add Value to Mammography? Evaluating Diagnostic Accuracy in a Multicentre, Multi-Reader Non-Inferiority Study

What the study evaluated

The study evaluated whether Carebot AI MMG can achieve examination-level diagnostic performance comparable to breast radiologists on 2D full-field digital mammography.

A total of 4,729 routine screening examinations were collected from four centres in Slovakia and the Czech Republic. For the primary analysis, a fixed analytic subset of 222 examinations was used, including 48 histology-confirmed malignant cases and 174 non-malignant cases.

Six breast radiologists and the AI software independently classified anonymized mammography examinations. Performance was assessed at two predefined operating points: high sensitivity for rule-out use and high specificity for rule-in use.

Study results in clinical practice

Carebot AI MMG achieved diagnostic performance comparable to the radiologist benchmark in both operating modes. The high-sensitivity setting prioritized cancer detection and achieved sensitivity of 0.875 and specificity of 0.770, with NPV of 0.957. The high-specificity setting raised specificity to 0.839 at a sensitivity of 0.833, with NPV of 0.948.

In both settings, the AI's balanced accuracy satisfied the predefined non-inferiority margin (δBA = 0.05) relative to the radiologist benchmark. This supports the potential use of Carebot AI MMG as decision-support software in mammography workflows, with configurable operating points that can be aligned with local priorities for cancer detection, recall management, or workload support.
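The non-inferiority criterion can be illustrated on the reported point estimates. The sketch below assumes the comparison is the difference in balanced accuracy (AI minus radiologist benchmark) checked against the negative margin. A formal non-inferiority test would normally use a confidence interval for that difference, which this summary does not report, so this is only a point-estimate check. The function name is hypothetical.

```python
def meets_margin_point_estimate(ba_ai, ba_benchmark, delta=0.05):
    """Point-estimate check: is the AI's balanced accuracy no more than
    `delta` below the benchmark? This is NOT a full statistical test."""
    return (ba_ai - ba_benchmark) > -delta


# Balanced accuracy values as reported in the study summary:
# (AI, radiologist benchmark) per operating mode.
reported = {
    "high_sensitivity": (0.823, 0.828),
    "high_specificity": (0.836, 0.823),
}

for mode, (ba_ai, ba_benchmark) in reported.items():
    diff = ba_ai - ba_benchmark
    ok = meets_margin_point_estimate(ba_ai, ba_benchmark)
    print(f"{mode}: difference = {diff:+.3f}, within margin: {ok}")
```

On these figures the differences are about -0.005 and +0.013, both well inside a margin of 0.05.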

Because the analytic subset was enriched for malignant cases, PPV and NPV should be interpreted as descriptive for this study dataset rather than as direct population screening values.
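To see why predictive values depend on the case mix, the sketch below recomputes PPV and NPV from sensitivity and specificity at two prevalences using Bayes' rule. The study prevalence (48 of 222) comes from the figures above. The screening prevalence of 0.5% is a purely hypothetical value chosen for illustration and is not taken from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# High-sensitivity operating point as reported in the study summary.
SENSITIVITY, SPECIFICITY = 0.875, 0.770

scenarios = {
    "enriched study subset (48/222)": 48 / 222,
    "hypothetical screening population": 0.005,  # assumption, not from the study
}

for label, prevalence in scenarios.items():
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: prevalence = {prevalence:.3f}, PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the same sensitivity and specificity, a lower prevalence pushes PPV down and NPV up, which is why the enriched-subset values should not be read as population screening values.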

Key numbers
  • Initial dataset: 4,729 screening mammography examinations
  • Analytic subset: 222 examinations
  • Malignant cases: 48 histology-confirmed cancers
  • Non-malignant cases: 174 examinations
  • Centres: 4 centres in Slovakia and the Czech Republic
  • High-sensitivity (HSe) mode: sensitivity (Se) 0.875, specificity (Sp) 0.770, balanced accuracy (BA) 0.823, NPV 0.957
  • High-specificity (HSp) mode: Se 0.833, Sp 0.839, BA 0.836, NPV 0.948
  • Radiologist benchmark BA: 0.828 in HSe and 0.823 in HSp
  • Non-inferiority margin: δBA = 0.05, met in both operating modes
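The reported rates can be cross-checked against the case counts. The sketch below infers integer confusion-matrix counts by rounding the reported sensitivity and specificity against 48 malignant and 174 non-malignant cases, then recomputes balanced accuracy as the mean of sensitivity and specificity, and NPV as TN / (TN + FN). The counts are inferred for illustration and are not reported in the study summary; the function name is hypothetical.

```python
N_MALIGNANT = 48
N_NON_MALIGNANT = 174


def summarize(sensitivity, specificity):
    """Infer integer counts from reported rates and recompute derived metrics."""
    tp = round(sensitivity * N_MALIGNANT)      # inferred true positives
    tn = round(specificity * N_NON_MALIGNANT)  # inferred true negatives
    fn = N_MALIGNANT - tp
    fp = N_NON_MALIGNANT - tn
    se = tp / N_MALIGNANT
    sp = tn / N_NON_MALIGNANT
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "balanced_accuracy": (se + sp) / 2,
        "npv": tn / (tn + fn),
    }


# Operating points as reported: (sensitivity, specificity).
for mode, (se, sp) in {"HSe": (0.875, 0.770), "HSp": (0.833, 0.839)}.items():
    result = summarize(se, sp)
    print(mode, {k: (round(v, 3) if isinstance(v, float) else v)
                 for k, v in result.items()})
```

Under this reconstruction, the recomputed balanced accuracy and NPV round to the reported values in both modes, which suggests the published figures are internally consistent.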

Abstract

Interpretation of mammography examinations is affected by inter-reader variability, which can influence recall decisions and patient management. This multicentre, retrospective, multi-reader diagnostic accuracy study evaluated Carebot AI MMG on 2D full-field digital mammography and compared its standalone performance with that of six breast radiologists. From 4,729 routine screening examinations acquired at four centres in Slovakia and the Czech Republic, a fixed analytic subset of 222 examinations was analyzed, including 48 histology-confirmed malignant cases and 174 non-malignant cases. Six radiologists and the AI software independently classified anonymized mammograms, blinded to clinical information and the reference standard. Performance was evaluated at two predefined operating points: high sensitivity and high specificity. At the high-sensitivity operating point, the AI achieved sensitivity 0.875, specificity 0.770, and balanced accuracy 0.823, compared with a radiologist benchmark balanced accuracy of 0.828. At the high-specificity operating point, the AI achieved sensitivity 0.833, specificity 0.839, and balanced accuracy 0.836, compared with a radiologist benchmark of 0.823. Negative predictive value exceeded 0.94 in both settings. The results support non-inferior examination-level performance of Carebot AI MMG compared with the radiologist benchmark on the multicentre case set. The study supports AI as clinically relevant decision-support software for mammography, with operating points that can be adapted to local priorities for cancer detection and recall management.

Would you like to test Carebot directly at your workplace?

Schedule a pilot run. Contact us and our application specialist will guide you through the entire process. Together, we will design a procedure, implement the solution in your PACS, obtain approval from the legal department, and train your doctors. No complicated adjustments, just real benefits.
