What the study evaluated
The study evaluated the performance of Carebot AI CXR, a deep learning–based detection system, in identifying seven common radiological findings on chest X-rays in routine clinical practice. The AI's performance was compared with that of six radiologists of varying experience in a multi-reader design, using 956 consecutive CXRs from a real hospital workflow and expert consensus as the ground truth.
Study results in clinical practice
Across all evaluated findings, the AI consistently achieved high sensitivity, particularly for low-prevalence and clinically relevant abnormalities such as pulmonary lesions and pneumothorax. It produced fewer false negatives than junior and intermediate radiologists. Its specificity was generally lower than that of experienced radiologists, which resulted in more false-positive alerts. In clinical practice, this supports the AI's role as a safety-focused decision support tool that helps standardise detection quality and mitigates experience-related variability.
Key numbers
CXRs analysed: 956
Findings detected: 7 (atelectasis, consolidation, pleural effusion, pulmonary lesion, subcutaneous emphysema, cardiomegaly, pneumothorax)
Pulmonary lesion sensitivity (AI): 90.5%
Pulmonary lesion sensitivity (radiologists): 23.8–66.7%
Pleural effusion sensitivity (AI): 95.3%
Overall trend: higher sensitivity than most radiologists, lower specificity
Greatest benefit: junior and mid-level radiologists
In this study, we developed a deep-learning-based automatic detection algorithm (DLAD, Carebot AI CXR) to detect and localize seven specific radiological findings on chest X-rays (CXR): atelectasis (ATE), consolidation (CON), pleural effusion (EFF), pulmonary lesion (LES), subcutaneous emphysema (SCE), cardiomegaly (CMG), and pneumothorax (PNO). We collected 956 CXRs and compared the performance of the DLAD with that of six individual radiologists who assessed the images in a hospital setting. The proposed DLAD achieved high sensitivity (ATE 1.000 (0.624-1.000), CON 0.864 (0.671-0.956), EFF 0.953 (0.887-0.983), LES 0.905 (0.715-0.978), SCE 1.000 (0.366-1.000), CMG 0.837 (0.711-0.917), PNO 0.875 (0.538-0.986)). By comparison, sensitivity among the radiologists ranged from the lowest values (ATE 0.000 (0.000-0.376), CON 0.182 (0.070-0.382), EFF 0.400 (0.302-0.506), LES 0.238 (0.103-0.448), SCE 0.000 (0.000-0.634), CMG 0.347 (0.228-0.486), PNO 0.375 (0.134-0.691)) to the highest values (ATE 1.000 (0.624-1.000), CON 0.864 (0.671-0.956), EFF 0.953 (0.887-0.983), LES 0.667 (0.456-0.830), SCE 1.000 (0.366-1.000), CMG 0.980 (0.896-0.999), PNO 0.875 (0.538-0.986)). The findings of the study suggest that the DLAD has potential for integration into everyday clinical practice as a decision support system that reduces the false negative rate associated with junior and intermediate radiologists.
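To make the reported figures easier to read, here is a minimal Python sketch of how a sensitivity value and an accompanying interval could be computed from detection counts. It rests on several assumptions: the ranges in parentheses above are taken to be confidence intervals, the text does not say which interval method was used, and the Wilson score interval is used here only as one common choice, so its output need not match the published ranges exactly. The counts (19 detected out of 21) are illustrative, chosen because 19/21 reproduces the 90.5% pulmonary lesion sensitivity; the source does not report raw counts. The function names are hypothetical.

```python
import math


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly positive cases that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly negative cases that were correctly cleared: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%).

    Assumption: the study's own interval method is not stated in the text,
    so this is an illustrative choice, not the authors' procedure.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Illustrative counts only: 19 of 21 lesions detected reproduces 90.5%.
lesion_sens = sensitivity(true_pos=19, false_neg=2)
lesion_low, lesion_high = wilson_interval(successes=19, n=21)
print(f"sensitivity = {lesion_sens:.3f}, interval = ({lesion_low:.3f}, {lesion_high:.3f})")
```

With so few positive cases per finding, the interval is wide regardless of the method, which is why a high point estimate for a rare finding such as pneumothorax or subcutaneous emphysema should be read together with its range.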





