What the study evaluated
This single-site retrospective study assessed a deep learning–based automatic detection algorithm (DLAD) for identifying abnormalities on chest X-rays across 12 pre-selected findings. The model's performance was compared with that of five radiologists of differing experience levels on a dataset of 127 consecutive CXRs from a municipal hospital. Ground truth was established by an expert central reader with access to clinical records.
Study results in clinical practice
DLAD showed significantly higher sensitivity than the radiologists for detecting any abnormality on CXRs, meaning fewer missed abnormal scans. Its specificity was lower than the radiologists', resulting in more false-positive predictions. Negative predictive value was significantly higher for DLAD, indicating that scans classified as normal by the model were very likely truly normal. Clinically, this supports using AI to aid triage and prioritisation, helping reduce missed abnormalities and potentially accelerating reporting workflows, while final interpretation remains with the radiologist.
Key numbers
Images evaluated: 127 CXRs
DLAD sensitivity: 92.5%
Radiologist sensitivity (bootstrap average): 66.1%
DLAD specificity: 64.4%
Radiologist specificity (bootstrap average): 80.3%
Negative predictive value (DLAD): 94.9%
Negative likelihood ratio (DLAD): 0.12 (vs. 0.42 for radiologists)
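As a sanity check on the figures above, the negative likelihood ratio follows directly from sensitivity and specificity via NLR = (1 − Se)/Sp. A minimal Python sketch (the input numbers come from the study; the function name is ours):

```python
def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """NLR = P(negative result | disease present) / P(negative result | disease absent)."""
    return (1.0 - sensitivity) / specificity

# DLAD: Se = 0.925, Sp = 0.644
nlr_dlad = negative_likelihood_ratio(0.925, 0.644)  # ~0.12
# Radiologists (bootstrap averages): Se = 0.661, Sp = 0.803
nlr_rad = negative_likelihood_ratio(0.661, 0.803)   # ~0.42
print(f"DLAD NLR = {nlr_dlad:.2f}, radiologist NLR = {nlr_rad:.2f}")
```

A lower NLR means a negative result more strongly rules out disease, which is why the model's higher NPV and lower NLR go together despite its weaker specificity.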
Chest X-ray (CXR) is one of the most common radiological examinations for both non-emergent and emergent clinical indications, but human error or a lack of patient prioritisation can hinder timely interpretation. Deep learning (DL) algorithms have proven useful in the assessment of various abnormalities, including tuberculosis, lung parenchymal lesions, and pneumothorax. A deep learning–based automatic detection algorithm (DLAD) was developed to detect visual patterns on CXR for 12 preselected findings. To evaluate the proposed system, we designed a single-site retrospective study comparing the DL algorithm with the performance of 5 differently experienced radiologists. On the assessed dataset (n = 127), collected from a municipal hospital in the Czech Republic, DLAD achieved a sensitivity (Se) of 0.925 and specificity (Sp) of 0.644, compared to the bootstrapped radiologists' Se of 0.661 and Sp of 0.803, with statistically significant differences. The negative likelihood ratio (NLR) of the proposed software (0.12 (0.04–0.32)) was significantly lower than that of the radiologists' assessment (0.42 (0.40–0.43), p < 0.0001). No critical findings were missed by the software.
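The abstract reports bootstrapped radiologist performance with confidence intervals; one standard way to obtain such intervals is a percentile bootstrap over per-image outcomes. A hypothetical sketch (the data below are synthetic, not the study's):

```python
import random

def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic
    computed over per-image outcomes."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic example: 1 = abnormal scan correctly flagged, 0 = missed
hits = [1] * 87 + [0] * 7
sensitivity = lambda xs: sum(xs) / len(xs)
print(bootstrap_ci(hits, sensitivity))
```

Resampling images (rather than readers) captures case-mix variability, which is the dominant source of uncertainty in a 127-image dataset.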





