What the study evaluated
The study evaluated the performance of a deep learning–based automatic detection algorithm (DLAD) for detecting pulmonary lesions on chest X-rays (CXRs) in a real-world, low-prevalence clinical setting. The algorithm was compared with six radiologists of varying experience on retrospectively collected CXRs from routine hospital practice, using expert consensus as the ground truth.
Study results in clinical practice
In a setting with very low disease prevalence, DLAD showed substantially higher sensitivity than every participating radiologist, which substantially reduces the risk of missed pulmonary lesions. Its specificity was lower than that of most radiologists, resulting in more false-positive findings. For clinical practice, this supports DLAD as a safety-enhancing tool that helps both junior and experienced radiologists detect rare but clinically relevant findings, particularly where the low prevalence of such findings makes sustained vigilance difficult.
Key numbers
CXRs analyzed: 901
Pulmonary lesion prevalence: 2.3%
DLAD sensitivity: 90.5%
Radiologists’ sensitivity range: 23.8–66.7%
DLAD specificity: 89.3%
Radiologists’ specificity range: 88.4–99.9%
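The trade-off between DLAD's high sensitivity and its extra false-positive findings is easier to see as counts. The sketch below back-calculates an assumed confusion matrix from the key numbers (19 of 21 lesions detected, 786 of 880 lesion-free CXRs correctly cleared). These counts are inferred by rounding the reported rates and are not stated in the study, and the `Confusion` class is a hypothetical helper for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion counts for one reader (DLAD or a radiologist)."""
    tp: int  # lesion present, flagged
    fn: int  # lesion present, missed
    tn: int  # no lesion, not flagged
    fp: int  # no lesion, flagged (false-positive finding)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Share of flagged images that actually contain a lesion.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Share of cleared images that are actually lesion-free.
        return self.tn / (self.tn + self.fn)

    @property
    def prevalence(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.fn) / total


# ASSUMPTION: counts back-calculated from the reported rates
# (21 lesion-positive and 880 lesion-free CXRs); not reported in the study.
dlad = Confusion(tp=19, fn=2, tn=786, fp=94)

for name in ("prevalence", "sensitivity", "specificity", "ppv", "npv"):
    print(f"{name:>11}: {getattr(dlad, name):.1%}")
```

With these assumed counts, about 94 of 113 DLAD alerts (roughly five in six) would be false positives despite a specificity near 90%, because lesion-free images outnumber lesion-positive ones by about 42 to 1.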
The rapid advancement of artificial intelligence (AI) in medical imaging offers the prospect of more accurate and efficient diagnosis. One active area of research is the use of deep-learning-based automatic detection algorithms (DLAD) in chest radiography, which have shown considerable potential for identifying findings such as tuberculosis or pulmonary lesions. However, these promising results typically come from controlled, simulated research conditions with high disease prevalence, and there are concerns about how well such applications perform in real-world scenarios.
For our study, we collected 956 chest X-ray images (CXRs) from daily clinical practice at a municipal hospital. Two central readers, working blinded and with access to each patient's previous and subsequent examinations, reached agreement on 901 CXRs: 21 were visually confirmed to contain one or more pulmonary lesions (prevalence: 2.3%) and 880 contained no pulmonary lesions. Six radiologists of varying expertise then read these images retrospectively. Each radiologist's performance was measured against this ground truth and compared with that of the proposed DLAD (2.0.20-v2.01).
The proposed DLAD achieved a significantly higher sensitivity (Se 0.905 (0.715-0.978)) than every assessed radiologist (RAD 1 0.238 (0.103-0.448), p < 0.001; RAD 2 0.333 (0.170-0.544), p < 0.001; RAD 3 0.524 (0.324-0.717), p < 0.001; RAD 4 0.619 (0.410-0.794), p < 0.001; RAD 5 0.667 (0.456-0.83), p < 0.001; RAD 6 0.619 (0.41-0.794), p < 0.001).
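Because only 21 CXRs contain lesions, every sensitivity estimate above carries a wide confidence interval. The text does not say which interval method was used, so the sketch below assumes the Wilson score interval, a common choice for small samples; its bounds may therefore differ slightly from the reported ones. The detection counts are back-calculated from the reported sensitivities (for example, 19/21 ≈ 0.905), and `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# ASSUMPTION: detection counts out of 21 lesion-positive CXRs,
# back-calculated from the reported sensitivities; not reported directly.
readers = {"DLAD": 19, "RAD 1": 5, "RAD 3": 11, "RAD 5": 14}
for name, hits in readers.items():
    lo, hi = wilson_interval(hits, 21)
    print(f"{name}: Se = {hits / 21:.3f} ({lo:.3f}-{hi:.3f})")
```

The intervals span roughly 0.25 to 0.35 on this sample size, which is why even a mid-level radiologist's sensitivity interval can overlap only partly with DLAD's.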
The DLAD specificity (Sp 0.893 (0.871-0.912)) was significantly lower than that of five of the radiologists (RAD 1 0.999 (0.994-1), p < 0.001; RAD 2 0.933 (0.915-0.948), p < 0.001; RAD 4 0.968 (0.955-0.978), p < 0.001; RAD 5 0.991 (0.982-0.996), p < 0.001; RAD 6 0.989 (0.979-0.994), p < 0.001). The exception was one radiologist with mid-level experience, whose specificity did not differ significantly from that of the DLAD (RAD 3 0.884 (0.861-0.904), p = 0.685). These results show that the proposed DLAD achieves high sensitivity and reasonably reliable specificity even when applied in a low-prevalence, real-world setting. The proposed DLAD can therefore be considered beneficial for both junior and more experienced radiologists.
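Because DLAD and each radiologist read the same 901 images, their comparisons are paired. The text reports p-values without naming the test; McNemar's test, which uses only the images on which the two readers disagree, is a standard option for paired sensitivity or specificity comparisons, and the sketch below assumes its exact (binomial) form. The function name `mcnemar_exact` and the discordant counts in the usage line are hypothetical, not study data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pairs.

    b: images reader A classified correctly and reader B did not
    c: images reader B classified correctly and reader A did not
    Under H0 (equal accuracy), discordant images split 50/50, so
    min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# HYPOTHETICAL discordant counts on the 880 lesion-free CXRs (not from the study):
# radiologist correct but DLAD false positive = 70; the reverse = 10.
print(mcnemar_exact(70, 10))
```

Images on which both readers agree (both right or both wrong) drop out of the statistic, which is what makes the test suited to paired reads of the same CXRs.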