AI rules out abnormal findings on chest X-rays, significantly reducing workloads
Commercially available artificial intelligence software can correctly exclude pathology on chest radiographs with accuracy rates similar to those of radiologists, new research in the journal Radiology indicates.
What’s more, the tool recorded fewer critical misses than physicians in many cases, missing just over 1% at 95.4% sensitivity. This is an important finding about the nature of the mistakes AI algorithms make, the authors of the new paper suggest.
“Our group and others have previously shown that AI tools are capable of excluding pathology in chest X-rays with high confidence and thereby provide an autonomous normal report without a human in the loop,” lead author Louis Lind Plesner, MD, from the Department of Radiology at Herlev and Gentofte Hospital in Copenhagen, Denmark, and colleagues note. “Such AI algorithms miss very few abnormal chest radiographs. However, before our current study, we didn’t know what the appropriate threshold was for these models.”
The team sought to determine how often the algorithm could correctly identify chest X-rays as unremarkable compared with human radiologists, and whether the findings missed by AI were more consequential than those missed by radiologists. To do this, two thoracic radiologists labeled a group of nearly 2,000 chest X-rays as either remarkable or unremarkable. The algorithm was then adapted to output a chest radiograph’s “remarkableness” probability, which was used to calculate specificity at different sensitivities.
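To illustrate what this kind of operating-point analysis involves, here is a minimal sketch in Python using synthetic labels and scores (an assumption-based illustration, not the authors’ code or the commercial tool): pick the probability threshold that preserves a target sensitivity on remarkable studies, then report the specificity achieved on unremarkable ones.

```python
import numpy as np

def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    """Choose the threshold that keeps at least the target sensitivity on
    remarkable (y=1) studies, then report specificity on unremarkable (y=0) ones."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos_scores = np.sort(scores[y_true == 1])
    # Allow at most (1 - target) of the remarkable studies to fall below the threshold.
    k = int((1 - target_sensitivity) * len(pos_scores))
    threshold = pos_scores[min(k, len(pos_scores) - 1)]
    flagged = scores >= threshold
    sensitivity = flagged[y_true == 1].mean()
    specificity = (~flagged[y_true == 0]).mean()
    return threshold, sensitivity, specificity

# Hypothetical example: ~2,000 studies, roughly 37% unremarkable, made-up probabilities.
rng = np.random.default_rng(0)
y = (rng.random(2000) > 0.37).astype(int)               # 1 = remarkable
p = np.clip(rng.normal(0.35 + 0.35 * y, 0.15), 0, 1)    # "remarkableness" probability
for target in (0.999, 0.99, 0.98):
    print(target, specificity_at_sensitivity(y, p, target))
```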
Next, one radiologist, blinded to the AI output, reviewed the radiographs and their accompanying reports for missed findings, grading each miss as critical, clinically significant, or clinically insignificant. The findings missed in the initial human reports and those missed by the algorithm were then compared.
Just under 40% of the X-rays were labeled unremarkable. At sensitivities of 99.9%, 99% and 98%, the AI achieved specificities of 24.5%, 47.1% and 52.7%, respectively, which would have reduced case volumes by 9.1%, 17.5% and 19.6% at the facilities where the images were obtained. At sensitivities of 95.4% and above, it also recorded lower rates of critical misses than those identified in the initial radiology reports.
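As a back-of-the-envelope check (our reading of how these figures relate, not a calculation stated in the study), the case-volume reductions follow from multiplying each specificity by the share of unremarkable studies, since the AI can only remove from the worklist the normal studies it correctly calls normal:

```python
# Assumed relationship: avoided share of all studies = specificity * unremarkable fraction.
unremarkable_fraction = 0.372  # assumed value consistent with "just under 40%"
for sens, spec in [(0.999, 0.245), (0.99, 0.471), (0.98, 0.527)]:
    print(f"sensitivity {sens:.1%}: ~{spec * unremarkable_fraction:.1%} fewer studies to read")
```

With these assumed inputs the sketch reproduces reductions of roughly 9.1%, 17.5% and 19.6%.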
However, when the AI did miss findings, those misses tended to be more consequential for patients than the ones radiologists missed. This could be due to differences in how AI and radiologists conduct their interpretations, the authors explain.
“Radiologists adjust their diagnostic thinking based on clinical context and prior imaging,” the group notes. “Currently available AI tools do not take such information as input; consequently, their output is related to the disease distribution of their training but not to the patient-specific clinical context.”
Validating these results could have significant implications for reducing workloads, the group suggests.
The study abstract is available here.