Experts concede that 'overly optimistic' AI imaging studies do not translate to clinical practice

An interpretable AI tool that detects COVID-19 on chest radiographs underwhelmed researchers in a recent analysis published in Radiology: Artificial Intelligence. 

While the tool performed well in patients with severe COVID, it fell short of expectations in differentiating those with milder cases and failed to rise to the level of diagnostic accuracy reached by board-certified radiologists.  

Co-senior author of the study Christopher Tignanelli, MD, an associate professor of surgery at the University of Minnesota Medical School and general surgeon at M Health Fairview, conceded that the results of the study indicate that there is more work to be done before AI can be reliably implemented in real-time clinical practice. 

“This study, which represents the first live investigation of an AI COVID-19 diagnostic model, highlights the potential benefits but also the limitations of AI,” Tignanelli said in a statement released by the University of Minnesota Medical School. “While promising, AI-based tools have not yet reached full diagnostic potential.” 

The model was trained and validated, both externally and in real time, on more than 95,000 chest radiographs, including a total of 5,335 real-time predictions.

After 19 weeks of use, experts found that the model’s performance remained unchanged. The model’s sensitivity was higher in men than women, while specificity was higher in women than men. Its accuracy was reported as 63.5%, compared to 67.8% for the radiologists.

Tignanelli and colleagues noted that earlier published studies touted the utility of AI in the COVID era a bit prematurely, citing the use of publicly available datasets as a potential source of many of the “overly optimistic” results.

“We saw the same overly optimistic performance in this study when we validated against two publicly available datasets; however, as we showed in our manuscript, this does not translate to the real world,” Tignanelli said. “It is imperative moving forward that researchers and journals alike develop standards requiring external or real-time prospective validation for peer-reviewed AI manuscripts.” 

In the future, the authors hope to develop an AI model that integrates data, including detailed report notes and structured data, from more than 40 U.S. and European sites. 

The study abstract can be viewed here
