A majority of AI studies don’t adequately validate methods

Before an AI algorithm can be put into practice, it must be adequately validated on external datasets from multiple institutions. But a new study analyzing the literature on the topic found most algorithms aren't properly tested for real-world use.

Authors of the research, published in the Korean Journal of Radiology, analyzed 516 published studies and found only 6% (31 studies) externally validated their AI. Of those 31 studies, none took the steps necessary to determine whether their method was indeed ready for clinical use.

“Nearly all of the studies published in the study period that evaluated the performance of AI algorithms for diagnostic analysis of medical images were designed as proof-of-concept technical feasibility studies and did not have the design features that are recommended for robust validation of the real-world clinical performance of AI algorithms,” wrote Seong Ho Park, MD, PhD, of the department of radiology and research institute of radiology at the University of Ulsan College of Medicine in Seoul, Korea, and colleagues.

For an algorithm to be properly tested for image analysis, a study must include three design features: a diagnostic cohort design, inclusion of multiple institutions and prospective data collection for external validation, according to the researchers. They recommend using sizable datasets collected from newly recruited patients or from institutions that did not provide training data. The data should reflect relevant variations in patient demographics and diseases in the setting in which the AI will be deployed.
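To make the distinction concrete, the sketch below contrasts internal validation (a held-out split of the developing institution's own data) with external validation (a cohort from an institution that contributed no training data). It is a minimal illustration only, using synthetic stand-in data and a generic scikit-learn classifier, not anything drawn from the study itself.

```python
# Minimal sketch: internal vs. external validation of a diagnostic classifier.
# The cohorts below are synthetic stand-ins; in practice the "external" cohort
# would be prospectively collected patients from institutions that supplied
# no training data, reflecting the intended clinical setting.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n, shift=0.0):
    """Synthetic imaging-derived features and disease labels for one institution."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 20))
    logits = X[:, :5].sum(axis=1) - shift  # arbitrary signal
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits))).astype(int)
    return X, y

# Data from the developing institution: used for training and internal validation.
X_dev, y_dev = make_cohort(2000)
X_train, X_internal, y_train, y_internal = train_test_split(
    X_dev, y_dev, test_size=0.25, random_state=0
)

# Data from an outside institution with a different patient mix: external validation.
X_external, y_external = make_cohort(800, shift=0.5)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

auc_internal = roc_auc_score(y_internal, model.predict_proba(X_internal)[:, 1])
auc_external = roc_auc_score(y_external, model.predict_proba(X_external)[:, 1])

print(f"Internal validation AUC: {auc_internal:.3f}")
print(f"External validation AUC: {auc_external:.3f}")  # often lower under data shift
```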

Using data from multiple health systems is also central to an algorithm becoming clinically ready, Park et al. noted.

The researchers identified studies published in PubMed MEDLINE and Embase between January 2018 and August 17, 2018, then determined whether each study used external or internal validation methods. If external validation was used, they noted whether the data were collected with a diagnostic cohort design rather than a diagnostic case-control design, whether they came from multiple institutions and whether they were gathered prospectively.

“Our results reveal that most recently published studies reporting the performance of AI algorithms for diagnostic analysis of medical images did not have design features that are recommended for robust validation of the clinical performance of AI algorithms, confirming the worries that premier journals have recently raised,” the authors wrote.

Park and colleagues noted some studies did not intend to test the real-world readiness of AI algorithms, but were instead designed to determine the technical feasibility of a method. Those studies, meant only to test technical feasibility, were therefore not necessarily poorly designed.

In the future, radiologists and researchers should take it upon themselves to distinguish between proof-of-concept studies and those meant to validate the clinical performance of an AI platform, the researchers wrote.

""

Matt joined Chicago’s TriMed team in 2018 covering all areas of health imaging after two years reporting on the hospital field. He holds a bachelor’s in English from UIC, and enjoys a good cup of coffee and an interesting documentary.
