AI performs poorly when tested on data from multiple health systems

Melissa Rohman | November 07, 2018 | Health Imaging | Artificial Intelligence

As computer-aided diagnosis gains popularity in medical imaging, the use of artificial intelligence (AI)-powered frameworks such as convolutional neural networks (CNNs) is following suit.

However, researchers at the Icahn School of Medicine at Mount Sinai in New York City found that CNNs trained to detect pneumonia on chest x-rays performed poorly when tested on data from outside health systems, according to a study published online Nov. 6 in PLOS Medicine.

The research suggests that AI must be tested for performance across a wide range of populations and data sets to ensure accuracy in medical diagnosis.

“If CNN systems are to be used for medical diagnosis, they must be tailored to carefully consider clinical questions, tested for a variety of real-world scenarios and carefully assessed to determine how they impact accurate diagnosis,” first author John Zech, a medical student at the Icahn School of Medicine at Mount Sinai, said in a prepared statement.

Zech and colleagues reviewed how accurately AI models created at the Icahn School of Medicine detected cases of pneumonia in 158,000 chest x-rays taken from the National Institutes of Health, Mount Sinai Hospital, and Indiana University Hospital.

In three out of five comparisons, the CNNs’ performance in diagnosing diseases was lower on x-rays from hospitals outside their own network than on x-rays from the original health system, according to the researchers. However, the CNNs were able to detect where a chest x-ray was acquired with high accuracy.

The researchers concluded that deep learning models encounter the most difficulty when using massive numbers of parameters. This, the researchers explained, makes it challenging for AI models like CNNs to identify the variables driving predictions, such as the type of imaging modality or the resolution quality of the imaging.

“Our findings should give pause to those considering rapid deployment of artificial intelligence platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed,” said senior author Eric Oermann, MD, professor of neurosurgery at the Icahn School of Medicine at Mount Sinai. “Deep learning models trained to perform medical diagnosis can generalize well, but this cannot be taken for granted since patient populations and imaging techniques differ significantly across institutions.”
