Non-experts can create AI to classify radiology images—but should they?

Physicians with no coding experience are able to create AI algorithms to classify medical images at levels comparable to state-of-the-art platforms, according to a new study published in The Lancet Digital Health. However, some experts questioned whether those without experience should really be creating such technology.

Two physicians without any deep learning experience completed a 10-hour self-study period to understand the basics of programming before developing and testing their algorithms. Overall, the models performed well at binary classification tasks but fell short when validated on external data, according to Livia Faes, MD, with Cantonal Hospital Lucerne in Switzerland, and colleagues.

“The availability of automated deep learning might be a cornerstone for the democratization of sophisticated algorithmic modelling in health care. It allows the derivation of classification models without requiring a deep understanding of the mathematical, statistical, and programming principles,” they added. “However, the translation of this technological success to meaningful clinical effect requires concerted efforts and a careful stepwise approach to avoid biasing the results.”

For their study, the researchers fed five publicly available, open-source datasets into a neural architecture search framework that automatically created deep learning models to classify common diseases. The datasets included retinal fundus images (MESSIDOR); optical coherence tomography (OCT) images (Guangzhou Medical University and Shiley Eye Institute, version 3); images of skin lesions (Human Against Machine [HAM] 10000); and both pediatric and adult chest x-ray (CXR) images (Guangzhou Medical University and Shiley Eye Institute, version 3, and the National Institutes of Health [NIH] dataset, respectively).
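
The article does not name the specific platform the physicians used, but as a rough sketch of what automated model building looks like in practice, the snippet below uses Keras Tuner (an assumption for illustration, not the framework from the study) to search over a few candidate convolutional architectures for a binary image classifier. The arrays are random placeholders standing in for labelled medical images.

```python
# Minimal sketch of automated architecture/hyperparameter search for a binary
# image classifier. Keras Tuner is used here purely as an illustrative stand-in
# for the automated deep learning platform described in the study.
import numpy as np
import keras_tuner
from tensorflow import keras

# Placeholder data standing in for a labelled imaging dataset
# (e.g., normal vs. abnormal chest x-rays); replace with real images and labels.
x_train = np.random.rand(64, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(64,))
x_val = np.random.rand(16, 64, 64, 3).astype("float32")
y_val = np.random.randint(0, 2, size=(16,))

def build_model(hp):
    """Builds a small CNN whose depth and width are chosen by the search."""
    model = keras.Sequential([keras.layers.Input(shape=(64, 64, 3))])
    for i in range(hp.Int("conv_blocks", 1, 3)):
        model.add(keras.layers.Conv2D(hp.Int(f"filters_{i}", 16, 64, step=16),
                                      kernel_size=3, activation="relu"))
        model.add(keras.layers.MaxPooling2D())
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(hp.Int("dense_units", 32, 128, step=32),
                                 activation="relu"))
    model.add(keras.layers.Dense(1, activation="sigmoid"))  # binary output
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# The tuner trains several candidate architectures and keeps the best one,
# which is roughly what an automated deep learning service does behind the scenes.
tuner = keras_tuner.RandomSearch(build_model, objective="val_accuracy",
                                 max_trials=5, overwrite=True,
                                 directory="automl_sketch",
                                 project_name="cxr_binary")
tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))
best_model = tuner.get_best_models(num_models=1)[0]
```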

The researchers also performed a literature review of traditionally developed models to serve as a performance benchmark for the newly created machine learning models.

Overall, upon internal validation, the non-expert models performed well on the binary classification tasks: sensitivity ranged from 73.3% to 97%, specificity from 67% to 100%, and area under the precision-recall curve (AUPRC) from 0.87 to 1.00. The models did not perform as well on multiple-classification tasks, with sensitivity ranging from 38% to 100% and specificity from 67% to 100%. AUPRC for the five models ranged from 0.57 to 1.00.
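
For readers less familiar with these metrics, the short sketch below shows one common way to compute sensitivity, specificity and AUPRC with scikit-learn. The labels and scores are made-up illustrative values, and average_precision_score is used here as a standard summary of the precision-recall curve.

```python
# Illustrative computation of the metrics reported above on hypothetical
# binary predictions; the values are not taken from the study.
import numpy as np
from sklearn.metrics import confusion_matrix, average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # ground-truth labels
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])   # model probabilities
y_pred = (y_score >= 0.5).astype(int)                           # thresholded predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
auprc = average_precision_score(y_true, y_score)  # summary of the precision-recall curve

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, AUPRC={auprc:.2f}")
```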

When the team performed an external validation test, the top-performing model demonstrated an AUPRC of 0.47, a sensitivity of 49% and a positive predictive value of 52%. The worst-performing model overall was the one trained on the NIH chest x-ray dataset.

“From a methodological viewpoint, our results—as is also the case with the results reported in state-of-the-art deep learning studies—might be overly optimistic, because we were not able to test all the models out of sample, as recommended by current guidelines,” the researchers wrote.

Faes and colleagues suggested researchers and clinicians may be able to start producing in-house models based on data within their own institutions, but warned that regulatory guidelines will be important before such models can be used in clinical practice. 

In a related editorial, Tom J. Pollard, PhD, with MIT, Cambridge, Massachusetts, and colleagues acknowledged that the study is “compelling,” but also expressed ethical reservations.

“We cautiously share the authors’ optimism that removing obstacles to algorithmic modelling will lead to improvements in patient care, but the risks of bypassing mathematical, statistical, and programming expertise must be emphasized,” Pollard et al. wrote.

“The use of machine learning methods without in-depth knowledge can result in misleading or outright erroneous results that would cause harm if used to guide the delivery of care,” the editorialists added. “A reliance on simple performance metrics alone does not allow the practitioner to interpret other aspects of model development.”

""

