Even AI struggles to work under stress, study suggests

Hannah Murphy | April 18, 2024 | Health Imaging | Artificial Intelligence

The ability to adapt to a changing environment is integral to medicine, for humans and algorithms alike. Even the best of both struggle from time to time.

This was the case recently for an award-winning deep learning model touted for its excellent performance in pediatric bone age prediction. When experts subjected the model, which previously won the 2017 RSNA Pediatric Bone Age Machine Learning Challenge, to a “stress test” of sorts, its resultant performance caused researchers to question how it might fare in the real world.

The research was detailed recently in a paper published in Radiology: Artificial Intelligence.

“Despite radiologist-level performance for medical imaging diagnosis, DL models’ robustness to both extreme and clinically encountered image variation has not been thoroughly evaluated,” corresponding author Paul H. Yi, from the Department of Diagnostic Radiology and Nuclear Medicine at the University of Maryland School of Medicine, and co-authors explained. “In clinical practice, image acquisition is variable, and there is no standard of orientation or postprocessing, which could be an overlooked source of error for DL models in radiology.”

Experts initially tested the model on two different datasets: the RSNA validation set, which contains more than 1,400 pediatric hand radiographs, and the Digital Hand Atlas (DHA), which boasts more than 1,200 pediatric hand x-rays. As expected, the model performed well on both, “indicating good model generalization to external data,” the authors wrote.

However, when the test was repeated on images altered to reflect real-world variation (rotated, flipped or inverted, with the marker moved, or with the contrast, brightness or resolution adjusted), the model’s predictions varied substantially from baseline, resulting in clinically significant errors for 57% of its interpretations of images from the DHA dataset.
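
For readers curious what such a stress test might look like in code, here is a minimal sketch in Python. It is not the authors' method. The function names, the pixel range of 0 to 1, the specific perturbation strengths and the `toy_predict` stand-in model are all assumptions made for illustration.

```python
import numpy as np


def make_variants(image):
    """Return perturbed copies of a 2-D grayscale image with values in [0, 1].

    The perturbation strengths here are arbitrary illustrative choices.
    """
    return {
        "rotated_90": np.rot90(image),
        "flipped_horizontal": np.fliplr(image),
        "inverted": 1.0 - image,
        "brighter": np.clip(image + 0.2, 0.0, 1.0),
        "higher_contrast": np.clip((image - 0.5) * 1.5 + 0.5, 0.0, 1.0),
        "half_resolution": image[::2, ::2],
    }


def stress_test(predict, image, tolerance):
    """Compare predictions on perturbed images with the unaltered baseline.

    `predict` is any callable mapping an image to a single number, and
    `tolerance` is the largest shift the caller considers acceptable.
    """
    baseline = predict(image)
    report = {}
    for name, variant in make_variants(image).items():
        shift = abs(predict(variant) - baseline)
        report[name] = {"shift": shift, "exceeds_tolerance": shift > tolerance}
    return report


def toy_predict(image):
    """Hypothetical stand-in for a model: scaled mean of the left half.

    It is deliberately sensitive to orientation so the sketch has
    something to detect. A real bone age model would replace it.
    """
    half = image.shape[1] // 2
    return float(image[:, :half].mean() * 200.0)
```

Calling `stress_test(toy_predict, image, tolerance=12.0)` returns, for each perturbation, how far the prediction moved and whether the shift exceeded the chosen tolerance. The tolerance of 12 is a placeholder, not a clinical threshold from the paper.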

Many of the prediction errors would have changed the diagnosis and potentially the treatment as well. This brings to light the “potential pitfalls in using these models in true clinical practice without physician oversight,” the authors noted, adding that great caution and physician oversight are imperative when deploying similar models in clinical practice.

"This stress testing is crucial for ensuring the clinical readiness of models and for accounting for potential variations in acquisition protocols, vendors, and so on. Rigorous stress testing at several checkpoints in the deployment pipeline can help facilitate improved model development, ultimately leading to the creation of more robust and widely applicable models, thus positively impacting patient care and safety."
