Even AI struggles to work under stress, study suggests

The ability to adapt to a changing environment is integral in medicine, for humans and algorithms alike. And even the best of both struggle from time to time.

That was the case recently for a deep learning model touted for its excellent performance in pediatric bone age prediction. When researchers subjected the model, winner of the 2017 RSNA Pediatric Bone Age Machine Learning Challenge, to a “stress test” of sorts, its performance faltered enough to raise questions about how it might fare in the real world.

The research was detailed recently in a paper published in Radiology: Artificial Intelligence.

“Despite radiologist-level performance for medical imaging diagnosis, DL models’ robustness to both extreme and clinically encountered image variation has not been thoroughly evaluated,” corresponding author Paul H. Yi, from the Department of Diagnostic Radiology and Nuclear Medicine at the University of Maryland School of Medicine, and co-authors explained. “In clinical practice, image acquisition is variable, and there is no standard of orientation or postprocessing, which could be an overlooked source of error for DL models in radiology.” 

Experts initially tested the model on two different datasets: the RSNA validation set, which contains more than 1,400 pediatric hand radiographs, and the Digital Hand Atlas (DHA), which boasts more than 1,200 pediatric hand x-rays. As expected, the model performed well on both, “indicating good model generalization to external data,” the authors wrote. 

However, when the test was repeated after the images had been altered in ways reflective of real-world variation (rotated, flipped, or inverted; the marker moved; or the contrast, brightness, and resolution adjusted), the model’s predictions varied substantially from baseline, producing clinically significant errors in 57% of its interpretations of images from the DHA dataset.
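For readers curious what such a perturbation sweep might look like in practice, below is a minimal Python sketch using the Pillow imaging library. The `model.predict_bone_age` interface and the 12-month error threshold are illustrative assumptions, not the study’s actual code or criteria.

```python
# Minimal sketch of an image-perturbation "stress test" like the one described above.
# Assumes a hypothetical model.predict_bone_age() method; uses Pillow for the alterations.
from PIL import Image, ImageEnhance, ImageOps

def perturbations(img: Image.Image):
    """Yield (name, altered image) pairs mimicking real-world acquisition variation."""
    yield "rotated", img.rotate(30, expand=True)
    yield "flipped", ImageOps.mirror(img)
    yield "inverted", ImageOps.invert(img.convert("L"))
    yield "low_contrast", ImageEnhance.Contrast(img).enhance(0.5)
    yield "brighter", ImageEnhance.Brightness(img).enhance(1.5)
    # Downsample then upsample back, so the input size the model expects is preserved
    yield "low_res", img.resize((img.width // 4, img.height // 4)).resize(img.size)

def stress_test(model, img, tolerance_months=12.0):
    """Compare predictions on altered images against the unaltered baseline."""
    baseline = model.predict_bone_age(img)  # hypothetical API, for illustration only
    failures = []
    for name, altered in perturbations(img):
        pred = model.predict_bone_age(altered)
        if abs(pred - baseline) > tolerance_months:  # assumed "clinically significant" cutoff
            failures.append((name, pred))
    return baseline, failures
```

The key design point is that each perturbation leaves the underlying anatomy unchanged, so a robust model’s prediction should stay close to its baseline; large drift on any altered copy flags a fragility worth investigating before deployment.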

Many of the prediction errors would have resulted in a change in diagnosis, and potentially in treatment as well. This brings to light the “potential pitfalls in using these models in true clinical practice without physician oversight,” the authors noted, adding that great caution is imperative when deploying similar models clinically.

"This stress testing is crucial for ensuring the clinical readiness of models and for accounting for potential variations in acquisition protocols, vendors, and so on. Rigorous stress testing at several checkpoints in the deployment pipeline can help facilitate improved model development, ultimately leading to the creation of more robust and widely applicable models, thus positively impacting patient care and safety."


In addition to her background in journalism, Hannah Murphy also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered radiologic technologist. She joined Innovate Healthcare in 2021 and has since put her unique expertise to use in her editorial role with Health Imaging.
