GPT-4 confidently struggles on radiology exam

The ability of ChatGPT to provide accurate medical information is well documented by now, but if the popular large language model’s most recent radiology exam scores are to be believed, it still cannot hold a candle to human medical professionals.  

A new paper in Academic Radiology details ChatGPT’s struggle to compete with students on the American College of Radiology’s (ACR) Diagnostic Radiology In-Training Examination (DXIT). According to the analysis, the chatbot was not lacking in confidence, though it was in accuracy.  

Researchers from Stony Brook University Hospital and the Northwestern University Feinberg School of Medicine recently put ChatGPT through a series of experiments, including both text- and image-based assessments, across different time points and with additional training to examine how the large language model could adapt to the new information it was exposed to. 

On the DXIT, GPT-4 achieved an overall accuracy of 58.5%, which was lower than that of third-year post-graduate radiology students but slightly higher than that of second-year students.  

Image-based questions were significantly more challenging for GPT-4 to answer, despite the latest version of the large language model now being capable of accepting image prompts. It yielded an accuracy of 45.4% on image-based prompts—significantly lower than the 80% accuracy it achieved on text-based prompts. 

When the researchers repeated their experiment at different time intervals and refined their prompts, GPT-4 still struggled to keep up with students' performance. Even after prompt fine-tuning, its accuracy did not improve. In fact, when questions were repeated, the model changed its answer more than 25% of the time, with no gain in accuracy. 

GPT-4 did accurately diagnose numerous critical conditions, but it failed to identify several fatal ones, such as a ruptured aortic aneurysm. Despite its fluctuating accuracy, the large language model expressed high confidence in over 80% of its answers, regardless of whether they were correct. 

The study’s corresponding author David L. Payne, chief radiology resident at Stony Brook University Hospital in New York, and colleagues advised that their findings suggest similar large language models, though potentially very useful, will “require ongoing monitoring to ensure their reliability in clinical settings.” 

“Clinical implementers of general (and narrow) AI radiology systems should exercise caution given the possibility of spurious yet confident responses as well as a high degree of output variability with identical inputs across time.” 


In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She joined Innovate Healthcare in 2021 and has since put her unique expertise to use in her editorial role with Health Imaging.
