GPT-4 now has vision—can it actually read chest X-rays?
Fine-tuned, pre-trained large language models are beginning to reliably translate image content into text, but are they ready to take on medical images?
Not quite yet. At least, not according to a new study published Tuesday in Radiology that put GPT-4 with vision (GPT-4V) to the test on a series of chest radiographs. The visual recognition skills of the large language model fell far short of clinical standards, achieving a positive predictive value (PPV) of no more than 25%, even in its best attempt at spotting image findings across a set of 100 chest X-rays.
“The emergence of multimodal large language models (LLMs) that can understand both text and images, such as OpenAI’s GPT-4V, shows potential for automated image-text pair generation,” corresponding author Yifan Peng, PhD, an associate professor with the department of population health sciences at Weill Cornell Medicine, and colleagues noted. “Although GPT-4V has shown promise in understanding real-world images, its effectiveness in interpreting real-world chest radiographs was limited.”
To test GPT-4V's image interpretation skills, the researchers presented it with 100 radiographs drawn from two public datasets, one from the National Institutes of Health (NIH) and one from the Medical Imaging and Data Resource Center (MIDRC). Two separate assessments took place: one in a zero-shot setting, which gave the large language model no prior examples, and one in a few-shot setting, which provided two examples. Model outputs were scored against ICD-10 codes for the images' corresponding clinical conditions.
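As an illustration only, and not the study's actual prompting code, a zero-shot request to a vision-capable model through the OpenAI Python SDK might look like the sketch below. In the few-shot setting, two example image-and-findings pairs would be prepended as earlier conversation turns. The file name, prompt wording and model identifier are assumptions for the sake of the example.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(path: str) -> str:
    """Base64-encode a local image so it can be sent inline with the request."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# Zero-shot: the model sees only the task instruction and the radiograph,
# with no worked examples. (Hypothetical file name and prompt.)
zero_shot_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "List the radiographic findings visible in this chest X-ray."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encode_image('cxr_001.png')}"},
            },
        ],
    }
]

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name; the study evaluated GPT-4V
    messages=zero_shot_messages,
    max_tokens=300,
)
print(response.choices[0].message.content)
```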
In the zero-shot setting, GPT-4V's PPV for detecting ICD-10-coded conditions was just over 12% on the NIH dataset, with an average true positive rate (TPR) of 5.8%. On the MIDRC dataset, the model recorded its highest PPV, 25%, alongside an average TPR of 17%.
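For context, the two metrics capture different kinds of error: PPV is the share of findings the model reported that were actually present, while TPR (sensitivity) is the share of findings actually present that the model caught. A minimal sketch of both calculations from raw counts follows; the counts shown are hypothetical and are not figures from the study.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: of the findings the model flagged, how many were real."""
    return true_positives / (true_positives + false_positives)


def tpr(true_positives: int, false_negatives: int) -> float:
    """True positive rate (sensitivity): of the findings present, how many were caught."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts for a single ICD-10-coded condition, not data from the study:
print(ppv(true_positives=3, false_positives=21))  # 0.125  -> 12.5% PPV
print(tpr(true_positives=3, false_negatives=49))  # ~0.058 -> ~5.8% TPR
```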
Performance improved slightly in the few-shot setting, when GPT-4V was given example images to learn from, but the gains were not substantial.
Overall, GPT-4V was most accurate in identifying chest drains, air-space disease and lung opacity, and least able to detect endotracheal tubes, central venous catheters and degenerative changes of osseous structures.
“Our results highlight the need for additional comprehensive development and assessment prior to incorporating the GPT-4V model into clinical practice routines,” the authors wrote, adding that larger, more diverse datasets containing real-world images from multiple modalities are needed to more thoroughly train and validate these models.