GPT-4 as accurate as radiologists in predicting final diagnosis based on MRI reports

As large language models continue to advance, so too does their potential to enhance diagnostic processes within the radiology space. Now, new research suggests that LLMs can predict diagnoses from imaging reports with accuracy on par with that of human physicians. 

Recently published in European Radiology, the study details how OpenAI’s star LLM, GPT-4, performed in line with radiologists in predicting final diagnoses from preoperative MRI reports describing brain tumors, while surpassing them on differential diagnoses. Researchers involved in the work said their results indicate a role for LLMs in providing second opinions. 

“Within the realm of LLMs, the GPT series, in particular, has gained significant attention,” corresponding author Daiju Ueda, an associate professor at Osaka Metropolitan University’s Graduate School of Medicine in Japan, and co-authors noted. “Many applications have been explored within the field of radiology. Among these, the potential of GPT to assist in diagnosis from image findings is noteworthy because such capabilities could complement the essential aspects of daily clinical practice and education.” 

For the study, researchers first translated 150 preoperative brain MRI reports, each compiled by either a radiologist or a neurologist, from Japanese to English. GPT-4 and a group of five radiologists were then given the textual findings from the reports and asked to provide a differential diagnosis and a final diagnosis. Those predictions were compared against postoperative pathological analyses to determine accuracy. 
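The article does not reproduce the study’s actual prompt or code, but a minimal sketch of how such an evaluation might be wired up with OpenAI’s Python SDK could look like the following. The model name, prompt wording, and sample report text are illustrative assumptions, not the study’s actual setup:

```python
# Minimal sketch: asking GPT-4 for a differential and final diagnosis
# from the textual findings of a brain MRI report. Prompt wording,
# model choice, and the sample report are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def predict_diagnoses(report_findings: str) -> str:
    """Return GPT-4's differential and final diagnosis for one report."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are assisting with neuroradiology interpretation. "
                    "Given the findings from a preoperative brain MRI report, "
                    "list a differential diagnosis, then state the single "
                    "most likely final diagnosis."
                ),
            },
            {"role": "user", "content": report_findings},
        ],
        temperature=0,  # deterministic output aids reproducible scoring
    )
    return response.choices[0].message.content


# Hypothetical report text, not drawn from the study's dataset.
sample_report = (
    "Left frontal intra-axial mass with ring enhancement, central necrosis, "
    "and surrounding vasogenic edema; restricted diffusion at the rim."
)
print(predict_diagnoses(sample_report))
```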

For final diagnoses, GPT-4 achieved an accuracy of 74%, while the radiologists’ predictions were accurate between 65% and 79% of the time. Broken down by report author, GPT-4's accuracy climbed to 80% on reports written by neurologists, compared to 60% on reports from radiologists. 
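As a rough illustration of how such figures are tallied, accuracy here is simply the share of cases where the predicted final diagnosis matches the postoperative pathology, computed overall and again within each report-author group. A toy sketch with entirely fabricated case data:

```python
# Toy sketch of accuracy scoring: fraction of cases where the predicted
# final diagnosis matches postoperative pathology, overall and split by
# who wrote the report. All case data below is fabricated for illustration.
from dataclasses import dataclass


@dataclass
class Case:
    author: str      # "neurologist" or "radiologist"
    predicted: str   # model's final diagnosis
    pathology: str   # ground truth from postoperative pathology


cases = [
    Case("neurologist", "glioblastoma", "glioblastoma"),
    Case("neurologist", "meningioma", "meningioma"),
    Case("radiologist", "metastasis", "glioblastoma"),
    Case("radiologist", "schwannoma", "schwannoma"),
]


def accuracy(subset: list[Case]) -> float:
    return sum(c.predicted == c.pathology for c in subset) / len(subset)


print(f"overall: {accuracy(cases):.0%}")
for author in ("neurologist", "radiologist"):
    group = [c for c in cases if c.author == author]
    print(f"{author}-written reports: {accuracy(group):.0%}")
```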

Notably, GPT-4 outperformed all five radiologists on differential diagnoses, achieving 94% accuracy; the highest accuracy among the radiologists was 89%. Its differential accuracy was consistent regardless of which provider wrote the report. 

The authors pointed out that their research marks the first time GPT-4 has been tested against actual clinical radiology reports. 

“The majority of previous research suggested the utility of GPT-4 in diagnostics, but these relied heavily on hypothetical environments such as quizzes from academic journals or examination questions,” the group noted. “This approach can lead to a cognitive bias since the individuals formulating the imaging findings or exam questions also possess the answers.” 

In contrast, using real clinical reports to test LLMs provides a more robust understanding of their accuracy and how they might perform in clinical settings. The group suggested that their findings indicate legitimate clinical potential for GPT-4. 

“The encouraging results of this study invite further evaluations of the LLM’s accuracy across a myriad of medical fields and imaging modalities,” the authors wrote. “The end goal of such exploration is to pave the way for the development of more versatile, reliable, and powerful tools for healthcare.” 

Hannah Murphy

In addition to her background in journalism, Hannah has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She began covering the medical imaging industry for Innovate Healthcare in 2021.
