GPT-4 spots radiology report errors in less time, at a lower cost

GPT-4, a popular large language model produced by OpenAI, could save organizations time and money when it comes to spotting errors in radiology reports, a new paper published Tuesday in Radiology suggests.

When compared to six radiologists — two senior radiologists, two attending physicians and two residents — GPT-4 was able to identify report errors at a rate similar to that of the professional readers. What’s more, the large language model did so in less time than the fastest human reader, and at a lower correction cost per report. 

The study’s lead author, Roman J. Gertz, MD, a resident in the Department of Radiology at University Hospital of Cologne in Germany, noted that although the utility of GPT-4 has been studied in other areas of radiology, this is the first analysis to compare its error detection performance with that of human readers, both seasoned and less experienced.

For the study, experts gathered 200 radiology reports completed over a six-month period at a single institution and intentionally inserted 150 errors into 100 of them. Both the radiologists and GPT-4 were then tasked with identifying the mistakes, which fell into one of five categories: omission, insertion, spelling, side confusion and “other.” 
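For readers curious what such a setup can look like in practice, below is a minimal, hypothetical sketch of prompting GPT-4 to flag errors in a report through OpenAI's Python SDK. The prompt wording, function name and example report are illustrative assumptions, not the study's actual protocol.

```python
# Illustrative sketch only: NOT the study's actual setup, just one way an
# error-flagging prompt to GPT-4 might look using OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The five error categories described in the study
ERROR_CATEGORIES = ["omission", "insertion", "spelling", "side confusion", "other"]

def flag_report_errors(report_text: str) -> str:
    """Ask GPT-4 to list suspected errors in a radiology report by category."""
    prompt = (
        "You are checking a radiology report for errors. "
        f"Possible error categories: {', '.join(ERROR_CATEGORIES)}. "
        "List each suspected error with its category and the affected sentence, "
        "or reply 'No errors found.'\n\nReport:\n" + report_text
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep output as consistent as possible for review tasks
    )
    return response.choices[0].message.content

# Example usage with a hypothetical report snippet:
# print(flag_report_errors("No focal lesion in the right kidney. Left kidney not evaluated."))
```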

The senior radiologists achieved the highest detection rate, at 89.3%, while GPT-4 caught 82.7% of the errors and the attending and resident radiologists spotted an average of 80%.

Though the senior readers had an edge in accuracy over GPT-4, the large language model identified errors more quickly, averaging 3.5 seconds per report compared with 25.1 seconds for the fastest human reader, and at a lower correction cost per report ($0.03 versus $0.42).

“This efficiency in detecting errors may hint at a future where AI can help optimize the workflow within radiology departments, ensuring that reports are both accurate and promptly available,” Gertz said in a release on the findings, “thus enhancing the radiology department’s capacity to deliver timely and reliable diagnostics.” 

Gertz went on to say that the research comes at a critical time, as growing demand stretches radiologists increasingly thin, which could open the door to more report errors and, in turn, higher operational costs. AI, particularly large language models like GPT-4, could help ease that burden, he suggested.

“Ultimately, our research provides a concrete example of how AI, specifically through applications like GPT-4, can revolutionize health care by boosting efficiency, minimizing errors and ensuring broader access to reliable, affordable diagnostic services—fundamental steps toward improving patient care outcomes.” 

