GPT-4 spots radiology report errors in less time, at a lower cost

GPT-4, a popular large language model produced by OpenAI, could save organizations time and money when it comes to spotting errors on radiology reports, a new paper published Tuesday in Radiology suggests. 

When compared to six radiologists — two senior radiologists, two attending physicians and two residents — GPT-4 was able to identify report errors at a rate similar to that of the professional readers. What’s more, the large language model did so in less time than the fastest human reader, and at a lower correction cost per report. 

The study’s lead author, Roman J. Gertz, MD, a resident in the Department of Radiology at University Hospital of Cologne in Germany, noted that although the utility of GPT-4 has been studied in other aspects of radiology, this is the first analysis to compare its error detection performance to that of human readers, both seasoned and less experienced.  

For the study, experts gathered 200 radiology reports completed over a six-month period at a single institution and intentionally inserted 150 errors into 100 of them. Both the radiologists and GPT-4 were then tasked with identifying the mistakes, which fell into one of five categories: omission, insertion, spelling, side confusion and “other.” 
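For illustration, the snippet below is a minimal sketch of how a report could be submitted to GPT-4 for this kind of error check through the OpenAI Python SDK. The prompt wording, the sample report and the way the error categories are phrased are assumptions made for the example; the study's actual pipeline is not described at this level of detail in the article.

```python
# Hypothetical sketch (not the study's actual pipeline): asking GPT-4 via the
# OpenAI Python SDK to flag errors in a radiology report.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative report text containing a deliberate side-confusion error.
report_text = (
    "Findings: There is a 1.2 cm nodule in the left upper lobe. "
    "Impression: Right upper lobe nodule, follow-up CT recommended."
)

# Prompt listing the five error categories used in the study.
prompt = (
    "You are checking a radiology report for errors. Look for omissions, "
    "insertions, spelling mistakes, side confusion (left vs. right), and "
    "other inconsistencies. List each suspected error with a short "
    "explanation, or reply 'No errors found.'\n\n"
    f"Report:\n{report_text}"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output is preferable for error checking
)

print(response.choices[0].message.content)
```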

The senior radiologists achieved the highest detection rate, at 89.3%, while GPT-4 recorded an accuracy of 82.7% and the attending and resident radiologists spotted an average of 80% of the errors. 

Though the senior readers had an edge in accuracy over GPT-4, the large language model identified the errors more quickly, at an average of 3.5 seconds compared to 25.1 seconds, and at a lower correction cost per report of $0.03 versus $0.42. 

“This efficiency in detecting errors may hint at a future where AI can help optimize the workflow within radiology departments, ensuring that reports are both accurate and promptly available,” Gertz said in a release on the findings, “thus enhancing the radiology department’s capacity to deliver timely and reliable diagnostics.” 

Gertz went on to say that the research comes at a critical time, as rising demand is stretching radiologists thin and opening the door to more report errors, which inevitably bleed into operational costs. AI, particularly large language models like GPT-4, could help ease this burden, he suggested. 

“Ultimately, our research provides a concrete example of how AI, specifically through applications like GPT-4, can revolutionize health care by boosting efficiency, minimizing errors and ensuring broader access to reliable, affordable diagnostic services—fundamental steps toward improving patient care outcomes.” 


