GPT-4 spots radiology report errors in less time, at a lower cost

GPT-4, a popular large language model produced by OpenAI, could save organizations time and money when it comes to spotting errors in radiology reports, a new paper published Tuesday in Radiology suggests.

When compared to six radiologists — two senior radiologists, two attending physicians and two residents — GPT-4 was able to identify report errors at a rate similar to that of the professional readers. What’s more, the large language model did so in less time than the fastest human reader, and at a lower correction cost per report. 

The study’s lead author, Roman J. Gertz, MD, a resident in the Department of Radiology at University Hospital of Cologne in Germany, noted that although the utility of GPT-4 has been studied in other areas of radiology, this is the first analysis to compare its error detection performance with that of human readers, both seasoned and less experienced.

For the study, experts gathered 200 radiology reports completed over a six-month period at a single institution and intentionally inserted 150 errors into 100 of them. Both the radiologists and GPT-4 were then tasked with identifying the mistakes, which fell into one of five categories: omission, insertion, spelling, side confusion and “other.” 
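For readers curious what such a setup can look like in practice, below is a minimal, hypothetical sketch of prompting GPT-4 to flag errors in a report through OpenAI's Python SDK. The prompt wording, function name and example report are illustrative assumptions, not the study's actual protocol.

```python
# Illustrative sketch only: NOT the study's actual setup, just one way an
# error-flagging prompt to GPT-4 might look using OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The five error categories described in the study
ERROR_CATEGORIES = ["omission", "insertion", "spelling", "side confusion", "other"]

def flag_report_errors(report_text: str) -> str:
    """Ask GPT-4 to list suspected errors in a radiology report by category."""
    prompt = (
        "You are checking a radiology report for errors. "
        f"Possible error categories: {', '.join(ERROR_CATEGORIES)}. "
        "List each suspected error with its category and the affected sentence, "
        "or reply 'No errors found.'\n\nReport:\n" + report_text
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep output as consistent as possible for review tasks
    )
    return response.choices[0].message.content

# Example usage with a hypothetical report snippet:
# print(flag_report_errors("No focal lesion in the right kidney. Left kidney not evaluated."))
```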

The senior radiologists achieved the highest detection rate, at 89.3%, while GPT-4 caught 82.7% of the errors and the attending and resident radiologists spotted an average of 80%.

Though the senior readers had an edge in accuracy over GPT-4, the large language model identified errors more quickly, averaging 3.5 seconds per report compared with 25.1 seconds for the fastest human reader, and at a lower correction cost per report ($0.03 versus $0.42).

“This efficiency in detecting errors may hint at a future where AI can help optimize the workflow within radiology departments, ensuring that reports are both accurate and promptly available,” Gertz said in a release on the findings, “thus enhancing the radiology department’s capacity to deliver timely and reliable diagnostics.” 

Gertz went on to say that the research comes at a critical time, as growing demand stretches radiologists increasingly thin, which could open the door to more report errors and, in turn, higher operational costs. AI, particularly large language models like GPT-4, could help ease that burden, he suggested.

“Ultimately, our research provides a concrete example of how AI, specifically through applications like GPT-4, can revolutionize health care by boosting efficiency, minimizing errors and ensuring broader access to reliable, affordable diagnostic services—fundamental steps toward improving patient care outcomes.” 

