How GPT-4 can improve radiology resident report feedback

Large language models could be leveraged to help provide feedback on radiology residents’ preliminary reports during independent call. 

With resources stretched thin at many facilities, this type of feedback can often be limited while residents are on call. The growing capabilities of large language models, like OpenAI’s ChatGPT, could help address the issue by identifying and flagging residents’ missed diagnoses, authors of a new paper in Current Problems in Diagnostic Radiology report. 

“Methods for automatically identifying discrepancies between preliminary and final reports have been proposed, primarily using a combination of simple text discrepancy identifiers and standard review macros,” Wasif Bala, MD, with the Department of Radiology and Imaging Sciences at Emory University Hospital, and colleagues note. “Emerging technologies in natural language processing may build on this prior work, enabling new avenues for enhancing feedback.” 

To test the feasibility of using LLMs to identify discrepancies between final reports and radiology residents’ preliminary reports, the team input 250 report pairs (residents’ preliminary and attendings’ final reports) into the GPT-4 API. The LLM was prompted to identify important findings that were present in the attendings’ final reports but missing from those input by the residents.
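A workflow like the one described could be sketched as follows. This is a hypothetical illustration, not the authors' actual code: the prompt wording, function names and model identifier are assumptions, and using it requires an OpenAI account with an `OPENAI_API_KEY` set in the environment.

```python
# Hypothetical sketch of the report-comparison workflow described in the study.
# The prompt text and helper names below are assumptions for illustration only.

# Assumed prompt: ask the model to list findings present in the attending's
# final report but absent from the resident's preliminary report.
PROMPT_TEMPLATE = (
    "Compare the two radiology reports below. List any clinically important "
    "findings that appear in the FINAL report but are missing from the "
    "PRELIMINARY report. If none are missing, reply 'No missed findings.'\n\n"
    "PRELIMINARY REPORT:\n{preliminary}\n\nFINAL REPORT:\n{final}"
)

def build_messages(preliminary: str, final: str) -> list:
    """Assemble the chat messages for one preliminary/final report pair."""
    return [{
        "role": "user",
        "content": PROMPT_TEMPLATE.format(preliminary=preliminary, final=final),
    }]

def find_missed_findings(preliminary: str, final: str) -> str:
    """Send one report pair to the GPT-4 chat completions endpoint."""
    # Imported here so the prompt-building code works without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=build_messages(preliminary, final),
    )
    return response.choices[0].message.content
```

In practice, each of the 250 report pairs would be fed through a loop over `find_missed_findings`, with the model's flagged discrepancies collected for review.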

GPT-4 identified 24 findings missed by residents, achieving nearly 80% accuracy. Further, the model did so across multiple report formats and variations in phrasing. 

Residents who received feedback from the LLM reported mostly positive experiences, rating the model’s notes 3.5 and 3.64 out of 5 for satisfaction and perceived accuracy, respectively. Nearly two-thirds preferred integrating LLM feedback with traditional report feedback. 

While numerous issues, such as the inability to identify semantic differences in reports rather than just textual variations, still need to be addressed before LLMs can be reliably deployed in resident training, the authors remain optimistic about the model's future potential. 

“The advent of large language models has opened new frontiers for providing effective feedback to medical trainees, enabling the identification of clinically meaningful errors in preliminary radiology reports,” the group writes. “These results highlight both the promising role of LLMs in augmenting traditional feedback mechanisms and the need for further refinement and evaluation of these models prior to widespread adoption.” 

In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She joined Innovate Healthcare in 2021 and has since put her unique expertise to use in her editorial role with Health Imaging.
