ChatGPT shows 'significant promise' in guiding contrast-related decisions

With proper training, popular large language model ChatGPT could help guide radiologists on the use of contrast media in patients who might be at risk of an adverse reaction. 

Researchers recently tested OpenAI's latest ChatGPT model, GPT-4, to assess how well it provides information related to contrast media and whether that ability changes when the model is given guideline-backed information. The group observed a significant improvement in the model's accuracy after using real administration guidelines to inform it. 

Experts involved in the study suggested that the large language model's ability to adapt when given updated information could indicate a promising future as a support tool in clinical settings. This could be especially helpful when timely decisions about contrast use need to be made. 

“Wrong administration can lead to complications like allergy or post contrast acute kidney injury, highlighting the importance of structured guidelines,” corresponding author Michael Scheschenja, with the department of diagnostic and interventional radiology at University Hospital Marburg in Germany, and co-authors noted. “These evidence-based guidelines provide clarity for radiologists in ambiguous situations. However, the detailed nature of these guidelines can be time-consuming in urgent clinical scenarios, leading to possible nonadherence with potential serious repercussions.” 

For their study, the team initially tested the accuracy of GPT-4's recommendations without any added training by asking it 64 questions related to contrast administration. The team then used a plug-in containing official contrast guidelines to expose the model to guideline content covering specific situations. 
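The paper does not describe the internals of the plug-in, but the general pattern it reflects, supplying guideline text as context alongside a clinical question, can be illustrated with a minimal sketch using the OpenAI Python SDK. The guideline excerpt, question, and prompt wording below are hypothetical placeholders, not the study's actual materials.

```python
# Minimal sketch (assumption): passing guideline text as context to GPT-4
# before asking a contrast-related question. The study used a dedicated
# plug-in; this only illustrates the general "guidelines as context" idea.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder guideline excerpt, not quoted from any official document.
guideline_excerpt = (
    "Example guideline text (placeholder): in patients with severely reduced "
    "kidney function, intravenous iodinated contrast should be used only when "
    "essential, with consideration of prophylactic hydration."
)

# Placeholder clinical question of the kind posed in the study.
question = (
    "A patient with severely reduced kidney function needs a contrast-enhanced CT. "
    "What precautions does the guideline recommend?"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "Answer strictly based on the guideline excerpt provided.",
        },
        {
            "role": "user",
            "content": f"Guideline excerpt:\n{guideline_excerpt}\n\nQuestion: {question}",
        },
    ],
)

print(response.choices[0].message.content)
```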

Exposure to the guidelines significantly improved GPT-4's recommendation accuracy. The average quality rating of its responses rose from 3.98 to 4.33, while its utility scores climbed from 4.1 to 4.4. 

Overall, 82.3% of GPT-4's recommendations were rated "highly" guideline-adherent and 14% were considered "moderately" accurate. 

“This implies that GPT-4 is not just capable of generating humanlike textual responses but can also extract relevant information from extensive documents,” the authors wrote. 

While GPT-4 performed well overall, its responses were not without fault. In several instances, its answers were graded as "insufficient" or "very bad" by seasoned radiologists. This highlights one of the major challenges facing large language models: their reliability depends heavily on the quality and specificity of the information they are given. 

Supplying guidelines directly via plug-ins could help address the issue, but additional research is still needed to determine why large language models' accuracy fluctuates, the authors suggested. 

The study abstract can be viewed in Current Problems in Diagnostic Radiology.

Hannah Murphy

In addition to her background in journalism, Hannah has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She joined Innovate Healthcare in 2021 and has since put her unique expertise to use in her editorial role with Health Imaging.
