Large language models make radiology reports more patient-friendly
Generative artificial intelligence can significantly improve patients’ understanding of their radiology reports, in addition to streamlining communication between referring providers and radiologists.
That’s according to new research published in Scientific Reports, which evaluated how effectively AI-generated reports translate radiologists’ imaging interpretations into language that is easier to understand, as judged by both radiologists and nonphysicians.
“Radiologic reports are indeed intended for communication among medical experts, making it challenging for patients to understand the content of such reports with specialized and complex medical terminologies,” co-corresponding authors Kyunghwa Han and Young Han Lee, both from the Center for Clinical Imaging Data Science at Yonsei University College of Medicine in South Korea, explain. “Nevertheless, it would be difficult, given the current medical system, to directly assign radiologists the task of providing patient-friendly reports, despite the reported benefits for patients to adhere to treatment plans and achieve better outcomes when they can comprehend their current radiological findings and their disease process.”
For the study, researchers pulled 685 spinal MRI reports from a single hospital’s database. The team fed the reports into GPT-3.5 (the updated turbo model behind OpenAI’s ChatGPT) and prompted it to use the information in each report to create three text documents: a summary of the findings, a patient-friendly version of the report, and recommendations based on the original report’s contents.
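The paper does not publish its exact prompt or code, so the sketch below is purely illustrative of how such a pipeline might be set up with the OpenAI Python client. The prompt wording, model settings, and the `simplify_report` helper are assumptions for demonstration, not the authors’ implementation.

```python
# Illustrative sketch only: the study's actual prompts and settings are not public.
# Assumes the OpenAI Python client (openai>=1.0) and an API key in OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = (
    "You are given a spinal MRI radiology report. Produce three sections:\n"
    "1. A brief summary of the findings.\n"
    "2. A patient-friendly version of the report in plain language.\n"
    "3. Recommendations based only on information in the report.\n\n"
    "Report:\n{report}"
)

def simplify_report(report_text: str) -> str:
    """Send one radiology report to GPT-3.5 Turbo and return the generated text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model name; the paper used GPT-3.5 (turbo)
        messages=[
            {"role": "user", "content": PROMPT_TEMPLATE.format(report=report_text)}
        ],
        temperature=0,  # assumption: deterministic output for easier comparison
    )
    return response.choices[0].message.content
```

In a workflow like the one described, each of the source reports would be passed through a prompt of this kind and the generated summary, patient-friendly version, and recommendations collected for later rating.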
The results were evaluated by four readers: two radiologists and two nonphysicians. The patient-friendliness of the reports was rated on a five-point scale and then compared with the raters’ understanding of the original reports.
Overall, their understanding of the original reports was scored at 2.71 on the five-point scale, while the AI-generated reports scored significantly higher, at 4.69.
AI hallucinations—when large language models create false, inaccurate or misleading information—were observed in just over 1% of the reports, and 7.4% contained “potentially harmful” translations of the original radiologist-provided information. Although these figures are not high, “they cannot be disregarded in medicine, which relies heavily on truthfulness, especially in nonhealthcare professionals who could not [be] aware of the potential issues they can pose,” the authors caution.
Acknowledging these challenges, the group maintains that, once appropriately refined for medical settings, large language models have the potential to do more good than harm for both patients and providers.
The full study is available in Scientific Reports.