GPT-4 is better at explaining IR procedures than existing patient materials
ChatGPT’s knack for reformulating medical jargon into content that is easily digestible for patients keeps improving.
The large language model most recently took on educational materials related to interventional radiology, a specialty where patient health literacy lags despite the growing demand for IR services. Experts recently suggested the LLM could help fill these gaps, making their case in the Journal of the American College of Radiology.
“Most procedural-related instructions are complex and often require a more developed literacy level, sometimes with a medical knowledge base, to fully understand the procedural steps and associated risks and benefits,” corresponding author Tarig Elhakim, MD, with the Perelman School of Medicine at the University of Pennsylvania, and colleagues cautioned. “The added complexity of interventional radiology procedures presents an additional challenge for the patients’ comprehension.”
The emergence of large language models has piqued the interest of many experts in the medical field due, at least in part, to their ability to quickly translate medical information into lay language. ChatGPT—OpenAI's LLM—quickly gained popularity for this reason and has been the subject of numerous research papers probing the reliability of the information it provides.
The latest version of the LLM, GPT-4, is said to dispense the most accurate and reliable medical responses to date. Researchers recently tested GPT-4's ability to translate educational information on 10 common IR procedures into layman’s terms that patients could more easily understand. Its answers were assessed by clinical and nonclinical individuals and then compared to currently available patient instructions on the same procedures.
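While the study’s exact prompts are not reproduced in the article, the general workflow is straightforward to sketch. The snippet below is a minimal illustration, assuming the OpenAI Python client; the prompt wording and the sixth-grade reading target are illustrative assumptions, not the researchers’ actual protocol.

```python
# Minimal sketch of prompting GPT-4 to simplify procedure instructions.
# The system prompt and reading-level target are illustrative assumptions,
# not the study's published methodology.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable


def simplify_instructions(procedure: str) -> str:
    """Ask GPT-4 to explain an IR procedure in plain, patient-friendly language."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a patient educator. Explain medical procedures "
                    "in plain language at roughly a 6th-grade reading level, "
                    "covering the steps, risks and benefits."
                ),
            },
            {"role": "user", "content": f"Explain what to expect during {procedure}."},
        ],
    )
    return response.choices[0].message.content


print(simplify_instructions("paracentesis"))
```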
GPT-4 provided responses for paracentesis, thoracentesis, port placement, CT-guided biopsy, dialysis catheter placement, ultrasound-guided biopsy, nephrostomy tube placement, biliary drains, thrombectomy and arterial embolization. Physicians rated nine of the responses as fully appropriate and one (arterial embolization) as somewhat appropriate.
The nonclinical participants’ assessments varied a bit more. Responses related to paracentesis, dialysis catheter placement, thrombectomy, ultrasound-guided biopsy and nephrostomy tube placement were rated excellent by 57% of these raters and good by 43%, while the arterial embolization and biliary drain instructions were rated excellent by 28.6% and good by 71.4%.
The LLM’s thoracentesis, port placement and CT-guided biopsy instructions were the most difficult for nonclinicians to understand and were rated more poorly than the others.
Encouragingly, the readability of GPT-4’s instructions was rated much higher than that of currently available patient education materials. GPT-4’s responses were written at more appropriate reading levels than the corresponding information on radiologyinfo.org, the group noted.
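The article does not specify which readability metric the reviewers applied, but grade-level formulas such as Flesch-Kincaid are the standard yardstick for patient education materials. A minimal sketch, assuming the third-party textstat package and placeholder sample texts, shows how such a comparison could be scored:

```python
# Minimal sketch comparing reading grade levels of two texts with the
# Flesch-Kincaid formula (via the textstat package). The sample strings
# are placeholders, not the study's actual materials.
import textstat

gpt4_text = "The doctor uses a small needle to drain extra fluid from your belly."
reference_text = (
    "Paracentesis is an image-guided percutaneous procedure in which "
    "ascitic fluid is aspirated from the peritoneal cavity."
)

print("GPT-4 sample grade level:", textstat.flesch_kincaid_grade(gpt4_text))
print("Reference sample grade level:", textstat.flesch_kincaid_grade(reference_text))
```

A lower grade level indicates text that is easier to read, which is the direction the study found GPT-4’s responses leaning relative to existing materials.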
“Simplifying language to meet the patient's level of understanding can potentially contribute to improved health outcomes,” the group wrote. “In essence, the integration of LLMs in clinical environments stands poised as a possible strategy to enhance health outcomes within a dynamically evolving society, duly considering a myriad of factors that may impact a patient's comprehension of their health status.”
The group suggested that future work should focus on the ability of LLMs to provide medical information in different languages and dialects.