Why practices might want to think twice before using ChatGPT to create patient education materials
Practices hoping to cut corners by having ChatGPT draft patient education pamphlets might want to think twice before relying on the large language model.
According to a new analysis in Academic Radiology, ChatGPT-generated educational materials intended to inform patients about various interventional radiology (IR) procedures are inaccurate at best and misleading at worst.
“Despite the remarkable capabilities of ChatGPT and similar LLMs, they are not without their challenges, particularly in the realm of healthcare. The potential for disseminating inaccurate information and the occurrence of 'hallucinations'—responses that are generated without grounding in factual data—are significant concerns,” Arash Bedayat, MD, with the department of radiological sciences at the David Geffen School of Medicine, UCLA, and co-authors caution. “These hallucinations, as they are termed, refer to instances where the model produces information that is entirely fabricated or not verifiable against the training data.”
For the study, experts tested the large language model’s knowledge of interventional radiology procedures by having five users (three radiologists and two radiologists-in-training) prompt it to create educational pamphlets on 20 common IR procedures using identical commands. Two independent radiologists then assessed the materials for accuracy, quality and consistency.
A vast but misapplied vocabulary
ChatGPT consistently referenced appropriate medical terminology, but the materials it produced with that terminology were riddled with inaccuracies about the procedures themselves. The reviewers observed issues in 30% of the pamphlets the large language model generated.
The most common inaccuracies involved potential procedural complications and whether sedation was required.
“The omission of sedation information can result in uninformed consent, where patients are not fully aware of the procedure's experience or the risks involved,” the group cautions. “The absence of pre-procedural preparation details could lead to procedural delays or increased risks during the procedure.”
A line-by-line comparison also revealed inconsistencies between the materials, with significant variations in structure and formatting despite different users prompting ChatGPT with the exact same commands.
“One of the major obstacles in adopting ChatGPT in healthcare is the need for up-to-date and current medical data,” the authors note. “This underscores the importance of ongoing human supervision and expert validation in utilizing large language models for medical educational purposes.”
Future studies should address how to fine-tune the data that large language models like ChatGPT draw on, the authors suggest. Some studies have already begun testing the utility of plug-ins containing data specific to a certain topic (radiology appropriateness criteria, for example) for training large language models to provide accurate health-related information.
The study abstract can be found here.