Society of Interventional Radiology bests ChatGPT at informing patients—but contest reveals shortcomings on both sides
As a source of patient information, human-authored SIRweb.org beats ChatGPT on readability and, in a word, helpfulness. However, the website needs work on those scores too.
Harvard researchers at Beth Israel Deaconess Medical Center found as much when they organized the website’s content into questions and posed them to ChatGPT.
Analyzing the outputs for word and sentence count, accuracy, readability (per several established scales) and suitability for patient education (per the HHS’s PEMAT instrument), the team found ChatGPT produced materials that ran longer, contained more difficult words and posed a greater reading challenge than competing content on SIRweb.org.
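The paper does not share its analysis pipeline, but the kind of scoring it describes is straightforward to reproduce. Below is a minimal sketch, assuming the open-source textstat Python package (an illustration only; the authors do not name their tools), showing how word counts and several established readability scales can be computed over a passage of patient-education text. PEMAT, by contrast, is a human-rated instrument and is not automated here.

```python
# Minimal sketch: scoring a patient-education passage on several
# established readability scales. textstat is an assumed tool for
# illustration; the study does not specify what the team used.
import textstat

passage = (
    "Uterine fibroid embolization is a minimally invasive procedure "
    "that blocks blood flow to fibroids, causing them to shrink."
)

metrics = {
    "words": textstat.lexicon_count(passage),
    "sentences": textstat.sentence_count(passage),
    "difficult_words": textstat.difficult_words(passage),
    # Grade-level scales: guidelines recommend fifth- or sixth-grade text.
    "flesch_kincaid_grade": textstat.flesch_kincaid_grade(passage),
    "gunning_fog": textstat.gunning_fog(passage),
    "smog_index": textstat.smog_index(passage),
    "coleman_liau_index": textstat.coleman_liau_index(passage),
    # Ease score on a 0-100 scale, where higher means easier to read.
    "flesch_reading_ease": textstat.flesch_reading_ease(passage),
}

for name, value in metrics.items():
    print(f"{name}: {value}")
```

Run on a real passage, the grade-level scores can be compared directly against the fifth- or sixth-grade target the guidelines recommend.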
Also hurting the AI’s performance were a lack of visual aids and an 11.5% rate of incorrect or incomplete content (12 of 104 answers).
At the same time, the SIR website shared one shortcoming with ChatGPT: Both were written above the guideline-recommended fifth- or sixth-grade reading level.
Corresponding study author Colin McCarthy, MD, and colleagues had their work published June 15 in the Journal of Vascular and Interventional Radiology [1].
In total, the researchers analyzed more than 21,000 words. These included almost 8,000 from the website and more than 13,000 generated by ChatGPT (across 22 text passages).
Four of five readability scales judged ChatGPT’s output harder to read than SIRweb.org’s, and PEMAT scored the chatbot’s content lower than the website’s.
In their discussion, McCarthy and co-authors remark that, as patients and their caregivers continue turning to digital outlets offering health information, the medical community “should recognize that their first stop may not be a societal website or other sources of ground truth. As was seen in this study, such sources may themselves benefit from improvements, specifically to ensure the content is understandable by the majority of readers, ensuring equitable access to healthcare information.”
The authors note the likelihood that additional tests of generative AI’s utility for patient education will soon follow.
That’s a good thing, given the speed at which large language models have evolved and captured the public’s imagination since ChatGPT’s introduction in the fall of 2022.
“We propose that the use of existing, validated instruments such as those outlined herein may serve as a framework for future research in this field,” McCarthy and co-authors write.
More:
“For now, the early indicators suggest that this technology may have potential use cases for both physicians and patients alike. However, current versions of [ChatGPT] may produce incomplete or inaccurate patient educational content, and therefore opportunities may exist to develop customized chatbots for patient education, based on finetuning existing large language models.”
Abstract here, full text behind paywall.