ChatGPT will take your patient’s questions now—just not solo

What does “clinical correlation” mean in a radiology report? Ask ChatGPT.

Researchers have done just that, putting the model through its paces with the above query and 21 other patient-level questions about medical imaging.

Their conclusion: ChatGPT has potential for assisting with patient communications in radiology, but its accuracy is inconsistent enough to require human oversight.

What’s more, the digital tool tends to “speak” in language that may be inaccessible to patients who don’t read at high comprehension levels. Adding a prompt that requests easy-to-understand responses, or responses within recommended reading levels for patient education, helps but doesn’t fully solve the problem.

The study was led by radiologists at the University of Pittsburgh Medical Center and published in JACR Oct. 18 [1].

Lead author Emile Gordon, MD, senior author Alessandro Furlan, MD, and colleagues grouped their 22 test questions into six patient-friendly categories—procedure safety, the procedure itself, preparation before imaging, meaning of terms, medical staff and the radiology report.

They fed the questions to ChatGPT version 3.5 with and without a brief prompt: “Please provide an accurate and easy-to-understand response suitable for the average person.”
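
For a concrete sense of the setup, here is a minimal sketch of how such a comparison could be run programmatically. It assumes the OpenAI Python client and the "gpt-3.5-turbo" model as a stand-in for "ChatGPT version 3.5"; the study does not describe its interface in these terms, so treat the details as illustrative only.

```python
# Illustrative sketch only, not the study's actual workflow.
# Assumes the OpenAI Python client (pip install openai), an API key in the
# OPENAI_API_KEY environment variable, and "gpt-3.5-turbo" as a stand-in
# for "ChatGPT version 3.5".
from openai import OpenAI

client = OpenAI()

READABILITY_PROMPT = (
    "Please provide an accurate and easy-to-understand response "
    "suitable for the average person."
)

def ask(question: str, prompted: bool) -> str:
    """Pose one patient question, optionally appending the readability prompt."""
    content = f"{question} {READABILITY_PROMPT}" if prompted else question
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content

question = "What does 'clinical correlation' mean in a radiology report?"
unprompted_answer = ask(question, prompted=False)
prompted_answer = ask(question, prompted=True)
```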

The team had four board-certified radiologists check the machine’s answers for not only accuracy but also consistency and relevance.

They also had two patient advocates weigh in on the likely helpfulness of the responses to patients, and they assessed readability using the Flesch-Kincaid Grade Level (FKGL).
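
For context, FKGL is computed from average sentence length and average syllables per word. The sketch below applies the standard formula with a crude syllable heuristic; the researchers presumably used a dedicated readability tool, so this is an approximation for illustration only.

```python
# Flesch-Kincaid Grade Level:
#   FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
# The vowel-group syllable counter below is a rough heuristic, not the exact
# method used by published readability tools.
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

sample = ("Clinical correlation means your doctor should interpret the imaging "
          "finding together with your symptoms, exam and test results.")
print(round(fkgl(sample), 1))
```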

Assessing 264 ChatGPT answers to the 22 questions, the researchers found unprompted responses were accurate at a rate of 83% (218 of 264 answers).

Prompted responses did not significantly change this rate—87% (229 of 264).

However, the consistency of the responses increased from 72% to 86% when questions were augmented by prompts.  
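
To see why an 83% versus 87% accuracy rate can fail to reach statistical significance at these sample sizes, a standard two-proportion (chi-square) comparison on the reported counts is sketched below; the paper's own statistical methods may differ.

```python
# Illustration only: 218/264 accurate answers unprompted vs. 229/264 prompted.
# With these counts, a 2x2 chi-square test does not reach the conventional
# 0.05 threshold, consistent with the authors' report of no significant change.
from scipy.stats import chi2_contingency

table = [
    [218, 264 - 218],  # unprompted: accurate, not accurate
    [229, 264 - 229],  # prompted:   accurate, not accurate
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```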

In addition:

  • Nearly all responses—99% (261 of 264)—were at least partially relevant for both question types.
  • 80% of prompted responses vs. 67% of unprompted responses were considered fully relevant.
  • The average FKGL was too high for the average reader whether responses were prompted (13.0) or unprompted (13.6). Further, none of the responses met the eighth-grade reading level recommended for patient-facing materials.

In their discussion, the authors state that these findings add to the evidence that ChatGPT is a promising tool for automating time-consuming tasks in healthcare.

“Automating the development of patient health educational materials and providing on-demand access to medical questions holds great promise to improve patient access to health information,” Gordon et al. add.

More:

“While the accuracy, consistency and relevance of the ChatGPT responses to imaging-related questions are impressive for a generative pre-trained language model, they are imperfect; by clinical standards, the frequency of inaccurate statements that we observed precludes its use without careful human supervision or review.”

Dave Pearson

Dave P. has worked in journalism, marketing and public relations for more than 30 years, frequently concentrating on hospitals, healthcare technology and Catholic communications. He has also specialized in fundraising communications, ghostwriting for CEOs of local, national and global charities, nonprofits and foundations.
