ChatGPT's medical writing is getting so good that it may soon fool AI detectors

Hannah Murphy | August 22, 2024 | Health Imaging | Artificial Intelligence


National Cancer Institute photo via Unsplash

Large language model ChatGPT has polished its medical writing skills so well that the papers it authors may soon be able to fool the AI-enabled software intended to detect manuscripts not written by humans.

OpenAI's star LLM has authored numerous medical papers since its rollout nearly two years ago. Although its earliest work showed great promise, experts were quick to call out ChatGPT’s shortcomings. Among them were fictitious references, inaccurate data and “AI hallucinations” deemed misleading and, at times, potentially harmful to patients who might have taken the bot’s medical advice as truth.

But it appears as though the LLM has been fine-tuning its writing chops. A new analysis in the Journal of Nuclear Medicine indicates that the model’s medical manuscripts are so well constructed that it can be difficult to distinguish them from those compiled by humans.

“The level of evidence is consistently increasing; the ability of chatbots to mimic humans is impressive, even in the scientific and medical domains,” wrote co-authors Irène Buvat, with the Laboratory of Translational Imaging in Oncology in France, and Wolfgang A. Weber, from the Technical University of Munich in Germany.

It might also be a bit concerning, the group cautioned.

The duo recently challenged two versions of the LLM, GPT-3.5 and GPT-4, to write, review and revise a medical paper on the threats of having chatbots compile, edit and review scientific works. GPT-3.5 was prompted to write the paper and make it worthy of scientific publication, adhering to all related ethical and legal considerations.

GPT-4 was then prompted to review the paper, highlighting its strengths and weaknesses and providing suggestions to the author of the original text. The next set of prompts challenged GPT-4 to reply to the “reviewer” and revise the article to address the reviewer’s comments.

Buvat and Weber described GPT-3.5’s draft, which it produced in a matter of seconds, as “comprehensive and synthetic.” Though thorough, it did not provide citations for the “facts” it detailed, and the writing was somewhat “bland and robotic.”

The bot’s review and response, however, were much more impressive.

“The review of the manuscript written by the chatbot is surprisingly relevant and mimics very well a human-written manuscript review,” the team noted. “The fact that it is the work of a chatbot could almost go unnoticed.”

Its responses to the review and subsequent updates to the manuscript were “even more amazing.” They were so impressive, in fact, that the authors suggested the work might evade not only human readers but AI content detection tools as well.

“This almost questions the relevance of publishing review papers that can be produced almost instantly by a chatbot, with which it is even possible to engage in conversation,” the authors wrote.

ChatGPT is improving at a rapid pace, but it still commits some of the same faults it did in its early days, such as omitting concrete references, producing AI hallucinations and writing generally bland prose, the group noted. They suggested that it could serve as a reliable “junior ghost writer,” but that it would still require substantial oversight in any scientific writing.


Hannah Murphy

In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She began covering the medical imaging industry for Innovate Healthcare in 2021.
