ChatGPT's medical writing is getting so good that it may soon fool AI detectors
Large language model ChatGPT has polished its medical writing skills so well that the papers it authors may soon be able to fool the AI-enabled software intended to detect manuscripts not written by humans.
OpenAI's star LLM has authored numerous medical papers since its rollout nearly two years ago. Although its earliest work showed great promise, experts were quick to call out ChatGPT’s shortcomings. Among them were fictitious references, inaccurate data and “AI hallucinations” deemed misleading and, at times, potentially harmful to patients who might have taken the bot’s medical advice as truth.
But it appears as though the LLM has been fine-tuning its writing chops. A new analysis in the Journal of Nuclear Medicine indicates that the model’s medical manuscripts are so well constructed that it can be difficult to distinguish them from those compiled by humans.
“The level of evidence is consistently increasing; the ability of chatbots to mimic humans is impressive, even in the scientific and medical domains,” wrote co-authors Irène Buvat, with the Laboratory of Translational Imaging in Oncology in France, and Wolfgang A. Weber, from the Technical University of Munich in Germany.
It might also be a bit concerning, the group cautioned.
The duo recently challenged two versions of the LLM—GPT-3.5 and GPT-4—to write, review and revise a medical paper on the threats of having chatbots compile, edit and review scientific works. GPT-3.5 was prompted to write the paper and make it worthy of scientific publication, adhering to all related ethical and legal considerations.
GPT-4 was then prompted to review the paper, highlighting its strengths and weaknesses and providing suggestions to the author of the original text. The next set of prompts challenged GPT-4 to reply to the “reviewer” and revise the article to address the reviewer’s comments.
Weber and Buvat described GPT’s work, which was compiled in just a matter of seconds, as “comprehensive and synthetic.” Though it was thorough, it did not provide citations for the “facts” it detailed, and the writing was somewhat “bland and robotic.”
The bot’s review and response, however, were much more impressive.
“The review of the manuscript written by the chatbot is surprisingly relevant and mimics very well a human-written manuscript review,” the team noted. “The fact that it is the work of a chatbot could almost go unnoticed.”
Its responses to the review and subsequent updates to the manuscript were “even more amazing.” So impressive, in fact, that the group suggested that the work may even be able to evade not just humans, but AI content detector tools as well.
“This almost questions the relevance of publishing review papers that can be produced almost instantly by a chatbot, with which it is even possible to engage in conversation,” the authors wrote.
ChatGPT is improving at a rapid pace, but it still commits some of the same faults it did in its early days, such as omitting concrete references, hallucinating content and producing generally bland writing, the group noted. They suggested that it could serve as a reliable “junior ghost writer,” but that it will still require substantial oversight on any scientific written work.