ChatGPT's medical writing is getting so good that it may soon fool AI detectors
Large language model ChatGPT has polished its medical writing skills so well that the papers it authors may soon be able to fool the AI-enabled software intended to detect manuscripts not written by humans.
OpenAI's star LLM has authored numerous medical papers since its rollout nearly two years ago. Although its earliest work showed great promise, experts were quick to call out ChatGPT’s shortcomings. Among them were fictitious references, inaccurate data and “AI hallucinations” deemed misleading and, at times, potentially harmful to patients who might have taken the bot’s medical advice as truth.
But it appears as though the LLM has been fine-tuning its writing chops. A new analysis in the Journal of Nuclear Medicine indicates that the model’s medical manuscripts are so well constructed that it can be difficult to distinguish them from those compiled by humans.
“The level of evidence is consistently increasing; the ability of chatbots to mimic humans is impressive, even in the scientific and medical domains,” wrote co-authors Irène Buvat, with the Laboratory of Translational Imaging in Oncology in France, and Wolfgang A. Weber, from the Technical University of Munich in Germany.
It might also be a bit concerning, the group cautioned.
The duo recently challenged two versions of the LLM—GPT-3.5 and GPT-4—to write, review and revise a medical paper on the threats of having chatbots compile, edit and review scientific works. GPT-3.5 was prompted to write the paper and make it worthy of scientific publication, adhering to all related ethical and legal considerations.
GPT-4 was then prompted to review the paper, highlighting its strengths and weaknesses and providing suggestions to the author of the original text. The next set of prompts challenged GPT-4 to reply to the “reviewer” and revise the article to address the reviewer’s comments.
Weber and Buvat described GPT’s work, which was compiled in just a matter of seconds, as “comprehensive and synthetic.” Though it was thorough, it did not provide citations for the “facts” it detailed, and the writing was somewhat “bland and robotic.”
The bot’s review and response, however, were much more impressive.
“The review of the manuscript written by the chatbot is surprisingly relevant and mimics very well a human-written manuscript review,” the team noted. “The fact that it is the work of a chatbot could almost go unnoticed.”
Its responses to the review and subsequent updates to the manuscript were “even more amazing.” So impressive, in fact, that the group suggested that the work may even be able to evade not just humans, but AI content detector tools as well.
“This almost questions the relevance of publishing review papers that can be produced almost instantly by a chatbot, with which it is even possible to engage in conversation,” the authors wrote.
ChatGPT is improving at a rapid pace, but it still commits some of the same faults it did in its early days, such as omitting concrete references, hallucinating content and producing generally bland writing, the group noted. They suggested that it could serve as a reliable “junior ghost writer,” but that it will still require substantial oversight on any scientific written work.