Providing chatbots with guideline context significantly improves their imaging recommendations
With proper prompts and appropriate context, chatbots can play a valuable role in imaging referrals.
Using specialized information on the American College of Radiology’s imaging guidelines, experts were recently able to augment OpenAI’s GPT-4 with guideline context so that it provided exam recommendations in line with those delivered by human medical professionals. This big step toward greater reliability was achieved, in part, through zero-shot learning, authors of a new paper describing the chatbot’s enhancements explained.
“Zero-shot learning refers to the ability of a model to make accurate predictions on tasks it has not been explicitly trained on by leveraging generalized knowledge and integrated textual information during prompting. This approach is also known as retrieval-augmented generation,” corresponding author Alexander Rau, with the Department of Diagnostic and Interventional Radiology at the University of Freiburg in Germany, and co-authors noted. “This appropriateness criteria context-aware GPT (accGPT) surpassed generic chatbots and general radiologists in applying the ACR appropriateness criteria to clinical referral notes.”
For the study, the team supplied GPT-3.5-Turbo with specialized knowledge of the ACR guidelines. Researchers then moved the chatbot to GPT-4 and developed an enhanced prompting strategy to test the LLM’s ability to apply the ACR appropriateness criteria to imaging referrals based on clinical notes.
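The retrieval-augmented approach described above can be sketched in a few lines: pull the guideline passages most relevant to a referral note, then prepend them to the prompt so the model answers zero-shot from guideline text. The guideline entries, scoring method, and function names below are illustrative placeholders, not the study's actual data or code.

```python
# A minimal sketch of context-aware ("accGPT"-style) prompting.
# Toy stand-ins for ACR appropriateness-criteria entries (hypothetical).
GUIDELINES = {
    "acute head trauma": "CT head without contrast: usually appropriate.",
    "chronic low back pain": "MRI lumbar spine without contrast: may be appropriate.",
    "uncomplicated headache": "Imaging: usually not appropriate.",
}

def retrieve(note: str, k: int = 2) -> list[str]:
    """Rank guideline entries by simple word overlap with the referral note."""
    words = set(note.lower().split())
    scored = sorted(
        GUIDELINES.items(),
        key=lambda item: len(words & set(item[0].split())),
        reverse=True,
    )
    return [f"{condition}: {rec}" for condition, rec in scored[:k]]

def build_prompt(note: str) -> str:
    """Assemble a retrieval-augmented prompt for a chat model."""
    context = "\n".join(retrieve(note))
    return (
        "Using only the ACR appropriateness criteria below, recommend "
        "the most appropriate imaging exam (or none).\n\n"
        f"Criteria:\n{context}\n\nReferral note: {note}"
    )

prompt = build_prompt("Patient presents with acute head trauma after a fall.")
print(prompt)
```

A production system would replace the word-overlap scoring with semantic retrieval (embeddings) over the full ACR criteria and send the assembled prompt to the chat model; the key idea is that the guideline text travels inside the prompt rather than being trained into the model.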
The context-aware chatbot outperformed the generic versions of GPT-3.5-Turbo and GPT-4 in providing “usually or may be appropriate” recommendations based on ACR’s guidelines. Its performance was also superior to that of human radiologists, the group noted.
What's more, it surpassed GPT-3.5-Turbo and general radiologists in providing “usually appropriate” recommendations, and appropriately identified cases when no imaging was needed—something that requires “a profound understanding of clinical contexts and guidelines,” and that was a difficult task for the other chatbots.
Its recommendations were consistently accurate, indicating potential for future use in imaging referral guidance, the group suggested.
“Higher consistency is crucial for clinical decision-making as it ensures reliability and reduces variability in diagnostic recommendations. Future research might investigate the performance of other LLM in this regard, as initial results in radiological decision support are promising,” the authors wrote.
The group suggested that the contextual adjustments, alongside the links and references the chatbot provided with its answers, could improve trust in its outputs. This could also allow for more individualized diagnostic workups and greater reliability in chatbots' recommendations, the team indicated.