Latest versions of ChatGPT show potential as clinical decision support tools
The latest versions of OpenAI's ChatGPT, GPT-3.5 and GPT-4, show potential as clinical decision support tools for triaging patients referred for imaging services.
A new paper in the Journal of the American College of Radiology details how these large language models can refer patients for imaging exams based on various clinical needs. Researchers tested the LLMs on scenarios in which women present to discuss breast cancer screening or with concerns about breast pain, to see what imaging the chatbot would recommend. Compared against the ACR Appropriateness Criteria for these indications, the LLMs performed well in both open-ended (OE) and select-all-that-apply (SATA) formats, prompting the study's authors to suggest that these tools could become clinically feasible in the future, given additional updates and further training.
“Integration of an AI-based tool into existing clinical workflows and systems could drastically improve efficiency, since such tools could take advantage of the wealth of information available from patient pretest odds, diagnostic likelihood ratios, and the medical records themselves,” corresponding author Marc D. Succi, MD, with the Department of Radiology at Massachusetts General Hospital in Boston, and colleagues suggested.
In the team's analysis, both GPT-3.5 and GPT-4 performed well in the OE format, each achieving an average score of 1.83 out of 2 on the breast cancer screening prompts. In the SATA format, GPT-3.5 averaged 88.9% correct, while GPT-4 averaged 98.4%. Both LLMs showed a slight decrease in performance on the breast pain prompts, but the newer model, GPT-4, continued to outperform GPT-3.5.
The authors concluded that their results support LLMs' "promise for future use as an adjunct for radiologic decision-making at the point of care." In the future, accuracy could be improved with prompts engineered in a hybrid format that offers both a list of options for ChatGPT to choose from and a request for ChatGPT to explain its choices, the group suggested.