ChatGPT 'mostly' accurate when answering questions on breast biopsies, with 1 major exception

Once again, researchers are looking at how reliable ChatGPT is at dishing out medical advice. A team from Johns Hopkins University conducted a new study that found the popular AI chatbot provided “mostly appropriate” responses to questions about core-needle breast biopsy results. The findings are published in the American Journal of Roentgenology. [1]

The team, led by Eniola Oluyemi, MD, of Johns Hopkins, performed the study using the latest consumer release of ChatGPT (version 3.5), with accuracy ratings determined by three reviewers who are all qualified to make a professional diagnosis. The researchers say that, while the results are imperfect, they are accurate enough to suggest ChatGPT may be an adequate way for women to get pressing questions answered while they wait to speak with their healthcare provider.

ChatGPT was asked for details about 14 different findings typically seen on a core-needle breast biopsy: four benign, three malignant, and seven usually associated with a high risk of future cancer. The reviewers independently submitted ratings measuring their agreement with the responses provided by the AI, considering factors such as accuracy, consistency and the clinical significance of the information.

In general, the reviewers agreed with ChatGPT’s responses, with all three reporting agreement in the 88% to 96% range for most answers, signaling that the AI is a fairly reliable source for understanding the pathological diagnosis of a breast biopsy.

The outlier was ChatGPT’s answers to questions about management recommendations for two lesions at high risk for cancer, where the chatbot failed to mention that surgical excision may be the best course of action. All three reviewers disagreed with the AI on that basis.

For Oluyemi and her colleagues, this signals the need for patients to “subsequently crosscheck information received” from any large language model or chatbot with their healthcare provider, especially for responses “regarding management recommendations,” since ChatGPT may not even have access to the latest clinical care guidelines.

The full study can be found at the link below. 

Chad Van Alstin, Health Imaging | Health Exec

Chad is an award-winning writer and editor with over 15 years of experience working in media. He has a decade-long professional background in healthcare, working as a writer and in public relations.
