ChatGPT 'mostly' accurate when answering questions on breast biopsies, with 1 major exception

Once again, researchers are looking at how reliable ChatGPT is at dishing out medical advice. A team from Johns Hopkins University conducted a new study that found the popular AI chatbot provided “mostly appropriate” responses to questions related to core-needle biopsy results. The findings are published in the American Journal of Roentgenology. [1]

The team, led by Eniola Oluyemi, MD, of Johns Hopkins, performed the study using the latest consumer release of ChatGPT (version 3.5), with accuracy ratings determined by three reviewers, all qualified to make a professional diagnosis. The researchers say that, while the results are imperfect, they are accurate enough to suggest ChatGPT may be an adequate way for women to get pressing questions answered while they wait to speak with their healthcare provider.

ChatGPT was asked questions about 14 different findings typically seen on a core-needle breast biopsy—four benign, three malignant, and seven usually associated with a high risk of future cancer. The reviewers submitted ratings independently, measuring their agreement with the responses provided by the AI. Factors such as accuracy, consistency, and the clinical significance of the information were all considered.

In general, the reviewers agreed with ChatGPT’s responses, with all three rating their agreement within a range of 88% to 96% for most answers, signaling that the AI is a fairly reliable source for understanding the pathological diagnosis of a breast biopsy. 

The outlier was ChatGPT’s answers regarding recommendations for two high-risk lesions, where the chatbot failed to mention that surgical excision may be the best course of action. All three reviewers disagreed with the AI on that basis.

For Oluyemi and her colleagues, this signals the need for patients to “subsequently crosscheck information received” from any large language model or chatbot with their healthcare provider, especially responses “regarding management recommendations,” since ChatGPT may not have access to the latest clinical care guidelines.


Chad Van Alstin, Health Imaging

Chad is an award-winning writer and editor with over 15 years of experience working in media. He has a decade-long professional background in healthcare, working as a writer and in public relations.
