ChatGPT both passes and fails at translating free-text into structured reports

Hannah Murphy | April 17, 2024 | Health Imaging | Enterprise Imaging

Various versions of OpenAI’s large language model ChatGPT continue to inch closer to clinical usefulness in radiology, now showing potential for generating structured reports from free-text dictations.

A new paper in the European Journal of Radiology details the utility of ChatGPT’s latest models, GPT-3.5 and GPT-4, for translating free-text thyroid ultrasound reports into structured reports based on ACR TI-RADS guidelines. Although the results varied between the two models, with each outperforming the other in different aspects of reporting, the authors posit that similar large language models retain clinical potential, especially for generating structured reports.

“The importance of structured radiology reports has been fully recognized, as they facilitate efficient data extraction and promote collaboration among healthcare professionals,” corresponding author JianQiao Zhou, with the Department of Ultrasound at Ruijin Hospital in Shanghai, China, and co-authors explained. “Despite numerous potential benefits and technical feasibility, the adoption of structured reporting in some countries has remained lukewarm to date. This could be attributed to the presence of many challenges in the process of structuring free-text.”

For the research, the authors compiled 136 free-text thyroid ultrasound reports from 136 patients. In total, 184 nodules were listed in the original reports. The team tasked GPT-3.5 and GPT-4 with generating structured reports from the original versions based on ACR’s TI-RADS guidelines, and had two radiologists review the reports for quality, nodule categorization accuracy and management recommendations.

GPT-3.5 outperformed GPT-4 at creating satisfactory structured reports, generating 202 compared to GPT-4’s 69. However, GPT-4 achieved “superior accuracy,” significantly outperforming GPT-3.5 in categorizing thyroid nodules and in providing more detailed management recommendations. Both are integral aspects of TI-RADS.
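For readers unfamiliar with how TI-RADS turns ultrasound findings into a category and a management recommendation, the Python sketch below illustrates the point-based logic a structured report has to get right. It is not the study's method or prompt. The `Nodule` class, the function names and the feature labels are hypothetical. The point values and size cutoffs follow the published ACR TI-RADS chart as commonly summarized and should be checked against the ACR's own materials before any real use. Treating a 1-point nodule as TR1 is an assumption of this sketch.

```python
from dataclasses import dataclass, field

# Points per ultrasound feature, following the ACR TI-RADS chart
# (verify against the official ACR chart; labels here are illustrative).
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very hypoechoic": 3}
SHAPE = {"wider-than-tall": 0, "taller-than-wide": 3}
MARGIN = {"smooth": 0, "ill-defined": 0, "lobulated": 2, "irregular": 2,
          "extrathyroidal extension": 3}
FOCI = {"none": 0, "large comet-tail": 0, "macrocalcifications": 1,
        "peripheral": 2, "punctate": 3}

# Size cutoffs in cm per TI-RADS level: (FNA at or above, follow-up at or above).
THRESHOLDS_CM = {3: (2.5, 1.5), 4: (1.5, 1.0), 5: (1.0, 0.5)}


@dataclass
class Nodule:
    """Hypothetical structured record for one thyroid nodule."""
    composition: str
    echogenicity: str
    shape: str
    margin: str
    echogenic_foci: list = field(default_factory=list)  # additive category
    size_cm: float = 0.0


def tirads_points(n: Nodule) -> int:
    # Spongiform nodules receive 0 points with no further points added.
    if n.composition == "spongiform":
        return 0
    return (COMPOSITION[n.composition] + ECHOGENICITY[n.echogenicity]
            + SHAPE[n.shape] + MARGIN[n.margin]
            + sum(FOCI[f] for f in n.echogenic_foci))


def tirads_level(points: int) -> int:
    if points <= 1:   # assumption: a 1-point total is treated as TR1
        return 1
    if points <= 3:   # 2 points -> TR2, 3 points -> TR3
        return points
    return 4 if points <= 6 else 5


def recommendation(level: int, size_cm: float) -> str:
    if level not in THRESHOLDS_CM:  # TR1 and TR2
        return "no FNA or follow-up"
    fna_cm, follow_cm = THRESHOLDS_CM[level]
    if size_cm >= fna_cm:
        return "FNA"
    if size_cm >= follow_cm:
        return "follow-up"
    return "no FNA or follow-up"


nodule = Nodule("solid", "hypoechoic", "wider-than-tall", "smooth",
                ["punctate"], size_cm=1.2)
points = tirads_points(nodule)   # 2 + 2 + 0 + 0 + 3 = 7
level = tirads_level(points)     # TR5
print(points, f"TR{level}", recommendation(level, nodule.size_cm))
```

A language model asked to restructure a free-text report has to extract each of these features correctly before the arithmetic can work, which is one reason categorization accuracy and report quality can diverge between models.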

The authors suggested that the differing performances could be attributed to the training parameters used to develop each model. GPT-4’s parameters are more complex, while GPT-3.5 has fewer specifications, potentially making it more adaptive.

To achieve clinical utility, both versions would need to be retrained, but the authors maintain that improving the large language models could pave the way for their use as a support tool for radiologists.

GPT-4 spots radiology report errors in less time, at a lower cost

ChatGPT a ‘substantial first step’ toward AI-drafted radiologist reports

ChatGPT offers ‘pretty amazing’ recommendations on breast cancer screening, but oversight remains critical

More experts weigh in on the use of ChatGPT in radiology: Ethical use is ‘imperative’

Hannah Murphy

In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She began covering the medical imaging industry for Innovate Healthcare in 2021.
