New scoring systems help grade the accuracy of AI-generated radiology reports

Michael Walter | August 04, 2023 | Health Imaging | Enterprise Imaging


Artificial intelligence (AI) algorithms are being used more and more to generate radiology reports, but how can a radiologist trust that the results are accurate and complete?

An international team of researchers—including specialists from Harvard Medical School (HMS) and Stanford University—explored that very question, presenting its findings in Patterns.[1]

“Accurately evaluating AI systems is the critical first step toward generating radiology reports that are clinically useful and trustworthy,” senior author Pranav Rajpurkar, PhD, assistant professor of biomedical informatics with the Blavatnik Institute at HMS, said in a prepared statement.

Rajpurkar et al. started by evaluating existing metrics designed to grade AI-generated radiology reports. The group found that these automated scoring systems sometimes struggled, in some cases missing key errors in the reports.

As one might expect, a team of six human radiologists did much better when asked to grade the same AI-generated reports. Rajpurkar and his team set out to design scoring systems whose performance would be comparable to that of radiologists. The group developed two new metrics: RadGraph F1, a method for evaluating the algorithms that generate radiology reports, and RadCliQ, a scoring system that combines multiple quality metrics into a single grade. Both metrics, the authors wrote, “demonstrate stronger correlation with radiologists' evaluations” than previous scoring systems. They also described RadGraph F1 and RadCliQ as “meaningful metrics for guiding future research in radiology report generation.”
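The article describes the two metrics only at a high level. As we understand the published work, RadGraph F1 compares the clinical entities and relations extracted from a generated report with those extracted from a reference report, and RadCliQ merges several metric values into one score. The Python sketch below illustrates only those two general ideas under stated assumptions: entity extraction is assumed to have already happened, the `(text, label)` annotation format and the function names `set_f1` and `combined_score` are hypothetical, and the weights are placeholders, not RadCliQ's actual coefficients.

```python
from typing import Mapping, Set, Tuple

# Hypothetical annotation format: (text, label) pairs assumed to come from
# some upstream clinical entity extractor, which is not shown here.
Annotation = Tuple[str, str]


def set_f1(generated: Set[Annotation], reference: Set[Annotation]) -> float:
    """F1 between two annotation sets: harmonic mean of precision and recall."""
    if not generated and not reference:
        return 1.0  # both empty: treated as perfect agreement (an assumption)
    overlap = len(generated & reference)
    if overlap == 0:
        return 0.0  # also avoids dividing by an empty set's size
    precision = overlap / len(generated)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def combined_score(
    metrics: Mapping[str, float],
    weights: Mapping[str, float],
    intercept: float = 0.0,
) -> float:
    """Weighted linear combination of metric values into a single grade.

    Raises KeyError if a weighted metric is missing from `metrics`.
    """
    return intercept + sum(weights[name] * metrics[name] for name in weights)


# Illustrative annotations for a generated and a reference report.
gen = {("effusion", "observation"), ("pleural", "anatomy"), ("cardiomegaly", "observation")}
ref = {("effusion", "observation"), ("pleural", "anatomy"), ("pneumothorax", "observation")}

# 2 shared annotations: precision 2/3, recall 2/3, so F1 = 2/3.
print(round(set_f1(gen, ref), 3))  # prints 0.667

# Placeholder metric values and weights, not the published RadCliQ model.
print(combined_score({"bleu": 0.25, "radgraph_f1": 0.5}, {"bleu": 1.0, "radgraph_f1": 2.0}))  # prints 1.25
```

A weighted linear combination is the simplest way to merge several metrics into one grade; in practice the weights would presumably be fitted against radiologist judgments rather than chosen by hand.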

“Measuring progress is imperative for advancing AI in medicine to the next level,” co-first author Feiyang ‘Kathy’ Yu, a research associate in Rajpurkar’s lab, said in the same statement. “Our quantitative analysis moves us closer to AI that augments radiologists to provide better patient care.”

“By aligning better with radiologists, our new metrics will accelerate development of AI that integrates seamlessly into the clinical workflow to improve patient care,” Rajpurkar added.

The team’s full analysis is published in Patterns.


Michael Walter, Managing Editor

Michael has more than 18 years of experience as a professional writer and editor. He has written at length about cardiology, radiology, artificial intelligence and other key healthcare topics.
