Developing board-style radiology questions is resource-intensive. Large language models could help

Radiology educators could soon enjoy the assistive benefits of large language models like ChatGPT, as new research highlights the promise of LLMs to create educational materials and board-style questions. 

Published in Academic Radiology, the study details LLMs' competence in drafting multiple-choice questions, answers and rationales. Crafting these materials typically falls to radiologists, who draw on their own educational and clinical experience, but the process is time-consuming and can incur significant costs, the authors of the new paper noted.

“A robust item bank for a 40-item computerized exam, administered twice a year, with 25 different forms for each administration and a maximum reuse rate of five for any test question over five years, requires at least 2,000 items,” corresponding author Scott J. Adams, MD, PhD, with the department of medical imaging at Royal University Hospital in Canada, and co-authors explained. “The costs associated with developing such exam banks by physicians are substantial, ranging from $1,500 to $2,500 for a single test item. Consequently, the projection stands between $3,000,000 to $5,000,000 to develop an item bank for a single computer adaptive test.”
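The authors' figures line up. As a quick back-of-the-envelope check of that estimate (a minimal sketch; the variable names are illustrative, not from the paper):

```python
# Back-of-the-envelope item-bank math from the quote above.
# All figures come from the authors' estimate; variable names are ours.

items_per_form = 40    # 40-item computerized exam
forms_per_admin = 25   # 25 different forms per administration
admins_per_year = 2    # administered twice a year
years = 5              # planning horizon
max_reuse = 5          # any question used at most 5 times over 5 years

item_slots = items_per_form * forms_per_admin * admins_per_year * years
unique_items = item_slots // max_reuse  # 10,000 slots / 5 reuses = 2,000 items

cost_low, cost_high = 1_500, 2_500      # per-item development cost (USD)
print(f"Unique items needed: {unique_items:,}")
print(f"Bank cost: ${unique_items * cost_low:,} to ${unique_items * cost_high:,}")
# -> Unique items needed: 2,000
# -> Bank cost: $3,000,000 to $5,000,000
```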

The substantial resources needed to compile these materials highlight a need to explore alternatives, and one such alternative could be LLMs.

To determine the potential for LLMs to create radiology education materials, the team tasked two models, GPT-4 and Llama 2, with developing 104 multiple-choice questions based on the American Board of Radiology exam blueprint. The questions produced were assessed by two board-certified radiologists, who rated each for clarity, relevance, suitability for a board exam based on difficulty, quality of distractors, and adequacy of rationale. The questions were then compared against those from American College of Radiology Diagnostic Radiology In-Training (DXIT) exams.

Both models performed well, but GPT-4 scored higher than Llama 2 on every criterion. In fact, GPT-4's scores were neck and neck with those of the ACR DXIT questions, according to the blinded readers. GPT-4 also achieved 100% accuracy with its questions, compared to 69% for Llama 2.

“These findings suggest that GPT-4 holds promise as a valuable tool for enhancing exam preparation materials for radiology residents and expanding question banks for radiology board examinations,” the group wrote. “Further, results from this study suggest that the accessibility and scalability of LLM-generated questions hold the potential to address the perennial issue of limited resources for radiology education.” 

Although the findings indicate promise for LLMs to enhance radiology education, the authors caution that they also highlight variability in model performance. The group suggested this variability should be accounted for, and performance consistently reevaluated, to deploy LLMs effectively in education.

The study abstract can be found in Academic Radiology.


In addition to her background in journalism, Hannah Murphy also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She joined Innovate Healthcare in 2021 and has since put her unique expertise to use in her editorial role with Health Imaging.
