Meta's new large language model excels at board-style radiology prompts

Meta Llama 3, an open-source large language model, may soon be giving other LLMs a run for their money in the medical field, according to new data published in the journal Radiology.

The LLM recently performed on par with larger proprietary models, like GPT-4, on a set of board-style radiology questions. Though proprietary models have shown great promise within radiology, they require that data be sent outside of hospital settings, which raises privacy concerns. That, combined with their sometimes inconsistent performance after updates, limits users’ trust in the models’ reliability.

Llama 3’s performance highlights the potential for open-source models to address some of these limitations, the authors of the new paper suggest.

“The development of open-source LLMs offers a solution that allows for local operation within hospitals, improving privacy and stability when the LLM system is not maintained and manually updated by staff,” corresponding author Lisa C. Adams, from the Department of Diagnostic and Interventional Radiology at the Technical University of Munich, and colleagues note.

Proprietary models have typically outperformed open-source LLMs, but Meta AI’s latest model has 70 billion parameters, positioning it well to compete with more established models. To test this theory, experts prompted Meta Llama 3 and several other LLMs (OpenAI’s GPT-4 Turbo and GPT-3.5 Turbo, Anthropic’s Claude 3 Opus and Google DeepMind’s Gemini Ultra) to answer 50 questions from the American College of Radiology’s 2022 in-training test, in addition to 85 new board-style questions not previously used to train other LLMs.

The models’ performances varied widely. For the 50 ACR test questions, Llama 3 performed the best out of all the open-source models, with 74% accuracy. Its performance was in line with both GPT-4 Turbo and Claude 3 Opus, both of which achieved 78% accuracy. 

However, Llama 3 outperformed these models on the additional board-style questions, answering 68 out of 85 correctly. In comparison, GPT-3.5 achieved 61 correct answers. 

“This demonstrates the growing capabilities of open-source LLMs, which offer privacy, customization, and reliability comparable to that of their proprietary counterparts, but with far fewer parameters, potentially lowering operating costs when using optimization techniques such as quantization,” the group suggests, adding that planned expansions of Llama 3 in the near future are encouraging. 

“The growing maturity and competitiveness of open-source models make them promising candidates for future research and application in radiology.” 

The study abstract is available here.

Hannah Murphy

In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She began covering the medical imaging industry for Innovate Healthcare in 2021.
