Meta's new large language model excels at board-style radiology prompts
Meta Llama 3—an open-source large language model—may soon be giving other LLMs a run for their money in the medical field, according to new data published in the journal Radiology.
The LLM recently performed on par with larger proprietary models, such as GPT-4, on a set of board-style radiology questions. Proprietary models have shown great promise in radiology, but they require that data be sent outside the hospital, which raises privacy concerns. That, combined with their sometimes inconsistent performance after updates, limits users’ trust in their reliability.
Llama 3’s performance highlights the potential of open-source models to address some of these limitations, the authors of the new paper suggest.
“The development of open-source LLMs offers a solution that allows for local operation within hospitals, improving privacy and stability when the LLM system is not maintained and manually updated by staff,” corresponding author Lisa C. Adams, from the Department of Diagnostic and Interventional Radiology at Technical University Munich, and colleagues note.
Proprietary models have typically outperformed open-source LLMs, but Meta AI’s latest model has 70 billion parameters, positioning it to compete with more established offerings. To test this, the researchers prompted Meta Llama 3 and several other LLMs (OpenAI’s GPT-4 Turbo and GPT-3.5 Turbo, Anthropic’s Claude 3 Opus and Google DeepMind’s Gemini Ultra) to answer 50 questions from the American College of Radiology’s 2022 in-training examination, along with 85 new board-style questions that had not previously been used to train any of the models.
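For readers curious how such an evaluation is typically scored, a minimal sketch might look like the following. The paper does not publish its evaluation harness, so this is purely illustrative; `ask_model` stands in for whatever call (a cloud API or a locally hosted Llama 3 endpoint) returns the model’s chosen answer letter.

```python
# Hypothetical sketch of scoring an LLM on board-style multiple-choice questions.
# `ask_model` is a stand-in for the actual model call used in the study.

from typing import Callable

def score_exam(questions: list[dict], ask_model: Callable[[str], str]) -> float:
    """Return the fraction of multiple-choice questions answered correctly."""
    correct = 0
    for q in questions:
        # Present the stem and lettered options the way a board exam would.
        prompt = q["stem"] + "\n" + "\n".join(
            f"{letter}. {text}" for letter, text in q["options"].items()
        ) + "\nAnswer with a single letter."
        reply = ask_model(prompt).strip().upper()
        if reply.startswith(q["answer"]):
            correct += 1
    return correct / len(questions)

# Illustrative usage with a trivial stand-in model that always answers "A".
demo_questions = [
    {"stem": "Which modality uses ionizing radiation?",
     "options": {"A": "CT", "B": "MRI", "C": "Ultrasound"},
     "answer": "A"},
]
print(score_exam(demo_questions, lambda prompt: "A"))  # prints 1.0
```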
The models’ performances varied widely. On the 50 ACR test questions, Llama 3 was the best-performing open-source model, with 74% accuracy, putting it in line with GPT-4 Turbo and Claude 3 Opus, which each achieved 78%.
On the additional board-style questions, however, Llama 3 answered 68 of 85 (80%) correctly, outperforming GPT-3.5 Turbo, which answered 61 (roughly 72%) correctly.
“This demonstrates the growing capabilities of open-source LLMs, which offer privacy, customization, and reliability comparable to that of their proprietary counterparts, but with far fewer parameters, potentially lowering operating costs when using optimization techniques such as quantization,” the group suggests, adding that planned expansions of Llama 3 in the near future are encouraging.
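The quantization the authors mention is what makes local, in-hospital hosting of a 70-billion-parameter model practical. As a hedged sketch, one common approach is 4-bit loading through Hugging Face Transformers with bitsandbytes; the paper does not specify its deployment stack, so the checkpoint name and settings below are illustrative assumptions rather than the study’s setup.

```python
# Hedged sketch: running a quantized Llama 3 locally via 4-bit loading.
# The model ID and settings are assumptions for illustration only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed checkpoint name

# NF4 4-bit quantization cuts the 70B model's memory footprint to roughly a
# quarter of its 16-bit size, which is what makes on-premises hosting plausible.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available local GPUs
)

prompt = "A board-style radiology question would go here."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because everything above runs on local hardware, no patient-adjacent data ever leaves the hospital network, which is the privacy advantage the authors emphasize.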
“The growing maturity and competitiveness of open-source models make them promising candidates for future research and application in radiology.”
The study abstract is available here.