5 steps for evaluating radiology AI applications
Radiology artificial intelligence (AI) continues to gain traction in medical imaging, promising enhanced detection and improved workflow efficiency. However, selecting the right AI tool requires careful evaluation. Jason Poff, MD, a radiologist at Greensboro Radiology and the director of innovation deployment for AI at Radiology Partners, outlined a five-step AI model validation process designed to assess medical imaging tools effectively.
"If we want our radiologists to be excited about an AI tool and to engage with it, we ought to measure things that are relevant to them. And so the real purpose of this five-step process was to look at these AI models through the lens of a radiologist, because value is in their eyes. We focus on trying to generate some metrics that might predict whether they'll like an AI tool, whether they'll use it and whether they'll get the most value out of it so that patients benefit, so health systems actually get value out of these tools," Poff explained.
He said practices and hospitals that skip this type of critical AI evaluation often find that radiologists do not use the products once they are implemented.
Poff spoke with Health Imaging at the Radiological Society of North America (RSNA) 2024.
The five-step AI model validation process
Poff’s approach focuses on ensuring that AI models provide meaningful value to radiologists while minimizing drawbacks. The five key steps in his evaluation process are:
1. Performance statistics: This step involves assessing the accuracy and efficiency of the AI tool based on standard statistical metrics, ensuring that it meets baseline expectations for clinical use.
2. AI-enhanced detection rate: A crucial metric, this determines whether the AI tool helps radiologists detect findings that they might have otherwise missed, thereby improving diagnostic performance.
3. Wow cases: These are instances where the AI tool significantly enhances diagnostic capabilities, providing clear and impactful examples of its effectiveness.
4. AI pitfalls: Understanding the limitations and potential weaknesses of the AI model helps radiologists navigate its shortcomings and avoid misdiagnoses.
5. Gain-to-pain ratio: The final step evaluates whether the benefits of the AI tool outweigh the challenges of its implementation, such as workflow disruptions or false positives.
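To make these metrics concrete, below is a minimal Python sketch of how a practice might score a candidate model against a retrospective case set. The `Case` fields, the adjudicated `truth` labels and the gain-to-pain formula are illustrative assumptions for this article, not Poff's actual methodology.

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One retrospective study (hypothetical fields for illustration)."""
    ai_flagged: bool          # did the AI flag the finding?
    radiologist_found: bool   # did the original report note it?
    truth: bool               # adjudicated ground truth

def evaluate(cases: list[Case]) -> dict:
    tp = sum(c.ai_flagged and c.truth for c in cases)
    fp = sum(c.ai_flagged and not c.truth for c in cases)
    fn = sum(not c.ai_flagged and c.truth for c in cases)
    tn = sum(not c.ai_flagged and not c.truth for c in cases)

    # Step 1: standard performance statistics
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # Step 2: AI-enhanced detection rate, i.e. true findings the AI
    # caught that the original report missed
    enhanced = sum(c.ai_flagged and c.truth and not c.radiologist_found
                   for c in cases)
    enhanced_rate = enhanced / len(cases) if cases else 0.0

    # Step 5: a crude gain-to-pain ratio, here added detections ("gain")
    # over false positives radiologists must dismiss ("pain")
    gain_to_pain = enhanced / fp if fp else float("inf")

    return {"sensitivity": sensitivity, "specificity": specificity,
            "enhanced_detection_rate": enhanced_rate,
            "gain_to_pain": gain_to_pain}
```

Steps 3 and 4 are qualitative: the discordant cases an analysis like this surfaces become the "wow cases" and pitfall examples that radiologists review.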
Implementing AI in practice
Poff emphasized the importance of investing time and expertise in AI evaluation before deployment. At Radiology Partners, a team of data scientists, IT experts and physician leaders collaborates to conduct retrospective analyses of AI models. By simulating how AI tools would have performed in past cases, the team predicts their real-world impact on patient care and radiologist efficiency.
"Basically we do all the work upfront. We actually spend a lot of time and effort before we ever put these tools into production, before they ever touch a patient," Poff explained.
He said the team performs what it calls a "retrospective look back," running the AI on patient cases from a few months earlier and considering what might have happened if the radiologist had had access to the new tool at the time. These tests, he said, build a deeper understanding of how the AI works and validate its accuracy and its ability to help the radiologist.
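A rough sketch of what such a look-back loop might resemble is below. The `run_model` and `parse_report` helpers are hypothetical stand-ins for vendor- and practice-specific code, not a real PACS or vendor API.

```python
def look_back(studies, run_model, parse_report):
    """Replay archived studies through a candidate AI model and collect
    disagreements with the original signed reports for physician review.
    All inputs are hypothetical stand-ins for illustration."""
    discordant = []
    for study in studies:
        ai_positive = run_model(study.pixels)        # candidate model's call
        reported = parse_report(study.report_text)   # did the report note it?
        if ai_positive != reported:
            discordant.append(study)                 # queue for adjudication
    return discordant
```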
"We like to measure what the potential upside of how much we could elevate their standard of care if they had had these AI tools," he explained.
Not all AI models make the cut
Despite FDA clearance being a prerequisite for consideration, not all AI models meet the threshold for deployment. Poff shared an example of a pneumothorax detection AI that, while capable of identifying collapsed lungs, failed to add value because radiologists were already catching those cases on their own. In another example, an algorithm that is wrong more often than it is right quickly builds distrust, and radiologists will not use it.
"Ultimately, it comes down to a human radiologist sitting at the workstation saying, 'I feel like I'm getting value out of this tool.' If they don't feel that way, they're very quick to put it to the side and ignore it," Poff said.
If an AI enhances detection rates, that is a concrete metric suggesting it can elevate patient care. When Poff does not see that improvement, it gives the team pause before investing time and money in the tool. Still, he said, there are other ways an AI tool can deliver value.
"It can help appropriately bring some patients to the top of the worklist, especially these days when we've got a lot of patients needing care. It can be valuable for that triage benefit alone," Poff said.
Tailoring AI to specific practice needs
Poff highlighted that AI’s value varies based on the nature of a radiology practice. Hospitals with emergency departments may prioritize AI tools for stroke detection, while outpatient practices might find greater value in tools designed for chronic disease monitoring.
“There’s no one-size-fits-all AI solution,” he said. “Even a basic evaluation can help determine whether an AI model aligns with the specific needs of a radiology group.”
For smaller radiology practices lacking dedicated AI evaluation teams, Poff recommends appointing an AI champion, someone responsible for understanding AI tools and their potential impact. This may require dedicating time for that person to become an AI expert and research the tools relevant to the practice. He also suggests leveraging commercial consulting firms that specialize in AI assessment.