Concerns raised over how hospitals can validate radiology AI algorithms

As artificial intelligence (AI) adoption expands in radiology, there is growing concern that AI algorithms need to undergo quality assurance (QA) reviews, just like imaging systems, to ensure they are performing as intended and that the data they produce are accurate. The American College of Radiology (ACR) has established the Assess-AI Registry and AI-Lab in an effort to help with validating and tracking AI QA for FDA-cleared algorithms.

Today, there are more than 340 AI algorithms cleared by the U.S. Food and Drug Administration (FDA), and a large number of these are for medical imaging. 

"Radiologists want to know AI will be safe and effective for their patients. In other words, they want to make sure it's trust worthy," explained Bibb Allen, MD, FACR, chief medical officer of the ACR Data Science Institute and former ACR president. 

According to Allen, there are lingering concerns that medical imaging AI algorithms may have been validated on datasets that were too small, limited to single-center evaluations using specific scanners and settings, or lacking diversity. Allen said his hospital in Alabama has a very different patient population, with a much higher rate of obesity and metabolic issues, compared to a hospital in Utah or on the West Coast. 

"It's not surprising that an algorithm performs well in a test environment," Allen explained.

He said AI vendors will tell you their products perform above average quality benchmarks based on their own testing.

"Well that's great, but come and see what will work at our practice in Alabama where we have rampant hypertension and obesity," Allen said. 

He explained there are no standard industry pathways to validate AI and offer assurance that it will work and be trustworthy in all settings. Patient populations vary between males and females, and by region, ethnicity and socioeconomic level. 

"We are recommending to the end users that they evaluate the models using their own patient data and by using an enriched dataset," Allen explained. "We want them to put in the hard cases so they can say 'OK, the AI found this one, and this was a case that was hard for me.' Equally important is the real-world monitoring."

He said data drift, changes in the imaging equipment used, scanner settings and software updates can all have an impact on the AI software. So, like the scanners themselves, or even the PACS monitors used to read studies, QA needs to be performed to ensure the AI is working properly. This is especially important when the AI is relied on for critical findings, computer-aided detection (CAD), modifying images for things like bone removal, or performing complex quantification used to determine patient treatments or next steps. 

"The AI is always going to work best the day you plug it in, and then there is going to be some performance degradation over time. So keeping track of that and understanding why it breaks is important," Allen said. 

For example, if you have a portable X-ray system for chest X-rays with a built-in AI pneumothorax detector, it is possible that over time it might not work as well as it used to. 

"If you have the data on the AI model's performance in a registry where you track each portable X-ray system, you might recognize that this is not the same X-ray unit as the one you did a software upgrade, and now the model is not performing as well as it used to," Allen said. "The registry can help you figure out what you changed and what you need to do to fix it."

He said AI developers also could use these insights to build better algorithms and understand what technical factors may impact the performance of their AI.

These are the goals of the ACR Assess-AI Registry, Allen explained. 

Like the medical imaging radiation dose index registry and others created by the ACR, he said these are tools to help set baselines and monitor quality over time, not just for one site or health system but for all of radiology, because the pooled data will help identify problems and solutions. 

ACR offers radiology AI test platform to evaluate algorithms

The ACR Data Science Institute has created a searchable online catalog with information on all the FDA-cleared AI algorithms for radiology. Allen said it is too difficult to test all the AI models that are out there, so the catalog can help narrow the list. 

The ACR Data Science Institute also has a website called AI-Lab, which allows people to test an algorithm on premises or in a secure cloud using their own dataset. Allen said users also can compare one algorithm against another to evaluate their performance. 

"Because the models can be brittle and they may not work as well outside of where they were trained, you will want to be able to do an evaluation," Allen said. "AI-Lab allows people to query, retrieve and create a test set and then actually run a model and test the performance. You also can compare one algorithm to another."

Integration into workflow is key to AI adoption

"Once an AI algorithm is adopted, it needs to be integrated into the workflow," Allen said. "If you have to go to another computer to use a fracture detection AI, nobody is going to use that." 

Allen said it needs to be clear that any AI algorithm that is adopted must be integrated into the usual radiologist or clinical workflow, or it will never be used. This is where it is important to bring in the hospital IT team so they can coordinate with the AI vendor to make sure that happens.

Creating a team approach to AI 

Hospitals first need to decide what they want to get out of AI, what is needed to support the software and which specific problems need to be solved. A team of key stakeholders should be created so they all understand the issues related to implementation and how the AI needs to be integrated into the workflow, so everyone is on the same page. 

Allen said the level of involvement of the IT department may differ depending on the size of the institution. 

"One thing that we worry about is that while places like Stanford, Emory and other big academic institutions can do a lot of stuff with AI in-house because they have researchers and they have all this infrastructure, for most of us, and particularly in community hospitals, many are suffering from the fact that IT teams are maxed out," Allen explained. "They are busy putting out fires and they don't have the bandwidth for a lot of new projects."

He said this issue was the impetus for ACR to build the AI-Lab to make it easier for IT and radiology departments to do validation testing before they implement an algorithm. 

Hear more of the interview with Allen in the VIDEO: Validation monitoring for radiology AI to ensure accuracy.

Editor's note: Allen is also a radiologist at the Birmingham Radiological Group and the radiology residency program director at Brookwood Baptist Health in Birmingham. He practices at Grandview Medical Center in Birmingham.

Dave Fornell is a digital editor with Cardiovascular Business and Radiology Business magazines.

Dave Fornell has covered healthcare for more than 17 years, with a focus in cardiology and radiology. Fornell is a 5-time winner of a Jesse H. Neal Award, the most prestigious editorial honors in the field of specialized journalism. The wins included best technical content, best use of social media and best COVID-19 coverage. Fornell was also a three-time Neal finalist for best range of work by a single author. He produces more than 100 editorial videos each year, most of them interviews with key opinion leaders in medicine. He also writes technical articles, covers key trends, conducts video hospital site visits, and is very involved with social media. E-mail: dfornell@innovatehealthcare.com
