VIDEO: Assessing radiology AI and understanding programmatic bias
Charles E. Kahn, Jr., MD, MS, editor of the RSNA journal Radiology: Artificial Intelligence, and professor and vice chair of radiology at the University of Pennsylvania Perelman School of Medicine, discusses the need to validate artificial intelligence (AI) algorithms on your own patient population to determine whether they are accurate for a specific institution's patients. He also explains how bias can be inadvertently built into an algorithm, and how AI may take learning shortcuts that can affect the clinical results it displays.
"I think it is really important that people realize that when we hear artificial intelligence and think about AI, they tend to focus on the word 'intelligence,' but the thing they need to focus on is 'artificial,'" Kahn explained. "These are tools that are human built and they have all of the opportunities for error that humans can build into them. We also know systems can acquire biases and prejudices and be led down the primrose path by using short-cut learning where the AI figured out a connection to something that is not a proper connection."
He said some radiology AI systems built to identify pneumonia were found to rely on the letter "L" marker, which is placed on X-rays to indicate the patient's left side, being oriented a certain way. In another case, an AI app that appeared superb at identifying tuberculosis (TB) on X-rays was found to flag patients as positive when it saw the words "TB clinic" in the corner of the images used to train it, rather than evaluating the clinical images themselves.
"These things mean that we have to be mindful and careful about testing these systems, and we have to be really rigorous in how we approach these things," Kahn explained. "You really have to go through the process of testing these systems on your patient population, because sometimes there are little issues that can arise where the system was not built for a population such as you have in your practice, and as a result it underperforms."
These issues can include incorrect findings caused by variability in clinical presentation across ethnic and racial groups, between older and younger patients, males and females, thin versus obese patients, and people with active versus sedentary lifestyles.
Performance of AI algorithms also might change when new variables are introduced, such as buying a new CT scanner that has different imaging parameters.
"Maybe your AI algorithm to detect lung nodules is working great, and you buy a new CT scanner, or you upgrade the image reconstruction kernel and now the AI is not performing at the level it did before," Kahn said. "So everything you do requires very careful and thoughtful analysis."
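One way to picture the local testing Kahn describes is to compare the AI's output against confirmed findings separately for each patient subgroup or each scanner, rather than relying on a single overall accuracy figure. The Python sketch below is only an illustration under stated assumptions: the function name `subgroup_metrics`, the field names `truth`, `ai_positive` and `scanner`, and the eight sample cases are all hypothetical and invented for the example, and none of them come from Kahn or from any vendor's product.

```python
from collections import defaultdict


def subgroup_metrics(cases, group_key):
    """Compute sensitivity and specificity separately for each subgroup.

    `cases` is a list of dicts. Each dict holds the confirmed finding
    ("truth"), the AI result ("ai_positive"), and the attribute named by
    `group_key`. All field names are hypothetical, chosen for this sketch.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for case in cases:
        tally = counts[case[group_key]]
        if case["truth"]:
            tally["tp" if case["ai_positive"] else "fn"] += 1
        else:
            tally["fp" if case["ai_positive"] else "tn"] += 1

    results = {}
    for group, tally in counts.items():
        positives = tally["tp"] + tally["fn"]
        negatives = tally["tn"] + tally["fp"]
        results[group] = {
            "n": positives + negatives,
            # None marks a subgroup with no cases of that kind to score.
            "sensitivity": tally["tp"] / positives if positives else None,
            "specificity": tally["tn"] / negatives if negatives else None,
        }
    return results


# Made-up illustration data, not real patients or real AI output.
example_cases = [
    {"scanner": "old CT", "truth": True, "ai_positive": True},
    {"scanner": "old CT", "truth": True, "ai_positive": True},
    {"scanner": "old CT", "truth": False, "ai_positive": False},
    {"scanner": "old CT", "truth": False, "ai_positive": False},
    {"scanner": "new CT", "truth": True, "ai_positive": True},
    {"scanner": "new CT", "truth": True, "ai_positive": False},
    {"scanner": "new CT", "truth": False, "ai_positive": False},
    {"scanner": "new CT", "truth": False, "ai_positive": True},
]

by_scanner = subgroup_metrics(example_cases, "scanner")
for group, metrics in by_scanner.items():
    print(group, metrics)
```

The same function could be pointed at any other attribute recorded for each case, such as age band or sex, by passing a different `group_key`. A real validation would need far more cases per subgroup than this toy data, along with confidence intervals, before drawing conclusions.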
However easy vendors may say their AI is to use, Kahn explained there is no easy route to double-checking the algorithm on your own patient population and checking all the variables that may affect the AI's assessments at your own institution.