Age, race and breast density influence AI performance on mammogram reads
Certain patient demographic characteristics could affect the outcomes of AI-interpreted breast cancer screenings, a new study suggests.
Screening mammography is an area where artificial intelligence has great potential to reduce radiologists’ workloads. But while studies have shown it to be effective as a support tool, several have also highlighted the potential for bias in algorithms that were not trained on diverse datasets.
That issue was brought to light again in the most recent study, published in the journal Radiology.
“There are few demographically diverse databases for AI algorithm training, and the FDA does not require diverse datasets for validation,” Derek L. Nguyen, MD, assistant professor at Duke University in Durham, North Carolina, said in a release on the findings. “Because of the differences among patient populations, it’s important to investigate whether AI software can accommodate and perform at the same level for different patient ages, races and ethnicities.”
For the study, experts tested for bias using an algorithm approved by the U.S. Food and Drug Administration for generating breast cancer risk scores based on mammographic findings. They included nearly 5,000 cases with confirmed negative screenings to see whether the algorithm would flag any as suspicious.
“Our goal was to evaluate whether an AI algorithm’s performance was uniform across age, breast density types and different patient race/ethnicities,” Nguyen said.
The team found that the algorithm was significantly more likely to flag the imaging of Black women as having suspicious findings. These false positives were also more common among older women (ages 71-80) and those with extremely dense breasts.
Asian women and younger women (ages 41-50) had fewer false positives compared with white women and those in their 50s and 60s.
“This study is important because it highlights that any AI software purchased by a healthcare institution may not perform equally across all patient ages, races/ethnicities and breast densities. Moving forward, I think AI software upgrades should focus on ensuring demographic diversity,” Nguyen said, adding that institutions looking to purchase similar software need to take their patients’ demographics into consideration before doing so.
“Having a baseline knowledge of your institution’s demographics and asking the vendor about the ethnic and age diversity of their training data will help you understand the limitations you’ll face in clinical practice,” he said.