Radiologists outperform commercially available AI in PI-RADS scoring
Artificial intelligence applications are often touted as tools that can improve the performance of radiologists across experience levels, particularly less experienced readers, but new data indicate that AI does not always live up to its hype.
Experts concluded this recently after testing a commercially available AI software said to improve Prostate Imaging-Reporting and Data System (PI-RADS) scoring consistency on bi-parametric MRI among radiologists with various levels of experience. They shared their findings in Insights into Imaging on March 20.
“The PI-RADS provides guidelines for acquiring and interpreting prostate MRI, and the benefits of the system have been demonstrated in large-scale multi-center studies,” wrote corresponding author Deniz Alis, of the Department of Radiology at Acibadem Mehmet Ali Aydinlar University in Turkey, and co-authors. “However, despite the PI-RADS, there are still non-negligible intra-reader and inter-reader differences in interpreting prostate MRI.”
For the research, four radiologists with experience levels ranging from two years to more than 20 years evaluated 153 bi-parametric prostate MRI scans both with and without the software. The software had minimal impact on readers’ initial evaluations: its use resulted in just six score changes in total (less than 1%) across all four readers.
Additionally, readers with more than five years of experience performed significantly better than the deep learning (DL) software in identifying clinically significant prostate cancer.
The authors noted that their findings contradict prior research that utilized the same software. They suggested that this could be due to out-of-distribution data for the DL software, which could impair its performance.
“Considering the current DL models that were trained on relatively small data, expecting an expert-level performance might be too optimistic at this moment,” the study authors explained.
They also suggested that designing DL software to replace radiologists is not a realistic expectation, but that researchers should instead focus on developing tools that can assist radiologists by standardizing processes (such as assigning PI-RADS scores), as that is the “more reachable target.”