Researchers cite safety concerns after uncovering 'harmful behavior' of fracture-detecting AI model

Researchers are cautioning overly optimistic AI enthusiasts after a recent algorithmic audit revealed potentially “harmful behavior” in a validated model intended to detect femoral fractures. 

A study in The Lancet Digital Health reports that a previously validated, high-performing AI model committed troublesome errors when confronted with atypical anatomy while seeking out subtle proximal femur fractures. Researchers noted that despite the model’s exceptional performance on external validation, its preclinical evaluation revealed barriers that would prevent the algorithm from being safely deployed in clinical practice. Experts involved in the study acknowledged that this is a common obstacle when transitioning artificial intelligence systems into everyday, real-world clinical practice. 

“Historically, computer-aided diagnosis systems have often performed unexpectedly poorly in the clinical setting despite promising preclinical evaluations, a concept known as the implementation gap,” corresponding author Lauren Oakden-Rayner, MBBS, with the Australian Institute for Machine Learning in Adelaide, Australia, and co-authors explained. “Few preclinical artificial intelligence research studies have addressed these concerns; for example, external validation—an assessment of the ability of a model to generalize to new environments—has only been done in around a third of studies.” 

Algorithmic auditing has been presented as a possible solution to the problem of computer-aided systems performing poorly in clinical settings. Auditing can identify and mitigate issues that would otherwise prevent an algorithm from maintaining its diagnostic performance. That’s exactly what researchers did with their preclinical evaluation of a previously validated deep learning model developed to detect proximal femur fractures on frontal X-rays in emergency settings. 

The researchers conducted a reader study comparing the model’s performance to that of five radiologists. The dataset contained 200 fracture cases and 200 non-fracture cases. An external validation dataset was also used before conducting an algorithmic audit to detect any unusual model behavior. 

In the reader study, the model’s AUC was 0.994, compared with 0.969 for the radiologists. The model also performed well on the external validation dataset, achieving an AUC of 0.980. However, in preclinical testing, the model encountered issues when presented with cases of abnormal bones, such as those seen in Paget’s disease. This resulted in an increased rate of error and caused the researchers to question the model’s safety in a clinical setting. 
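For readers less familiar with the metric, an AUC like the model’s 0.994 can be read as the probability that a randomly chosen fracture case receives a higher model score than a randomly chosen non-fracture case. A minimal pure-Python sketch of that calculation (the labels and scores below are invented for illustration and are not the study’s data):

```python
def auc(y_true, y_score):
    """AUC as the probability that a random positive case
    outranks a random negative case (ties count as half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = fracture, 0 = no fracture,
# scores are the model's confidence that a fracture is present.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.95, 0.80, 0.35, 0.90, 0.30, 0.10, 0.40, 0.20]
print(round(auc(y_true, y_score), 4))  # → 0.9375
```

One positive case (score 0.35) ranks below one negative case (0.40), so 15 of the 16 positive–negative pairs are ordered correctly, giving an AUC of 0.9375. A score near 1.0, as the study’s model achieved, means almost every fracture case outranked almost every non-fracture case.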

“Given the tendency of artificial intelligence models to behave in unexpected ways (i.e., unlike a human expert would), the inclusion of an algorithmic audit appears to be informative,” the experts said. “Identifying the types of cases an artificial intelligence model fails on might assist in bridging the current gap between apparent high performance in preclinical testing and challenges in the clinical implementation of an artificial intelligence model.” 

The researchers concluded by suggesting that these algorithmic audits are necessary to develop safe clinical testing protocols. 


In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She joined Innovate Healthcare in 2021 and has since put her unique expertise to use in her editorial role with Health Imaging.
