Researchers cite safety concerns after uncovering 'harmful behavior' of fracture-detecting AI model

Hannah Murphy | April 07, 2022 | Health Imaging | Artificial Intelligence

Researchers are cautioning overly optimistic AI enthusiasts after a recent algorithmic audit revealed the potentially “harmful behavior” of a validated model intended to detect femoral fractures.

A study in The Lancet Digital Health reports that a previously validated, high-performing AI model made troubling errors when it encountered atypical anatomy while searching for subtle proximal femur fractures. Researchers noted that despite the model’s exceptional performance on external validation, its preclinical evaluation revealed barriers that would prevent the algorithm from being safely deployed in clinical practice. Experts involved in the study acknowledged that this is a common obstacle when moving artificial intelligence systems into everyday, real-world clinical practice.

“Historically, computer-aided diagnosis systems have often performed unexpectedly poorly in the clinical setting despite promising preclinical evaluations, a concept known as the implementation gap,” corresponding author Lauren Oakden-Rayner, MBBS, with the Australian Institute for Machine Learning in Adelaide, Australia, and co-authors explained. “Few preclinical artificial intelligence research studies have addressed these concerns; for example, external validation—an assessment of the ability of a model to generalize to new environments—has only been done in around a third of studies.”

Algorithmic auditing has been proposed as a possible solution to the problem of computer-aided systems performing poorly in clinical settings. An audit can identify and mitigate issues that would prevent the algorithm from maintaining its diagnostic performance. That is exactly what the researchers did in their preclinical evaluation of a previously validated deep learning model developed to detect proximal femur fractures on frontal X-rays in emergency settings.

The researchers conducted a reader study comparing the model’s performance with that of five radiologists, using a dataset of 200 fracture cases and 200 non-fracture cases. The model was also tested on an external validation dataset before the researchers conducted an algorithmic audit to detect any unusual model behavior.

In the reader study, the model achieved an AUC of 0.994, compared with 0.969 for the radiologists. The model also performed well on the external validation dataset, with an AUC of 0.980. In preclinical testing, however, the model ran into problems when presented with abnormal bones, such as those seen in Paget’s disease. This led to an increased error rate and caused the researchers to question the model’s safety in a clinical setting.
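To illustrate why a near-perfect AUC does not rule out serious failures, the sketch below computes a rank-based AUC (the probability that a randomly chosen fracture case scores higher than a randomly chosen non-fracture case) and then audits error rates by subgroup at a fixed threshold. The scores, labels, subgroup names ("typical", "atypical"), the 0.5 threshold and the helper names are all hypothetical and made up for illustration; they are not the study's data or its audit method.

```python
from itertools import product


def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs where the positive
    case scores higher; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rate_by_subgroup(scores, labels, groups, threshold=0.5):
    """Fraction of misclassified cases within each subgroup at a fixed
    (hypothetical) operating threshold."""
    totals, errors = {}, {}
    for s, y, g in zip(scores, labels, groups):
        pred = 1 if s >= threshold else 0
        totals[g] = totals.get(g, 0) + 1
        errors[g] = errors.get(g, 0) + int(pred != y)
    return {g: errors[g] / totals[g] for g in totals}


# Hypothetical toy data: 8 "typical" cases the model separates perfectly,
# plus 2 "atypical" cases (e.g. unusual bone anatomy) it gets wrong.
scores = [0.95, 0.90, 0.85, 0.80, 0.20, 0.15, 0.10, 0.05, 0.30, 0.70]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
groups = ["typical"] * 8 + ["atypical"] * 2

print(auc(scores, labels))  # 0.96
print(error_rate_by_subgroup(scores, labels, groups))
# {'typical': 0.0, 'atypical': 1.0}
```

In this toy example the overall AUC stays high because the atypical cases are rare, while every atypical case is misclassified. Stratifying errors by case type is one simple way an audit can surface failures that a single summary metric hides.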

“Given the tendency of artificial intelligence models to behave in unexpected ways (i.e., unlike a human expert would), the inclusion of an algorithmic audit appears to be informative,” the experts said. “Identifying the types of cases an artificial intelligence model fails on might assist in bridging the current gap between apparent high performance in preclinical testing and challenges in the clinical implementation of an artificial intelligence model.”

The researchers concluded by suggesting that algorithmic audits are necessary for developing safe clinical testing protocols.

Hannah Murphy

In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered radiologic technologist. She began covering the medical imaging industry for Innovate Healthcare in 2021.