AI shows potential to diagnose wrist fractures as well as radiologists

A new meta-analysis found that artificial intelligence (AI) algorithms, particularly convolutional neural networks (CNNs), can accurately detect wrist fractures on plain X-rays, performing comparably to trained healthcare professionals. However, the underlying studies have methodological limitations and a high risk of bias, calling into question the rapid clinical implementation of these algorithms for diagnostic purposes. While the tools are clearly useful, manual validation of their findings remains of the utmost importance.

The full findings of the meta-analysis were published in the European Journal of Radiology.[1]

Researchers from Denmark conducted a search of Embase, Medline, PubMed, Scopus, and Web of Science medical databases from January 2012 to March 2023, identifying six relevant studies of deep-learning AI used to diagnose broken wrists, specifically radial and ulnar fractures, based on radiographs. In total, the dataset for the studies contained 33,026 X-ray images.

All six studies followed a similar methodology: a CNN model was trained on a labeled dataset of radiographs and then tasked with diagnosing fractures, and in each study the model's performance was compared against that of a healthcare expert trained in fracture diagnostics.
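The published studies don't share a common architecture, but to give a rough sense of what such a model looks like in practice, the sketch below is a minimal, hypothetical PyTorch CNN binary classifier for radiographs. The architecture, layer sizes, class name, and image dimensions are illustrative assumptions, not the networks used in the reviewed studies.

```python
# Minimal illustrative sketch of a CNN fracture classifier (NOT a model from
# the reviewed studies; architecture and hyperparameters are assumptions).
import torch
import torch.nn as nn

class WristFractureCNN(nn.Module):
    """Tiny convolutional network mapping a grayscale radiograph
    to a single fracture-vs-no-fracture logit."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims regardless of input size
        )
        self.classifier = nn.Linear(32, 1)  # one logit: fracture vs. no fracture

    def forward(self, x):
        x = self.features(x)       # (N, 32, 1, 1)
        x = x.flatten(1)           # (N, 32)
        return self.classifier(x)  # (N, 1) raw logits

# Example: score a batch of four 256x256 grayscale images.
model = WristFractureCNN()
images = torch.randn(4, 1, 256, 256)  # placeholder tensors, not real X-rays
probs = torch.sigmoid(model(images))  # probability of fracture per image
print(probs.squeeze(1))
```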

The researchers noted that the narrow scope of the meta-analysis was motivated by the high rate of fractures that go undiagnosed during emergency department (ED) visits, as they can be challenging to spot on X-rays.

“Human and environmental factors can affect the interpretation of the radiograph, such as clinician inexperience, fatigue, distractions, poor viewing conditions, and time pressures,” the authors, led by V. Hansen from the University Hospital of Southern Denmark, wrote. “One study concluded that approximately one percent of all fractures in the ED were not correctly diagnosed.”

After reviewing the collective findings of the six studies, the researchers found that CNN performance, measured against healthcare experts' consensus as the reference standard, showed a sensitivity of 92% and a specificity of 93%, making the models comparable to clinicians for detecting wrist fractures on radiographs. Effectively, this means a CNN could serve as a first set of eyes, and with a confirmatory check by a healthcare professional, the number of fractures missed on X-rays would likely be substantially reduced.
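As a rough, back-of-the-envelope illustration of what those figures imply for such a workflow, the short sketch below applies the reported sensitivity and specificity to a hypothetical caseload; the prevalence and case counts are assumptions for illustration, not data from the paper.

```python
# Back-of-the-envelope illustration; the caseload and prevalence below are
# hypothetical assumptions, not values reported in the meta-analysis.
sensitivity = 0.92   # sensitivity reported in the meta-analysis
specificity = 0.93   # specificity reported in the meta-analysis

cases = 1000         # hypothetical number of ED wrist radiographs
prevalence = 0.30    # hypothetical fraction with a true fracture

fractures = cases * prevalence
normals = cases - fractures

missed = fractures * (1 - sensitivity)       # false negatives from the AI alone
false_alarms = normals * (1 - specificity)   # false positives flagged for review

print(f"Of {fractures:.0f} true fractures, the model alone would miss ~{missed:.0f},")
print(f"and ~{false_alarms:.0f} of {normals:.0f} normal studies would be flagged,")
print("which is why a confirmatory read by a clinician remains essential.")
```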

However, considerable heterogeneity was observed among the studies, raising concerns about bias and quality. Confidence in the control datasets used to train the models is limited, the experience level of the healthcare experts reading the radiographs is difficult to quantify, and the small number of studies meeting the inclusion criteria leaves the true effectiveness of CNNs in question.

In other words, there are many variables, and more research is needed to draw any firm conclusions. The authors wrote that future research should prioritize external dataset testing, employ uniform methods, and establish a robust reference standard set by independent experts to truly measure the effectiveness of these diagnostic AI algorithms. Further, incorporating patient outcomes as the reference standard in future studies would provide valuable insight into the true clinical value of CNNs.

Pending more data, the authors were measured in their ultimate conclusion: “For clinicians, AI could potentially be used to enhance diagnostic confidence, especially in fields of radiology. AI algorithms may be particularly useful for less experienced clinicians,” they wrote.


Chad Van Alstin, Health Imaging / Health Exec

Chad is an award-winning writer and editor with over 15 years of experience working in media. He has a decade-long professional background in healthcare, working as a writer and in public relations.
