Stanford researchers find more data isn’t better when training AI to classify chest x-rays
Researchers from Stanford University in Stanford, California, have determined that convolutional neural networks (CNNs) trained with just 20,000 labeled images can accurately classify chest x-rays as either normal or abnormal, according to a new study published Nov. 13 in Radiology.
“This work could be clinically important both by permitting radiologists to spend more time on abnormal studies and by demonstrating a simple mechanism to combine physician judgment with deep learning algorithms such as CNNs in a manner that can improve interpretation performance,” wrote lead author Jared A. Dunnmon, PhD, of the department of computer science at Stanford University, and colleagues. They noted that training the CNN with 200,000 images rather than 20,000 produced only a “marginal” improvement.
For the study, the researchers used 216,431 frontal chest x-rays obtained between 1998 and 2012 to train CNNs to classify the images as normal or abnormal. The effects of development set size, training set size, initialization strategy and network architecture on end performance were evaluated using standard binary classification metrics, according to the researchers.
To show how dataset size affected the CNN, the team compared the area under the receiver operating characteristic curve (AUC) for models trained with 200,000, 20,000 and 2,000 images. When the CNN was trained with 200,000 images, the average AUC was 0.96, “but was not significantly different from that observed when the model was trained with 20 000 images (AUC = 0.95, P > .05),” according to the researchers. The AUC fell to 0.84 when the CNN was trained with 2,000 images.
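For readers unfamiliar with the metric, AUC can be read as the probability that a model gives a randomly chosen abnormal image a higher score than a randomly chosen normal one. The short Python sketch below illustrates that definition only; it is not the researchers' code, the function name `roc_auc` is hypothetical, and the labels and scores are made-up toy values rather than study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (abnormal, normal) pairs ranked correctly.

    labels: 1 for abnormal, 0 for normal.
    scores: model outputs, higher meaning "more likely abnormal".
    Tied pairs count as half a correct ranking.
    """
    abnormal = [s for y, s in zip(labels, scores) if y == 1]
    normal = [s for y, s in zip(labels, scores) if y == 0]
    if not abnormal or not normal:
        raise ValueError("need at least one normal and one abnormal case")

    correct = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                correct += 1.0
            elif a == n:
                correct += 0.5
    return correct / (len(abnormal) * len(normal))


# Toy example (assumed values, not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.2f}")
```

In this toy case, eight of the nine abnormal-normal pairs are ranked correctly, so the AUC is 8/9. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.96, 0.95 and 0.84 sit.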
The researchers concluded that CNNs can still be trained successfully with limited datasets when larger ones cannot be obtained.
“The results of our study should be validated in other patient populations,” the researchers wrote. “However, our findings suggest a distinct value to combining deep-learning techniques, such as CNNs, with data sets of sizes already accessible to many institutions to improve thoracic imaging triage.”