Why deep learning trained on radiologist-labeled data may be worth added time, costs
Deep learning models trained on radiologist-labeled images detect life-threatening lung conditions better than models developed using computer-generated labels, new research published Tuesday suggests.
Many new, open-source image repositories contain labels derived via natural language processing, yet some experts argue these automated labels fall short when scrutinized by radiologists. Despite varying opinions, there’s been little research settling this man-versus-machine debate.
Using pneumothorax as a testing ground, imaging and data experts found that algorithms trained on radiologist-annotated labels performed “significantly” better, they reported in Academic Radiology. This proved particularly true when the tools were tested on images from outside healthcare organizations, a key hurdle for many AI architectures.
“The improved performance on the external dataset suggests that models trained with radiologist labels are more generalizable, and would provide better accuracy for pneumothorax detection when deployed into different institutions and clinical settings,” James Thomas Patrick Decourcy Hallinan, with the National University Hospital’s Department of Diagnostic Imaging in Singapore, and co-authors added.
The team used the National Institutes of Health ChestX-ray14 repository to develop two datasets: the original source, containing 112,120 frontal chest X-rays with 5,302 positive labels for pneumothorax and 106,818 negatives; and another labeled by four radiologists, with 5,138 positive and 104,751 negative labels.
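For illustration only, here is a minimal sketch, not taken from the study, of how two such label sets might be assembled from the ChestX-ray14 metadata; the file names, column names and the radiologist-review file are assumptions.

```python
# Sketch: build NLP-derived and radiologist-derived pneumothorax labels.
# File and column names are assumptions for illustration, not the authors' code.
import pandas as pd

# NLP-derived labels ship with the repository metadata; "Finding Labels" is a
# pipe-delimited string of conditions per image.
nih = pd.read_csv("Data_Entry_2017.csv")  # assumed ChestX-ray14 metadata file
nih["pneumothorax_nlp"] = nih["Finding Labels"].str.contains("Pneumothorax").astype(int)

# Hypothetical radiologist re-annotation: one row per image, binary label.
rad = pd.read_csv("radiologist_labels.csv")  # assumed columns: Image Index, pneumothorax_rad

labels = nih.merge(rad, on="Image Index", how="left")
print(labels["pneumothorax_nlp"].value_counts())  # NLP label distribution
print(labels["pneumothorax_rad"].value_counts())  # radiologist label distribution
```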
Those sets were used to independently train three convolutional neural networks (ResNet-50, DenseNet-121 and EfficientNet-B3). On internal NIH data, all models trained with radiologist labels notched much higher area under the receiver operating characteristic curve (AUC) scores than their NLP-trained counterparts.
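Again for illustration, a minimal PyTorch sketch, not the authors’ published pipeline, of the underlying comparison: fine-tune the same architecture once per label source and compare AUC on a shared test set. The data-loading wiring is assumed.

```python
# Sketch of the label-source comparison: same architecture, two training label
# sets, one AUC evaluation. Data loaders are hypothetical placeholders.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import roc_auc_score

def build_model():
    # DenseNet-121 with a single-logit head for binary pneumothorax detection.
    model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
    model.classifier = nn.Linear(model.classifier.in_features, 1)
    return model

def train(model, loader, epochs=5, lr=1e-4, device="cuda"):
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:  # labels: 0/1 pneumothorax flags
            images, labels = images.to(device), labels.float().to(device)
            opt.zero_grad()
            loss = loss_fn(model(images).squeeze(1), labels)
            loss.backward()
            opt.step()
    return model

@torch.no_grad()
def evaluate_auc(model, loader, device="cuda"):
    model.eval()
    scores, truths = [], []
    for images, labels in loader:
        probs = torch.sigmoid(model(images.to(device)).squeeze(1))
        scores.extend(probs.cpu().tolist())
        truths.extend(labels.tolist())
    return roc_auc_score(truths, scores)

# Hypothetical usage: train once per label source, evaluate on the same test set.
# model_nlp = train(build_model(), nlp_label_loader)
# model_rad = train(build_model(), radiologist_label_loader)
# print(evaluate_auc(model_nlp, test_loader), evaluate_auc(model_rad, test_loader))
```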
And while NLP-labeled models did well on NIH data, performance dropped “markedly” when tested on an external set of 525 chest radiographs from a Singapore-based emergency department.
Having radiologists manually label images requires long hours and higher costs compared with NLP labels, which are derived from existing radiology reports. And as the technology improves, providers will likely face tough decisions about which to choose.
“As reasonable performance can already be obtained from automated NLP labels in large imaging repositories, radiology AI algorithm developers have to balance the benefits of radiologist annotation with the considerably higher barriers to obtaining such labels,” the authors concluded.