Deep-learning classifier understands free-text radiology reports
Free-text radiology reports can be automatically classified by deep-learning convolutional neural networks (CNNs) with accuracy equal to or better than that achieved by traditional, and decidedly labor-intensive, natural language processing (NLP) methods.
That’s the conclusion of researchers led by Matthew Lungren, MD, MPH, of Stanford University. The team tested a CNN model they developed for mining pulmonary-embolism findings from thoracic CT reports generated at two institutions.
Radiology published their study, lead-authored by Matthew Chen, MS, also of Stanford, online Nov. 13.
The researchers analyzed annotations made by two radiologists for the presence, chronicity and location of pulmonary embolism, then compared their CNN’s performance with that of PeFinder, an NLP model considered highly proficient at this task.
They note that PeFinder and similar existing NLP techniques demand a “relatively high burden of development, including domain-specific feature engineering, complex annotations and laborious coding for specific tasks.”
Lungren and colleagues found their CNN model, which was trained on 2,500 radiology reports, was built on an open-source deep-learning library and used an unsupervised learning algorithm, could predict the presence and characteristics of pulmonary embolism from unstructured text in CT reports with 99 percent accuracy.
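For readers who want a concrete sense of what such a text-classifying CNN looks like, here is a minimal sketch in Python using the Keras API. Every detail, from the vocabulary size to the layer widths and the three output labels, is an illustrative assumption rather than the study’s actual architecture.

    # A minimal sketch of a 1-D CNN text classifier of the general kind the
    # study describes; all hyperparameters are illustrative assumptions,
    # not the authors' settings.
    from tensorflow.keras import layers, models

    VOCAB_SIZE = 10_000   # assumed vocabulary size
    MAX_LEN = 300         # assumed maximum report length in tokens
    EMBED_DIM = 100       # assumed dense word-vector width

    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        # Each token id becomes a dense vector, as the quoted passage
        # later in this article describes.
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        # Convolutional filters scan short word windows for predictive phrases.
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        # One output per hypothetical label, e.g., PE present, acute vs.
        # chronic, central location.
        layers.Dense(3, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Hypothetical usage with integer-encoded reports and binary labels:
    # model.fit(x_train, y_train, epochs=5, validation_split=0.1)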
The CNN model also significantly outperformed PeFinder on the annotation task in an intrainstitutional validation set, while performing about equally well on validation measures from the two contributing institutions.
Further, the experimental CNN model achieved a statistically significantly higher F1 score, a metric that balances precision and recall in ascertaining positive vs. negative embolism findings, than PeFinder, “representing a new benchmark in the classification of radiology free-text reports,” the authors report.
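The F1 score itself is simply the harmonic mean of precision and recall. The short function below shows the computation with made-up counts; it is a generic illustration, not the study’s evaluation code.

    # How the F1 score combines precision and recall; the counts are invented.
    def f1_score(tp: int, fp: int, fn: int) -> float:
        """Harmonic mean of precision and recall for a binary classifier."""
        precision = tp / (tp + fp)   # fraction of positive calls that are right
        recall = tp / (tp + fn)      # fraction of true positives that are found
        return 2 * precision * recall / (precision + recall)

    # Hypothetical counts: 90 true positives, 5 false positives,
    # 8 false negatives -> F1 of roughly 0.93.
    print(f1_score(tp=90, fp=5, fn=8))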
Lungren and colleagues underscore how much more readily the CNN model can be applied than its NLP counterpart.
“The main intuition underlying the [CNN] model is the simple observation that ratios of word-word co-occurrence probabilities have the potential for encoding some form of meaning,” Lungren et al. explain. “Briefly, the model automatically converts each word, including punctuation and other special characters, in a given report to a dense vector representation.”
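To make that quoted step concrete, the toy snippet below converts a short report into a matrix of dense word vectors. The three-dimensional vectors are invented for readability; real embeddings of this kind (GloVe is the best-known example built from word-word co-occurrence statistics) use hundreds of dimensions, and the example report is hypothetical.

    # A toy illustration of the word-to-dense-vector step the authors describe;
    # the vectors and the report text are made up for readability.
    import numpy as np

    embeddings = {                       # hypothetical learned word vectors
        "no":        np.array([0.1, -0.8,  0.3]),
        "acute":     np.array([0.7,  0.2, -0.5]),
        "pulmonary": np.array([0.4,  0.6,  0.1]),
        "embolism":  np.array([0.5,  0.5, -0.1]),
        ".":         np.array([0.0,  0.0,  0.0]),  # punctuation gets a vector too
    }

    report = "no acute pulmonary embolism ."
    matrix = np.stack([embeddings[token] for token in report.split()])
    print(matrix.shape)  # (5, 3): one dense row per token, fed to the CNN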
They suggest their CNN model is accurate and generalizable enough at automatically classifying free-text radiology reports that it “could be made available for a variety of applications, including diagnostic surveillance, cohort building, quality assessment, labels for computer vision data and clinical decision support services.”