Fluent in radiology: NLP method automatically extracts observations from mammo reports
A recent study demonstrated how successful natural language processing (NLP) systems can be at extracting imaging observations from free-text mammography reports, further closing the gap between unstructured report text and structured information extraction.
Published this month in the Journal of the American Medical Informatics Association, the research was led by Selen Bozkurt with the Akdeniz University Faculty of Medicine in Turkey.
Noting that radiology reports are narrative and unstructured, which makes it difficult to feed their contents into decision support systems, the team worked to develop NLP methods to recognize each lesion in free-text mammography reports and to extract the corresponding relationships, producing a complete information frame for each lesion.
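The study describes this frame conceptually rather than as code, but the idea maps naturally onto a simple record type. A minimal sketch in Python; the field names here are illustrative, not the study's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class LesionFrame:
    """One structured 'information frame' for a single reported lesion.

    Field names are illustrative, not the study's actual schema.
    """
    observation: str                                 # e.g., "mass", "calcification"
    modifiers: dict = field(default_factory=dict)    # e.g., {"shape": "oval"}
    location: str = ""                               # e.g., "left breast"
    negated: bool = False                            # True if the finding is ruled out

# Example frame for the sentence
# "There is an oval circumscribed mass in the left breast."
frame = LesionFrame(
    observation="mass",
    modifiers={"shape": "oval", "margin": "circumscribed"},
    location="left breast",
)
print(frame)
```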
Bozkurt and team noted that one way to reduce variation in reports is to standardize their vocabulary, much as the adoption of the Breast Imaging-Reporting and Data System (BI-RADS) did for mammography.
The research team noted that NLP techniques for information extraction could acquire the structured information from reports needed to provide the inputs to decision support systems.
“The goal of this work is to develop and evaluate NLP methods to extract information on lesions and their corresponding imaging features (imaging observations with their modifiers) from free-text mammography reports, with the ultimate goal of providing inputs to [decision support systems] to guide radiology practice and to reduce variability in mammography interpretations,” Bozkurt and colleagues wrote.
A number of NLP systems have been developed previously for use in clinical records. These systems annotate syntactic structure, perform named entity recognition, map spans of text to concepts from a controlled vocabulary or ontology, and identify the negation context of named entities.
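Negation detection in clinical text, for example, is commonly handled with trigger phrases in the style of the NegEx algorithm. A bare-bones sketch of that general idea, not any particular system's implementation:

```python
# A few common negation triggers; real systems (e.g., NegEx) use a much larger list.
NEGATION_TRIGGERS = ["no evidence of", "no ", "without", "absence of", "negative for"]

def is_negated(sentence: str, entity: str, window: int = 40) -> bool:
    """Return True if a negation trigger appears shortly before the entity mention."""
    s = sentence.lower()
    idx = s.find(entity.lower())
    if idx == -1:
        return False
    # Look for a trigger within a fixed character window preceding the mention.
    preceding = s[max(0, idx - window):idx]
    return any(trigger in preceding for trigger in NEGATION_TRIGGERS)

print(is_negated("No evidence of suspicious mass or calcification.", "mass"))  # True
print(is_negated("There is an oval mass in the left breast.", "mass"))         # False
```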
“Our work tackles the challenge of recognizing mentions of breast lesions in mammography reports, extracting their relationships, and associating the extracted information with the respective lesions,” researchers wrote.
The team’s information extraction task focused on recognizing three types of named entities drawn from BI-RADS controlled terminology: imaging observations, their characteristics (modifiers) and their anatomic locations.
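As a rough illustration of what recognition against a controlled terminology can look like, here is a toy dictionary lookup; the lexicon below is a small BI-RADS-flavored sample, not the study's actual terminology or method:

```python
# Toy lexicon keyed by entity type; the real BI-RADS terminology is far larger.
LEXICON = {
    "imaging_observation": ["mass", "calcification", "asymmetry", "architectural distortion"],
    "characteristic": ["oval", "round", "irregular", "circumscribed", "spiculated"],
    "location": ["left breast", "right breast", "upper outer quadrant", "subareolar"],
}

def recognize_entities(text: str):
    """Return (entity_type, term, start_offset) tuples found in the text."""
    found = []
    lowered = text.lower()
    for entity_type, terms in LEXICON.items():
        for term in terms:
            start = lowered.find(term)
            if start != -1:
                found.append((entity_type, term, start))
    return found

sentence = "Irregular spiculated mass in the upper outer quadrant of the left breast."
for entity_type, term, start in recognize_entities(sentence):
    print(f"{entity_type:>20}: {term!r} at offset {start}")
```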
The team tested their NLP method by using a set of 300 reports containing a total of 797 reported lesions. In 102 of the reports, an expert mammographer noted a deficiency in the structured data entries in the reporting application database (where cases were duplicated, missed or deficient in reporting).
Only the structured data entries determined by the expert mammographer were used in assessing the performance of the team’s NLP system.
The research team’s NLP system detected 815 lesions. Of these, 780 were true positives, 35 were false positives, and 17 were false negatives (lesions reported in the original set of 300 reports that were not detected by the system).
There were 57 partially matched cases among the detected lesions. In addition, there were 12 cases where calcification type was not detected correctly and 8 cases where breast density was not detected.
For the goal of perfect-match information frame extraction, the team’s NLP system extracted mentions of imaging observations with their modifiers at a precision of 94.9 percent, a recall of 90.9 percent, and an F measure of 92.8.
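These metrics follow directly from true positive, false positive and false negative counts. A quick illustration; note that the published figures reflect the paper's stricter perfect-match scoring, so they do not fall straight out of the raw lesion counts above:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard information-extraction metrics, returned as percentages."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1

# Raw lesion-detection counts from the study: 780 TP, 35 FP, 17 FN.
# These yield detection-level scores; the published 94.9/90.9/92.8 figures
# apply the stricter perfect-match criterion for complete information frames.
p, r, f = precision_recall_f1(tp=780, fp=35, fn=17)
print(f"precision={p:.1f}%  recall={r:.1f}%  F={f:.1f}%")
```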
“Our NLP system extracts each imaging observation and its characteristics from mammography reports,” Bozkurt and colleagues wrote. “Although our application focuses on the domain of mammography, we believe our approach can generalize to other domains and may narrow the gap between unstructured clinical report text and structured information extraction needed for data mining and decision support.”