Radiologists use structured CT reports to create a database of metastatic cancer patterns
Natural language processing can extract imaging findings from CT reports and map patterns of disease spread in patients with cancer. That information could improve treatment and clinical decision-making, experts said in a new analysis.
Medical imaging, particularly CT, is the "workhorse" of cancer imaging, and structured reports contain mounds of potentially insightful information. Manually extracting these data, however, is labor-intensive and requires expert training, oncology researchers explained Tuesday in Radiology.
So the team turned to natural language processing for help. Three rads manually curated a sample of more than 2,200 CT reports, labeling each for the presence or absence of metastases in 13 different organs. Three separate NLP models were then tested on nearly 400,000 reports from a single institution.
The top model notched accuracies between 90% and 99% across all organs. And the resulting database of metastatic disease labels may go a long way toward advancing cancer research, the authors noted.
“Such a database may one day aid investigators interested in pursuing correlative studies with genomics data, associating specific mutational profiles with metastatic phenotypes,” Richard K. G. Do, MD, of Memorial Sloan Kettering Cancer Center’s Department of Radiology and co-authors added. “Finally, a large database of metastatic labels may one day serve as a training tool for machine learning efforts to detect and measure disease on CT images directly.”
The retrospective study included patients who underwent CT between July 2009 and April 2019, yielding 387,359 reports. In these chest, abdomen and pelvis CT reports, abdominopelvic nodes were most frequently reported to have metastatic disease (23.6% of reports), followed by thoracic nodes (17.6%), lungs (14.7%), liver (13.7%) and bones (9.9%).
Overall, the NLP approach produced a database covering more than 90,000 patients. And this initial set may be just the beginning, Do et al. noted.
“With larger annotation sets, better sampling of organs with less frequent metastatic disease, continual advances in NLP, and new developments in language models and machine learning techniques, we expect continued improvements in the accuracy of labeling metastatic disease in a greater range of radiology reports in the near future, including MRI and PET/CT,” the authors concluded.