‘New frontier’: Hybrid tool eliminates time, cost of extracting clinical data from screening reports

A newly developed technology eliminates much of the time and cost required to manually extract data from colonoscopy screening and pathology reports, according to research published in the May issue of Gastrointestinal Endoscopy.

More than 15 million colonoscopies were performed in 2012 alone, with such screenings leading to lower colorectal cancer incidence and mortality. And while evidence suggests that exam quality may affect how effective screening is, the information needed to assess quality is buried in free-form reports and difficult to obtain.

A new technique combining optical character recognition with natural language processing proved capable of easily extracting such information. And it may be a game changer for colorectal cancer research and care.

“A process which was previously expensive and time-consuming can now potentially be done accurately in a time- and labor-efficient manner,” study author Maged Rizk, MD, a gastroenterologist and associate director for the Cleveland Clinic Medicare Accountable Care Organization, and colleagues wrote in the study.

For their work, Rizk and co-investigators gathered a random list of outpatient screening colonoscopy procedures and pathology reports from patients treated at the Cleveland Clinic and the University of Minnesota. Two experts manually reviewed reports for relevant clinical information and then used the hybrid OCR/NLP tool to gather the same data points from electronic health records.

The algorithm performed remarkably well at retrieving clinical variables, extracting information from image-formatted reports with 95% accuracy. It proved proficient at detecting polyps (95.8% accuracy), adenomas (98.5%), sessile serrated polyps (99.3%), and advanced adenomas (98%), among other findings.
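The article does not say how the accuracy figures were calculated. As a rough illustration only, the sketch below assumes accuracy means the share of reports where the tool's value for a clinical variable matched the expert's manual annotation. The function name, variable names, and the four toy reports are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def per_variable_accuracy(
    manual: List[Dict[str, bool]],
    extracted: List[Dict[str, bool]],
) -> Dict[str, float]:
    """Share of reports where the tool agreed with the manual annotation.

    Assumption: each report is a dict mapping a clinical variable to
    True/False, and both lists describe the same reports in the same order.
    """
    if len(manual) != len(extracted):
        raise ValueError("manual and extracted must cover the same reports")
    if not manual:
        return {}
    variables = manual[0].keys()
    return {
        var: sum(m[var] == e.get(var) for m, e in zip(manual, extracted))
        / len(manual)
        for var in variables
    }


# Made-up toy data: four reports, two variables (not study data).
manual_annotations = [
    {"polyp": True, "adenoma": True},
    {"polyp": True, "adenoma": False},
    {"polyp": False, "adenoma": False},
    {"polyp": True, "adenoma": True},
]
tool_output = [
    {"polyp": True, "adenoma": True},
    {"polyp": True, "adenoma": False},
    {"polyp": True, "adenoma": False},  # tool disagrees with the expert here
    {"polyp": True, "adenoma": True},
]

accuracy = per_variable_accuracy(manual_annotations, tool_output)
for variable, value in accuracy.items():
    print(f"{variable}: {value:.1%}")
```

In this toy case the tool agrees with the expert on three of four reports for polyps and on all four for adenomas. A real evaluation would likely also report sensitivity and specificity, which this sketch leaves out.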

A comparison of information gathered using natural language processing alone with that collected via the hybrid approach showed that nearly all clinical variables were identified with 99% accuracy.

Unstructured reporting notes proved to be the source of most disagreements between manually annotated data and that collected via the novel tool.

Future studies will involve many other healthcare centers to further validate the hybrid tool.

“The results of this proof-of-concept study create a new frontier in the use of large-scale data extraction from scanned reports, which was previously limited by lack of appropriate technology,” Rizk added.


Matt joined Chicago’s TriMed team in 2018 covering all areas of health imaging after two years reporting on the hospital field. He holds a bachelor’s in English from UIC, and enjoys a good cup of coffee and an interesting documentary.
