Generative AI model shows potential for reading chest X-rays used for TB screenings
Generative artificial intelligence (AI) models have shown great potential for improving multiple aspects of the radiology field, but a new analysis cautions that they still require significant oversight.
Generative AI is among the many emerging technologies experts believe can address radiologists’ growing workloads. Studies have shown these models can help streamline the reporting process, improve the communication of actionable findings and help patients better understand their reports. The models can also detect certain findings on radiologic images, which has prompted interest in how their growing capabilities could benefit patients in low-resource areas.
Recently, researchers sought to determine how well a generative AI model could detect tuberculosis-associated abnormalities on chest radiographs used for TB screening, a tool that remains crucial in high-prevalence settings.
“Chest radiographs play a crucial role in tuberculosis screening in high-prevalence regions, although widespread radiographic screening requires expertise that may be unavailable in settings with limited medical resources,” Eun Kyoung Hong, MD, PhD, with Mass General Brigham in Boston, and colleagues noted.
Using data from two public TB screening datasets, the group trained a generative AI model to create free-text reports for chest radiographs. The model was tasked with labeling each radiograph for the presence or absence and the laterality of tuberculosis-related abnormalities, while two reader radiologists independently interpreted the images and then indicated whether they would accept the model’s report as written. Two additional radiologists determined the reference standard.
Based on the reference standard, just over half of the 800 radiographs contained TB-related abnormalities. The AI-generated reports achieved a sensitivity, specificity and accuracy of 95.2%, 86.7% and 90.8%, respectively. The readers’ performance was slightly higher than the model’s, and their localization measures were significantly better. For radiographs the AI deemed normal and abnormal, respectively, the first reader indicated they would have accepted 91.7% and 52.4% of the model’s reports, while the second reader found 83.2% and 37.0% acceptable.
Though the team believes the model has potential, they suggest it will require improvements before clinical deployment is a viable option.
“While AI-generated reports may augment radiologists' diagnostic assessments, the current model requires human oversight given inferior standalone performance,” the group concluded.