ImpressionGPT, a ChatGPT-based framework, can accurately summarize radiology reports
A group of computer science researchers has developed its own ChatGPT-based framework for summarizing radiology reports.
ImpressionGPT leverages the in-context learning abilities of large language models to generate report summaries using domain-specific, individualized radiology data. This approach could reduce report errors while also saving radiologists’ time, experts involved in the model’s development suggested.
“The 'impression' section of a radiology report is a critical basis for communication between radiologists and other physicians, and it is typically written by radiologists based on the 'findings' section. However, writing numerous impressions can be laborious and error-prone for radiologists,” noted Chong Ma, of the School of Automation at Northwestern Polytechnical University in China, and co-authors of the new study.
The group built on ChatGPT’s capabilities using an iterative optimization method that gave them more precise results than those of OpenAI’s chatbot, which has been known to sometimes provide inaccurate data when prompted to answer medical questions. Similarity search algorithms were used to develop dynamic prompts that incorporate data from existing radiology reports, which then guide the LLM in generating impressions from imaging findings. This is known as prompt engineering, the authors explained.
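As a rough illustration of the idea, a dynamic prompt can be assembled by retrieving the reports most similar to the current findings and presenting them as few-shot examples. The sketch below is hypothetical: it substitutes a simple word-overlap measure for the study's actual similarity search, and the function names, prompt wording, and toy corpus are all illustrative, not taken from the paper.

```python
import re

def similarity(a: str, b: str) -> float:
    """Jaccard word overlap -- a crude stand-in for a real similarity search."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def build_dynamic_prompt(findings: str, corpus: list[dict], k: int = 2) -> str:
    """Fold the k most similar prior reports into a few-shot prompt."""
    ranked = sorted(corpus, key=lambda r: similarity(findings, r["findings"]),
                    reverse=True)
    parts = ["Summarize the findings into an impression."]
    for ex in ranked[:k]:
        parts.append(f"Findings: {ex['findings']}\nImpression: {ex['impression']}")
    # The new case goes last, leaving the impression for the LLM to complete.
    parts.append(f"Findings: {findings}\nImpression:")
    return "\n\n".join(parts)

# Toy corpus of prior reports (illustrative, not real clinical data).
corpus = [
    {"findings": "Mild cardiomegaly. No focal consolidation.",
     "impression": "Mild cardiomegaly."},
    {"findings": "Clear lungs. Normal heart size.",
     "impression": "No acute disease."},
]
prompt = build_dynamic_prompt("Heart size mildly enlarged. Lungs clear.",
                              corpus, k=1)
```

The resulting prompt pairs the new findings with the most relevant prior example, so the model sees domain-specific context before writing its impression.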
“In general, prompt engineering is a new paradigm in the field of natural language processing, and although still in its early stages, has provided valuable insights into effective prompt patterns,” the authors wrote. “These patterns provide a wealth of inspiration, highlighting the importance of designing prompts to provide value beyond simple text or code generation.”
Using dynamic prompts, the group observed a significant improvement in the quality of ChatGPT’s generated responses. By iteratively refining the small number of example responses presented to the LLM, the group was able to generate accurate impressions while also teaching the model to avoid “bad responses” written in similar styles.
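The iterative step described above can be sketched as a simple feedback loop: responses that score well are kept as positive examples, while poor ones are fed back as negative examples the model is told to avoid. This is a hypothetical outline under stated assumptions; the `generate` and `score` functions stand in for the chatbot call and the study's evaluation metric, and none of the names come from the paper.

```python
def refine_prompt(base_prompt: str, good: list[str], bad: list[str]) -> str:
    """Fold good and bad prior responses back into the prompt."""
    parts = [base_prompt]
    if good:
        parts.append("Write in the style of these good impressions:\n"
                     + "\n".join(good))
    if bad:
        parts.append("Avoid the style of these bad impressions:\n"
                     + "\n".join(bad))
    return "\n\n".join(parts)

def iterative_optimize(base_prompt, generate, score,
                       threshold=0.8, max_rounds=3):
    """Regenerate until a response clears the quality threshold."""
    good, bad = [], []
    prompt = base_prompt
    for _ in range(max_rounds):
        response = generate(prompt)
        if score(response) >= threshold:
            good.append(response)
            break
        bad.append(response)
        prompt = refine_prompt(base_prompt, good, bad)
    return prompt, good, bad

# Toy stand-ins: a "model" that improves on the second try, and a scorer
# that rewards concise impressions (both purely illustrative).
responses = iter(["An overly long and rambling draft impression",
                  "Mild cardiomegaly."])
generate = lambda prompt: next(responses)
score = lambda r: 1.0 if len(r.split()) <= 3 else 0.0

final_prompt, good, bad = iterative_optimize(
    "Summarize the findings into an impression.", generate, score)
```

Each round tightens the prompt rather than retraining the model, which is what makes the approach practical on top of a closed API like ChatGPT's.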
On comparative analysis, ImpressionGPT outperformed several other LLMs used to assist with radiology report summarization.
“Our approach has demonstrated state-of-the-art results, surpassing existing methods that employ large volumes of medical text data for pre-training,” the group wrote. “Furthermore, this work is a precursor to the development of other domain-specific language models in the current context of artificial general intelligence.”
In the future, the team intends to continue optimizing its prompt design and to begin exploring privacy and data security concerns as well.
The study’s details are available here.