Open-source, local large language model can read radiology reports while respecting privacy
As potential applications of large language models (LLMs) in healthcare expand, researchers are trying to understand how well these systems protect patient privacy.
Research from the National Institutes of Health (NIH) has explored the viability of using LLMs to extract valuable information from radiology reports while maintaining stringent patient privacy standards. The study was conducted with a locally run LLM that does not send data over the internet, and the results show that, in the right context, such models can perform diagnostic analysis while preserving patient anonymity. The results are published in Radiology. [1]
Despite a steady stream of research on LLMs in healthcare, HIPAA rules and privacy concerns have kept the models out of most real-world patient care settings. However, locally run, open-source variants, such as the Vicuna-13B model used in this study, may alleviate many of those concerns.
“ChatGPT and GPT-4 are proprietary models that require the user to send data to OpenAI sources for processing, which would require de-identifying patient data,” study author Ronald M. Summers, MD, PhD, from the NIH said in a statement. “Removing all patient health information is labor-intensive and infeasible for large sets of reports.”
To test the security and viability of Vicuna-13B, the research team instructed the LLM to label critical findings in chest radiography reports from two robust datasets: the NIH’s own database and the Medical Information Mart for Intensive Care (MIMIC) database, a publicly accessible repository of de-identified health records from a critical care setting.
In total, the study utilized 25,596 chest X-ray reports from the NIH and another 3,269 from MIMIC. From these reports, researchers instructed the Vicuna LLM to identify and label the presence or absence of 13 specific findings. It was given two different prompts for the task, allowing the researchers to better understand how best to use the model.
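The study’s exact prompts are not reproduced in this article, but a minimal sketch of what such locally run labeling might look like follows, assuming the Hugging Face transformers and accelerate packages and the publicly released lmsys/vicuna-13b-v1.5 checkpoint; the prompt wording, the finding subset, and the label_report helper are illustrative assumptions, not the study’s own code.

```python
# Minimal sketch: labeling a report with a locally run Vicuna-13B.
# Assumes the Hugging Face `transformers` and `accelerate` packages and the
# public lmsys/vicuna-13b-v1.5 checkpoint (roughly 26 GB of weights).
# The prompt wording and finding list are illustrative, not the study's own.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "lmsys/vicuna-13b-v1.5"  # runs entirely offline once downloaded

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

FINDINGS = ["atelectasis", "cardiomegaly", "pleural effusion"]  # illustrative subset

def label_report(report_text: str) -> str:
    """Ask the local model to mark each finding present (1) or absent (0)."""
    prompt = (
        "You are labeling a chest radiograph report.\n"
        f"Report: {report_text}\n"
        "For each of these findings, answer 1 if present, 0 if absent: "
        f"{', '.join(FINDINGS)}"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

The key design point is that the report text never leaves the machine: the weights are downloaded once, and inference happens entirely locally.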
The study authors then compared the LLM’s performance with that of two widely used non-LLM labeling tools. While the first prompt produced inconsistent results, with the second prompt Vicuna-13B exhibited, on average, “moderate to substantial agreement” with the non-LLM labelers on both the NIH and MIMIC datasets, according to the statement released on the findings.
Additionally, again with the second prompt, Vicuna-13B performed on par with both labelers on nine of 11 findings, as indicated by a median area under the receiver operating characteristic curve (AUC) of 0.84.
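For context, “moderate to substantial agreement” and AUC correspond to standard metrics that can be computed with scikit-learn; the toy sketch below uses placeholder label arrays, not the study’s data.

```python
# Toy sketch of the evaluation metrics mentioned above, using scikit-learn.
# The label arrays are hypothetical placeholders, not study data.
from sklearn.metrics import cohen_kappa_score, roc_auc_score

llm_labels  = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical per-report LLM labels
tool_labels = [1, 0, 1, 0, 0, 0, 1, 0]  # hypothetical non-LLM labeler output
reference   = [1, 0, 1, 1, 0, 1, 1, 0]  # hypothetical reference standard

# Inter-labeler agreement: on the commonly used Landis-Koch scale,
# kappa of 0.41-0.60 is "moderate" and 0.61-0.80 is "substantial".
print("kappa vs. non-LLM tool:", cohen_kappa_score(llm_labels, tool_labels))

# Discrimination for one finding against the reference standard.
print("AUC:", roc_auc_score(reference, llm_labels))
```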
Potential to go beyond reading chest radiographs
The study findings suggest that locally run LLMs present a viable option for extracting essential data from radiology reports, and they may even be more accurate than existing non-LLM labeling tools. As for the privacy-preserving element, that is inherent to free, open-source LLMs run locally, since they neither transmit nor store personal health information off-site.
Summers added that LLMs could also be useful in creating large datasets for AI research without compromising patient privacy: “LLMs that are free, privacy-preserving, and available for local use are game changers,” he said. “They're really allowing us to do things that we weren't able to do before.”
Summers believes this research opens the door for LLMs to extract important information from other text-based radiology reports and medical records, not just chest X-ray reports. They could be a useful tool for identifying disease biomarkers and supporting clinical decision-making.
“My lab has been focusing on extracting features from diagnostic images,” he said. “With tools like Vicuna, we can extract features from the text and combine them with features from images for input into sophisticated AI models that may be able to answer clinical questions.”