Generative AI May Help Flag Social Determinants of Health in EHR Data

Fine-tuned large language models were able to accurately identify 93.8 percent of patients with adverse SDOH who could benefit from additional support.

Researchers from Mass General Brigham have found that specialized large language models (LLMs) can identify under-documented social determinants of health (SDOH) in electronic health records (EHRs), according to a study published this week in npj Digital Medicine.

SDOH have a significant impact on patient outcomes, but the research team underscored that SDOH documentation in clinical notes is often incomplete or missing. As a result, extracting this information to provide additional support to patients can be a challenge, leading the researchers to investigate generative AI’s potential in this area.

“Our goal is to identify patients who could benefit from resource and social work support, and draw attention to the under-documented impact of social factors in health outcomes,” said corresponding author Danielle Bitterman, MD, a faculty member in the Artificial Intelligence in Medicine (AIM) Program at Mass General Brigham and a physician in the Department of Radiation Oncology at Brigham and Women’s Hospital, in a news release detailing the study.

“Algorithms that can pass major medical exams have received a lot of attention, but this is not what doctors need in the clinic to help take better care of patients each day. Algorithms that can notice things that doctors may miss in the ever-increasing volume of medical records will be more clinically relevant and therefore more powerful for improving health,” Bitterman continued.

The research team noted that clinicians often summarize SDOH information in their visit notes, but these data are rarely systematically organized in EHRs.

To address this, the researchers sought to fine-tune LLMs for SDOH data extraction. They began by manually reviewing 800 clinical notes from 770 cancer patients who received radiotherapy at the Department of Radiation Oncology at Brigham and Women’s Hospital.

During this process, the research team flagged sentences that referred to one or more of six SDOH categories: employment status, housing, transportation, parental status, relationships, and the presence or absence of social support.

The annotated data were then used to fine-tune existing LLMs to identify clinician references to SDOH.
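As a rough illustration of this sentence-level classification step, the sketch below fine-tunes a generic pretrained transformer on annotated sentences using the Hugging Face transformers library. The base model, example sentences, and hyperparameters are placeholders rather than the study's actual configuration, and the task is shown as single-label classification for brevity even though a sentence can reference more than one SDOH category.

```python
# Minimal sketch: fine-tuning a pretrained transformer to flag SDOH mentions
# in clinical-note sentences. Model choice, example data, and hyperparameters
# are illustrative assumptions, not the study's actual setup.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

LABELS = ["employment", "housing", "transportation",
          "parental_status", "relationships", "social_support"]

# Hypothetical annotated sentences mapped to a single SDOH category each.
train_data = Dataset.from_dict({
    "text": ["Patient lives alone and has no family nearby.",
             "She is unable to drive to her appointments."],
    "label": [LABELS.index("social_support"), LABELS.index("transportation")],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sdoh_classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_data,
)
trainer.train()
```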

Each model was then tested on clinical notes from two additional cohorts: 400 patients who received immunotherapy at Dana-Farber Cancer Institute and patients admitted to critical care units at Beth Israel Deaconess Medical Center.

This testing revealed that fine-tuned LLMs can accurately and consistently identify SDOH references in EHRs. Official diagnostic codes captured these data in only about two percent of cases, whereas the fine-tuned models flagged 93.8 percent of patients with adverse SDOH.

However, the “learning capacity” of the LLMs was limited by the relative rarity of SDOH documentation in the training data, as only three percent of sentences in clinician notes referenced social determinants. The researchers addressed this by leveraging ChatGPT to generate 900 additional synthetic samples of SDOH references that could be utilized for extra model training.
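The augmentation step could look something like the sketch below, which prompts an OpenAI chat model to produce de-identified synthetic sentences for a given SDOH category. The prompt wording, model name, and helper function are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: generating synthetic SDOH sentences to augment scarce
# training data. Prompt wording and model choice are assumptions made for
# illustration; the study's prompting strategy may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def synthesize_sdoh_sentences(category: str, n: int = 10) -> list[str]:
    """Ask the model for n de-identified clinical-note sentences that
    reference the given SDOH category, one sentence per line."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (f"Write {n} short, de-identified sentences a clinician "
                        f"might include in a visit note that reference a "
                        f"patient's {category}. Return one sentence per line."),
        }],
    )
    return response.choices[0].message.content.splitlines()

synthetic_housing = synthesize_sdoh_sentences("housing instability", n=5)
```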

The specialized LLMs were also less prone to bias than generalist models such as GPT-4. The research team found that their fine-tuned LLMs were significantly less likely to alter their determinations based on a patient’s race, ethnicity, or gender.
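One common way to probe this kind of sensitivity, sketched below as a simple counterfactual test rather than the study's exact method, is to swap demographic descriptors in otherwise identical sentences and measure how often a model's prediction changes.

```python
# Minimal sketch: a counterfactual check of demographic sensitivity. The idea
# (an assumption for illustration, not the study's protocol) is to vary
# race/ethnicity and gender in otherwise identical sentences and count how
# often the predicted SDOH label flips.
from itertools import product

DEMOGRAPHICS = ["White", "Black", "Hispanic", "Asian"]
GENDERS = ["man", "woman"]

TEMPLATE = ("The patient is a 54-year-old {race} {gender} "
            "who lost their job last month.")

def prediction_flip_rate(classify) -> float:
    """classify(text) -> predicted SDOH label; returns the fraction of
    demographic variants whose label differs from the majority label."""
    variants = [TEMPLATE.format(race=r, gender=g)
                for r, g in product(DEMOGRAPHICS, GENDERS)]
    labels = [classify(v) for v in variants]
    majority = max(set(labels), key=labels.count)
    return sum(label != majority for label in labels) / len(labels)
```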

Despite this, the research team underscored that it is difficult to understand the origins of algorithmic bias, so more research is needed.

“If we don’t monitor algorithmic bias when we develop and implement large language models, we could make existing health disparities much worse than they currently are,” Bitterman said. “This study demonstrated that fine-tuning LMs may be a strategy to reduce algorithmic bias, but more research is needed in this area.”

This is just one potential use case for generative AI in healthcare that researchers are exploring.

In June, a research team from Beth Israel Deaconess Medical Center (BIDMC) found that generative AI tools like ChatGPT have significant potential to assist clinicians with complex diagnostic cases.

To evaluate the accuracy of these tools, the researchers tested GPT-4’s performance on 70 complex diagnostic reasoning challenges, tasking the model with providing potential diagnoses for each case.

The final diagnosis was included in the model’s differential in a majority of cases, and its top diagnosis matched the final diagnosis in just under 40 percent of cases. The study had multiple limitations, but the research team noted that it plays an important role in expanding the literature exploring the promise of healthcare AI.
