Gen AI patient portal messages OK, but need human review
Using generative AI to craft patient portal messages can boost patient education, but human review will ensure patient safety.
More evidence is emerging that generative AI can lessen clinician burden by drafting patient portal message responses, but only if a clinician reviews each response before it is sent, according to a group of researchers from Mass General Brigham.
The researchers, writing in The Lancet Digital Health, said clinicians are still needed to add specific instructions for patients within portal responses.
“Generative AI has the potential to provide a ‘best of both worlds’ scenario of reducing burden on the clinician and better educating the patient in the process,” corresponding author Danielle Bitterman, MD, a faculty member in the Artificial Intelligence in Medicine (AIM) Program at Mass General Brigham, said in a public statement.
Indeed, patient portal messages can take up a significant amount of time for clinicians, who are already steeped in other administrative duties. A separate 2023 analysis in JAMIA showed that patient portal messages skyrocketed during the pandemic. Each patient portal message was tied to a 2.3-minute increase in EHR time each day, the study showed.
The Mass General Brigham study indicated that generative AI and large language models (LLMs) can help ease the problem of overflowing EHR inboxes. Much like ChatGPT, an LLM can draft a response to patient queries sent through the patient portal, taking that duty off the plates of time-pressed providers. Some EHR vendors are already testing this approach.
But there are some caveats, said Bitterman, who is also a physician in the radiation oncology department at Brigham and Women’s Hospital.
“However, based on our team’s experience working with LLMs, we have concerns about the potential risks associated with integrating LLMs into messaging systems,” according to Bitterman. “With LLM-integration into EHRs becoming increasingly common, our goal in this study was to identify relevant benefits and shortcomings.”
The researchers used GPT-4 from OpenAI to test how LLM-drafted messages stacked up against provider-written responses to 100 hypothetical patient queries. A group of six radiation oncologists was tasked with answering the queries manually, and those responses were compared with the AI-generated responses after the latter were reviewed and edited by radiation oncologists.
Overall, the AI responses were effective. The physician reviewers thought the AI responses had been crafted by a human 31 percent of the time, and the AI responses needed no human editing in 58 percent of cases. Reviewers also said the AI responses typically included more detailed patient education content.
Still, AI responses need a human review before being sent to patients, the radiation oncologists advised. For one thing, the AI responses did not typically offer directive advice for patients; those instructions needed to be added by a clinician.
And even though the LLM-generated responses were deemed safe in 82.1 percent of cases, when they weren’t appropriate for patients, the consequences could be disastrous, the researchers said.
In 7.1 percent of the LLM-generated responses, there was a patient safety risk. In 0.6 percent of cases, the LLM response could pose a risk of death. In most cases, this was because the message did not urgently instruct the patient to seek medical care.
As healthcare continues its exploration of LLMs and how they can augment care, it is crucial that leaders not overlook the technology's potential to reduce administrative burden. After all, answering patient portal messages takes up a lot of provider time, and LLMs may let clinicians offload at least some of that work.
But it will be necessary to address any potential patient safety pitfalls that can come with delegating certain tasks to generative AI, Bitterman stated.
“Keeping a human in the loop is an essential safety step when it comes to using AI in medicine, but it isn’t a single solution,” Bitterman explained. “As providers rely more on LLMs, we could miss errors that could lead to patient harm. This study demonstrates the need for systems to monitor the quality of LLMs, training for clinicians to appropriately supervise LLM output, more AI literacy for both patients and clinicians, and on a fundamental level, a better understanding of how to address the errors that LLMs make.”