ChatGPT, Provider Responses Almost Indistinguishable to Patients

Researchers investigating the utility of chatbots for patient-provider communication found that laypeople may trust ChatGPT to answer low-risk health questions.

Researchers from New York University’s (NYU) Grossman School of Medicine and Tandon School of Engineering demonstrated that ChatGPT’s responses to healthcare-related queries were nearly indistinguishable from those provided by human clinicians, according to a study published in JMIR Medical Education last week.

The research team noted that artificial intelligence (AI)-powered chatbots are currently being piloted to help draft responses to patients’ queries, but neither the general population’s ability to distinguish between chatbot and provider responses nor patients’ trust in chatbots for this purpose has been well established.

To address this, the researchers looked at the feasibility of using ChatGPT and similar AI-based chatbots to support patient-provider communication.

The research team conducted a United States-based survey of 392 adults recruited through Prolific, a crowdsourcing platform for academic studies. Each participant was presented with ten non-administrative patient questions extracted from electronic health records (EHRs), which had been answered by either a provider or the chatbot.

For those responses generated by ChatGPT, the tool was tasked with answering the patient query using approximately the same word count as the human provider’s response.
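As an illustration of how such length-matched responses might be generated, the sketch below uses the OpenAI Python SDK to ask a model to answer a patient question in roughly the same number of words as the clinician’s reply. The model name, prompt wording, and helper function are illustrative assumptions only; the study’s exact prompting setup may differ.

```python
# Illustrative sketch only -- not the study's actual protocol.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

def chatbot_reply(patient_question: str, provider_reply: str) -> str:
    """Ask the model to answer a patient question using roughly the
    same word count as the human provider's response."""
    target_words = len(provider_reply.split())
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "You are responding to a patient portal message."},
            {"role": "user",
             "content": (f"Answer the following patient question in "
                         f"approximately {target_words} words:\n"
                         f"{patient_question}")},
        ],
    )
    return response.choices[0].message.content
```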

Participants were informed that of the ten query/response interactions, five were answered by a provider and the remaining five by the chatbot. Participants were then asked, and financially incentivized, to correctly identify the source of each response.

Participants were also asked to rate their trust in chatbots’ use in various patient-provider communication tasks using a Likert scale from one to five.

The researchers found that participants had limited ability to correctly distinguish between provider and chatbot responses. The cohort accurately identified chatbot responses in 65.5 percent of cases and provider responses in 65.1 percent of cases on average.

These results were consistent across demographic categories, though accuracy varied from 49.0 to 85.7 percent depending on the specific question presented in the survey.

In terms of patients’ trust in chatbots, the research team demonstrated that participants’ responses were weakly positive, with the cohort rating their trust in these tools at 3.4 out of five on average.

However, trust varied with the health-related complexity of the task. For logistical questions, such as insurance queries and appointment scheduling, trust was highest at 3.94 out of five. Preventive care tasks, such as those related to cancer screening and vaccines, received an average score of 3.52.

Diagnostic and treatment advice received the lowest average trust ratings, at 2.90 and 2.89 out of five, respectively.

These findings indicate that laypeople appear to trust chatbots to respond to low-risk health queries, and the researchers suggested that these tools may prove useful for administrative tasks and common chronic disease management in future patient-provider communication.

Despite this, the research team stated that more studies in this area are needed. Further, they cautioned that providers should exercise critical judgment when leveraging chatbot-generated advice, as these tools have limitations and biases.

This study highlights how researchers are investigating the potential utility of AI chatbots in healthcare, particularly in patient education and health literacy.

Last month, a research team demonstrated that ChatGPT may be able to consistently provide evidence-based responses to public health questions. However, the tool primarily provided advice, rather than referring users to health-related resources.

When compared against previously established benchmarks for tools like Apple Siri, Amazon Alexa, Microsoft’s Cortana, Samsung’s Bixby, and Google Assistant, ChatGPT performed significantly better across queries in four public health domains.

For all 23 questions, ChatGPT outperformed these other tools.
