ChatGPT Provides Evidence-Based Answers to Public Health Questions
ChatGPT may consistently provide evidence-based responses to public health queries, but the tool primarily offers advice rather than referrals to resources.
In a recent study published in JAMA Network Open, researchers found that ChatGPT consistently provided evidence-based responses to public health questions but often gave advice rather than directing users to health-related resources.
The research team indicated that artificial intelligence (AI) assistants, like Google Assistant and Amazon Alexa, have the potential to bolster public health by providing accurate and actionable information to the general population, a task made more difficult by COVID-19 misinformation and eroding trust in public health agencies.
However, these AI assistants often fail to recognize basic public health inquiries or to answer them adequately. Many laypeople instead rely on web-based resources like Google Search, but such tools return multiple results for a given query and leave the user to synthesize the information on their own.
AI assistants built on recent advancements in large language models (LLMs), such as ChatGPT, may be able to address these challenges, the researchers posited.
To test this, the research team structured their study to replicate existing research evaluating other AI assistants, establishing benchmarks against which ChatGPT’s performance could be compared.
Then, ChatGPT was tasked with responding to 23 questions phrased in a common help-seeking structure, such as ‘I am smoking; can you help me quit?’, to mimic how a layperson might form a public health query. Questions were grouped into four domains: physical health, mental health, addiction, and interpersonal violence.
Each question was asked in a separate ChatGPT session to avoid carrying bias over from previous exchanges. Two reviewers, blinded to each other’s ratings, then evaluated the answers on three criteria: whether ChatGPT responded to the question, whether the response was evidence-based, and whether it referred the user to an appropriate public health resource.
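As a rough illustration, a one-question-per-session protocol like this could be scripted against the OpenAI API as sketched below. This is not how the study was conducted (the authors queried ChatGPT directly), and the model name, question list, and helper function are assumptions for illustration only.

```python
# Illustrative sketch only: the study queried ChatGPT itself, not the API.
# The model name and question list below are assumptions, not study materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTIONS = [
    "I am smoking; can you help me quit?",  # example phrasing from the study
    # ...the remaining help-seeking questions across the four domains
]

def ask_in_fresh_session(question: str) -> str:
    """Send a single question with no prior conversation history,
    mirroring the study's one-question-per-session design."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model, for illustration
        messages=[{"role": "user", "content": question}],  # no carried-over context
    )
    return response.choices[0].message.content

answers = {q: ask_in_fresh_session(q) for q in QUESTIONS}
```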
The researchers also recorded the word count and reading level of each response.
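The paper does not tie these metrics to a particular tool; as one plausible way to compute them, the sketch below pairs a simple word count with the Flesch-Kincaid grade level from the Python textstat package (both choices are assumptions, not the study’s method).

```python
# Illustrative sketch: word count and reading level for a single response.
# Flesch-Kincaid is an assumed stand-in; the study does not specify its metric.
import textstat  # pip install textstat

def summarize_response(text: str) -> dict:
    return {
        "word_count": len(text.split()),                     # whitespace token count
        "grade_level": textstat.flesch_kincaid_grade(text),  # approximate US grade level
    }

print(summarize_response("Quitting smoking is hard, but a plan and support help."))
```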
ChatGPT recognized and responded to all 23 questions across the four public health domains, and 21 of its responses were determined to be evidence-based. Responses averaged 225 words, with reading levels ranging from 9th to 16th grade.
However, only five of the tool’s responses referred the user to specific public health resources: two related to addiction, two to interpersonal violence, and one to mental health.
Referenced resources included the Substance Abuse and Mental Health Services Administration National Helpline, Alcoholics Anonymous, The National Child Abuse Hotline, The National Domestic Violence Hotline, The National Sexual Assault Hotline, and The National Suicide Prevention Hotline.
When ChatGPT’s results were compared against benchmarks from evaluations of other AI assistants, the research team found that it outperformed these tools.
When given the same addiction questions, the benchmark research showed that Apple Siri, Amazon Alexa, Microsoft Cortana, Samsung Bixby, and Google Assistant collectively recognized only 5 percent of the questions and made just one referral.
In contrast, ChatGPT achieved 91 percent recognition and two referrals in the addiction domain.
The researchers noted that their study is limited in that it relies on an abridged sample of questions using standardized language, which may not reflect how the general public actually seeks public health information. Further, ChatGPT’s responses are probabilistic, and the model is continually refined, so answers could vary over time and across users.
Despite this, the research team concluded that ChatGPT and similar AI assistants hold promise in this area, but that they bear a greater responsibility to provide accurate, actionable responses because of their single-response design.
Taking this into account, the researchers recommended that public health organizations collaborate with AI companies to better promote public health resources.
As one example of what this could look like, public health stakeholders could disseminate a database of recommended resources that AI companies, which lack expertise in this area, could use to fine-tune responses to public health inquiries.
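The study describes this idea only at a high level. The sketch below shows one hypothetical way a curated resource list could steer responses, by injecting vetted referrals into a system prompt rather than through fine-tuning proper; the domain mapping and prompt wording are assumptions, with resource names drawn from those ChatGPT cited in the study.

```python
# Hypothetical sketch: steer an assistant toward vetted referrals by injecting a
# curated resource database into the system prompt. The study proposes such a
# database only in outline; this mapping and approach are illustrative assumptions.
VETTED_RESOURCES = {
    "addiction": "SAMHSA National Helpline; Alcoholics Anonymous",
    "interpersonal violence": "The National Domestic Violence Hotline",
    "mental health": "The National Suicide Prevention Hotline",
}

def build_system_prompt(domain: str) -> str:
    """Compose a system prompt that nudges the model toward vetted referrals."""
    prompt = "You are a public health assistant. Give evidence-based answers."
    resource = VETTED_RESOURCES.get(domain)
    if resource:
        prompt += f" When relevant, refer the user to: {resource}."
    return prompt

print(build_system_prompt("addiction"))
```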
The research team also proposed new regulations that would limit liability for AI companies implementing these recommendations, to encourage their adoption.