
How Can AI Chatbots Help Docs Tailor Patient Education?

AI chatbots powered by large language models still might miss the mark on delivering patient education, but data shows they can help doctors tailor information.

Three studies presented at the 2024 Annual Meeting of the American Academy of Orthopaedic Surgeons (AAOS) are casting doubt on AI chatbots’ and large language models’ ability to accurately answer patient questions.

However, one of the studies did indicate that the tools could help providers identify patients’ frequently asked questions and use them to tailor patient education strategies.

Large language models have had a staggering effect on the healthcare industry over the past year. The technology, which powers chatbots like OpenAI’s ChatGPT, Google Bard, and BingAI, has the potential to augment how providers deliver medicine and how patients seek out health information. For example, patients could use LLM-powered chatbots to learn more about a medical condition or procedure.

But across the three studies, researchers indicated that orthopaedists are still the best sources for this kind of information, with chatbots demonstrating serious limitations.

The first study compared ChatGPT, Google Bard, and BingAI, prompting each chatbot to answer 45 orthopaedic-related questions spanning categories from “Bone Physiology” to “Patient Query.” Two reviewers scored the responses on a four-point scale, assessing them for accuracy, completeness, and usability.
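For readers curious about the mechanics, an evaluation like this can be scripted by sending each question to a chatbot’s API and logging the responses for human reviewers to score. The sketch below is a minimal, hypothetical version in Python; the model name, question wording, and CSV review workflow are illustrative assumptions, not details taken from the study.

```python
# Hypothetical sketch: batch-query a chat model with a fixed question set,
# then export responses so human reviewers can score each one for
# accuracy, completeness, and usability.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "What is the role of osteoblasts in bone remodeling?",
    "When can I safely bear weight after an ankle fracture?",
    # ... remaining questions in the evaluation set
]

with open("responses_for_review.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Reviewers fill in the last column on the study's four-point scale.
    writer.writerow(["question", "response", "score_1_to_4"])
    for q in questions:
        resp = client.chat.completions.create(
            model="gpt-4",  # illustrative model choice
            messages=[{"role": "user", "content": q}],
        )
        writer.writerow([q, resp.choices[0].message.content, ""])
```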

ChatGPT was the most successful, providing correct answers that covered critical points for 76.7 percent of queries. Google Bard performed second-best, giving correct answers for 33 percent of queries, while BingAI did so for 16.7 percent of queries.

But even the best-performing chatbot had limitations. For example, ChatGPT and Google Bard failed to elicit patient history when answering what the researchers considered less complex patient queries. All three chatbots deviated from standards of care and key steps in workup when giving information about care management.

In an analysis of the chatbots’ citations, the researchers found that all three oversampled from a small number of references.

A separate group of researchers tested a list of 80 commonly asked patient questions, this time about knee and hip replacements. The study focused specifically on ChatGPT and had two surgeons score the answers on a scale of one to four.

About a quarter (26 percent) of responses had an average score of three, meaning they were partially accurate but incomplete. Performance was better when ChatGPT was prompted to answer questions “as an orthopaedic surgeon,” the researchers said, but the results still indicated that orthopaedists are best suited to direct patient education.
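That persona-style prompting is typically done with a system message that sets a role before the patient’s question is asked. Below is a minimal, hypothetical sketch of the approach using OpenAI’s Python SDK; the prompt wording and model here are assumptions for illustration, not the study’s actual setup.

```python
# Hypothetical sketch of "as an orthopaedic surgeon" role prompting:
# a system message sets the persona, the user message carries the question.
from openai import OpenAI

client = OpenAI()

def ask_as_surgeon(question: str) -> str:
    """Ask the model to answer in the voice of an orthopaedic surgeon."""
    resp = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an orthopaedic surgeon answering a patient's "
                    "question about knee and hip replacement."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask_as_surgeon("How long is recovery after a total knee replacement?"))
```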

One study did suggest that ChatGPT could play a role in clinician-led patient education, finding that the tool could help determine which questions are top of mind for patients.

In the third study, a group of researchers compared ChatGPT against their own Google search for identifying the top 10 frequently asked questions about the Latarjet procedure, along with associated sources. ChatGPT pointed to clinical sources 100 percent of the time, while Google mostly offered up surgeons’ personal websites rather than clinical information.

What’s more, ChatGPT’s FAQ list was more expansive. While both ChatGPT and Google churned up potential FAQs about the procedure’s technical details, ChatGPT added FAQs about potential risks or complications, the recovery timeline, and evaluation of the surgery.
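A clinician wanting to replicate this FAQ-surfacing use case could do so with a single prompt. The sketch below is a hypothetical example; the exact prompt the researchers used is not given in the article, so the wording and model are assumptions.

```python
# Hypothetical sketch: ask a chat model to surface patient FAQs about a
# procedure, with the kind of source each answer should draw on.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": (
            "List the 10 questions patients most frequently ask about the "
            "Latarjet procedure, and for each, name the type of source the "
            "answer should come from (peer-reviewed journal, society "
            "guideline, or patient education material)."
        ),
    }],
)
print(resp.choices[0].message.content)
```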

These three studies contribute additional context to healthcare’s overall generative AI conversation. In terms of patient-facing AI chatbots, the question of medical misinformation and the validity of advice is paramount.

As it stands, patient trust in AI chatbots and generative AI is middling, with only around half of patients saying they trust the tools. The risk of consuming medical misinformation is top of mind for consumers, but most say they could come to embrace the technology as they learn more about AI in healthcare.
