
GPT-4 Matches Ophthalmologists in Glaucoma, Retina Management

Mount Sinai researchers found that the large language model GPT-4 can match, and in some cases outperform, human specialists in the management of glaucoma and retinal disease.

Researchers from the New York Eye and Ear Infirmary of Mount Sinai (NYEE) demonstrated that OpenAI’s Generative Pre-trained Transformer 4 (GPT-4) can match, or in some cases outperform, ophthalmologists in the diagnosis and management of glaucoma and retina disorders.

The study, published this week in JAMA Ophthalmology, sought to assess the utility of large language models (LLMs) in ophthalmic subspecialties by determining whether one such tool could provide complete, accurate responses when compared with human specialists.

The research team began by recruiting 15 participants, including 12 attending physicians and three senior trainees, from the Department of Ophthalmology at the Icahn School of Medicine at Mount Sinai. The participants and GPT-4 were then asked 20 common patient questions from a set of retina- and glaucoma-related queries provided by the American Academy of Ophthalmology.
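The article does not describe the study's exact prompting setup, but a minimal sketch of posing one such patient question to GPT-4 through OpenAI's Python SDK might look like the following. The question text, system prompt, and parameters here are illustrative assumptions, not details from the study.

```python
# Minimal sketch: posing a patient-style ophthalmology question to GPT-4.
# The prompt wording and parameters are illustrative assumptions; the
# study's actual protocol is not described in the article.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "What are the treatment options for open-angle glaucoma?"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are an ophthalmology specialist "
                                      "answering common patient questions."},
        {"role": "user", "content": question},
    ],
    temperature=0,  # reduce response variability across repeated runs
)

print(response.choices[0].message.content)
```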

A set of 20 de-identified glaucoma and retinal cases from Mount Sinai–affiliated clinics was also randomly selected for the analysis.

Participant and chatbot responses were evaluated for completeness and accuracy using a Likert scale.
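As a rough sketch of what that comparison step could look like, graders might assign each response a Likert rating for accuracy and completeness, then compare the two groups. The 1-to-5 range, the example scores, and the use of a Mann-Whitney U test below are all assumptions; the article specifies neither the scale's range nor the statistical test used.

```python
# Hypothetical comparison of Likert ratings (1 = worst, 5 = best) given to
# GPT-4 and to human specialists on the same set of questions. The scores
# below are made up for illustration; they are not data from the study.
from statistics import mean
from scipy.stats import mannwhitneyu

gpt4_scores = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]
specialist_scores = [4, 4, 5, 4, 4, 4, 5, 4, 4, 4]

print(f"GPT-4 mean rating:      {mean(gpt4_scores):.2f}")
print(f"Specialist mean rating: {mean(specialist_scores):.2f}")

# Mann-Whitney U is a common choice for ordinal Likert data, since it does
# not assume the ratings are normally distributed.
stat, p_value = mannwhitneyu(gpt4_scores, specialist_scores,
                             alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.3f}")
```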

Overall, the accuracy and completeness of the LLM’s responses matched or exceeded those of the ophthalmologists. On retina questions, the tool matched the specialists in accuracy but exceeded them in completeness. On glaucoma questions, GPT-4 outperformed the human specialists on both metrics.

“The performance of GPT-4 in our study was quite eye-opening,” said lead author Andy Huang, MD, an ophthalmology resident at NYEE, in a press release. “We recognized the enormous potential of this [artificial intelligence] system from the moment we started testing it and were fascinated to observe that GPT-4 could not only assist but, in some cases, match or exceed the expertise of seasoned ophthalmic specialists.”

“AI was particularly surprising in its proficiency in handling both glaucoma and retina patient cases, matching the accuracy and completeness of diagnoses and treatment suggestions made by human doctors in a clinical note format,” explained senior author Louis R. Pasquale, MD, FARVO, deputy chair for Ophthalmology Research for the Department of Ophthalmology. “Just as the AI application Grammarly can teach us how to be better writers, GPT-4 can give us valuable guidance on how to be better clinicians, especially in terms of how we document findings of patient exams.”

The researchers underscored that further testing is needed, but noted that the study findings highlight the significant potential of LLM technology in ophthalmology.

“It could serve as a reliable assistant to eye specialists by providing diagnostic support and potentially easing their workload, especially in complex cases or areas of high patient volume,” Huang stated. “For patients, the integration of AI into mainstream ophthalmic practice could result in quicker access to expert advice, coupled with more informed decision-making to guide their treatment.”

These results shed light on just one possible application of AI in ophthalmology, but researchers are exploring a variety of tools and use cases.

Last month, researchers from Johns Hopkins Children’s Center reported that an autonomous AI tool designed to screen for diabetic eye disease in youths can also improve screening uptake.

Eye diseases like diabetic retinopathy affect a significant number of children and adolescents with type 1 and type 2 diabetes, but barriers to screening can prevent timely detection and treatment. These screening gaps are particularly large among minoritized and low-income young people, contributing to health inequities and worse patient outcomes.

The research team had found in previous work that AI could successfully be used to diagnose diabetic eye conditions, and hypothesized that the tool could also increase the likelihood of screening completion.

Their findings supported this hypothesis, showing that patients who underwent AI-assisted screening were far more likely to complete their follow-up than patients who received standard screenings.
