ChatGPT Accurately Answers Ophthalmic Knowledge Assessment Questions

ChatGPT-4 responded correctly to 84 percent of multiple-choice questions used to prepare for the American Board of Ophthalmology certification exam.

In a research letter published last week in JAMA Ophthalmology, researchers demonstrated that ChatGPT-4 provided correct responses to 84 percent of a sample of multiple-choice questions from OphthoQuestions, a practice question bank commonly used by trainees preparing for ophthalmology board certification exams.

The research is an update of a previous investigation evaluating an earlier version of ChatGPT on its ability to answer 125 text-based multiple-choice questions provided by the OphthoQuestions free trial. In that previous work, the study authors found that the chatbot answered 46 percent of questions correctly, suggesting that the tool may be unable to provide substantial assistance to those preparing for the American Board of Ophthalmology certification exam.

However, the chatbot has since been updated with the release of ChatGPT-4 in March 2023. Its developers tout this version as having broader general knowledge and stronger problem-solving abilities than its predecessors, with improved performance across a range of tasks.

To evaluate whether ChatGPT-4 could achieve higher performance than the prior iteration used in the April 2023 research, the study authors tasked the updated version of the tool with answering the same 125 practice questions for the Ophthalmic Knowledge Assessment Program (OKAP) and Written Qualifying Exam (WQE) tests from the free OphthoQuestions trial.

This evaluation relied on the same methodology used in the previous work, meaning that the tool was presented with a consecutive sample of text-based multiple-choice questions.

The chatbot’s performance was measured in terms of the number of questions it answered correctly, the proportion of questions for which it provided additional explanations, the mean length of questions and responses provided, and the performance of the tool in answering questions without multiple-choice options.

The researchers also recorded the proportion of ophthalmology trainees using the OphthoQuestions trial who selected the same answers as ChatGPT.
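For illustration, the outcome measures described above are straightforward to compute once the model's answers have been collected. The Python sketch below shows how such an evaluation could be scripted against the OpenAI API; note that this is an assumption-laden illustration, not the authors' protocol, which evaluated the ChatGPT chatbot directly. The question records, helper functions, and parsing heuristics here are hypothetical placeholders rather than actual OphthoQuestions content.

```python
# Hypothetical sketch: the study queried the ChatGPT chatbot directly, not
# the API, and these question records are invented placeholders.
import re

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_model(question_text: str) -> str:
    """Send one practice question to GPT-4 and return the full reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question_text}],
    )
    return response.choices[0].message.content


def extract_choice(reply: str) -> str | None:
    """Pull the first stand-alone option letter (A-E) from the reply."""
    match = re.search(r"\b([A-E])\b", reply)
    return match.group(1) if match else None


# Each record pairs a question with its keyed answer and the option most
# often chosen by trainees (all values here are placeholders).
questions = [
    {
        "text": "Sample stem...\nA. ...\nB. ...\nC. ...\nD. ...",
        "correct": "B",
        "trainee_choice": "B",
    },
]

n = len(questions)
n_correct = n_explained = n_agree = total_words = 0
for q in questions:
    reply = ask_model(q["text"])
    chosen = extract_choice(reply)
    n_correct += chosen == q["correct"]
    n_agree += chosen == q["trainee_choice"]
    # Crude proxy: any reply longer than a bare option letter counts as
    # having offered an explanation alongside the answer.
    n_explained += len(reply.split()) > 1
    total_words += len(reply.split())

print(f"Accuracy: {n_correct / n:.0%}")
print(f"Explanations provided: {n_explained / n:.0%}")
print(f"Agreement with trainees: {n_agree / n:.0%}")
print(f"Mean reply length: {total_words / n:.1f} words")
```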

The chatbot was asked to answer these questions on March 18, 2023, and it correctly answered 84 percent of the multiple-choice queries, significantly outperforming the earlier version of the tool.

ChatGPT-4 also provided correct responses to 100 percent of the questions about general medicine, retina and vitreous, and uveitis. In the previous study, the tool achieved 79 percent accuracy in general medicine but answered no questions correctly in the retina and vitreous category.

ChatGPT-4’s performance in clinical optics reached 62 percent accuracy, and the tool provided explanations for 98 percent of questions.

On average, ophthalmology trainees chose the same response to multiple-choice questions as the chatbot 71 percent of the time, up from 44 percent in the April 2023 research. In addition, ChatGPT-4 responded correctly to 63 percent of stand-alone questions with the multiple-choice options removed.

Despite ChatGPT-4’s improvements over its predecessor, the research team cautioned that the study has several limitations. They noted that OphthoQuestions offers preparation materials for the OKAP and WQE rather than official exam content, meaning that ChatGPT may perform differently on the actual examinations.

Further, the chatbot is designed to provide unique responses to users’ queries, which could lead to the tool responding differently if the experiment described here were to be repeated. The researchers also noted that their previous study may have helped train the chatbot used in this work, and that the results of the present research must be considered within the context of the study date, as ChatGPT’s body of knowledge continues to expand.

This work reflects growing interest in ChatGPT’s applications in healthcare and mounting indications that the tool may transform the field.

As highlighted in this study, researchers have recently been exploring the chatbot’s utility in medical education. Recent work in the area demonstrates that ChatGPT can answer competency-based medical education (CBME) questions on microbiology and pass the United States Medical Licensing Exam (USMLE).

However, the tool must be further studied and refined, as other research shows ChatGPT failing both the American College of Gastroenterology’s self-assessment tests and the American Urological Association’s Self-Assessment Study Program exam.
