
ChatGPT shows potential for clinical knowledge review

ChatGPT could help clinicians more effectively review medical literature by prioritizing and summarizing research abstracts from journals related to their specialties.

University of Kansas Medical Center researchers have demonstrated that ChatGPT could help clinicians keep up with ever-growing medical knowledge by prioritizing and summarizing journal abstracts.

The research team emphasized that global clinical knowledge is expanding at a pace that makes it difficult for clinicians to stay abreast of new medical literature and practice guidelines.

"There are about a million new articles added to PubMed every year," explained Daniel Parente, MD, Ph.D., assistant professor of family medicine and community health at the University of Kansas Medical Center, in a press release. "Even if you're a physician restricting your focus to your field, it can still be many thousands of articles you might think about reading."

The study authors further noted that on top of sifting through the literature to find articles relevant to their fields, clinicians must then review each article. Article abstracts can help streamline this process, but reviewing those abstracts -- many of which run around 300 words -- can be time-consuming as well.


Given recent developments in AI technology, the researchers set out to investigate whether a large language model (LLM) could be used by clinicians to systematically review medical literature.

The team selected ChatGPT-3.5 and tasked the tool with summarizing 140 peer-reviewed abstracts from 14 journals. To assess the LLM's performance, human physicians were asked to rate the quality, accuracy and bias of the ChatGPT-generated summaries.

From there, the researchers compared how well both ChatGPT and the clinicians could rate the relevance of each journal and abstract to particular medical specialties.
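For readers curious about the mechanics, a workflow like the one described could, in principle, be driven by a short script that sends each abstract to a GPT-3.5-class model and saves the response for physician review. The sketch below is purely illustrative and is not the study's actual code; the function name, prompt wording and "gpt-3.5-turbo" model string are assumptions for the example, using the OpenAI Python SDK.

# Illustrative sketch, not the study's code: condensing one journal abstract
# with a GPT-3.5-class model through the OpenAI Python SDK. The prompt
# wording, function name and model string are assumptions for this example.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

def summarize_abstract(abstract_text: str) -> str:
    """Return a condensed summary of a single journal abstract."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT-3.5 model in the study
        messages=[
            {"role": "system",
             "content": "You summarize medical journal abstracts for busy clinicians."},
            {"role": "user",
             "content": "Summarize this abstract in a few sentences:\n\n" + abstract_text},
        ],
    )
    return response.choices[0].message.content

Looping a call like this over the 140 abstracts and storing each output alongside its original would produce the paired texts that human raters could then score for quality, accuracy and bias.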

The analysis revealed that the LLM's summaries were, on average, 70% shorter than the original abstracts, cutting the average length from 2,438 to 739 characters. The human raters also judged ChatGPT's summaries to be generally high-quality and accurate, with low bias.

However, the LLM was found to have hallucinated in four of the 140 cases, and the raters identified 20 minor inaccuracies. The researchers determined that these inaccuracies did not change the meaning of the original abstracts.

ChatGPT was less successful at identifying relevance. The LLM performed similarly to the clinicians when classifying whether an entire journal was relevant to a given specialty, but the model fell short when asked to do the same with individual articles.

"We asked the human (physician) raters to say, is this relevant to primary care or internal medicine or surgery? And then we compared to ChatGPT's relevance ratings, and we found that at least the ChatGPT-3.5 model is not quite ready to do that yet. It works well at identifying if a journal is relevant to primary care, but it's not great for identifying if an article is relevant to primary care," Parente noted.

These findings led the researchers to conclude that the use of ChatGPT in healthcare could help family physicians streamline their literature review process, and the team designed software for this purpose during the study. However, the authors underscored that critical medical decisions should still be made based on thorough evaluations of full-text research and clinical guidelines.

The researchers also indicated that newer versions of ChatGPT are likely to be better at determining the relevance of scientific articles.

"This study shows us that these tools already have some ability to help us review the literature a little bit faster, as well as figure out where we need to focus our attention," said Parente. "And it seems very likely that future versions of these technologies that are smarter and more capable will only enhance that."

Shania Kennedy has been covering news related to health IT and analytics since 2022.
