ChatGPT Accurately Selects Breast Imaging Tests, Aids Decision-Making

Researchers have demonstrated ChatGPT’s potential in aiding clinical decision-making for breast cancer screening and breast pain imaging tests.

Researchers from Mass General Brigham have shown that ChatGPT and other large language models (LLMs) may be able to accurately identify appropriate imaging tests for certain clinical presentations, such as breast pain and breast cancer screening, giving these tools the potential to support clinical decision-making.

The study, published last week in the Journal of the American College of Radiology, compared ChatGPT 3.5 and 4’s capacity for radiologic clinical decision support in a breast imaging pilot. The researchers noted that studies evaluating LLMs’ ability to support clinical decision-making are currently lacking, particularly in radiology.

To narrow this gap, the research team tasked ChatGPT with helping them choose appropriate imaging tests for a group of 21 fictional patient scenarios involving reported breast pain or the need for breast cancer screening.

ChatGPT’s responses were then compared to the American College of Radiology's (ACR) Appropriateness Criteria for these clinical presentations. The press release indicates that these guidelines are used by radiologists to determine which test would be most appropriate based on the patient’s symptoms and medical history.

ChatGPT’s performance was measured based on how the tool responded to open-ended and ‘select all that apply’ (SATA) formatted prompts given by the researchers. The tool’s responses were scored according to whether the imaging modalities recommended were in line with ACR guidelines.
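
The study's exact scoring procedure is not reproduced here, but a SATA percentage can be thought of as how much of a scenario's option list the model classifies in line with ACR guidance. Below is a minimal sketch assuming a simple set-based rule; the sata_score function, the modality options, and the ACR-appropriate set shown are illustrative assumptions, not the study's actual data or code.

```python
# Hypothetical sketch of SATA-style scoring: compare the modalities a model
# selects against the set deemed appropriate by ACR guidance for a scenario.
# The scenario, modality names, and scoring rule are illustrative assumptions.

def sata_score(selected: set[str], acr_appropriate: set[str], all_options: set[str]) -> float:
    """Fraction of options the model classifies correctly (selected vs. not selected)."""
    correct = sum(
        1 for option in all_options
        if (option in selected) == (option in acr_appropriate)
    )
    return correct / len(all_options)

# Made-up option list for a single screening scenario.
options = {"mammography", "breast MRI", "breast ultrasound", "CT chest"}
acr = {"mammography"}                          # hypothetical ACR-appropriate set
model_choice = {"mammography", "breast MRI"}   # hypothetical model response

print(f"SATA score: {sata_score(model_choice, acr, options):.1%}")  # 75.0%
```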

Both ChatGPT 3.5 and ChatGPT 4 achieved high performance, but ChatGPT 4 significantly outperformed ChatGPT 3.5.

For breast cancer screening prompts, both versions of ChatGPT achieved an average score of 1.830 out of 2 on open-ended questions. For SATA prompts, ChatGPT 3.5 suggested the correct imaging tests an average of 88.9 percent of the time, while ChatGPT 4 scored 98.4 percent.

For breast pain, ChatGPT 3.5 achieved an average score of 1.125 on open-ended prompts, compared to ChatGPT 4’s 1.666. ChatGPT 3.5 also achieved a SATA score of 58.3 percent, while ChatGPT 4 scored 77.7 percent.

These findings suggest that LLMs may have the potential to help primary care providers and referring clinicians choose the most appropriate imaging tests for their patients, the researchers stated.

"In this scenario, ChatGPT's abilities were impressive," said corresponding author Marc D. Succi, MD, associate chair of Innovation and Commercialization at Mass General Brigham Radiology, in the press release. "I see it acting like a bridge between the referring healthcare professional and the expert radiologist — stepping in as a trained consultant to recommend the right imaging test at the point of care, without delay. This could reduce administrative time on both referring and consulting physicians in making these evidence-backed decisions, optimize workflow, reduce burnout, and reduce patient confusion and wait times."

Despite the potential of LLMs in this area, though, the research team highlighted that these tools are purely assistive technologies.

"This study doesn't compare ChatGPT to existing radiologists because the existing gold standard is actually a set of guidelines from the American College of Radiology, which is the comparison we performed,” Succi explained. “This is purely an additive study, so we are not arguing that the AI is better than your doctor at choosing an imaging test but can be an excellent adjunct to optimize a doctor’s time on non-interpretive tasks."

Further, the researchers stated that any models implemented in the clinical setting would need to be extensively evaluated for privacy- and bias-related concerns. The research team also indicated that, for a model to be successful in this area, it would need to be fine-tuned with data from hospitals and research institutions to tailor it to specific patient populations.

This is the latest study investigating how ChatGPT and other LLMs may transform healthcare.

Last week, researchers showed that ChatGPT can consistently provide evidence-based answers to public health queries, but the tool often offered advice to users, rather than referrals to resources.

The research team posited that artificial intelligence (AI) assistants, such as Amazon Alexa and Google Assistant, may have the potential to support public health by providing the public with actionable and accurate information.

However, such tools can fail to adequately answer, or even recognize, public health questions, which led the researchers to evaluate ChatGPT instead. In that comparison, ChatGPT outperformed these tools across multiple domains.

Despite this, the tool’s reliance on providing advice instead of referring users to public health resources led the research team to suggest that public health organizations work with AI companies to promote these resources.
