Researchers Investigate When, How Healthcare AI Models Will Fail
Researchers are exploring strategies to identify when an AI model recommends a clinical decision that isn’t effective in practice.
Researchers from the Carle Illinois College of Medicine (CI MED) at the University of Illinois are investigating when and how medical artificial intelligence (AI) and machine learning (ML) models will fail or not perform as expected in an effort to improve these models.
Many healthcare AI models perform well in certain settings but can experience drops in performance once they are deployed elsewhere, the researchers explained.
“Every domain in health care is using machine learning in one way or another, and so they’re becoming the mainstay of computational diagnostics and prognostics in healthcare,” said Yogatheesan Varatharajah, PhD, a research assistant professor in the Department of Bioengineering at the University of Illinois at Urbana-Champaign, in the news release. “The problem is that when we do studies based on machine learning — to develop a diagnostic tool, for example — we run the models, and then we say, okay, the model performs well in a limited test setting and therefore it's good to go. But when we actually deploy it in the real world to make clinical decisions in real time, many of these approaches don't work as expected.”
According to the news release, natural variability between the data used to build a model and the data it encounters after deployment is one of the main drivers of these differences in performance. That variability may stem from differences in the protocols or hardware used to collect the data, or from differences between the patient populations represented in the datasets.
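As a rough illustration of this kind of check (not the researchers' actual method), a deployment site can compare the distribution of a simple EEG-derived feature against the training cohort before trusting a model's output. The feature, cohorts, and threshold below are hypothetical placeholders; the sketch assumes NumPy and SciPy are available.

```python
# Illustrative sketch only: flag a potential distribution shift between
# training-time and deployment-time values of a single EEG-derived feature.
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_shift(train_features: np.ndarray,
                         deploy_features: np.ndarray,
                         alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test on a scalar feature
    (e.g., per-recording alpha-band power) from two cohorts."""
    statistic, p_value = ks_2samp(train_features, deploy_features)
    # True -> the two distributions differ; investigate before relying on the model.
    return p_value < alpha

# Synthetic example: deployment hardware introduces a gain offset.
rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=2.0, size=500)    # training-site feature values
deploy = rng.normal(loc=12.5, scale=2.0, size=200)   # deployment-site feature values
print(detect_feature_shift(train, deploy))           # likely True: shift detected
```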
Regardless of where these differences come from, they can cause significant shifts in model performance and predictions, worsening patient care.
“If we can identify those differences ahead of time, then we may be able to develop some additional tools to prevent those failures or at least know that these models are going to fail in certain scenarios,” Varatharajah said.
The research team’s most recent work in this area, presented at the Conference on Neural Information Processing Systems (NeurIPS), focuses on identifying those differences within the context of electroencephalogram (EEG)-based ML models.
The researchers began by evaluating models built on EEG recordings and other electrophysiological data collected from patients with neurological diseases. They then examined each model’s clinically relevant applications, such as distinguishing normal from abnormal EEGs.
“We looked at what kind of variability can occur in the real world, especially those variabilities which could cause problems to machine learning models,” said Varatharajah. “And then we modeled those variabilities and developed some ‘diagnostic’ measures to diagnose the models themselves, to know when and how they are going to fail. As a result, we can be aware of these errors and take steps to mitigate them ahead of time, so the models are actually able to help clinicians with clinical decision making.”
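One simple way to picture this kind of model "diagnosis" (again, an illustration rather than the team's published approach) is to simulate a plausible deployment-time variability, such as a baseline offset introduced by different recording hardware, and measure how much a trained classifier's accuracy degrades. The model, features, and offsets below are synthetic stand-ins, assuming scikit-learn is installed.

```python
# Hedged illustration: stress-test a trained classifier under a simulated
# deployment-time shift (an additive baseline offset on the input features).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

# Synthetic stand-in for EEG-derived features (rows = recordings).
X_train = rng.normal(size=(400, 8))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

X_test = rng.normal(size=(200, 8))
y_test = (X_test[:, 0] + 0.5 * X_test[:, 1] > 0).astype(int)

for offset in (0.0, 0.5, 1.0):  # simulated baseline/electrode offsets
    acc = accuracy_score(y_test, model.predict(X_test + offset))
    print(f"offset +{offset}: accuracy {acc:.2f}")  # accuracy drops as the shift grows
```

A diagnostic like this only flags sensitivity to a shift the analyst thought to simulate; the broader point of the researchers’ work is to characterize such variabilities systematically before deployment.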
The research team suggested that this work may help clinicians make better decisions about patient care while using healthcare AI by bridging the gap between large-scale study findings and factors related to smaller or local patient populations.
"The significance of this work lies in identifying the disconnect between data that AI models are trained on, compared to the real-world scenarios that they interact with when they are deployed in hospitals," said Sam Rawal, co-author of the research and an MD student at CI MED, in the press release. "Being able to identify such scenarios in the real world, where models may fail or perform unexpectedly, can help guide their deployment and ensure they are being utilized in a safe and effective manner."