
Auditing Framework Provides Insights into ‘Black Box’ Medical AI

‘Black box’ AI models pose a challenge for healthcare applications, but a new framework may help users understand the tools’ decision-making processes.

Researchers from Stanford University and the University of Washington have developed an auditing framework designed to shed light on the ‘black box’ decision-making processes of healthcare artificial intelligence (AI) models.

The ‘black box’ problem is a persistent issue in which users of an AI tool cannot see how it makes decisions. Because the system’s inner workings are invisible, users often find it harder to trust and accept the model’s outputs.

This lack of trust is a major barrier to AI implementation in healthcare, leading stakeholders to push for increased explainability in the tools.

To that end, the researchers set out to develop an auditing approach for these models to reveal their inference processes.

The framework uses human expertise and generative AI to assess classifiers, the algorithms that sort data inputs into categories.

To test the approach, the research team studied five dermatology AI classifiers by tasking them with characterizing images of skin lesions as either “likely benign” or “likely malignant.” From there, trained generative AI models paired with each classifier were used to create modified lesion images designed to appear either “more benign” or “more malignant” to that classifier.
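The study paired each classifier with its own trained generative model to produce these counterfactual images. As a rough illustration of the underlying idea only, the sketch below uses a simpler gradient-based perturbation (not the authors’ method) to nudge a hypothetical binary lesion classifier toward a “more malignant” prediction; the LesionClassifier class, the input image, and the parameter values are illustrative assumptions, not details from the study.

# Hedged sketch: NOT the study's actual generative approach. A plain
# gradient-based perturbation that pushes a binary skin-lesion classifier
# toward a chosen class, illustrating how counterfactual images can reveal
# which features drive a prediction. `LesionClassifier` is a hypothetical
# pretrained model (index 0 = "likely benign", index 1 = "likely malignant").
import torch
import torch.nn.functional as F

def make_counterfactual(classifier, image, target_class, steps=50, lr=0.01):
    # image: float tensor of shape (1, 3, H, W) with values in [0, 1]
    counterfactual = image.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([counterfactual], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        optimizer.zero_grad()
        logits = classifier(counterfactual)
        # Lower the loss for the target class, i.e. make the image look
        # "more malignant" (or "more benign") to this classifier.
        loss = F.cross_entropy(logits, target)
        loss.backward()
        optimizer.step()
        # Keep pixel values in a valid range.
        with torch.no_grad():
            counterfactual.clamp_(0.0, 1.0)

    return counterfactual.detach()

# Hypothetical usage:
# classifier = LesionClassifier(); classifier.eval()
# more_malignant = make_counterfactual(classifier, lesion_image, target_class=1)

Comparing the original and modified images side by side is what lets human experts see which features (for example, pigmentation patterns versus background texture or color balance) the classifier is actually responding to.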

Then, two dermatologists evaluated the images to determine which image features had the most significant impact on the classifiers’ decision-making.

In doing so, the researchers found that the classifiers use both undesirable features – like background skin texture and color balance – and features leveraged by human dermatologists – like lesional pigmentation patterns.

“It could be that the training set for a particular dermatology AI classifier contained a very high number of images of true, biopsy-confirmed melanomas that happened to appear on hairy skin, so the classifier has made an incorrect association between melanoma likelihood and hairiness,” explained senior study co-author Roxana Daneshjou, MD, PhD, an assistant professor of biomedical data science and dermatology at Stanford University, in the news release.

The research team indicated that these insights into model decision-making could help developers determine whether their AI is relying on spurious correlations in datasets, which may help resolve these issues prior to deployment in healthcare settings.

Addressing the ‘black box’ problem in dermatology AI is also a major priority for researchers as more of these tools become available to consumers in Apple and Android app stores.

“These direct-to-consumer apps are concerning because consumers really don’t know what they’re getting at this point,” said Daneshjou. “Understanding AI algorithms’ decisions will be valuable for showing if these models are making decisions based on clinically important features.”

The research team further underscored that explainable AI approaches are key to boosting the tools’ accuracy and users’ confidence.

“It’s important that medical AI classifiers receive proper vetting by interrogating their reasoning processes and making them as understandable as possible to human users and developers,” Daneshjou stated. “If fully realized, AI has the power to transform certain areas of medicine and improve patient outcomes.”

Concerns around the ‘black box’ problem in healthcare AI often come alongside worries about clinicians becoming dependent on these tools.

In a November interview with HealthITAnalytics, experts from Sentara Healthcare and UC San Diego Health discussed the challenges of ‘black box’ AI, how to combat these issues, and how healthcare organizations should approach questions of clinician over-reliance on AI tools.
