Researchers to Propose Framework to Address Federated Learning Challenges
Researchers are proposing an artificial intelligence architecture that addresses common federated learning challenges in the biomedical space.
A research team from the University of Southern California (USC) Viterbi School of Engineering is proposing a novel architecture to address challenges that frequently arise when federated learning (FL) is applied to biomedical research.
FL is a machine-learning (ML) approach that lets researchers train algorithms without exchanging datasets, which helps protect patient privacy.
Traditional model training typically pools data in a central repository and shares it with research teams for algorithm development, which makes it harder to share health data securely. To address this, FL keeps local data samples on decentralized servers, and algorithms are trained independently on-site. The locally trained models are then sent to a central server, where they are combined into a master model.
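The combining step at the central server can be illustrated with a short Python sketch. It assumes a sample-weighted average of model weights, a common FL aggregation rule; the article does not say which rule MetisFL or FLINT uses, and the function name and hospital values here are hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Merge per-site model weights into one model, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one site must contribute samples")
    dim = len(updates[0][0])
    merged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            merged[i] += w * n / total
    return merged


# Hypothetical (weights, sample_count) pairs from three hospitals.
# Only these weights leave each site; the patient records never do.
site_updates = [([0.2, 1.0], 100), ([0.4, 2.0], 300), ([0.6, 3.0], 100)]
global_model = federated_average(site_updates)
```

Weighting by sample count means a hospital with more scans has proportionally more influence on the master model, which is one reasonable design choice among several.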
Despite these benefits, FL creates unique challenges for researchers in the biomedical space. In a presentation at the 2023 International Workshop on Health Intelligence this week, USC researchers will propose a framework, known as Federated Learning INTegration (FLINT), to combat three common challenges: the ‘learning task,’ schema and data harmonization, and missing values.
The ‘learning task’ refers to an FL model’s ability to take the data or information given and make predictions based on what it has learned from that data, the press release stated.
The researchers used their work in neuroimaging tasks to illustrate how FLINT can help models successfully perform the learning task. Leveraging MetisFL, an FL platform the team developed with support from the National Institutes of Health (NIH) and the US Defense Advanced Research Projects Agency (DARPA), the researchers set out to predict the age of a human brain from a structural MRI scan. The difference between the chronological and predicted age can be used to forecast neurological disease risk.
“You have MRI scans distributed across hospitals and you want to analyze what is the difference between the true chronological and the predictive chronological age of the subject. Because the larger the difference between those two values is, the greater the risk of developing a neurological disease. It’s an indicator – or biomarker – of a neurological disease,” explained Dimitris Stripelis, a PhD candidate in the USC Department of Computer Science and co-author of the work, in the press release.
The learning task, in this case, asks: can the model accurately predict the Brain Age Gap Estimation (BrainAGE) using MRIs if it has been trained within the proposed FLINT architecture?
“We show that yes, using our system, you can actually learn that task,” Stripelis continued.
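The quantity at the center of this task is simple arithmetic. The Python sketch below assumes the gap is computed as predicted age minus chronological age, so a positive value means the brain appears older than the subject; the sign convention, function name, and subject values are illustrative assumptions rather than details from the USC work.

```python
def brain_age_gap(predicted_age: float, chronological_age: float) -> float:
    """Return predicted minus chronological age, in years (assumed sign convention)."""
    return predicted_age - chronological_age


# Hypothetical subjects: (subject_id, model-predicted age, chronological age).
subjects = [("s1", 71.5, 64.0), ("s2", 58.0, 59.0)]
gaps = {sid: brain_age_gap(pred, chrono) for sid, pred, chrono in subjects}

# Rank subjects by the size of the gap, largest first.
ranked = sorted(gaps, key=lambda sid: abs(gaps[sid]), reverse=True)
```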
Further, FLINT’s architecture was shown to be secure because no private data left a hospital, and the models were trained under homomorphic encryption, which allows computations to be performed on data without decrypting it, according to Jose-Luis Ambite, PhD, associate research professor of computer science and research team leader at USC’s Information Sciences Institute.
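To make the idea of computing on encrypted data concrete, here is a toy Python sketch of an additively homomorphic scheme (Paillier). The article does not name the scheme FLINT uses, so Paillier is a stand-in chosen for brevity, and the tiny primes and site values are assumptions that offer no real security. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

# Toy primes: far too small for real use, chosen only to keep the demo readable.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse of LAMBDA mod N


def encrypt(m: int) -> int:
    """Paillier encryption of an integer 0 <= m < N, using generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in [1, N - 1]
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext with the private values LAMBDA and MU."""
    return ((pow(c, LAMBDA, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


# Hypothetical integer-encoded values contributed by three sites.
site_values = [120, 345, 78]
aggregate = encrypt(0)
for value in site_values:
    aggregate = add_encrypted(aggregate, encrypt(value))

total = decrypt(aggregate)  # the aggregator never saw the individual values
```

In this sketch the party doing the addition handles only ciphertexts; only the holder of the private values can read the sum.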
The second challenge concerns data integration, specifically schema and data harmonization: making the disparate data formats, values, schemata, access patterns, and characteristics found across data silos compatible.
“Let’s say you have one column that is called ‘DOB’ in one table and ‘birth_date’ in another table. They represent exactly the same attribute, under a different name. Or one site measures weight in kilograms and another in pounds. You have to harmonize the attributes and values in order to do meaningful analysis; that’s data integration,” Stripelis explained.
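Stripelis's two examples can be sketched in a few lines of Python. This is a minimal illustration, not FLINT's method: the alias table, the `weight_lb` and `weight_kg` column names, and the sample records are all hypothetical.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound in kilograms

# Map each site's column name onto one shared schema (hypothetical names).
COLUMN_ALIASES = {
    "DOB": "birth_date",
    "birth_date": "birth_date",
    "weight_lb": "weight_kg",
    "weight_kg": "weight_kg",
}


def harmonize(record: dict) -> dict:
    """Rename columns to the shared schema and convert pounds to kilograms."""
    out = {}
    for name, value in record.items():
        target = COLUMN_ALIASES.get(name, name)
        if name == "weight_lb" and value is not None:
            value = round(value * LB_TO_KG, 2)
        out[target] = value
    return out


site_a = {"DOB": "1960-04-12", "weight_kg": 70.0}
site_b = {"birth_date": "1955-11-03", "weight_lb": 154.0}
harmonized = [harmonize(site_a), harmonize(site_b)]
```

After this step both records expose the same attributes in the same units, so they can be analyzed together.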
The third challenge is closely related in that it deals with missing data, which can arise when using information from multiple data silos. Different datasets often have some missing values for particular attributes or may be missing attributes altogether.
These missing data can create a major problem for researchers because a single missing value within a record often means that researchers must drop the record from their study entirely or work to fill in or impute the missing value, Stripelis noted.
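The two options Stripelis describes, dropping a record or imputing the missing value, look like this in a minimal Python sketch. Mean imputation is used here only because it is the simplest choice; it is an assumption for illustration, not the technique FLINT proposes, and the records are hypothetical.

```python
from statistics import mean


def drop_incomplete(records, fields):
    """Keep only records that have a value for every required field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def impute_mean(records, field):
    """Fill missing values of one field with the mean of the observed values."""
    observed = [r[field] for r in records if r.get(field) is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}")
    fill = mean(observed)
    return [
        {**r, field: fill if r.get(field) is None else r[field]}
        for r in records
    ]


# Hypothetical records; the second one is missing a weight.
records = [
    {"id": 1, "age": 60, "weight_kg": 70.0},
    {"id": 2, "age": 72, "weight_kg": None},
    {"id": 3, "age": 65, "weight_kg": 80.0},
]

complete_only = drop_incomplete(records, ["age", "weight_kg"])
imputed = impute_mean(records, "weight_kg")
```

Dropping shrinks the study to two records, while imputing keeps all three at the cost of an estimated value.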
Using FLINT, however, the researchers hope to overcome these problems.
“We propose to solve this problem through principled data integration and imputation techniques, so that learning can be done over data that ‘makes sense.’ This is an exciting and ambitious vision that we wanted to share with the research community, but the work is still in progress, so we are presenting it first in a workshop,” said Stripelis.
There is significant interest in applying FL methods to medical research and clinical settings.
In October, a research team from the University of Pittsburgh Swanson School of Engineering was awarded a $1.7 million NIH grant to develop an FL-based approach to achieve fairness in AI-assisted medical screening tools.
The project, called “Achieve Fairness in AI-Assisted Mobile Healthcare Apps through Unsupervised Federated Learning,” will use smartphones and other mobile devices to learn from data collected as users engage with a mobile application, with the aim of addressing issues in algorithm development and preventing health disparities. The on-device model will then be shared with researchers to help train a shared master model for AI-assisted medical screening.