UPenn, Intel Partner to Use Machine Learning to Detect Brain Tumors
Using a distributed machine learning approach, the organizations will aim to identify brain tumors while protecting patient privacy.
The Perelman School of Medicine at the University of Pennsylvania (Penn Medicine), Intel, and research institutions from around the world will partner to use a privacy-preserving machine learning technique to identify brain tumors.
The technique, called federated learning, is a distributed machine learning approach that allows organizations to collaborate on deep learning projects without sharing patient data.
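As a rough illustration of the idea, the sketch below shows federated averaging, one common way to implement federated learning. It is a toy example, not Intel's or Penn Medicine's software: the linear model, the synthetic data, and every function name (`local_update`, `federated_average`, `train_federated`, `make_site_data`) are assumptions made for illustration. Each simulated site trains on its own data and sends back only model weights; the raw records never leave the site.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Gradient-descent steps for y = w*x + b using one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine site weights, weighting each site by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)

def train_federated(site_datasets, rounds=50):
    """Coordinator loop: only weights are exchanged, never raw data."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in site_datasets]
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [local_update(global_weights, d) for d in site_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights

def make_site_data(n, seed):
    """Hypothetical synthetic data standing in for one institution's records."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]

# Three simulated institutions with different amounts of data.
sites = [make_site_data(20, 1), make_site_data(50, 2), make_site_data(30, 3)]
w, b = train_federated(sites)
print(f"learned w={w:.3f}, b={b:.3f}")  # should land near the true values 3 and 1
```

Real deployments add safeguards that this sketch leaves out, such as secure communication and protection of the model updates themselves.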
Penn Medicine and research organizations from the US, Canada, the United Kingdom, Germany, Switzerland, and India will use Intel’s federated learning hardware and software to develop a new machine learning model, trained on the largest brain tumor dataset to date.
According to the American Brain Tumor Association, nearly 80,000 people will be diagnosed with a brain tumor this year, and more than 4,600 of these individuals will be children. Training a model to detect a brain tumor requires a large amount of relevant medical data, but it is critical that this data stays private and protected.
With the federated learning approach, researchers from all partner organizations will be able to work together to build and train a machine learning algorithm to detect brain tumors while protecting sensitive patient data.
“It is widely accepted by our scientific community that machine learning training requires ample and diverse data that no single institution can hold,” said Dr. Spyridon Bakas, University of Pennsylvania.
“With this federation of 29 collaborating international healthcare and research institutions, we will be able to train state-of-the-art AI models for healthcare, using privacy-preserving machine learning technologies, including federated learning.”
In January 2019, Penn Medicine and Intel published a paper demonstrating that the federated machine learning method could train a model to over 99 percent of the accuracy of a model trained with the traditional, non-private method.
The new project will provide additional protection to both the model and the data. The work is being funded by the Informatics Technology for Cancer Research (ITCR) program of the National Cancer Institute, through a three-year, $2.1 million grant awarded to Bakas, the principal investigator of the project.
“This year, the federation will begin developing algorithms that identify brain tumors from a greatly expanded version of the International Brain Tumor Segmentation (BraTS) challenge dataset,” said Bakas.
“This federation will allow medical researchers access to vastly greater amounts of healthcare data while protecting the security of that data.”
The subset of collaborating organizations expected to participate in initiating the first phase of this federation includes the Hospital of the University of Pennsylvania, Washington University in St. Louis, the University of Pittsburgh Medical Center, Vanderbilt University, Queen’s University, Technical University of Munich, King’s College London, and others.
“AI shows great promise for the early detection of brain tumors, but it will require more data than any single medical center holds to reach its full potential,” said Jason Martin, principal engineer at Intel Labs.
“Using Intel software and hardware and support from some of Intel’s brightest minds, we are working with the University of Pennsylvania and a federation of 29 collaborating medical centers to advance the identification of brain tumors while protecting sensitive patient data.”
Researchers across the healthcare industry have been working to build datasets that can help refine AI and machine learning models for brain conditions.
Recently, the Radiological Society of North America (RSNA) created a public medical imaging dataset of expert-annotated brain hemorrhage CT scans, leading to the development of machine learning algorithms that can help detect and characterize this condition.
“The value of this challenge is to create a dataset that might lead to a generalizable solution, and the best way to do that is to train a model from data originating from multiple institutions that use a variety of CT scanners from various manufacturers, scanning protocols and a heterogeneous patient population,” said Flanders.
“In this case, we had data from three institutions and international participation. The dataset is unique, not only in terms of the volume of abnormal images but also the heterogeneity of where they all came from. The dataset we created for this challenge will endure as a valuable ML research resource for years to come.”