Understanding the COVID-19 Pandemic as a Big Data Analytics Issue
Big data analytics techniques are well-suited for tracking and controlling the spread of COVID-19 around the world.
The rapid, global spread of COVID-19 has brought advanced big data analytics tools front and center, with entities from all sectors of the healthcare industry seeking to monitor and reduce the impact of this virus.
Researchers and developers are increasingly using artificial intelligence, machine learning, and natural language processing to track and contain coronavirus, as well as gain a more comprehensive understanding of the disease.
In the months since COVID-19 hit the US, researchers have been hard at work trying to uncover the nature of the virus – why it affects some more than others, what measures can help reduce the spread, and where the disease will likely go next.
At the core of these efforts is something with which the healthcare industry is very familiar: data.
“This is, in essence, a big data problem. We're trying to track the spread of a disease around the world,” James Hendler, the Tetherless World Professor of Computer, Web, and Cognitive Science at Rensselaer Polytechnic Institute (RPI) and director of the Rensselaer Institute for Data Exploration and Applications (IDEA), told HealthITAnalytics.
At RPI, researchers are using big data and analytics to better comprehend coronavirus from a number of different angles. The institute recently announced that it would offer government entities, research organizations, and industry access to innovative AI tools, as well as to experts in data and public health, to help combat COVID-19.
“We're working with several organizations on modeling and dealing with the virus directly using a supercomputer, and we’ve been creating some websites where we track all the open data and documents we can find to help our researchers find what they're looking for,” Hendler said.
“We also have some work we've been doing in understanding social media responses to the pandemic. One project in particular has focused on tracking data from Chinese social media as coronavirus spread there in mid-January, and then comparing it to American data.”
Between recognizing signs and symptoms, tracking the virus, and monitoring the availability of hospital resources, researchers are dealing with enormous amounts of information – too much for humans to comprehend and analyze on their own. It’s a situation that is seemingly tailor-made for advanced analytics technologies, Hendler noted.
“There are several big data components to this pandemic where artificial intelligence can play a big role,” he said.
“One component is biomedical research. A lot of work is going on to try to develop a vaccine and to find out whether any current drugs work against COVID-19. All of those projects require molecular modeling, and many of them are using AI and machine learning to map things we know about the virus to things in pharmacological databases and genomic databases.”
Several big-name organizations have launched projects like these – Amazon Web Services, Google Cloud, and others have recently offered researchers free access to open datasets and analytics tools to help them develop COVID-19 solutions faster.
“AI can eliminate many false tracks and allow us to identify potential targets. So instead of trying 100 or 1000 different things, we can narrow it down to a much smaller size much faster. That's going to accelerate the eventual finding of the vaccine,” Hendler said.
Researchers are also leveraging AI to evaluate the effects of COVID-19 interventions on individuals across the country, Hendler stated.
“A second component has to do with natural language processing and social media. What can we extract from social media that can help our scientists? What can we learn about how people are bearing the burdens and stresses of the pandemic?” he said.
“With SARS and other outbreaks, we never really had to figure out how different social distancing techniques are impacting the spread in different places. You can't just compare numbers, because there are a lot of other factors to consider. AI is very good at that kind of multi-factor learning and a lot of people are trying to apply those techniques now.”
At UTHealth, a team developed an AI tool that showed the need for stricter, immediate interventions in the Greater Houston area. And at Stanford University, researchers have launched a data-driven model that predicts possible outcomes of various intervention strategies.
Using big data and analytics tools of their own, Hendler and his team are aiming to do something similar.
“We have a lot of time series data from China, we have information about airline transportation, and we have population models for each country. Now we’re looking at doing this in our own region, and seeing if we can track and predict the spread based on the kind of social measures taken within different regions,” he said.
“We want to prototype that in our region and then scale it up to the US, and then eventually, the world.”
AI can also help organizations draw on research from the past, applying this knowledge to present and future situations.
“A third area where AI can make an impact is in mining scientific literature,” Hendler said.
“In past years, you had hundreds of grad students reading papers and trying to figure out what was going on. At many universities, there's a lot of effort to say, ‘What can we learn from what’s already been published?’”
While AI and other analytics technologies appear to be the best possible tools for assessing and mitigating a global pandemic, researchers can’t always access what they need to build these models.
“The ideal data is hospital data that would tell us who is experiencing certain impacts from the virus,” Hendler said.
“For example, one project we'd love to do would be to correlate environmental or genomic factors to the people who are getting advanced respiratory problems, which is what’s killing most people with this disease. Is there a genetic component to that? Is it something where environmental factors are some kind of comorbidity? But we can't get that kind of data because of HIPAA restrictions.”
Instead, research teams should focus on extracting insights from the information they do have available, Hendler said.
“Information about how people are moving, the effect of travel restrictions or stay-at-home orders, how many people have what – that’s data we can get. The more details we can get, the better, and a lot of that data is starting to be shared because you don't have to say who the people are, just where the people are,” he said.
The unprecedented impact of coronavirus around the world has sparked the need for unprecedented partnerships, and these collaborations will contribute significantly to finding viable solutions.
“In healthcare, academia and industry are mostly set up for people to stay in their own lanes. But people are rapidly beginning to realize that attacking this problem is going to require a collaborative effort,” Hendler concluded.
“To make any real progress in this situation, you need to bring together people who understand the computation and AI, people who understand the biological and biomedical implications, and people who understand population models. It's a very interdisciplinary problem, and to make any headway, we need the data and we need the team.”