COVID-19 research generates big data, invaluable info
When supporting COVID-19 research that could uncover treatment for the pandemic, IT and storage pros want to make sure they're 'not the bottleneck' in the process.
While COVID-19 researchers race for a cure, their urgent work places pressure on their IT teams to provide the best possible support.
COVID-19 research requires storage that scales enough to handle rapidly growing big data sets and compute that is fast enough to crunch those numbers in real time. Flash storage array vendor Vast Data gathered a virtual panel of customers from Gladstone Institutes, Ginkgo Bioworks, Massachusetts General Hospital (MGH) and Brigham and Women's Hospital Center for Clinical Data Science who described their strategies for supporting COVID-19 research.
The research centers sped to uncover information that could lead to a vaccine or treatment, while adapting to work-at-home conditions caused by the virus.
"From an IT perspective, we just want to not be the bottleneck," said Scott Pegg, CIO and vice president of research infrastructure at Gladstone Institutes in San Francisco. "That's the way we look at it -- we'll find a cure faster if we can compute fast enough that we're never the bottleneck."
Pegg said Gladstone's biomedical researchers work on live viruses to see how they behave and how they can be changed. "These things all develop tons of data," he said. "There's a lot of imaging and sequencing around those experiments."
Analyzing the data requires intensive compute for processes such as AI, and researchers want results in real time -- and they want to leave things like storage, compute and data management to the IT teams.
'Understandable sense of urgency'
"From an IT perspective, we needed to recognize that there's an understandable sense of urgency in getting the COVID-19 research to process," Pegg said. "We needed to make sure that we always had enough storage and compute readily available for those researchers. Particularly with storage, we wanted to make sure that the data lands in one place and the researchers don't have to spend their time trying to tier it to something else to do their high-performance computing. So we needed to make sure we had enough space that was essentially high performance for that data to land on, so they just didn't have to spend their time thinking about it."
Jayashree Kalpathy-Cramer, a researcher at MGH's Athinoula A. Martinos Center for Biomedical Imaging, said her lab's COVID-19 modeling requires studying X-rays and CT imaging to better understand the disease. Those are storage-intensive applications.
"So we have these large amounts of storage because we need to have access to the data," said Kalpathy-Cramer, who is also research director at Brigham and Women's Hospital. "We need computational capability to build these very complex models for potentially predicting who's going to get worse and what the risk factors are. We have access to a lot of different data -- imaging data, clinical data from the lab. How do we best combine all of these data sources to help inform on a patient level how we treat that patient?"
Pegg said because his researchers had better things to do than manage IT resources, the technical people had to make sure it all worked properly.
"The fact is, biological scientists are typically not great at the nuts and bolts of data management," he said. "It's not what they want to be thinking about, and it's not something that's taught as part of your getting a PhD. They don't want to think about how you move data around. So we're attacking this in two ways. One is, we are trying to teach them data management skills, but we realize that's going to take time. So we're just trying to make their lives easier by having the right resources and making everything as performant as possible, wherever their data is going to be."
'Work itself changed'
The data wasn't always where the researchers were. Like most of the country, researchers suddenly found themselves working from home in March. Florencio Mazzoldi, head of digital technology at Ginkgo Bioworks in Boston, said that while some of Ginkgo's COVID-19 researchers continued to work in the lab, most began working from home. That was an unusual situation for a research lab that uses complex equipment.
"COVID hit us from multiple angles," Mazzoldi said. "For instance, a lot of the culture at Ginkgo is enhanced from being on site. The way the office is laid out, they're sitting around these sophisticated and high-end bio labs. So being in the office is a big culture infusion.
"It was a shock because from a cultural perspective, being outside is really a challenge we're still struggling with," he added. "Processes needed to change. Work itself changed. The availability of the data needed to be there across sites and locations, which is not what we normally do because we're usually in the office."
Mazzoldi said Ginkgo's process includes DNA sequencing, which generates a great deal of raw data that must be analyzed quickly.
"So there is a ton of information flowing, and there is a ton of computing being done on that information," he said.
That required putting the computational pipeline close to the storage to speed access to the data. He said that's where Vast Data's all-flash storage systems helped, using NVMe flash drives for speed and QLC solid-state drives for bulk storage.
Kalpathy-Cramer said working from home represented a cultural change for researchers at her organization, as well.
"A lot of what we do is collaboration, and all of that became much harder being remote," Kalpathy-Cramer said. "Technically we could access everything we needed, but from a people perspective, it became much harder. There was a digital divide. Not everyone has high-speed internet at home. A lot of people have little kids. For some, it was much more of an imposition to be suddenly working from home."
Bruce Rosen, a medical physicist and director of the Martinos Center, said there are several positives about IT's response to COVID-19. He said the sudden pandemic spurred investment that enables remote work, even when that involves bringing multiple teams together from locations around the world. The pandemic also pushed IT organizations to provide infrastructure to store and move great amounts of data in real time -- something that will help all researchers.
"We have invested in the infrastructure to be able to store data, move it in and out of our GPUs efficiently, to be able to do the model building quickly and efficiently, and we're going to be able to apply those tools broadly," Rosen said. "That's especially true for high-data domains like radiology and pathology, where a 24-hour turnaround for a computational tool is not nearly fast enough, and has to be at the time the data is coming in."
Pegg said Gladstone Institutes' COVID-19 researchers insisted on working in their labs, where the data is.
"Those researchers didn't want to go home," he said. "They run towards the gunfire, so to speak. They wanted to immediately get their hands on the virus and start doing research."