Framework Enables Real-Time Data Analytics During Pandemic
A connected framework is helping researchers collaborate and use real-time data analytics to inform decision-making during the pandemic.
Scientists at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory and the Linac Coherent Light Source (LCLS) at SLAC National Accelerator Laboratory are using a superfacility model to advance real-time data analytics research during the pandemic.
Since the onset of COVID-19, researchers and developers have searched for solutions by collaborating across scientific, medical, academic, and political entities.
To accelerate insights, Berkeley Lab and LCLS are applying the superfacility model to real-time data analytics. Under this model, data produced by light sources, microscopes, and other instruments streams in real time to large computing facilities, where it is analyzed and the results are delivered to the science user community.
The model allows for discoveries across datasets, institutions, and domains, and makes data from unique facilities and experiments broadly accessible.
Prior to the pandemic, the LCLS team was using the model for large-scale computing projects. Once social distancing and quarantine guidelines went into effect, the group used the model remotely to uncover new insights during the pandemic.
“When we went into the COVID-19 shutdown last year, LCLS management decided to invest additional resources into COVID-19 research and we got extra funding from the US Department of Energy to accelerate that and get our users ready to do this work remotely,” said Chuck Yoon, an information systems specialist who leads the LCLS Data Analytics Department.
During and after experiments, the LCLS data analytics team used NERSC’s supercomputer to process data and provide results in real time, allowing researchers to monitor the experiment, begin analysis, and make changes as necessary.
Data from the LCLS experiments went to NERSC through the Energy Sciences Network (ESnet), the high-speed computer network that connects US Department of Energy scientists and their collaborators worldwide.
“We have developed this pipeline where we go from raw data to structure,” said Yoon, who also leads the Advanced Methods for Analysis Group at SLAC.
“We have the pieces ready, and the goal is to run it efficiently at NERSC and then display the results in real time at SLAC or for anyone with a laptop.”
NERSC allows for real-time data analysis and helps researchers make informed decisions quickly.
“Previously, you’d have to go home and crunch the numbers after the fact. This way, you can see any bad data while the experiment is still running and find out whether there’s a difference in results between the proteins, so you can move on to the next sample and use your precious beam time well,” said NERSC application performance specialist Johannes Blaschke, who has also been involved with this project.
The collaboration among the scientists, NERSC, and ESnet has proven valuable for real-time data processing.
“The data rates LCLS can produce can only be handled using advanced supercomputing facilities,” said Aaron Brewster, a project scientist in Berkeley Lab’s Molecular Biophysics and Integrated Bioimaging division who has been working with the LCLS team.
“They allow us to match the processing rate to the data production rate, letting us study small differences in atomic structure in near real time. This in turn guides decision-making processes during the experiment.”
As the healthcare industry has seen throughout the COVID-19 pandemic, experimental data volumes continue to increase. Supercomputers like those at NERSC are becoming increasingly crucial for scientists to spot patterns and trends in large, complex datasets.
The superfacility model allows experimenters to obtain real-time results, an ability that will increase the success of research efforts.
“This is an emerging need, being able to move data between research sites and computing centers,” said Debbie Bard, who leads NERSC’s Data Science Engagement group.
“By building this infrastructure, we’re making it easier for scientists to think about doing science in this way. This particular work with the LCLS is one of many experiments that need our resources. But part of the idea is that every experiment should have this capability.”
Beyond enabling more sophisticated COVID-19 research, this effort will help develop the superfacility concept further by revealing challenges and necessary improvements. Going forward, NERSC developers want to improve the automation and resilience of the system.
“The goal is for machines to talk to machines, with no humans in the middle to slow things down,” said Bard. “We’re not there yet, but we’re working toward submitting jobs and moving data around in an automated way.”
The collaboration between LCLS and NERSC will advance research both during the pandemic and after the crisis has subsided.
“This work is urgent in two senses: we need to analyze the data coming from the experiment urgently, and the science of COVID research is very urgent,” said Bard.
Researchers will continue to develop new methods and tools to enhance real-time data analysis and supercomputing.
“Looking into the future, we have developed algorithms and artificial intelligence to use the GPUs when Perlmutter comes online, and we have put a lot of work into automating the crystallography analysis,” Yoon said.
“The idea is to get humans out of the loop so that the machines can efficiently process the data and provide near-real-time feedback to help us optimize our experiments.”