
When long-term data archiving means 'forever'

Storage customers such as the German data processing service center GWDG have started to think about how they can archive data forever. It's challenging and complex, but possible.

What do you do with data you need to archive forever?

First of all, long-term data archiving for decades upon decades is not as simple as loading up tapes and putting them on the shelf.

"There's a lot of pre-planning necessary," said Ramin Yahyapour, professor and CIO at the University of Göttingen in Germany. "Just archiving it is not sufficient."

Having partners provide tools is important for long-term data archiving, said Yahyapour, who is also managing director at GWDG, a data processing service center for the University of Göttingen. The service center uses Quantum StorNext to manage 25 PB of data, some of which needs to be retained forever.

Implementation and challenges of the 'forever' archive

"The Code of Conduct for public-funded research in Germany now requires at least 10 years archiving of data to assure reproducibility of research," Yahyapour said. "However, we have plenty of data sets that are considered cultural heritage and not replaceable. As such, we preserve and curate such data on a 'forever' perspective."

The GWDG, which provides the archiving back end for multiple institutions, has collected historic samples from social and natural sciences. Certain animals, plants and languages do not exist anymore and, thus, cannot be recreated, Yahyapour said. For example, the sounds of an extinct bird should remain in archives forever.

Books and other artifacts belong in the "forever" category, as well. The Göttingen State and University Library has a mission to collect and preserve 17th-century items, and it aims to digitize its books from that century -- that's a lot of text.


Yahyapour said he estimates his organization manages about 5 PB of data in the "forever" archive. That data is part of the 20 PB in a Quantum tape archive within the StorNext file system.

Tape is safe but slow to access. Yahyapour said he also uses disk-based storage, which makes data easier to reuse.

Most long-term archival data, though, is never accessed. As a result, policies for data location are important.

There are other significant challenges to long-term data archiving.

The GWDG, which was founded in 1970 and has provided an archiving service for 40 years, constantly needs to renew its architecture and migrate data, Yahyapour said. It has used Quantum for close to 15 years and renewed its libraries about two years ago. Tapes have a 20- to 30-year lifespan, but the technology to manage them typically lasts eight to 10 years.

It can take two years to migrate all the data.
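
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses the numbers quoted in this article (a 20 PB tape archive, a two-year migration, and management technology that lasts eight to 10 years). The rest is assumption: decimal petabytes, copying that runs around the clock, one full migration per technology generation, and no allowance for verification passes or downtime. The function names are illustrative and do not come from GWDG or Quantum.

```python
# Back-of-the-envelope migration arithmetic.
# Assumptions (not from the article): decimal units, 24x7 copying,
# one full migration per technology generation, no verification overhead.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def sustained_rate_mb_per_s(archive_pb: float, migration_years: float) -> float:
    """Average throughput in MB/s needed to copy the whole archive once."""
    total_bytes = archive_pb * 1e15
    return total_bytes / (migration_years * SECONDS_PER_YEAR) / 1e6


def share_of_time_migrating(migration_years: float, tech_lifespan_years: float) -> float:
    """Fraction of each technology generation spent migrating data."""
    return migration_years / tech_lifespan_years


if __name__ == "__main__":
    rate = sustained_rate_mb_per_s(archive_pb=20, migration_years=2)
    print(f"Sustained copy rate: about {rate:.0f} MB/s")
    for lifespan in (8, 10):
        share = share_of_time_migrating(2, lifespan)
        print(f"{lifespan}-year technology cycle: {share:.0%} of the time in migration")
```

Under these assumptions, moving 20 PB in two years means sustaining a little over 300 MB/s without a break, and a migration is underway for roughly a fifth to a quarter of every technology cycle.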

"It's quite complex work," Yahyapour said. "All the time, you're thinking of the next migration. It's a long-term mission."

A long-term data archiving process also needs to consider file formats. "Can you still read your Microsoft Word file from 1995?" Yahyapour said.

Tools to work with files and manage the evolution of file formats are helpful.

It's a lot of effort, but it's worth it, Yahyapour said: The long-term archiving has been successful, with no data loss.

'Think different' about archiving

Long-term data archiving has emerged as a concern for customers, such as organizations in life sciences and media and entertainment, said Eric Bassier, senior director of product marketing at Quantum, which is based in San Jose, Calif. For example, media and entertainment organizations would likely want to retain original copies of certain films and sports moments.

"We're finding these companies want to keep these data sets forever," Bassier said.

Object storage is going to be a critical technology for these archives. Objects are easier to search, access and repurpose, Bassier said.

Tape will also play a key role. Tapes cost less than other media, use little power and have a long lifespan.

Long-term data archiving will require smart software that knows where data is stored and can copy it to the next generation of storage, Bassier said.
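
As a loose illustration of that idea, and not a description of how StorNext or any Quantum product works, the Python sketch below copies a file to a new storage location, checks that the copy's SHA-256 checksum matches the original, and records both locations in a simple in-memory catalog. The function names, the catalog layout and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archive files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_next_generation(source: Path, target_dir: Path, catalog: dict) -> Path:
    """Copy source into target_dir, verify the copy, and record its location.

    catalog is a hypothetical in-memory index:
    file name -> {"sha256": str, "locations": [str, ...]}
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    expected = sha256_of(source)
    shutil.copy2(source, target)
    if sha256_of(target) != expected:
        target.unlink()  # do not keep a copy that failed verification
        raise OSError(f"checksum mismatch copying {source} to {target}")
    entry = catalog.setdefault(
        source.name, {"sha256": expected, "locations": [str(source)]}
    )
    entry["locations"].append(str(target))
    return target
```

A production archive would keep the catalog in durable storage and re-verify checksums on a schedule. The sketch only shows the basic copy, verify and record loop.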

Education is important, as well. Quantum has started to get the message out about this emerging challenge.

"It's a different way to think about archiving or building an archive," Bassier said.
