DNA data storage developments demonstrate serious potential
As futuristic as it may sound, writing to DNA-based data storage is well on its way to becoming a possibility. Mainstream use may be some way off, but development has been steady.
Within the past decade, university and corporate researchers have turned their attention to deoxyribonucleic acid as a possible way to store data. DNA data storage offers density and durability that far exceed those of any of today's storage media -- be it tape, flash or optical drives. DNA has also been around for billions of years, so it's not likely to become obsolete anytime soon.
It's no surprise that scientists are so focused on DNA-based data storage right now. The world is producing more data than ever, and those numbers will only continue to grow. According to the IDC report "Data Age 2025: The Evolution of Data to Life-Critical," the world will be producing 163 zettabytes of data annually by 2025. To store this amount of data, you would need roughly 13.6 billion modern 12 TB hard disk drives. Even if this were financially feasible, the drives would require vast amounts of space and energy, while suffering from relatively short lifespans. DNA has the potential to address many of these issues.
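The scale of that figure can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 ZB = 10^21 bytes, 1 TB = 10^12 bytes) and that every drive is filled completely; the variable names are illustrative only.

```python
# Back-of-the-envelope check, assuming decimal (SI) units.
ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes

annual_data_bytes = 163 * ZETTABYTE   # IDC projection for 2025
drive_capacity_bytes = 12 * TERABYTE  # one 12 TB hard disk drive

# Ceiling division: a partially filled drive still counts as a drive.
drives_needed = -(-annual_data_bytes // drive_capacity_bytes)

print(f"{drives_needed:,} drives")  # 13,583,333,334 drives
```

In other words, a single year's output would call for more than 13 billion drives, before allowing for redundancy or replacement.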
That's not to say that DNA-based storage doesn't come with its own set of challenges; it's costly, slow and prone to errors. Even so, researchers have been making steady progress in meeting these challenges, with some notable recent successes.
How DNA works for storage
DNA is a self-replicating material that forms naturally within a biological cell. DNA encodes information about a cell's characteristics and functions, providing the genetic instructions necessary to shape the cell's host organism.
DNA is built from four types of nucleotides, distinguished by their bases -- adenine, cytosine, guanine and thymine -- which join together into base pairs, with adenine always pairing with thymine and cytosine with guanine. Together, the base pairs form the rungs of a ladder-like double strand, resulting in the familiar double-helix chain commonly seen in science journals and company logos. A short strand of nucleotides, the kind typically synthesized for data storage, is known as an oligonucleotide.
DNA data storage uses the nucleotides to represent the binary ones and zeros that provide the foundation for today's digital data. Storing data in DNA is a basic two-step process:
- Translation software converts a file's binary data into sequences of nucleotides that correspond to the bit patterns.
- A synthesizer builds DNA strands based on the nucleotide sequences. A synthesizer is a scientific instrument that uses synthetic biological engineering technologies to create artificial DNA molecules, a process known as synthesizing.
Retrieving data that has been encoded into synthetic DNA is also a two-step process:
- A sequencer decodes the DNA nucleotides within the oligonucleotides in a precise order and returns their genetic code, a process known as sequencing. Like a synthesizer, the sequencer is a scientific instrument, but in this case, it is used to automate the sequencing operations.
- A translation program converts the results returned by the sequencer to a binary format based on the same bit patterns originally used to convert the data.
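The two translation steps can be sketched in a few lines of Python. This is a minimal illustration, not the scheme any particular research group uses: the two-bits-per-nucleotide mapping (`BITS_TO_BASE`) and the function names are assumptions made for the example, and real encodings add constraints and redundancy that are omitted here. Synthesis and sequencing are physical processes and are not modeled.

```python
# Hypothetical 2-bits-per-nucleotide mapping; real schemes differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a nucleotide sequence (first step of writing)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Translate a sequenced nucleotide string back into bytes (last step of reading)."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Because both directions use the same table, decoding reverses encoding exactly, which is the property the final translation step relies on.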
Synthesizing and sequencing DNA have become standard practices in today's bio-industries. As a result, many of the technologies needed to support DNA data storage already exist.
The DNA promise
Researchers are turning to DNA storage because it potentially offers a number of advantages over today's storage media. One of the biggest advantages is its density, which is many orders of magnitude greater than any current storage medium. One gram of DNA can hold millions of gigabytes of data.
DNA is also extremely durable. By some estimates, if the DNA is kept cool and dry, without being exposed to light or radiation, it could last thousands of years and never become obsolete. Plus, given DNA's central role in cellular development, scientists will no doubt continue to study it and pursue better ways to synthesize and sequence it, without DNA data storage suffering the same fate as the obsolete floppy disk.
DNA also has the potential to deliver significant cost savings, in part because it requires much less space and energy to store compared to today's media, but also because synthesizing and sequencing technologies will continue to grow more efficient while dropping in price, as researchers dig further into DNA's inner workings.
Despite this potential for cost savings, however, one of today's biggest challenges to adopting DNA-based data storage in any significant way is the high price tag that comes with synthesizing and sequencing DNA. Storing a couple hundred megabytes of data this way can easily cost thousands of dollars.
Another challenge is that writing data to DNA is an extremely slow process, because all those bit patterns must be converted to nucleotides and then chemically synthesized. Plus, random access has been difficult to achieve with DNA data storage, requiring the DNA to be sequenced in large blocks and slowing down the reading process as well. In addition, the synthesizing and sequencing processes themselves can be prone to errors that occur at the molecular level, which can translate to data loss or corruption.
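The article does not say how such errors are handled. One generic idea, shown here purely as an illustration and not attributed to any of the research groups mentioned, is to read the same strand several times and take a majority vote at each position. The function name `consensus` and the sample reads are hypothetical, and the sketch only copes with substitution errors, not inserted or deleted nucleotides.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote per position across equal-length reads of one strand."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("needs at least one read, all of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Three hypothetical reads of the same strand, each with one substitution error.
reads = ["CAGACGGT", "CAGTCGGC", "CTGACGGC"]
print(consensus(reads))  # CAGACGGC
```

Each read is wrong in a different place, so the other two outvote it and the original sequence is recovered.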
Making DNA work
Despite the challenges that come with DNA-based storage, the technology shows enough promise for scientists to continue to look for practical answers. For example, researchers at Catalog Technologies have come up with a way to make DNA storage more economical for long-term data archiving by decoupling the synthesizing and sequencing processes. Rather than mapping individual bits to nucleotide base pairs, they're synthesizing large quantities of relatively few DNA types that serve as building blocks for encoding data.
Researchers at the University of Padua in Italy are also looking for ways to improve DNA data storage for archival purposes by using bacterial nanonetworks and individual plasmids, features within bacterial cells that carry genetic information. The bacteria can be used to reliably access specific data from different storage locations, using a technology known as the molecular positioning system, which enables bacteria to sense chemoemissions and mobilize toward a specific location.
Researchers at the University of Illinois at Urbana-Champaign are working on a solution to achieve error-free random access for DNA-based data storage. Their approach is based on selective amplification of specific data to accelerate reads without needing to sequence the entire DNA pool. To carry out this method, they add two unique sequences (primers) to each oligonucleotide, one at each end, using a simple key-value architecture to identify the primers.
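The key-value idea can be pictured with a small software analogy. This is a sketch under stated assumptions, not the researchers' implementation: the primer sequences, file names and the `tag` and `select` helpers are all hypothetical, and simple string matching stands in for the chemistry of selective amplification.

```python
# Hypothetical primer pairs (forward, reverse) keyed by file name.
PRIMERS = {
    "photo.jpg": ("ACGT", "TTAA"),
    "notes.txt": ("GGCC", "CATG"),
}


def tag(payload: str, key: str) -> str:
    """Wrap a payload sequence with the primer pair assigned to `key`."""
    forward, reverse = PRIMERS[key]
    return forward + payload + reverse


def select(pool: list[str], key: str) -> list[str]:
    """Return payloads of strands whose ends match the primers for `key`.

    Stands in for selective amplification: only matching strands are read.
    """
    forward, reverse = PRIMERS[key]
    return [
        strand[len(forward):len(strand) - len(reverse)]
        for strand in pool
        if strand.startswith(forward) and strand.endswith(reverse)
    ]


pool = [
    tag("CAGACGGC", "notes.txt"),
    tag("TTTTGGGG", "photo.jpg"),
    tag("AACCAACC", "notes.txt"),
]
print(select(pool, "notes.txt"))  # ['CAGACGGC', 'AACCAACC']
```

Only the strands carrying the requested primer pair are returned, so the rest of the pool never has to be read.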
Microsoft and the University of Washington have also been working together on a similar technique to achieve error-free random access. Researchers from these organizations recently demonstrated the ability to retrieve specific files from over 400 MB of data. Microsoft plans to introduce a prototype of a commercial DNA data storage system by 2020.
Plenty of other organizations, such as the Defense Advanced Research Projects Agency, are also looking seriously into DNA for storage. At the same time, synthesizing and sequencing processes are steadily improving, and the prices are coming down. Given the massive amounts of data that analysts expect, any hope of storing that data lies with technologies far more advanced than today's media. DNA certainly has the ability to meet that need, if its practical application can be fully realized.