Data durability guarantees shouldn't be a big concern with cloud storage
Data durability guarantees of cloud storage providers shouldn't be a big point of concern for customers; they're too tough to measure and enforce.
The cloud storage data durability claims of some of the major providers may contain an impressive number of nines -- such as Amazon's 99.999999999% -- but guarantees shouldn't be a point of concern, according to an analyst who tracks the market.
Dan Iacono, a research director with the storage systems practice at Framingham, Mass.-based International Data Corp. (IDC), said data durability guarantees are difficult to measure and enforce, so cloud storage users would be better off questioning providers on the ways in which they protect data and the overall availability of the services.
In this podcast interview with TechTarget senior writer Carol Sliwa, Iacono provides a primer on cloud storage data durability, including a definition, description of the types of data protection the major providers use and an overview of use cases and reduced durability options.
What is durability in the context of cloud storage?
Dan Iacono: I think durability is a hard topic for people to understand. Most people don't even realize that durability is an issue in their environments, because they may have terabytes or even petabytes, but when you're a cloud provider, you're thinking exabytes and zettabytes. What durability really comes down to is uncorrectable bit errors from the underlying media, the ones you can't recover from. After you have written a ton of data, one of those bytes might not be correct. One of the hard things with the cloud providers is that not all of them actually have durability guarantees, and they all define durability a little bit differently.
What types of storage architectures and technology do the major cloud providers use to ensure against data loss?
Iacono: The one thing that they're looking at from a data loss perspective with durability is the concept of bit rot. Bit rot is that one little piece of data, or bit, that goes bad after a period of time. There are two ways that people guard against bit rot. The first is that they actually go through all the data periodically and compare it against a known copy. Cloud providers can do that by having multiple copies of the data. The second way they do it is through a technique called erasure coding, where a mathematical algorithm checks the full data against a smaller set of computed redundancy that can also be used to rebuild it.
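The first approach, periodically comparing copies against one another, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any provider's actual implementation: the function names `scrub` and `repair` are hypothetical, replicas are held in memory as byte strings, and the "known good" copy is simply whichever content a strict majority of replicas agree on.

```python
import hashlib
from collections import Counter


def checksum(block: bytes) -> str:
    """Return a SHA-256 digest used to compare replicas cheaply."""
    return hashlib.sha256(block).hexdigest()


def scrub(replicas):
    """Find replicas that disagree with the majority.

    Returns (good_copy, indexes_of_bad_replicas).
    Assumption: a strict majority of replicas is still intact.
    """
    digests = [checksum(r) for r in replicas]
    majority_digest, votes = Counter(digests).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority; cannot tell which copy is good")
    good = replicas[digests.index(majority_digest)]
    bad = [i for i, d in enumerate(digests) if d != majority_digest]
    return good, bad


def repair(replicas):
    """Overwrite any rotted replica with the majority copy, in place."""
    good, bad = scrub(replicas)
    for i in bad:
        replicas[i] = good
    return bad


# Usage: three copies, one of which has suffered a flipped byte.
copies = [b"customer-record", b"customer-record", b"customer-recorc"]
repaired = repair(copies)
print("repaired replica indexes:", repaired)
```

With only two copies a mismatch can be detected but not resolved, which is one reason a scheme like this needs at least three replicas or a separately stored checksum.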
You mentioned two forms of data protection. Is one form preferable to the other in the context of cloud storage?
Iacono: There are a couple of different use cases. Making multiple copies is the traditional way the hyperscale providers or cloud providers have been protecting data. The problem with multiple copies of data is that over time, as your capacity grows exponentially, so does the number of copies. So it really becomes an expensive way to protect data.
With erasure coding, it's definitely a more compact way. Let's say that for every piece of data, I store only another 20% to 50% on top of it. I can reduce my overall storage footprint; however, it requires CPU cycles to actually rebuild that data using erasure coding. So, there are some trade-offs. I think in the longer term, for some of the reduced-durability storage, erasure coding will be something that will definitely be used, because they need to get to an aggressive price point. [Also,] erasure coding is really good for environments that have three sites or more for replication.
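The trade-off described here, less extra capacity in exchange for computation at rebuild time, can be illustrated with a rough Python sketch. It makes two assumptions that are not in the interview: the fragment counts (10+2 and 4+2) are only examples chosen to land on 20% and 50% overhead, and the rebuild uses single XOR parity, the simplest possible erasure code, which tolerates one lost fragment. Production systems use stronger codes such as Reed-Solomon.

```python
from __future__ import annotations

from functools import reduce


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Extra raw capacity needed, as a fraction of the user data."""
    return parity_fragments / data_fragments


def replication_overhead(copies: int) -> float:
    """Extra raw capacity for full copies: 3 copies means 200% extra."""
    return float(copies - 1)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: list[bytes]) -> bytes:
    """Compute one parity fragment over equal-length data fragments."""
    return reduce(xor_bytes, fragments)


def rebuild(fragments: list[bytes | None], parity: bytes) -> list[bytes]:
    """Recover at most one missing fragment (marked None) from parity.

    This is where the CPU cost shows up: every surviving fragment
    has to be read and XORed to recompute the lost one.
    """
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one fragment")
    if not missing:
        return list(fragments)
    survivors = [f for f in fragments if f is not None]
    recovered = reduce(xor_bytes, survivors, parity)
    out = list(fragments)
    out[missing[0]] = recovered
    return out


# Usage: compare capacity overhead, then lose and rebuild a fragment.
print("10+2 erasure code:", erasure_overhead(10, 2))
print("4+2 erasure code: ", erasure_overhead(4, 2))
print("3 full copies:    ", replication_overhead(3))

frags = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(frags)
print(rebuild([b"abcd", None, b"ijkl"], parity))
```

With full copies, recovering from a loss is a plain read of another replica, whereas the parity rebuild above has to touch every surviving fragment.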
That's from a cloud provider's perspective. What about from a customer's perspective? Should it make a difference whether a cloud storage provider uses erasure coding or multiple copies to protect data?
Iacono: I think it does. With erasure coding, my actual recall of data, if something happens, will be slower, whereas if I use multiple copies, the rebuild and availability will be much faster. So, first and foremost, you're saying, 'Hey, how are you protecting the data?' The second part, I would look at cost. And then I'd really use the combination together and apply it to the use cases for my data.
With the two types of data protection, which of the major cloud storage providers use erasure coding and which use multiple copies?
Iacono: I think you have to look at who actually has publicly made that available. Anyone that's using, let's say, an Atmos solution from EMC has erasure coding built in, and those would be AT&T as well as Savvis. Microsoft Azure uses erasure coding as well. If you look at the conference, [File and Storage Technologies] FAST 2012, they won best paper for their erasure coding. If you look at Google, they're known for doing multiple copies of the data. And Amazon and some of the other cloud providers, we don't know exactly what kind of protection they're using for durability.
Cloud storage providers such as Amazon and Google offer options for lower levels of durability at reduced costs compared to their main object services. For what types of data is reduced durability adequate, and for what types of data is it not adequate?
Iacono: I think you have to understand two parts of your data. One is, what kind of data do I need to run my business? And we would think of that as our mission-critical data. The second is, what data can I reconstruct pretty easily? In a traditional environment, we have that mission-critical data, and we want it up all the time. So, I would really care about my durability and availability. Now I may have other pieces of data that I can correct in software, and if I can easily rebuild that data and I can store it at a much lower cost, then I would definitely want to take advantage of reduced durability storage to lower my overall storage costs.
Do you think any customers could use reduced durability cloud storage for all of their data?
Iacono: It really depends on what the upper-level application can do. If you look at a lot of these newer applications, or Web-scale properties, they have a concept where they design with failure in mind. So, they don't believe the data or any of the underlying hardware is actually reliable. They plan for that in their software. So, I think the newer applications that are designed with failure in mind can definitely take advantage of having reduced durability of storage for all their data.
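As a rough illustration of what designing with failure in mind can look like in application code, the Python sketch below treats a reduced-durability store as untrusted: every read is verified against a checksum, and anything missing or corrupted is regenerated. Everything here is hypothetical. The class name, the in-memory dictionaries standing in for a storage service, and the `rebuild` callback are assumptions for the example, and the checksums are assumed to live somewhere more durable than the data itself.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest used to detect silent corruption."""
    return hashlib.sha256(data).hexdigest()


class ReducedDurabilityStore:
    """Wraps an unreliable blob store and self-heals on read."""

    def __init__(self, rebuild):
        self._blobs = {}    # stand-in for the cheap, less durable tier
        self._digests = {}  # assumed to be kept in a durable location
        self._rebuild = rebuild  # callable: key -> regenerated bytes

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data
        self._digests[key] = digest(data)

    def get(self, key: str) -> bytes:
        data = self._blobs.get(key)
        if data is None or digest(data) != self._digests.get(key):
            # Lost or rotted: regenerate from the source of truth.
            data = self._rebuild(key)
            self.put(key, data)
        return data


# Usage: derived data (say, a rendered report) that is cheap to recreate.
store = ReducedDurabilityStore(rebuild=lambda key: f"report:{key}".encode())
store.put("q3", b"report:q3")
store._blobs["q3"] = b"report:q4"  # simulate bit rot in the cheap tier
print(store.get("q3"))
```

This only works for data that can be recomputed from something else, which is the distinction drawn earlier between mission-critical data and data that is easy to reconstruct.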
How important is a durability guarantee with a cloud storage provider?
Iacono: The problem with a guarantee is, if I lose a bit of data, how do I know that that bit of data was actually lost because of durability? Will the cloud provider actually admit there was a durability issue? So unless we can measure it, I really wouldn't get too stuck on durability. What I'd really like to know is how the cloud provider is protecting my data. Is it multiple copies or erasure coding? That's what I'm more concerned about rather than the guarantee.
What's the single most important piece of advice you would offer to potential cloud storage users with respect to durability?
Iacono: I think it's an important point to take a look at, but I wouldn't base my overall decision on durability. Whether it's 13 nines or 17 nines, it really wouldn't matter or sway my decision. What you really want to ask your cloud storage provider is, "How do you protect against bit rot?" And then I would really start to concern myself with overall availability. If you have massive amounts of data in the cloud, then I'd start worrying about durability.
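To put the nines in perspective, the short Python sketch below converts a durability figure into an expected loss rate. It rests on one loud assumption: that the published figure is the probability that a single object survives one year, with losses independent across objects. As noted in the interview, providers define durability differently, so this is a back-of-the-envelope reading rather than any provider's formula. The function names are hypothetical.

```python
from fractions import Fraction


def annual_loss_probability(nines: int) -> Fraction:
    """Per-object chance of loss in a year, e.g. 11 nines -> 1e-11.

    Assumption: durability is quoted per object, per year.
    """
    return Fraction(1, 10 ** nines)


def expected_objects_lost_per_year(objects: int, nines: int) -> Fraction:
    """Expected number of objects lost in one year."""
    return objects * annual_loss_probability(nines)


def years_per_lost_object(objects: int, nines: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(objects, nines)


# Usage: 10 million stored objects at several durability levels.
for nines in (11, 13, 17):
    years = years_per_lost_object(10_000_000, nines)
    print(f"{nines} nines: one object lost every {int(years):,} years on average")
```

Under these assumptions, even the lowest figure here implies losses so rare for a typical customer that the difference between levels is hard to observe, let alone enforce. The numbers start to matter only at very large object counts.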