A tiered storage model is different from caching
In this podcast, Evaluator Group strategist Randy Kerns takes an in-depth look at today's tiered storage model, including how it differs from flash caching.
Storage tiering has taken on added significance in recent years because of the emergence of solid-state drives (SSDs) in enterprise storage. It has also come to be confused with caching, which plays a significant role with SSDs. In this podcast, Evaluator Group senior strategist Randy Kerns drills into the specifics of a tiered storage model, explains the differences between tiering and flash caching, and looks at the benefits that tiering brings to hybrid arrays combining flash and spinning disk.
Has the storage industry more or less settled on an established definition of storage tiering?
Randy Kerns: When you talk to people in the IT community who are actually deploying and using systems, they're pretty settled on it. To them, storage tiering means moving data between spinning disk and solid-state technology within a storage system.
Now there are a number of systems out there that have two different types of spinning disk, and some even have two different types of SSDs in them. But customers simplify it and just talk about tiering within the system: moving data from spinning disk to SSD or back the other way.
Now there are people who have experience doing external tiering, if you will, which happens between systems and uses more programmatic software to move data based on either user controls or some type of usage-based access controls. But for the most part, when you talk to them about tiering today, they're referring to doing it within the storage system, moving data between spinning disk and SSDs.
What is the difference between tiering and caching?
Kerns: That's a great question, because the two are sometimes confused, and there's also a lot of information out there that is not very definitive. In general, tiering actually moves the data from one location to another. Where it is moved to becomes its absolute location, and the space where it came from can be reused for something else.
Caching is putting data in a typically higher-performance location, like in DRAM or solid-state memory, for access and speed purposes, and the location where it's really stored is still somewhere else. It's probably on spinning disk. So, in that case, I have it in two locations at once, or at least have a copy of some of the data in two locations at once. [And] I'm really paying for it being stored in two different places. In tiering, it's in one place or the other, and I move it, or the systems move it, based on some type of control.
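To make the distinction concrete, here is a minimal Python sketch, not any vendor's implementation. The class names TieredStore and CachedStore are hypothetical, and each store is modeled as a dictionary of block IDs. Tiering moves a block, so it occupies space in exactly one tier; caching copies a block, so the backing store keeps its copy and the cache holds a second one.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical tiered store: each block lives in exactly one tier."""

    def __init__(self):
        self.tiers = {"ssd": {}, "hdd": {}}

    def write(self, block_id, data, tier="hdd"):
        self.tiers[tier][block_id] = data

    def move(self, block_id, src, dst):
        # pop() removes the block from the source tier, so that space can be reused
        self.tiers[dst][block_id] = self.tiers[src].pop(block_id)

    def used(self):
        return {name: len(blocks) for name, blocks in self.tiers.items()}


class CachedStore:
    """Hypothetical cached store: backing always holds the block, cache holds a copy."""

    def __init__(self, cache_capacity):
        self.backing = {}            # authoritative location (e.g. spinning disk)
        self.cache = OrderedDict()   # copies in faster media, least recently used first
        self.cache_capacity = cache_capacity

    def write(self, block_id, data):
        self.backing[block_id] = data

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.backing[block_id]
        self.cache[block_id] = data  # a copy, not a move: backing keeps its copy
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # eviction just discards the copy
        return data

    def used(self):
        return {"cache": len(self.cache), "backing": len(self.backing)}


t = TieredStore()
t.write("a", b"x")
t.move("a", "hdd", "ssd")
print(t.used())  # {'ssd': 1, 'hdd': 0}: one copy, now on SSD

c = CachedStore(cache_capacity=1)
c.write("a", b"x")
c.read("a")
print(c.used())  # {'cache': 1, 'backing': 1}: the same block is stored twice
```

The second result is the "paying for it twice" point: the cached block consumes capacity in both places, while the tiered block consumes capacity only where it currently lives.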
How well do today's tiering systems take advantage of solid-state technology?
Kerns: There's a lot of discussion about this, and certainly a lot of competitive issues come out. In general, traditional storage arrays that have added solid-state technology do a good job of moving the data and tiering. They have controls based on patterns of access, the type of data and so on, and their intelligence and algorithms do a good job of moving the data.
They'll move it to solid-state storage based on the need for performance, or they'll move it out of solid-state when the access has dropped or because higher demand data wants to push it out, if you will. That is done very effectively.
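The promote/demote behavior described above can be sketched as a periodic planning pass. This is an illustrative assumption, not how any particular array works: real systems weigh more factors (data type, I/O size, policies), and the names plan_tiering, heat and ssd_capacity are hypothetical. Extents are ranked by access count over the last window; the hottest ones that fit are targeted for SSD, and anything on SSD that falls out of that set is demoted, either because its access dropped or because hotter data pushed it out.

```python
from collections import Counter


def plan_tiering(access_log, ssd_extents, ssd_capacity):
    """Hypothetical tiering pass; returns (promote, demote) sets of extent IDs.

    access_log:   extent IDs accessed during the last window (one entry per access)
    ssd_extents:  set of extent IDs currently on SSD
    ssd_capacity: number of extents the SSD tier can hold
    """
    heat = Counter(access_log)  # access count per extent; missing extents count as 0
    candidates = set(heat) | set(ssd_extents)
    # Hottest first; ties broken by ID so the plan is deterministic
    ranked = sorted(candidates, key=lambda e: (-heat[e], e))
    target = set(ranked[:ssd_capacity])
    promote = target - ssd_extents  # hot data that should move up to SSD
    demote = ssd_extents - target   # cooled or displaced data that should move down
    return promote, demote


log = ["e1", "e1", "e1", "e2", "e2", "e3"]
promote, demote = plan_tiering(log, ssd_extents={"e3", "e9"}, ssd_capacity=2)
print(sorted(promote), sorted(demote))  # ['e1', 'e2'] ['e3', 'e9']
```

In the example, e3 is pushed out by the hotter e1 and e2, and e9 is demoted because it saw no access in the window.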
Now there are a lot of other types of systems that do caching as well. Those that do both tiering and caching add another level of intelligence, which is based on expanding the size of the cache with solid-state technology.
Are there third-party or independent software solutions out there that can tier data as well as or better than some system-vendor approaches?
Kerns: If you consider the tiering done inside the storage system, the answer would be no. Those systems have the best intelligence and the best internal capabilities to move the data around. So if something outside the system tells it to move data to a different type of LUN, if you will, it won't be as effective as an intelligent system doing it internally.
But if you consider external storage tiering, where you actually move the data from one storage system to another, or from a storage platform to a solid-state [platform] that maintains all the data internally (like PCIe [PCI Express] cards in servers that can serve as the storage location for data rather than as a cache), the external software is much better than the system's. [This is] because the storage systems typically don't have the larger view of what's going on in the environment, nor the access controls to move data between different systems.
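External tiering software typically works from a policy. The sketch below assumes a simple one: a user-set pin overrides everything; otherwise, files idle longer than a threshold go to a slower system and recently used files go to a faster one. FileInfo, plan_external_moves, the 30-day threshold and the system names array-a and array-b are all hypothetical, and the sketch only plans moves; it does not copy any data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class FileInfo:
    path: str
    system: str                      # storage system currently holding the file
    last_access: datetime
    pinned_to: Optional[str] = None  # user control; overrides the usage policy


def plan_external_moves(files: List[FileInfo], now: datetime, idle_days: int = 30,
                        fast: str = "array-a",
                        slow: str = "array-b") -> List[Tuple[str, str, str]]:
    """Return (path, source_system, target_system) for every file that should move."""
    cutoff = now - timedelta(days=idle_days)
    moves = []
    for f in files:
        if f.pinned_to is not None:
            target = f.pinned_to  # explicit user placement wins
        elif f.last_access < cutoff:
            target = slow         # idle data goes to the slower, cheaper system
        else:
            target = fast         # recently used data goes to the faster system
        if target != f.system:
            moves.append((f.path, f.system, target))
    return moves


now = datetime(2024, 1, 31)
files = [
    FileInfo("/a", "array-a", datetime(2023, 12, 1)),   # idle: demote
    FileInfo("/b", "array-b", datetime(2024, 1, 30)),   # active: promote
    FileInfo("/c", "array-a", datetime(2023, 1, 1), pinned_to="array-a"),  # pinned: stay
]
moves = plan_external_moves(files, now)
print(moves)  # [('/a', 'array-a', 'array-b'), ('/b', 'array-b', 'array-a')]
```

Because the policy sees files across every system, it can make placement decisions that no single array, with its view limited to its own disks, is in a position to make.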
How do the different solid-state systems, such as PCIe flash cards, all-flash arrays and hybrid arrays, handle storage tiering?
Kerns: There's a lot of discussion about this, too. PCIe flash cards typically don't do anything themselves; they have a layer of software on top of them. And there's some good software for managing data stored in flash in the server.
That software is typically caching data. Sometimes it caches write data as well, but it always caches read data. Very few of these products manage data as permanently resident on the card, even though you can pin data there. The information is still typically stored on spinning disk somewhere else.
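Here is a hypothetical sketch of that server-side flash cache behavior. ServerFlashCache and its parameters are illustrative assumptions, not any product's API, and a Python dict stands in for the spinning-disk backing store. Reads are always cached; writes either go straight to the backing store or, in write-back mode, are held in flash and written back later; pinned blocks are exempt from eviction, but the backing store remains where the data ultimately lives.

```python
from collections import OrderedDict


class ServerFlashCache:
    """Hypothetical server-side flash cache in front of a slower backing store."""

    def __init__(self, backing, capacity, write_back=False):
        self.backing = backing        # dict standing in for spinning disk
        self.capacity = capacity      # number of blocks the flash can hold
        self.write_back = write_back  # False: writes go straight to backing
        self.lines = OrderedDict()    # block_id -> data, least recently used first
        self.dirty = set()            # cached blocks not yet written to backing
        self.pinned = set()           # blocks exempt from eviction

    def read(self, block_id):
        if block_id in self.lines:    # hit: serve from flash
            self.lines.move_to_end(block_id)
            return self.lines[block_id]
        data = self.backing[block_id]  # miss: fetch from backing, keep a copy
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        if self.write_back:
            self.dirty.add(block_id)  # mark first so an immediate eviction writes it back
            self._insert(block_id, data)
        else:
            self.backing[block_id] = data    # backing updated immediately
            if block_id in self.lines:
                self.lines[block_id] = data  # keep any cached copy consistent

    def pin(self, block_id):
        self.pinned.add(block_id)     # mark first so the insert cannot evict it
        self.read(block_id)

    def flush(self):
        for block_id in self.dirty:
            self.backing[block_id] = self.lines[block_id]
        self.dirty.clear()

    def _insert(self, block_id, data):
        self.lines[block_id] = data
        self.lines.move_to_end(block_id)
        while len(self.lines) > self.capacity:
            victim = next((b for b in self.lines if b not in self.pinned), None)
            if victim is None:
                break                 # everything is pinned (simplification)
            if victim in self.dirty:  # write back before dropping the copy
                self.backing[victim] = self.lines[victim]
                self.dirty.discard(victim)
            del self.lines[victim]


disk = {"a": 1, "b": 2, "c": 3}
cache = ServerFlashCache(disk, capacity=2)
cache.pin("a")
cache.read("b")
cache.read("c")           # evicts "b"; pinned "a" stays
print(list(cache.lines))  # ['a', 'c']

wb_disk = {"x": 0}
wb = ServerFlashCache(wb_disk, capacity=1, write_back=True)
wb.write("x", 5)
print(wb_disk["x"])       # 0: the new value so far lives only in flash
wb.flush()
print(wb_disk["x"])       # 5
```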
All-flash arrays are just that: all-flash systems, whether they are built from SSDs, from more specialized implementations with their own flash controller chips and flash chips, or even from PCIe cards. They typically do not tier data, because they are all solid-state in the first place.
Now the hybrid arrays are a little different story. There really are two types there. One type is the traditional array to which solid-state has been added. That's what we are most familiar with, because these systems have been around for a long time, and tiering has been added internally to them.
Most vendors give their tiering a distinct name, a piece of marketing terminology they can point to as a difference, but fundamentally the implementations work similarly.
The other type is the newer hybrid designs we've seen, which typically target the mid-tier marketplace. They use solid-state as a cache and spinning disk as a back-end store. So there's no direct access from the host to the spinning disk; everything goes through solid-state.
What these systems deliver is performance: reads and writes are served from solid-state, which gives them a higher performance level. The systems then intelligently manage the back-end store, pushing data out and pre-staging data based on their programmatic controls and patterns of access.
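The flash-first hybrid design can be sketched as follows. HybridArray, flash_capacity and prefetch are hypothetical names, and the policies (LRU eviction, destaging dirty blocks when they are evicted, pre-staging the next few sequential blocks after each read) are simple stand-ins for the more sophisticated programmatic controls real arrays use. Hosts call only host_read and host_write; the disk dictionary is touched only by the array itself.

```python
from collections import OrderedDict


class HybridArray:
    """Hypothetical flash-first hybrid array: hosts only ever touch the flash layer."""

    def __init__(self, flash_capacity, prefetch=2):
        self.flash = OrderedDict()    # block -> data, least recently used first
        self.dirty = set()            # blocks in flash that are newer than disk
        self.disk = {}                # back-end store, never exposed to hosts
        self.flash_capacity = flash_capacity
        self.prefetch = prefetch      # how many following blocks to pre-stage

    def host_write(self, block, data):
        self.dirty.add(block)         # acknowledged from flash; destaged later
        self._stage(block, data)

    def host_read(self, block):
        if block not in self.flash:   # miss: stage from the back end first
            self._stage(block, self.disk[block])
        self.flash.move_to_end(block)
        data = self.flash[block]
        # Pre-stage the next sequential blocks (assumes integer block numbers)
        for nxt in range(block + 1, block + 1 + self.prefetch):
            if nxt in self.disk and nxt not in self.flash:
                self._stage(nxt, self.disk[nxt])
        return data

    def _stage(self, block, data):
        self.flash[block] = data
        self.flash.move_to_end(block)
        while len(self.flash) > self.flash_capacity:
            victim, vdata = self.flash.popitem(last=False)  # evict the LRU block
            if victim in self.dirty:  # destage before dropping the flash copy
                self.disk[victim] = vdata
                self.dirty.discard(victim)


# Pre-staging: a read miss pulls in the requested block plus the next two.
arr = HybridArray(flash_capacity=3, prefetch=2)
arr.disk.update({i: f"old{i}" for i in range(6)})  # back end already holds blocks 0-5
print(arr.host_read(0))    # old0
print(list(arr.flash))     # [0, 1, 2]

# Destaging: writes land in flash and reach disk only when evicted.
arr.host_write(4, "new4")  # block 0 was clean, so it is simply dropped from flash
print(arr.disk[4])         # old4: not destaged yet
for blk in (10, 11, 12):   # three more writes push block 4 out of flash
    arr.host_write(blk, f"new{blk}")
print(arr.disk[4])         # new4
```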