Hybrid storage arrays join multiple storage types to cut costs
Despite being more complex than all-flash or all-hard disk drive systems, hybrid systems offer the speed and low latency of flash and the economy of HDDs, tape or cloud.
Most enterprises have multiple types of data, each with different priorities that are dictated by the size of the data and the speed needed by the applications that access this data. For this reason, most large data centers do not have a single, homogeneous type of storage. Because few enterprises can afford to put every bit or byte of data on the fastest available flash storage, hybrid arrays that mix flash and hard disk drives are a staple in many data centers.
Hybrid storage arrays address differing priorities and the need to reduce costs by joining various types of storage. Increasingly, this includes linking not just flash and HDDs, but multiple tiers of flash, multiple tiers of HDDs, tape, object and cloud-based storage into a single, transparent virtual storage infrastructure that can maintain the effective performance of the entire storage system at the level appropriate for each type of data and application.
The following use cases can help you better understand the benefits hybrid storage arrays offer, as well as the types of data best suited to this type of storage. This information can help you build a case to management for purchasing.
Which types of data can benefit from hybrid storage arrays?
Real-time, transaction-based big data. Live data is typically active and persistent; databases or other applications that use it turn the data over regularly as users run searches and track sales or other activities. Automated tiering software will generally keep all the active data on the highest possible tier. Administrators may still want to designate some databases, partitions or volumes of data to be kept together on a particular tier, so that no delays occur while a portion of the data that was demoted after a period of inactivity is migrated back from a lower tier.
How tiering software works
Tiering software is the heart of the hybrid system, whether it simply automatically puts the most-accessed data on the fastest tier or it consists of more complex systems that pre-fetch associated data and can move it multiple tiers up or down as needed. You can manually tier data onto silos of different types of storage, but moving the data around consumes a great amount of the storage administrator's time and will probably cost more than purchasing tiering software in the long run.
The administrator can also purchase tiering software separately from the storage and build a custom hybrid system. But the time needed to learn the software and deploy the combined software and hardware will eat up much of the savings realized versus buying a complete product. Building a system this way may be a more effective option if there are pre-existing silos of storage that can be used for some of the tiers.
In addition to the standard two-step tiering process with one layer of flash and one layer of HDD storage, the administrator may want to consider other tiers. For instance, even within the umbrella of flash, there is memory bus flash, nonvolatile memory express flash, write-optimized flash and read-optimized flash, each less expensive than the previous type but with more limited performance. There are multiple tiers of HDD storage as well -- not just 15,000 RPM, 10,000 RPM and 7,200 RPM drives, but options to spin down drives when not in use and object storage running on HDDs. And don't forget tape and cloud storage further downstream, with lower costs per gigabyte and slower response times.
While the statistics may vary, the 80/20 rule is useful for thinking about tiers: 80% of the data written to a system will be active for approximately 30 days and then be accessed less frequently. The 20% of the data that is continually active should be kept on the fastest storage possible, but the rest can be migrated after 30 days to less-expensive storage and only brought back up to a faster tier if necessary.
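The arithmetic behind this rule can be sketched in a few lines of Python. The capacity and the per-gigabyte prices below are hypothetical placeholders, not vendor figures; only the 80/20 split comes from the rule above. The sketch also ignores newly written data that is still inside its 30-day active window, so it is a rough steady-state estimate at best.

```python
def blended_cost(total_gb, hot_fraction, fast_price, slow_price):
    """Return (all_fast_cost, tiered_cost) for a given capacity.

    hot_fraction is the share of data kept on the fast tier.
    Prices are per gigabyte and purely illustrative.
    """
    all_fast = total_gb * fast_price
    tiered = total_gb * (hot_fraction * fast_price
                         + (1 - hot_fraction) * slow_price)
    return all_fast, tiered


# Hypothetical inputs: 100,000 GB, 20% hot, $0.50/GB flash, $0.05/GB HDD.
all_fast, tiered = blended_cost(100_000, 0.20, 0.50, 0.05)
savings = 1 - tiered / all_fast
print(f"all-flash: ${all_fast:,.0f}  tiered: ${tiered:,.0f}  savings: {savings:.0%}")
```

With real quotes substituted for the placeholder prices, the same calculation gives a first-pass figure for a purchasing case.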
With big data, data lakes or other large collections of data, it may be worthwhile to examine keeping the data in the cloud, where tiering options can move the data between hot, warm and cold cloud storage as needed.
Typical file server data. The usual data stored on a file server, which includes text, word processing, spreadsheets and presentations, seldom needs the speed of flash. Once a document or file is loaded, user input can generally be measured in characters per second, which does not need sub-millisecond response times. Even graphics being processed for special effects or ray tracing or large programs being compiled will be limited more by CPU or graphics processing capabilities than by the speed of data access. Exceptions exist but should be rare enough to be addressed individually by the administrator.
Streaming data. Since streaming data is, by definition, predictable and sequential, it does not need the low-latency, random-access capabilities of flash. Even streaming data that is being accessed by many users is fairly straightforward to optimize for the best performance without using a lot of flash. In addition, the usually large file sizes and the quantity of data being transmitted make streaming data a large consumer of storage space and an ideal target for lower tiers of storage.
Virtual systems. In contrast to streaming data, virtual servers and virtual desktop infrastructure (VDI) are ideal candidates for flash storage. They can take advantage of the low-latency characteristics of flash and of deduplication, since many virtual machines (VMs) share high percentages of their content with other VMs. For instance, in a VDI system with 100 Windows VMs in which 99% of each VM's data is common to all the VMs, deduplication stores the shared data once plus each VM's unique 1%. That yields a deduplication ratio of roughly 50:1, so that 100 VMs take up about as much space as two, and the ratio approaches 100:1 as the shared portion approaches 100%. Flash storage is fast enough to support deduplication and to handle the peak loads typical of VDI deployments; for example, users log in at 8 a.m., log out for lunch at noon, log back in at 1 p.m. and log out again at 5 p.m.
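The deduplication estimate can be checked with a quick calculation. The sketch below assumes equal-size VMs, a single stored copy of the shared data and a non-shared remainder that is unique to each VM; real systems deduplicate at the block level and will vary. Under those assumptions, 100 VMs with 99% shared content work out to about 50:1, and the ratio approaches 100:1 only as the shared portion approaches 100%.

```python
def dedup_ratio(vm_count, shared_fraction):
    """Logical size divided by stored size, in units of one VM image.

    Assumes equal-size VMs, one stored copy of the shared data and
    a non-shared remainder that is unique to each VM.
    """
    logical = vm_count
    stored = shared_fraction + vm_count * (1 - shared_fraction)
    return logical / stored


# Compare a few shared-content levels for a 100-VM deployment.
for shared in (0.90, 0.99, 0.999):
    print(f"{shared:.1%} shared -> {dedup_ratio(100, shared):.1f}:1")
```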
Migrating data between tiers
Automated tiering software is transparent to the user and often to the administrator. Two files that appear in the same directory may actually be on different tiers in the storage system, or possibly even in different systems or data centers. The storage virtualization software identifies seldom-used files and moves them to slower, less-expensive storage, keeping a placeholder to tell the system where the file has gone. If a user opens that file, the system automatically fetches it from the slower storage and moves it back to a faster tier.
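The placeholder mechanism can be illustrated with a toy model. The class and method names below are hypothetical, and plain dictionaries stand in for the fast and slow media; this is a sketch of the idea, not any vendor's implementation.

```python
class TieredStore:
    """Toy two-tier store: dicts stand in for fast and slow media."""

    def __init__(self):
        self.fast = {}      # path -> contents held on the fast tier
        self.slow = {}      # path -> contents held on the slow tier
        self.stubs = set()  # placeholders for files that were demoted

    def write(self, path, data):
        """New or changed data always lands on the fast tier."""
        self.fast[path] = data
        self.slow.pop(path, None)
        self.stubs.discard(path)

    def demote(self, path):
        """Move contents to the slow tier and leave a placeholder behind."""
        self.slow[path] = self.fast.pop(path)
        self.stubs.add(path)

    def listdir(self):
        """Users see every file, wherever its contents currently live."""
        return sorted(set(self.fast) | self.stubs)

    def open(self, path):
        """Return contents, recalling them first if only a stub exists."""
        if path in self.stubs:
            self.fast[path] = self.slow.pop(path)
            self.stubs.discard(path)
        return self.fast[path]


store = TieredStore()
store.write("/docs/a.txt", b"active")
store.write("/docs/b.txt", b"archived")
store.demote("/docs/b.txt")
print(store.listdir())            # both files still appear in the listing
print(store.open("/docs/b.txt"))  # opening the file recalls it to the fast tier
```

The listing never changes when a file is demoted, which is what makes the migration invisible to the user.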
Some of the earliest automated tiering systems were based solely on activity. A file was moved to a lower tier if it had not been opened or changed during a set period of time, and it was moved to a higher tier if a user opened the file. Some systems still work this way. Others use predictive algorithms to migrate associated data so that a user opening one file in a directory will get the rest of the data in the same directory moved up in case they need it. Other systems move data at the block rather than file level. That way, a large file that usually gets only small additions to some of its data can keep the rest of the file on slower storage. Only the blocks that are often changed are kept on the faster tier.
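A purely activity-based policy of the kind described first can be sketched as follows. The 30-day idle limit, the tier names and the `plan_moves` function are assumptions made for illustration. The same logic would apply to block-level tiering if block identifiers were used as keys instead of file paths.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(days=30)  # assumed demotion threshold


def plan_moves(files, now, idle_limit=IDLE_LIMIT):
    """Return (path, target_tier) moves for a simple activity-based policy.

    files maps path -> {"tier": "fast" | "slow", "last_access": datetime}.
    """
    moves = []
    for path, info in sorted(files.items()):
        idle = now - info["last_access"]
        if info["tier"] == "fast" and idle > idle_limit:
            moves.append((path, "slow"))  # not touched recently: demote
        elif info["tier"] == "slow" and idle <= idle_limit:
            moves.append((path, "fast"))  # touched recently: promote
    return moves


# Hypothetical catalog of three files and their last access times.
now = datetime(2024, 1, 31)
files = {
    "report.doc": {"tier": "fast", "last_access": datetime(2023, 12, 1)},
    "sales.db": {"tier": "fast", "last_access": datetime(2024, 1, 30)},
    "archive.zip": {"tier": "slow", "last_access": datetime(2024, 1, 29)},
}
moves = plan_moves(files, now)
print(moves)
```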
Many factors can determine which tier data should be kept on, including service-level agreements, data that is searched only at the end of the quarter, critical data that needs maximum redundancy and data that needs extreme levels of throughput. Because automated tiering software cannot always handle such data properly on its own, the administrator might need to assign it to specific tiers.
The flexibility of the storage management software, whether built into the array or purchased separately, will determine how well the administrator can accommodate these types of unusual requirements. Some hybrid storage arrays make it easy for the administrator to measure the response times, throughput and latency of specific files or directory trees and to ensure they meet minimum requirements; other systems do not. Similarly, some systems can migrate data at specific intervals, so that it is brought to a higher tier in anticipation of an end-of-quarter job, or can keep some files or directories permanently on a designated tier.
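One way to picture how such administrator rules sit on top of automated tiering is a small override function. Everything here is hypothetical: the `resolve_tier` name, the path prefixes, the dates and the tier labels are illustrative assumptions, and real management software exposes these controls in its own way.

```python
from datetime import date


def resolve_tier(path, auto_tier, today, pins, schedules):
    """Pick a tier, letting administrator rules override the automatic choice.

    pins: dict of path prefix -> tier on which data is held permanently.
    schedules: list of (path prefix, start date, end date, tier) windows.
    """
    for prefix, tier in pins.items():
        if path.startswith(prefix):
            return tier  # permanently pinned to a designated tier
    for prefix, start, end, tier in schedules:
        if path.startswith(prefix) and start <= today <= end:
            return tier  # promoted ahead of a scheduled job
    return auto_tier     # otherwise defer to automated tiering


# Hypothetical rules: pin an orders database, pre-promote quarterly data.
pins = {"/db/orders/": "flash"}
schedules = [("/finance/quarterly/", date(2024, 3, 25), date(2024, 4, 5), "flash")]

print(resolve_tier("/finance/quarterly/q1.xlsx", "hdd", date(2024, 3, 28), pins, schedules))
print(resolve_tier("/finance/quarterly/q1.xlsx", "hdd", date(2024, 5, 1), pins, schedules))
print(resolve_tier("/db/orders/2023.tbl", "hdd", date(2024, 5, 1), pins, schedules))
```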
Hybrid storage arrays offer the speed and low latency of flash and the economy of HDDs, tape or cloud. They are necessarily more complex than all-flash or all-HDD systems, but the flexibility and lower cost make them worth investigating. From NAS boxes that cost less than $1,000 to $1 million enterprise-grade systems, most vendors offer some form of hybrid storage. Administrators should be familiar with the way these systems work and with their potential to save the IT organization substantial amounts of money.
Editor's note
Using extensive research into the hybrid flash storage market, TechTarget editors focused on market-leading vendors and other well-established enterprise vendors. Our research included data from TechTarget surveys, as well as reports from other respected research firms, including Gartner.