Why storage tiering is necessary now more than ever
Tiered storage is making a comeback by incorporating AI and machine learning to take advantage of the cost and performance benefits of new SSD and storage class memory technologies.
Storage tiering has been around for more than a decade. It peaked in popularity several years ago, when SSDs were first introduced, as a way to combine the performance advantages of flash with the lower cost of HDDs. But as the cost of flash fell and SSD capacities grew, more enterprises moved to all-flash storage, and multi-tiered systems became less popular.
Fast-forward to today and we have different types of SSDs with varying performance and cost levels; a range of SSD flash interfaces from high-bandwidth, low-latency NVMe to low-bandwidth, high-latency SATA; and an upcoming generation of storage class memory technology. Tiering is making a reappearance as enterprises aim to capitalize on the cost and performance advantages of all this new technology.
Tiering's evolution
Storage tiering uses a policy-based engine to match the value of data to the right price-performance storage tier. As data ages and access frequency declines, it loses value and is moved from a higher-performing, higher-cost tier, such as SSDs, to a lower-performing, lower-cost tier, such as spinning HDDs.
Studies have shown that most access to data tends to occur in the first 72 hours after its creation and falls steadily after that, commonly dropping precipitously after 30 days. There are exceptions, but that's the general rule. Time since last access, time since last modification and time since creation are the common age-based tiering policies.
Storage tiering software has traditionally placed or moved data based on policy thresholds. Higher-performing, higher-cost storage tiers were reserved for the highest-value data, and data was moved from the primary performance tier to a lower one as it cooled. And since there could be multiple tiers -- SSDs, fast HDDs and capacity HDDs -- data could be moved multiple times.
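To make those mechanics concrete, here's a minimal sketch of an age-threshold policy engine in Python. The tier names, thresholds and move_to data mover are hypothetical stand-ins, not any vendor's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers, ordered fastest/most expensive to slowest/cheapest.
TIERS = ["ssd", "fast_hdd", "capacity_hdd"]

# Illustrative age thresholds: data untouched for 3 days leaves the SSD
# tier; data untouched for 30 days on fast HDDs moves to capacity HDDs.
POLICY = {"ssd": timedelta(days=3), "fast_hdd": timedelta(days=30)}

@dataclass
class FileRecord:
    path: str
    tier: str
    last_access: datetime

def target_tier(rec: FileRecord, now: datetime) -> str:
    """Demote one tier at a time once a record exceeds its tier's age threshold."""
    threshold = POLICY.get(rec.tier)
    if threshold and now - rec.last_access > threshold:
        return TIERS[TIERS.index(rec.tier) + 1]  # next, cheaper tier
    return rec.tier

def run_policy(records, now, move_to):
    """move_to(path, tier) stands in for the storage system's data mover."""
    for rec in records:
        tier = target_tier(rec, now)
        if tier != rec.tier:
            move_to(rec.path, tier)
            rec.tier = tier
```

Real tiering engines track more signals than last-access time, but this demote-when-stale loop is the core of the traditional approach.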
As the cost of flash declined, the cost differential between flash SSDs and fast HDDs disappeared. At the same time, SSD capacities grew quickly, storage systems went all-flash, and multi-tiered storage systems fell out of favor.
But the situation with tiering has changed with the proliferation of flash SSD types, including multi-level cell (MLC), 3D MLC, 3D triple-level cell (TLC) and 3D quad-level cell (QLC). As the number of bits per cell increases, performance and wear-life decrease. These differences have led manufacturers to deliver a plethora of flash SSDs, each with its own balance of latency, IOPS, throughput, capacity, wear-life and cost.
Take, for example, the latest high-capacity, low-cost 3D QLC SSDs. With roughly one-tenth the wear-life of 3D TLC SSDs and one-hundredth that of 3D MLC SSDs, they aren't well-suited to write-intensive applications. They're much better for read-intensive applications, since reads have little effect on wear-life. Once again, storage administrators face the daunting problem of managing different price-performance storage tiers.
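As a rough illustration of that placement decision, the sketch below picks a flash media type from a workload's write share. The endurance figures are only the relative ratios cited above, and the thresholds are invented for illustration, not vendor guidance.

```python
# Relative wear-life from the ratios above: MLC ~100x QLC, TLC ~10x QLC.
RELATIVE_ENDURANCE = {"3d_mlc": 100, "3d_tlc": 10, "3d_qlc": 1}

def pick_flash_media(write_fraction: float) -> str:
    """Steer write-heavy data away from low-endurance QLC (illustrative cutoffs)."""
    if write_fraction > 0.5:   # write-intensive: favor endurance
        return "3d_mlc"
    if write_fraction > 0.1:   # mixed workload
        return "3d_tlc"
    return "3d_qlc"            # read-mostly: cheap, high-capacity QLC is fine
```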
In addition, there are different flash SSD interfaces to choose from: high-bandwidth, low-latency NVMe; lower-bandwidth, higher-latency, lower-cost SAS; and even lower-bandwidth, higher-latency, lowest-cost SATA. Because the interface affects both performance and cost, being all-flash no longer means having a single storage performance tier.
The storage class memory tier
The next generation of SSDs based on storage class memory (SCM) -- including Optane 3D XPoint, resistive RAM, spin-transfer torque RAM, nano-RAM and magnetoresistive RAM -- is adding yet another storage performance tier. SCM SSDs have lower latency, higher IOPS, greater throughput and longer wear-life than flash, and most use the NVMe interface. However, SCM costs considerably more than existing storage technologies.
Today, getting the most out of the various flash and SCM SSDs without overwhelming the storage budget requires tiering. The most effective approaches rely on AI and machine learning that adapt to changing access patterns and make the best use of each performance tier. Storage tiering can be an integral part of an external storage system, software-defined storage or a separate storage application.
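What "adaptive" means in practice varies by product, but the core idea is scoring data by recent access and re-placing it as the score changes. Here's a minimal sketch that uses an exponentially decayed access score as a stand-in for the machine learning models these products actually use; the half-life and placement cutoffs are assumptions for illustration.

```python
import math
import time

class HeatTracker:
    """Exponentially decayed access score: recent I/O counts more than old I/O.
    A simple stand-in for a product's actual ML model."""

    def __init__(self, half_life_s: float = 72 * 3600):  # 72-hour half-life
        self.decay = math.log(2) / half_life_s
        self.score = 0.0
        self.last = time.time()

    def record_access(self, weight: float = 1.0):
        now = time.time()
        self.score = self.score * math.exp(-self.decay * (now - self.last)) + weight
        self.last = now

    def heat(self) -> float:
        return self.score * math.exp(-self.decay * (time.time() - self.last))

def place(heat: float) -> str:
    """Hottest data on SCM, warm data on NVMe flash, cold data on SATA QLC."""
    if heat > 10:   # illustrative cutoffs, not tuned values
        return "scm"
    return "nvme_flash" if heat > 1 else "sata_qlc"
```

Because the score decays continuously, data that stops being accessed drifts down the tiers on its own, which is the adaptive behavior static thresholds lack.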
Where cloud storage fits in
There's another tiered storage issue: Public and private cloud storage have become increasingly important, but tiering to them efficiently isn't easy. The problem is how to cost-effectively move data from high-cost data center storage to lower-cost public or private cloud storage.
Tiering between different storage types, vendors, technologies and clouds -- known as intersystem storage tiering -- has its own unique challenges. The most popular approach has been hierarchical storage management (HSM), a technology still used by cloud storage gateways, storage systems and software-defined storage. But HSM was designed for LAN environments, not the cloud and especially not public cloud storage.
HSM is stub-based. Data moved from one system to another is deleted from the original system and replaced by a small stub. When the data is accessed, that access actually goes to the stub, which retrieves the data from its current residence and rehydrates it back to the original storage. Used with the cloud, HSM is slow and costly: Every rehydration to fast primary storage incurs cloud egress fees, and while cloud storage itself may be fairly inexpensive, those fees add up quickly.
Then there's the issue of stub brittleness. If the data is moved a second time, to a different storage repository, the HSM stub breaks because it points to a location where the data no longer resides, causing yet another set of problems.
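The following sketch models both problems: the egress fee charged on every rehydration and the stub that breaks when data moves again. The data structures and per-GB fee are hypothetical; real cloud pricing varies by provider and region.

```python
EGRESS_PER_GB = 0.09  # assumed per-GB egress fee, for illustration only

class BrokenStub(Exception):
    """Raised when a stub points at a location the data has since left."""

def read(path, primary, cloud):
    """primary and cloud are dicts standing in for the two storage systems."""
    obj = primary[path]
    if obj.get("stub"):                 # stub left behind by the HSM move
        loc = obj["location"]
        if loc not in cloud:            # data moved again -> stub points nowhere
            raise BrokenStub(path)
        data = cloud[loc]               # recall the data from the cloud...
        fee = len(data) / 2**30 * EGRESS_PER_GB
        primary[path] = {"stub": False, "data": data}  # ...and rehydrate it
        print(f"rehydrated {path}, egress ~${fee:.2f}")
        return data
    return obj["data"]
```

Note that every cold read pays the egress fee and pulls the full object back to primary storage, even if the application only wanted a few bytes.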
A new approach to storage tiering
When tiering is combined with public or private cloud storage, much of the focus is on unstructured data. IDC pegs unstructured data at about 80% of an organization's data, with an annual growth rate roughly three times that of structured data. Most new search and analytics tools are aimed at unstructured data as well.
This modern tiered storage approach is referred to as data management, or autonomous data management when married with AI and machine learning. Data management tiering software -- such as Dell EMC's ClarityNow, Hammerspace, Komprise and StrongBox Data Solutions' StrongLink -- mounts high-performance, all-flash file- or object-based primary storage with admin privileges. That lets the tiering software read data and copy it out to public or private cloud storage based on tiering policies, while inserting a global namespace that makes the move transparent to users and applications. Data is read and accessed where it resides; no rehydration is necessary, and the original copy can be deleted from primary storage.
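A minimal sketch of the global namespace idea follows; the catalog structure and method names are hypothetical, not drawn from any of the products above. The key difference from HSM is that tiering only updates a catalog entry, and reads are served wherever the data currently lives.

```python
class GlobalNamespace:
    """Maps a stable logical path to the data's current physical residence."""

    def __init__(self):
        self.catalog = {}  # logical path -> (backend_name, backend_key)

    def register(self, path, backend, key):
        self.catalog[path] = (backend, key)

    def tier_out(self, path, cloud_backend, cloud_key):
        # After the tiering software copies the data to the cloud, only the
        # catalog entry changes; users keep the same logical path.
        self.catalog[path] = (cloud_backend, cloud_key)

    def read(self, path, backends):
        backend, key = self.catalog[path]  # resolve current residence
        return backends[backend].get(key)  # read in place -- no rehydration
```

Because nothing is pulled back to primary storage on access, there's no stub to break and no egress-driven rehydration cycle.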
Other data management products, such as InfiniteIO, take a different tack: They sit in front of the fast SSD storage and the public or private cloud storage, looking like a switch. This approach works with both structured and unstructured data.
It's clear that in this era of modern price-performance storage proliferation, intelligent, autonomous storage tiering is no longer a luxury. It's a necessity.