Sub-LUN tiering: Five key questions to consider
IT shops must consider five areas before adopting sub-LUN tiering as offerings vary in number of tiers, block size, analysis period, policies and monitoring
Each storage vendor in the auto-tiering space implements the technology differently, and potential users need to assess a number of distinguishing characteristics. Below are five questions to consider before forging ahead with sub-LUN tiering.
How many tiers does your auto-tiering software support?
One of the first considerations when embarking on the block-based auto-tiering path is the number of tiers the software supports, because even if an IT shop has three or four tiers of storage in an array, the auto tiering might not be able to span all of them.
Compellent Technologies Inc. Data Progression (Compellent has been acquired by Dell Inc.), EMC Corp. Fully Automated Storage Tiering for Virtual Pools (FAST VP), Hewlett-Packard (HP) Co. StorageWorks P9500, HP/3PAR Inc. Adaptive Optimization and Hitachi Data Systems Dynamic Tiering can automatically migrate data across three tiers. Meanwhile, IBM Easy Tier supports two tiers, one of which must be solid-state drives (SSDs). But each of those vendors offers three or more tiers among the mix of solid-state, Fibre Channel (FC), SAS and SATA drives, as well as different rpm rates.
"Three tiers is probably the maximum you need," said David Floyer, chief technology officer (CTO) at The Wikibon Project. "I think it's a law of diminishing returns when you go from 10K Fibre Channel disk to 15K and have a tier there. The relative performance differences and the relative cost differences get a little fine, and the overhead of doing it all gets a bit high."
Compellent's three-tier system treats the 15,000 rpm and 10,000 rpm drives as if they were the same, and places blocks of data on disk space wherever it's open, according to Bob Fine, director of product marketing at the company.
Large IT organizations with a diverse set of applications might see a need to spend more money for additional data tiers, but others might be content with two. "There are some very good designs that only have two tiers: SSD and slow disk," said Valdis Filks, a research director for storage technologies and strategies at Stamford, Conn.-based Gartner Inc. "Until you know what your requirements are, you can't sit down and decide which number of tiers is best."
Does the size of the data chunks being moved matter?
Storage administrators need to confirm that their auto-tiering software operates at the sub-LUN level to ensure they're reserving their most expensive storage for their most critical performance-sensitive data. But it's less clear if they should concern themselves about the size of the data chunks the system moves, even if their storage vendors are.
Users will find no dearth of block size options for auto-tiering their data, from Compellent's choice of 512 KB, 2 MB (default) or 4 MB with its Data Progression; to IBM's 1 GB with its Easy Tier; and EMC's 1 GB for Clariion and VNX, and 768 KB to 360 MB for Symmetrix.
Brian Garrett, vice president of Enterprise Strategy Group (ESG) Lab, created the following hypothetical scenario to illustrate how block size factors into automated storage tiering. Suppose a storage system moves hot chunks of data from a hard disk drive to a flash drive to improve the performance of a database application. If the hot chunk is 512 KB in length and the auto-tiering system moves chunks in increments of 512 KB, the system is 100% efficient. But if the system shifts a 1 MB chunk, it would be approximately 50% efficient because it would move not only the 512 KB of hot data but also cooler data that might be next to the hot chunk, thereby wasting half of the expensive flash capacity consumed by that move.
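Garrett's arithmetic can be written out as a short Python sketch. It assumes the hot region starts on a chunk boundary, and the function name is illustrative rather than taken from any vendor's tooling.

```python
import math

KB = 1024
MB = 1024 * KB

def migration_efficiency(hot_bytes: int, chunk_bytes: int) -> float:
    """Fraction of the promoted flash capacity that actually holds hot data.

    Assumes the hot region is aligned to a chunk boundary, so the system
    moves ceil(hot / chunk) whole chunks.
    """
    chunks_moved = math.ceil(hot_bytes / chunk_bytes)
    return hot_bytes / (chunks_moved * chunk_bytes)

print(migration_efficiency(512 * KB, 512 * KB))  # 1.0
print(migration_efficiency(512 * KB, 1 * MB))    # 0.5
```

The same function shows how quickly efficiency falls as the chunk grows relative to the hot data, which is the point of the scenario.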
"A smaller chunk size increases the efficiency and cost effectiveness of each sub-LUN migration," Garrett said, via an email. "But smaller chunk sizes increase the amount of metadata needed to monitor and track sub-LUN migrations. Metadata is typically stored in high-speed memory, which adds cost. Doing more metadata updates and lookups could impact performance."
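The metadata side of that tradeoff is easy to estimate. The sketch below counts how many chunks a system would have to track for a given pool size; the 100 TB pool and the 16 bytes of metadata per chunk are assumptions chosen for illustration, not figures from any vendor.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB

def tracking_entries(pool_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks the system must monitor (rounded up)."""
    return -(-pool_bytes // chunk_bytes)

def metadata_bytes(pool_bytes: int, chunk_bytes: int, bytes_per_entry: int = 16) -> int:
    """Rough metadata footprint; bytes_per_entry is a hypothetical value."""
    return tracking_entries(pool_bytes, chunk_bytes) * bytes_per_entry

pool = 100 * TB  # hypothetical pool size
print(tracking_entries(pool, 512 * KB))  # 209715200
print(tracking_entries(pool, 1 * GB))    # 102400
```

Under these assumptions, a 512 KB chunk size means tracking roughly 2,000 times as many entries as a 1 GB chunk size.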
According to Gartner's Filks, the chunk size won't matter much to small organizations that pay little attention to the inner workings of their systems, but it might catch the attention of sophisticated high-end users that want to tune systems for optimal performance based on their application needs.
"It's one of those arguments that will run forever in the storage market: What block size is the most important?" Filks said. "We have argued about block sizes when we've been tuning databases for the last 20 years."
Floyer at The Wikibon Project said vendors constantly raise the issue of block size with him, but he views it as "hogwash." He advises users to "focus on how much money you're going to save; focus on the business case. Can I run this report to predict how much I'm going to save? That's 20 times more important than worrying about the size of the block."
How long does the system analyze data before moving it?
Some block-based storage systems equipped with sub-LUN tiering collect and analyze data in minutes before migrating it to another tier. Others might take 24 hours to complete the assessment. Several offer the customer a degree of choice.
Dell EqualLogic XVS arrays, for instance, have a learning cycle of approximately 10 minutes before tiering data between SSD and SAS drives. HP-3PAR Adaptive Optimization and HP StorageWorks P9500 have a minimum sampling period of one hour, although customers also have the option to customize the time frame.
EMC claims its Symmetrix arrays can move data based on real-time workload analysis, while its Clariion systems shift blocks based on an analysis window of 24 hours. An EMC spokesperson said the analysis time is optimized based on typical workloads, but users also have the ability to create custom policies. For instance, you can select a window of Monday through Friday from 6 a.m. to 6 p.m. for analysis and ultimately control when the system moves the data.
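A custom window such as Monday through Friday from 6 a.m. to 6 p.m. amounts to a filter on which I/O samples count toward the analysis. The following Python sketch illustrates the idea; the function names and the sample format are hypothetical and do not reflect EMC's actual policy interface.

```python
from datetime import datetime, time

ANALYSIS_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4
WINDOW_START = time(6, 0)        # 6 a.m.
WINDOW_END = time(18, 0)         # 6 p.m. (exclusive)

def in_analysis_window(ts: datetime) -> bool:
    """True if the timestamp falls inside the weekday 6 a.m. to 6 p.m. window."""
    return ts.weekday() in ANALYSIS_DAYS and WINDOW_START <= ts.time() < WINDOW_END

def filter_samples(samples):
    """Keep only (timestamp, io_count) samples taken inside the window."""
    return [(ts, ios) for ts, ios in samples if in_analysis_window(ts)]

samples = [
    (datetime(2024, 1, 1, 9, 0), 1200),   # Monday morning: counted
    (datetime(2024, 1, 1, 23, 0), 9000),  # Monday night batch job: ignored
    (datetime(2024, 1, 6, 9, 0), 300),    # Saturday: ignored
]
print(len(filter_samples(samples)))  # 1
```

Excluding the overnight batch job keeps its activity from pushing daytime production data off the fast tier.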
IBM Easy Tier monitors activity on 1 GB data chunks to determine the data "temperature" and to create a "heat map"; algorithms then generate a data relocation plan once every 24 hours to place the data on the most appropriate tier.
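The general heat-map idea can be sketched in a few lines of Python. This is not IBM's algorithm: it simply counts I/Os per 1 GB extent and promotes the busiest extents until the SSD tier is full. The function names, tier labels and input format are assumptions made for illustration.

```python
from collections import Counter

GB = 1024 ** 3
EXTENT_BYTES = 1 * GB  # monitoring granularity described in the text

def build_heat_map(io_offsets):
    """Count I/Os per extent; io_offsets are the byte offsets of each I/O."""
    return Counter(offset // EXTENT_BYTES for offset in io_offsets)

def relocation_plan(heat_map, current_tier, ssd_extents):
    """Return {extent: target_tier} for extents that need to move.

    The hottest ssd_extents extents belong on SSD; everything else on HDD.
    """
    hottest = {ext for ext, _ in heat_map.most_common(ssd_extents)}
    plan = {}
    for ext, tier in current_tier.items():
        target = "ssd" if ext in hottest else "hdd"
        if target != tier:
            plan[ext] = target
    return plan

offsets = [0, 10, 5 * GB, 5 * GB + 1, 5 * GB + 2, 2 * GB]
heat = build_heat_map(offsets)
placement = {0: "ssd", 2: "ssd", 5: "hdd"}
print(relocation_plan(heat, placement, ssd_extents=2))  # {2: 'hdd', 5: 'ssd'}
```

A production system would also weigh factors such as I/O size, read/write mix and the cost of the move itself, which this sketch ignores.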
A cyclical application workload that tends to run the same way for eight-hour stretches every day might benefit from a 24-hour analysis, whereas dynamic workloads that change quickly might be better suited to speedier assessment periods, noted Randy Kerns, a senior strategist at Evaluator Group Inc. in Broomfield, Colo.
How can you track the effectiveness of the auto tiering?
IT shops can seek out professional services to help pre-determine the potential benefits of automated tiering. Or they can try vendor-supplied tools to predict the effectiveness of the auto tiering, and to later monitor the data movement and system performance.
Some tools show not only what's going on but also answer potential "what if" questions, such as the performance impact of a change in the amount of flash drives or the cost benefit of an increase in the number of SATA disks, often with the savings calculated.
Determining which applications stand to receive the optimal performance benefit from flash drives or the greatest cost benefit with SATA, and how much SSD and SATA an IT shop might need, are complex problems that require calculus to solve, ESG Lab's Garrett said.
"We're just beginning to see tools that can not only model the performance impact but the price impact," Garrett said. "Over time, I think we'll see more friendly tools, more easy ways to do this modeling. But, right now, they're generally sharp and pointy tools in the hands of experts."
EMC, for its part, makes available professional services to plan and implement FAST VP, but it also offers up a free Tier Advisor tool to plan and model FAST configurations based on application workloads.
Compellent Enterprise Manager generates reports that display capacity usage and power and carbon savings with respect to tiering configurations.
Users of Hitachi Data Systems Dynamic Tiering can employ the Hitachi Command Suite or Storage Navigator 2 to monitor the auto tiering, and graphical reports show where the tiers are set and I/O load against each tier. Alerts notify administrators when service levels fall below desired levels.
IBM Easy Tier includes a Storage Tier Advisor Tool (STAT) to report on the system workload of each volume in the pool or to predict how effective Easy Tier will be with SSDs.
What policy options does the auto-tiering software offer?
Auto-tiering software can minimize time-consuming and burdensome tasks associated with moving data to the right tier at the right time, but it doesn't eliminate the ability to exert a level of control over the storage tiering process. Tiering products, to varying degrees, let users define policies based on their individual needs.
Compellent Data Progression, for example, offers both default policies for customers who have little or no storage management experience, and customizable policies for IT shops that want to tier by application, RAID level or other configuration options. Users can lock a volume to a specific tier for a set time period for a critical application, such as an ERP database, or, more typically, they can set an expiration time for data to stay on fast disk, according to Compellent's Fine.
Hitachi Data Systems users also can lock Dynamic Tiering volumes in place and control the monitoring cycle duration, excluding selected time periods from "heat analysis," according to John Harker, a senior product marketing manager at the company.
HP-3PAR storage systems let users define the optimization mode based on performance, cost or a combination of the two, as well as tinker with the schedule to measure performance or migrate data.
EMC FAST VP allows users to assign policies not only to individual storage devices but to storage groups of one or more related LUNs. The policies define the pools that form the three tiers and the maximum amount of space for each tier.
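A policy of this kind can be thought of as a set of per-tier capacity caps applied to a storage group. The Python sketch below checks a proposed placement against such caps; the class, field names and percentages are hypothetical and are not EMC's policy syntax.

```python
from dataclasses import dataclass

@dataclass
class TierPolicy:
    name: str
    max_percent: dict  # tier name -> max share of the group's capacity, in percent

def violations(policy, group_capacity_gb, placement_gb):
    """Return {tier: GB over the cap} for tiers that exceed the policy."""
    over = {}
    for tier, used_gb in placement_gb.items():
        limit_gb = group_capacity_gb * policy.max_percent.get(tier, 0) / 100
        if used_gb > limit_gb:
            over[tier] = used_gb - limit_gb
    return over

# Hypothetical policy: at most 10% of the group on SSD, 50% on FC, 100% on SATA.
erp_policy = TierPolicy("erp", {"ssd": 10, "fc": 50, "sata": 100})
print(violations(erp_policy, 1000, {"ssd": 150, "fc": 400, "sata": 450}))
# {'ssd': 50.0}
```

Capping the SSD share per storage group is one way to keep a single busy application from consuming the entire flash tier.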
"You can specify which applications can move into the various tiers or which users can move into the various tiers," Gartner's Filks said. "For example, you may not want to have YouTube applications use SSD because that's just a waste of your resources."
Administrators also need to take care that important financial applications that become crucial at the end of the month haven't been pushed to a lower tier of storage.
"There's common sense involved in this," The Wikibon Project's Floyer said. "Don't do anything that rocks the boat too violently. Just because you can do it doesn't mean you should do it."