Deconstructing the storage algorithm
Jon Toigo examines the four steps of the storage algorithm and concludes that the current storage equation just doesn't add up.
James Harvey Robinson, the American historian, once said, "Most of our so-called reasoning consists in finding arguments for going on believing as we already do." I'm reminded of this quote at many of the events I attend, especially when I'm asked to make sense of industry analyst projections about storage trends. To wit:
Is it true that SANs are heading for a SAN-pocalypse, to be replaced by "server-side virtual SAN [VSAN] configurations and software-defined storage," as the Wikibon folks assert?
If it's true that storage capacity demand is exploding behind virtual servers, as IDC and Gartner suggest, why aren't sales of disk drives and storage arrays exploding too, rather than shrinking?
Will solid-state storage really replace all hard disk storage within the next couple of years? What about cloud storage?
Generally, I try to avoid addressing these kinds of questions, since most folks are afflicted with confirmation bias, and they almost always want the numbers to add up a certain way. If I take a position that jibes with their existing predispositions, I'm regarded as uber-smart; if my view doesn't complement theirs, not so much.
The four steps of the storage algorithm
With respect to the storage future and its drivers, here's the basic algorithm that analysts seem to be following.
- Proposition 1: Server virtualization is taking hold in a big way, driving out non-virtualized application environments.
- Proposition 2: Server hypervisors prefer server-side storage resources to shared, hard-wired storage network infrastructures.
- Proposition 3: Moving to a server-side storage architecture, complemented by a lot of storage-to-storage replication, drives up capacity requirements and increases the need for storage acquisition.
- Proposition 4: Future storage sales are expected to be "high and to the right," though cost reductions will come from increased use of flash storage and the consolidation of value-added software into a centralized software-defined storage layer.
My analysis of their analysis reveals some issues to consider.
First, I wonder if server virtualization is actually growing and by how much. IDC and Gartner claim the number of virtual workloads -- virtual machines (VMs) instantiated on a server hypervisor -- is increasing by approximately 45% per year and that, by 2016, we should see 80% of server workloads virtualized. That sounds pretty impressive.
Framed another way -- in terms of how many physical servers are actually running hypervisors -- the results are somewhat less compelling from the standpoint of server virtualization fans. According to the same leading industry analysts, we've seen a rather smallish increase in the number of servers running hypervisors: from 10% in 2009, to 17% in 2012, to a projected 21% in 2016. That's a somewhat paltry gain of roughly one percentage point per year in physical servers running hypervisor software through 2016.
Combining these data points on the percentage of virtualized workloads and the percentage of servers running hypervisors, we must conclude that by 2016, 80% of workloads will be processed on the 21% of servers that run hypervisors, while the other 20% of workloads will operate on the 79% of servers that aren't virtualized (a quick arithmetic sketch after the list below makes the imbalance concrete). The questions not addressed by the analysts are threefold:
- Which applications will remain un-virtualized?
- How important or critical are these non-virtualized applications, relatively speaking?
- How much data will they produce?
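Before tackling those questions, it's worth making the arithmetic above concrete. Here's a back-of-the-envelope sketch using the analysts' own percentages; the workload-density comparison at the end is my extrapolation, not theirs:

```python
# Back-of-the-envelope math using the analyst figures cited above.
# The percentages are theirs; the density comparison is my extrapolation.

hypervisor_share_2012 = 0.17     # share of physical servers running hypervisors, 2012
hypervisor_share_2016 = 0.21     # projected share, 2016
points_per_year = (hypervisor_share_2016 - hypervisor_share_2012) / (2016 - 2012)
print(f"Hypervisor host growth: {points_per_year:.1%} of the server base per year")
# -> 1.0% -- roughly one percentage point per year

virtualized_workloads = 0.80     # projected share of all workloads running as VMs, 2016
vm_density = virtualized_workloads / hypervisor_share_2016                  # ~3.8
legacy_density = (1 - virtualized_workloads) / (1 - hypervisor_share_2016)  # ~0.25
print(f"A hypervisor host carries ~{vm_density / legacy_density:.0f}x the "
      f"workloads of a non-virtualized server")                             # ~15x
```

That kind of consolidation is precisely what hypervisor vendors promise, of course; the more interesting question is what stays behind on the other 79% of machines.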
If the non-virtualized applications are high-performance transaction processing systems that represent the most mission-critical applications of the business and generate the lion's share of revenue-producing data, they may be doing just fine on a networked, hard-wired, block-oriented infrastructure such as a Fibre Channel (FC) SAN.
So, maybe our failure to see the predicted SAN-pocalypse is a function of what workloads are actually being virtualized and which ones aren't.
All storage is local
The second piece of the algorithm suggests that hypervisors prefer to manage locally connected storage assets, creating support for VSAN and other server-side architectures. That may be true or it may be bunk, but the reality is that all storage is direct-attached. Parallel SCSI is a direct-attached storage (DAS) interface, while serialized SCSI -- whether FC, iSCSI or something else -- is merely switched DAS. There's no such thing as a true storage network, at least not one that fits the ISO model of networks. We have storage fabrics with switched DAS (SANs), and we have DAS with thin file server appliances (NAS). So, the idea that virtual servers won't work with anything but server-side direct attachment is silly.
That said, the idea that hypervisors have difficulty dealing with the rigidity of SANs, whether iSCSI or FC, traces back to the notion that template cut-and-paste technologies for VM migration are hampered by the need to re-provision storage every time a VM is moved. In other words, you need to inform the application where to find its data by providing routing instructions to the storage relative to the server address, which is a pain, to say the least.
However, if you virtualize your storage, you can surface a virtual volume that contains the data for the VM and that travels with the VM from server to server, automatically adjusting the best route to the physical storage assets containing the data. So, with real storage virtualization, proposition 2 may be completely bogus. Perhaps we're hearing about a need to move to VSAN not because it's necessary, but because some vendors want to sell us new kit.
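To make that concrete, here's a toy model -- not any vendor's API, and every class and method name below is mine and purely illustrative -- sketching a virtualization layer that owns the volume-to-array mapping:

```python
# Toy model of storage virtualization, not any vendor's API: the
# virtualization layer owns the mapping from virtual volume to physical
# arrays, so migrating a VM changes the access route, not the provisioning.

class VirtualVolume:
    """One logical volume aggregating capacity from multiple physical arrays."""
    def __init__(self, name, backends):
        self.name = name
        self.backends = backends          # e.g., an FC array plus an iSCSI array

    def route_from(self, host):
        # The virtualization layer recomputes the best path from this host
        # to the physical assets; the VM never handles array addresses.
        return f"{host} -> vvol:{self.name} -> {', '.join(self.backends)}"

class VM:
    def __init__(self, name, volume):
        self.name, self.volume, self.host = name, volume, None

    def migrate(self, new_host):
        # No storage re-provisioning on migration: the virtual volume
        # travels with the VM and the route adjusts automatically.
        self.host = new_host
        return self.volume.route_from(new_host)

vol = VirtualVolume("db_data", ["fc_array_1", "iscsi_array_2"])
vm = VM("db01", vol)
print(vm.migrate("host_a"))
print(vm.migrate("host_b"))   # same data, new route, nothing re-provisioned
```

The point of the sketch is the last two lines: the volume, not the administrator, absorbs the move.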
Unfortunately, storage virtualization software, which has been around since the late 1990s, is finding its value challenged by so-called software-defined storage (SDS) vendors. The SDS folks claim to aggregate value-added services that previously lived on storage array controllers into a centralized software layer, where they can be deployed to support data storage requirements more readily. A noble objective, that, but it stops short of storage virtualization's additional functionality: aggregating capacity that can be surfaced as a virtual volume. That functionality preserves investments in existing storage infrastructure while delivering the same service-aggregation benefit the SDS guys claim to provide. This discussion simply isn't being had, I suspect, because the hypervisor guys don't know storage from Shinola.
More replication, more storage
Proposition 3 is absolutely spot on. Moving to software-defined server-side storage will require lots of data replication and drive up storage capacity demand. Data needs to be replicated behind any physical server that might possibly serve as a host for a given VM. VSAN requires two copies of data, though experts claim three copies are needed for high availability. Get it? That's where IDC comes up with its 300% capacity demand growth projection. Gartner gets to 650% by adding in backups and remote site replication.
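The capacity arithmetic is easy to reproduce. The three-copy figure below comes from the paragraph above; the particular mix of remote replicas and backup overhead behind Gartner's number is my guess, flagged as such:

```python
# Reproducing the capacity multipliers cited above. The three-copy HA figure
# is from the text; the decomposition of Gartner's 650% is my assumption.

primary_tb = 100                          # hypothetical primary data set, in TB

ha_copies = 3                             # copies experts say VSAN-style HA needs
server_side_tb = primary_tb * ha_copies
print(f"Server-side HA: {server_side_tb} TB, or {ha_copies:.0%} of primary")  # IDC's 300%

# Assumption: the same three copies replicated to a remote site, plus half a
# copy's worth of backup storage -- one plausible way to reach Gartner's 650%.
remote_copies = 3
backup_factor = 0.5
total_multiplier = ha_copies + remote_copies + backup_factor
print(f"With backups and remote replication: {primary_tb * total_multiplier:.0f} TB, "
      f"or {total_multiplier:.0%} of primary")                                # 650%
```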
Proposition 4 says future storage sales will be high and to the right, but investments won't necessarily be in SANs or NAS appliances, and maybe not even in disk drives. This is a mix of truth and fiction that will require another column to sort through. One thing is for sure: Storage sales revenues do not reflect any sort of spike in demand; in fact, they indicate shrinking margins for vendors doing some serious price cutting to meet their sales volume targets.
I don't get too caught up in the declining disk shipment data points, since they reflect pressures on the market created by slowing server and PC sales, and a number of other inventory sizing, logistics/supply chain and cost-of-money issues. Solid-state drives and other flash-storage componentry may also be reducing demand for hard disk drives by delivering more IOPS with less hardware and by virtue of their own price wars that will culminate in a much reduced cadre of vendors in the near future.
Anyway, the current narrative around storage -- and the algorithm that purports to validate it -- is an interesting but fictional one. And it isn't very good fiction if you don't care what happens to the protagonists. This one just asks us to suspend disbelief and buy something.
About the author:
Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.