
Focus attention on a cognitive data management system

Jon Toigo advises paying less attention to storage component stories and more to system-focused narratives such as a cognitive data management process.

The cognitive dissonance engendered by the disparity in how various trade shows and conferences defined storage's place in the enterprise in 2016 is telling. Hypervisor computing and server-centric shows tended to cater to the app/dev crowd, with lots of woo about high-performance (NVMe and NVMe over Fabrics) and agile (hyper-converged) storage. By contrast, shows catering to large-scale "enterprise-strength" data centers and "industrial farmers" of the cloud tended to skip over storage component stories in favor of a more data management system-focused narrative.

In the opening general session of IBM Edge 2016 in Las Vegas in September, for example, one of the first statements to emanate from the show's ringmaster was this simple declaration: "Performance is not the result of any single component but of the successful and efficient integration of a complete system."

It was nice to hear such a principle echoed to all the app/dev'ers in the 5,500-person assembly, because, frankly, I've grown tired of the limited perspective offered by those hawking component-level storage wares rather than a complete data management system.

Component-level overkill

To illustrate my point, here's a summary of a few of the pitches that crossed my desk over the past year:


1. 3D NAND flash (the first generation, at least) is now shipping, with second-generation chips not scheduled until late 2017, at the earliest. Truth is, while 3D NAND flash is more capacious than 2D, it's also significantly more expensive to manufacture -- a potential gating factor on uptake.

2. Nonvolatile memory express (NVMe) flash is beginning to show up in the market, but it remains an answer to problems most of us don't have. While NVMe theoretically accelerates workloads where bus congestion is a problem by parallelizing I/O transports to individual flash chips, it does nothing to parallelize the unloading of sequential I/O from x86 processors onto the bus, which is where the latency develops in the first place. So the logjam in the I/O path that slows virtual machines (VMs) and other apps hosted on multicore chips remains unresolved by NVMe, although the technology may find a use in large in-memory databases in the future.

3. Shingled magnetic recording (SMR) disk is available on many -- primarily archival -- disk targets. In truth, SMR is pretty much a bust. Without a lot of workarounds, users have to rewrite entire shingled zones of high-capacity media whenever a file or object is updated. Without such gyrations, or near-perfect knowledge of which data will remain unchanged for the next half decade or so, you're better off using high-capacity helium drives or tape.

4. Seagate continues to promise large-capacity hard drives with good random access performance that leverage heat-assisted magnetic recording (HAMR), but the dates of arrival keep slipping. Who knows if bit-patterned media or acoustically assisted magnetic recording will ever deliver the goods?

Success stories


Unlike the component-level technologies listed above, tape continues to gain traction as part of a complete data management system, especially among cloud service providers, who see it as the only way to store the mass quantities of data -- measurable in thousands of exabytes -- expected to come their way by 2020. Tape is on track to deliver between 140 TB and 220 TB of capacity (uncompressed) per cartridge by the next decade, and it is the only hope for cloudies looking to forestall the Zettabyte Apocalypse.

I'd also like to tip my hat to Nutanix for demonstrating with its initial public offering (IPO) -- one of very few this year -- that the appliance-ized combination of "server + hypervisor + software-defined storage stack + storage hardware" has a market among virtualization administrators and hypervisor vendors who know very little about storage. The IPO has caused many of the third-party software-defined storage (SDS) vendors allied with server makers such as Cisco (UCS), Dell, Fujitsu, Hewlett Packard Enterprise, Huawei and Lenovo to pine for their own soup-to-nuts appliance bearing their own logo art on the bezel.

I would caution that the SDS stack is very much in flux, though, which could mean current hypervisor-centric stacks end up as Jurassic infrastructure in a relatively short period of time.

Integration: The adult in the room

Now, if you went to the enterprise data center shows, you were likely treated to a more grown-up view of technology -- one in which the whole data management system is greater than the sum of its parts. Less important than the latest flash performance metrics is how you design a balanced data management system, one that levels the differences between the speeds and feeds of discrete components to deliver the greatest overall efficiency.

For example, IBM is once again talking about "FLAPE" (flash plus tape) and "FLOUD" (flash plus cloud), both of which involve the use of high-speed storage to capture data and lower-speed storage for archiving. These are essentially "triangulated" architectures, much like hybrid cloud, intended to meet the goals of application developers for speed and agility with minimal latency, as well as those of the ops crowd, which tends to prize resiliency, continuity and security in addition to performance.
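To make the triangulation concrete, here is a minimal Python sketch of how such a two-tier capture-then-archive policy might work. The tier names, the 90-day threshold and the helper functions are my own illustrative assumptions, not anything IBM has published:

```python
from datetime import datetime, timedelta

# Hypothetical tiers and age threshold; real FLAPE/FLOUD policies would be product-specific.
FAST_TIER = "flash"
ARCHIVE_TIERS = {"flape": "tape", "floud": "cloud"}
ARCHIVE_AFTER = timedelta(days=90)

def place_new_data(obj):
    """New writes always land on the high-speed capture tier."""
    obj["tier"] = FAST_TIER
    return obj

def migrate_cold_data(objects, scheme="flape", now=None):
    """Move objects untouched longer than ARCHIVE_AFTER to the archive tier."""
    now = now or datetime.utcnow()
    archive_tier = ARCHIVE_TIERS[scheme]
    for obj in objects:
        if obj["tier"] == FAST_TIER and now - obj["last_access"] > ARCHIVE_AFTER:
            obj["tier"] = archive_tier  # demote: flash -> tape (flape) or flash -> cloud (floud)
    return objects

# Example: a hot object stays on flash, a stale one is demoted to tape.
catalog = [
    place_new_data({"name": "orders.db", "last_access": datetime.utcnow()}),
    {"name": "q1_logs.tar", "tier": FAST_TIER,
     "last_access": datetime.utcnow() - timedelta(days=200)},
]
migrate_cold_data(catalog, scheme="flape")
print([(o["name"], o["tier"]) for o in catalog])
```

The point is simply that fast media does the capturing and cheap, capacious media does the keeping, with the demotion rule -- not the components -- doing the real work.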


In this vein, the storage tech I'm watching closest nowadays is cognitive data management (CDM). Essentially, CDM is the integration via cognitive computing of (1) an internet of things (IoT) approach to storage resource management, (2) an analytics approach to storage service management and allocation, and (3) a fresh take on data lifecycle management.

The CDM platform sits over the storage infrastructure to direct data where it needs to go to receive the appropriate services (per policy linked to file/object metadata) in the most efficient manner. All data migrations (tiering) and copies (backups) are handled by the cognitive computing platform so as not to create latency in applications or VMs. "Universal translator" functionality enables many types of file systems and object systems to coexist and share data, with a global namespace providing all locational information for every stored bit across the infrastructure, even in the cloud.
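For illustration only, here is a small Python sketch of the kind of metadata-driven policy routing and global namespace lookup described above. The policy classes, targets and function names are hypothetical assumptions on my part, not IBM's or StrongLINK's actual interfaces:

```python
# Hypothetical policy table: a metadata class maps to required data services and a target.
POLICIES = {
    "patient-records": {"services": ["encrypt", "replicate"], "target": "object-store-a"},
    "render-scratch":  {"services": [],                       "target": "nvme-pool"},
    "finished-media":  {"services": ["archive-copy"],         "target": "tape-library"},
}

GLOBAL_NAMESPACE = {}  # object ID -> (target, path), spanning on-premises and cloud targets

def ingest(object_id, metadata):
    """Route an object to the target its policy class requires and record its location."""
    policy = POLICIES[metadata["class"]]
    location = (policy["target"], f"/{metadata['class']}/{object_id}")
    GLOBAL_NAMESPACE[object_id] = location
    # Data services (encryption, extra copies, archival) would run out of band,
    # so the application never waits on them.
    return {"object": object_id, "services": policy["services"], "location": location}

def resolve(object_id):
    """Any file or object protocol front end can look up where the bits actually live."""
    return GLOBAL_NAMESPACE[object_id]

print(ingest("mri-0042", {"class": "patient-records"}))
print(resolve("mri-0042"))
```

The sketch's only purpose is to show placement and data services hanging off metadata-linked policy, while a single namespace keeps track of where every bit lives regardless of the file system, object system or cloud underneath.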

I like what IBM has in mind for this technology, almost as much as I like what StrongBox Data Solutions is already doing with it under the moniker "StrongLINK." IBM is taking an IoT-meets-Watson approach to storage resource management in order to place data where it should go on tiered on-premises and cloud infrastructure storage components bearing the IBM logo. StrongLINK, the CDM from StrongBox Data Solutions, takes a hardware-agnostic approach that will likely deploy more effortlessly in a heterogeneous shop. In the final analysis, it is CDM -- which goes well beyond the current boundaries of software-defined storage -- that holds the greatest potential for addressing practical storage challenges that will come to a head by the next decade.
