Five data protection guidelines for business continuity

Storage expert Jon Toigo offers data protection guidelines to help storage admins achieve the best possible results when protecting data in their care.

In part three of his five-part series on the best tips for maximizing efficiency, Jon William Toigo covers data protection guidelines, including mirroring, tape backup and WAN-based replication technologies.

Along with managing storage performance and capacity, data storage administrators are usually called upon to devise ways to protect the data placed in their charge. In the past, this was just a matter of making a copy of output data to one or more storage media targets: to another array on the raised floor (to guard against a localized equipment failure), and/or to removable media subsequently transported off-site or to a stand of disk at the other end of a wide-area network (WAN) connection (to guard against an interruption event with a broader geographical footprint).

However, as the volume of data generated by businesses has grown over time, and the number of arrays used to store data has proliferated, the simple matter of establishing and following a set of data protection guidelines has become much more complex, challenging and costly.

Data protection efficiency is a gauge of how well administrators are coping with the protection of burgeoning data: assigning appropriate data protection methods to bits, testing and validating the soundness of their data protection techniques, and delivering "defense in depth" in a budget-savvy way. Here are five guidelines for achieving better data protection efficiency outcomes.

Data protection guideline 1: Lose the 'Tape Sucks' bumper sticker.

You would have to be living under a rock not to have heard the "tape is dead" pitch from array vendors. Since the late 1990s, disk array makers have waged a massive marketing campaign to promote disk-centric tape alternatives, such as deduplicating virtual tape appliances, and it seems to have succeeded in shaping buyer perceptions and strategies.

Witness the latest reports from industry analysts: of the 20-plus exabytes of external disk storage deployed by 2011, almost half the capacity was being used to make copies of the other half. In many companies, local-area network (LAN)-based mirroring has been augmented by WAN-based data replication, an entirely disk-based data protection methodology touted by array makers as the meme for data protection in the 21st century.

Read the entire Toigo tip series on storage efficiency

Tips for efficient disk capacity allocation

Remove useless data with these capacity utilization techniques

Data storage techniques for a green data center

How to solve storage performance issues

While array-to-array mirroring and replication may be appropriate data protection methods for some data, they're by no means a sensible strategy for all data. This observation derives from a fundamental point: Data inherits its criticality from the business process whose applications and end users it supports. Not all business processes require "always on" failover-based recovery strategies, which tend to be the most expensive approach to recovery. Those that do may be well served by WAN-based mirroring, but even this isn't a full solution to the problem of data protection (see below).

By contrast, tape backup provides an efficient means for protecting the data of applications that don't require "always on" services. Restoring data from tape may take longer than simply re-pointing applications to an alternative disk-based store, but it's substantially less expensive and, in many cases, more reliable. Smart data storage administrators perform tape backups of mirrored arrays in simple recognition that annual disk failure rates are estimated to be between 7% and 14%. Data protection usually requires a mix of technologies.
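
To put those failure-rate estimates in perspective, here's a minimal back-of-the-envelope sketch. It assumes drive failures are independent (real arrays only approximate this, and correlated failures make matters worse), and the drive counts used are simply illustrative.

```python
# Back-of-the-envelope estimate: chance that at least one drive in an
# array fails within a year, assuming independent failures (a
# simplification -- correlated failures are common in practice).

def prob_at_least_one_failure(annual_failure_rate: float, drive_count: int) -> float:
    """P(at least one of N drives fails) = 1 - (1 - p)^N."""
    return 1 - (1 - annual_failure_rate) ** drive_count

for rate in (0.07, 0.14):          # the 7% and 14% estimates cited above
    for drives in (12, 32):        # hypothetical array sizes for illustration
        p = prob_at_least_one_failure(rate, drives)
        print(f"{drives} drives at {rate:.0%}/yr -> {p:.0%} chance of a failure this year")
```

Even at the low end of that range, a modest 32-drive array is overwhelmingly likely to lose at least one drive in a given year, which is exactly why a second, independent copy on tape is cheap insurance.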

Data protection guideline 2: Get real about WAN-based replication.

WAN-based disk-to-disk replication is only a valid strategy when it doesn't create data deltas that violate recovery time objectives (RTOs) and recovery point objectives (RPOs). A delta -- or difference -- in the state of data at the production data center and its mirror at a recovery data center results whenever data traverses shared network pipes longer than 18 kilometers.

Part of this has to do with distance-induced latency -- the time it takes data to traverse a WAN connection. It's been estimated that for every 100 km (62 miles) data moves over a SONET link, the remote array lags behind the primary by approximately 12 SCSI operations. That's simply a speed-of-light issue, and we can't argue with Einstein.
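
For a feel of the raw physics, here's a minimal sketch of the propagation delay alone. It assumes signals travel through optical fiber at roughly 200,000 km per second (about two-thirds the speed of light in a vacuum) and ignores every other source of delay, so real-world figures will only be worse.

```python
# Rough illustration of distance-induced latency for synchronous
# replication. Assumes signals propagate in optical fiber at roughly
# 200,000 km/s (about two-thirds the speed of light in a vacuum).

FIBER_KM_PER_SEC = 200_000  # approximate propagation speed in fiber

def round_trip_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds for a given distance."""
    return 2 * distance_km / FIBER_KM_PER_SEC * 1000

for km in (18, 100, 500):
    rtt = round_trip_ms(km)
    print(f"{km:>4} km -> ~{rtt:.2f} ms round trip per acknowledged write")
```

A synchronous mirror that waits for the remote array to acknowledge every write stalls for at least that round trip on each operation, which is why the lag grows with distance.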

Adding to the deltas created by distance latency are the effects of "jitter" -- delays that result from using a shared network service. Depending on the locations of your primary and backup facilities, the impact of jitter can be minimal or profound. Despite the nominal or rated speeds of its WAN pipes, one Sacramento, Calif.-based firm seeking to replicate data to another site in the Silicon Valley area reported transfer times that varied unpredictably from a few seconds to several hours -- a function of routing through a network involving nine different carriers.

Bottom line: The nominal rated speeds of WAN services are meaningless. Variables related to everything from processing delays and routing protocols to buffer bloat and packet resends can impact transport efficiency. Even a company with the coin to afford OC-192 pipes needs to understand that a minimum of two hours will be required to move 10 TB. That's why the fastest way to move data over distance continues to be the carrier pigeon (Google "IP over Avian Carrier" for more information).
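
To see where that two-hour figure comes from, here's a minimal back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and that the link sustains its full nominal payload rate with zero protocol overhead -- an unrealistically generous best case; the OC-3 line is included only for contrast.

```python
# Rough transfer-time arithmetic for bulk replication over a WAN link.
# Assumes decimal units (1 TB = 10**12 bytes) and that the link runs at
# its full nominal rate with no protocol overhead -- a best case.

def transfer_hours(terabytes: float, link_gbps: float) -> float:
    bits = terabytes * 10**12 * 8
    return bits / (link_gbps * 10**9) / 3600

print(f"10 TB over OC-192 (~9.95 Gbps): ~{transfer_hours(10, 9.95):.1f} hours")
print(f"10 TB over OC-3 (~155 Mbps): ~{transfer_hours(10, 0.155):.0f} hours")
```

Even that optimistic two-and-a-fraction hours assumes none of the jitter, resends and multi-carrier routing described above.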

Deduplicating data may have a valid use in reducing the volume of data that needs to move across a WAN link, but it doesn't make traffic move any faster over the gridlocked arteries of the information highway.

Data protection guideline 3: Mirrors are difficult to test.

Whether you're replicating data across a WAN or mirroring it over a LAN, another challenge inherent in these data protection schemes is the impediment they present to ad hoc testing. Testing is the long-tail cost of disaster recovery planning, so administrators should be looking to reduce the workload of annual testing events by enabling ad hoc testing of data protection schemes throughout the year.

If you're seeking to verify that failover is possible in a mirrored configuration, you need to "break" the mirroring process and check the contents of the primary and backup data stores. In a LAN, this is usually a painful process (sketched in rough form below) that requires administrators to:

1. Stop or temporarily redirect the production application process.
2. Write all cached data to the continuity volume (the disk being replicated).
3. Allow that data to be fully replicated to the remote disk.
4. Stop the mirroring process itself.
5. Compare the contents of the primary-site and recovery-site volumes to determine the differences between them.
6. Buffer new data from the application (if it hasn't been stopped) all the while.
7. Re-establish the mirror connection and resynchronize the data stores.
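
As a rough illustration of that sequence, here's a minimal sketch in Python. Every step is a stub that only prints what would happen; none of it corresponds to a real command in any particular array, volume manager or replication product, so treat the whole thing as hypothetical pseudocode for your own tooling.

```python
# Hypothetical outline of an ad hoc mirror-verification run. Each helper
# is a placeholder (it only prints) standing in for whatever vendor- or
# site-specific command actually performs the step.

def run(step_label: str) -> None:
    print(f"[mirror test] {step_label}")

def verify_mirror() -> None:
    run("quiesce or temporarily redirect the production application")    # step 1
    run("flush cached writes down to the continuity volume")             # step 2
    run("wait for outstanding replication to the remote disk to drain")  # step 3
    run("suspend -- i.e., break -- the mirroring relationship")          # step 4
    run("compare primary and recovery volumes and record any deltas")    # step 5
    run("buffer new application writes while the mirror is broken")      # step 6
    run("re-establish the mirror and resynchronize the data stores")     # step 7

if __name__ == "__main__":
    verify_mirror()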

The difficulty associated with this process helps to account for why it's rarely performed. And an untested mirror is a career-limiting recovery problem waiting to happen.

Data protection guideline 4: No bucks, no Buck Rogers.

As a practical matter, mirroring and WAN replication are expensive. Array vendors seem to have conspired to create lock-in around their gear by limiting the possible mirroring relationship to only two (or three) identical arrays, all bearing the same logo. In heterogeneous storage infrastructures, this introduces complexities in everything from addressing and cabling "matched pairs" of rigs to monitoring and managing infrastructure and data placement changes over time.

It also increases the cost of data protection. For example, embedding proprietary deduplication and replication software on one popular virtual tape appliance has already produced an acquisition price of $410,000 for 32 TB of 1 TB SATA disk drives worth approximately $4,000. To replicate this appliance, a second copy of the identical system is required. Add in the cost of the WAN link, and the need for a permanent and fixed recovery facility to host the target system, and to paraphrase the late Sen. Everett Dirksen, "Pretty soon, you're talking real money."
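
Using the figures above -- and purely illustrative placeholder numbers for the WAN circuit and recovery facility, which vary enormously from shop to shop -- the rough arithmetic looks something like this:

```python
# Rough cost comparison built on the figures cited above. The WAN and
# recovery-facility numbers are purely illustrative placeholders; only the
# $410,000 appliance price and the ~$4,000 raw-drive cost come from the text.

APPLIANCE_PRICE   = 410_000   # 32 TB deduplicating virtual tape appliance
RAW_DRIVE_COST    = 4_000     # approximate street price of the 32 x 1 TB SATA drives
WAN_LINK_PER_YEAR = 60_000    # hypothetical annual cost of the replication circuit
FACILITY_PER_YEAR = 100_000   # hypothetical annual cost of the fixed recovery site

pair_cost = 2 * APPLIANCE_PRICE                  # an identical target system is required
first_year = pair_cost + WAN_LINK_PER_YEAR + FACILITY_PER_YEAR
print(f"Hardware markup vs. raw disk: {APPLIANCE_PRICE / RAW_DRIVE_COST:.0f}x")
print(f"First-year cost of the replicated pair: ${first_year:,}")
```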

This illustrates another practical reason why mirroring and WAN-based replication aren't the be-all and end-all of data protection: high cost. To achieve data protection efficiency, cost needs to be in acceptable proportion to the measure of protection provided and the criticality of the data itself. Senior management, which holds the purse strings, needs to see that proportion clearly, or funding may be denied.

Data protection guideline 5: Think 'defense in depth.'

Truth be told, contemporary data protection guidelines require defense in depth. Data must be protected against corruption or loss due to application/user errors and malware/virus attacks. Then, like concentric circles, defense is required against a machine failure (Google "Commonwealth of Virginia storage array failure in 2010") and against facility outages or disruptions.

These three layers of data defense may be provided by three distinctly different technologies, all of which must be manageable and, ideally, lend themselves to ad hoc testing throughout the year. This goal can only reasonably be achieved by moving data protection services off hardware and into a common storage virtualization layer.

Virtualizing your storage infrastructure enables you to create layers of data replication and mirroring without the impediments of hardware vendor lock-in, reducing cost and increasing manageability. A good storage virtualization engine will also enable the selective assignment of data protection services to application data based on the requirements and criticality of that data and the business process it serves.
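
As a minimal sketch of what that selective, policy-based assignment might look like, consider the table below. The criticality classes, RTO/RPO targets and service stacks are hypothetical examples for illustration, not features of any particular storage virtualization product.

```python
# Illustrative policy table mapping business-process criticality to a mix
# of protection services. The classes, RTO/RPO targets and service stacks
# are hypothetical examples, not recommendations or product features.

PROTECTION_POLICY = {
    "always-on":    {"rto": "minutes", "rpo": "near zero",
                     "services": ["local mirror", "WAN replication", "tape backup"]},
    "business-day": {"rto": "hours",   "rpo": "15 minutes",
                     "services": ["local mirror", "nightly tape backup"]},
    "archival":     {"rto": "days",    "rpo": "24 hours",
                     "services": ["tape backup", "off-site vaulting"]},
}

def services_for(criticality_class: str) -> list[str]:
    """Return the protection services a virtualization layer would apply."""
    return PROTECTION_POLICY[criticality_class]["services"]

print(services_for("business-day"))   # ['local mirror', 'nightly tape backup']
```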

Final thought: Whichever "storage hypervisor" you select, ensure that it will also enable you to integrate tape into your overall solution. Remember the old Sony advertisement: "There are two kinds of disk: those that have failed and those that are going to." Integrating and managing multiple protection technologies is the key to improving data protection efficiency.

About the author:
Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.
