How to create a successful data archiving strategy

A good archiving process provides the automation needed to deliver the necessary application granularity while minimizing the impact on IT operations.

What you will learn: The technical tools and processes required for an effective data archiving strategy depend entirely on a company's compliance, data governance and storage management requirements.

There's a story that tells how someone once asked Abraham Lincoln how long a man's legs should be. Our 16th president reportedly replied, "Long enough to reach the ground." Similarly, when it comes to the question of how long data should be archived, the reply might be, "Long enough to be sure that it's available when you need it." This statement captures the two most critical variables of the data archive equation: time and accessibility.

Time, or more accurately the retention period, is the "tip of the spear" when it comes to matching an organization's needs with potential archiving solutions. Data retention requirements can be highly variable, often determined on an application-by-application basis. For example, all organizations must manage financial data, which generally must be retained for seven years. Human resources data may need to be retained for three years, but that regulation can vary by state. Medical data might be retained for the life of the patient plus seven years, nuclear power data for 70 years and so on.

There's a simple answer to the question of what all these time periods have in common: compliance. In most cases, the retention requirement matches the statute of limitations for a party (either governmental or private) to bring legal action against the organization. Failing to produce records demanded by a court order can lead to civil and, in some cases, criminal penalties. On the flip side, retaining records beyond the mandated period makes them subject to legal discovery and needlessly jeopardizes the organization's legal position.

Unfortunately (or perhaps fortunately), most IT people have no legal background. So, step one in developing a data archiving strategy is to inventory the data and assign a retention schedule to it. Corporate counsel may be able to provide the necessary parameters. If the attorneys can't (and you'd be surprised how often they decline to do so), the heads of the individual departments that "own" the data might be able to supply the retention information, as they should be familiar with the regulatory environment of their area. Sometimes, attorneys and department leaders don't want to chisel a time frame in stone. In that case, IT organizations shouldn't guess. In the absence of a specific time frame, the default retention period becomes "forever." While not optimal, it may be the only option for IT managers.
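The inventory-and-retention step above can be sketched in code. This is a minimal illustration, assuming a hypothetical schedule of data categories and retention periods (the category names and year counts are examples, not legal guidance); note how the absence of a confirmed period defaults to "forever," exactly as described above:

```python
from datetime import date

# Hypothetical retention schedule, in years. None means "retain forever" --
# the safe default when counsel won't commit to a specific time frame.
RETENTION_YEARS = {
    "financial": 7,
    "human_resources": 3,   # varies by state; confirm locally
    "medical": None,        # life of patient + 7 years; track per record
}

def is_purge_eligible(category: str, created: date, today: date) -> bool:
    """Return True only when a confirmed retention period has expired."""
    years = RETENTION_YEARS.get(category)
    if years is None:
        return False  # no confirmed period: default retention is "forever"
    try:
        expiry = created.replace(year=created.year + years)
    except ValueError:  # created on Feb 29, expiry year is not a leap year
        expiry = created.replace(year=created.year + years, day=28)
    return today >= expiry

print(is_purge_eligible("financial", date(2005, 1, 15), date(2013, 6, 1)))  # True
print(is_purge_eligible("medical", date(2005, 1, 15), date(2013, 6, 1)))    # False
```

The deliberate bias here is toward retention: any category not explicitly cleared for deletion is kept, which mirrors the "forever" fallback IT managers are left with when no one will chisel a time frame in stone.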

The term archive has been used in a rather fast and loose manner over the past several years. Archiving can refer to moving infrequently accessed data to high-capacity, low-cost disk (including tiered storage), backing up to tape, or moving data offline and off-site. Just as organizations maintain a continuum of data protection (i.e., a mix of snapshots, replication and backup), they will have a data archiving continuum. This continuum is necessary to meet the varying retention time frames mentioned above at a reasonable cost. Satisfying those varying needs must be balanced against complexity, and a good archiving solution will provide the automation needed to deliver the necessary application granularity while minimizing the impact on IT operations.
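An archiving continuum like the one described above amounts to a tiering policy: data migrates to cheaper, slower media as it ages. The sketch below is purely illustrative; the tier names and day thresholds are invented for the example, and a real policy would be driven by the retention and access requirements of each application:

```python
# Hypothetical tiering policy for an archiving continuum. The cutoffs
# (90 days, 1 year, 7 years) are illustrative assumptions, not guidance.
def archive_tier(days_since_last_access: int) -> str:
    """Map data age to a storage tier along the archiving continuum."""
    if days_since_last_access < 90:
        return "primary disk"
    if days_since_last_access < 365:
        return "low-cost disk"
    if days_since_last_access < 365 * 7:
        return "tape"
    return "offline/off-site"

print(archive_tier(30))    # primary disk
print(archive_tier(400))   # tape
```

Automating this movement, rather than tiering by hand, is what keeps the continuum from becoming an operational burden.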

Data archiving benefits

IT organizations will be motivated to implement archiving as a general-purpose enhancement or for application-specific reasons. In either case, expected benefits of archiving include:

  • Reduced costs. Data archiving is largely, though not exclusively, an effort to lower costs, typically measured in dollars per gigabyte stored. Many vendors offer a total cost of ownership (TCO) analysis. Every vendor's model will, unsurprisingly, show savings, so the results are only meaningful if you agree with both the input data and the underlying premises of the TCO model.
  • Reduced backup window. Even with backup to disk, data compression and data deduplication, backup windows face constant pressure from data growth rates that often exceed a 50% compound annual growth rate. There's no point in repeatedly backing up unchanged data. Archiving can remove tens of terabytes or more of data from the backup set.
  • Compliance. As mentioned earlier, governmental requirements and legal liability are key reasons to implement a data archiving strategy. Doing so at the lowest possible cost is the trick.
  • Knowledge retention. In an era of big data, organizations are learning the value of analyzing vast amounts of data. Here, the consideration isn't cost, but the desire to gain a competitive edge in the marketplace.
  • Improved performance. By reducing the amount of data to manage, or partitioning unused data from active data, organizations may see substantial improvement in system performance.
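The cost and backup-window arithmetic behind the first two benefits is easy to work through. The sketch below uses the 50% compound annual growth rate cited above with an assumed starting size and archived fraction (both hypothetical) to show how quickly capacity compounds, and how much archiving can shrink the backup set:

```python
# Illustrative arithmetic only; the starting capacity and the fraction
# archived are assumptions chosen for the example.
def projected_size_tb(current_tb: float, years: int, cagr: float = 0.5) -> float:
    """Project capacity after `years` of compound growth at rate `cagr`."""
    return current_tb * (1 + cagr) ** years

def backup_set_tb(total_tb: float, archived_fraction: float) -> float:
    """Data left in the backup set after archiving a fraction of it."""
    return total_tb * (1 - archived_fraction)

print(projected_size_tb(100, 3))   # 100 TB grows to 337.5 TB in three years
print(backup_set_tb(337.5, 0.4))   # archiving 40% leaves 202.5 TB to back up
```

At a 50% CAGR, capacity more than triples in three years, which is why removing unchanged, infrequently accessed data from the backup set pays off repeatedly: every full backup that skips the archived data saves that space and time again.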

Application-specific archiving products are tailored to deliver these benefits to specific environments. Examples include SAP, email and Oracle applications. Application-specific products are designed to know the ins and outs of the application so they can prune or separate data in a manner that optimizes the application without endangering referential integrity. General-purpose archivers aren't usually smart enough to do this. An application-specific tool may be all that's needed when data volumes don't justify a system-wide implementation, the major pain point relates to a specific application or a general-purpose product won't adequately address a given application.

About the author:
Phil Goodwin is a storage consultant and freelance writer.
