Enterprise backup software provides data protection foundation
Learn how backup and recovery software has evolved and what features it offers enterprises that need to protect cloud, virtual and other data sources.
Enterprise backup software copies data from primary storage platforms and applications to secondary storage. Historically, tape and disk have served as secondary media, but vendors increasingly support public cloud storage as a long-term retention target.
In modern IT environments, backup tools have evolved from applications that merely copy data into sophisticated data management platforms that protect applications wherever they're deployed, whether on premises or in public cloud IaaS and SaaS offerings.
It's tempting to take enterprise backup software for granted and assume all products are basically the same. In reality, the backup and recovery software market has evolved to address a wide range of data recovery requirements, and there are significant differences between backup applications. Understanding those differences starts with the core functions every product performs: data movement, data management and job management.
Data movement
The first consideration is data movement, which is the process a backup application uses to get data from primary storage to the backup storage platform. Early backup and recovery software ran on each server and simply wrote to a local tape drive or disk device. This method wasn't scalable and introduced considerable hardware and management costs.
Vendors have since moved to network-based backup. These systems use one or more centralized backup servers that pull data across the network from each source application server. Scalability comes from adding backup servers and storage media, such as tape and disk drives. As long as sufficient network bandwidth is available, backups can scale to meet demand.
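The point about bandwidth can be made concrete with simple arithmetic. The following is a minimal sketch, not any vendor's sizing tool: it assumes each backup server has a fixed sustained ingest rate (the `BackupServer` type and all figures are hypothetical) and that the network link caps the aggregate rate. Adding servers shortens the backup window only until the network becomes the bottleneck.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BackupServer:
    name: str
    ingest_mb_s: float  # sustained ingest rate in MB/s (hypothetical figure)


def backup_window_hours(total_gb: float, servers: list[BackupServer], network_mb_s: float) -> float:
    """Estimate how long a full backup takes when work is spread across servers.

    Aggregate ingest grows with each added server, but the effective rate can
    never exceed the network bandwidth between sources and backup servers.
    """
    if not servers or network_mb_s <= 0:
        raise ValueError("need at least one server and positive bandwidth")
    aggregate = sum(s.ingest_mb_s for s in servers)
    effective = min(aggregate, network_mb_s)
    seconds = total_gb * 1024 / effective
    return seconds / 3600


two = [BackupServer("bk1", 400), BackupServer("bk2", 400)]
three = two + [BackupServer("bk3", 400)]
four = three + [BackupServer("bk4", 400)]
print(round(backup_window_hours(10240, two, 1000), 2))    # 3.64
print(round(backup_window_hours(10240, three, 1000), 2))  # 2.91
print(round(backup_window_hours(10240, four, 1000), 2))   # 2.91: network-bound
```

In this model a fourth server buys nothing, because the 1,000 MB/s link is already saturated at three.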
Network backup systems have advanced over time to improve the efficiency of data movement. Some products read data directly from the storage platform through snapshots and replication. Other systems use data protection APIs available in hypervisor and hyper-converged infrastructure platforms.
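Reading from snapshots pays off because only blocks that changed since the previous snapshot need to be copied. The sketch below illustrates the idea under stated assumptions: snapshots are modeled as plain byte strings, the 4 KiB block size is arbitrary, and changes are found by hashing blocks. Real storage and hypervisor platforms typically track changed blocks as writes occur rather than rehashing whole images; hashing here only keeps the example self-contained.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration


def block_digests(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """SHA-256 digest of each fixed-size block in a snapshot image."""
    return [hashlib.sha256(image[i:i + block_size]).digest()
            for i in range(0, len(image), block_size)]


def changed_blocks(previous: bytes, current: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of blocks in `current` that differ from, or are absent in, `previous`."""
    old = block_digests(previous, block_size)
    new = block_digests(current, block_size)
    return [i for i, d in enumerate(new) if i >= len(old) or old[i] != d]


base = bytes(3 * BLOCK_SIZE)          # snapshot 1: three zeroed blocks
modified = bytearray(base)
modified[5000] = 0xFF                 # a write lands in block 1
modified += bytes(BLOCK_SIZE)         # the volume grew by one block
print(changed_blocks(base, bytes(modified)))  # [1, 3]
```

An incremental backup would then read and transfer only blocks 1 and 3 instead of the whole volume.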
Editor's note: Using extensive research into the data backup and recovery market, TechTarget editors focused this article series on data protection products from both traditional and new entrants into the market that address the many data sources of today's IT. Our research included data from TechTarget surveys and reports from other well-respected research firms, including Gartner.
With backup now running in hybrid environments, including the public cloud, network bandwidth once again becomes a challenge. Enterprise backup software vendors have introduced techniques such as data deduplication and compression to reduce the amount of data that must traverse the network. Client-side deduplication identifies data that already exists on the backup platform so it can be recorded as backed up without being sent again.
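Client-side deduplication can be sketched as a fingerprint exchange: the client hashes chunks of its data, asks the backup target which fingerprints it lacks, and compresses and sends only those chunks. This is a minimal illustration, assuming fixed-size 8 KiB chunks, SHA-256 fingerprints and zlib compression; the `BackupTarget` class is hypothetical and stands in for a real backup server. Many products use variable-size chunking instead.

```python
from __future__ import annotations

import hashlib
import zlib

CHUNK_SIZE = 8 * 1024  # fixed-size chunking (assumption)


class BackupTarget:
    """Hypothetical backup server storing compressed chunks keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        # The client sends only fingerprints; the server answers with what it lacks.
        return {d for d in digests if d not in self.chunks}

    def store(self, digest: str, payload: bytes) -> None:
        self.chunks[digest] = payload


def client_backup(data: bytes, target: BackupTarget) -> tuple[list[str], int]:
    """Back up `data`, sending only new chunks. Returns (recipe, compressed bytes sent)."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = target.missing(recipe)
    sent = 0
    for digest, chunk in zip(recipe, chunks):
        if digest in needed:
            payload = zlib.compress(chunk)
            target.store(digest, payload)
            sent += len(payload)
            needed.discard(digest)  # a chunk repeated within this backup is sent once
    return recipe, sent


def restore(recipe: list[str], target: BackupTarget) -> bytes:
    """Reassemble the original data from the ordered list of chunk digests."""
    return b"".join(zlib.decompress(target.chunks[d]) for d in recipe)


target = BackupTarget()
data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1024))  # 32 KiB
recipe, first = client_backup(data, target)
_, second = client_backup(data, target)  # unchanged data: nothing is re-sent
print(second)  # 0
```

The second run transfers zero bytes: every fingerprint is already known, so the data is simply recorded as backed up.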
Some vendors keep backup copies both locally and in a public cloud. The local copy enables quick restores, while the cloud copy ensures data remains recoverable after a total site loss. The key is to optimize the placement of secondary data for resiliency and to meet recovery time objectives (RTOs) and recovery point objectives (RPOs).
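The trade-off between local and cloud copies can be checked against RPO and RTO with a simplified model. This sketch assumes every copy is refreshed on each backup run (so worst-case data loss equals the backup interval) and that restore time is dataset size divided by a fixed restore throughput; the `CopyLocation` type and all throughput figures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CopyLocation:
    name: str
    restore_mb_s: float        # assumed restore throughput from this location
    survives_site_loss: bool   # True for an off-site copy such as public cloud


def check_objectives(dataset_gb: float, backup_interval_h: float, rpo_h: float, rto_h: float,
                     copies: list[CopyLocation], site_lost: bool = False):
    """Return (rpo_met, rto_met, copy_used) for a simplified placement model."""
    usable = [c for c in copies if c.survives_site_loss or not site_lost]
    if not usable:
        return False, False, None
    fastest = max(usable, key=lambda c: c.restore_mb_s)
    restore_h = dataset_gb * 1024 / fastest.restore_mb_s / 3600
    return backup_interval_h <= rpo_h, restore_h <= rto_h, fastest.name


copies = [CopyLocation("local", 1000, False), CopyLocation("cloud", 100, True)]
print(check_objectives(2000, 4, 24, 4, copies))                  # (True, True, 'local')
print(check_objectives(2000, 4, 24, 4, copies, site_lost=True))  # (True, False, 'cloud')
```

With these assumed numbers, the local copy restores 2,000 GB in about 0.6 hours, but after a site loss the cloud copy needs about 5.7 hours and misses a four-hour RTO. That gap is exactly what placement decisions have to account for.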
Data management
Once data reaches the backup platform, it must be managed. At the most fundamental level, this means using metadata to track which files have been backed up from each source application. This process is evolving because applications are no longer tied directly to a physical or virtual server: IT organizations need to protect information in SaaS platforms, such as Salesforce or Office 365, as well as in traditional and NoSQL databases.
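A backup catalog is, at its core, a metadata store recording which version of which item from which source lives in which backup. The sketch below is a hypothetical in-memory catalog, assuming a generic "item" field so the same record can describe a file path, a SaaS object ID or a database table. Real products persist this in a database and index it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class CatalogEntry:
    source: str            # server, SaaS tenant or database (names below are hypothetical)
    item: str              # file path, SaaS object ID or table name
    version_time: datetime
    backup_id: str         # which backup image holds this version


@dataclass
class BackupCatalog:
    entries: list[CatalogEntry] = field(default_factory=list)

    def record(self, entry: CatalogEntry) -> None:
        self.entries.append(entry)

    def versions(self, source: str, item: str) -> list[CatalogEntry]:
        """All backed-up versions of one item, oldest first."""
        matches = [e for e in self.entries if e.source == source and e.item == item]
        return sorted(matches, key=lambda e: e.version_time)

    def latest(self, source: str, item: str) -> CatalogEntry | None:
        found = self.versions(source, item)
        return found[-1] if found else None


catalog = BackupCatalog()
catalog.record(CatalogEntry("salesforce", "Account/001A", datetime(2024, 5, 1), "bk-101"))
catalog.record(CatalogEntry("salesforce", "Account/001A", datetime(2024, 5, 2), "bk-102"))
catalog.record(CatalogEntry("fileserver01", "/finance/q2.xlsx", datetime(2024, 5, 2), "bk-102"))
print(catalog.latest("salesforce", "Account/001A").backup_id)  # bk-102
```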
Backup data stored centrally has moved from physical media, such as tape, to deduplication appliances that use a mix of flash and hard disks. Target-side deduplication reduces the physical storage consumed relative to the volume of logical data under management. As public cloud storage becomes a target for backup data, organizations must deduplicate before data is written off site, because public cloud providers bill for the capacity stored and don't pass on any savings from data reduction.
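The gap between logical and physical capacity is usually expressed as a deduplication ratio. This sketch models a target-side store that keeps each unique fixed-size chunk once and reports the ratio; the chunk size is an assumption, and the per-GiB price passed to `monthly_cost` is a hypothetical figure, not any provider's rate. Because cloud capacity is billed on bytes stored, reducing data before upload lowers cost in proportion to the ratio.

```python
from __future__ import annotations

import hashlib


class DedupStore:
    """Target-side store that keeps each unique fixed-size chunk only once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[bytes, bytes] = {}
        self.logical_bytes = 0  # bytes written by backup jobs before reduction

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            self.chunks.setdefault(hashlib.sha256(chunk).digest(), chunk)
            self.logical_bytes += len(chunk)

    @property
    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def ratio(self) -> float:
        return self.logical_bytes / self.physical_bytes if self.physical_bytes else 0.0


def monthly_cost(stored_bytes: int, price_per_gib_month: float) -> float:
    """Capacity-based billing: cost scales with bytes actually stored."""
    return stored_bytes / 1024**3 * price_per_gib_month


store = DedupStore()
image = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(32768))  # 1 MiB
for _ in range(10):  # ten full backups of an unchanged 1 MiB image
    store.write(image)
print(store.ratio())  # 10.0
print(monthly_cost(store.logical_bytes, 0.02), monthly_cost(store.physical_bytes, 0.02))
```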
Enterprise backup software vendors have adapted their products to offer Google-like search of backup data, with search consoles and efficient pattern matching against file names and data sources.
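Search at this scale generally relies on an index rather than scanning every catalog entry. The sketch below is a hypothetical inverted index over backed-up item names: it splits names into lowercase alphanumeric tokens, answers keyword queries by intersecting posting sets, and optionally narrows results with a shell-style glob via Python's `fnmatch`. The tokenization rule and class name are assumptions for illustration.

```python
from __future__ import annotations

import fnmatch
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # assumed tokenization: runs of lowercase letters/digits


class BackupSearchIndex:
    """Hypothetical inverted index over the names of backed-up items."""

    def __init__(self) -> None:
        self.items: list[tuple[str, str]] = []                        # (source, item name)
        self.postings: defaultdict[str, set[int]] = defaultdict(set)  # token -> item ids

    def add(self, source: str, item: str) -> None:
        item_id = len(self.items)
        self.items.append((source, item))
        for token in TOKEN.findall(item.lower()):
            self.postings[token].add(item_id)

    def search(self, query: str, glob: str | None = None) -> list[tuple[str, str]]:
        """Items whose names contain every query word, optionally filtered by a glob."""
        tokens = TOKEN.findall(query.lower())
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        hits = [self.items[i] for i in sorted(ids)]
        if glob is not None:
            hits = [h for h in hits if fnmatch.fnmatchcase(h[1].lower(), glob.lower())]
        return hits


idx = BackupSearchIndex()
idx.add("fileserver01", "/finance/2023/Budget_Q3.xlsx")
idx.add("onedrive", "Budget Q3 draft.docx")
idx.add("fileserver01", "/hr/policies.pdf")
print(idx.search("budget q3", glob="*.xlsx"))  # [('fileserver01', '/finance/2023/Budget_Q3.xlsx')]
```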
Backup vendors are also designing their products to reuse secondary data. This requires new approaches to storing and tracking content that might seed test and development environments or feed data analytics. Some vendors have developed new internal scale-out file systems that let organizations access backup images directly on the backup platform. Others make backup data self-describing, so an organization can push it to public cloud storage and access it directly, outside the backup product and without first restoring it.
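One way to make backup data self-describing is to store a plain manifest next to each payload, recording its encoding, origin, size and checksum, so any tool can read it without the backup application. This is a minimal sketch of that idea, not any vendor's format: a Python dict stands in for a cloud object store bucket, and the key layout and manifest fields are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import zlib


def write_self_describing(bucket: dict, backup_id: str, source: str, item: str, data: bytes) -> None:
    """Store a compressed payload plus a JSON manifest that explains how to read it."""
    payload_key = f"{backup_id}/payload.zlib"  # hypothetical key layout
    manifest = {
        "format": "zlib",                       # how the payload is encoded
        "source": source,
        "item": item,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "payload_key": payload_key,
    }
    bucket[payload_key] = zlib.compress(data)
    bucket[f"{backup_id}/manifest.json"] = json.dumps(manifest).encode()


def read_without_backup_app(bucket: dict, backup_id: str) -> tuple[dict, bytes]:
    """Any consumer (analytics job, test environment) can decode using only the manifest."""
    manifest = json.loads(bucket[f"{backup_id}/manifest.json"])
    if manifest["format"] != "zlib":
        raise ValueError(f"unsupported format: {manifest['format']}")
    data = zlib.decompress(bucket[manifest["payload_key"]])
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch")
    return manifest, data


bucket: dict = {}
write_self_describing(bucket, "bk-001", "crm", "accounts.csv", b"id,name\n1,Acme\n")
manifest, data = read_without_backup_app(bucket, "bk-001")
print(manifest["item"], data)
```

The design choice is that the manifest, not a proprietary catalog, carries everything needed to interpret the payload.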
We can expect many more examples of data reuse as vendors turn what was once seen as IT overhead into business value.
Job management
At a basic level, job management means the scheduling of backup tasks and other management functions by the backup software. Early backup products required the administrator to be heavily involved in managing the timing of backups to match the bandwidth capabilities of the infrastructure.
Today, vendors are moving toward a policy-based approach, in which data protection is defined by service-level objectives (SLOs) and the backup software is responsible for scheduling backups to meet them. For organizations protecting thousands of virtual servers and other data sources, manual scheduling is no longer feasible. Instead, automated scheduling is paired with enhanced reporting that shows the compliance status of data protection across the entire application landscape.
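A policy-based scheduler can be reduced to two questions: which sources currently meet their SLO, and which should run next. This sketch assumes the SLO is expressed as an RPO (maximum age of the newest successful backup) and that the scheduler fills a fixed number of job slots with the sources closest to, or already past, a breach. The `ProtectedSource` type, source names and slot model are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProtectedSource:
    name: str
    rpo: timedelta                 # SLO: newest good backup may be at most this old
    last_success: datetime | None  # None if never backed up


def compliance_report(sources: list[ProtectedSource], now: datetime) -> dict[str, bool]:
    """True where the newest successful backup is within the source's RPO."""
    return {s.name: s.last_success is not None and now - s.last_success <= s.rpo
            for s in sources}


def next_jobs(sources: list[ProtectedSource], now: datetime, slots: int) -> list[str]:
    """Fill the available job slots with the sources closest to, or past, an RPO breach."""
    def headroom(s: ProtectedSource) -> timedelta:
        if s.last_success is None:
            return timedelta.min  # never protected: most urgent
        return s.rpo - (now - s.last_success)
    return [s.name for s in sorted(sources, key=headroom)[:slots]]


now = datetime(2024, 1, 1, 12, 0)
sources = [
    ProtectedSource("web-vm", timedelta(hours=24), now - timedelta(hours=2)),
    ProtectedSource("erp-db", timedelta(hours=4), now - timedelta(hours=6)),
    ProtectedSource("new-vm", timedelta(hours=1), None),
    ProtectedSource("file-share", timedelta(hours=12), now - timedelta(hours=10)),
]
print(compliance_report(sources, now))
# {'web-vm': True, 'erp-db': False, 'new-vm': False, 'file-share': True}
print(next_jobs(sources, now, slots=2))  # ['new-vm', 'erp-db']
```

The administrator sets only the RPO per source; ordering by remaining headroom replaces hand-built backup schedules, and the same data drives the compliance report.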
At a minimum, enterprise backup software should deliver backup and disaster recovery capabilities. But today's backup products are increasingly integrated into enterprise data management processes. At the same time, backup and recovery software must cater to a dispersed, highly scalable and dynamic enterprise. That means treating data independently of the source it came from and associating it with the application rather than with infrastructure constructs such as VMs and servers.