Business continuity services, DR planning key to recovery success

Although all-in-one options for business continuity and disaster recovery planning are lacking, organizations can create a system using the various services and tools examined here.

Ensuring that a business can continue functioning after a disaster is critical to its success. Even so, business continuity and DR planning can be surprisingly difficult. Given the lack of all-in-one offerings, organizations commonly piece together systems that are based on various products and services.

Business continuity services and disaster recovery tools are two very different things. In general, business continuity services tend to refer to consulting services that identify what it would take to keep core business processes running after a disaster. Conversely, DR tools are what make it possible to put the prescriptive business continuity guidance into practice.

Here we examine the different types of business continuity services and disaster recovery tools and how they work as part of an overall business continuity and disaster recovery (BC/DR) planning effort. We also address the features an organization should look for to ensure it selects the options that best meet its requirements.

Business continuity services

Several organizations specialize in providing business continuity services. They tend to be business-oriented and may also provide ancillary services such as compliance auditing or crisis management. Some of these providers are primarily auditors, while others are accounting firms or IT consultancies that specialize in DR.

Auditing firms. When it comes to business continuity, an auditing firm's primary job is to determine how well the organization adheres to industry-standard business continuity practices. The most widely recognized of these standards is ISO 22301, which specifies the requirements for a business continuity management system.

An organization has several things to consider when selecting an auditing firm to assist with its BC/DR efforts. First, the firm's auditors should have relevant professional training. For example, many business continuity auditors hold certifications from the nonprofit International Consortium for Organizational Resilience.

Another important consideration is what happens after the audit. An auditing firm should do more than simply provide a report of the deficiencies it has found. It should also provide prescriptive guidance as to how best to remedy those deficiencies.

Accounting firms. An accounting firm's most likely role in the business continuity planning process is that of performing a business impact analysis. A BIA determines the cost and other potential consequences of critical business processes going offline. While such a report can in some cases be created in-house, the difficulty of accurately estimating the cost of an outage on a workload-by-workload basis may justify the use of an accounting firm.
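To see why per-workload estimates are hard, it helps to write down even the simplest cost model. The sketch below assumes a deliberately simplified linear model: lost revenue and idle staff cost accrue per hour of downtime, plus a one-time recovery cost. The `Workload` fields and functions are hypothetical. A real BIA also has to account for costs that resist this kind of formula, such as regulatory penalties, contractual SLA credits and reputational damage. That is where an accounting firm adds value.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-workload inputs for a simplified BIA cost model."""
    name: str
    revenue_per_hour: float          # revenue that stops while the workload is down
    idle_staff_cost_per_hour: float  # wages paid to staff who cannot work
    fixed_recovery_cost: float = 0.0  # one-time costs, e.g. overtime or penalties


def outage_cost(w: Workload, hours: float) -> float:
    """Estimate the cost of an outage lasting `hours` under a linear model."""
    return hours * (w.revenue_per_hour + w.idle_staff_cost_per_hour) + w.fixed_recovery_cost


def rank_by_outage_cost(workloads: list[Workload], hours: float) -> list[Workload]:
    """Order workloads from most to least expensive to lose for `hours`."""
    return sorted(workloads, key=lambda w: outage_cost(w, hours), reverse=True)
```

Ranking workloads this way gives a first cut at which processes deserve the shortest recovery targets. The inputs themselves are the hard part.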

Most accounting firms should be able to help an organization establish the cost of a workload outage, but the business should ideally choose a firm that has experience with business continuity planning or IT resource planning.

Consulting firms. If an organization has a dedicated IT staff, it may be tempting to dismiss the idea of hiring a consulting firm to help with BC/DR planning. After all, the organization's own IT staff should be able to create a system that allows critical business processes to continue running after a disaster. In many cases, however, consulting firms are used to help establish procedures related to business continuity, rather than assist with the creation of IT infrastructure.

A company may not need a consulting firm in every situation, but it can help the organization's IT staff answer questions such as the following:

  • What is the best way to assess the extent of the damage following an event?
  • What criteria should be used when deciding whether to initiate the DR plan?
  • How should the organization's status be communicated to employees and customers?
  • When a failover does occur, what should be done to restore normal operations?

IT consulting firms tend to specialize in certain areas. For instance, a consultancy may focus on custom coding, container orchestration, Active Directory or any number of other IT specializations. It's important to use a consulting firm that lists business continuity planning among its core competencies.

On-premises DR tools

There are still numerous disaster recovery tools that can run on premises. Although these tools vary in scope, they largely work by replicating on-premises virtual machines (VMs) to the public cloud. That way, if the organization experiences a failure in its data center, there's a cloud-based replica that can take over for any failed VMs.

Even though the concept of failing over to a cloud-based VM replica seems simple enough, there's a significant amount of underlying complexity. For example, the replica VM typically operates on a different subnet than the production VM and therefore needs a different IP address. This affects DNS resolution, storage mappings and more. As such, it's imperative for any on-premises disaster recovery system to support orchestration. Otherwise, every failover would require a significant amount of manual effort.
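One piece of that orchestration is re-addressing replicas. The sketch below assumes a common but hypothetical convention: each replica keeps its host offset but moves from the production subnet to an equally sized DR subnet. It then emits the DNS A records that would have to change. The function names, the plan format and the subnets are illustrative only. A real DR product applies changes like these through its own orchestration engine.

```python
import ipaddress


def remap_ip(prod_ip: str, prod_subnet: str, dr_subnet: str) -> str:
    """Move an address into the DR subnet while keeping its host offset."""
    ip = ipaddress.ip_address(prod_ip)
    src = ipaddress.ip_network(prod_subnet)
    dst = ipaddress.ip_network(dr_subnet)
    if ip not in src:
        raise ValueError(f"{prod_ip} is not in {prod_subnet}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("production and DR subnets must be the same size")
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


def build_failover_plan(vms: dict[str, str], prod_subnet: str, dr_subnet: str) -> list[dict]:
    """Produce one re-IP step and one DNS update per VM (hypothetical plan format)."""
    plan = []
    for name, ip in sorted(vms.items()):
        new_ip = remap_ip(ip, prod_subnet, dr_subnet)
        plan.append({"vm": name, "old_ip": ip, "new_ip": new_ip,
                     "dns_record": f"{name} A {new_ip}"})
    return plan
```

Even this toy version shows why manual failover is error-prone. Every VM needs a consistent new address, and every client-facing name has to be updated to match.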

Another critical feature of DR planning tools is the ability to perform nondisruptive testing of replica VMs. Testing is the only way an organization can ensure its DR plan will work. It isn't enough to verify that a VM replica will start. True DR testing means making sure workloads are functional and that replica VMs can connect to any necessary resources. To do this, however, the recovery platform must offer a way to start the replica VMs in a sandbox environment, so administrators can perform comprehensive testing without the risk of affecting production workloads.
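Checking that a replica can reach its dependencies is one concrete step beyond "the VM started." The following is a minimal sketch, assuming it runs inside the sandbox network and that the dependency list of (host, port) pairs is something the administrator supplies. It only confirms that a TCP connection opens. It does not prove the application behind the port works, so it complements, rather than replaces, application-level tests.

```python
import socket


def unreachable_dependencies(deps: list[tuple[str, int]], timeout: float = 2.0) -> list[tuple[str, int]]:
    """Return the (host, port) pairs that refuse or time out on a TCP connection."""
    failed = []
    for host, port in deps:
        try:
            # Open and immediately close a connection; success means the port is reachable.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # covers refused connections, DNS failures and timeouts
            failed.append((host, port))
    return failed
```

An empty result is a prerequisite for a successful test, not proof of one.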

The ability to roll back replicas to an earlier point in time is also essential. Even though backups and disaster recovery are technically two different things, having the ability to revert a VM replica to an earlier state should be considered critical. If, for example, a ransomware infection hit a mission-critical VM, the damage done by the ransomware would be replicated to the cloud-based VM copy. This would render the replica useless unless there was a way to roll back the replica to its state just before the infection occurred.

Finally, it's important for on-premises disaster recovery tools to have alerting and reporting features. Nobody wants to attempt a failover only to discover the replication process hasn't been working for the last few weeks. Administrators need a console that can provide real-time health information for DR replicas, as well as alert them to problems and provide detailed diagnostic data should the need arise.

Cloud disaster recovery tools

Some of the available cloud-based DR tools work essentially the same way as their on-premises counterparts, except that the replication engine is cloud-based. Amazon's CloudEndure service, since succeeded by AWS Elastic Disaster Recovery, is one example of such a product. CloudEndure is designed to replicate physical, virtual or cloud-based workloads to the Amazon cloud.

Microsoft's Azure Site Recovery extends this model. Like CloudEndure, it can replicate on-premises VMs and physical servers to the cloud, in this case Azure. It can also replicate Azure VMs from one Azure region to another, which protects workloads that already run in the cloud against a regional outage.


Regardless of which DR planning approach an organization takes, the ultimate goal is to be able to recover from a disaster with little to no downtime. As such, there are some things organizations should look for in a cloud-based DR system.

First and foremost, the system must be compatible with the organization's existing infrastructure, regardless of whether that infrastructure is located on premises, in the cloud or a combination of the two.

Another important consideration is synchronization frequency. Assuming that resources are being synchronized asynchronously, the replication frequency determines the amount of data that could potentially be lost in a failover situation. If, for example, a system performs replica synchronization every five minutes, then up to five minutes' worth of data could potentially be lost during a failover.
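The arithmetic behind that statement is easy to apply to a specific workload. This sketch makes the assumption explicit: the worst case is a failure just before the next synchronization completes. If the sync itself takes time to transfer, that transfer time adds to the window. The function names and the sample write rate are hypothetical. The article's five-minute example corresponds to a transfer time of zero.

```python
def worst_case_loss_window_s(interval_s: float, transfer_s: float = 0.0) -> float:
    """Longest span of writes that may be missing from the replica at failover."""
    return interval_s + transfer_s


def worst_case_data_at_risk(interval_s: float, write_rate_bytes_per_s: float,
                            transfer_s: float = 0.0) -> float:
    """Bytes written during the worst-case window (assumes a steady write rate)."""
    return worst_case_loss_window_s(interval_s, transfer_s) * write_rate_bytes_per_s


def meets_rpo(interval_s: float, rpo_s: float, transfer_s: float = 0.0) -> bool:
    """True if the worst-case loss window fits within the recovery point objective."""
    return worst_case_loss_window_s(interval_s, transfer_s) <= rpo_s
```

For example, a hypothetical workload writing 2 MB per second with a five-minute interval has up to 300 × 2,000,000 = 600,000,000 bytes (about 600 MB) of changes at risk. A five-minute sync schedule that takes 30 seconds to transfer no longer meets a five-minute RPO.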

Similarly, some cloud providers base the failover process around the use of recovery points. Administrators need to be able to fail over to a replica that's based on the most recent recovery point, but they also need to be able to select an earlier recovery point if necessary.

When it comes to DR planning, every organization's needs are unique. As such, it's important to know what tools and services are available, so you can select the ones that best meet your organization's requirements.
