DRaaS guide: Benefits, challenges, providers and market trends
Disaster recovery once involved setting up an off-site facility and duplicating expensive storage gear. Today, DRaaS offers an alternative to traditional approaches.
Disaster recovery as a service, or DRaaS, is a cloud-based data protection approach that has become increasingly popular. DRaaS offers an alternative to traditional disaster recovery methods, in which a business must set up, equip and operate off-site DR facilities. DRaaS shifts that responsibility to third-party providers, which harness private or public cloud storage.
These as-a-service offerings aim to provide disaster recovery to a wider range of companies. Indeed, DRaaS adoption is growing among organizations. An Enterprise Strategy Group (ESG) survey of IT professionals in 2016 found that 39% were using DRaaS. By 2019, that share had climbed to 53%. "It's absolutely gaining a lot of ground," said Christophe Bertrand, a senior analyst covering data protection at ESG, based in Milford, Mass.
"The level of adoption will certainly increase in the future," said Naveen Chhabra, senior analyst at Forrester Research, a market research firm based in Cambridge, Mass.
In short, DRaaS is becoming DR for the masses. This DRaaS guide provides an overview of how DRaaS offerings work, the benefits and challenges of this DR variation, the providers offering services and general market trends.
How does DRaaS work?
With DRaaS, an organization's physical servers or virtual machines (VMs) are replicated to a third-party service provider. The service provider hosts the customer's infrastructure, using public or private cloud resources, providing a failover target in the event of a disaster. Depending on the specific type of DRaaS offering, a provider can also manage the failover process, transitioning users from the primary environment to its hosted service, and the failback process, migrating workloads back to the customer when it is ready to resume normal operations.
A DRaaS provider typically offers a standard service-level agreement (SLA) to establish performance objectives and govern the provider-customer relationship.
DRaaS is sometimes viewed as a subset of cloud disaster recovery (cloud DR), which encompasses a range of delivery models and technical approaches. For example, cloud DR might be deployed fully in-house, partially in-house or, in the case of DRaaS, purchased as a service. Some observers, however, draw distinctions between DRaaS and cloud DR, arguing that cloud DR is more of a do-it-yourself approach requiring more in-house expertise.
Hosted colocation, while not quite fitting into the cloud DR category, offers services like DRaaS. One difference between hosted colocation and DRaaS is that the colocation customer typically bears more of the responsibility for defining DR requirements and deploying the proper services. DRaaS providers, on the other hand, tend to be more knowledgeable about DR requirements and performance across various industries. In addition, DRaaS providers typically offer greater flexibility because they can tap scalable, cloud-based resources.
Benefits of DRaaS
The cloud economics of DRaaS make this approach appealing to businesses that would otherwise find it difficult to afford an off-site DR function. Unsurprisingly, DRaaS advantages first caught on among SMBs for which investing in a secondary data center strictly for DR would be cost-prohibitive.
"DRaaS was more popular originally in the smaller space," said Dan Timko, chief strategy officer for cloud backup at J2 Global, which includes its OffsiteDataSync, KeepItSafe and LiveDrive businesses. "Most people don't want to spend the money on a second data center they only use in the rare event of a disaster."
DRaaS lets organizations avoid the cost of equipping a secondary site. That task would normally involve installing like-for-like hardware and software licensing to enable a successful failover from the primary data center to the DR location. Cloud-based DRaaS, however, offers the ability to take advantage of resource sharing and the resulting cost efficiencies.
"It is more cost-effective than doing it in house, if you are looking at a secondary center and setting up gear," Bertrand said.
Although DRaaS is a natural fit for small companies, their larger counterparts are starting to adopt this disaster recovery method as well. Making a two-time investment in IT infrastructure is an expensive proposition for organizations of any size.
"You are starting to see larger companies take advantage of it," Timko said of DRaaS.
In addition to cost efficiency, DRaaS has the potential to out-perform traditional recovery methods -- especially when businesses try to scrimp on IT resources. Timko said organizations might install decommissioned hardware in the DR site to save money -- and then become upset when failover doesn't work. "A [DRaaS] provider is going to have equipment that is modern and is performing better when you do need to recover."
The replication methods of DRaaS also offer a performance boost over other DR media -- tape storage, for example. Those methods include synchronous replication, which writes data to the primary and backup site simultaneously, and asynchronous replication, which involves a delay between writing to the primary site and the backup site.
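The difference between the two replication modes can be illustrated with a minimal Python sketch. It is a toy model, not any vendor's implementation: the `ReplicatedStore` class, its in-memory lists and the sample order records are all hypothetical, and real products replicate at the block, VM or database level.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/backup pair; both sites are plain in-memory lists."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.backup = []
        self.pending = deque()  # writes not yet shipped to the backup site

    def write(self, record):
        self.primary.append(record)
        if self.synchronous:
            # Synchronous: the write lands on both sites before returning.
            self.backup.append(record)
        else:
            # Asynchronous: the write is queued and shipped after a delay.
            self.pending.append(record)

    def ship_pending(self):
        """Drain queued writes to the backup site."""
        while self.pending:
            self.backup.append(self.pending.popleft())

    def unprotected_writes(self):
        """Writes that would be lost if the primary site failed right now."""
        return len(self.primary) - len(self.backup)


sync_store = ReplicatedStore(synchronous=True)
async_store = ReplicatedStore(synchronous=False)
for order in ("order-1", "order-2", "order-3"):
    sync_store.write(order)
    async_store.write(order)

print(sync_store.unprotected_writes())   # 0
print(async_store.unprotected_writes())  # 3
async_store.ship_pending()
print(async_store.unprotected_writes())  # 0
```

In this sketch, the backlog of unshipped writes in the asynchronous store is what determines how much data is at risk at the moment of failure.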
"With the replication technologies that we have today, you have the ability to reduce your recovery time objective and recovery point objective," said Jeff Ton, strategic IT advisor at InterVision Systems, an IT services provider with headquarters in Santa Clara, Calif., and St. Louis. InterVision provides a DRaaS offering.
DRaaS also relieves IT personnel of the need to sustain a DR site on top of their other responsibilities. Companies that use the cloud as their DR destination "free up IT staff to support more strategic on-premises systems or applications," Bertrand said.
"IT resources are expensive and generally have other things to do," Timko added. "Maintaining the DR site tends to slip down on the list of priorities."
Other benefits of DRaaS include greater granularity and responsiveness. Forrester's Chhabra said traditional DR methods relied on static, dedicated systems, while DRaaS lets customers rent, on a pay-for-use basis, a few VMs or as many as they need for recovery. In addition, customers can change the boot order of the applications and systems to be recovered as their business priorities shift -- and can make those modifications on the fly, Chhabra added.
A cloud DR destination can also be used for purposes beyond recovery. Ton said a DRaaS environment provides a safe space for regression testing or stress testing a new app rollout. Organizations can also run intensive analytics on secondary or tertiary copies of data the DRaaS provider maintains, rather than bogging down the production environment, he noted.
DRaaS challenges and risks
Outsourcing the cost and labor of DR is perhaps the primary benefit of DRaaS and, potentially, its key drawback. Once a disaster is declared, the customer must rely on and trust its service provider to properly implement the business continuity and disaster recovery plan and adhere to the agreed-upon SLA. A customer also counts on the DRaaS vendor's ability to provide adequate security.
Bandwidth, or the lack thereof, is another potential DRaaS pitfall. DRaaS vendors' cloud bandwidth can handle sporadic DR events, but most providers aren't set up to perform recovery operations for all of their customers simultaneously, suggested Brien Posey, a technical writer and former CIO.
"Bandwidth becomes a concern when organizations consider DRaaS," said Jeremy Bigler, director of product management at Otava, a cloud services and DRaaS provider based in Ann Arbor, Mich. "Given the technology is continuous data replication versus snapshot or other point-in-time methods, bandwidth utilization is a concern. This is especially valid for high data-change-rate VMs, such as SQL or other applications."
Security, of course, is a consideration whether an organization runs its own internal DR operation or relies on a service provider. In the case of DRaaS, customers can ask the service provider for documentation detailing its security posture. Paul Kirvan, an independent consultant, suggested requesting a copy of a service provider's Service Organization Control 2 report, which addresses security, availability and privacy metrics as one way to mitigate DRaaS risks.
Customers, however, could create problems for themselves if they fail to fully grasp what they have in their data centers. They must understand their various IT assets and how their criticality translates into a DRaaS provider's service tiers. A service provider's replication technology will typically offer differing recovery times at various price points, Ton noted. The top, most expensive tier might offer workload recovery in a matter of seconds or minutes, while the less expensive mid-tier might offer a 4-to-8-hour enterprise workload recovery time and the least expensive tier might offer 24-to-48-hour recovery.
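As a rough illustration of matching workloads to tiers, the Python sketch below picks the cheapest tier whose worst-case recovery time still fits each workload's tolerable downtime. The recovery windows follow the tiers described above, with 15 minutes assumed as the top tier's upper bound; the monthly prices, workload names and downtime tolerances are hypothetical.

```python
# (tier name, worst-case recovery hours, hypothetical monthly price per workload)
TIERS = [
    ("top", 0.25, 500),
    ("mid", 8, 150),
    ("low", 48, 40),
]


def cheapest_tier(max_downtime_hours):
    """Return the lowest-priced tier that recovers within the tolerated downtime."""
    candidates = [tier for tier in TIERS if tier[1] <= max_downtime_hours]
    if not candidates:
        raise ValueError("no tier recovers fast enough for this workload")
    return min(candidates, key=lambda tier: tier[2])


# Hypothetical workloads and the downtime (in hours) each can tolerate.
workloads = {"erp": 12, "file-archive": 72, "intranet-wiki": 72, "test-lab": 96}

tiered_cost = sum(cheapest_tier(hours)[2] for hours in workloads.values())
one_size_cost = len(workloads) * 150  # every workload placed in the mid-tier

print(tiered_cost)    # 270
print(one_size_cost)  # 600
```

Under these assumed prices, placing every workload in the mid-tier costs more than twice as much as assigning each one by criticality.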
But without an understanding of those gradations, an organization might put all its workloads -- including its least critical workloads -- in the mid-tier. The result will likely be sticker shock, Ton said.
"We see a lot of people struggle in that initial analysis of what they have," Ton explained. "They don't have a business impact analysis or they think one size fits all. Some of these environments are so complex that you have to take the time to understand what you have in there."
DRaaS planning
Getting the most out of DRaaS requires planning. Industry executives recommend that customers start with a close look at their essential needs.
"When adopting DRaaS, it is important that clients first review their business requirements," said Kim Whittaker, president of FNTS, a managed service provider based in Omaha, Neb., that offers DRaaS.
This review should establish the customer's recovery time objective (RTO) and recovery point objective (RPO) for each business system. To do so, the organization must determine how much downtime it can afford, in the case of RTO, and how much data it can afford to lose, in the case of RPO. For example, a 24-hour RPO would be appropriate for a business willing to lose a day's worth of transactions and information, Whittaker said. On the other hand, a business unable to risk losing an hour's worth of transactions and data would need to have a much more aggressive RPO.
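The two objectives can be checked against a recovery event with simple arithmetic, as in the Python sketch below. The function name, the timestamps and the four-hour RTO are hypothetical values chosen for illustration; the 24-hour and one-hour RPOs echo the examples above.

```python
from datetime import datetime, timedelta


def meets_objectives(last_copy, outage_start, service_restored, rpo, rto):
    """Compare one recovery event against the agreed RPO and RTO."""
    data_loss_window = outage_start - last_copy   # data written since the last copy
    downtime = service_restored - outage_start    # time the system was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


event = dict(
    last_copy=datetime(2020, 6, 1, 2, 0),
    outage_start=datetime(2020, 6, 1, 14, 30),
    service_restored=datetime(2020, 6, 1, 17, 30),
)

relaxed = meets_objectives(**event, rpo=timedelta(hours=24), rto=timedelta(hours=4))
strict = meets_objectives(**event, rpo=timedelta(hours=1), rto=timedelta(hours=4))

print(relaxed["rpo_met"], relaxed["rto_met"])  # True True
print(strict["rpo_met"], strict["rto_met"])    # False True
```

The same event, with 12.5 hours of unreplicated data and three hours of downtime, satisfies the 24-hour RPO but fails the one-hour RPO.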
"Some applications will be extremely critical with only a few hours of acceptable downtime," Whittaker said. "Other applications may have the capability of being down for multiple days. Determining downtime can help identify the restoration priority of applications."
The RTO and RPO are usually the outcome of a business impact analysis, Otava's Bigler noted. The process of setting recovery objectives "allows the organization to protect systems with the appropriate tool in order to not under-protect or overspend," he said.
Including dependent systems is a critical part of this step, Bigler said. He cited the example of an Active Directory service for a business process that depends on users' AD credentials.
Whittaker also noted the importance of determining the interdependencies of applications. If one application relies heavily on another to perform, those applications must be grouped together from a recovery time perspective, she said.
Once the RTO and RPO are defined by application, the review should then cover what servers are required by application. The customer can work with a service provider to gain an understanding of what infrastructure and system resources will be required to recover the systems within an environment, Whittaker said. This analysis will include the network architecture and connectivity required and the technology or tools that will be used to facilitate the DR activity, she added.
Security should also be part of the DR planning discussions. Whittaker said that step includes reviewing any security or compliance requirements "to ensure the service provider has architected a solution that meets the client's security and regulatory needs."
"Organizations tend to be the most vulnerable during a disaster recovery process, as typical security and controls may be inappropriately alleviated to speed recovery times," Bigler said.
Organizations should also understand recovery priority once they have defined recovery objectives. That is, determining which VMs or systems would be recovered in a given order to support application dependencies, Bigler explained. An organization, for instance, would need to recover a database service before recovering an ERP system that uses that database.
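Deriving a recovery order from application dependencies amounts to a topological sort. The sketch below uses Python's standard-library `graphlib` module (Python 3.9 or later). The system names are illustrative, and the assumption that the database depends on Active Directory is added here only to make the chain longer.

```python
from graphlib import TopologicalSorter

# Each system maps to the systems that must be recovered before it.
depends_on = {
    "erp": {"database"},
    "database": {"active-directory"},  # assumed dependency, for illustration
    "active-directory": set(),
}

recovery_order = list(TopologicalSorter(depends_on).static_order())
print(recovery_order)  # ['active-directory', 'database', 'erp']
```

`TopologicalSorter` raises `graphlib.CycleError` if two systems depend on each other, which is itself a useful signal that the dependency map needs review.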
DRaaS implementation can begin once the configuration -- including the RTO, RPO and security components -- is identified and meets client approval. After the implementation phase, a DR test can be conducted to confirm that all aspects have been completed and the DRaaS deployment meets business objectives, Whittaker noted.
Consultant Kirvan suggested that customers work closely with providers to iron out the details of cloud disaster plan testing. That task includes determining what elements will be included in the plan, documenting the plan, identifying who will be involved in testing and finding out whether the DRaaS vendor will provide test scripts before the customer creates its own, he noted.
The organization should regularly test and update DRaaS as its business and technology requirements change, Bigler said. "Regularly scheduled tabletop testing is one of the single biggest success factors for any continuity or recovery plan."
In addition, completing DR tests provides an opportunity to identify changes that must be made to the configuration to enhance DRaaS, according to Whittaker.
She also cited the importance of developing an overall business continuity plan, beyond the DRaaS technical offering. An organization's overarching business continuity approach should include plans for supporting the business if it experiences staff disruptions or operational challenges due to a disaster event, Whittaker said.
Other steps organizations can pursue to get started with DRaaS include considering a phased migration. That is, a customer might move only certain applications, databases and data to the service for a limited time period to evaluate the vendor's support, Kirvan said.
In addition to setting recovery objectives and assessing security and testing, organizations should also familiarize themselves with how service providers price their offerings. In general, DRaaS pricing depends on two primary factors: 1) the number of physical or virtual servers -- or applications -- that require protection and 2) how aggressive the customer's RTOs and RPOs are.
How to choose a DRaaS provider
Organizations should weigh several factors when evaluating DRaaS providers. Forrester's Chhabra pointed to a DRaaS provider's recovery posture as an important attribute to discuss with the vendor.
"Can your service provider give you a view into their recovery readiness?" he asked. "There are a number of elements that play into providing that kind of view."
Among those is data availability. Replication times and service tiers, as noted above, can vary, so buyers must ensure a service provider's SLA matches their recovery expectations. Performance applies to VMs as well as replication, former CIO Posey noted. As part of the vetting process, he said, organizations should consider whether VMs will perform in the DR site as well as they do when running in the primary data center.
The DR testing schedule is also important. Infrequent testing can contribute to DR failure.
Chhabra said a test conducted a year ago will likely miss many changes in the primary data center -- application updates and patches, for example. "The question is, are you sure the same level of changes you did on the primary site have been replicated into the DR site?" he said. Failure to update the DR site, and the resulting inconsistencies, put the success of recovery into question.
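One simple way to surface that kind of inconsistency is to compare an inventory of component versions from each site. The Python sketch below assumes such inventories are already available as dictionaries; the component names and version strings are hypothetical.

```python
def find_drift(primary, dr_site):
    """Return components whose versions differ between the two sites."""
    drift = {}
    for component in sorted(primary.keys() | dr_site.keys()):
        primary_version = primary.get(component)  # None if absent on primary
        dr_version = dr_site.get(component)       # None if absent on the DR site
        if primary_version != dr_version:
            drift[component] = (primary_version, dr_version)
    return drift


primary_versions = {"erp-app": "4.2.1", "database": "12.4", "os-patch": "2020-05"}
dr_versions = {"erp-app": "4.1.0", "database": "12.4"}

for component, (primary_v, dr_v) in find_drift(primary_versions, dr_versions).items():
    print(component, primary_v, dr_v)
```

Here the ERP application is a release behind at the DR site and the operating system patch is missing there entirely, while the database matches.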
Another consideration to evaluate: the DRaaS provider's ability to concurrently support customers in the event of a widespread, regional disaster such as a hurricane.
"How much capacity has the DR service provider provisioned for the number of contracts it has signed?" Chhabra asked. "It is not a factor in your direct control, but you need to know what the level of commitment the DR service provider makes."
Customers with particularly low risk tolerance can ask a service provider to reserve DR equipment just for them, J2 Global's Timko said. But that approach counteracts the cost advantage of shared resources in the cloud.
Security is another top criterion. Security breaches and cyberattacks, such as ransomware, have become leading contributors to outages and downtime, Chhabra noted. In the current climate, an organization must ensure a prospective service provider can offer security that is at least on a par with the security provided in its own data center, he noted.
"In a situation where DR has been invoked … the DR location needs to have the same level of security," he said.
Checking a DRaaS provider's customer references can shed additional light on its capabilities and should be part of the DRaaS selection process. An organization can ask reference customers for their views on the services and lessons learned, Kirvan explained.
Organizations should also consider the main types of DRaaS options available among service providers: self-service, managed and attended.
The self-service option provides the technology of DR but puts customers in charge of planning the recovery strategy and managing the end-to-end process. This approach offers a cost advantage, but the customer must make DR an in-house competency.
"Someone has to manage the environment in self-service mode," InterVision's Ton said.
At the other end of the service spectrum, a managed DRaaS provider takes care of planning, DR testing and executing a recovery. This approach costs more but offers the highest level of customer support. Attended DRaaS, meanwhile, is a compromise between the hands-off cost reduction of self-service and the white-glove treatment of the managed option.
Ton, however, said he sees more customer interest in managed and self-service DRaaS than the attended form.
Provider/service comparison
Companies delivering disaster recovery as a service are a varied group. Market suppliers include pure-play DRaaS providers, data protection firms that offer DRaaS and other products, large IT companies and managed service providers.
DRaaS providers offer differing technical approaches, as demonstrated by such diverse companies as Datto, Druva, Unitrends, Veeam and Zerto. Some companies offer software that the customer installs on its own hardware, while others provide integrated hardware-software appliances. Both approaches replicate to the cloud. Managed DRaaS, with its end-to-end outsourcing approach, buffers customers from the technical method.
Although some DRaaS offerings provide on-premises-to-cloud replication and recovery, cloud-to-cloud approaches let customers perform backups within the same cloud or to another cloud zone or destination for redundancy. In this method, customers perform backups using their own systems, the cloud provider's capabilities or a combination, Bertrand said.
Sales methods also differ, with some DRaaS companies selling direct to customers and others selling exclusively through channel partners. Direct versus channel sales is one attribute some research firms use to segment the DRaaS market. Gartner, for example, used channel sales, among other filters, to significantly reduce the number of DRaaS providers listed in its Magic Quadrant.
An overview of 10 significant DRaaS vendors, based on Gartner's most recent Magic Quadrant, provides additional evidence of the market's diversity, shedding light on BIOS Middle East, C&W Business, Expedient, IBM, Iland Internet Solutions, InterVision Systems, Microsoft Azure Site Recovery, Recovery Point Systems, Sungard Availability Services and TierPoint.
DRaaS providers might, on the surface, appear to be essentially the same. But customers should look closely to discern the differences and determine which company offers the best fit for their workloads. For example, Iland and Recovery Point both offer support for physical and virtual systems and automated orchestration, according to consultant Nick Cavalancia's analysis of the DRaaS providers. However, those DRaaS offerings differ when it comes to the platforms they support and the DR software vendors they employ.
Market trends
Greater public cloud adoption. The DRaaS market continues to evolve. One important trend is the greater adoption of the public cloud as a DR target versus a DRaaS provider's private cloud. This shift potentially boosts cost efficiency because the public cloud platforms offer greater economies of scale. The scale and pricing of public clouds will nudge more organizations to move in that direction, InterVision's Ton said.
One drawback, however, is a trade-off regarding recovery times, he noted. Using a public cloud DR target could require a conversion process when recovering workloads. For example, when an organization must fail over from its on-premises VMware infrastructure to a public cloud, the DRaaS software must convert the customer's VMware virtual machines to the public cloud provider's VM format. An InterVision analysis found recovery time considerably longer in AWS due to conversion -- as much as two to three times longer than an equivalent recovery in vSphere.
"There's a trade-off that an IT leader can look at and say, 'Am I willing to trade faster RTO for lower cost?' That is something you have to weigh," Ton said.
Over time, however, the performance gap is likely to shrink. "You'll see the [public cloud] technology continue to reduce the RTO differences," Ton said.
Increasing complexity. DRaaS offerings will need to manage increasingly complex customer environments that span on-premises and multi-cloud resources.
"We see a unified DRaaS becoming more complex as more organizations embrace a multi-cloud strategy," Otava's Bigler said. "To this end, the right set of tools and processes should be implemented to ensure organizations are not exposed to a more complex recovery process, putting recovery time objectives at risk."
In addition, compliance will raise concerns for multi-cloud customers, as each provider would need to comply individually with the appropriate controls for the relevant legislative requirements. That situation calls for transparency into the providers' compliance postures to ensure a DRaaS plan sustains an organization's compliance, Bigler noted.
A desire to reduce complexity, in turn, will encourage a move toward outsourced -- or managed -- recovery as opposed to insourced or self-managed DRaaS, he added.
Expanding use cases. Protection from natural disasters has been a traditional DR driver, but cybersecurity has grown in importance in recent years. Ransomware has sparked interest in DRaaS among organizations seeking a way to resume operations in the event an attacker encrypts their primary data set.
"The demand for DRaaS has … increased due to ransomware attacks and having the ability to recover systems and information so they are not held hostage," Whittaker of FNTS said. "DRaaS has expanded beyond just preparing for natural disasters into really ensuring client data is protected and recoverable in the event of an outage or loss of access to information."
DRaaS' expanding scope also includes pandemic readiness, which is among the recent challenges confronting the DRaaS market. The COVID-19 crisis points to resiliency amid infectious disease outbreaks as an emerging use case. Whittaker said her company is "seeing more clients that have expanded DR planning and overall business continuity to address staffing concerns. This can ensure they have the technical talent and skills to run operations in the event of a worker shortage."
Convergence with BaaS. DRaaS and backup as a service (BaaS) might grow closer together. Whittaker noted that the technology offerings for DRaaS and BaaS are similar, adding that the challenges for convergence to take hold are operational rather than technical.
"Most large organizations separate disaster recovery resources from backup resources," she said. "They tend to have different functions and departments within an organization." Convergence could provide convenience and cost savings, but might add complexity and would increase risk, she noted.
"Whether or not the industry moves in this direction is yet to be seen, but it is creating a buzz in the market," she said.
The increasingly complex nature of IT infrastructure and the need to protect resources that reside on-premises and, potentially, across multiple clouds will influence the future direction of DRaaS. Providers have their work cut out for them regarding their ability to provide coherent recovery objectives and service levels across cloud, on-premises and hybrid environments, according to ESG's Bertrand. Getting everything working in unison across highly distributed settings is a tough challenge, he noted.
"The ability for DRaaS to orchestrate and automate a unified customer experience across multiple cloud providers -- private, hybrid and hyperscale -- is still considered greenfield from a technology perspective," Bigler observed.
IT administrators can also expect to see developments in microservices and container-based protection, additional automation of recovery exercises and simplified testing, according to industry executives.
"The market is maturing very nicely, but it still has some way to go," Bertrand concluded.