4 components of a disaster recovery plan to prepare for a crisis
IT teams must take a proactive approach to crisis management and disaster recovery. Use these four guidelines around communication, monitoring and more to build a plan that works.
While most companies have disaster recovery plans, few test them thoroughly -- and many of those plans cover only a limited set of scenarios.
The details will vary between companies, but a few specific disaster recovery plan components provide a solid framework -- no matter what the event might be.
1. Standardized communication
One of the most critical components of a disaster recovery plan is an up-to-date communication strategy. An outdated list of staff phone numbers is a recipe for trouble, especially when teams are also trying to coordinate over a free conferencing service.
Often, companies use a mix of tools, such as Slack and texting, with little consistency across messaging platforms. If teams can't communicate, they can't work smoothly in a disaster. Choose a service that everyone can access, such as Slack, Microsoft Teams or Zoom, and make a paid tier of it the standard tool. Do not rely on free versions of these tools: Free services can be cut off or reduced in quality without notice. In addition to standardizing on a conferencing tool, ensure that all contact information, such as personal emails and cell phone numbers, is up to date; review this information every six months and record any changes.
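The six-month review can be partly automated. The following is a minimal sketch, not a prescribed implementation: it assumes each contact record carries a date on which it was last verified, and it flags records older than roughly six months. The `Contact` fields, the `stale_contacts` helper and the sample roster are all hypothetical names and data introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "every six months" is approximated as 182 days.
REVIEW_INTERVAL = timedelta(days=182)


@dataclass
class Contact:
    # Hypothetical record layout; adapt to however contact data is stored.
    name: str
    cell_phone: str
    personal_email: str
    last_verified: date


def stale_contacts(contacts, today):
    """Return contacts last verified more than REVIEW_INTERVAL before `today`."""
    return [c for c in contacts if today - c.last_verified > REVIEW_INTERVAL]


# Hypothetical sample data for illustration only.
roster = [
    Contact("A. Admin", "555-0100", "a.admin@example.com", date(2024, 1, 10)),
    Contact("B. Backup", "555-0101", "b.backup@example.com", date(2024, 9, 1)),
]

overdue = stale_contacts(roster, today=date(2024, 10, 1))
print([c.name for c in overdue])
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the check easy to test and to run against a planned review date.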
2. Prepared staff
As a crisis shifts IT teams' focus to remote workers and their requisite setups, it's easy to overlook the importance of in-house skill sets. But IT staff members must have the technical ability to enact their DR plan, whatever form it takes, whether that means supporting offsite workers or enabling remote access to data and systems. The IT team can't do that if it is unprepared or lacks access to the necessary systems and tools.
In a traditional office setting, staff members have access to all the monitoring and management tools and data for the entire IT environment. This information might be spread across a network operations center (NOC) or a bank of monitors set to display it. Remote IT staff, however, must access that critical information from a single laptop screen. The drop-off in functionality is significant. And in a DR event, staff must be more responsive than ever, with limited resources and information.
Not all team members need a home-based NOC, but consider the tasks they'll need to perform and the resources they'll need to perform them. It could be as simple as an extra monitor or external keyboard and mouse; the goal isn't to replicate the full office setup, but to maintain a similar level of functionality.
3. Monitoring metrics
Another critical component of a disaster recovery plan is an IT monitoring strategy, with metrics designed specifically for a time of crisis.
If IT staff and end users must function differently -- working from alternate sites and using alternate tools -- then teams must prioritize and track metrics that are specific to that scenario, such as application response time, concurrent connections or bandwidth allocations. Rather than focus exclusively on the creation of a DR site, IT operations staff must understand what they might need to monitor and create alternate dashboards before they become necessary. The creation of a monitoring dashboard isn't difficult, but it's not something you want to do mid-crisis.
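As one way to prepare ahead of time, the crisis-specific metrics and their limits can be written down as data before any dashboard is built. The sketch below is illustrative only: the metric names and threshold values are hypothetical placeholders, not recommendations, and `breached_metrics` is a helper invented for this example.

```python
# Hypothetical crisis-mode thresholds; real values depend on the environment.
CRISIS_THRESHOLDS = {
    "app_response_ms": 800,
    "concurrent_connections": 450,
    "bandwidth_utilization_pct": 85,
}


def breached_metrics(sample, thresholds=CRISIS_THRESHOLDS):
    """Return {metric: (observed, limit)} for each metric above its limit.

    Metrics missing from `sample` are skipped, so a partial reading
    does not raise an error.
    """
    return {
        name: (sample[name], limit)
        for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    }


# Hypothetical reading taken during a failover test.
sample = {
    "app_response_ms": 1200,
    "concurrent_connections": 300,
    "bandwidth_utilization_pct": 91,
}
print(breached_metrics(sample))
```

Keeping the thresholds in one dictionary means the same definitions can feed an alternate dashboard, an alert rule or a periodic report without being restated in each place.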
4. A backup plan
Lastly, be sure to back up alternative sites and resources regularly -- and with the same policies as the primary sites. While this step might sound minor, especially with the focus on keeping core systems online, any failover to a DR location involves data that changes and is critical to the business -- and therefore must be protected. The DR site must be more than a backup store for the primary site: IT must be able to create new backups that, at some point, they can replicate back to the primary site.
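To check that the DR site really is backed up under the same policies as the primary site, the two policy definitions can be compared field by field. This is a minimal sketch under stated assumptions: the `BackupPolicy` fields (frequency, retention, encryption) are hypothetical examples of what a policy might contain, and real backup tools expose their settings in their own formats.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BackupPolicy:
    # Hypothetical policy attributes, chosen for illustration.
    frequency_hours: int
    retention_days: int
    encrypted: bool


def policy_gaps(primary, dr):
    """Return the names of fields where the DR policy differs from the primary."""
    return [
        f.name
        for f in fields(primary)
        if getattr(primary, f.name) != getattr(dr, f.name)
    ]


# Hypothetical policies for illustration only.
primary_policy = BackupPolicy(frequency_hours=24, retention_days=30, encrypted=True)
dr_policy = BackupPolicy(frequency_hours=24, retention_days=7, encrypted=True)

print(policy_gaps(primary_policy, dr_policy))
```

An empty result means the two sites match on every field the policy records; any name in the list points to a setting to reconcile before a failover.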
An effective disaster recovery strategy doesn't merely enable IT to fail over the environment for a few minutes or hours. Rather, it enables IT to provide users with continued service operations covered by the DR configuration. The question is not what is necessary to restart a service, but what is necessary to support it once restarted.