Ensure network resilience in a network disaster recovery plan

A network disaster recovery plan doesn't always mean network resilience. Learn how factors like funding, identifying potential risks and constant updates make a difference.

A key attribute of resilience is the ability to adapt to a situation and overcome a problem, returning as closely as possible to the state that existed before the event occurred.

The notion of resilience gained prominence during the COVID-19 pandemic, which forced many organizations to reinvent how they worked as employees fell ill or shifted to remote work.

In the case of networking, network availability is an important business asset. Loss of internet access, loss of wireless communications and loss of the ability to connect with other employees, offices and customers are among the worst-case networking scenarios organizations currently face.

A truly resilient network can adapt to an outage, whether it involves transmission channels, switching systems or both. It then returns to "networking as usual" or to a performance level as close to the network's pre-incident status as possible.

The use of technology disaster recovery (DR) plans is an important step toward protecting the integrity of voice and data communications networks. But does a network DR plan -- and its associated technology resources -- ensure network resilience?

Risks, threats and vulnerabilities to network integrity

Figure 1 shows a typical network infrastructure and its many risk points or potential points of failure. One immediate risk factor teams should investigate is where network resources are located -- e.g., in a secure building, buried underground, or aboveground on telephone poles or metal towers.

Figure 1. Consider these network risk points when creating a network DR plan.

A variety of events can disable or damage each of the items highlighted in Figure 1. How teams deploy those resources can make a difference when a severely disruptive event occurs. This means it's essential for teams to identify all possible risk points along likely transmission routes when developing a network DR plan.
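One simple way to start identifying risk points is to list the elements each transmission route passes through and look for elements that every route shares. The sketch below is illustrative only: the route and element names are hypothetical, not taken from Figure 1, and a real inventory would come from the team's own network documentation.

```python
def shared_risk_points(routes):
    """Return the elements that appear on every route.

    An element common to all routes is a candidate single point of
    failure: if it fails, no listed route remains usable.
    """
    if not routes:
        return set()
    common = set(routes[0])
    for route in routes[1:]:
        common &= set(route)
    return common


# Hypothetical routes from an office to a carrier (names are assumptions).
routes = [
    ["office_switch", "edge_router", "cable_entry_A", "carrier_office_1"],
    ["office_switch", "edge_router", "cable_entry_B", "carrier_office_2"],
]

print(sorted(shared_risk_points(routes)))
# ['edge_router', 'office_switch']
```

In this made-up example, the two routes use different cable entries and carrier offices, but both still depend on the same switch and router, so those two devices would merit closer attention in the plan.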

Logically, network resources located in a secure building and placed underground in conduit are less likely to be at risk from aboveground events, such as severe weather. However, flooding can cause problems if rising water infiltrates belowground cable and equipment vaults and storage areas, especially if these resources aren't housed in waterproof enclosures.

Typical risks and threats to network integrity include the following:

  • severe weather;
  • lightning strikes, flooding, mudslides and earthquakes;
  • loss of power or equipment failure;
  • software failure;
  • human error;
  • sabotage or cybersecurity breaches;
  • loss of network perimeter security;
  • environmental disruptions -- e.g., excessive heat, loss of air conditioning or humidity not in acceptable limits; and
  • construction -- e.g., digging up buried cables or damage to internal building wiring during construction.
Figure 2. A star topology features a central device that transmits data to other nodes in the system.

Building a network DR plan to achieve resilience

Before building a network plan to achieve resilience, network teams should first determine what constitutes resilience for their company. This process factors in how the firm operates and uses networks, employee dependence on networks and how senior management views resilience.

It can be expensive to achieve a truly resilient network infrastructure -- one that identifies all possible failure points and has enough redundancy to recover quickly. Figure 2 depicts a hub-and-spoke network, or star topology, in which each phone or business system reaches the rest of the network over a single communications link. This is how many networks were configured before the advent of the internet.

Figure 3. A mesh network topology connects each device to every other device in the network.

In contrast, Figure 3 depicts a mesh network, in which network points are connected to each other. The loss of one or more network channels likely won't disrupt communications, as each network node can redirect traffic to an alternate path.
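The difference between the two topologies can be checked with a small connectivity test. The sketch below is a minimal illustration, assuming a four-site network with hypothetical site names. It removes each link in turn and uses a breadth-first search to see whether every site can still reach every other site.

```python
from collections import deque
from itertools import combinations


def is_connected(nodes, links):
    """Breadth-first search: True if every node is reachable from the first."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


def survives_any_single_link_loss(nodes, links):
    """True if the network stays connected whichever single link fails."""
    links = list(links)
    return all(
        is_connected(nodes, links[:i] + links[i + 1:])
        for i in range(len(links))
    )


# Hypothetical sites; the names are assumptions for illustration.
sites = ["hq", "branch1", "branch2", "branch3"]
star_links = [("hq", s) for s in sites[1:]]   # hub-and-spoke, as in Figure 2
mesh_links = list(combinations(sites, 2))     # full mesh, as in Figure 3

print(survives_any_single_link_loss(sites, star_links))  # False
print(survives_any_single_link_loss(sites, mesh_links))  # True
```

The star fails the test because losing any spoke isolates a branch, while the full mesh always has an alternate path. The same check could be run against a partial mesh to see how much redundancy a cheaper design retains.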

Configuring such a network can be expensive, especially when using direct point-to-point channels. The internet also uses this configuration, which is one reason why the topology has become a popular tool for building secure network infrastructures. The choice of network topology is an important factor to consider when building a network DR plan.
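To see why cost rises quickly, it helps to count links. A star with n sites needs n - 1 links, while a full mesh, in which each site connects directly to every other site, needs n(n - 1)/2. The short sketch below compares the two; the site counts are arbitrary examples, and real costs also depend on distance, bandwidth and carrier pricing.

```python
def star_link_count(n):
    """Links in a hub-and-spoke network with n sites (one hub, n - 1 spokes)."""
    return max(n - 1, 0)


def full_mesh_link_count(n):
    """Links in a full mesh, where every pair of sites has a direct link."""
    return n * (n - 1) // 2


for n in (4, 10, 50):
    print(n, star_link_count(n), full_mesh_link_count(n))
# 4 3 6
# 10 9 45
# 50 49 1225
```

The quadratic growth of the mesh is one reason teams often settle for a partial mesh, or use the internet as the shared mesh rather than building point-to-point channels themselves.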

As such, building a network DR plan that facilitates resilience assumes the network infrastructure itself is as resilient as possible, considering the costs and options available.

How to go from DR to resilience

Once network teams have defined the characteristics of resilience, they can follow the steps from disaster recovery to resilience, as shown in Figure 4.

Figure 4. The path from network disaster recovery to network resilience

Figure 4 shows a possible scenario in which a disruptive event occurs and the extent to which a DR plan can assist the organization. The network resilience portion of Figure 4 suggests a state of network operations that might be beyond the level a DR plan can achieve. The results can vary considerably based on management's expectations and the available budget.

Teams can achieve network resilience in the following situations:

  1. They understand management's expectations and implement them with a technology and people strategy.
  2. Funding is available to implement the necessary resources and cover costs to manage and maintain them. The composition of the DR plan doesn't need to change, other than perhaps the procedures and the details of the systems and services that teams added to increase survivability.
Figure 5. This PBX system uses the internet to link all its systems.

Figure 5 presents one way to increase resilience when using a VoIP PBX system. In Figure 5, all elements of the PBX system are linked using the internet. In effect, the internet provides a network-as-a-service arrangement.

Because the internet is highly distributed and unlikely to fail as a whole, the next challenge for teams is arranging internet access. While ISPs, MSPs and local exchange carriers access the internet, customers still have to connect to their switching offices. This connection is traditionally called the last mile and is represented in Figure 5.

Two techniques to increase access redundancy to local carriers -- and then ISPs, MSPs and others -- are depicted in Figure 6.

Figure 6. Enterprises can add redundancy to their local access designs by using wireless or connecting to a second local access carrier via alternative routes.

Enterprises can access a second local access carrier's central office via alternate routes to increase resilience at the local level, but this option can be quite expensive. Figure 6 also depicts two alternate cable entry points into a building, which might be prohibitively expensive unless the building was originally built with separate cable access routes.

Another alternative is wireless local access using cellular technology or possibly point-to-point microwave. A combination of these approaches might provide additional redundancy.
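A rough way to estimate what a second access path buys is the standard formula for parallel redundancy: the combined service is unavailable only when every path is down at the same time. The sketch below assumes the paths fail independently, which is only approximately true in practice (two cables in the same conduit, for example, do not fail independently). The availability figures are hypothetical placeholders, not measurements.

```python
def combined_availability(path_availabilities):
    """Availability of a set of redundant paths, assuming independent failures.

    Each input is the fraction of time one path is up (0.0 to 1.0).
    The result is 1 minus the probability that all paths are down at once.
    """
    all_down = 1.0
    for availability in path_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - availability
    return 1.0 - all_down


# Hypothetical figures: a wired last-mile link and a cellular backup.
wired = 0.999
cellular = 0.99

print(round(combined_availability([wired]), 5))            # 0.999
print(round(combined_availability([wired, cellular]), 5))  # 0.99999
```

Under these assumed numbers, even a less reliable backup path noticeably reduces expected downtime, which is why diverse routing and wireless access are often combined.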

Other ways to increase network redundancy and survivability include the following:

  • backup power systems;
  • multiple equipment rooms that house redundant components;
  • diversely located cable runs in buildings;
  • spare circuit boards that are periodically checked;
  • backup copies of OSes and databases supporting VoIP PBXs and other devices;
  • redundant network perimeter devices, such as firewalls and intrusion detection and intrusion prevention systems; and
  • updated backup copies of all critical software used to run the network.

On top of all these assets, teams should regularly exercise their DR plans and redundant network components to ensure they're working. They should also ensure that all network management teams are trained in DR procedures, check that all vendor and carrier contacts are current, and engage the carriers and vendors when planning a DR test.
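Part of this housekeeping can be automated. As one small illustration, the sketch below flags vendor and carrier contacts that have not been verified recently. The contact names, dates and the 90-day threshold are all assumptions chosen for the example; each team would set its own review interval.

```python
from datetime import date, timedelta


def stale_contacts(contacts, today, max_age_days=90):
    """Return names of contacts last verified more than max_age_days ago.

    contacts maps a contact name to the date it was last verified.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, last_verified in contacts.items()
        if last_verified < cutoff
    )


# Hypothetical contact list (names and dates are placeholders).
contacts = {
    "carrier_noc": date(2024, 1, 10),
    "pbx_vendor": date(2024, 5, 1),
}

print(stale_contacts(contacts, today=date(2024, 6, 1)))
# ['carrier_noc']
```

Running a check like this before each DR exercise gives the team a short list of contacts to confirm with the carriers and vendors involved.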

It is important for vendors and carriers to understand what their customers expect during a disruption. Ensure that service-level agreements address vendor and carrier responsibilities in an emergency.

Summary

True network resilience is often just out of reach. However, enterprises can get close to a state of real network resilience by following the criteria noted in this article.
