Ensure network resilience with redundancy and skills
Ensuring network resilience doesn't just mean building redundancy in network infrastructure. It should also include planning contingencies for people and skills.
Vaccines train immune systems to prevent specific diseases, preparing the body without exposing it to the full risks. In the same way, network resiliency acts like a vaccine that boosts an organization's immune system to prevent downtime, according to Prashanth Shenoy, vice president of enterprise networking and cloud marketing at Cisco.
While network resilience has always been crucial, the COVID-19 pandemic has shown how resilience is an essential component of organizations' business continuity (BC) plans. In a matter of days, companies faced a test of whether they could adapt to a remote workforce and maintain business operations. Those that couldn't adjust lost revenue and struggled to keep up, said Chris Groves, director of technical systems engineering at Cisco, during a Cisco webinar on building network resilience.
Amid the pandemic, network teams relied on their ability to scale VPN capacity and provide secure remote access to ensure network resilience. But the crisis could also influence how organizations build their network resilience strategies moving forward, transitioning from a reactive approach to a more proactive one -- much like the difference between medicine and a vaccine, Shenoy said.
"It's a different mindset shift from treating symptoms with medicine to building an immune system in your IT organization," he said.
How enterprises can ensure network resilience
Redundancy is the most vital component of network resilience, but it has several distinct aspects, according to John Burke, CTO and principal analyst at Nemertes Research. To build proper network redundancy, Burke said network teams should put the following in place:
- multiple connections, which can each carry the main burden of traffic by itself;
- redundancy at the physical path level so the different connections travel by separate routes;
- carrier and service provider redundancy, with each provider either running its own physical infrastructure or using infrastructure separate from the others;
- redundant technology -- e.g., fiber optics and broadband -- to reduce exposure to problems that affect a single connection medium and provide a second medium with which to work; and
- network equipment with failover, redundant power supplies and path separation within the data center.
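As a rough illustration, the connection-level items in this list could be checked automatically against an inventory of WAN links. The sketch below is not from Burke or Nemertes: the `Link` fields, the `redundancy_gaps` function and the sample values are all hypothetical, and it leaves out the last item (equipment failover, power and data center path separation).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Link:
    """One WAN connection. All fields are illustrative assumptions."""
    name: str
    carrier: str
    medium: str          # e.g. "fiber" or "broadband"
    physical_path: str   # label for the route or conduit the link follows
    capacity_mbps: int


def redundancy_gaps(links, peak_load_mbps):
    """Return a list of plain-language gaps against the checklist."""
    gaps = []
    if len(links) < 2:
        gaps.append("fewer than two connections")
    # Each connection should be able to carry the main traffic burden alone.
    for link in links:
        if link.capacity_mbps < peak_load_mbps:
            gaps.append(f"{link.name} cannot carry the full load alone")
    # Connections should travel by separate physical routes.
    for a, b in combinations(links, 2):
        if a.physical_path == b.physical_path:
            gaps.append(f"{a.name} and {b.name} share a physical path")
    # At least two carriers and two connection media should be in use.
    if links and len({link.carrier for link in links}) < 2:
        gaps.append("all connections use the same carrier")
    if links and len({link.medium for link in links}) < 2:
        gaps.append("all connections use the same medium")
    return gaps


sample = [
    Link("wan-a", "CarrierA", "fiber", "north-conduit", 1000),
    Link("wan-b", "CarrierA", "broadband", "north-conduit", 300),
]
for gap in redundancy_gaps(sample, peak_load_mbps=500):
    print(gap)
```

With the sample inventory, the check flags three gaps: the second link is too small to carry the load alone, both links share a path, and both come from one carrier. A real audit would need path and carrier data verified with the providers, which is usually the hard part.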
While most businesses understand how to build redundancy with switching to ensure data paths, they should also consider users and networking applications, such as load balancing, content filtering and firewalls, said John Fruehe, an independent analyst.
"Any failure from those [applications] could cause problems with network availability for end users, even if they have a solid path through the network," he said.
No single technology or capability will ensure network resilience and BC. The most important step is to plan and understand the physical network and all the data paths, Fruehe said. Part of that planning process is figuring out the value each business aspect brings to the company and determining the cost per second of downtime. Fruehe broke those aspects into the following three categories:
- Critical. If the service goes down, you might have angry users.
- Business-critical. If the service goes down, you can't perform a certain business aspect, and it will cost a certain amount per hour.
- Mission-critical. If the service goes down and you can't do this function, you're out of business.
When teams quantify the cost of downtime, they can better focus on keeping the crucial aspects of the business running. Burke agreed, saying that calculating downtime and its cost is a conversation IT teams need to have with leadership so the tradeoffs are clear.
"Like a lot of other things in architecting enterprise IT services, how resilient you are comes down to cost and risk," Burke said. For example, enterprises should consider how much they're willing to pay to balance the risk of being down for long periods and how much redundancy they can afford to implement.
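The cost-versus-risk tradeoff can be made concrete with simple arithmetic. The following is a minimal sketch, not a method attributed to Burke or Fruehe: the function names are hypothetical, and the dollar figures and outage hours are made-up inputs a team would replace with its own estimates.

```python
def expected_annual_downtime_cost(cost_per_hour, outage_hours_per_year):
    """Expected yearly loss from downtime for one service."""
    return cost_per_hour * outage_hours_per_year


def redundancy_pays_off(cost_per_hour, hours_without, hours_with,
                        annual_redundancy_cost):
    """True if the downtime cost avoided exceeds what redundancy costs."""
    avoided = expected_annual_downtime_cost(
        cost_per_hour, hours_without - hours_with
    )
    return avoided > annual_redundancy_cost


# Hypothetical business-critical service: $10,000 per hour of downtime,
# 8 outage hours a year today, 1 hour a year with a second connection.
print(expected_annual_downtime_cost(10_000, 8))    # 80000
print(redundancy_pays_off(10_000, 8, 1, 50_000))   # True
```

In this made-up case, redundancy that costs $50,000 a year avoids $70,000 in expected losses. For a mission-critical service, the comparison matters less, because the cost of an extended outage is the business itself.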
But Fruehe said the pandemic also revealed another vital element of network resilience: people and skills. If employees in specialized roles call out sick or can't make it to work, an organization needs to have someone with the necessary skills to fill in temporarily.
"You don't just need resiliency in applications and hardware; you need resiliency in skills," he said. "Make sure you have contingencies. It's important to cross-train people."
Automation for resiliency
Cisco's Shenoy and Groves highlighted automation and NetOps as emerging trends that will help enterprises build network resiliency. When Cisco needed to prepare for the shift to work from home, automation was key, Groves said.
"The first priority was secure remote access at scale. But then we had to enable business processes, and we could only do that quickly enough with automation," Groves said. His team implemented automation to increase agility, provide more complete visibility and gauge network issues.
But many enterprises don't operate with the same level of resources as Cisco. As a result, Fruehe said, automation adoption has taken a hit in the short term. Even though most teams see the value of automation to garner efficiency, when COVID-19 hit, they panicked about maintaining operations and making sure remote workers were productive, he said. When operations return to normal, he said, he expects automation to see a boost.
"It'd be like changing the engine on an airplane when you're in flight," Fruehe said. "Automation will take a backseat in the near term, but it will have real opportunity in the long term."
Resiliency for cloud environments
With massive cloud outages recently hitting AWS and Google -- and, as a result, affecting millions of customers -- network teams face the question of how much they can truly prepare for outages that are beyond their control.
Enterprises have moved security and other services to the cloud, recognizing the tradeoff of control for resiliency, efficiency and expertise, Fruehe said. The trend has extended to networking as companies see the value of redundancy in a cloud-based network hosted by a provider that has thousands of servers.
But how can enterprises prepare for an impending cloud outage? In many cases, enterprises might not be able to do a lot, Burke said. With unlimited budget and resources, he said teams could spin up temporary or permanent presences in other cloud providers that offer the same services. Alternatively, organizations could take an "omni-cloud approach" and split applications or workloads across multiple clouds, so if one cloud has a problem, another cloud can still perform. But those aren't trivial, easy options, he added.
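To show what the multi-cloud option involves at its simplest, the sketch below picks the first provider whose health check passes. It is an assumption-laden illustration, not something Burke described: `pick_provider`, the provider names and the health-check callables are hypothetical, and a real deployment would also have to handle data replication, DNS and state, which is where most of the difficulty lies.

```python
from typing import Callable, Mapping, Optional


def pick_provider(
    health_checks: Mapping[str, Callable[[], bool]]
) -> Optional[str]:
    """Return the first provider, in preference order, that reports healthy.

    A check that raises an exception is treated as unhealthy.
    Returns None if no provider is available.
    """
    for name, is_healthy in health_checks.items():
        try:
            if is_healthy():
                return name
        except Exception:
            continue
    return None


# Hypothetical checks: the preferred cloud is down, the second is up.
checks = {
    "cloud-a": lambda: False,
    "cloud-b": lambda: True,
}
print(pick_provider(checks))  # cloud-b
```

The selection logic is trivial. Keeping the same application ready to run in both places is the expensive part, which matches Burke's caution.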
For Fruehe, a best practice is to align the organization's networking to the location of the application. If, for example, enterprises run in-house applications in the data center, all networking and resiliency should be built around internal networks, he said.
"What you don't want is to get hit with in-house applications that are impacted by cloud downtime," Fruehe said.