5 principles of change management in networking

Network change management rests on five principles, including risk analysis and peer review. These best practices can help network teams reduce failed network changes and the outages they cause.

Network change management is a process that aims to reduce the likelihood and risk of a failed change. The process consists of several steps that help changes succeed, but how does each step work?

Aircraft pilots use well-defined processes to ensure safe flying. Similarly, networking teams can use defined processes to reduce the risk of failed network changes that create unplanned outages. Still, organizations sometimes find that changes don't go as planned, resulting in a network outage. Some disruptions are due to a process failure, while others are due to nonobvious results of complex configurations.

This article discusses the five basic operating principles of network change management:

  • Scope determination and risk analysis.
  • Peer review.
  • Pre-deployment testing and validation.
  • Implementation and testing.
  • Documentation updates.

Before entering the change management process, network teams must establish the change details, such as new configurations, device connection information and documentation.

1. Scope and risk analysis

The first step in the network change management process is to evaluate the scope of a proposed change. Determine which services might be affected and which stakeholders use those services. Consider the change's blast radius: how far its effects, including any negative outcomes, could spread.
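
One way to make the blast radius concrete is to walk a dependency map outward from the component being changed. The sketch below assumes a hypothetical `DEPENDENTS` map, in which each component lists what depends on it, with invented device and service names; a real map would come from the team's inventory or network management system. It does a breadth-first search and returns every downstream service and stakeholder group.

```python
from collections import deque

# Hypothetical dependency map: each key lists the components that depend on it.
# All names are illustrative assumptions, not taken from a real network.
DEPENDENTS = {
    "core-sw1": ["dist-sw1", "dist-sw2"],
    "dist-sw1": ["erp-service"],
    "dist-sw2": ["voip-service", "branch-wan"],
    "erp-service": ["finance-team"],
    "voip-service": ["call-center"],
    "branch-wan": [],
    "finance-team": [],
    "call-center": [],
}


def blast_radius(changed: str, dependents: dict[str, list[str]]) -> set[str]:
    """Return every component reachable downstream of `changed` (breadth-first)."""
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in affected:  # skip components already counted
                affected.add(child)
                queue.append(child)
    return affected


print(sorted(blast_radius("dist-sw2", DEPENDENTS)))
# ['branch-wan', 'call-center', 'voip-service']
```

The size of the returned set gives a rough count for the first scope factor below; judging how important those services are still requires human input.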

Teams should measure the scope in terms of the following two factors:

  1. The number of endpoints affected by a change.
  2. The importance of the services a change might affect.

Once teams identify the scope, they should perform a risk assessment of the change. Is it something that's been done numerous times in the past and is well understood? Is it fully automated, or is it possible that human error will alter the change in an unexpected way? Is the involved technology well understood, or is there a chance that something unexpected might happen?

The scope of a change figures into the risk. A change to infrastructure on which key business processes run has a greater level of risk to the business than a change to a small branch site.

Network teams can use a risk factor calculator that assigns values to key parameters. To create one, score each of the following example parameters from 1 (lowest risk) to 10 (highest risk) and average the scores, or search the web for an existing calculator:

  • Will the effect be visible to customers? (No = 1, Yes = 10)
  • How many customers could be affected? (Range of 1 to 10)
  • How important are the services within the scope? (Range of 1 to 10)
  • Has this change been successfully implemented in the past? (Yes = 1, No = 10)
  • Is the change automated? (Range of 1 to 10, from fully automated = 1 to fully manual = 10)
  • Can the change be thoroughly tested prior to implementation? (Yes = 1, No = 10)
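  • Is the vendor documentation clear and unambiguous? (Range of 1 to 10, from clear = 1 to ambiguous = 10)
  • Is the peer review thorough, and did it surface any potential issues? (Range of 1 to 10, from a thorough review with no open issues = 1 to a cursory review or unresolved issues = 10)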
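
The averaging approach above can be sketched in Python. This is a minimal sketch, assuming every answer is already on the 1 (low risk) to 10 (high risk) scale; the parameter keys, the `yes_no` helper and the example answers are illustrative assumptions, not part of any standard calculator.

```python
from statistics import mean

# Keys for the eight example parameters, in the order listed above (names are hypothetical).
PARAMETERS = (
    "customer_visible",
    "customers_affected",
    "service_importance",
    "done_before",
    "automation",
    "testable",
    "vendor_docs",
    "peer_review",
)


def yes_no(answer: bool, yes: int, no: int) -> int:
    """Map a yes/no question onto the score the list assigns to each answer."""
    return yes if answer else no


def risk_score(answers: dict[str, int]) -> float:
    """Average the per-parameter scores; 1 is lowest risk, 10 is highest."""
    missing = [p for p in PARAMETERS if p not in answers]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for name in PARAMETERS:
        if not 1 <= answers[name] <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {answers[name]}")
    return float(mean(answers[name] for name in PARAMETERS))


# Hypothetical answers for a routine, well-understood change.
answers = {
    "customer_visible": yes_no(False, yes=10, no=1),
    "customers_affected": 3,
    "service_importance": 6,
    "done_before": yes_no(True, yes=1, no=10),
    "automation": 2,
    "testable": yes_no(True, yes=1, no=10),
    "vendor_docs": 3,
    "peer_review": 2,
}
print(risk_score(answers))  # 2.375
```

A plain average weights every question equally; teams that consider customer visibility more important than, say, documentation quality could switch to a weighted mean.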

The greater the risk, the more careful teams need to be during the rest of the change management process. Ensure that teams have clear change control documentation in place that details the rationale for the change, the rollback procedures and the scope.
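
A change record can refuse approval until the required items are filled in. This sketch uses a hypothetical `ChangeRecord` dataclass whose fields mirror the rationale, rollback procedure and scope mentioned above; the field names and the `CHG-1001` identifier are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """Hypothetical change-control record; field names are illustrative."""
    change_id: str
    rationale: str
    rollback_procedure: str
    scope: list[str] = field(default_factory=list)  # affected devices or services

    def missing_fields(self) -> list[str]:
        """List the required items that are still blank."""
        missing = []
        if not self.rationale.strip():
            missing.append("rationale")
        if not self.rollback_procedure.strip():
            missing.append("rollback_procedure")
        if not self.scope:
            missing.append("scope")
        return missing

    def ready_for_approval(self) -> bool:
        return not self.missing_fields()


record = ChangeRecord("CHG-1001", rationale="Add VLAN 30 for VoIP phones",
                      rollback_procedure="", scope=["dist-sw2"])
print(record.missing_fields())  # ['rollback_procedure']
```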

2. Peer review

The next step is to conduct a peer review. Teams can perform this step before the risk analysis, but it is better to let the risk level drive the thoroughness of the peer review. In principle, all peer reviews should be comparably thorough, but in practice teams are likely to give routine changes, such as access control list changes or virtual LAN modifications, only a cursory review. Automated testing and deployment of routine changes can help mitigate the risk of cursory peer reviews.
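
Using the risk level to drive review depth could look like the following. The thresholds (3 and 7) and the tier names are hypothetical and should be tuned to each organization; the sketch also assumes that out-of-the-ordinary changes, which warrant a vendor expert, show up as high risk scores.

```python
from typing import NamedTuple


class ReviewPlan(NamedTuple):
    tier: str
    reviewers: int
    vendor_expert: bool


def review_plan(risk: float) -> ReviewPlan:
    """Map an averaged 1-10 risk score to a review depth (hypothetical thresholds)."""
    if not 1 <= risk <= 10:
        raise ValueError(f"risk score must be between 1 and 10, got {risk}")
    if risk < 3:
        return ReviewPlan("routine", reviewers=1, vendor_expert=False)
    if risk < 7:
        return ReviewPlan("standard", reviewers=2, vendor_expert=False)
    return ReviewPlan("major", reviewers=2, vendor_expert=True)


print(review_plan(2.375).tier)  # routine
```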

Typically, internal staff who are familiar with the network conduct the peer reviews. If a change is out of the ordinary, however, it makes sense to have an expert from the equipment vendor perform the review. The reviews should feed back to the risk analysis phase and update the technical risk measurements, such as indicating whether testing and documentation are sufficient.

Peer reviewers should examine the following factors during a review, among others:

  • Configuration scripts.
  • Hardware and software compatibility.
  • Rollback procedures.
  • Change rationale.
  • Business needs.
  • Network security and compliance.
  • Templates and documentation.
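
The review factors above can double as a sign-off checklist. In this sketch, the checklist strings come from the list above, while the `unreviewed` helper is a hypothetical name; because the list says "among others," teams would extend `REVIEW_ITEMS` with their own items.

```python
# Factors from the peer review list; extend with organization-specific items.
REVIEW_ITEMS = (
    "configuration scripts",
    "hardware and software compatibility",
    "rollback procedures",
    "change rationale",
    "business needs",
    "network security and compliance",
    "templates and documentation",
)


def unreviewed(signed_off: set[str]) -> list[str]:
    """Return the checklist items not yet signed off, in checklist order."""
    unknown = signed_off - set(REVIEW_ITEMS)
    if unknown:  # catch typos so a misspelled item is not silently ignored
        raise ValueError(f"not on the checklist: {sorted(unknown)}")
    return [item for item in REVIEW_ITEMS if item not in signed_off]


pending = unreviewed({"configuration scripts", "rollback procedures", "change rationale"})
print(len(pending), pending[0])  # 4 hardware and software compatibility
```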

Figure: 5 steps to follow in network change management.

3. Pre-deployment testing and validation

Ideally, all changes go through a pre-deployment testing and validation phase. Consider automating low-risk, repetitive tasks and changes to remove the temptation to skip testing for changes that teams perceive as low risk. The greater the scope and risk, the more important it is to properly test and validate the proposed change.

The prevalence of virtual router and switch OS instances makes it easier to automate the creation of test network topologies without expensive hardware investments. Use network labs and sandboxes to build automation workflows in a virtual network topology that teams can tear down when the tests are successfully completed.

Pre-deployment testing includes several steps teams should follow to evaluate a proposed change:

  1. Verify that the test network works as intended prior to the change.
  2. Implement the change in a test infrastructure to confirm that the change results in the desired final state. Teams should use automated processes to avoid human error and reduce the time to validate the change. If a validation in the test environment fails, determine the reason. Did it fail because the change was incorrect, or was it because the test network didn't accurately represent the real network?
  3. Test the backout process so it's easy to revert to the previous state if something goes wrong. The rollback should return the network to the starting state, which teams can validate by repeating Step 1.
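
The three pre-deployment steps can be modeled end to end against a toy lab. This sketch assumes a deliberately simple in-memory model, a dict from device name to the set of VLAN IDs configured on it, plus a hypothetical health check (every device keeps management VLAN 1); a real workflow would drive virtual router and switch instances instead. The rollback check compares the lab with a snapshot taken before the change.

```python
import copy

# Toy stand-in for a virtual lab: device name -> set of VLAN IDs configured on it.
Lab = dict[str, set[int]]

MGMT_VLAN = 1  # hypothetical "network works" signal: every device keeps VLAN 1


def network_ok(lab: Lab) -> bool:
    """Hypothetical health check used for Step 1 and after the rollback."""
    return all(MGMT_VLAN in vlans for vlans in lab.values())


def apply_change(lab: Lab, device: str, vlan: int) -> None:
    lab[device].add(vlan)


def back_out(lab: Lab, device: str, vlan: int) -> None:
    lab[device].discard(vlan)


def pre_deployment_test(lab: Lab, device: str, vlan: int) -> bool:
    # Step 1: the test network must work before anything changes.
    if not network_ok(lab):
        raise RuntimeError("test network is unhealthy before the change")
    starting_state = copy.deepcopy(lab)

    # Step 2: apply the change and confirm the desired final state.
    apply_change(lab, device, vlan)
    change_ok = vlan in lab[device] and network_ok(lab)

    # Step 3: always exercise the backout, then repeat Step 1 and compare
    # with the snapshot to confirm the rollback restores the starting state.
    back_out(lab, device, vlan)
    rollback_ok = lab == starting_state and network_ok(lab)

    return change_ok and rollback_ok


lab = {"sw1": {1, 10}, "sw2": {1}}
print(pre_deployment_test(lab, "sw2", 30))  # True
```

The snapshot comparison matters: if VLAN 30 already exists on the device, the naive `back_out` deletes it, the lab no longer matches its starting state and the test returns `False`, which is exactly the kind of rollback flaw this step is meant to catch.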

4. Implementation and testing

Deployment and post-deployment testing and validation should follow the same process as in Steps 1 and 2 of pre-deployment testing. If teams have done a good job of pre-deployment testing and validation, nothing unexpected should happen. If post-change testing detects an unexpected problem, teams should back out the change and verify service restoration.

Some network protocols, such as routing protocols, need more time to converge after changes in large networks. Post-change verification should therefore incorporate delays or convergence tests, which pre-deployment testing in a small test environment typically doesn't need.
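
A convergence test is typically a poll loop around a post-change check. This minimal sketch assumes the check is a callable that returns `True` once the network has converged (for example, once expected routes appear); the default timeout and polling interval are placeholder values, not protocol recommendations.

```python
import time
from typing import Callable


def wait_for_convergence(
    check: Callable[[], bool],
    timeout_s: float = 300.0,  # placeholder; size this to the network and protocol
    interval_s: float = 10.0,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage with a fake check that "converges" on its third call.
calls = {"count": 0}


def routes_installed() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_for_convergence(routes_installed, timeout_s=1.0, interval_s=0.01))  # True
```

Returning `False` on timeout, rather than raising, lets the caller decide whether to back out the change.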

Many organizations automate network configuration changes with the goal of migrating to a DevOps culture based on infrastructure as code. The objective is to adopt a continuous integration/continuous delivery testing and deployment process for low-risk changes.

5. Documentation and network management updates

Ideally, teams create and update documents during the change creation process, enabling them to review the documentation and network management changes along with details of the change. Once teams have implemented and verified the change, they can incorporate the documentation changes into a network documentation system.

Don't forget to update the network management system as needed. Most network management systems have APIs that enable automated processes to make the changes.

If the change validation step is automated, teams can add it to periodic network validation checks. These periodic checks can detect failures that a highly redundant and resilient network would otherwise mask, such as a failed backup path that carries no traffic until the primary path fails. Over time, teams build a library of network validation checks that cover many parts of the network.
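
A library of reusable checks can be as simple as a registry that a scheduler runs on a timer. This sketch assumes a hypothetical `validation_check` decorator and an in-memory `STATE` dict standing in for polled device data; the example shows how a periodic sweep flags a down backup link that users would not notice while the primary path works.

```python
from typing import Callable

CHECKS: dict[str, Callable[[], bool]] = {}


def validation_check(name: str) -> Callable[[Callable[[], bool]], Callable[[], bool]]:
    """Decorator that registers a validation check under `name`."""
    def register(func: Callable[[], bool]) -> Callable[[], bool]:
        CHECKS[name] = func
        return func
    return register


def run_periodic_checks() -> list[str]:
    """Run every registered check and return the names of those that fail."""
    failed = []
    for name, check in CHECKS.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as failed instead of stopping the sweep
        if not ok:
            failed.append(name)
    return failed


# Hypothetical polled state: the primary path is fine, but the backup link is down.
STATE = {"bgp_peers_up": 4, "bgp_peers_expected": 4, "backup_link_up": False}


@validation_check("bgp-peers")
def bgp_peers() -> bool:
    return STATE["bgp_peers_up"] == STATE["bgp_peers_expected"]


@validation_check("backup-link")
def backup_link() -> bool:
    return STATE["backup_link_up"]


print(run_periodic_checks())  # ['backup-link']
```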

The principles of good network change management provide direction to reduce unplanned network outages due to failed changes. Teams should create a process that works for their organization and work toward making that process highly efficient.

Editor's note: This article was originally written by Terry Slattery and updated by TechTarget editors to improve the reader experience.

Terry Slattery is an independent consultant who specializes in network management and network automation. He founded Netcordia and invented NetMRI, a network analysis appliance that provides visibility into the issues and complexity of modern router- and switch-based IP networks.
