How to use network automation to ease cloud integration

Cloud network automation can ease the integration of networking and cloud resources. Set clear objectives and standardize tools to make the process smoother.

Cloud network automation is a strategic framework that combines cloud computing principles with automated network management techniques. It helps enterprises manage and orchestrate the integration of networking resources with cloud environments.

Enterprises can use automation to connect their network to a cloud provider. Before doing so, however, they should consider how the provider uses network automation in its data center as well as how they plan to consume the provider's services. For example, cloud providers rely on technologies such as SDN and network functions virtualization (NFV) to connect with customers that often use infrastructure as code (IaC) tools to define their networks.

This article examines cloud network automation and the role it plays in orchestrating cloud resources and network components. It also looks at best practices network engineers should employ as they automate networks and how they can overcome potential challenges.

Network automation in cloud provider data centers

SDN and NFV both gained traction in the data center, especially among cloud providers and hyperscalers, due to their ability to enable programmability, virtualization and centralized management.

SDN helps overcome network issues that first arose with the advent of VMs, such as the following:

  • A MAC address table with a maxed-out number of entries.
  • Virtual LANs maxing out at 4,094.
  • Spanning Tree Protocol not reconverging fast enough.
  • The inability to load balance east-west traffic between VMs, especially those used in high-performance computing.
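The 4,094 ceiling on virtual LANs follows from the 12-bit VLAN ID field in the IEEE 802.1Q tag. As a quick check, the arithmetic can be written out as below; the constant names are illustrative only.

```python
# The IEEE 802.1Q tag carries a 12-bit VLAN ID field.
VLAN_ID_BITS = 12

# VLAN IDs 0 and 4095 are reserved, so they cannot be assigned to a VLAN.
RESERVED_VLAN_IDS = {0, 4095}

total_ids = 2 ** VLAN_ID_BITS
usable_vlans = total_ids - len(RESERVED_VLAN_IDS)

print(f"{total_ids} possible IDs, {usable_vlans} usable VLANs")
```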

How SDN enables network automation

APIs are the most effective way to automate the network with SDN. In this implementation, SDN devices at the infrastructure layer expose their configuration attributes through data models written in the YANG modeling language. Protocols such as NETCONF and RESTCONF carry that YANG-modeled data, which lets developers read and set configuration values.

Developers can use Python to send device configuration data to the centralized network controller. The controller accepts that data through its northbound interface and applies it to devices through its southbound interface. The whole process can work through RESTCONF, which carries data over HTTP.
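To make the RESTCONF step more concrete, here is a minimal sketch that builds, but does not send, a request against the standard ietf-interfaces YANG model. The controller hostname, the token header and the function name are assumptions for illustration; a real controller might use a different base path or authentication scheme.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical controller address; replace with a real RESTCONF endpoint.
BASE_URL = "https://controller.example.com/restconf/data"


def build_interface_request(name: str, description: str, token: str) -> Request:
    """Build, but do not send, a RESTCONF PUT for one interface."""
    # Interface names often contain "/", which must be percent-encoded in the path.
    url = f"{BASE_URL}/ietf-interfaces:interfaces/interface={quote(name, safe='')}"
    payload = {
        "ietf-interfaces:interface": {
            "name": name,
            "description": description,
            "type": "iana-if-type:ethernetCsmacd",
            "enabled": True,
        }
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            # RESTCONF (RFC 8040) uses this media type for JSON-encoded YANG data.
            "Content-Type": "application/yang-data+json",
            "Accept": "application/yang-data+json",
            "X-Auth-Token": token,  # assumed auth header; varies by controller
        },
    )


request = build_interface_request("GigabitEthernet0/1", "Uplink to SW2", "my-token")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. In RESTCONF, PUT creates or replaces the target resource, so a PATCH request might be the safer choice when only one attribute should change.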

Traditionally, vendors have offered SDN products that rely on imperative control. Due to concerns about single points of failure, however, most vendors have moved away from this in favor of a declarative approach.

Here's how the two approaches differ:

Figure: An SDN controller connected to three switches.

  1. Imperative control. Users have to set up and micromanage each part of the network. For example, if they want to configure Link Aggregation Control Protocol, they have to define the configuration on the controller.
  2. Declarative control. Users define the end goal of the configuration as a policy, and then the devices connected to the controller apply the configuration within themselves. This approach might not use RESTCONF at the southbound interface of the SDN controller.
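The difference can be sketched in a few lines of Python. The IOS-style commands, the state dictionaries and the `plan_changes` helper are all hypothetical; they only illustrate the idea that a declarative system compares the intended end state with the current one and works out the changes itself.

```python
# Imperative: the operator spells out every step, in order.
imperative_steps = [
    "interface range GigabitEthernet0/1 - 2",
    "channel-group 1 mode active",
    "interface Port-channel1",
    "switchport mode trunk",
]

# Declarative: the operator describes only the intended end state.
desired_state = {
    "Port-channel1": {"members": ["Gi0/1", "Gi0/2"], "lacp_mode": "active"},
}


def plan_changes(desired: dict, actual: dict) -> dict:
    """Return the settings that must change for actual to match desired."""
    changes = {}
    for name, wanted in desired.items():
        current = actual.get(name, {})
        diff = {key: value for key, value in wanted.items() if current.get(key) != value}
        if diff:
            changes[name] = diff
    return changes


# Hypothetical state reported by a device: the bundle exists but lacks one member.
actual_state = {
    "Port-channel1": {"members": ["Gi0/1"], "lacp_mode": "active"},
}

print(plan_changes(desired_state, actual_state))
```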

In the example below, let's use Python to interact with an SDN controller that is connected to three network switches.

The code below calls the SDN controller's REST API endpoint to list all network devices connected to it. It sends a GET request to retrieve information about each of the network devices.

import requests

# API endpoint and headers
api_url = "http://172.16.20.254/api/v1/network-device"
headers = {"X-Auth-Token": "my-token"}

# Make GET request; the timeout keeps the script from hanging on an unreachable controller
response = requests.get(api_url, headers=headers, timeout=10)

# Check status and print network device details
if response.status_code == 200:
    for network_device in response.json()["response"]:
        print(network_device["hostName"], "\t", network_device["managementIP"], "\t", network_device["connectedInterfaceName"])
else:
    print("Error:", response.status_code)

Here is the output displaying information for each switch:

SW1        192.168.1.10       GigabitEthernet0/1
SW2        192.168.1.11       GigabitEthernet0/2
SW3        192.168.1.12       GigabitEthernet0/3

How NFV enables network automation

NFV enables organizations to run several network devices -- known as virtual network functions (VNFs) -- on a single compute server. Among other attributes, NFV lets companies rapidly scale up resources to meet demands as they change.

Popular examples are virtualized routers, such as Juniper Cloud-Native Router and Cisco Catalyst 8000V, which are available on public cloud marketplaces. Enterprises can deploy these routers as VMs on cloud providers, such as AWS or Google Cloud Platform. The routers can connect to an on-premises network via software-defined WAN.

This setup might look like the following:

Figure: A VNF router on a public cloud communicating with a non-VNF router on-premises.

It's also possible to deploy this VNF as a Kubernetes pod. An open source project called KubeVirt enables users to run VMs inside pods.

Most VNF devices support network automation with Python. They use Python in the following ways:

  • Zero-touch provisioning. Python automatically applies a device's initial network configuration. The configuration is saved on a Trivial File Transfer Protocol (TFTP) server, for example. When the device comes up, it uses the Dynamic Host Configuration Protocol to get its IP address, the name of the Python configuration file and the address of the TFTP server. Once it has this information, it retrieves the file and applies the configuration.
  • Automated network management. Python scripts use RESTCONF or NETCONF to connect to devices, retrieve status information, modify configurations and push updates remotely.
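The zero-touch provisioning flow can be sketched as follows. DHCP options 66 (TFTP server name) and 67 (bootfile name) are the standard codes from RFC 2132, although some platforms rely on other options. The offer dictionary, the `ZtpTarget` class and the file name are hypothetical stand-ins for what a device would decode from a real DHCP exchange.

```python
from dataclasses import dataclass

# Standard DHCP option codes (RFC 2132).
OPTION_TFTP_SERVER = 66
OPTION_BOOTFILE = 67


@dataclass
class ZtpTarget:
    device_ip: str
    tftp_server: str
    config_file: str

    @property
    def tftp_url(self) -> str:
        return f"tftp://{self.tftp_server}/{self.config_file}"


def parse_dhcp_offer(offer: dict) -> ZtpTarget:
    """Extract what a booting device needs from a simplified DHCP offer."""
    options = offer["options"]
    missing = [code for code in (OPTION_TFTP_SERVER, OPTION_BOOTFILE) if code not in options]
    if missing:
        raise ValueError(f"DHCP offer is missing options: {missing}")
    return ZtpTarget(
        device_ip=offer["your_ip"],
        tftp_server=options[OPTION_TFTP_SERVER],
        config_file=options[OPTION_BOOTFILE],
    )


# Hypothetical offer, already decoded into a dict for illustration.
offer = {"your_ip": "192.168.1.10", "options": {66: "192.168.1.5", 67: "ztp_setup.py"}}
target = parse_dhcp_offer(offer)
print(target.device_ip, target.tftp_url)
```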

Older network OSes that don't support RESTCONF or NETCONF protocols use the command-line interface (CLI) instead. In these situations, organizations can use Netmiko, an open source Python library, to automate tasks via CLI commands over SSH.
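A minimal sketch of that approach might look like the following. It assumes a Cisco IOS device and the `show ip interface brief` command. The function names and sample output are illustrative, and the Netmiko call is wrapped in its own function so the parsing step can be tried without a live device.

```python
def parse_interface_brief(output: str) -> list[dict]:
    """Turn 'show ip interface brief' text into a list of dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        rows.append({"interface": fields[0], "ip": fields[1], "protocol": fields[-1]})
    return rows


def fetch_interface_brief(host: str, username: str, password: str) -> list[dict]:
    """Run the command over SSH with Netmiko and parse the result."""
    # Imported here so the parser above can be tried without Netmiko installed.
    from netmiko import ConnectHandler

    device = {"device_type": "cisco_ios", "host": host, "username": username, "password": password}
    with ConnectHandler(**device) as connection:
        return parse_interface_brief(connection.send_command("show ip interface brief"))


# Sample output in the usual Cisco IOS layout, used here instead of a live device.
sample = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/1     192.168.1.10    YES manual up                    up
GigabitEthernet0/2     unassigned      YES unset  administratively down down
"""
print(parse_interface_brief(sample))
```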

Network automation on a customer's cloud

On public clouds, customers can create an isolated virtual network to deploy their services. They can perform this process manually via a console or use IaC, if cloud network automation is the goal.

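Popular IaC tools include Terraform, OpenTofu, which is an open source fork of Terraform, and Pulumi. Terraform and OpenTofu use a declarative configuration language, while Pulumi lets users define infrastructure in general-purpose programming languages, such as Python. On AWS, the resulting private virtual network can span multiple data centers, grouped into availability zones within a region, which aids in disaster recovery.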

In the next example, we'll use Terraform to create an isolated virtual network on AWS.

Figure: An isolated virtual network on a public cloud.

The Terraform configuration below, written in HashiCorp Configuration Language, provisions an isolated virtual network on AWS that has access to the internet. It first creates a virtual private cloud (VPC), in which enterprises can launch resources in the public cloud environment.

Screenshot showing the Terraform creation of a VPC

It then provisions a subnet within the VPC and creates an internet gateway that enables resources in the network to access the internet.

Screenshot of the creation of a subnet using Terraform

Finally, the configuration sets up a route table that maps to the subnet and manages network traffic.
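For readers who want something to experiment with, the four steps can be approximated with a short Python script that emits Terraform's JSON configuration syntax. This is not the article's configuration. The resource labels, region and CIDR ranges are assumptions, and the default route is declared as a separate `aws_route` resource to keep the JSON simple.

```python
import json

# All labels, the region and the CIDR ranges below are illustrative assumptions.
config = {
    "provider": {"aws": {"region": "us-east-1"}},
    "resource": {
        # Step 1: the VPC itself.
        "aws_vpc": {"main": {"cidr_block": "10.0.0.0/16"}},
        # Step 2: a subnet inside the VPC.
        "aws_subnet": {
            "public": {"vpc_id": "${aws_vpc.main.id}", "cidr_block": "10.0.1.0/24"}
        },
        # Step 3: an internet gateway attached to the VPC.
        "aws_internet_gateway": {"gw": {"vpc_id": "${aws_vpc.main.id}"}},
        # Step 4: a route table, a default route and the subnet association.
        "aws_route_table": {"public": {"vpc_id": "${aws_vpc.main.id}"}},
        "aws_route": {
            "default": {
                "route_table_id": "${aws_route_table.public.id}",
                "destination_cidr_block": "0.0.0.0/0",
                "gateway_id": "${aws_internet_gateway.gw.id}",
            }
        },
        "aws_route_table_association": {
            "public": {
                "subnet_id": "${aws_subnet.public.id}",
                "route_table_id": "${aws_route_table.public.id}",
            }
        },
    },
}

rendered = json.dumps(config, indent=2)
print(rendered)
```

Saving the printed output as a file ending in `.tf.json`, for example `main.tf.json`, should let Terraform or OpenTofu read it like a regular configuration file.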

Screenshot showing the creation of an internet gateway using Terraform

Screenshot showing the creation of a route table using Terraform

Best practices for implementing network automation

Enterprises can follow established best practices to create a stable, secure and scalable cloud network automation implementation. Below are some helpful steps:

  1. Define goals and strategy. Set clear automation objectives, such as achieving scalability, improving performance or enhancing efficiency. Clarifying goals lets teams start small, making integration easier and more manageable.
  2. Use declarative configurations. Define end goals, not each detail. This is especially helpful in dynamic cloud environments.
  3. Standardize tools and protocols. Stick to a few automation tools and protocols, such as Terraform and NETCONF, to reduce complexity and errors.
  4. Secure automation endpoints. Protect API keys and limit access to automation tools using role-based access control (RBAC), network segmentation and other techniques to prevent unauthorized changes.
  5. Set up monitoring and alerts. Set up real-time monitoring and alerts to catch issues early and track network health.
  6. Test in a sandbox first. Validate automation scripts in a nonproduction environment to avoid live disruptions. For example, create a Terraform workspace to isolate the environment or write a Python unit test using Pytest to validate the code.
  7. Use version control and a continuous integration/continuous delivery pipeline. Use Git to track changes, as it enables quick rollbacks if issues arise. A CI/CD pipeline can streamline deployments, reduce errors and ensure that configurations and scripts are always tested and ready for production.
  8. Add failover and self-healing. Build failover strategies into automation workflows to ensure reliability.
  9. Update documentation. Keep automation scripts and network documentation current to aid troubleshooting.
  10. Train and improve. Encourage teams to take certification exams, as they help keep employees familiar with new technologies.
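As an illustration of the sandbox testing step, the sketch below pairs a small validation helper with Pytest-style tests. The helper name and the CIDR ranges are hypothetical; the point is that a plan can be checked offline before any automation touches a live network.

```python
import ipaddress


def subnets_fit_vpc(vpc_cidr: str, subnet_cidrs: list[str]) -> bool:
    """Return True if every subnet sits inside the VPC and none overlap."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    if not all(subnet.subnet_of(vpc) for subnet in subnets):
        return False
    return not any(
        first.overlaps(second)
        for index, first in enumerate(subnets)
        for second in subnets[index + 1:]
    )


# Pytest collects functions whose names start with "test_".
def test_accepts_valid_plan():
    assert subnets_fit_vpc("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"])


def test_rejects_subnet_outside_vpc():
    assert not subnets_fit_vpc("10.0.0.0/16", ["192.168.1.0/24"])


def test_rejects_overlapping_subnets():
    assert not subnets_fit_vpc("10.0.0.0/16", ["10.0.1.0/24", "10.0.1.128/25"])
```

Running `pytest` against a file that contains this code would execute the three tests.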

Overcoming challenges in cloud network automation

Enterprises will likely face various challenges when implementing cloud network automation. Below are some ways to address those challenges:

  1. Multivendor complexity. Different vendor tools and protocols can complicate automation across environments. Standardize on vendor-neutral protocols, such as NETCONF and RESTCONF, and use tools such as Netmiko and Ansible to manage devices from multiple vendors.
  2. Security risks. Automation endpoints can expose the network to potential security threats. Implement RBAC, encrypt data and enable logging to secure endpoints and monitor access.
  3. Legacy devices. Older devices might lack support for modern APIs, which limits automation. Use CLI-based tools like Netmiko for SSH-based automation on legacy devices and plan to gradually upgrade to API-compatible models.
  4. Configuration drift. Frequent updates can lead to configuration inconsistencies across devices. Use CI/CD pipelines with version control and automated compliance checks to prevent drift.
  5. Scalability issues. As networks grow, automation workflows might struggle to keep up. Design a framework using SDN controllers, favor declarative configurations and actively monitor automation performance.
  6. Limited expertise. Teams sometimes lack the necessary skills to effectively automate the network. Invest in training, focusing on Python, IaC tools and relevant networking protocols.
  7. Troubleshooting and visibility. Automation can sometimes hide what's happening in network operations, making it harder to troubleshoot issues. Use SDN controllers and monitoring tools to enable real-time visibility and detailed logging.
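The automated compliance check mentioned under configuration drift can be as simple as comparing the intended configuration, kept in version control, with what a device is actually running. The sketch below uses Python's standard `difflib` module; the configuration snippets and the `detect_drift` name are made up for illustration.

```python
import difflib


def detect_drift(intended: str, running: str) -> list[str]:
    """Return unified-diff lines describing how running differs from intended."""
    return list(
        difflib.unified_diff(
            intended.splitlines(),
            running.splitlines(),
            fromfile="intended",
            tofile="running",
            lineterm="",
        )
    )


# Hypothetical configurations: someone changed a description by hand.
intended_config = "hostname SW1\ninterface GigabitEthernet0/1\n description uplink\n"
running_config = "hostname SW1\ninterface GigabitEthernet0/1\n description temp-fix\n"

drift = detect_drift(intended_config, running_config)
print("\n".join(drift) if drift else "No drift detected")
```

A CI/CD job could run a check like this on a schedule and fail, or raise an alert, whenever the returned list is not empty.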

Cloud network automation is the key to making cloud integration work smoothly. Cloud services can help organizations boost performance, strengthen security and scale operations. Organizations can streamline their network management processes by using SDN and NFV, along with IaC tools. As cloud technology continues to evolve, adopting automation will be crucial for organizations to stay competitive and drive innovation.

Charles Uneze is a technical writer who specializes in cloud-native networking, Kubernetes and open source.
