Minimize downtime during a data center upgrade
End users don't want their productivity interrupted by IT maintenance. With the right technology, testing and planning, you can reduce or eliminate downtime in the data center.
Hardware and software data center upgrades are an inevitable part of operations, but you can take measures to ensure that they do not result in any data center downtime. The key steps that help you avoid downtime during an impending upgrade are planning, testing and redundancy.
The first step in any upgrade is planning. Part of that is deciding what hardware to buy or confirming that a new software version is compatible with other resources on your network. This stage is equally about figuring out how to perform the upgrade without disruption.
Traditionally, data center upgrade planning meant giving end users advance notice that various systems would be taken offline for maintenance, and then performing the upgrade late at night when nobody was in the office. This approach is far less viable with modern workflows, where users often work remotely and do not limit themselves to business hours. Furthermore, global organizations must support end users who work around the clock across different time zones.
One way to limit the effects of an upgrade is to temporarily move the affected workloads to a public cloud. Once you move those workloads and reroute the traffic, you can begin the upgrade process without worrying about end-user workload disruption.
Testing a data center upgrade
The second step in preparing for an upgrade is to test everything you can beforehand. How much pre-upgrade testing is possible depends on what you are upgrading, but there is usually something you can test.
For instance, if you plan to upgrade a piece of software to a newer version, you might want to work through the upgrade process in a lab environment to get a feel for how it works. Once the lab setup is able to run the new software version, you can test for bugs and compatibility issues.
If you don't have the expertise in-house, be sure to research service providers that can help with configuration and software testing. For larger or specialized upgrades, this helps reduce the amount of troubleshooting you have to do after the upgrade process.
Redundancy
When you hear about redundancy, the discussion usually centers on fault tolerance. Redundancy is just as useful, however, for keeping critical workloads online throughout a data center upgrade. If, for example, you need to replace an aging network switch, you would typically establish a redundant communications path through a secondary switch before performing the replacement. This prevents your workloads from losing connectivity during the upgrade.
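As a rough illustration of the switch example, the sketch below checks whether a server can still reach the rest of the network when one switch is removed. The topology, the device names (`server`, `switch-a`, `switch-b`, `core`) and the function `reachable_without` are all hypothetical; a real check would draw on your own network inventory.

```python
from collections import deque


def reachable_without(links, start, goal, removed):
    """Breadth-first search from start to goal that never passes through `removed`."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for neighbor in links.get(current, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical topology: the server is dual-homed to two switches.
links = {
    "server": ["switch-a", "switch-b"],
    "switch-a": ["server", "core"],
    "switch-b": ["server", "core"],
    "core": ["switch-a", "switch-b"],
}

# True here means switch-a can be pulled without isolating the server.
safe_to_replace = reachable_without(links, "server", "core", removed="switch-a")
print(safe_to_replace)
```

If the same check fails for a single-homed server, that is the cue to set up the secondary path before touching the switch.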
Similarly, Windows servers within a data center are often grouped into failover clusters. Microsoft designed the Windows failover clustering feature to support rolling upgrades: you can upgrade cluster nodes one at a time so that the cluster, minus the node being upgraded, remains online throughout the process. Each node is placed into maintenance mode, taken offline, upgraded, brought back online and then taken out of maintenance mode.
You can then repeat this process with the next node until every node is upgraded. Because only one node is offline at a time, all of the cluster's highly available workloads remain online throughout the upgrade process. Before starting the upgrade, however, you must ensure that the cluster has enough compute, memory and storage capacity to run its workloads with one node absent.
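The node-by-node sequence and the capacity check before it can be sketched as a small simulation. This is an illustrative model only, not Microsoft's tooling: the `Node` class, the function names and the use of memory as the limiting resource are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    memory_gb: int      # capacity this node contributes (assumed limiting resource)
    version: str
    online: bool = True


def can_lose_one_node(nodes, required_memory_gb):
    """Return True if the workloads still fit with the largest node offline."""
    if len(nodes) < 2:
        return False
    total = sum(n.memory_gb for n in nodes)
    largest = max(n.memory_gb for n in nodes)
    return total - largest >= required_memory_gb


def rolling_upgrade(nodes, required_memory_gb, target_version):
    """Upgrade nodes one at a time; return the lowest online capacity observed."""
    if not can_lose_one_node(nodes, required_memory_gb):
        raise RuntimeError("cluster cannot carry its workloads with a node offline")
    min_capacity = sum(n.memory_gb for n in nodes)
    for node in nodes:
        node.online = False                 # maintenance mode, then offline
        capacity = sum(n.memory_gb for n in nodes if n.online)
        min_capacity = min(min_capacity, capacity)
        node.version = target_version       # stand-in for the real upgrade work
        node.online = True                  # back online, out of maintenance mode
    return min_capacity


cluster = [Node("node1", 64, "v1"), Node("node2", 64, "v1"), Node("node3", 64, "v1")]
lowest = rolling_upgrade(cluster, required_memory_gb=100, target_version="v2")
print(lowest)  # 128: two of three 64 GB nodes were always online
```

With only two of these nodes, the same 100 GB workload would fail the check, which is the situation to catch before the first node goes offline.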