Is Windows Server troubleshooting on the way out?

When a Windows Server workload starts to falter, what's the best way to proceed before it keels over? Once upon a time, you would try to fix it, but times have changed.

Whenever an important Windows Server workload breaks, a clock starts to tick on how quickly you can get it working.

There are multiple options to get a server back online, but which is the right one? As is the case with many questions related to IT, the answer is: It depends. Knowing which one to choose is critical, and the decision can be incredibly stressful, because each option takes a different amount of effort and time to complete.

It's inevitable that Windows Server will break at some point, whether from a bad update, an abrupt shutdown or some other cause. You have three distinct choices to correct a defective server, and the preferred one has changed over time. Twenty years ago, correcting issues on the server itself had a lot of value, but Windows Server troubleshooting slowly gave way to restoring from backups. Today, a restore operation is still valid, but there's another shift toward replacement.
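One way to take some stress out of that choice is to write the trade-off down before an outage happens. The Python sketch below is illustrative only: the three option names come from this article, while `RecoveryOption`, `pick_fastest` and the time estimates are hypothetical and would have to come from your own environment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveryOption:
    name: str               # "restore", "fix" or "replace"
    estimated_minutes: int  # hypothetical estimate supplied by the admin
    available: bool         # e.g., is there a usable backup or a deployment routine?


def pick_fastest(options: List[RecoveryOption],
                 deadline_minutes: int) -> Optional[RecoveryOption]:
    """Return the quickest available option that fits the deadline, or None."""
    viable = [
        o for o in options
        if o.available and o.estimated_minutes <= deadline_minutes
    ]
    if not viable:
        return None
    return min(viable, key=lambda o: o.estimated_minutes)
```

For example, if a restore is estimated at 90 minutes, a fix at 30 minutes but nobody knows the root cause yet (so it is marked unavailable), and a replacement at 45 minutes, a 60-minute deadline selects the replacement. The model is deliberately crude; it ignores risks such as a restore bringing back whatever damaged the server.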

Why resolving a server breakdown is different

To see how we got to where we are today, let's look at how we used to fix Windows desktops as an example. Years ago, no matter what the issue was, you went through a troubleshooting exercise to correct it as quickly as possible. This was due to both the expense of the hardware and the loss of data if a desktop needed replacing.

Today, with applications and data living on servers and in cloud storage, and with desktop hardware relatively inexpensive, the concept of fixing a desktop is archaic. No company enjoys losing money on hardware, but the cost of keeping an employee offline is more than the cost of keeping a few spare desktops around to swap in and out.

While this substitution process worked for lower-cost desktops, it was too expensive for server hardware. However, this changed when virtualization gave us a new option for servers and made the repair-or-replace decision fuzzy. Unlike desktops, servers are often backed up, so we can revert to earlier points in time. We also have the traditional option of fixing what is broken, and because of virtualization, we can create a new instance of that server.

Image: Windows blue screen. When a Windows Server workload stops running after an error, there are three options for administrators to get the deployment functioning.

Option 1: Restore the server

One of the earliest and most traditional methods of server repair is restoring what is missing or damaged. A server restore can be ideal for rolling back updates or other changes quickly, provided the server functions and has a working backup agent. This method can be the right choice for minor repairs or missing files. Even if the server is not working, it's possible to restore the server from a full backup to a previous state.

The benefit here is you have less configuration work and changes to make, but on the flip side, the restore process can take some time. Plus, what damaged the server might still be present, so a successful server restore doesn't mean you fixed the original problem.

Option 2: Fix the server

This brings up the second option: Repair the server. For applications or OS pieces that break, you might be able to correct the issue -- if you can find it.

If you run Server Core -- the GUI-free Windows Server OS -- you'll need to be well-versed in the requisite command-line tools to troubleshoot the OS. In a time of crisis, looking up commands on the internet isn't ideal. You should have at least one tools server with a GUI from which you can quickly work on the broken server, as long as that server can still be remotely managed.

To some degree, the modern server is a black box that is designed for replacement rather than repair. Fixing the server comes back to the desktop example where the path of least resistance is either a restore or replacement. This also becomes a business driver, not a technology driver, as there is often a lot of pressure associated with certain applications that must get working as quickly as possible.

The downside to a decline in Windows Server troubleshooting is that it erodes the admin's ability; a skill that goes unused tends to get rusty and outdated. The same goes for the vendors that make the troubleshooting tools. If fixing issues is done less often, then there is little need for these tools, and they begin to disappear from the market. It's a vicious circle that, in the end, means that if a problem can't be fixed quickly, you have only two other options.

Option 3: Replace the server

This brings up the third and most distinctive option: replacement. Replacing a broken server was unheard of until virtualization became mainstream.

Before virtualization, applications followed a monolithic design in which a small group of servers, sized accordingly, handled the application at a single focal point. This meant a single server was more critical, and fixing it or restoring data was the only option.

Virtualization introduced the ability to scale out rather than up. This distributed model avoids the single point of failure and keeps applications online despite the loss of a single server. This approach makes server replacement a more viable option than a restore or a repair effort, especially when paired with an automation routine that can deploy a replacement virtual server with minimal effort or time.
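To make the idea of an automated replacement routine concrete, here is a minimal Python sketch under stated assumptions. The function name `replace_unhealthy` and the two callables it takes are hypothetical; in practice, the health check and the deployment step would wrap whatever monitoring and provisioning tools your virtualization platform provides.

```python
from typing import Callable, List


def replace_unhealthy(pool: List[str],
                      is_healthy: Callable[[str], bool],
                      deploy_replacement: Callable[[str], str]) -> List[str]:
    """Return a new pool in which each unhealthy server is swapped for a fresh one.

    is_healthy and deploy_replacement are placeholders (assumptions) for
    real health checks and a real provisioning routine.
    """
    new_pool = []
    for server in pool:
        if is_healthy(server):
            # Healthy servers stay in the pool untouched.
            new_pool.append(server)
        else:
            # No repair or restore attempt: deploy a new instance instead.
            new_pool.append(deploy_replacement(server))
    return new_pool


if __name__ == "__main__":
    # Stand-in checks for demonstration only; "web02" plays the broken server.
    pool = ["web01", "web02", "web03"]
    print(replace_unhealthy(pool,
                            is_healthy=lambda name: name != "web02",
                            deploy_replacement=lambda name: name + "-new"))
```

The design choice worth noting is that the routine never tries to diagnose the failed server. That only makes sense when the application is scaled out so the remaining servers keep it online while the replacement is deployed.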

Not all applications support swapping out pieces, but this does seem to be the direction servers are heading as the industry continues to embrace containers and other GUI-less server platforms.
