Top three Windows server crashes (and how to avoid them)
Why wait until a server crashes to start the troubleshooting process? Admins can squash potential outages ahead of time by zeroing in on the most likely culprits.
There are many types of Windows server crashes, but the vast majority fall under three main categories: old antivirus software, incompatible storage drivers and too many filter drivers. Having analyzed close to 1,000 crashes over the last decade from all over the world, I can personally confirm that these are pitfalls you’ll want to avoid.
Let’s look at these three common server crashers in detail and break down the best practices you’ll need to avoid them.
Antivirus software
By far, the most frequent type of Windows server crash is caused by old antivirus software. All antivirus software relies on device drivers (more specifically, “filter drivers”) that intercept I/O (read and write) requests and perform additional checks. Among those checks, antivirus drivers compare files against the known virus signatures contained in definition files to ensure infected files are not propagated.
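Here is a toy Python model that makes the idea of I/O interception concrete. It is not real driver code: actual filter drivers are kernel-mode code attached to a device stack. Each "driver" in the model is a plain function. The hypothetical `make_av_filter` wraps the layer below it, fails write requests whose data contains a known byte pattern, and passes everything else down. The signature list stands in for a definition file and is made up for illustration.

```python
from typing import Callable, List

# In this toy model a "driver" is a function that takes a request and returns a status.
Handler = Callable[[dict], str]


def disk_driver(request: dict) -> str:
    """Lowest layer: pretend the device completed the I/O."""
    return "completed"


def make_av_filter(lower: Handler, signatures: List[bytes]) -> Handler:
    """Wrap a lower driver with a simplified antivirus check.

    Write requests whose data contains any known signature are failed here;
    everything else is passed down unchanged, as a filter driver would.
    """
    def av_filter(request: dict) -> str:
        if request["op"] == "write" and any(sig in request["data"] for sig in signatures):
            return "blocked"      # complete the request without calling the lower driver
        return lower(request)     # forward to the next driver in the stack
    return av_filter


# Hypothetical signature list standing in for a definition file.
stack = make_av_filter(disk_driver, signatures=[b"BAD-PATTERN"])
print(stack({"op": "read"}))                               # completed
print(stack({"op": "write", "data": b"hello"}))            # completed
print(stack({"op": "write", "data": b"xxBAD-PATTERNxx"}))  # blocked
```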
Filter drivers consist of kernel-mode code that interacts with the operating system through low-level kernel functions and data structures. These functions expect a predefined set of arguments and data types when they are called by device drivers, and the data structures have a predefined layout. If a function is passed the wrong data type or an incorrect number of arguments, an error can occur that may crash the system in kernel mode.
The problem arises when developers change these kernel functions or data structures between versions of the operating system, such as service pack updates or major OS releases. Microsoft does a great job of testing its own device drivers for compatibility with OS changes. It does not, however, test third-party device drivers to ensure they are compatible. So when an old antivirus driver runs into one of these changes, the result can be a system crash. Other filter drivers are also susceptible to these issues, but antivirus drivers are the top offenders.
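The following toy illustration shows why a structure change can break an old driver. It assumes a hypothetical structure whose layout is invented for this sketch and is not a real Windows structure. The updated OS inserts a `reserved` field, but the old driver still reads `size` at its original offset, so it silently picks up the wrong value. In kernel mode, a value like that can end up being used as a length or a pointer, which is the kind of mistake that leads to a bugcheck.

```python
import struct

# Hypothetical structure layouts (little-endian, 32-bit fields), for illustration only.
OLD_LAYOUT = "<II"    # flags, size: what the old driver was compiled against
NEW_LAYOUT = "<III"   # flags, reserved, size: after an OS update adds a field


def os_fills_structure(flags: int, size: int) -> bytes:
    """The updated OS writes the structure using the NEW layout."""
    return struct.pack(NEW_LAYOUT, flags, 0, size)


def old_driver_reads_size(buf: bytes) -> int:
    """The old driver still reads 'size' at its OLD offset (bytes 4-7)."""
    _flags, size = struct.unpack_from(OLD_LAYOUT, buf, 0)
    return size


def new_driver_reads_size(buf: bytes) -> int:
    """A driver rebuilt against the new layout finds 'size' at bytes 8-11."""
    _flags, _reserved, size = struct.unpack_from(NEW_LAYOUT, buf, 0)
    return size


buf = os_fills_structure(flags=1, size=4096)
print(new_driver_reads_size(buf))   # 4096
print(old_driver_reads_size(buf))   # 0: the old driver reads the new 'reserved' field instead
```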
Let’s take a look at an example.
The following crash was a Stop 0x8E bugcheck (KERNEL_MODE_EXCEPTION_NOT_HANDLED). The !analyze -v command in the Windows debugger reveals the stack pattern below. Reading the stack from the bottom up, we see an NtCreateFile call that ultimately passes through the buggydrv filter driver, which caused the bugcheck. (The driver name has been changed so as not to incriminate the culprit.) The !lmi buggydrv command shows that the driver dates from 2006, while the operating system, Windows Server 2003 SP2, wasn’t released until 2007. This strongly suggests that the old antivirus driver was never tested against the new version of the operating system.
nt!KeBugCheckEx+0x1b
nt!KiDispatchException+0x3a2
nt!CommonDispatchException+0x4a
nt!Kei386EoiHelper+0x186
buggydrv+0x13059 <-- filter driver that caused the crash
buggydrv+0x8390
buggydrv+0x8809
buggydrv+0x2940
nt!IofCallDriver+0x45
nt!IopParseDevice+0xa35
nt!ObpLookupObjectName+0x5b0
nt!ObOpenObjectByName+0xea
nt!IopCreateFile+0x447
nt!IoCreateFile+0xa3
nt!NtCreateFile+0x30 <-- operating system call to CreateFile
nt!KiFastCallEntry+0xfc
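If you triage many dumps, the "find the first non-Microsoft module" step can be sketched in Python. This is a simplistic heuristic and not what !analyze -v actually does. It assumes frames formatted like the trace above, with address columns already stripped. The `OS_MODULES` set is also an assumption, which you would extend with the other Microsoft modules in your environment.

```python
from typing import Iterable, Optional

# Modules this sketch treats as part of the operating system (an assumption; extend as needed).
OS_MODULES = {"nt", "hal"}


def module_of(frame: str) -> str:
    """Return the module name from a frame such as 'nt!NtCreateFile+0x30' or 'buggydrv+0x8390'."""
    symbol = frame.split()[0]                      # drop annotations such as '<-- ...'
    return symbol.split("!", 1)[0].split("+", 1)[0].lower()


def first_third_party_frame(stack_text: str, os_modules: Iterable[str] = OS_MODULES) -> Optional[str]:
    """Scan from the top of the stack (most recent call) and return the first non-OS module."""
    known = {m.lower() for m in os_modules}
    for line in stack_text.splitlines():
        if line.strip() and module_of(line) not in known:
            return module_of(line)
    return None
```

Pasting the trace above into `first_third_party_frame` returns `buggydrv`, the first frame beneath the kernel's exception-handling routines.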
In this case, the crash had already been documented as a known issue by the vendor and a new version of the antivirus software was available to address it. In fact, the vast majority of Windows server crashes you will encounter have already been experienced by someone else and their resolutions have typically been documented somewhere on the Internet. As a result, it’s important to remember that any time you update your operating system -- even with a service pack update -- you should be sure to first check with your third-party vendors for updates to their software.
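Part of that check can be automated. The driver date seen with !lmi typically corresponds to the link timestamp stored in the driver's PE file header. The sketch below reads that field directly from a driver file, so you can flag drivers built before the OS release or service pack you plan to install. It assumes a well-formed PE image. Some newer toolchains store a build hash rather than a date in this field, so treat the result as a hint rather than proof. The function names are illustrative.

```python
import struct
from datetime import datetime, timezone


def pe_link_timestamp(image: bytes) -> datetime:
    """Read TimeDateStamp from the COFF file header of a PE image (.sys, .dll, .exe)."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    pe_offset, = struct.unpack_from("<I", image, 0x3C)        # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The COFF header starts after the 4-byte signature; TimeDateStamp is its third field (offset 4).
    stamp, = struct.unpack_from("<I", image, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


def built_before(path: str, os_release: datetime) -> bool:
    """True if the driver at `path` was linked before the given (timezone-aware) release date."""
    with open(path, "rb") as f:
        return pe_link_timestamp(f.read()) < os_release
```

For example, `built_before(path, datetime(2007, 1, 1, tzinfo=timezone.utc))` would flag a driver linked in 2006, like buggydrv above. Substitute the actual release date of the update you are applying.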
Incompatible storage drivers
The next most frequent type of server crash is caused by incompatible storage drivers. Third-party storage vendors provide device drivers that control their host bus adapters (HBAs), which are used to access storage devices. Vendors like QLogic, Emulex and Hewlett-Packard (HP) ship different device drivers, but they all depend on a Microsoft driver called Storport. The Storport driver provides a common set of routines that these vendor-specific (miniport) drivers use when performing I/O operations.
The problem comes about in much the same way antivirus driver incompatibility does. When vendor-specific drivers are modified they must be retested with the current version of Storport to ensure they are still compatible. The same is true when Storport is updated -- all the HBA drivers must be retested to ensure they still function with the new Storport driver. This can be a real challenge when you consider that Storport had over 50 hotfixes in Windows Server 2003.
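The "retest in both directions" rule boils down to a reverse-dependency lookup. Suppose you have a map from each driver to the modules it imports, for example as reported by a dependency tool. The sketch below uses that map to list the drivers whose vendors you would want to consult before updating a shared module such as Storport. The function names are illustrative, and `examplehba.sys` is a hypothetical driver. The Hpcisss2.sys entry reflects the dependencies discussed in this article.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reverse_dependencies(imports: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert 'driver -> modules it imports' into 'module -> drivers that import it'."""
    users = defaultdict(set)
    for driver, modules in imports.items():
        for module in modules:
            users[module.lower()].add(driver.lower())   # Windows file names are case-insensitive
    return {module: sorted(drivers) for module, drivers in users.items()}


def drivers_to_check(imports: Dict[str, Iterable[str]], updated_module: str) -> List[str]:
    """Drivers whose vendors should confirm compatibility before `updated_module` is updated."""
    return reverse_dependencies(imports).get(updated_module.lower(), [])


imports = {
    "Hpcisss2.sys": ["STORPORT.SYS", "NTOSKRNL.EXE"],
    "examplehba.sys": ["STORPORT.SYS", "NTOSKRNL.EXE"],   # hypothetical second HBA driver
}
print(drivers_to_check(imports, "storport.sys"))   # ['examplehba.sys', 'hpcisss2.sys']
```

The "vice versa" direction needs no inversion. Before updating an HBA driver, its own import list already names the shared modules (here STORPORT.SYS) whose current versions the vendor should have tested against.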
The rule of thumb is to check with third-party vendors for any HBA driver updates before updating Storport, and vice versa. How do you know which storage drivers depend on Storport? Fortunately, a free tool called Dependency Walker (depends.exe) reveals which other modules a given driver depends on.
Once you’ve downloaded and unzipped the tool, run depends.exe and use the File menu to open the driver you are concerned about. In this example, I chose the Hpcisss2.sys driver, which is used for the HP Smart Array. The tool reveals that the Hpcisss2 driver depends on both STORPORT.SYS and NTOSKRNL.EXE.
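You may want to script this check across many drivers. The top level of what Dependency Walker displays (a module's direct imports) can be read straight from the PE import directory. The Python sketch below follows the published PE/COFF layout. It is a minimal illustration and not a replacement for the tool: it ignores delay-load imports and does not walk the dependency tree recursively. The names `imported_modules` and `depends_on` are made up for this example.

```python
import struct
from typing import List, Tuple

Section = Tuple[int, int, int, int]  # (virtual_address, virtual_size, raw_pointer, raw_size)


def _read_sections(image: bytes, table_offset: int, count: int) -> List[Section]:
    sections = []
    for i in range(count):
        # 40-byte section header: Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData, ...
        vsize, va, raw_size, raw_ptr = struct.unpack_from("<IIII", image, table_offset + 40 * i + 8)
        sections.append((va, vsize, raw_ptr, raw_size))
    return sections


def _rva_to_offset(rva: int, sections: List[Section]) -> int:
    """Translate a relative virtual address into a file offset."""
    for va, vsize, raw_ptr, raw_size in sections:
        if va <= rva < va + max(vsize, raw_size):
            return raw_ptr + (rva - va)
    raise ValueError(f"RVA {rva:#x} is not inside any section")


def imported_modules(image: bytes) -> List[str]:
    """Return the module names listed in a PE image's import directory."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    pe, = struct.unpack_from("<I", image, 0x3C)
    if image[pe:pe + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe + 4
    n_sections, = struct.unpack_from("<H", image, coff + 2)
    opt_size, = struct.unpack_from("<H", image, coff + 16)
    opt = coff + 20
    magic, = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:        # PE32 (32-bit image)
        dirs = opt + 96
    elif magic == 0x20B:      # PE32+ (64-bit image)
        dirs = opt + 112
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    n_dirs, = struct.unpack_from("<I", image, dirs - 4)          # NumberOfRvaAndSizes
    if n_dirs < 2:
        return []
    import_rva, _ = struct.unpack_from("<II", image, dirs + 8)   # data directory 1 = import table
    if import_rva == 0:
        return []
    sections = _read_sections(image, opt + opt_size, n_sections)
    names = []
    desc = _rva_to_offset(import_rva, sections)
    while True:
        # IMAGE_IMPORT_DESCRIPTOR is five DWORDs; the fourth is the RVA of the module name.
        fields = struct.unpack_from("<5I", image, desc)
        if not any(fields):   # an all-zero descriptor terminates the table
            break
        name_at = _rva_to_offset(fields[3], sections)
        names.append(image[name_at:image.index(b"\0", name_at)].decode("ascii"))
        desc += 20
    return names


def depends_on(path: str, module: str) -> bool:
    """True if the driver at `path` directly imports `module` (case-insensitive)."""
    with open(path, "rb") as f:
        return module.lower() in (m.lower() for m in imported_modules(f.read()))
```

One way to use it is to loop over the .sys files in %SystemRoot%\System32\drivers (for example with Python's `glob`) and keep those for which `depends_on(path, "storport.sys")` is true. That gives a rough list of drivers to review before a Storport hotfix.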
Too many filter drivers
The third most frequent type of Windows server crash is related to stack overflow conditions when too many filter drivers are installed. Any driver that intercepts I/O requests and performs additional functionality is considered a filter driver. We already know that antivirus drivers are considered filter drivers. Others include disk quota management, disk mirroring and backup agents, to name a few.
While having multiple filter drivers installed is not a problem in and of itself, issues can arise when these drivers interact with each other recursively and deplete the limited kernel stack space. Each thread has a fixed-size kernel stack (12 KB on x86 and 24 KB on x64) that every driver handling a request on that thread must share. When that stack is exhausted, a Stop 0x7F bugcheck occurs and the system crashes. This bugcheck is UNEXPECTED_KERNEL_MODE_TRAP, typically with a first parameter of 0x8, indicating a double fault, and it is documented in hundreds of Microsoft articles.
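A deliberately simple model shows the arithmetic behind a stack overflow. The per-driver byte counts below are invented for illustration and are not measurements. The model also assumes every pass through the driver stack uses the same amount of stack, whereas real usage varies by driver and code path. Even so, it shows how a chain that fits comfortably on one pass can overflow once drivers start re-entering each other.

```python
from typing import Dict, List

# Per-thread kernel stack sizes quoted above.
KERNEL_STACK_BYTES: Dict[str, int] = {"x86": 12 * 1024, "x64": 24 * 1024}


def stack_needed(frame_bytes: List[int], reentries: int) -> int:
    """Bytes consumed when one request walks the whole driver stack and then
    re-enters it `reentries` more times before any frame returns
    (for example, a filter issuing new I/O from inside its own dispatch routine)."""
    return sum(frame_bytes) * (1 + reentries)


def overflows(frame_bytes: List[int], reentries: int, arch: str) -> bool:
    return stack_needed(frame_bytes, reentries) > KERNEL_STACK_BYTES[arch]


# Hypothetical per-pass usage: I/O manager, file system and four filter drivers.
frames = [1000, 2000, 1500, 1500, 1500, 1500]   # 9,000 bytes per pass
print(overflows(frames, 0, "x86"))   # False: 9,000 <= 12,288
print(overflows(frames, 1, "x86"))   # True: 18,000 > 12,288
print(overflows(frames, 1, "x64"))   # False: 18,000 <= 24,576
print(overflows(frames, 2, "x64"))   # True: 27,000 > 24,576
```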
There is no way to provide additional kernel stack space to accommodate multiple filter drivers. The only option is to identify these filter drivers and disable or uninstall those that aren’t required. A tool called FLTMC (Filter Manager Control program) is built into the Windows Server operating system and allows you to identify which filter drivers are installed.
Figure 2. FLTMC tool
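FLTMC's `filters` subcommand prints a table of the loaded filters, and a short Python wrapper can collect that list for reporting across many servers. This sketch assumes a particular table layout: a header row, a row of dashes, then the filter name, instance count and altitude in the first three columns. Check it against the output on your own systems before relying on it. Running FLTMC typically requires an elevated prompt. The function names are illustrative.

```python
import subprocess
from typing import Dict, List


def parse_fltmc_filters(output: str) -> List[Dict[str, str]]:
    """Parse the table printed by `fltmc filters`.

    Assumes a header row, a row of dashes, then one filter per line with the
    filter name in the first column followed by instance count and altitude.
    """
    rows: List[Dict[str, str]] = []
    past_separator = False
    for line in output.splitlines():
        if not past_separator:
            past_separator = line.lstrip().startswith("---")
            continue
        fields = line.split()
        if not fields:
            continue
        row = {"name": fields[0]}
        if len(fields) > 1:
            row["instances"] = fields[1]
        if len(fields) > 2:
            row["altitude"] = fields[2]
        rows.append(row)
    return rows


def installed_filters() -> List[Dict[str, str]]:
    """Run fltmc (Windows only, usually from an elevated prompt) and parse its output."""
    result = subprocess.run(["fltmc", "filters"], capture_output=True, text=True, check=True)
    return parse_fltmc_filters(result.stdout)
```

Comparing the returned names against the filters you expect (antivirus, backup, quota and so on) is a quick way to spot candidates to disable or uninstall.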
As you can see, Windows Server can crash for a number of reasons. The vast majority of these server outages, however, are caused by the issues discussed above. Each of them can be avoided by updating your third-party drivers any time you upgrade the Windows OS or apply related hotfixes, and by removing filter drivers you don’t need.
You can follow SearchWindowsServer.com on Twitter @WindowsTT.
ABOUT THE AUTHOR:
Bruce Mackenzie-Low, MCSE/MCSA, is a master consultant and systems software engineer with Hewlett-Packard, providing third-level worldwide support on Microsoft Windows-based products. With over 20 years of computing experience, Bruce is a well-known resource for resolving highly complex problems involving clusters, SANs, networking and internals. He has taught extensively throughout his career, leaving his audiences energized with his enthusiasm for technology.