How to troubleshoot unexpected ESXi host reboots using crash logs
When your ESXi host abruptly reboots, you can use logs to track the potential causes, whether they are planned, environmental or hardware-related.
Sometimes an ESXi host reboots unexpectedly. This often happens during a power outage that outlasts the uninterruptible power supply (UPS), but it can also result from a crash that produces a core dump or from faulty hardware.
This can cause your ESXi host log to end abruptly and then restart. If you were off site during the failure and reboot process, check for UPS failure when you return, then look to your environment for clues.
Troubleshooting is harder when ESXi host logs are inconsistent across reboots. If you redirect your ESXi host logs to a shared datastore or an external software application, such as VMware vRealize Log Insight, you can avoid such trouble.
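A log that ends abruptly and then restarts usually shows up as a long silence between two consecutive entries. The following is a minimal Python sketch for finding that silence. It assumes each log entry begins with an ISO 8601 timestamp, which might not match your ESXi version, and the function name largest_gap is hypothetical.

```python
import re
from datetime import datetime

# Assumption: entries start with a timestamp shaped like YYYY-MM-DDTHH:MM:SS.
# Adjust the pattern if your log files use a different layout.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def largest_gap(lines):
    """Return (seconds, before, after) for the longest silence between
    consecutive timestamped lines, or None if fewer than two are found."""
    previous = None
    best = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines that carry no leading timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if previous is not None:
            seconds = (current - previous).total_seconds()
            if best is None or seconds > best[0]:
                best = (seconds, previous, current)
        previous = current
    return best
```

Pass the function an open log file or a list of lines. The two timestamps it returns bracket the period in which the host was most likely down.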
Search for redirected log files
You can check for redirected logs in your web browser by connecting to either the vSphere Web Client or the Host Client for unmanaged hosts.
Select your host. Under the Configure tab, select System, and then Advanced System Settings. The Syslog.global.logDir setting shows the location where the host writes its logs.
Once you determine whether your ESXi host logs were redirected, you can check whether the host restarted intentionally. Look in the /var/log/hostd.log file. Entries such as the following indicate a deliberate reboot:
Hostd: [12:51:54.284 27D13B90 info 'TaskManager'] Task Created : haTask-ha-host-vim.HostSystem.reboot-50
or
DCUI: reboot
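If the log file is large, a short script can search for these entries. The following is a minimal Python sketch. The two patterns are taken only from the sample entries above, so treat them as assumptions that might need adjusting for your ESXi version, and the function name find_deliberate_reboots is hypothetical.

```python
import re

# Assumption: these patterns mirror the two sample entries shown above.
# Other ESXi versions might word the entries differently.
DELIBERATE_REBOOT_PATTERNS = [
    re.compile(r"vim\.HostSystem\.reboot"),
    re.compile(r"DCUI:\s*reboot"),
]


def find_deliberate_reboots(lines):
    """Return (line_number, text) pairs for lines that suggest an intentional reboot."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in DELIBERATE_REBOOT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open a copy of hostd.log and pass the file object to the function. An empty result means no deliberate reboot was recorded in that file, so the next step is to look for a core dump or a hardware cause.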
Was there a core dump?
An ESXi host can sometimes generate a core dump when it crashes. You can check if you have the required partition available for a core dump through the Direct Console User Interface, either at the console in the server room or via the Intelligent Platform Management Interface.
You can also check by using a Secure Shell (SSH) client, such as PuTTY, to connect remotely to vCenter or your ESXi host. You must first configure SSH access to your ESXi host.
To list partitions available for core dump, enter the esxcfg-dumppart -l command at the command prompt.
To see the options for activating or deactivating core dump partitions, enter the esxcfg-dumppart -h command.
ESXi hosts don't automatically collect core dumps. To collect the core dump, you must manually run the esxcfg-dumppart command with an option that works for your environment.
Check ESXi automatic reboot configuration
Execute this command to check if ESXi is configured to automatically reboot after a Purple Screen of Death (PSOD):
esxcfg-advcfg -g /Misc/BlueScreenTimeout
If the value listed is anything other than 0, then ESXi automatically reboots after the PSOD. If the output is 0, the system is configured to wait for you to manually restart the host.
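If you check many hosts, you can apply the same rule to the saved command output with a script. The following is a minimal Python sketch. It assumes the value is the last integer in the text the command prints, and the function name psod_reboot_policy is hypothetical.

```python
import re


def psod_reboot_policy(output):
    """Interpret the text printed by esxcfg-advcfg -g /Misc/BlueScreenTimeout."""
    # Assumption: the configured value is the last integer in the output.
    numbers = re.findall(r"-?\d+", output)
    if not numbers:
        raise ValueError(f"no numeric value found in: {output!r}")
    timeout = int(numbers[-1])
    if timeout == 0:
        return "waits for a manual restart after a PSOD"
    return f"reboots automatically after a PSOD (timeout value {timeout})"
```

The function raises an error rather than guessing when it finds no number, so an unexpected output format is not mistaken for a valid setting.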
Power outages or faulty hardware
If your ESXi host experiences an outage as a result of something other than a kernel error, a user-initiated reboot or an intentional shutdown, the hardware might have caused it. Hardware-related causes of an unexpected ESXi host reboot include a faulty component, overheating -- such as after an air conditioning failure -- or a power outage in the data center.
If you work in a location where the power often fails, you might consider investing in UPS protection, a generator or solar-powered battery backup in case of long-term power failures.