Tip

How to troubleshoot unexpected ESXi host reboots using crash logs

When your ESXi host abruptly reboots, you can use logs to track the potential causes, whether they are planned, environmental or hardware-related.

An ESXi host sometimes reboots unexpectedly. This often happens during a power outage if the uninterruptible power supply (UPS) doesn't last, but it can also result from a core dump or faulty hardware.

This can cause your ESXi host log to end abruptly and then restart. If you were off site during the failure and reboot process, check for UPS failure when you return, then look to your environment for clues.

Troubleshooting is harder when the ESXi host logs are inconsistent across reboots. If you redirect your ESXi host logs to a shared datastore or an external software application, such as VMware vRealize Log Insight, you can avoid such trouble.

Search for redirected log files

You can check for redirected logs in your web browser by connecting to either the vSphere Web Client or the Host Client for unmanaged hosts.

Select your host. Under the Configure tab, select System, and then Advanced System Settings. Look for the Syslog.global.logHost field.

Syslog.global.logHost field
Figure A. Use the Advanced System Settings menu to check if your ESXi logs were redirected.
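As a rough illustration of how to read the value you find there, the following Python sketch splits a Syslog.global.logHost value into its remote targets. It assumes the field holds a comma-separated list of targets and that an empty value means no redirection; the function name and the sample addresses are hypothetical.

```python
from typing import List


def remote_log_targets(log_host_value: str) -> List[str]:
    """Split a Syslog.global.logHost value into individual remote targets.

    Assumption: multiple targets are separated by commas, and an empty
    value means the host is not redirecting its logs.
    """
    return [target.strip() for target in log_host_value.split(",") if target.strip()]


# Hypothetical sample value copied from the Advanced System Settings page.
targets = remote_log_targets("udp://192.0.2.10:514, tcp://loghost.example.com:1514")
if targets:
    print("Logs are redirected to:", ", ".join(targets))
else:
    print("Logs are not redirected.")
```

An empty list suggests the logs were only written locally, so they might not cover the period before the reboot.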

Once you determine whether your ESXi host logs were redirected, you can check whether the host restarted intentionally. Look in the /var/log/hostd.log file. Certain entries indicate a deliberate reboot, such as the following:

Hostd: [12:51:54.284 27D13B90 info 'TaskManager'] Task Created : haTask-ha-host-vim.HostSystem.reboot-50

or

DCUI: reboot
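If the log is large, a short script can do the searching. This is a minimal Python sketch that keeps only the lines containing one of the two markers shown above. The marker strings come from those example entries, and other ESXi versions might word them differently; the function name is hypothetical.

```python
from typing import Iterable, List

# Markers taken from the two example entries above. Treat them as an
# assumption: other ESXi versions may log deliberate reboots differently.
REBOOT_MARKERS = ("vim.HostSystem.reboot", "DCUI: reboot")


def find_deliberate_reboots(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain a deliberate-reboot marker."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in REBOOT_MARKERS)
    ]


# Small in-memory sample; the first and last lines match a marker.
sample_log = [
    "Hostd: [12:51:54.284 27D13B90 info 'TaskManager'] "
    "Task Created : haTask-ha-host-vim.HostSystem.reboot-50\n",
    "an unrelated log line\n",
    "DCUI: reboot\n",
]
for hit in find_deliberate_reboots(sample_log):
    print(hit)
```

To scan a real log, pass an open file object instead of the sample list, for example the result of open("/var/log/hostd.log", errors="replace"). If nothing matches, the reboot was probably not requested through these paths.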

Was there a core dump?

An ESXi host can generate a core dump when its kernel fails. You can check if you have the required partition available for a core dump through the Direct Console User Interface (DCUI), either at the console in the server room or via the Intelligent Platform Management Interface (IPMI).

You can also check by using a Secure Shell (SSH) client, such as PuTTY, to connect remotely to vCenter or your ESXi host. You must first configure SSH access to your ESXi host.

To list partitions available for core dump, enter the esxcfg-dumppart -l command at the command prompt.

PuTTY SSH client
Figure B. Access a list of ESXi dump partitions by entering a command into an SSH client, such as PuTTY.

To see the options for activating or deactivating core dump partitions, enter the esxcfg-dumppart -h command.

ESXi hosts don't automatically collect core dumps. To collect the core dump, you must manually run the esxcfg-dumppart command with an option that works for your environment.

Check ESXi automatic reboot configuration

Execute this command to check if ESXi is configured to automatically reboot after a Purple Screen of Death (PSOD):

esxcfg-advcfg -g /Misc/BlueScreenTimeout

If the value listed is anything other than 0, then ESXi automatically reboots after the PSOD. If the output is 0, the system is configured to wait for you to manually restart the host.
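The rule above can be expressed as a small Python sketch. It assumes the command prints a single line that ends with the integer value, such as "Value of BlueScreenTimeout is 0"; that output format and the function name are assumptions, so adjust the parsing if your host prints something different.

```python
import re


def describe_psod_timeout(command_output: str) -> str:
    """Interpret the output of esxcfg-advcfg -g /Misc/BlueScreenTimeout.

    Assumption: the output ends with the integer value of the setting.
    """
    match = re.search(r"(-?\d+)\s*$", command_output.strip())
    if match is None:
        raise ValueError("no trailing integer found in: %r" % command_output)
    timeout = int(match.group(1))
    if timeout == 0:
        return "host waits for a manual restart after a PSOD"
    return "host reboots automatically after a PSOD (BlueScreenTimeout=%d)" % timeout


# Assumed sample output, pasted from an SSH session.
print(describe_psod_timeout("Value of BlueScreenTimeout is 0"))
```

A nonzero result means a PSOD could explain a reboot that nobody requested, which makes the core dump worth examining.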

Power outages or faulty hardware

If your ESXi host experiences an outage as a result of something other than a kernel error, a human reboot or an intentional shutdown, the hardware might have caused it. Hardware sometimes causes an ESXi host to reboot unexpectedly due to a faulty component, a heating problem -- such as an air conditioning failure -- or a power outage in the data center.

If you work in a location where the power often fails, you might consider investing in UPS protection, a generator or solar-powered battery backup in case of long-term power failures.
