I always love learning new things to sharpen my skills. One of those skills is mindmapping. Apparently VMware shares that point of view and has created various mindmaps for troubleshooting common issues.
Each mindmap starts with a central theme, for example Troubleshoot Network Issues. You can then drill into the area where you have a problem by expanding it (hit the +). This reveals more specific sub-areas within your selected problem area. Eventually you end up at a set of KB articles that may solve your problem.
For examples, have a look at the following articles:
VMware released an excellent whitepaper on troubleshooting performance problems in vSphere 4.1. It really is a great resource and starting point for anyone who has performance issues in their vSphere infrastructure.
The steps discussed in the document use performance data and charts readily available in the vSphere Client and esxtop to aid the troubleshooting flows. Each performance troubleshooting flow has two parts:
1. How to identify the problem using specific performance counters.
2. Possible causes of the problem and solutions to solve it.
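To feed such a flow with data, esxtop can be run in batch mode and the output analysed afterwards. The interval and iteration values below are just example choices of mine, not from the whitepaper:

```shell
# Run esxtop in batch mode on the ESXi host (requires Tech Support Mode / SSH).
# -b = batch mode, -d = delay between snapshots in seconds, -n = number of iterations.
# This captures 5 minutes of data (30 snapshots, 10 seconds apart) to a CSV file.
esxtop -b -d 10 -n 30 > /tmp/esxtop-capture.csv
```

The resulting CSV contains all counters; the whitepaper's flows tell you which specific counters to look at for each problem area.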
A quote from the introduction of the Performance Troubleshooting for vSphere 4.1 whitepaper:
Performance problems can arise in any computing environment. Complex application behaviors, changing demands, and shared infrastructure can lead to problems arising in previously stable environments. Troubleshooting performance problems requires an understanding of the interactions between the software and hardware components of a computing environment. Moving to a virtualized computing environment adds new software layers and new types of interactions that must be considered when troubleshooting performance problems.
Proper performance troubleshooting requires starting with a broad view of the computing environment and systematically narrowing the scope of the investigation as possible sources of problems are eliminated. Troubleshooting efforts that start with a narrowly conceived idea of the source of a problem often get bogged down in detailed analysis of one component, when the actual source of the problem is elsewhere in the infrastructure. In order to quickly isolate the source of performance problems, it is necessary to adhere to a logical troubleshooting methodology that avoids preconceptions about the source of the problems.
The document can be found here. Source: this post on the VMware VROOOM! Blog.
Install your new ESXi with your brand new installation process. Check! Verify that all your custom settings for ESXi are correct. Check! Install your vCenter server. Check! Configure vCenter and create a cluster. Check! Add ESXi host to vCenter. ERROR! *argh*
Troubleshooting. Always fun. You learn new stuff by exploring what you are doing wrong.
But OK. Then what went wrong?
Adding a host to vCenter
When you try to add your ESXi host to your vCenter server, the vCenter agent (vpxa) is installed on the ESXi host. This locally installed agent is what the vCenter server uses to communicate with the ESXi host.
During the installation of the vCenter agent the following steps are taken:
1. Upload vCenter agent to the ESXi host.
2. Install vCenter agent on the ESXi host and start the daemon.
3. Verify that the vCenter agent is running and vCenter is able to communicate with it.
4. Retrieve the host configuration and set configuration settings (if necessary).
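As an aside, you can check step 3 by hand on the ESXi host itself. This assumes you have Tech Support Mode access, and the init script path is as found on my ESXi 4.1 host, so treat it as a sketch:

```shell
# Check whether the vpxa daemon (the vCenter agent) is actually running.
ps | grep vpxa

# Or ask its init script directly.
/etc/init.d/vpxa status
```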
Everything went OK until step 3. At that point the process stalled and eventually vCenter came back with the following error message:
OK, so whenever I have a problem with vSphere I turn to my good old friend Google to solve all my problems. Because, face it, the events in vCenter / ESXi aren't always that clear about what's going on. They give you a starting point; from there on it's either Google or a deep dive into the log files of vCenter or ESXi.
But Google returned no results that satisfied my requirements. Well then, it's off to the command line to view some logs.
But first, enable Remote Tech Support Mode so you can log in via SSH. Read about it over here.
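If you prefer the command line for this as well, Remote Tech Support Mode can also be toggled with vim-cmd from the local Tech Support Mode console. These are the commands as I know them on ESXi 4.1; consider them a sketch rather than gospel:

```shell
# Enable and start Remote Tech Support Mode (SSH) on ESXi 4.1.
vim-cmd hostsvc/enable_remote_tsm
vim-cmd hostsvc/start_remote_tsm
```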
The VMware logs on the ESXi host are located in /var/log/vmware.
There you'll find the log file for the vpxa daemon installation, vpx-iupgrade.log. View the last lines (and follow new ones as they arrive) with the following command: tail -f vpx-iupgrade.log
This shows that starting the vpxa daemon fails after the installation has completed, which matches what was shown during the installation through vCenter: the daemon was installed, but could not be started.
Now turn to vpxa.log in the /var/log/vmware/vpxa directory to find out what the problem is. There, the following error is shown:
Bingo! So now we know there is something wrong with the custom certificate file that was uploaded during the installation of ESXi.
The problem is in fact with the custom SSL files that ESXi and vCenter use to communicate with one another securely. For more information see my previous post here. My custom SSL certificate file was not recognized, so I decided to regenerate the SSL certificate and private key.
Normally this is done during the first boot after installation, but you can also execute the bash script yourself from the command line:
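On my ESXi 4.1 host the script in question is /sbin/generate-certificates (an assumption on my part; the path may differ per ESXi build). A sketch, including a backup of the old files first:

```shell
# Back up the existing (broken) certificate and private key, just in case.
mv /etc/vmware/ssl/rui.crt /etc/vmware/ssl/rui.crt.bak
mv /etc/vmware/ssl/rui.key /etc/vmware/ssl/rui.key.bak

# Re-run the certificate generation script that normally runs on first boot.
/sbin/generate-certificates
```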
This will generate new SSL certificate files and put them in the default location /etc/vmware/ssl.
Afterwards, restart the host daemon so the new SSL certificates are loaded:
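On my host the restart is done via the hostd init script (again, a sketch; paths per my ESXi 4.1 build):

```shell
# Restart the host daemon so it picks up the new certificate files.
/etc/init.d/hostd restart

# Alternatively, services.sh restart restarts all services on the host.
```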
Add the host to vCenter again and this time the ESXi host will be added correctly.
The reason for this post isn't only to give you a solution to my particular problem, but also to show a path for troubleshooting your VMware problems. I think every VMware administrator should possess these skills and should be able to look beyond the events that are created in vCenter. As you can see in the post above, it only takes analytic skills, common sense and the internet / Google to solve your problems. And no, the command line isn't something to be afraid of, even for a Windows sysadmin.
Some good resources on troubleshooting can be found here: