Install your new ESXi host with your brand new installation process. Check! Verify that all your custom ESXi settings are correct. Check! Install your vCenter server. Check! Configure vCenter and create a cluster. Check! Add the ESXi host to vCenter. ERROR! *argh*
Troubleshooting. Always fun. You learn new stuff by exploring what you are doing wrong.
But OK, so what went wrong?
Adding a host to vCenter
When you add an ESXi host to your vCenter server, the vCenter agent (vpxa) is installed on the ESXi host. The agent runs locally on the host, and the vCenter server uses it to communicate with the ESXi host.
During the installation of the vCenter agent, the following steps are taken:
1. Upload the vCenter agent to the ESXi host.
2. Install the vCenter agent on the ESXi host and start the daemon.
3. Verify that the vCenter agent is running and that vCenter is able to communicate with it.
4. Retrieve the host configuration and set configuration settings (if necessary).
Everything went OK until step 3. At that point the process stalled, and eventually vCenter came back with an error.
Troubleshooting
OK, so whenever I have a problem with vSphere, I turn to my good old friend Google. Because, let’s face it, the events in vCenter and ESXi aren’t always clear about what’s going on. They give you a starting point, but from there it’s either Google or a deep dive into the vCenter or ESXi log files.
But Google returned nothing that solved my problem. Well then, it’s off to the command line to view some logs.
But first, enable Remote Tech Support Mode so you can log in via SSH. Read about it over here.
The VMware logs on the ESXi host are located in /var/log/vmware.
That directory contains vpx-iupgrade.log, the log of the vpxa daemon installation. View its last 10 lines with the following command (tail -f would instead keep following the file as new lines arrive): tail -n 10 vpx-iupgrade.log
[242692] 2011-02-11 10:16:50: exec /opt/vmware/vpxa/vpx/install.sh
Starting vmware-vpxa:failed
This shows that the vpxa daemon fails to start after the installation has completed, which matches what vCenter showed during the installation: the daemon was installed, but could not be started.
Next, turn to vpxa.log in the /var/log/vmware/vpxa directory to find out what the problem is. It shows the following error:
[2011-02-14 10:18:02.842 FFDF8B10 error 'App'] [VpxdCertificate] Failed: unrecognized file format: /etc/vmware/ssl/rui.crt
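When a log is long, it can help to pull out only the lines that report a problem instead of paging through tail output. Below is a minimal Python sketch, assuming you have copied vpx-iupgrade.log or vpxa.log to a machine with Python 3. The helper name find_problem_lines and the simple "error|failed" pattern are hypothetical choices for illustration, not part of ESXi or vCenter.

```python
import re
import sys
from pathlib import Path

# Hypothetical filter: matches the "failed" and "error" markers seen in the
# vpx-iupgrade.log and vpxa.log excerpts; extend the pattern as needed.
PATTERN = re.compile(r"error|failed", re.IGNORECASE)


def find_problem_lines(log_path, limit=10):
    """Return up to `limit` of the most recent lines that look like errors."""
    lines = Path(log_path).read_text(errors="replace").splitlines()
    hits = [line for line in lines if PATTERN.search(line)]
    # hits[-0:] would return everything, so handle limit <= 0 explicitly.
    return hits[-limit:] if limit > 0 else []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_problems.py vpxa.log
    for line in find_problem_lines(sys.argv[1]):
        print(line)
```

Run against a copy of vpxa.log, it would include the [VpxdCertificate] line shown above along with any other recent lines containing "error" or "failed".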
Bingo! Now we know the problem lies in the custom certificate file (/etc/vmware/ssl/rui.crt) that was uploaded during the installation of ESXi: vpxa does not recognize its format.
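An "unrecognized file format" error suggests looking at what the file actually contains. The following is a minimal Python sketch that flags two things on a copy of rui.crt: whether it looks like a PEM certificate at all (the BEGIN/END CERTIFICATE markers), and whether it uses DOS/Windows CRLF line endings, which a reader comment on this post points to as the likely cause. It assumes, as is usual for rui.crt, that the certificate is meant to be PEM-encoded; the helper name inspect_certificate is hypothetical, and this is not how vpxa itself parses the file.

```python
from pathlib import Path

# Standard PEM armor lines for an X.509 certificate.
BEGIN = b"-----BEGIN CERTIFICATE-----"
END = b"-----END CERTIFICATE-----"


def inspect_certificate(cert_path):
    """Return a list of human-readable problems found in a certificate file."""
    data = Path(cert_path).read_bytes()
    problems = []
    if BEGIN not in data or END not in data:
        problems.append("no PEM BEGIN/END CERTIFICATE markers (maybe DER or another format)")
    if b"\r\n" in data:
        problems.append("CRLF (DOS/Windows) line endings")
    return problems
```

An empty list means neither check found anything; it does not prove the certificate itself is valid.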
Resolution
The problem is in fact the custom SSL files that ESXi and vCenter use to communicate securely with one another. For more information, see my previous post here. Because my custom SSL certificate file was not recognized, I decided to regenerate the SSL certificate and private key.
Normally this is done during the first boot after installation, but you can also run the script yourself from the command line:
/sbin/generate-certificates.sh
This generates new SSL certificate files and puts them in the default location, /etc/vmware/ssl.
Afterwards, restart the host daemon (hostd) so it loads the new SSL certificates:
/etc/init.d/hostd restart
Add the host to vCenter again, and this time the ESXi host will be added correctly.
Conclusion
The reason for this post isn’t only to give you the solution to my problem, but also to show an approach to troubleshooting your VMware problems. I think every VMware administrator should possess these skills and should be able to look beyond the events that are created in vCenter. As you can see in the post above, it only takes analytical skills, common sense and the internet / Google to solve your problems. And no, the command line isn’t something to be afraid of, even for a Windows sysadmin.
Some good resources on troubleshooting can be found here:
Trainsignal – VMware vSphere Troubleshooting Training
The certificate problem was probably that the rui.crt file was in DOS rather than Unix format.
The hostd process doesn’t care (it also uses the rui.crt file), but vpxa will choke if it sees CRLF line endings.
VMware KB 1004875 describes this.
Thanks for the comment John.
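If the custom certificate is otherwise fine, converting it to Unix line endings before uploading it may be an alternative to regenerating it, which is the kind of conversion dos2unix performs. Below is a minimal Python sketch, assuming you run it on a copy of rui.crt on your workstation and keep a backup of the original. It is a generic CRLF-to-LF conversion, not the procedure from KB 1004875, and the helper name crlf_to_lf is hypothetical.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite `path` in place with Unix (LF) line endings.

    Returns True if the file was changed, False if it already used LF only.
    """
    p = Path(path)
    data = p.read_bytes()
    # Work on bytes so the certificate content is not re-encoded in any way.
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False
    p.write_bytes(fixed)
    return True
```

Because it returns False when nothing changed, running it twice on the same file is harmless.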