Performance Best Practices for Hadoop on vSphere 5.1

Apache Hadoop provides a platform for building distributed systems for massive data storage and analysis using a large cluster of standard x86-based servers. It replicates data across hosts and racks of hosts to protect against individual disk, host, and even rack failures. A job scheduler can run multiple jobs of different sizes simultaneously, which helps maintain a high level of resource utilization. Given Hadoop's built-in reliability and workload consolidation features, it might appear there is little need to virtualize it.

However, virtualizing Hadoop workloads on VMware vSphere offers many benefits. VMware has published a white paper with performance best practices for Hadoop on vSphere 5.1. Read the full paper for detailed results and for guidance on deploying Hadoop on vSphere.

More information can also be found on the blog by Josh Simons.