Apache Hadoop provides a platform for building distributed systems for massive data storage and analysis using a large cluster of standard x86-based servers. It uses data replication across hosts and racks of hosts to protect against individual disk, host, and even rack failures. A job scheduler can be used to run multiple jobs of different sizes simultaneously, which helps to maintain a high level of resource utilization. Given the built-in reliability and workload consolidation features of Hadoop it might appear there is little need to virtualize it.
However there are a lot of benefits on virtualizing the Hadoop workload on top of VMware vSphere. VMware has written a whitepaper with performance best practices for Hadoop on vSphere 5.1. Read the full paper for detailed results and to learn about performance best practices for deploying Hadoop on vSphere.