High-performance computers are very expensive, so anything that increases their effective utilization is a real benefit. Virtualization seems an obvious way to increase utilization, but organizations running high-performance computers have generally resisted it.
Several factors stand in the way of virtualization. High-performance computing (HPC) systems are highly tuned in both hardware and software, and a general-purpose, hypervisor-based approach would be expected to reduce performance or, at the least, users' ability to tune the system.
Then there is the issue of operating system software. HPC clusters often use more exotic operating systems that aren't well supported by hypervisors, so applications may need tweaking to run, or the whole stack may not run properly at all. Another issue, which affects the defense-oriented supercomputer market, is the security of a multi-tenant environment.
Despite these impediments, we seem to be turning a corner in virtualizing high-performance computers. A case study from Johns Hopkins University illustrates what major HPC operations are going through, and it's clear that much of the fear, uncertainty and doubt around virtualization has evaporated as the cloud approach has matured in the general-purpose market.
The performance myth
The Johns Hopkins team faced usage issues that substantially reduced the effective performance of their installation. They ran one Linux cluster and one Windows cluster to cover the job types they needed, and this separation meant there were often long periods when one cluster or the other sat idle. The problem is common in HPC labs and usually means buying more servers to keep up.
The team expected an 8% performance drop when they virtualized but instead saw a 2% improvement. The real benefit, of course, is eliminating idle time and saving hardware, which together could almost double effective performance. The experiment punctured the myth that hypervisors necessarily hurt performance.
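A back-of-the-envelope model shows why removing idle time matters far more than a few percent of hypervisor overhead. The sketch below is illustrative only: the 100-node cluster sizes, the 50% utilization of each dedicated cluster and the 95% utilization of the pooled cluster are hypothetical assumptions, not figures from the case study; only the expected 8% drop and the observed 2% gain come from the Johns Hopkins experience. The function name `effective_throughput` is our own.

```python
"""Back-of-the-envelope model of effective HPC throughput.

Cluster sizes and utilization figures are hypothetical illustrations;
only the -8% (expected) and +2% (observed) hypervisor effects come
from the case study.
"""


def effective_throughput(nodes: int, utilization: float, speed_factor: float = 1.0) -> float:
    """Useful work per unit time, in bare-metal node-equivalents.

    nodes        -- number of servers in the cluster
    utilization  -- fraction of time the servers are running jobs (0..1)
    speed_factor -- per-node speed relative to bare metal (1.02 = 2% faster)
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return nodes * utilization * speed_factor


# Hypothetical baseline: separate Linux and Windows clusters of 100 nodes
# each, each busy only half the time because its job type arrives in bursts.
dedicated = effective_throughput(100, 0.5) + effective_throughput(100, 0.5)

# The same 200 nodes as one virtualized pool that can run either job type,
# assumed here to stay busy 95% of the time.
pooled_expected = effective_throughput(200, 0.95, speed_factor=0.92)  # expected 8% drop
pooled_observed = effective_throughput(200, 0.95, speed_factor=1.02)  # observed 2% gain

for label, value in [("expected", pooled_expected), ("observed", pooled_observed)]:
    print(f"pooled ({label}): {value:.1f} node-equivalents, {value / dedicated:.2f}x dedicated")
```

Under these assumptions the pooled cluster delivers about 1.94 times the useful work of the two dedicated clusters, and even the feared 8% overhead would still leave it about 1.75 times ahead: the utilization term, not the hypervisor overhead, dominates.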
Networking is key to flexible orchestration of HPC clusters. The fastest interconnect scheme is InfiniBand (IB). Mellanox, the IB technology leader, has invested heavily in work with VMware to enable remote direct memory access (RDMA) and low-latency connections in clusters running VMware virtualization. This is still a work in progress, but recent benchmarks show that a virtualized IB cluster achieves nearly bare-metal latency with the Message Passing Interface (MPI).
Security concerns and the future of HPC
Security in a multi-tenant environment is actually better than in a bare-metal environment. The early concerns about the cloud were well addressed by both hypervisor suppliers and CPU vendors, and confidence in inter-tenant security on virtualized systems is now very high.
Clearly, as the Johns Hopkins team found, virtualization is a viable option in HPC today, even though there are limitations in storage and in support for some operating systems. An obvious question is whether containers might prove to be a better approach still. HPC clusters are typically built from hundreds or thousands of servers running the same operating system and often the same application stack.
The benefit of containers is that only a single copy of this software resides in memory, rather than one copy per virtual machine as with traditional virtualization. This reduces cache thrashing and memory footprint, which could increase the number of instances per server by as much as 100%.
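The memory argument can be made concrete with a simple density model. This is a sketch under stated assumptions, not a sizing tool: it supposes each instance needs a private working set plus a shared software stack (operating system and application binaries), that every VM holds its own copy of that stack while containers on one host share a single copy, and it ignores hypervisor overhead, per-container overhead and memory deduplication features such as page sharing. The 256 GiB server and 4 GiB sizes are hypothetical, and the function names are our own.

```python
"""Rough instance-density model: VMs versus containers on one server.

Hypothetical assumptions: each instance needs a private working set plus
a shared software stack. Each VM carries its own copy of the stack;
containers share one copy. Hypervisor overhead, per-container overhead
and memory deduplication are ignored.
"""


def vm_instances(ram_gib: float, stack_gib: float, private_gib: float) -> int:
    # Each VM holds a full copy of the stack plus its own working set.
    return int(ram_gib // (stack_gib + private_gib))


def container_instances(ram_gib: float, stack_gib: float, private_gib: float) -> int:
    # The stack is loaded once; the remaining RAM is split among working sets.
    if ram_gib < stack_gib:
        return 0
    return int((ram_gib - stack_gib) // private_gib)


# Hypothetical server: 256 GiB RAM, a 4 GiB shared stack, 4 GiB per instance.
vms = vm_instances(256, 4, 4)
containers = container_instances(256, 4, 4)
print(f"VMs: {vms}, containers: {containers}, gain: {containers / vms - 1:.0%}")
```

With these numbers the server fits 32 VMs but 63 containers, a gain of about 97%. The gain tracks the ratio of the shared stack to each instance's private working set: with no shared stack the two counts are equal, and when the two are similar in size the count roughly doubles.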
Another area of interest is the use of GPUs to boost HPC processing. GPU systems today, while still expensive, are effectively commoditized, and with as many as eight top-end GPU cards per server, the available horsepower is tremendous. Supermicro and Nvidia have created a set of servers, one of which Nvidia offers as its Iray Visual Computing Appliance. Nvidia is also offering access to its public cloud of GPU-enabled servers, though that service is still in early deployment.
Virtualized HPC is beginning to gain traction but clearly still has some evolving to do. Even so, new large HPC clusters are being funded (such as at the universities of Texas and California), and these will move the state of the art forward and solidify virtualization and containers as the preferred approach.