Sun Microsystems Inc.'s star high-performance computing (HPC) customer, the Texas Advanced Computing Center, or TACC, is set to deploy Sun's version of the Xen hypervisor, xVM Server, on part of its 4,000-node supercomputer.
As a research center at the University of Texas at Austin, TACC provides supercomputing resources to scientists and academics working on problems such as the lives of the first stars and hypervelocity impact simulation. Today, it does so using a mix of 4,000 Sun blades, InfiniBand switching gear, Sun Grid Engine batch-scheduling software and Sun xVM Ops Center management software. But with demand for the supercomputer's compute cycles so high, TACC administrators have had a hard time giving the computer scientists who develop HPC algorithms and applications access to those cycles.
"TACC is interested in solving grand-challenge questions, not giving its resources to computer scientists," said Prasad Pai, HPC evangelist for Sun's xVM organization. But now, by using the Xen-based xVM bare-metal hypervisor, TACC has found a way to do both.
Pai said that TACC administrators will deploy Sun's xVM hypervisor on a small portion of the TACC supercomputer and create a virtual grid for developers, who can in turn test their programs before moving them over to the actual grid. "It allows for a connection between the lifecycle stages from development to test to deployment," Pai said.
In this new era of parallel processing, giving developers adequate access to HPC environments has emerged as a problem. In the past year, the computing industry has responded to the need for developers schooled in the arts of parallel processing by establishing parallel computing centers to educate developers, give them access to resources, and assist in the creation of better parallel languages. Pai argued that lack of access to supercomputing grids may be stymieing development of effective parallel programs. "If computer scientists don't get access to the grid, how can they develop parallel algorithms or parallel languages? They won't," he said.
Going forward, Pai sees another possible use case for virtualization in an HPC environment: fault isolation. If a job runs inside a cluster of virtual machines, it can be insulated from an underlying hardware failure by moving it to another node in the cluster.
The downside of installing a hypervisor is that it can rob the application of some performance, which is at a premium in HPC environments. But Pai said that if the hypervisor layer is lightweight enough, the application uptime gained by running in a virtual machine might outweigh the performance hit. "Application uptime versus performance; that will be an interesting chart to see," he said.