Senior Technology Editor
Published: 01 May 2010
Server virtualization supports enormous opportunities for hardware consolidation and promotes a level of business agility that would have been unthinkable just a few years ago. Although virtualization certainly solves some significant problems, it also creates new problems that IT administrators need to address.
The most serious effects of server virtualization can occur in the network. The areas of significant impact can be either between the virtual workloads on a physical host server or across the infrastructure that connects physical servers. If left uncorrected, these networking challenges can severely limit an organization’s ability to scale server virtualization and manage it effectively.
When looking at ways to ensure scalable server virtualization, it’s important for administrators to examine some networking issues within the server and consider other network issues outside of the server.
Hypervisor as a target of attack
Every server virtualization platform installs a hypervisor that abstracts each virtual machine (VM) from the underlying physical hardware, manages the allocation of computing resources to each VM and handles virtual switching of data traffic between VMs. This puts a hypervisor at the core of every virtualized server and makes hypervisors the focus of security concerns with virtual servers.
The problem is that all of a virtual server’s traffic ultimately flows through the hypervisor, so any attack that successfully allows access to—or control of—the hypervisor can potentially compromise all of the workloads on that physical system and possibly allow attackers to access the greater network. Although there have not been any major hypervisor security breaches reported thus far as a consequence of direct attack or exploit, it’s one of the most serious concerns that administrators face in moving to a virtual infrastructure.
I/O presents another challenge within each virtual server. Although computing resources are rarely limiting factors in traditional non-virtualized servers, the demands of multiple simultaneous virtual workloads can easily overwhelm a physical server’s memory, CPU cycles and I/O bandwidth.
Administrators need to consider workload requirements as they scale server virtualization, especially when it comes to I/O activity on the greater network such as storage access. Any means of reducing the I/O demands of workloads will lower stress on physical servers and allow greater scaling.
One example is the network overhead generated for Windows swap file disk writes. “Every single time you do anything with memory, the Windows operating system writes to a swap file,” said Chris Steffen, principal technical architect at Kroll Factual Data, a business data provider in Loveland, Colo. “If you can get rid of a swap file, now you’ve freed up something like 20% to 25% of the writes that you’re doing to your disk, and your performance goes up.”
It’s not always easy to eliminate a swap file, Steffen said. It requires more memory on the server itself, but the added performance allows for more workloads on the server that may not have been physically possible otherwise.
Each physical server also needs adequate network bandwidth to handle the peak demands of its assigned workloads. The single 1 Gigabit Ethernet (1 GbE) port found in many off-the-shelf servers is usually inadequate, and additional network interfaces are needed to provide more bandwidth and to meet availability requirements such as failover.
For example, a virtual server may use four, six or more network controllers. In some cases, multiple 1 GbE ports are replaced with two 10 GbE ports, which simplify wiring and lower power consumption.
When gauging a server’s bandwidth needs, be sure to use the peak or maximum demands of each workload rather than the average or idle demands. Otherwise, the server may encounter a bandwidth bottleneck that can result in poor performance for some workloads. I/O virtualization may be deployed to provision the physical network interface cards or host bus adapter interfaces into virtual controllers that are assigned to specific workloads with allocated bandwidth.
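The difference between sizing on average demand and sizing on peak demand can be illustrated with a short calculation. The following Python sketch is only an illustration: the workload names, the bandwidth figures, the 25% headroom and the single spare port for failover are all assumed values, not measurements from any real deployment. It also adds the peaks together, which assumes that every workload can peak at the same moment.

```python
import math

# Hypothetical per-workload bandwidth figures in Mbit/s (illustrative only).
workloads = {
    "mail": {"avg": 120, "peak": 450},
    "database": {"avg": 200, "peak": 800},
    "web": {"avg": 80, "peak": 600},
    "file": {"avg": 150, "peak": 700},
}


def ports_needed(total_mbps, port_mbps=1000, headroom=0.25, spare_ports=1):
    """Return how many ports are needed to carry total_mbps.

    headroom reserves a fraction of each port's capacity (assumed value),
    and spare_ports adds extra ports for failover (assumed value).
    """
    usable_per_port = port_mbps * (1 - headroom)
    return math.ceil(total_mbps / usable_per_port) + spare_ports


# Sum the average and the peak demand across all workloads on the host.
avg_total = sum(w["avg"] for w in workloads.values())
peak_total = sum(w["peak"] for w in workloads.values())

print("Ports sized on average demand:", ports_needed(avg_total))
print("Ports sized on peak demand:", ports_needed(peak_total))
print("10 GbE ports sized on peak demand:",
      ports_needed(peak_total, port_mbps=10000))
```

With these assumed figures, sizing on averages suggests far fewer 1 GbE ports than sizing on peaks, which is exactly the gap that produces a bandwidth bottleneck under load.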
Networking challenges outside of the server
Bandwidth and switching performance are two factors in the greater network that can limit the scalability of a virtual server deployment. The network bandwidth available to a virtual server must meet the peak demands of its assigned workloads. This may mean providing multiple network ports between the physical server and its switch, but those pathways must be available.
For some organizations, it could mean redesigning the network with additional network cabling or switch ports. Network changes are even more significant in a move to 10 GbE, which requires new cabling and switching equipment in addition to the corresponding NICs and HBAs in the server.
Managing VM sprawl is one way to contain spiraling bandwidth demands for virtual servers. Administrators must understand why new VMs are needed, limit the personnel who can create new VMs, and follow the lifecycle of each VM until it can finally be removed, freeing computing resources for other VMs.
Another means of optimizing bandwidth is to prioritize network traffic by application. This ensures that critical VMs can access the bandwidth they need to support the organization.
Switching performance is often overlooked. The problem is that switches are normally designed to handle the traffic from non-virtualized systems. For example, one switch port would typically be designed to accommodate one server.
When traffic from 10, 20 or more VMs is handled across a single network connection, the switch’s performance may degrade. This degradation might be abated somewhat through network or I/O virtualization, which assigns specific workloads to certain NICs, each with its own corresponding switch port, and so spreads the switching load among multiple ports.
Still, high levels of server consolidation can affect switch performance, so it’s essential for administrators to monitor and evaluate switching performance as more virtual workloads are added to the physical servers. Alternatively, consider a switch designed for virtual environments such as Cisco’s Nexus 1000V.
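One simple way to think about spreading workloads across NICs and their switch ports is a greedy placement: take the workloads from largest to smallest peak demand and put each one on whichever NIC is currently the least loaded. The Python sketch below shows that idea only. The function name, the workload names and the demand figures are hypothetical, and real I/O virtualization products use their own placement logic.

```python
import heapq


def spread_workloads(peaks_mbps, nic_count):
    """Greedily assign workloads to NICs to balance peak load.

    peaks_mbps maps a workload name to its peak demand in Mbit/s.
    Returns (assignment, loads), both keyed by NIC index.
    """
    # Min-heap of (current load, NIC index), so the least loaded NIC pops first.
    heap = [(0, nic) for nic in range(nic_count)]
    heapq.heapify(heap)
    assignment = {nic: [] for nic in range(nic_count)}

    # Place the largest workloads first so the small ones fill the gaps.
    ordered = sorted(peaks_mbps.items(), key=lambda kv: kv[1], reverse=True)
    for name, peak in ordered:
        load, nic = heapq.heappop(heap)
        assignment[nic].append(name)
        heapq.heappush(heap, (load + peak, nic))

    loads = {nic: load for load, nic in heap}
    return assignment, loads


# Hypothetical peak demands in Mbit/s (illustrative only).
peaks = {"a": 800, "b": 700, "c": 600, "d": 450, "e": 300}
assignment, loads = spread_workloads(peaks, nic_count=2)
print(assignment)
print(loads)
```

A greedy placement like this does not guarantee a perfectly even split, but it keeps any single port from carrying most of the host's traffic, and it is easy to rerun as workloads are added.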
Storage performance may not seem like a network issue, but storage is a critical factor that administrators can’t overlook in network performance. Every VM is retained on storage systems, along with snapshots and backups of the environment, so scaling server virtualization will also be limited by the ability to store and move that content.
Imagine the impact of virtualization on an iSCSI SAN deployment. An iSCSI SAN may work fine for non-virtualized servers. But when you consider that a single physical server may have 20 or more VMs—each loading from storage, exchanging data files and swap files with storage and taking snapshots to storage—it’s easy to see how technologies like iSCSI or Fibre Channel over Ethernet can amplify network bandwidth problems. More network bandwidth may be one answer, but the move to a separate high-performance shared storage network such as Fibre Channel is often the preferred choice.
Ultimately, testing and evaluation are vital elements of any network design change. “Spec out what you think you need, overcompensate like you should be doing anyway, and then give it a shot,” Steffen said. If you find that you’re running into trouble, he said, then you probably didn’t do your original evaluation correctly, but expanding or upgrading network devices is straightforward.
Achieving scalable server virtualization
Given the various concerns inside and outside of the server, experts say that constant monitoring is the best means of achieving scalable server virtualization. Performance monitoring should be implemented at the virtual server as well as within the network. Administrators can use monitoring from several different angles.
Predictive analysis relies on threshold monitoring and alerting based on historical trends over time. “Predictive analysis of the server, VM or host enables proactive problem resolution and preemptive troubleshooting so that performance and service degradations are detected and fixed before they affect the end user,” said Allen Zuk, president and CEO of Sierra Management Consulting LLC, an independent technology consulting firm based in Parsippany, N.J.
Predictive analysis is particularly valuable when service levels and availability demands are high.
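As a rough illustration of threshold monitoring based on historical trends, the Python sketch below fits a straight line to past utilization samples and estimates how many more sampling intervals remain before the trend reaches an alert threshold. It is a minimal sketch under stated assumptions: the function names and sample values are hypothetical, the samples are assumed to be equally spaced, and a linear fit is a simplification that commercial monitoring tools are likely to go well beyond.

```python
def linear_trend(samples):
    """Return (slope, intercept) of a least-squares line through the samples.

    Samples are assumed to be equally spaced, at x = 0, 1, 2, ...
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intervals_until(samples, threshold):
    """Estimate intervals after the last sample until the trend hits threshold.

    Returns 0.0 if the last sample is already at or above the threshold,
    and None if the trend is flat or falling.
    """
    if samples[-1] >= threshold:
        return 0.0
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - (len(samples) - 1))


# Hypothetical weekly link utilization in percent (illustrative only).
history = [50, 55, 60, 65, 70]
remaining = intervals_until(history, threshold=90)
print("Estimated intervals before the 90% threshold:", remaining)
```

An alert raised while several intervals still remain gives administrators time to add bandwidth or move workloads before users notice a slowdown.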
Workload management focuses on the performance of each virtual workload and allows administrators to measure each workload’s performance and use of computing resources. Based on data obtained over time, an administrator can make informed decisions about moving workloads and allocating computing resources to workloads to achieve optimum performance. Some workload management tasks can be automated through the use of tools such as VMware Distributed Resource Scheduler or the VMware Lifecycle Manager. “The tool can also develop intelligence as it increasingly correlates between performance data trends and workload management steps,” said Zuk, adding that it has an equally important impact on change management.
Finally, virtualization abstracts workloads from the hardware underneath, and this complicates traditional troubleshooting processes. So as server virtualization scales up, with more workloads residing on and migrating between physical host servers, effective troubleshooting will demand superior problem resolution techniques that help administrators locate and correct problems quickly. This is even more critical in demanding user environments.
“The ability to effectively isolate and troubleshoot problems helps avoid [service-level agreement] violations, which affect the end user experience, as well as regulatory or financial penalties that may result from these violations,” Zuk said.
About the Author
Stephen J. Bigelow, a senior technology writer in the Data Center and Virtualization Media Group at TechTarget Inc., has more than 15 years of technical writing experience in the PC/technology industry. He holds a bachelor of science in electrical engineering, along with CompTIA A+, Network+, Security+ and Server+ certifications, and has written hundreds of articles and more than 15 feature books on computer troubleshooting, including Bigelow’s PC Hardware Desk Reference and Bigelow’s PC Hardware Annoyances. Contact him at firstname.lastname@example.org.