Server virtualization is having the same impact on network I/O that it is having on other parts of the IT infrastructure: exposing weaknesses. In this article, we'll look at how virtualization is making those weaknesses more visible and how the lack of bandwidth management is one of the key limitations in a server virtualization rollout. Drilling down, we'll look at one I/O problem-solver, the 10 Gigabit Ethernet (10 GbE) card.
Network I/O weaknesses
Network I/O is not as plentiful as CPU or memory resources. If you follow most server virtualization vendors' best practice of installing one 1 GbE card per virtual machine (VM), you quickly hit a wall on how many virtual machines a given virtualization host can support. After all, there are only so many card slots available.
These weaknesses in network I/O force the business to react in several ways. First, almost every data center quickly has to break the one-card-per-virtual-machine practice. In practice, that means only certain servers, those with low network I/O requirements, are virtualized. The obvious downside is that the environment is only partly virtualized, which reduces the overall benefit of server virtualization. The second step, and the focus of this article, is becoming common practice in virtualization rollouts: installing a 10 GbE card. There is no doubt that 10 GbE greatly increases the bandwidth available to an application workload with large data sets. This is a fairly easy solution for a physical infrastructure, but what if you're dealing with a virtualized server environment?
NICs and network I/O
In this world, multiple virtual servers compete at very high speed for the same network I/O resources through a single layer of software common to most virtual environments: the hypervisor. The hypervisor is the traffic cop that sits between the virtual machines and the physical hardware components of the host server. Workloads are serviced on a first-come, first-served basis. As the virtual server count increases on each physical hardware platform, the number of times the hypervisor is interrupted to handle network I/O requests increases. This is why one or more virtual servers that require high network I/O become such a problem: they compound the situation by interrupting the hypervisor even more.
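To make the first-come, first-served problem concrete, here is a minimal sketch of a single shared queue. It is a toy model, not how any particular hypervisor is implemented: the function name, the VM names, the packet counts and the per-tick capacity are all hypothetical values chosen for illustration.

```python
from collections import deque

def fifo_service(arrivals, capacity_per_tick):
    """Serve packets from one shared FIFO queue; return packets served per VM."""
    queue = deque()
    served = {}
    for tick_arrivals in arrivals:
        # Packets from every VM join the single shared queue in arrival order.
        for vm, count in tick_arrivals:
            queue.extend([vm] * count)
        # The queue is drained oldest-first, up to the capacity for this tick.
        for _ in range(min(capacity_per_tick, len(queue))):
            vm = queue.popleft()
            served[vm] = served.get(vm, 0) + 1
    return served

# Hypothetical workload: "backup" offers 8 packets per tick, "web" and "db" 1 each.
arrivals = [[("backup", 8), ("web", 1), ("db", 1)] for _ in range(10)]
fifo_result = fifo_service(arrivals, capacity_per_tick=5)
print(fifo_result)
```

In this toy run the light "web" and "db" servers each offer 10 packets but get only 5 served, because their packets wait behind the heavy server's backlog in the shared queue.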
You can also see how this leads to the recommendation of one network interface card (NIC) per virtual machine. This deployment method divides the shared I/O queue into multiple queues, one for each physical adapter. This results in fewer hypervisor interruptions and helps to better organize the interruptions that do arise. However, dedicating a NIC to a single virtual machine, or even to several virtual machines, causes other issues beyond running out of card slots, which are already at a premium in a virtualized environment.
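The effect of splitting one shared queue into one queue per adapter can be sketched the same way. This is a simplified illustration under assumed numbers: the function name, the workload and the per-adapter capacity are hypothetical, and a real adapter and hypervisor are far more involved.

```python
def per_adapter_service(arrivals, capacity_per_adapter):
    """Each VM has its own adapter queue, drained independently every tick."""
    backlog = {}  # packets waiting on each VM's dedicated adapter
    served = {}
    for tick_arrivals in arrivals:
        for vm, count in tick_arrivals:
            backlog[vm] = backlog.get(vm, 0) + count
        # Every adapter drains its own queue; one VM's backlog cannot delay another's.
        for vm in backlog:
            n = min(capacity_per_adapter, backlog[vm])
            backlog[vm] -= n
            served[vm] = served.get(vm, 0) + n
    return served

# Hypothetical workload: "backup" offers 8 packets per tick, "web" and "db" 1 each.
arrivals = [[("backup", 8), ("web", 1), ("db", 1)] for _ in range(10)]
adapter_result = per_adapter_service(arrivals, capacity_per_adapter=2)
print(adapter_result)
```

Here the light servers get all of their traffic through, while the heavy server is limited by its own adapter rather than crowding out its neighbors. The cost, as the article goes on to explain, is one physical card per queue.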
Multiple NICs also become an obvious cable management nightmare and increase power utilization. For example, a typical 1 GbE card draws about eight watts of power. Installing 10 of these to match the bandwidth of a single 10 GbE card would draw about 80 watts while also reducing airflow in the server itself. Reduced airflow increases cooling requirements, causing the fans to run more often and so increasing the server's power consumption. By comparison, a single 10 GbE card draws only about 15 watts and has almost no impact on server airflow.
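The power comparison is simple arithmetic, worked through below using the approximate wattages quoted above. The variable names are ours, and the figures cover the cards only, not the extra fan power caused by reduced airflow.

```python
WATTS_PER_1GBE_CARD = 8     # approximate draw of a typical 1 GbE card
WATTS_PER_10GBE_CARD = 15   # approximate draw of a single 10 GbE card
CARDS_TO_MATCH_10GBE = 10   # ten 1 GbE cards to reach 10 Gbps of raw bandwidth

watts_1gbe_total = WATTS_PER_1GBE_CARD * CARDS_TO_MATCH_10GBE
watts_saved = watts_1gbe_total - WATTS_PER_10GBE_CARD
watts_per_gbps_1gbe = watts_1gbe_total / 10
watts_per_gbps_10gbe = WATTS_PER_10GBE_CARD / 10

print(f"Ten 1 GbE cards: {watts_1gbe_total} W ({watts_per_gbps_1gbe} W per Gbps)")
print(f"One 10 GbE card: {WATTS_PER_10GBE_CARD} W ({watts_per_gbps_10gbe} W per Gbps)")
print(f"Difference: {watts_saved} W")
```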
For these reasons, despite the shared queue problem highlighted above, 10 GbE cards are a wise investment and an excellent way to future-proof the environment. The problem with a 10 GbE NIC is that it is hard to take full advantage of the tenfold increase in bandwidth, a situation that is exacerbated in virtualized server environments. In a non-virtualized environment, a high-performance server with a 10 GbE card can achieve a throughput of 9.9 Gbps. That same server in a virtualized environment can achieve only about 4.0 Gbps. This is because in virtualized environments all I/O operations go through a shared queue in the hypervisor and only one I/O channel in the adapter. As the number of virtual machines and the network I/O load increase, the performance of the virtual machines and the applications within them cannot be maintained because of I/O contention on the shared queue.
Solving network I/O issues in a virtualized environment
There are two parts to the solution. First, hypervisor vendors need to develop an advanced I/O queuing system that allows the bandwidth of the 10 GbE card to be divided up. This also requires a more intelligent 10 GbE card to take advantage of the queuing system. Second, network card vendors need to develop a Quality of Service (QoS) capability similar to what is currently available from switch vendors. This allows the card to be segmented at the hardware level to deliver a guaranteed level of performance for an I/O channel.
The first hypervisor to deliver this capability is VMware's, although other suppliers are expected to follow suit. VMware calls its capability NetQueue, and it dramatically improves performance in 10 GbE environments. Support from the NIC supplier is also required; right now, NICs from Intel and Neterion support NetQueue. NetQueue offloads the packet routing work that the VMware ESX host would otherwise perform, freeing up CPU resources and reducing latency.
NetQueue makes broad server consolidation possible by allowing a 10 GbE adapter with NetQueue support to get much closer to its rated bandwidth. Results of 9.8 Gbps are achievable with such a setup. But critical workloads require a guarantee of bandwidth, and NetQueue does not deliver true QoS for specific virtual machines. QoS has long been available from switch manufacturers, but QoS capabilities have been lacking at the NIC level.
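To put the three throughput figures quoted in this article side by side, the short calculation below expresses each as a share of the card's 10 Gbps line rate. The labels and variable names are ours; the Gbps values are the ones cited in the text, not new measurements.

```python
LINE_RATE_GBPS = 10.0

# Throughput figures as quoted in the article.
measured_gbps = {
    "non-virtualized": 9.9,
    "virtualized, shared queue": 4.0,
    "virtualized, NetQueue": 9.8,
}

# Share of the rated line rate that each setup achieves, in percent.
utilization = {
    name: round(100 * gbps / LINE_RATE_GBPS, 1)
    for name, gbps in measured_gbps.items()
}

# How much faster the NetQueue figure is than the shared-queue figure.
speedup = measured_gbps["virtualized, NetQueue"] / measured_gbps["virtualized, shared queue"]

for name, pct in utilization.items():
    print(f"{name}: {pct}% of line rate")
print(f"NetQueue vs. shared queue: about {speedup:.2f}x")
```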
In the days of one application per server, there really was not much need for NIC-based QoS. Unlike a switch, which has traffic coming from and going to multiple sources, a server used to have only one objective: supply compute and processing power for its application. Now, with the advent of virtualization, there is a need to give certain virtual servers network I/O priority and guaranteed bandwidth. Using the queue to service I/O requests on a first-come, first-served basis is unacceptable. Suppliers like Neterion are helping to complete the solution and broaden virtualization even further by developing a QoS capability for NICs called IOQoS.
As part of the coming 802.11 proposals, IOQoS is designed from the ground up for multi-application environments in which several applications compete for I/O access. Hardware-based channel isolation allows administrators to strictly enforce complete I/O isolation between the different virtual machine data paths. The advantages of true isolation can be seen in security, by avoiding interference between shared resources like memory and CPU cores. Most of today's software applications were written on the assumption that they would have full access to system resources and that the network card would be all theirs. That card is now shared in virtualized environments, but this reality has not yet been incorporated into the minds of developers. The ability to provision the card so that it appears exactly as the application expects will increase vertical compatibility with existing software solutions.
Finally, there are the bandwidth assurances when consolidating several software stacks as individual virtual machines on a physical host. From an I/O perspective, the card can be configured to look to the application like a single physical card, and that application will have assured access to that bandwidth. In the past, system administrators who had to assure bandwidth to certain application workloads had to either keep those servers in an unvirtualized state or dedicate NICs in the virtualization host to those workloads. Each, as explained earlier, has a negative impact on the savings that a virtualization rollout could otherwise help a data center achieve.
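The idea of assured bandwidth can be illustrated with a small allocation sketch. This is not Neterion's IOQoS algorithm, which the article does not describe. It is one plausible policy under our own assumptions: honor each virtual machine's guarantee first, then share whatever is left in proportion to unmet demand. The function name, VM names and numbers are hypothetical.

```python
def allocate_bandwidth(link_gbps, demands, guarantees):
    """Honor per-VM guarantees first, then split leftover capacity by unmet demand."""
    if sum(guarantees.values()) > link_gbps:
        raise ValueError("guarantees exceed link capacity")
    # Step 1: each VM gets its guarantee, capped by what it actually asks for.
    alloc = {vm: min(demands[vm], guarantees.get(vm, 0.0)) for vm in demands}
    leftover = link_gbps - sum(alloc.values())
    unmet = {vm: demands[vm] - alloc[vm] for vm in demands}
    total_unmet = sum(unmet.values())
    # Step 2: remaining capacity is shared in proportion to unmet demand.
    if total_unmet > 0:
        share = min(1.0, leftover / total_unmet)
        for vm in demands:
            alloc[vm] += unmet[vm] * share
    return alloc

# Hypothetical host: a 10 Gbps card, two protected workloads and one bulk workload.
demands = {"db": 3.0, "web": 1.0, "backup": 9.0}
guarantees = {"db": 3.0, "web": 1.0}
alloc = allocate_bandwidth(10.0, demands, guarantees)
print(alloc)
```

In this example the database and web servers receive their full guaranteed bandwidth, and the backup job receives the roughly 6 Gbps that remains, instead of all three competing in a single first-come, first-served queue.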
The combination of 10 GbE with an intelligent queuing system from the virtualization vendor, support for that queuing from NIC vendors, and the adoption of IOQoS will allow significantly broader and denser deployments of server virtualization. It will also optimize the investment in the infrastructure, delivering a more significant return on investment.
About the author: George Crump is President and Founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators where he was in charge of technology testing, integration and product selection.