At first blush, selecting a commodity server for a virtual data center may seem simple. Hardware providers now urge you to purchase their “virtualization ready” products. While for some workloads these purchases are a good choice, they may not be appropriate for the task at hand. Just as in a nonvirtualized environment, the application and service workloads you intend to host on servers will dictate the features you select in your server. A virtual environment, in turn, creates additional requirements for handling those applications and workloads. The demands on server hardware are amplified by consolidating multiple operating systems and their respective applications onto a physical machine. A memory-intensive application that has been virtualized along with similar applications on a physical server, for example, will often demand more than simply the sum of each virtual machine’s memory requirements.
This article explores the requirements that virtualization creates for server hardware in the four key server resource areas: CPU, memory, storage and network. We’ll also discuss server form factors, including small rackmount units of 1U and 2U (a U is defined as 1.75 inches in height), large rack units of 4U to 6U, and blade chassis, in terms of their effect on the balance of these four key resources. (We will not discuss standalone tower servers, but suffice it to say that tower servers tend to have capabilities similar to 4U rackmount servers.) Application workloads also vary greatly in their demands on the four key server resources. Ultimately, application workload demands, coupled with additional virtualization requirements, dictate the optimum form factor for a server.
First, let’s start with CPUs. During the 1990s and into the early part of this decade, the race was on for increased CPU clock speed for commodity CPUs. But the industry came to realize that faster CPUs had become more of a marketing battle and did not always translate into faster application performance. In fact, CPUs would spend more than 90% of their time waiting on memory or system peripherals. While clock speed could be increased, more productive use came from expanding into parallel processing, first with symmetric multiprocessing (SMP), then with hyper-threading and then with multiple CPU cores.
These advancements targeted improved application performance. But only a limited number of applications can fully benefit from running across multiple cores and CPUs. A greater number of x86-based applications have been written only to be SMP-safe; they do not take advantage of full SMP multithreading on multiple cores, which can improve performance and scalability. Multicore CPUs are now the norm, and for a large number of applications these CPUs are underutilized. From the mid-2000s to the present, multicore CPUs, coupled with whole-system performance that is more than ample for many IT workloads, have left systems underutilized. This paved the way for virtualization technologies on x86 systems to offer value by extracting greater efficiency from the increasingly powerful hardware in the marketplace as well as by reducing data center equipment footprint and energy usage. Virtualization enables a quad-core CPU to appear as though it is 10 or more individual CPUs. The virtualization hypervisor can present one or more virtual CPUs to each virtual machine (VM). CPU vendors recognized the opportunity and began working with virtualization hypervisor vendors to add features that facilitate virtualization. Introduced in 2006, Intel VT and AMD-V processors are examples of this trend.
Should you always create a virtual machine with multiple virtual CPUs? Typically the answer is no. As mentioned previously, most applications do not take advantage of SMP systems but are only SMP-safe. A rule of thumb is to create only one virtual CPU per virtual machine unless the application specifically uses SMP across multiple CPUs to improve its operation. You should test each application in virtual machines with one virtual CPU and then with two or more virtual CPUs to identify performance or scalability gains. If no gains are observed (indeed, you may see a decrease in performance with multiple virtual CPUs), configure only one virtual CPU for that virtual machine.
Let’s return to our original discussion on server form factors. More CPUs and more CPU cores enable a greater amount of IT workload consolidation through virtualization. Multiple non-SMP applications can now be consolidated onto a single 16-core server, each in its own virtual machine, and together fully utilize all 16 cores.
Which form factors yield the highest CPU core density? The answer depends on whether you consider density per server or density per rack. Rackmount servers of between 4U and 6U offer the highest density options per server. Today you can have up to 32 cores per server. But blade servers offer the highest CPU density per rack, with 512 cores per rack versus 336 cores per rack with modern rackmount servers. Based on CPU density alone, if you want to achieve higher consolidation ratios, blade servers may seem to be the best choice. But hold your horses: We have yet to consider memory, storage and network I/O as part of the equation in your server choices for hosting virtual machines.
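The per-rack figures above follow from simple arithmetic. The sketch below is illustrative only: the 42U rack height, the 10U chassis holding 16 blades, and the per-server core counts are assumptions chosen because they reproduce the 336- and 512-core figures, not configurations stated in this article.

```python
# Illustrative rack-density arithmetic. The rack height, chassis size and
# per-server core counts are assumptions, not vendor specifications.

RACK_UNITS = 42  # assumed full-height rack


def cores_per_rack(units_per_enclosure, servers_per_enclosure,
                   cores_per_server, rack_units=RACK_UNITS):
    """Return the core count of a rack filled with identical enclosures."""
    enclosures = rack_units // units_per_enclosure  # partial enclosures do not fit
    return enclosures * servers_per_enclosure * cores_per_server


# Assumed: 2U rackmount servers with 16 cores each.
rackmount_2u = cores_per_rack(2, 1, 16)

# Assumed: 4U rackmount servers with 32 cores each.
rackmount_4u = cores_per_rack(4, 1, 32)

# Assumed: 10U blade chassis, 16 blades per chassis, 8 cores per blade.
blade = cores_per_rack(10, 16, 8)

print("2U rackmount:", rackmount_2u)
print("4U rackmount:", rackmount_4u)
print("Blade:", blade)
```

Under these assumptions the 4U servers win per server (32 cores) while the blades win per rack, which is the distinction the paragraph above draws.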
Virtual environments also augment memory requirements. VM memory partitioning can’t achieve the same levels of consolidation possible with CPUs. It is much easier to take one-tenth of a CPU than to take one-tenth of the memory that an application uses on a physical machine. As a result, the ratio of required physical memory to the number of VMs remains linear, with a small amount of additional overhead for the virtualization hypervisor. If each virtual machine requires 2 GB of memory, 20 virtual machines require more than 40 GB of physical memory once that overhead is included.
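A rough sizing helper makes the linear relationship concrete. This is a minimal sketch: the overhead values are placeholder assumptions for illustration, not measurements from any particular hypervisor.

```python
def required_host_memory_mb(vm_count, mb_per_vm,
                            hypervisor_overhead_mb=2048,
                            per_vm_overhead_mb=100):
    """Estimate physical memory needed to host vm_count identical VMs.

    Memory scales linearly with the number of VMs. Both overhead
    parameters are assumed placeholder values, not vendor figures.
    """
    return vm_count * (mb_per_vm + per_vm_overhead_mb) + hypervisor_overhead_mb


# The article's example: 20 virtual machines at 2 GB each.
needed_mb = required_host_memory_mb(vm_count=20, mb_per_vm=2048)
print(f"{needed_mb / 1024:.1f} GB")  # more than the 40 GB the VMs alone consume
```

Doubling the VM count roughly doubles the result, which is why memory rather than CPU often sets the ceiling on consolidation.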
The requirement for large amounts of memory for high virtualization consolidation rates brings us to memory paging. Coupled with their software operating systems, x86 systems incorporate virtual memory techniques to offer to applications what appears as a contiguous block of memory. This memory, however, is not physically contiguous on a machine. A CPU and OS work together to map physical memory segments into virtual contiguous memory blocks for the applications running on an operating system. A CPU uses page tables to track the memory mapping.
Memory paging becomes important in a virtual machine environment, because the system must now layer memory virtualization on top of memory virtualization. A hypervisor works with physical CPU page tables to map physical memory into virtual memory that is presented to each virtual machine. But each guest OS expects to work with its virtual CPU to map what it believes is physical memory (but is actually virtual memory from the hypervisor) into virtual memory for its applications. As can be surmised, in large memory systems the double-page table mapping becomes a performance bottleneck, especially for applications that tend to allocate and then free memory a great deal during their operations.
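The double mapping can be pictured with a toy model. The sketch below uses flat dictionaries as stand-ins for page tables and invented page numbers; real x86 page tables are multilevel structures walked by hardware, so treat this purely as an illustration of the two translation stages.

```python
PAGE_SIZE = 4096  # bytes per page, the common x86 small-page size


def translate(page_table, address):
    """Map an address through one page table, preserving the page offset."""
    page, offset = divmod(address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


# Hypothetical mappings, invented for illustration.
guest_page_table = {0: 7, 1: 3}    # guest virtual page -> guest "physical" page
host_page_table = {3: 42, 7: 19}   # guest "physical" page -> host physical page


def guest_virtual_to_host_physical(address):
    """Apply both stages: the guest OS mapping, then the hypervisor mapping."""
    guest_physical = translate(guest_page_table, address)
    return translate(host_page_table, guest_physical)


guest_address = 1 * PAGE_SIZE + 16  # page 1, offset 16
host_address = guest_virtual_to_host_physical(guest_address)
print(hex(guest_address), "->", hex(host_address))
```

Every memory mapping the guest changes has to remain consistent across both stages, which is why applications that allocate and free memory heavily feel the cost most.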
To address this problem, Intel and AMD have introduced Extended Page Tables and Nested Page Tables, respectively, in quad-core CPUs. During the first half of 2008, this feature began to appear in server systems, and by mid-2008, VMware Inc. introduced ESX Server 3.5 Update 2, which is the first hypervisor to exploit these features. Tests have indicated up to a 75% performance improvement in virtual environments for applications that generate a great deal of memory paging. Note that applications that create little memory paging haven’t experienced any improvement.
Memory paging aside, the memory requirements for physical servers that support virtualization are quite high and increase linearly with the number of virtual machines consolidated onto a physical server. If we return to the form-factor discussion, this means that server machines that can physically handle more memory per server are better suited for higher consolidation rates. Rackmount servers offer higher memory capacities. Blade servers quickly reveal themselves as the poorest option for total memory available, offering only 128 GB, compared with 256 GB on 2U and 512 GB on 4U rackmount servers. This is due primarily to a blade’s low profile and limited real estate. As a result, blades will deliver the lowest consolidation ratios per server. Memory is not the only limitation of a blade’s small real estate; storage and network I/O are also limited.
Consolidation of multiple IT server systems onto one physical host offers not only better CPU utilization but also better storage I/O channel utilization. The storage I/O channel of light workloads may appear idle 99% of the time. Virtualization can improve that usage. I/O-heavy applications, however, including backup and virus-scanning tools, can quickly exceed the limits of today’s storage I/O channels. Granted, 8 Gbps Peripheral Component Interconnect Express (PCIe) Fibre Channel host bus adapters have recently emerged, giving a boost to server storage I/O capabilities in a storage area network. But if 20 virtual machines begin their nightly backup jobs simultaneously on one physical host, direct-attached storage and slower storage area networking technologies will rapidly run out of I/O bandwidth.
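A back-of-the-envelope check shows how quickly simultaneous backups outrun a storage channel. In this sketch the per-VM backup rate and the usable link throughputs (roughly 100 MB/s per Gbps of Fibre Channel) are assumed round numbers, not measured values.

```python
import math

# Approximate usable throughput per link in MB/s (assumed nominal figures).
LINK_MB_PER_S = {
    "4 Gbps Fibre Channel": 400,
    "8 Gbps Fibre Channel": 800,
}


def links_needed(vm_count, mb_per_s_per_vm, link_mb_per_s):
    """Return how many links are needed to carry the combined backup load."""
    demand = vm_count * mb_per_s_per_vm
    return math.ceil(demand / link_mb_per_s)


# Assumed workload: 20 VMs, each streaming its backup at 60 MB/s.
for name, capacity in LINK_MB_PER_S.items():
    print(name, "->", links_needed(20, 60, capacity), "link(s)")
```

The count here covers bandwidth alone; redundant paths for failover would come on top of it. Staggering backup windows lowers the simultaneous demand and is often cheaper than adding adapters.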
On the horizon, 8 Gbps Fibre Channel, 10 Gbps Ethernet and faster technologies such as solid-state disks will help break these storage barriers. But even with faster “pipes” in and out of servers, redundancy and failover technologies require more than one such pipe. Again, this brings us back to form factors. The 1U and 2U rackmount servers have limited real estate for the extra PCIe adapter cards that provide more physical I/O channels into and out of servers. The larger 4U servers offer room for more adapters. Blade servers currently offer the least I/O bandwidth per server of all the form factors. When outfitted to the maximum, a rack of 2U servers offers more than twice the I/O bandwidth of a rack of blade servers. Considering that a rack of blade servers can offer more CPUs than a rack of 2U servers, the effective I/O bandwidth of a single blade server is roughly one-third that of a 2U rackmount server. A 4U rackmount server can offer up to 50% more bandwidth than a 2U server. Current blade server technologies are clearly unbalanced because of limited I/O capabilities. Direct-attached storage options are also severely limited with blade servers compared with 2U and 4U systems. A 4U server can offer space for up to 16 2.5-inch direct-attached hard disks, whereas a blade can offer only up to two direct-attached disks per server blade.
Network I/O is similar to the storage I/O story. As multiple VMs battle for the few network adapters in the server chassis, network I/O demands on virtual server hosts can also bottleneck quickly. Multiple 1 Gbps Ethernet adapters are required to accommodate the bandwidth requirements of several virtual machines. In most instances, this puts a greater strain on I/O adapter real estate than does storage I/O.
In the near future, 10 Gbps Ethernet will be a commodity in server and blade systems and will greatly alleviate I/O bottlenecks. But repeatedly, history has proven that where more bandwidth is offered, applications and services, including increased virtualization consolidation, will simply fill the void. And even with these trends, rackmount servers will continue to have the edge because of the greater number of high-speed ports available per rackmount server.
Blade servers, however, are fighting back in this area. HP’s Virtual Connect backplane and the completion of a multiroot I/O virtualization specification by PCI SIG—an organization that develops and manages the PCI standard—allow for increased flexibility and access from individual blades to the Ethernet I/O ports packed onto the back of a blade chassis. But even with these advancements, the fact remains that physical real estate per server in a blade chassis will always be limited compared with that in a rackmount server.
Choosing a virtual server
Before server virtualization came on the scene, server form-factor and option selection was driven by the demands of the application that would be hosted on the server. This same approach still applies, but as discussed, virtualization makes these decisions more important. Applications that require more memory demand that much more from a host when virtualized. For example, file servers that regularly serve up large video files require large amounts of memory to cache those files for performance. Assuming that 16 GB of RAM is standard for a video file server, then consolidating 10 file servers through virtualization will require at least 160 GB of RAM on a physical host server. In this example, the server selected needs to maximize RAM and I/O bandwidth compared with CPUs; 2U or 4U rackmount servers are the best choice, and blade servers should be avoided.
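The file server example reduces to a capacity check against the per-server memory ceilings quoted earlier in this article (128 GB for a blade, 256 GB for a 2U server, 512 GB for a 4U server). The sketch below is a minimal illustration; the function and dictionary names are hypothetical, and hypervisor overhead is ignored for simplicity.

```python
# Maximum memory per server by form factor, in GB, as quoted in this article.
MAX_MEMORY_GB = {
    "blade": 128,
    "2U rackmount": 256,
    "4U rackmount": 512,
}


def form_factors_that_fit(required_gb):
    """Return the form factors whose memory ceiling covers the requirement."""
    return [name for name, capacity in MAX_MEMORY_GB.items()
            if capacity >= required_gb]


# Ten video file servers at 16 GB each, ignoring hypervisor overhead.
required_gb = 10 * 16
print(required_gb, "GB needed:", form_factors_that_fit(required_gb))
```

A fuller version would apply the same filter to I/O bandwidth and CPU cores, keeping only the form factors that pass all of the checks.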
Let’s consider another example. A financial application that calculates complex relationships between market data will require greater amounts of CPU compared with memory and I/O. While I/O bandwidth is important here, it is not as central as CPU needs. Blade servers may be the optimal choice for this application.
What is the best server form factor for virtualizing and consolidating general workloads? Typically the 2U or 4U server form factor offers the best balance of CPU, memory, storage and network I/O for virtual workloads. Since consolidation increases the effects of a server outage on data center operations (multiple virtual machines are out of service when the physical server hosting them goes offline), high-availability failover solutions become a requirement in most virtual environments. In general, balancing the number of VMs on each physical server node in a high-availability failover cluster—such that enough headroom exists on surviving cluster nodes to take on the virtual machines of the offline server node—points to the use of more 2U rackmount servers as opposed to fewer 4U rackmount servers. But again, the demands of the applications running in virtual machines may dictate other server form factors.
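The headroom argument can be made concrete with a small calculation. This sketch assumes a cluster that must survive the loss of one node, identical VMs, and hypothetical per-node capacities (40 VMs on a 4U server, 20 on a 2U server); none of these figures come from the article.

```python
def max_safe_vms_per_node(node_count, capacity_per_node):
    """Largest per-node VM load that still survives one node failure.

    After a failure, the remaining (node_count - 1) nodes must absorb
    every VM in the cluster without exceeding their own capacity.
    """
    if node_count < 2:
        return 0  # a single node has nowhere to fail over to
    return (node_count - 1) * capacity_per_node // node_count


# Two clusters with the same raw capacity of 160 VMs (assumed figures).
few_large = max_safe_vms_per_node(node_count=4, capacity_per_node=40)
many_small = max_safe_vms_per_node(node_count=8, capacity_per_node=20)

print("4 x 4U:", few_large * 4, "usable VM slots")
print("8 x 2U:", many_small * 8, "usable VM slots")
```

With the same raw capacity, the cluster of smaller nodes keeps less capacity idle as failover headroom, and each outage displaces fewer virtual machines.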
The bottom line is this: 2U form-factor servers offer a nice balance of CPU, memory and I/O for virtualized environments, while 4U form-factor servers offer increased memory and I/O bandwidth compared with CPU. The latter are best for memory- and I/O-intensive applications. Blade servers offer increased CPU compared with memory and I/O bandwidth and are best for computationally intensive applications and least useful for virtualized environments.
About the Author
Richard Jones is the vice president and service director of the Data Center Strategies practice at Burton Group, based in Midvale, Utah. He covers disaster recovery, business continuity, server operating systems, high availability and clustering, high-performance computing grids, virtualization, data center systems, and device management. Prior to joining Burton Group, Jones was responsible for SUSE Linux Enterprise Server, storage, high availability, business continuity, printing and Microsoft Windows integration technologies at Novell Inc. Jones has 23 years of experience in software engineering, engineering management, project management and product management in the power supply and networking software industry.