Imagine living in a crowded apartment with a bunch of people who think they own the place. Like roommates, operating systems and applications can be quite inconsiderate at times. For example, when they're running on physical machines, these pieces of software are designed to monopolize hardware resources. Now, add virtualization to the picture, and you get a lot of selfish people competing for the same resources. In the middle is the virtualization layer, acting as a sort of landlord or superintendent who is trying to keep everyone happy while still generating a profit. Such is the case with disk I/O (input/output) on virtualization host servers. In this tip, I'll discuss some options for addressing this common bottleneck.
Understanding virtualization I/O requirements
Perhaps the most important thing to keep in mind is that not all disk I/O is the same. When designing storage for virtualization host servers, you need to get an idea of the actual disk access characteristics you will need to support. Considerations include:
- Ratio of read vs. write operations
- Frequency of sequential vs. random reads and writes
- Average I/O transaction size
- Disk utilization over time
- Latency constraints
- Storage space requirements, including space for backups and maintenance operations
Collecting this information on a physical server can be fairly simple. For example, on the Windows platform, you can collect data using Performance Monitor and store it to a binary file or database for later analysis. When working with VMs (virtual machines), you'll need to measure and combine I/O requirements to define your disk performance goals. The focus of this tip is on choosing methods for storing virtual hard disk files, based on cost, administration and scalability requirements.
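As a rough illustration, the characteristics listed above can be derived from two snapshots of a disk's cumulative counters. The sketch below uses hypothetical sample numbers, and the `DiskCounters` structure is just a stand-in for whatever fields your monitoring tool (Performance Monitor, /proc/diskstats, etc.) actually exports:

```python
from dataclasses import dataclass

@dataclass
class DiskCounters:
    """A point-in-time snapshot of cumulative disk counters."""
    reads: int         # completed read operations
    writes: int        # completed write operations
    read_bytes: int
    write_bytes: int
    busy_ms: int       # time the disk was busy servicing requests

def io_profile(start: DiskCounters, end: DiskCounters, interval_s: float) -> dict:
    """Derive read/write ratio, average I/O size, IOPS, and utilization
    from two counter snapshots taken interval_s seconds apart."""
    reads = end.reads - start.reads
    writes = end.writes - start.writes
    ops = reads + writes
    xfer = (end.read_bytes - start.read_bytes) + (end.write_bytes - start.write_bytes)
    return {
        "read_write_ratio": reads / writes if writes else float("inf"),
        "avg_io_size_bytes": xfer / ops if ops else 0.0,
        "iops": ops / interval_s,
        "utilization_pct": 100.0 * (end.busy_ms - start.busy_ms) / (interval_s * 1000.0),
    }

# Hypothetical 60-second sample:
before = DiskCounters(reads=10_000, writes=5_000, read_bytes=400_000_000,
                      write_bytes=100_000_000, busy_ms=100_000)
after = DiskCounters(reads=16_000, writes=8_000, read_bytes=640_000_000,
                     write_bytes=160_000_000, busy_ms=130_000)
print(io_profile(before, after, interval_s=60.0))
```

Running the same calculation per VM and summing the results gives you an aggregate target to design the host's storage against.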
Local and direct-attached storage
The default storage option in most situations is local storage. The most common connection methods include PATA, SATA, SCSI, and SAS, each with its own performance and cost considerations. RAID-based configurations can provide fault tolerance and can be used to improve performance.
On the plus side, local and direct-attached storage is generally cheaper than the other options and offers low-latency, high-bandwidth connections that are reserved for a single physical server. The cons are the potential waste of storage space, since disk space is not shared across computers; limited total storage space and scalability due to physical disk capacity constraints, especially when implementing RAID; and more difficult management, as storage is decentralized.
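To make the capacity and performance trade-offs concrete, here is a small sketch of the standard usable-capacity formulas and write penalties (physical I/Os per logical write) for common RAID levels. The disk counts, sizes, and workload numbers in the example are hypothetical:

```python
# Standard RAID characteristics: usable capacity given n disks of a
# given size, and the write penalty (physical I/Os per logical write).
RAID_LEVELS = {
    # level: (usable_capacity(n_disks, disk_size), write_penalty)
    "RAID 0":  (lambda n, size: n * size,        1),
    "RAID 1":  (lambda n, size: (n // 2) * size, 2),
    "RAID 5":  (lambda n, size: (n - 1) * size,  4),
    "RAID 6":  (lambda n, size: (n - 2) * size,  6),
    "RAID 10": (lambda n, size: (n // 2) * size, 2),
}

def backend_iops(frontend_iops: float, read_fraction: float, level: str) -> float:
    """Translate the logical I/O load into physical disk I/Os,
    accounting for the write penalty of the chosen RAID level."""
    _, penalty = RAID_LEVELS[level]
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1.0 - read_fraction)
    return reads + writes * penalty

# Example: 8 x 600 GB disks in RAID 5, 2000 IOPS at 70% reads
capacity, _ = RAID_LEVELS["RAID 5"]
print(capacity(8, 600))                    # usable capacity in GB
print(backend_iops(2000, 0.7, "RAID 5"))   # physical IOPS the disks must absorb
```

Note how quickly write-heavy workloads inflate the physical I/O load on parity-based RAID levels; this is one reason measuring the read/write ratio up front matters.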
SANs and Fibre Channel
SANs (Storage Area Networks) are based on Fibre Channel connections, rather than copper-based Ethernet. SAN-based protocols are designed to provide high throughput and low latency, but they require the implementation of an optical network infrastructure. Generally, storage arrays provide raw, block-level connections to carved-out portions of disk space. The pros include:
- Can provide high performance connections
- Improved compatibility – appears as local storage to the host server
- Centralizes storage management
The cons are:
- Expensive to implement, as it requires Fibre Channel-capable host bus adapters, switches, and cabling
- Expensive to administer, as it requires expertise to manage a second "network" environment
Network-based storage and iSCSI
Network-based storage devices are designed to provide disk resources over a standard network connection, such as Ethernet. They most often support protocols such as Server Message Block (SMB) and Network File System (NFS), both of which are designed for file-level disk access. The iSCSI protocol provides the ability to perform raw (block-level) disk access over a standard network. iSCSI-attached volumes appear to the host server as if they were local storage. The pros of this approach include:
- Lower implementation and management cost (vs. SANs) due to utilization of copper-based (Ethernet) connections
- Storage can be accessed at the host- or guest-level, based on specific needs
- Higher scalability (arrays can contain hundreds of disks) and throughput (dedicated, redundant I/O controllers)
- Simplified administration (vs. direct-attached storage), since disks are centralized
The cons are:
- Applications and virtualization platforms must support either file-based access or iSCSI
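When sizing a shared array like this, a common back-of-the-envelope exercise is to add up per-VM IOPS and divide by a conservative per-spindle rating. The per-disk figures below are rough rules of thumb, not vendor specifications, and the VM workload in the example is hypothetical:

```python
import math

# Rough per-disk IOPS rules of thumb (conservative, not vendor specs).
DISK_IOPS = {"7.2K SATA": 80, "10K SAS": 130, "15K SAS": 180}

def disks_needed(vm_iops: list, disk_type: str, headroom: float = 0.7) -> int:
    """Estimate how many spindles a shared (e.g. iSCSI) array needs to
    absorb the combined load of all VMs, derated by a headroom factor
    so the disks aren't run at 100% of their rated IOPS."""
    total = sum(vm_iops)
    per_disk = DISK_IOPS[disk_type] * headroom
    return math.ceil(total / per_disk)

# Hypothetical: ten VMs averaging 120 IOPS each, on 10K SAS disks
print(disks_needed([120] * 10, "10K SAS"))
```

The result is only a starting point; controller caches, RAID write penalties, and burst patterns all shift the real number, which is why measurement and testing remain essential.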
Storage caveats: Compatibility vs. capacity vs. cost
In many real-world implementations of virtualization, an important bottleneck is storage performance. Organizations have well-defined methods for increasing CPU and memory performance, but what about the hard disks?
Direct-attached, network-based, and SAN-based storage can all be viable options. Once you've outgrown local storage from a capacity, performance, or administration standpoint, you should consider implementing iSCSI or file-based network storage servers. The primary requirement, of course, is that your virtualization layer must support the hardware and software you choose. SANs are a great option for organizations that have already made the investment, but some studies show that iSCSI devices can provide similar levels of performance at a fraction of the cost.
The most important thing to remember is to thoroughly test your solution before deploying it into production. Operating systems can be very sensitive to disk-related latency, and disk contention can cause unforeseen traffic patterns. And, once the systems are deployed, you should be able to monitor and manage throughput, latency, and other storage-related parameters.
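As a minimal sketch of the kind of post-deployment monitoring described above, the following flags sustained disk latency over a sliding window rather than one-off spikes. The 20 ms threshold, window size, and sample values are illustrative assumptions, not recommendations:

```python
def flag_latency_spikes(samples_ms, threshold_ms: float = 20.0,
                        window: int = 3) -> list:
    """Return the starting indexes of every sliding window whose average
    latency exceeds the threshold -- sustained contention, not noise."""
    flagged = []
    for i in range(len(samples_ms) - window + 1):
        avg = sum(samples_ms[i:i + window]) / window
        if avg > threshold_ms:
            flagged.append(i)
    return flagged

# Hypothetical per-second latency samples (ms) with a contention burst
samples = [5, 6, 7, 30, 45, 50, 8, 6]
print(flag_latency_spikes(samples))
```

Feeding this kind of check with real counters (for example, Performance Monitor's Avg. Disk sec/Transfer) lets you alert on contention before the guest operating systems start misbehaving.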
Overall, providing storage for virtual environments can be a tricky technical task. The right solution, however, can result in happy landlords and tenants, whereas the wrong one results in a seriously overcrowded apartment.
About the author: Anil Desai is an independent consultant based in Austin, Tex. He specializes in evaluating, implementing and managing solutions based on Microsoft technologies. He has worked extensively with Microsoft's Server products and the .NET development platform and has managed datacenter environments that support thousands of virtual machines. Anil is an MCSE, MCSD, MCDBA and a Microsoft MVP (Windows Server -- Management Infrastructure).