While the move to a virtual environment can solve myriad management problems, it also creates its own set of challenges and pain points. Though problems vary from shop to shop, there are several common concerns for those managing virtualized servers. This column groups the gripes and explores some strategies and products to make your life easier.
It's worth noting that, as with any technology, increased flexibility brings increased complexity in managing that technology. This is clearly the case with virtualization. It is often possible to start with a basic deployment to minimize complexity, but as you progress, configuration, planning, deployment, customization, change management, troubleshooting and so on only become more complex. As we move to more advanced instances of virtualization, we follow the development model of systems management: from scripting to point solutions to integrated, automated solutions to the holy grail of policy-based, autonomic management.
While there has been substantial progress with VMware Inc.'s tools like VMotion, VirtualCenter, High Availability (HA) and Distributed Resource Scheduler (DRS), users still struggle with management approaches that require too much do-it-yourself effort as well as a team to support ESX hosts. Configuration is still manual and involves lots of planning, OS builds, and scripting before and after template deployment. (New tools from VMware offer the promise of help here, however.) And just like adding an OS, a hypervisor adds its own layer of management -- and cost -- requiring training and maintenance of its own. In fact, upgrading a company's hypervisor to, say, ESX and VMware Infrastructure 3.5 (VI3.5) is sufficiently challenging that a good percentage of the VMware install base has yet to make the move.
Networking
Network configuration is an area where flexibility brings challenges. Each ESX host needs to be configured, and configuration details like virtual switch names must be the same for ESX servers within a cluster. The process requires lots of manual review and can introduce myriad errors. Some tools can help automate and move configurations, but you have to be careful about names. Other components like storage connections and system alarms must be set up manually. User account maintenance is configured in two locations: the ESX host (which uses Linux accounts), and VirtualCenter, where Windows accounts are configured either on the local server or in Active Directory. A best practice is to set up individual accounts with limited permissions on ESX hosts and VirtualCenter. That way, you limit what VirtualCenter users can do on the hosts to specific management tasks or to a specific group of ESX servers and virtual machines (VMs).
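The requirement that virtual switch names match across a cluster lends itself to a simple automated check. The following is a minimal Python sketch, not a VMware API: it assumes you have already exported each host's virtual switch names into a dictionary (the host names, the dictionary layout and the `find_vswitch_mismatches` function are all hypothetical), and it assumes names must match exactly, including case.

```python
def find_vswitch_mismatches(cluster):
    """Return {host: [names missing on that host]} for a cluster.

    `cluster` maps a host name to the set of virtual switch names
    configured on it. This layout is an assumption for illustration;
    it is not produced by any VMware tool as-is.
    """
    if not cluster:
        return {}
    # Every name seen anywhere in the cluster should exist on every host.
    all_names = set().union(*cluster.values())
    mismatches = {}
    for host, names in cluster.items():
        missing = all_names - set(names)
        if missing:
            mismatches[host] = sorted(missing)
    return mismatches


# Hypothetical inventory: the second host has a name that differs only in case.
example_cluster = {
    "esx01": {"vSwitch0", "vSwitch1"},
    "esx02": {"vSwitch0", "vswitch1"},
}

for host, missing in sorted(find_vswitch_mismatches(example_cluster).items()):
    print(host, "is missing:", ", ".join(missing))
```

In this made-up inventory, `vSwitch1` and `vswitch1` differ only in case, so each host is reported as missing the other's spelling. That is the kind of naming slip a manual review can overlook.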
Another challenging area is I/O. ESX hosts need enough network interface cards (NICs), especially as the ratio of virtual servers to physical servers increases. Watch for aggregate bandwidth constraints, and if you run into them, use DRS to move VMs around.
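One way to watch for aggregate bandwidth constraints is to compare the summed network demand of a host's VMs against the combined capacity of its NICs. The sketch below is illustrative only: the data layout, the `oversubscribed_hosts` function and the 80% threshold are assumptions, and the throughput figures would have to come from whatever monitoring you already run.

```python
def oversubscribed_hosts(hosts, threshold=0.8):
    """Return [(host, utilization)] for hosts above `threshold`.

    `hosts` maps a host name to a dict with:
      "nic_mbps": list of NIC capacities in Mbps
      "vm_mbps":  dict of VM name -> observed throughput in Mbps
    Both the layout and the default threshold are hypothetical.
    """
    flagged = []
    for name, info in hosts.items():
        capacity = sum(info["nic_mbps"])
        demand = sum(info["vm_mbps"].values())
        if capacity > 0 and demand / capacity > threshold:
            flagged.append((name, demand / capacity))
    # Worst offenders first, so they are the first candidates for rebalancing.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up figures for two hosts with two 1 Gbps NICs each.
example_hosts = {
    "esx01": {"nic_mbps": [1000, 1000], "vm_mbps": {"web": 900, "db": 800}},
    "esx02": {"nic_mbps": [1000, 1000], "vm_mbps": {"app": 300}},
}

for host, utilization in oversubscribed_hosts(example_hosts):
    print(f"{host}: {utilization:.0%} of aggregate NIC capacity in use")
```

A host that shows up in this list is a candidate for having VMs moved elsewhere.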
A best practice here is to isolate network traffic for VMotion. Sending memory maps from one ESX host to another requires the best performance and security possible. In nonblade environments, this generally involves separating NICs and switches for VMotion traffic. But with blades, it's hard to isolate network traffic this way. You can have separate NICs, but they will still use the same switches. You can use VLAN tagging and create a VLAN for VMotion as well as for ESX host management. Watch bandwidth utilization to ensure that VMotion and management functions get the bandwidth they need.
VMotion within a blade chassis works well: The move all happens within the embedded switch, so from a performance standpoint, clustering within a chassis is good. But you have a single point of failure and could lose the entire chassis. In light of redundancies, this may not be a likely scenario, but it still may be more of a risk than you're willing to take.
To fully exploit ESX on blades, there are other considerations. VLAN tagging in a blade environment allows you to run multiple virtual networks over a single physical network, which is important for separating traffic within the blade chassis. With ESX, you need to extend the VLAN tagging configuration into the ESX host configuration via VirtualCenter. In addition, using NIC teaming and link aggregation on the uplinks from the chassis allows load balancing and failover across those uplinks.
Patch management
With the ease of virtual provisioning comes the problem of virtual server sprawl, which increases the number of OS instances to be managed. Patching these systems and keeping them consistent and up to date is yet another pain point. Using templates and images can minimize this issue. VMware just announced the new VMware Update Manager tool in VMware Infrastructure 3.5, which will automate patch and update management for ESX hosts, templates and guest VMs. It will include the ability to patch offline VMs (including those that are powered off or suspended) and leverage DRS to allow patching to occur with minimal to zero downtime.
Planning
Many users have said that the major challenge with VMware is planning: estimating workloads, configuring, sizing and balancing. Many say that the move to virtualization is slower than you might expect because there are so many considerations. As with learning a new OS, training and expert help are highly recommended. Take advantage of the services offered by VMware, experienced value-added resellers (VARs) and key tool vendors (such as CiRBA for planning and PlateSpin for physical-to-virtual tools). Some vendors can study your environment and offer best practices for your objectives. Know what you want out of your environment, and determine your best strategy to achieve these goals.
Understand your application requirements so you can make the best use of tools like reservations, which set aside a certain amount of resources (that is, an explicit number of CPU cycles or amount of RAM) for a virtual server. A reservation defines the minimum resources that a virtual server claims at startup. If you overprovision and the resources aren't available, the VM won't start up (i.e., the host won't have sufficient memory to fulfill the reservation). Limits can also be used to cap the amount of physical resources a virtual server may access. This can be used to control unstable servers and prevent them from consuming all the resources of an ESX host.
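The interplay of reservations and limits can be illustrated with a small model. This is a simplified Python sketch of the behavior described above, not VMware's actual admission-control logic: it considers memory only, ignores virtualization overhead, and uses hypothetical names (`VM`, `can_power_on`, `usable_memory`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    name: str
    mem_reservation_mb: int              # minimum memory guaranteed at startup
    mem_limit_mb: Optional[int] = None   # cap on physical memory; None = no cap


def can_power_on(host_mem_mb: int, running: List[VM], candidate: VM) -> bool:
    """A VM starts only if its reservation fits in the unreserved memory."""
    already_reserved = sum(vm.mem_reservation_mb for vm in running)
    return already_reserved + candidate.mem_reservation_mb <= host_mem_mb


def usable_memory(vm: VM, demand_mb: int) -> int:
    """A limit caps what the VM can consume, however much it asks for."""
    if vm.mem_limit_mb is None:
        return demand_mb
    return min(demand_mb, vm.mem_limit_mb)


# Hypothetical host with 8 GB of memory and two VMs already running.
running_vms = [VM("db", 4096), VM("web", 2048)]
print(can_power_on(8192, running_vms, VM("app", 2048)))   # fits exactly
print(can_power_on(8192, running_vms, VM("batch", 4096)))  # reservation cannot be met
print(usable_memory(VM("flaky", 512, mem_limit_mb=1024), demand_mb=6000))
```

The first call succeeds because the three reservations add up to exactly the host's 8192 MB; the second fails because the reservation cannot be fulfilled. The last line shows a limit holding a runaway server to 1024 MB regardless of its demand.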
Understand your physical environment and availability needs. VMotion requires similar hardware to move VMs (i.e., it needs the same processor type, so you can't use VMotion to move from Intel to AMD processors). Keep this in mind when creating resource pools. DRS can move VMs to appropriate physical servers. (To be clear, VMotion copies memory to another server without shutting down the VM. HA restarts a VM on another server, much like a reboot, if the physical server fails. There are no hardware dependencies with HA like there are with VMotion.) In blade environments, resource pools work extremely well, allowing new blades to be added to the pool (or powered on, or left idle for n+1 availability and growth) and be up in minutes.
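When planning resource pools around the VMotion hardware constraint, it can help to group hosts by processor type up front. The sketch below is a deliberately coarse illustration: it compares only a processor label per host, which is an assumption for the example, and real compatibility checks are finer-grained than a vendor name. The host names and both functions are hypothetical.

```python
from collections import defaultdict


def vmotion_groups(host_cpus):
    """Group hosts by processor type: {cpu_type: [hosts]}.

    `host_cpus` maps a host name to a processor label such as "Intel"
    or "AMD". Hosts in different groups are not VMotion candidates
    for one another in this simplified model.
    """
    groups = defaultdict(list)
    for host, cpu in host_cpus.items():
        groups[cpu].append(host)
    return {cpu: sorted(hosts) for cpu, hosts in groups.items()}


def can_vmotion(host_cpus, source, target):
    """True if the two distinct hosts share a processor type."""
    return source != target and host_cpus[source] == host_cpus[target]


# Hypothetical blade inventory.
blades = {"blade1": "Intel", "blade2": "Intel", "blade3": "AMD"}

print(vmotion_groups(blades))
print(can_vmotion(blades, "blade1", "blade2"))
print(can_vmotion(blades, "blade1", "blade3"))
```

Because HA has no such hardware dependency, the same grouping does not apply to where HA can restart a failed host's VMs.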
VMware has also announced a new feature in VI 3.5 VirtualCenter called VMware Guided Consolidation. The tool helps with initial consolidation and virtualization planning. Using a wizard that discovers physical servers, it identifies consolidation candidates and can convert them into VMs, recommending placement onto specific ESX hosts.
Storage and systems management
Storage is the last of the major management pain points, which I'll cover in my next column. Several systems management vendors and startups deliver point products (at various stages of maturity, however) for performance, capacity planning, configuration management and security. Users are just starting to evaluate these next-level management issues, and I'll explore them in future columns.
About the authors
Barb Goldworm is president and chief analyst of Focus Consulting, a research and consulting firm focused on systems, software, and storage. Goldworm has spent 30 years in various technical, marketing, senior management, and industry analyst positions with IBM, Novell, StorageTek, Enterprise Management Associates, and several successful startups. Goldworm is a frequent speaker, columnist, and author of numerous white papers and research studies. She also recently released the book Blade Servers and Virtualization, chaired the Server Blade Summit on Blades and Virtualization, and has been the keynote speaker at numerous events on both virtualization and blades. Goldworm can be reached at email@example.com.
Craig A. Newell is a senior consultant at Focus Consulting and works with end users on needs/technology assessments and evaluation/implementation involving virtualization, server consolidation and blade systems. Newell has more than 15 years of experience in infrastructure, virtualization, networking, storage area networking, and blades, and has managed and supported virtualization assessment and proof-of-concept projects for major corporations and government agencies. Newell is a certified project management professional, a VMware certified professional, a certified wireless network administrator, and a certified business continuity planner, and served as a technical editor for Blade Servers and Virtualization.