Every VM needs network access, but VM network performance issues, such as excessively long ping response times, can time out database queries or storage access. IT administrators need to track down the underlying causes of such problems and take immediate corrective action.
First, rule out LAN problems. These are often traced to network congestion, such as an extremely busy workload operating on the same network segment. An example is a busy antimalware tool, intrusion detection/prevention system or packet detection tool that demands too many network resources.
Other common VM network problems include IP conflicts and faulty or poorly configured network equipment, such as erroneous traffic shaping settings. Fortunately, network problems typically affect multiple VMs on the same network segment rather than just one VM, which can quickly lead troubleshooters to the network.
Second, isolate the problem to the host server hardware. Many issues can conspire to disrupt a VM and impede network access, including:
- a poorly configured basic input/output system or firmware;
- an improperly configured or inadequately patched OS;
- a failed or improperly configured network port or incorrectly set network adapter type, such as an adapter other than VMXNET3 on VMware platforms; and
- outdated or improperly patched VM drivers, such as an outdated version of VMware Tools.
Such problems might affect multiple VMs running on a common host, which directs troubleshooters to the host system rather than the VM itself.
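The first two steps come down to one question: do the slow VMs share a network segment, a host or nothing at all? The following is a minimal Python sketch of that triage, not a vendor tool. The VmSample record, the triage function and the 100 ms threshold are all hypothetical names and values chosen for illustration, and the ping times would have to come from your own monitoring.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VmSample:
    """One measurement per VM (hypothetical record, for illustration only)."""
    name: str
    host: str
    segment: str
    ping_ms: float


def triage(samples, slow_ms=100.0):
    """Suggest where to look first, based on which VMs are slow.

    slow_ms is an assumed threshold; pick one that suits your workloads.
    """
    slow = [s for s in samples if s.ping_ms > slow_ms]
    if not slow:
        return "no slow VMs"

    def fully_slow_groups(key):
        # Group VMs by segment or host and keep the groups that hold
        # at least two VMs, all of which are slow.
        groups = defaultdict(list)
        for s in samples:
            groups[key(s)].append(s.ping_ms > slow_ms)
        return sorted(g for g, flags in groups.items()
                      if len(flags) > 1 and all(flags))

    segments = fully_slow_groups(lambda s: s.segment)
    if segments:
        return "check network segment(s): " + ", ".join(segments)
    hosts = fully_slow_groups(lambda s: s.host)
    if hosts:
        return "check host(s): " + ", ".join(hosts)
    return "check individual VM(s): " + ", ".join(sorted(s.name for s in slow))
```

The sketch checks segments before hosts because the article rules out the LAN first. A real environment would need more samples per VM and some tolerance for partially affected groups.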
One commonly overlooked host problem in Windows Server environments is an improperly set power plan. Changing the Windows Server power plan from Balanced to High performance can often overcome performance problems in latency-sensitive VM workloads.
Although the power plan setting is actually a host issue rather than a VM one, an aggressive power conservation mode might not affect all the VMs on an afflicted host -- at least not to the same degree -- so it's worth a separate check of the Windows Server power plan configuration.
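One way to check the power plan without opening Control Panel is to read the output of the Windows powercfg /getactivescheme command. The sketch below is an illustration under stated assumptions rather than a supported tool: it assumes the usual single-line output that contains the scheme GUID followed by the plan name in parentheses, and it compares against the built-in GUID of the High performance plan. The function names are hypothetical.

```python
import re
import subprocess

# GUID of the built-in Windows "High performance" power plan.
HIGH_PERFORMANCE_GUID = "8c5e7fa7-e8bf-4a96-9a85-a6e23a8c635c"

_SCHEME_RE = re.compile(
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s*(?:\((.*?)\))?"
)


def parse_active_scheme(powercfg_output):
    """Return (guid, name) parsed from `powercfg /getactivescheme` output.

    Returns None if no GUID is found. The name is an empty string when
    the output carries no parenthesized plan name.
    """
    match = _SCHEME_RE.search(powercfg_output)
    if match is None:
        return None
    return match.group(1).lower(), (match.group(2) or "").strip()


def is_high_performance(powercfg_output):
    """True when the active plan is the built-in High performance plan."""
    scheme = parse_active_scheme(powercfg_output)
    return scheme is not None and scheme[0] == HIGH_PERFORMANCE_GUID


def read_active_scheme():
    """Windows only: run powercfg and return its raw text output."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a Windows host, is_high_performance(read_active_scheme()) reports whether the plan needs attention. Matching on the GUID rather than the plan name keeps the check independent of the display language. A custom plan with its own GUID would be reported as not High performance even if its settings are equivalent.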
Mismanaged processor resources
Another common cause of VM network performance problems is virtual processor overcommitment, which occurs when the VMs running on a host are allocated more virtual processors in total than the host has physical processors available.
When the load on running VMs is relatively light, there are enough processor cycles to go around and all of the VMs can operate adequately. But if processor demand increases beyond the available processor cycles, the hypervisor host must parcel out the available processor time as best it can through the CPU scheduler.
This isn't a perfect process, and some VMs might not receive adequate processor time, resulting in poor workload and network performance. In this case, a high CPU ready time revealed by diagnostics such as esxtop can indicate overcommitted processors. Troubleshooters might need to eliminate overcommitment by reallocating virtual processors or migrating some VMs to better balance VM processor needs against available CPUs.
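The arithmetic behind an overcommitment check is simple enough to sketch. The Python below is a rough illustration, not a VMware API. It computes the ratio of allocated virtual processors to physical cores, converts a CPU ready summation in milliseconds into a percentage of the sampling interval, and flags VMs whose ready percentage per virtual processor exceeds a limit. The function names are hypothetical. The 20-second interval and the 5% per-vCPU limit are assumptions: the first matches a common real-time chart interval, and the second is a widely used rule of thumb, not a figure from this article.

```python
def overcommit_ratio(vcpus_per_vm, physical_cores):
    """Total allocated vCPUs divided by physical cores on the host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm.values()) / physical_cores


def ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU ready summation (ms) to percent of the interval.

    interval_s is the sampling interval in seconds (assumed 20 here).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


def flag_high_ready(ready_pct_per_vm, vcpus_per_vm, per_vcpu_limit=5.0):
    """Return the names of VMs whose ready % per vCPU exceeds the limit.

    Assumes ready_pct_per_vm holds a per-VM total summed across that
    VM's vCPUs, so the value is divided by the vCPU count first.
    """
    flagged = []
    for name, total_pct in ready_pct_per_vm.items():
        if total_pct / vcpus_per_vm[name] > per_vcpu_limit:
            flagged.append(name)
    return sorted(flagged)
```

For example, three VMs with 4, 8 and 4 virtual processors on an 8-core host give a ratio of 2.0. A ratio above 1.0 is not a problem in itself. It becomes one when ready time climbs, which is why the two checks belong together.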
Finally, verify the use of receive side scaling (RSS) as a network feature. RSS spreads out VM network traffic across multiple processors, which can improve processor efficiency. Without RSS, all of the VM network traffic will need to be processed through a single CPU, which can cause a performance bottleneck.
RSS requires a suitable network adapter, OS and drivers, such as the VMXNET3 virtual network adapter and its drivers, along with VM hardware version 7 or higher. Make sure the elements needed to support RSS are in place and enabled.
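The RSS prerequisites listed above can be expressed as a small checklist. The following Python sketch assumes you have already gathered each VM's adapter type, hardware version and in-guest RSS state from your own inventory. The VmNetConfig record and the missing_rss_prerequisites function are hypothetical names, and the code does not query a hypervisor or guest OS.

```python
from dataclasses import dataclass


@dataclass
class VmNetConfig:
    """Inventory facts for one VM (hypothetical record for illustration)."""
    adapter_type: str            # for example "vmxnet3" or "e1000"
    hardware_version: int        # VM hardware version number
    rss_enabled_in_guest: bool   # as reported by the guest OS


def missing_rss_prerequisites(cfg):
    """Return a list of unmet RSS prerequisites; empty means all are met."""
    problems = []
    if cfg.adapter_type.strip().lower() != "vmxnet3":
        problems.append("adapter type is not VMXNET3")
    if cfg.hardware_version < 7:
        problems.append("VM hardware version is below 7")
    if not cfg.rss_enabled_in_guest:
        problems.append("RSS is not enabled in the guest OS")
    return problems
```

In a Windows guest, the Get-NetAdapterRss PowerShell cmdlet is one place to read the in-guest RSS state. How you collect the adapter type and hardware version depends on your management tooling.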