The modern hypervisor has made regular server maintenance tasks an afterthought. In the ideal data center, these systems run 24/7 under the constant supervision of complex monitoring tools integrated with automation applications that perform maintenance as needed. Unfortunately, that ideal isn't easily achieved: while we may have some monitoring and automation, very few organizations have automated the entire process. This means we still have to perform server maintenance tasks in our virtual environments.
Many traditional server maintenance tasks no longer apply once virtualization removes much of the hardware involved. In fact, moving Windows operating systems to a virtualized environment has helped increase their availability by replacing hardware drivers that had to support advanced features, such as temperature sensors that monitor hardware health, with simpler drivers for the hypervisor. These advances, combined with our fast-paced workplace, make it easy to forget the basic server maintenance tasks that keep our data centers healthy.
Perform occasional reboots
Modern hypervisors seem bulletproof, but they are still software and, like most software, are susceptible to bugs, memory leaks and crashes. These issues occur less often in hypervisors than in traditional operating systems, largely because a hypervisor's narrow purpose means it has far less code. One of the most common troubleshooting suggestions has always been to reboot. In the modern computing world, rebooting a server is often seen as an unrealistic option, yet in many cases it is the only fix that works. It's not that software has gotten worse over time; rather, what we ask it to do has grown far more complex, and so have the fixes.
With VMware vMotion and Hyper-V Live Migration, administrators can reboot hosts without causing outages to the virtual machines (VMs) running on them. Yet ask virtualization administrators what their reboot cycle is and they might wonder what you're talking about. In practice, hosts tend to run until they need an upgrade or suffer a purple screen of death (PSOD) failure. We typically schedule upgrades, but a PSOD is unpredictable. Traditional maintenance reboots can prevent some of these problems. The hypervisor is a very solid piece of software, but a clean reboot every 90 days can help keep some of those software gremlins at bay and may even reduce your headaches.
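Because live migration lets a cluster absorb one host's VMs while that host restarts, a 90-day reboot cycle can be planned as a rolling schedule. The sketch below is a minimal illustration, assuming a hand-built inventory of `(host, cluster, last_boot)` tuples; the function name `plan_reboot_window`, the host names and the dates are all hypothetical, and in a real environment the boot times would come from your management tools. It picks at most one overdue host per cluster per maintenance window, longest uptime first.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def plan_reboot_window(hosts, now, max_uptime_days=90):
    """Pick at most one overdue host per cluster for this maintenance window.

    hosts is a list of hypothetical (host_name, cluster_name, last_boot) tuples.
    Rebooting a single host per cluster leaves the remaining hosts free to
    absorb its VMs via vMotion or Live Migration.
    """
    cutoff = now - timedelta(days=max_uptime_days)
    overdue = defaultdict(list)
    for name, cluster, last_boot in hosts:
        if last_boot <= cutoff:
            overdue[cluster].append((last_boot, name))

    plan = {}
    for cluster, candidates in overdue.items():
        candidates.sort()  # earliest boot time (longest uptime) first
        plan[cluster] = candidates[0][1]
    return plan


# Illustrative inventory; names and dates are made up.
now = datetime(2024, 6, 1)
inventory = [
    ("esx01", "prod", datetime(2024, 1, 10)),
    ("esx02", "prod", datetime(2024, 2, 20)),
    ("esx03", "prod", datetime(2024, 5, 15)),
    ("hv01", "dev", datetime(2024, 4, 1)),
]
print(plan_reboot_window(inventory, now))  # {'prod': 'esx01'}
```

Here esx02 is also overdue but waits for the next window, so the "prod" cluster never loses more than one host at a time; hv01 has been up for fewer than 90 days and is left alone.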
Clean up snapshots
Besides making it possible to migrate running VMs, one of the best features of a virtualized environment is the snapshot. A snapshot captures a moment in time, creating an "undo" option for an entire server, and snapshots have saved countless administrators headaches when upgrades and patches go wrong. The problem is that a snapshot typically freezes the original VM's disk file while the VM keeps running, so every change is written to a separate change log. On a busy server, this change log can grow by gigabytes in just a few hours. If left unchecked, it can fill an entire datastore, cause crashes and corrupt VMs.
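A rough way to see how quickly an open snapshot becomes dangerous is to divide the datastore's free space by the rate at which the change log grows. The helper below is a back-of-the-envelope sketch, assuming a constant growth rate; real growth depends on the VM's write pattern, and the function name and figures are hypothetical.

```python
def hours_until_full(free_space_gb, growth_gb_per_hour):
    """Estimate hours before a snapshot change log fills the datastore.

    Assumes the change log grows at a constant rate, which real workloads
    rarely do; treat the result as an order-of-magnitude warning only.
    """
    if growth_gb_per_hour <= 0:
        return float("inf")  # no growth means the datastore never fills
    return free_space_gb / growth_gb_per_hour


# Illustrative numbers: 200 GB free, change log growing 2.5 GB per hour.
print(hours_until_full(200, 2.5))  # 80.0
```

At that rate, a snapshot forgotten on a Friday afternoon could fill the datastore before Tuesday morning.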
A fellow administrator once told me that his primary Active Directory domain controller, with a 30 GB disk, had a 680 GB snapshot change log. At that size, it was no longer possible to delete the snapshot, and all he could do was keep adding storage until he could retire the server. That is an extreme example of a runaway snapshot, though I doubt it is unique. Alerts can help, but a simple weekly routine of checking for open snapshots can keep one of these monster snapshots from growing on your servers. A favorite free tool for this is RVTools, which combines a simple interface with the ability to export specific data to Excel for easy viewing and sorting.
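The weekly check can be partly scripted once snapshot data has been exported to CSV. The sketch below assumes hypothetical column names (`VM`, `Snapshot`, `Created` as YYYY-MM-DD, and `SizeGB`); they are placeholders, not RVTools' actual export headers, so rename them to match whatever your export contains. It lists snapshots older than a week, largest first.

```python
import csv
import io
from datetime import datetime, timedelta


def stale_snapshots(csv_text, now, max_age_days=7):
    """Return (vm, snapshot, age_days, size_gb) for snapshots older than
    max_age_days, sorted largest first.

    Column names are hypothetical placeholders for an exported snapshot list.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        created = datetime.strptime(row["Created"], "%Y-%m-%d")
        if created < cutoff:
            age_days = (now - created).days
            stale.append((row["VM"], row["Snapshot"], age_days, float(row["SizeGB"])))
    stale.sort(key=lambda item: item[3], reverse=True)  # biggest risk first
    return stale


# Illustrative export; VM names, dates and sizes are made up.
export = """VM,Snapshot,Created,SizeGB
dc01,pre-patch,2024-03-01,250
web01,upgrade,2024-05-30,4.5
sql01,before-cu,2024-05-10,120
"""
for vm, snap, age, size in stale_snapshots(export, datetime(2024, 6, 1)):
    print(f"{vm}: '{snap}' is {age} days old, {size:.1f} GB")
```

The snapshot on web01 is only two days old and is not reported; the other two are flagged for review before they grow further.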
Remember to patch operating systems
Once we create a VM, we often forget about its operating system, apart from Windows patching. VMs can achieve greater uptime than their physical counterparts because they lack the vendor-specific drivers and BIOS updates that require downtime. Since VMs typically have a single set of drivers (VMware Tools or Hyper-V Integration Services), maintenance is reduced but not eliminated, and in today's busy IT world those updates are easy to forget. While most virtual guests can continue to operate with older versions of these hypervisor tools, they may run less efficiently. If the tools are old enough, they may not support some of the new features in the latest hypervisor release.
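Tracking tool versions across guests amounts to comparing each installed version against two thresholds: the current release and the oldest version still supported. The sketch below assumes you already have version strings for each guest; the thresholds, guest names and version numbers are hypothetical and do not reflect any real VMware Tools or Integration Services release policy.

```python
def parse_version(text):
    """Turn '12.3' or '12.3.0' into a comparable tuple such as (12, 3, 0)."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '10.2' compares like '10.2.0'
    return tuple(parts)


def classify_tools(installed, current, minimum_supported):
    """Classify a guest's hypervisor tools version (tuples compare element-wise)."""
    if installed >= current:
        return "current"
    if installed >= minimum_supported:
        return "outdated"     # still works, but may run less efficiently
    return "unsupported"      # may lack features of the newer hypervisor release


# Hypothetical thresholds and guest inventory.
CURRENT = parse_version("12.3.0")
MINIMUM = parse_version("11.0")
guests = {"web01": "12.3.0", "sql01": "11.0.5", "dc01": "10.2"}

for name, version in guests.items():
    print(name, classify_tools(parse_version(version), CURRENT, MINIMUM))
```

Padding versions to three components avoids a subtle Python quirk: without it, the tuple `(12, 3)` compares as less than `(12, 3, 0)` even though both describe the same release.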
As a group, administrators continually push ahead with new technology, often without looking back. Reboots, snapshots and driver updates are not as flashy as the latest technologies from VMware and Microsoft. However, we cannot forget that each new release or product adds to our duties rather than replacing them.