This is my attempt at putting together something that every sysadmin and supervisor should have when they look at whether to take the next step in virtualization – a Common Sense (tm, patent-pending, sm, r) guide to putting virtualization in place. Truth be told, it works with any server virtualization product, including VMware’s VI3, ESX2.x, and Server, Virtual Iron 3.x, Xen 3.x, etc., but as I’m more familiar with VMware’s product line, that’s my default reference. I’m also taking some creative liberties and coining a new phrase – servirt, short for server virtualization. Lets see if it catches on (no, I don’t really expect it too – it just doesn’t sound as cool as Bennifer or Brangelina).
Rule #1: Don’t put your firewall in your servirt environment.
This rule, along with any future “don’t put your X in your servirt environment” rules, is geared mostly towards security. If your host system is compromised, it’s not a far step before your guest systems are compromised. Given a proper topology, a compromised set of virtual machines is no more dangerous than a compromised set of standard machines, but throw in an outward facing device and you have all the makings of the next NY Times poster child for data-protection reform. It is only a matter of time before serious servirt malware is designed to sneak into guests on an infected machine through the hypervisor (so called Guest-to-Guest, G2G, or GtG attacks). Imagine this situation – a host running Exchange, Oracle, Siebel CRM, ISA, and Sharepoint as guests. The host is infected, finds the app servers, infects them with a data collector, then finds the ISA server, infects it, and uses it as a giant gateway to stream all of your customer records to www.iownzyercreditzfilezd00d.info over http. All without hopping a physical box. For those who say I’m full of it, that such vulnerabilities are impossible or way off, all things are possible.
Rule #2 – Use your virtual switches wisely
It’s tempting to put everything on one virtual switch and then let the host handle the load. This doesn’t often get covered in a lot of howto documents, but it’s important. It’s particularly important in VMware VI3, because of the major improvements VI3 brings to virtual switching and the advantages present in those changes. Honestly, I think VMware should have made a bigger deal in marketing the switching improvements, but I’m not a marketing guru by ANY stretch of the imagination. With the recent Cisco/VMware rumors, the switching in VI3 may get the credit that it’s due. Anyway… Treat virtual switches as you would physical switches – create network segments with care and planning, vlan when necessary (if your product supports it), and make sure that if you have a lot of host servers, that your virtual networks align with the hosts. Packet storms should not take out your entire virtual environment!
Rule #3 – Don’t mess with system requirements. Ever.
Not on the guests. Not on the hosts. Not on the management boxes. Not ever. Very often it can be tempting to put less memory into a guest machine than optimally necessary in order to conserve limited physical memory on the host. Sometimes it’s even tempting to save a thousand dollars on a server (particularly if the spec you optimally need is a thousand dollars per server over your allocated budget) by cutting out memory, dropping the CPU down to a slightly slower model, using lower rpm hard drives, etc. DON’T DO THIS! It may be fine, but it can also come back to haunt you. In the guest-scenario, it may seem easy to say “If I hit a problem, I’ll just up the guest’s RAM”, but it’s a lot tougher saying that when the physical machine is maxed-out and is full of other in-production guests using up all that RAM. I’ve done this. It sucked. I had to take down a host server that impacted five departments, including finance, because I tried to squeak through without spending the dollars at a time when I was under a serious budget crunch. Oh, and the guest server in question that needed more RAM – a Jabber -based internal Instant Messaging server. Not exactly mission-critical, but it had a high profile because it was very visible to the entire company every time it mem-locked and dumped out. Lesson learned.
Rule #4 – There is NEVER a rule #4.
This is from an old USENET post, and I can’t find the reference to link it to. It was funny back in the day, and I’ve kept it up since.
Rule #5 -Use the freebies!
XenSource XenExpress, Xen itself, VMware Player, VMware Server, Virtual Iron Single Server Edition, and a host of other similar applications are free to try, and free to use. Some require host OS licenses that aren’t free (ahem, Microsoft, that’s you guys…) but most will run on free OSes like Linux and/or FreeBSD. I have a whole lab set up with Virtual Server on CentOS Linux, and it works great. We use VMware player to distribute some legacy applications that don’t play well on XP and/or Vista. Also, don’t forget about the free P2V tools out there. VMware’s free P2V converter is great – almost as powerful in the P2V side as enterprise products like Platespin’s PowerConvert. While we wait on new hardware to test Virtual Iron, we’re using a great freebie tool, that we found here to get a jumpstart and convert some of our testlab vmware machines to Microsoft’s VHD format, which we will then import into Virtual Iron. Before we even decided to do virtualization (ok, after we decided virtualization fit the business/finanacial/techincal needs of the company, but before we commited to it) we used demo versions of VMware as a proof of concept. The point is, there’s not a stage in your servirt environment’s development that can’t benefit from the judicious application of a little frugality. Except when it comes to system specs (see rule #3).
Rule #6 – Read the Famous Manuals
And the white papers. And the Wikipedia entries. And the promotional marketing material (if you’re into that kind of pain). In the case of Virtual Iron, read the forums… you might just find the install and admin docs there (yes, that’s a criticism of your website, Virtual Iron) Read whatever you can read on the subject of your servirt environment. For example, when the company I’m with went looking at VI3, I read through a ton of literature and came across an HP document that was immensely valuable. I also found this chart very useful, albeit becoming outdated. When we first embarked on our trip towards virtualization there must have been a gigabyte of material on my hard drive about VMware’s product offerings (at the time, that consisted of ESX and GSX). As we’ve progressed, I’ve accumulated a small library of PDF files, demo software, and links. Small like the New York Public Library is small.
Rule #7 – Don’t put all of your Active Directory domain controllers on the same hosts.
If you do, you’re in for trouble when a host falls over and goes boom. And they do, once in a while, fall over and go boom. Or they may face a G2G attack, in which case your entire AD environment is hosed. If you’re a Novell shop, good for you, but don’t put all your eDirectory servers on one host either. Red Hat Directory Server shop? See above. If you’re using VMware’s VI3, make sure that HA/DRS is configured to prevent all your directory servers from being on the same host, because even if you design it so it won’t happen by laying out the AD controller guests on different hosts, you’re just a slice of probability and a few resource utilization spikes from DRS putting them all on the same srver for you. Me, I leave one AD controller out in non-virtual land just because I can (ok, because I have a spare old server that does nothing else).
Rule #8 – Document Everything
The usual rule of document everything goes here, like it does everywhere. I won’t go into the obvious points, but there are a couple of not-so-obvious points that need to be mentioned. Naming conventions… there’s been some good talk about this, and I won’t repeat it, but remember to name your servers appropriately so when you do a quick network scan you can tell what’s what from the resolved names. Remember, not all management happens in a console, even if you are 100% virtual. What’s supposed to be where… this can change a lot in a well-designed servirt environment because of HA/DRS and similar tools, but document the starting points for all virtual servers and take regular performance metric updates to see what has moved where and why it’s moved.
Rule #9 – Switches and NICs, Switches and NICs, Gonna Get Me Some Switches and NICs.
This is about NICs, really. Lots of NICs. You can never have enough NICs. Fill every slot you can with NICs. Have the available external switch ports to support those lots of NICs. Why? Because some applications will eat your bandwidth like it was Kibble n’ Bits. That means some virtual machines will choke others, given the chance. To get off the 80’s dogfood commercial metaphor and onto a gardening metaphor, bandwidth is like sunlight, and some apps are like pretty weeds. The soak up everything they can, leaving little for others. You can’t kill them, either. If you find you’re in a situation like this, having lots of NICs in your server can make all the difference, because now you can add it to the virtual machine weed you’ve got and essentially transplant it away from the rest of the environment. Some care needs to be taken with HA/DRS, but in that case you need to look more at teaming and aggregating those many NICs and switch ports properly.
Rule #10 – Storage
In some cases, violate rule #5. In our lab, we started with freebie OpenFiler as an iSCSI solution, until it came time to test VMotion. Sometimes it went boom. Other times it was fine. We couldn’t figure out why until we followed rule #6 and found out it was a problem with IET’s (the iSCSI target under OpenFiler) use of SCSI commands vs. VMware’s interpretation. The point being, this is an extension of rule #3… only about storage. Having the right storage environment is crucial… you can have your servirt environment set up 100% perfectly, but if your storage isn’t 100% perfect you’re going to run into all kinds of problems with moving guests around. Since that’s the whole POINT of virtualization’s DR advantage, having a bad storage strategy is essentially having a bad DR strategy. This is to say, anything under 100% on both is an F… because in DR there are no Bs, C, or Ds… just an A+ and an F. For what it’s worth, the problems with OpenFiler and VMware seemed to be fixed at this point, and we’ve gone back to using it in test environments for possible production use once we have 100% confirmation.
Well, that’s it for now… another set of these common sense ideas will probably be forthcoming (maybe after I’ve finished playing with Virtual Iron in the lab and actually get around to posting that long-promised review).