The overwhelming majority of VMware shops use block-based Fibre Channel or iSCSI storage for their VMware ESX Server data stores, but some are beginning to ask whether Network File System (NFS) might be a better fit.
IT business services provider T-Systems, for example, made the architectural decision to go with NFS a couple of years ago when it was designing its VMware-based Dynamic Services managed hosting platform. "We really felt that networked NFS was the best option," said Dr. Gregory Smith, Dynamic Services director. "SAN [storage area network] has the advantage in terms of throughput, but in terms of flexibility and management, NFS is much better."
Because NFS is an industry-standard shared file system, its volumes are easy to mount and dismount, are supported by any number of clients, and are well understood by IT staffers.
Smith also touted NFS' security benefits. Since NFS runs over Ethernet, T-Systems takes advantage of virtual LANs (VLANs) to isolate virtual machines (VMs) from one another. "The VMs just see their stuff. You just expose this file system to this VLAN, and you don't have to deal with SANs, switches and all that stuff."
In contrast, housing VMs in VMware's VMFS on a SAN does not currently allow you to isolate one VM's storage from another's if you want to use VMotion across ESX hosts. Support for N_Port ID Virtualization (NPIV) in ESX Server 3.5 will change that, but even then, NPIV will be supported only on raw devices, not on VMFS, VMware's proprietary file system.
T-Systems hosts applications for about 70 customers serving 80,000 users on its VMware-based Dynamic Services platform, which runs on network-attached storage (NAS) from Network Appliance Inc. (NetApp). The firm uses "a good chunk of the [NetApp] OnTap suite," Smith said, including SnapMirror software for replication, SnapVault to create a 30-day archive, and MetroCluster to create an active compute cluster between geographically dispersed data centers.
Performance penalty moot?
But what about performance? "When you tell people you're going to run VMware on NFS, people look at you like you're nuts," said Scott Lowe, virtualization blogger and consultant with ePlus Inc. in Herndon, Va. That's because, in terms of raw throughput, NFS on Gigabit Ethernet can't compete with Fibre Channel's 4 Gbps. "If all you're looking at is raw throughput, Fibre Channel beats the pants off NFS," Lowe said.
But Lowe and some of his peers have noticed an interesting paradox: The more VMs an environment has, the less Fibre Channel's throughput advantage matters. In fact, in very large environments, NFS may actually perform better than Fibre Channel, Lowe said.
At issue is contention for disk access by multiple VMs. "When you start to load up the number of VMs in a data store, you run into a SCSI locking issue that prevents all the open hosts from accessing the LUN [logical unit number]," he said. "The more hosts you have, the less of an issue [Fibre Channel's] raw throughput becomes, since you can't always access that throughput."
Not everyone concurs, however. "That sounds like a lot of FUD [fear, uncertainty and doubt]," said Greg Schulz, founder of the StorageIO Group in Stillwater, Minn. "Certainly, if you set up the array for contention, you're going to have some issues. But if you follow vendors' best practices, you shouldn't have a problem."
Schulz said that most VMware-certified Fibre Channel SAN arrays come with published "cookbooks" that detail the speeds and feeds of the systems as well as the maximum recommended number of hosts and VMs per array port.
Further, disk contention isn't unique to SAN, Schulz said. "The same thing can happen on a NAS device, except that there, instead of talking about SCSI initiators and targets, you're talking about the number of threads or sessions."
James Price, CEO of Fairway Consulting Group Inc. in Sunrise, Fla., flatly rejects the idea that NAS has a performance advantage. "NAS has a place," he said, "but I don't think you'll find any schooled storage architects that will tell you with a straight face that it should be used as a primary storage platform."
First, Price said, the disk contention issue that Lowe cited may have been a factor in previous versions of ESX, but it has been alleviated in the new version of VMFS that shipped with Virtual Infrastructure 3. Instead of all the hosts in a cluster sharing a single journal for disk access, Price explained, each host now has its own dedicated journal, which eliminates the locking issue.
Second, Price worries about the additional layer of complexity introduced by a third-party file system, NFS. "The real danger of NAS is doing block I/O against someone else's file system," he said. "You have no visibility into that from ESX." Furthermore, there are all sorts of things you can't do with NFS as your storage repository: "out-of-band operations for backup, clustering, boot from SAN, raw devices -- you can't do any of that with NFS."
Nevertheless, more customers are taking the NFS route, according to ePlus' Lowe, including "one large-scale customer that is initiating a migration off top-tier Fibre Channel." Performance questions aside, part of NFS' allure is that certain NAS devices, such as those from Network Appliance, come with vendor-specific features. For that customer, backup was the deciding issue. "For them, the idea of backing up all that data was a tremendous question that they did not have an answer to. They had looked at [VMware Consolidated Backup] and weren't at all pleased with it," Lowe said. Instead, they will store data on NetApp NFS volumes. That way, "they can get to the data from any mature client, and backing up is much easier."
Let us know what you think about the story; email: Alex Barrett, News Director.