It is common for file systems on a Linux server to be configured to remount in read-only mode if errors are detected. Unfortunately, this setting in combination with VMware VI3 can have both unforeseen and undesirable consequences.
According to the man page, a Linux file system can be configured to behave three different ways if errors occur:
errors=continue / errors=remount-ro / errors=panic
Either ignore errors and just mark the file system erroneous and continue, or remount the file system read-only, or panic and halt the system.
The default is set in the file system superblock, and can be changed using tune2fs(8).
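As a minimal sketch of how to inspect and change that superblock default, the helper below extracts the "Errors behavior" field from `tune2fs -l` output. The function name `errors_behavior` and the device `/dev/sda1` are placeholders chosen for illustration, not part of the original article.

```shell
#!/bin/sh
# Print the error behavior recorded in an ext2/ext3 superblock.
# Reads `tune2fs -l` output on stdin, so it can be tried without a real device.
errors_behavior() {
    # The field looks like "Errors behavior:          Continue"
    awk -F': *' '/^Errors behavior:/ { print $2 }'
}

# Typical use, as root (the device name /dev/sda1 is an assumption):
#   tune2fs -l /dev/sda1 | errors_behavior
#   tune2fs -e remount-ro /dev/sda1    # set the superblock default
```

The same behavior can also be requested per mount with the errors= mount option, which takes precedence over the superblock default.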
The first choice, to continue, may be fine for systems that contain unimportant data, but in most environments it is not acceptable for a server to carry on after a write error as if nothing happened. The third option, panic, will simply cause the server to kernel panic and reboot if a file system error is detected. However, a reboot may not fix the problem and now the server is in a changed state, making it much more difficult for an administrator to figure out what happened.
The ideal setting is for the file system to remount as read-only if errors are detected. That way, an administrator can diagnose the issue and then take the appropriate action. Remounting file systems as read-only can sometimes have little effect, or sometimes can cause a server to stop behaving normally. For example, if a Linux Web server has its /var/log file system remounted as read-only, it is likely that some service on that server will stop functioning because it cannot write out its logs.
So what does all of this have to do with ESX?
The path failover problem
Many ESX installations are attached to a storage area network (SAN) for shared storage and these servers can be subject to multipathing. Multipathing is the technology used to maintain a constant connection to the SAN in case of the failure of a storage processor, a host bus adapter, a switch, or even something as simple as a fibre cable. Although ESX takes advantage of multipathing, only one path is active at any given time. In the event of a path failure, a path failover occurs and ESX starts sending and receiving all disk activity over another path.
It is not unusual for path failovers to occur on a semi-regular basis -- perhaps once or twice a month. The problem that crops up is how Linux virtual machines (VMs) react to an ESX path failover. If a Linux VM is in the middle of a disk write when a path failover occurs, ESX will notify the VM's virtual SCSI controller that it is busy -- and instruct the controller to wait. The VM decides that the disk is inaccessible and the disk write faults, causing an error. The error is handled according to how the file system's "errors" option is set. Because it is becoming more standard to remount file systems as read-only when an error occurs, the file system that generated the error will be remounted as read-only. As long as the affected file system does not include /var/log, there should be an error in the syslog that looks similar to the following:
SCSI Error : <0 0 0 0> return code = 0x20008
end_request: I/O error, dev sda, sector 4928181
Aborting journal on device dm-0
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only.
This behavior is desirable when it occurs infrequently because it gives an administrator an opportunity to determine what caused the event and how to prevent it from occurring again in the future.
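To see how often this has happened on a given VM, one approach is to count the remount messages in the system log. This is only a sketch: the function name `count_ro_remounts` is made up for illustration, and the log file location varies by distribution (/var/log/messages and /var/log/syslog are common, but treat the path as an assumption).

```shell
#!/bin/sh
# Count log lines that record a file system being remounted read-only.
# $1 is the log file to scan.
count_ro_remounts() {
    # -F: match the text literally; -c: print the number of matching lines
    grep -c -F 'Remounting filesystem read-only' "$1"
}

# Example use (the log path is an assumption; adjust for your distribution):
#   count_ro_remounts /var/log/messages
```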
However, with ESX and multipathing, there is an increased probability of frequent path failovers. How then do you react properly to this situation?
To recap: with VMware ESX, a path failover can produce a disk error inside a Linux VM, which prompts the guest to remount the affected file system in read-only mode. The failovers themselves are a normal part of multipathing, the technology used to maintain a constant connection to a storage area network in the event of certain equipment failures. There are three options to solve this problem:
- Implement a patch from VMware that will fix this problem on a small number of Linux distributions.
- Edit your kernel's sources and install the new kernel module by hand.
- Have the VM send you an email when this problem occurs so you can take note of it and send another email to VMware requesting a patch for your Linux distribution.
Let's explore each option in more detail.
Option 1: Implementing VMware's fix
VMware has responded to the widespread user discontent on its forums regarding this problem by releasing a knowledge base article and a fix for a limited number of Linux distributions. Thus far, the supported Linux distributions for the patch are Red Hat Enterprise Linux 3 and 4 and SUSE Linux Enterprise Server 9 SP3. If you manage a VM that uses one of these operating systems (OSs) as its guest OS, you are in luck; there is a supported fix available online at VMware's support Web site under KB 51306.
Option 2: Patching the kernel module source
If your Linux distribution is not one that VMware supports via this patch, it is still possible to fix the problem. We can fool the VM into thinking that no problem occurred with the disk, which prevents the file system error.
Most Linux distributions that ship with package management systems these days, such as RPM or DEB, will ship kernel source and kernel header packages. For this operation, you will need both sets, because the header package usually includes the latest .config file to be used with the kernel source. To download both the source and header packages in Ubuntu Linux for the running kernel simply type:
sudo apt-get install linux-source-`uname -r | sed "s/-.*//g"` linux-headers-`uname -r`
Change directories to /usr/src and there will be a directory for the headers, but not for the sources. You need to extract the source tarball:
tar xjf linux-source-`uname -r | sed "s/-.*//g"`.tar.bz2
Use your favorite editor to open the file /usr/src/linux-source-`uname -r | sed "s/-.*//g"`/drivers/message/fusion/mptscsih.c. Around line 739 the following stanza should appear:
if (scsi_status == MPI_SCSI_STATUS_BUSY)
    sc->result = (DID_BUS_BUSY << 16) | scsi_status;
else
    sc->result = (DID_OK << 16) | scsi_status;
Replace the second line of the stanza so that it looks like this:
if (scsi_status == MPI_SCSI_STATUS_BUSY)
    // sc->result = (DID_BUS_BUSY << 16) | scsi_status;
    sc->result = (DID_OK << 16) | scsi_status;
else
    sc->result = (DID_OK << 16) | scsi_status;
Save the file and exit the editor. Copy the .config file from the root of the headers directory into the root of the sources directory. Change directories to the sources directory and run:
make oldconfig
This command will parse the .config file from the headers package that was copied into the source directory and create a Makefile that includes all of the configuration settings straight from the distribution's maintainer (in this case Canonical). The next command will take a while, so go make yourself some coffee or enjoy some fresh air after you type:
make
The next step is to replace the old kernel modules with the new ones. Before we do this let's make sure we back up the old kernel module by typing:
cp /lib/modules/`uname -r`/kernel/drivers/message/fusion/mptscsih.ko /lib/modules/`uname -r`/kernel/drivers/message/fusion/mptscsih.ko.bak
Now copy the new file into place:
cp /usr/src/linux-source-`uname -r | sed "s/-.*//g"`/drivers/message/fusion/mptscsih.ko /lib/modules/`uname -r`/kernel/drivers/message/fusion/
Reboot the server and voilà, the system should no longer be subject to the whims of path failovers.
If you run an Ubuntu VM and use kernel version 2.6.15-28-686 and want a shortcut, look no further. I have uploaded the modified source and kernel object files to my Website where you can download them. The file is mptscsih.tar.gz and its md5 sum is fe2994417d0e8d2c1a17898bca293c8b.
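Before installing a downloaded kernel module, it is worth checking the archive against the published checksum. The following is a minimal sketch; the function name `verify_md5` is hypothetical, and it assumes the archive has been saved in the current directory under the name given above.

```shell
#!/bin/sh
# Succeed only if the MD5 digest of a file matches the expected value.
# $1 = file to check, $2 = expected MD5 hex digest
verify_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Example use with the checksum quoted in the article:
#   verify_md5 mptscsih.tar.gz fe2994417d0e8d2c1a17898bca293c8b \
#       && echo "checksum OK" || echo "checksum MISMATCH"
```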
Option 3: Email notifications
If your Linux VM is not supported by VMware's patch and you do not feel comfortable modifying kernel sources, you should at least configure the VM so that you will be notified when this problem occurs. One way to do this is to create a script that is run as a cronjob every 10 minutes or however often you like. An example of such a script is:
--- BEGIN SCRIPT ---
#!/bin/bash
#
# use the first argument to this script as the
# email address to send notifications to
TO="$1"
#
# get the output from the mount command
#
MOUNT_OUT=`mount`
#
# see if the string 'ro' exists in the
# output of the mount command. be careful,
# if there is a CD-ROM inserted into the
# server this will always be true and you
# will get a lot of false positives
echo $MOUNT_OUT | grep \(ro\)
#
# get the return code for the grep
# operation.
#
RO=$?
#
# grep returns an exit code
# of 0 if there is a match
#
if [ "$RO" = "0" ]
then
#
# send an e-mail notification saying
# that there is a file-system that
# has been mounted as read-only
#
BODY=$MOUNT_OUT
echo read-only file systems found
echo $BODY
`which sendmail` -f [email protected]`hostname --fqdn` -t << FooBar
From: [email protected]`hostname --fqdn`
To: $TO
Subject: `hostname` has read-only file systems

$BODY
FooBar
#
# exit with a status code of 1 if
# read-only file systems were found
#
exit 1
fi
#
# exit with a status code of 0 if no
# read-only file systems were found
#
exit 0
--- END SCRIPT ---
Install this script as a cronjob -- don't forget to give it an email address as an argument -- and it will alert you if your VM has had one of its file systems remounted as read-only, giving you a chance to diagnose the problem. Keep in mind that this script assumes you are running a local mail server, but it can easily be modified to send the mail through a relay host.
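One possible refinement, offered as a sketch rather than a replacement for the script above: on systems where /etc/mtab is a regular file, the mount command may not reflect a remount that the kernel performed on its own, whereas /proc/mounts always shows the kernel's current view. The helper below lists read-only mount points from a mount table in /proc/mounts format and skips iso9660 file systems to avoid the CD-ROM false positives mentioned in the script's comments. The function name `list_ro_mounts`, the script path, and the email address in the crontab line are all hypothetical.

```shell
#!/bin/sh
# List mount points that are currently mounted read-only.
# $1: mount table to read; defaults to the kernel's view in /proc/mounts.
# Fields per line: device, mount point, fs type, options, dump, pass.
list_ro_mounts() {
    # Match "ro" only as a whole comma-separated option, so that
    # "errors=remount-ro" on a read-write mount is not a false positive.
    awk '$3 != "iso9660" && $4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}

# Example crontab entry to run a check every 10 minutes
# (the script path and address are placeholders):
#   */10 * * * * /usr/local/bin/check-ro.sh admin@example.com
```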
Be sure to let VMware know that this is happening to your server so they can hurry up and release a patch for your Linux distribution.
Having VM file systems remounted as read-only after an ESX path failover is disruptive, but as shown above there are several ways to address the problem. If you have any remaining questions, please feel free to e-mail me via SearchServerVirtualization.com at [email protected].
About the author: Andrew Kutz is a Microsoft Certified Solutions Developer (MCSD), a SANS/GIAC Certified Windows Security Administrator (GCWN), and a VMware Certified Professional (VCP) in VI3. Thanks to his study, "Sudo for Windows (sudowin)", he has obtained SANS GOLD status.