EXPERT RESPONSE
Actually, I do. I've seen this problem firsthand, and you are not the
first person who has run into it.
First of all, you can go tell your networking guys that the problem is not a bridging loop.
Actually, since you indicated that you are not familiar with bridging
loops, you should read up on them. Cisco has a good reference document
that explains what a bridging loop is. Basically, when a bridging loop
occurs, two or more bridges essentially perform a denial-of-service
(DoS) attack on the network
segments involved in the loop. This is bad mojo. Transparent bridging
is actually a good thing, and the spanning tree algorithm was invented
to help cure the problem of bridging loops by finding non-loop
pathways in a network topology.
I spoke with VMware earlier this year
(2007) in March about the network internals of ESX. Patrick Lin and
Jacob Jensen related that ESX does not possess loop detection or
implement the spanning tree algorithm because ESX does not allow you
to interconnect multiple virtual switches, which would create a
breeding ground for a bridging loop. There you have it, from the mouths
of folks at VMware: it is not possible to create a bridging loop with
ESX.
So what is going on in your situation then? Here is the kicker: your
problem has nothing to do with ESX whatsoever. It just happens that
you, and many, many other IT administrators like yourself, are for
the first time able to cost-effectively create server farms - be it
Web servers, database servers, DNS servers, whatever, thanks to
virtualization.
Of course, when there is a problem, ESX and its
mysterious networking capabilities get blamed. In all fairness to your
networking department, I have tried to talk to VMware many times and
get them to be far more open about the internal workings of ESX with
regard to networking. The fact of the matter is that network
administrators are very suspicious people - and for a good reason -
they are expected to have 100% uptime. They do not want any device on
their network that could potentially chew through bandwidth faster
than I can make my way through a 300-count bucket of Super Bubble bubble-
gum (23 minutes is my record). Hence these poor chaps are quick to
blame ESX: novus scelorum and all that.
The fact of the matter is
that this same problem would have occurred regardless of the type of
servers you were load balancing: virtual or physical. You just have
not tried this particular configuration on your load-balancer before
because you had no reason to without the plethora of servers you can
create with ESX.
The problem is occurring because your load-balancer is configured to
forward incoming Ethernet frames based on their media access control
(MAC) address rather than their Internet Protocol (IP) address. The
point of that design is to let the server bypass the load-balancer
when sending frames back to the client.
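To make the MAC-based forwarding idea concrete, here is a rough Python sketch. It is a toy model, not how any particular load-balancer is implemented: the `Frame` class and the `forward_by_mac` function are hypothetical names, the addresses are made-up placeholders, and the gateway hop is left out for brevity.

```python
from dataclasses import dataclass, replace

# Placeholder addresses for illustration only.
LB_MAC = "00:00:00:00:00:22"
SERVER_MAC = "00:00:00:00:00:33"


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def forward_by_mac(frame: Frame) -> Frame:
    """Hypothetical MAC-based forwarding: only the layer-2 addresses change.

    The IP header is left alone, so the server still sees the client's
    IP address as the source and can answer the client directly.
    """
    return replace(frame, src_mac=LB_MAC, dst_mac=SERVER_MAC)


incoming = Frame(
    src_mac="00:00:00:00:00:11",
    dst_mac=LB_MAC,
    src_ip="111.111.11.111",
    dst_ip="222.222.22.222",
)
forwarded = forward_by_mac(incoming)
print(forwarded.src_ip)  # still the client's IP address
```

The only thing the sketch is meant to show is that the source IP address survives the hop through the load-balancer, which is what lets the server's reply skip the load-balancer on the way back.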
This setting by
itself would not be a problem except that you have either mistakenly
or intentionally also configured the load-balancer, or a particular
service on the load-balancer, to not use the source IP address of the
client when forwarding frames. Let me guess: you are trying to
configure a service on the load-balancer to forward Ethernet frames
such that when they arrive at their destination it appears as if they
originated from the original client and not the load-balancer itself?
Yep, been there. The issue is that if you do not set the configuration
option so that the source IP address is rewritten to the client's as
well, you will create what appears to be a bridging loop. Please allow
me to illustrate:
Values exist as such:
Client IP: 111.111.11.111
Client MAC: 00:00:00:00:00:11
Load-Balancer IP: 222.222.22.222
Load-Balancer MAC: 00:00:00:00:00:22
Server IP: 333.333.33.333
Server MAC: 00:00:00:00:00:33
Phase 1: A client builds an Ethernet frame to send to a server. First
it has to travel through a gateway and load-balancer. The client sends
the frame to the gateway.
Phase 2: The gateway forwards the frame to the load-balancer.
Phase 3: The load-balancer alters the frame's source MAC address so
that it is the MAC address of the load-balancer itself. However, the
load-balancer is incorrectly configured and does not change the source
IP address of the frame to be that of the client before forwarding
it to the server.
Phase 4: The server receives the frame and builds a response. Because
the server thinks the frame came from the load-balancer, it sets the
response frame's destination IP address to the IP address of the
load-balancer, and does the same for the destination MAC address. This is where
things fail. In accordance with most load-balancer systems that
implement something akin to this (see the Citrix Netscaler
documentation on Direct Server Return), you must also create a loopback network interface card (NIC) on the
server.
The loopback NIC is configured so that its IP address is that
of the load-balancer and its gateway is that of the router that sits
in front of the load-balancer. This is to explicitly keep the server
from sending anything back to the load-balancer since in this
configuration to do so would flood the load-balancer. However, because
the load-balancer failed to re-write the source IP address of the
original frame, the server's response puts the IP address of the load-
balancer in the destination field of the new Ethernet frame. The new
frame gets sent to the gateway which then sends it right back to the
load-balancer. Even if the loopback adapter on the server is
improperly configured, the frames will still get sent back to the
load-balancer. This is where I am a little hazy on the inner workings of
specific models of load-balancers, but for some reason the frames then
get looped right back at the server, which throws them right back at
the load-balancer: hence a DoS attack that causes long queues for
other, properly configured, services that use the load-balancer.
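The ping-pong described in Phase 4 can be sketched as a small Python toy. This is only an illustration under a loud assumption: the load-balancer is modeled as handing anything addressed to its own IP address straight back to the server, with itself as the source. Real devices may behave differently, and the function names here are hypothetical.

```python
CLIENT_IP = "111.111.11.111"
LB_IP = "222.222.22.222"


def reply_destination(request_src_ip: str) -> str:
    """A server answers whichever IP address the request claims to come from."""
    return request_src_ip


def count_bounces(src_ip_seen_by_server: str, max_bounces: int = 5) -> int:
    """Count server -> load-balancer bounces before the reply escapes.

    Assumption (not taken from any vendor documentation): the
    load-balancer forwards every frame addressed to its own IP back to
    the server, again with its own IP as the source. max_bounces only
    exists so that the simulation terminates.
    """
    bounces = 0
    src_ip = src_ip_seen_by_server
    while bounces < max_bounces:
        if reply_destination(src_ip) != LB_IP:
            break  # reply heads for the client and leaves the loop
        bounces += 1
        src_ip = LB_IP  # forwarded back to the server as the load-balancer
    return bounces


print(count_bounces(CLIENT_IP))  # source IP is the client's: no bounces
print(count_bounces(LB_IP))      # source IP is the load-balancer's: hits the cap
```

With the client's IP address as the source, the reply leaves on the first try. With the load-balancer's IP address as the source, the count runs until the artificial cap, which is the toy version of the flood.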
I've personally caused this on a Netscaler series from Citrix, and two
other IT administrators have seen this as well on a Netscaler. I would
hazard a guess you are using a Netscaler? Even if you are
not, this configuration is possible on other load-balancing devices.
The moral of the story is this: if you are going to set up a type of
Direct Server Return architecture, be sure to use the source IP
address, otherwise you can spam yourself into oblivion!
Hope this helps!