Jordan_Bean_883
Feb 17, 2009
Nimbostratus
NAT not working, possible ARP issues
We have a pair of older 4U Big IPs running BIG-IP Kernel 4.2PTF-10 Build95 in active/passive mode. We previously ran them with one interface as the external interface and the other as the internal interface, with no issues. Failover, etc. worked fine.
Now we have both set up in single-arm mode, each homed to a different Cisco switch (we call them CORE1 and CORE2). We have EtherChannel set up, and the 200 Mbps link is a trunk. All connectivity seems to be working fine.
We recently added a second IP address to a server that's being load balanced, and we set up a NAT translation for it. It works for about 5-10 minutes, and then we get timeouts/destination host unreachables. From the ARP table in the switches, we see that the Big IP is responding to the ping requests. If we SSH into the Big IP and run a ping test, we get a "host down", even though other hosts on the same subnet can ping the IP. An "arp -a" on the Big IP shows the IP address with "(incomplete)" listed. If we swap the primary/secondary IP on the server (so traffic is generated from the new IP), then things start working.
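To pin down when the entry goes bad, the "arp -a" check could be scripted. Below is a minimal Python sketch, not something we actually run on the Big IP: the function name, the sample addresses, the interface name, and the sample output are all made up for illustration, and it assumes the usual `? (ip) at <hw> on <if>` line layout, which may differ on this BIG-IP build.

```python
import re

# Assumed line layout (illustrative, not captured from the Big IP):
#   ? (10.0.0.25) at (incomplete) on exp0
# Some systems print <incomplete> instead of (incomplete).
ARP_LINE = re.compile(r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\)\s+at\s+(?P<hw>\S+)")

INCOMPLETE_MARKERS = ("(incomplete)", "<incomplete>")


def incomplete_arp_entries(arp_output):
    """Return the IPs whose ARP entry has no resolved hardware address."""
    incomplete = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("hw") in INCOMPLETE_MARKERS:
            incomplete.append(match.group("ip"))
    return incomplete


# Hypothetical sample text; the addresses and interface name are placeholders.
SAMPLE = """\
? (10.0.0.1) at 00:11:22:33:44:55 on exp0
? (10.0.0.25) at (incomplete) on exp0
"""

if __name__ == "__main__":
    print(incomplete_arp_entries(SAMPLE))
```

Feeding it the saved "arp -a" output from both the active and the passive unit every minute or so would show how long after the NAT is added the entry turns incomplete, and whether both units lose it at the same time.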
What is unique about this setup is that the servers having issues are blade servers. They are connected to a switch in the chassis that is then dual-homed into the same 2 routers as the Big IPs, running STP.
As I write this, the passive LB is showing the node's ICMP as being down while the primary shows the node as up.
I definitely think this is an ARP problem, but I'm not sure what to do.