Forum Discussion

Chintan_Patel_1
Oct 17, 2014

Load-Balancing Client/Server on same subnet

Hi all,

 

I’m working on a customer's issue where the VIP, pool members and clients are all on the same subnet (let's say 10.1.1.0/24) and there is no SNAT on the VIP. I'm seeing an ARP-related issue that only affects one of the 6 servers. All servers are Dell.

 

Client – 10.1.1.12

 

VIP – 10.1.1.4:25

 

Member Servers 1-6 – 10.1.1.5-19:25

 

I would expect the flow to go from client to VIP to member server, and then directly back to the client, which would break communication. However, 5 of the 6 servers actually send return traffic back through the F5 and work fine. The ARP cache on these boxes has only one or two entries, pointing to the F5, and none for the client (10.1.1.12).

 

The server that does not work does have an ARP entry for the client IP, and its return traffic goes directly back to the client. There is no static ARP set up on the working servers as far as I can tell, and nothing in the F5 configs about MAC spoofing/masquerading. Creating a static ARP entry on the non-working server that maps the client IP to the F5 MAC fixes that server. I'm a little confused as to why any of the servers are working at all. Any insight into how this is supposed to work without SNAT would be extremely helpful.
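To make the return-path question concrete, here is a minimal Python sketch of how a host picks the destination MAC for a reply, assuming a simple "on-link means ARP for the destination itself" model. The MAC addresses and the gateway IP are hypothetical (none are given in this thread), and the sketch does not explain why the working servers have no entry for the client.

```python
from ipaddress import ip_address, ip_network

# Hypothetical MAC addresses, for illustration only.
F5_MAC = "00:01:d7:00:00:01"
CLIENT_MAC = "00:50:56:00:00:12"

SUBNET = "10.1.1.0/24"
CLIENT_IP = "10.1.1.12"
GATEWAY_IP = "10.1.1.1"  # assumed; the thread does not state a gateway


def reply_destination_mac(subnet, dst_ip, arp_cache, gateway_ip):
    """Return the MAC a host would put on a reply frame to dst_ip.

    Returns None when the cache has no entry for the next hop, which
    means the host would have to resolve it (ARP) before sending.
    """
    on_link = ip_address(dst_ip) in ip_network(subnet)
    next_hop = dst_ip if on_link else gateway_ip
    return arp_cache.get(next_hop)


# Server that learned the real client MAC: the reply bypasses the F5.
learned_cache = {CLIENT_IP: CLIENT_MAC}
# Server with the static-entry workaround: the reply is sent to the F5.
static_fix_cache = {CLIENT_IP: F5_MAC}

print(reply_destination_mac(SUBNET, CLIENT_IP, learned_cache, GATEWAY_IP))
print(reply_destination_mac(SUBNET, CLIENT_IP, static_fix_cache, GATEWAY_IP))
```

Because the client is on-link, the default gateway never enters the decision; only the ARP entry for 10.1.1.12 determines where the reply frame goes.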

 

Thanks in advance!

 

Chintan

 

7 Replies

  • BinaryCanary_19
    Historic F5 Account
    Why are you troubleshooting a design that you know will not (or should not) work instead of fixing it? The default gateway is irrelevant, since you say all concerned devices are in the same subnet.
  • Have a look at this (auto last hop): https://support.f5.com/kb/en-us/solutions/public/13000/800/sol13876.html?sr=41103729

     

  • Agreed that there is a design issue there. Unfortunately it's a customer's network, so I don't have a lot of flexibility to change their design. More importantly I'm trying to understand what the expected behavior is from the F5 AND the server standpoint in this scenario and why any of them are working at all.
  • Thanks for the response Matthieu. Was not familiar with that feature, but based on the description, I don't think it applies to this issue. Sounds more like it's a feature on the F5 to get the return traffic back to the original source, assuming that the return traffic is going through the F5. Let me know if I misunderstood.

     

  • You are right, your issue is on the other side (server side).

     

    If you do not use SNAT, the source IP address used to open the server-side connection is the client's IP address (and the source MAC is the BIGIP's). If you use SNAT, the source IP address is the BIGIP's selfIP (and the MAC is the BIGIP's as well).

     

    In your case, the servers see the client IP address as the source of the connection, but that address is in their own subnet (same broadcast domain). So do the switches (and one switch knows your client's MAC).

     

    I would say, if one server sees ARP request / response / broadcast from the client 10.1.1.12, the server should learn this entry and so, send traffic back to the client instead of the BIGIP.

     

    Take a trace and check the ARP and IP layers. You should see ARP requests for 10.1.1.12, and you should see a difference at layer 2 between the working and non-working TCP connections. I do not understand why working servers don't have any entry for 10.1.1.12 if the original TCP connection uses it.
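The SNAT versus no-SNAT addressing described in this reply can be sketched roughly as follows. This is only an illustration of the statement above, not F5's implementation; the selfIP 10.1.1.2 and the MAC address are hypothetical values.

```python
from typing import NamedTuple


class ServerSideSource(NamedTuple):
    src_ip: str
    src_mac: str


def server_side_source(client_ip, self_ip, bigip_mac, snat):
    """Source addressing the server sees on the load-balanced connection.

    Without SNAT the source IP stays the client's; with SNAT it becomes
    the BIGIP's selfIP. The source MAC is the BIGIP's in both cases.
    """
    return ServerSideSource(self_ip if snat else client_ip, bigip_mac)


CLIENT_IP = "10.1.1.12"
SELF_IP = "10.1.1.2"             # hypothetical selfIP
BIGIP_MAC = "00:01:d7:00:00:01"  # hypothetical MAC

no_snat = server_side_source(CLIENT_IP, SELF_IP, BIGIP_MAC, snat=False)
with_snat = server_side_source(CLIENT_IP, SELF_IP, BIGIP_MAC, snat=True)

# The server replies to whatever source IP it saw on the request.
print("no SNAT, reply goes to IP:", no_snat.src_ip)
print("SNAT, reply goes to IP:", with_snat.src_ip)
```

With SNAT the reply is addressed to an IP that only the BIGIP owns, so the server's ARP lookup leads back to the BIGIP. Without SNAT the reply is addressed to the client's own IP, so the path depends on which MAC the server associates with 10.1.1.12.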

     

  • "I would say, if one server sees ARP request / response / broadcast from the client 10.1.1.12, the server should learn this entry and so, send traffic back to the client instead of the BIGIP."

     

    That makes sense. Thanks a lot

     

    "I do not understand why working servers don't have any entry for 10.1.1.12 if the original TCP connection uses it."

     

    Yep, that was a head scratcher. I'm wondering if the Linux box doesn't need to ARP out for reply traffic since the request already carries an IP/MAC association in the packet. But it doesn't seem to cache it either. Thanks for replying.

     

    I'll update the thread if I find anything else, but I'm thinking that the one that's not working learned the MAC through some unknown communication where it needed to ARP out for .12. It didn't work after clearing the entry, possibly due to the Linux implementation of the ARP cache, which keeps the entry in the table as "Incomplete" for 20+ seconds. It might be ARPing out during the timeout period.
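For comparing the ARP caches of the working and non-working servers, a small Python sketch like the one below could help. It assumes the servers are Linux and parses text in the `/proc/net/arp` layout, where the Flags column uses 0x2 for a complete entry and 0x4 for a permanent (static) one. The sample table and its MAC address are made up for illustration.

```python
ATF_COM = 0x02   # entry is complete (MAC resolved)
ATF_PERM = 0x04  # entry is permanent (static)

# Hypothetical sample in the /proc/net/arp layout; not taken from the thread.
SAMPLE = """IP address       HW type     Flags       HW address            Mask     Device
10.1.1.4         0x1         0x2         00:01:d7:00:00:01     *        eth0
10.1.1.12        0x1         0x0         00:00:00:00:00:00     *        eth0
"""


def parse_proc_net_arp(text):
    """Map each IP in a /proc/net/arp style table to its entry details."""
    entries = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        flag_bits = int(flags, 16)
        entries[ip] = {
            "mac": mac,
            "device": device,
            "complete": bool(flag_bits & ATF_COM),
            "static": bool(flag_bits & ATF_PERM),
        }
    return entries


table = parse_proc_net_arp(SAMPLE)
client = table.get("10.1.1.12")
if client is None:
    print("no entry for the client")
elif not client["complete"]:
    print("client entry is incomplete")
else:
    print("client resolves to", client["mac"])
```

On a real server, the input would be `open("/proc/net/arp").read()` instead of `SAMPLE`. Running it on each pool member shows at a glance which ones hold a resolved, incomplete, or static entry for 10.1.1.12.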