
slowpoke115_145
Feb 21, 2014

Failover is far too slow

Hi guys, this is my first post and I'm relatively new to LTMs, so my apologies if this is a basic question...

 

Is there a way to speed up failover? I've set up a virtual LTM and configured HA, failover (via VLAN fail-safe) and port mirroring. I've also set up an HA group with a high weight and threshold, and configured failover unicast on the self-IP I intend to use for load balancing the VLANs.

 

If I run a continuous ping against the VIP and switch off the active LTM, the failover takes minutes.

 

I see lots of errors in the logs relating to a CMI peer being unavailable (this would be expected given the partner is switched off).

 

Let me know if more info is needed.

 

3 Replies

  • it would help if you shared more details. you mention a lot of settings but don't give the actual values.

     

    also the exact scenario is unclear to me. you have the system set up, you start a ping on a vip, you power the active one down hard? then it takes minutes (how many? again, details) for the secondary to start replying?

     

    what do you see in the GUI, does the secondary device become active according to the GUI? have you tried a manual failover? does that go better?

     

    it kinda sounds like an L2 problem: the packets are still sent to the wrong switch port because the MAC table isn't updated. can you check the MAC tables on your switches? if you clear them, is the VIP reachable directly afterwards?

     

    a lot to check and try, good luck.

     

  • VLAN fail-safe is slowish, but that is to be expected (the timer is 10 seconds at its lowest and something like 90 seconds by default), and it has no influence if you take the active unit out completely. it is useful for detecting the absence of traffic on a network, i.e. an issue somewhere else in your network rather than on the big-ip.

     

    as you have noticed, the failover itself does happen, pretty much instantly i expect, if you shut down the active unit.

     

    but the traffic doesn't reach the now active unit. have you tried sniffing for the traffic to see if it actually reaches your big-ip?

     

    as pointed out before, the issue feels outside the realm of the big-ip. do some basic network troubleshooting and try to determine if the http traffic actually reaches your big-ip. if it does, then try to determine if the secondary unit can also reach your nodes.
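
Since the replies ask how long "minutes" actually is, a small probe script could put a number on the outage before and after each change (clearing the switch MAC tables, a manual failover, and so on). The following is only a minimal sketch, not anything from the thread: the function names are made up for illustration, and the `ping -c 1 -W 1` flags assume a Linux client. `outage_windows` does the bookkeeping and `monitor` feeds it one probe per interval.

```python
import subprocess
import time


def outage_windows(samples):
    """Turn time-ordered (timestamp, ok) samples into outage windows.

    Returns a list of (start, end, duration) tuples. 'start' is the first
    failed probe of a run and 'end' is the first successful probe after it.
    An outage still in progress at the end is reported as (start, None, None).
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe of a new outage
        elif ok and start is not None:
            windows.append((start, ts, ts - start))
            start = None
    if start is not None:
        windows.append((start, None, None))
    return windows


def probe(host):
    """Send one ICMP echo with a 1 second timeout (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host, interval=1.0, duration=300.0):
    """Probe 'host' every 'interval' seconds for 'duration' seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(host)))
        time.sleep(interval)
    return outage_windows(samples)
```

Calling `monitor("192.0.2.10")` (a placeholder address, substitute the VIP) while powering off the active unit would return the outage windows in seconds. Running it once against the VIP and once against a floating self-IP on the same VLAN may help show whether only the VIP is affected.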