Forum Discussion

Chris_Phillips
Nimbostratus
Jan 26, 2012

Does LB_Failed have the same criteria as using an HTTP Fallback Host?

Nice quick one...

Is there any difference at all between when the LB_FAILED event fires and an HTTP Fallback Host configured on an HTTP profile would fire?

We occasionally have about 0.1% of connections, over a narrow 10-second period, receiving the fallback host redirect from a number of very busy virtual services. Something like 200 failed connections out of 50,000,000 per day!

Trying to track down this needle in a haystack, we're trying to completely understand when the fallback 302 would be sent, and it appears that it fires for exactly the same reasons the LB_FAILED event would fire. That would mean a scenario where, say, an HTTP request IS made to a pool member and then has its connection reset etc. would NOT cause the fallback host to kick in? Once a TCP connection is established to the member, neither the event nor the fallback redirect can ever occur?

In terms of once we're out on the wire, we're looking only at unanswered SYNs or connections reset instantly before the three-way handshake completes. Yup?

4 Replies

  • Hi Chris,

    Trying to track down this needle in a haystack we're trying to completely understand when the fallback 302 would be sent, and it appears that it's exactly 100% of the same reasons the LB_FAILED event would fail.

    I think that's correct.

    an HTTP request IS made to a pool member and then has its connection reset etc. would NOT cause the fallback host to kick in? Once a TCP connection is established to the member, neither the event nor the fallback redirect can ever occur?

    That's also correct. You can handle this failure scenario using the after command. You'd need to set a timeout in milliseconds to wait for a server response. If it doesn't come then you could send an HTTP response back to the client and/or log something. The second example on the after wiki page should be a good start:

    http://devcentral.f5.com/wiki/iRules.after.ashx
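
    Something along the lines of this rough sketch, based on that wiki pattern (untested here; the 5000 ms timeout, the log text and the 503 response are placeholder choices you'd tune for your application):

        when HTTP_REQUEST {
            # Start a watchdog once the request arrives. If the server hasn't
            # answered within 5000 ms (placeholder value), respond to the
            # client ourselves and log the event.
            set watchdog [after 5000 {
                log local0. "No server response within 5s, client=[IP::client_addr]"
                HTTP::respond 503 content "Service temporarily unavailable"
            }]
        }

        when HTTP_RESPONSE {
            # The pool member answered in time, so cancel the pending watchdog.
            if { [info exists watchdog] } {
                after cancel $watchdog
                unset watchdog
            }
        }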

    I put in an RFE to support this type of response timeout in an HTTP profile. The ID is BZ373937. You could open a case with F5 Support to raise the visibility of the request.

    Aaron
  • Hmmmmmmmm, so why, with a default tcp-lan-optimized profile on an HTTP vs, are we getting LB_FAILED after as long as 72 seconds?? This suggests, to me at least, that the connection is (half?) opened, but maybe no data ever gets acked back from it? I'm sensing more of a subtlety about when a connection is officially deemed to have been balanced. Being 72 seconds, that naturally feels like some sort of timeout period expiring...

    when LB_FAILED {
        log local0. "LB_FAILED EVENT! vs=[virtual name] local_addr=[IP::local_addr] client=[IP::client_addr]:[TCP::client_port] LB_pool=[LB::server pool] LB_addr=[LB::server addr] age=[IP::stats age]ms"
    }

    Jan 27 00:05:46 [10.X] tmm1 tmm1[5195]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:57786 LB_pool=t2_XXX_pool LB_addr=10.X age=17861ms
    Jan 27 00:05:46 [10.X] local/tmm1 info tmm1[5195]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:57786 LB_pool=t2_XXX_pool LB_addr=10.X age=17861ms
    Jan 27 00:05:46 [10.X] tmm1 tmm1[5195]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:57776 LB_pool=t2_XXX_pool LB_addr=10.X age=25016ms
    Jan 27 00:05:46 [10.X] local/tmm1 info tmm1[5195]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:57776 LB_pool=t2_XXX_pool LB_addr=10.X age=25016ms
    Jan 27 00:05:47 [10.X] tmm tmm[5129]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:45299 LB_pool=t2_XXX_pool LB_addr=10.X age=33885ms
    Jan 27 00:05:48 [10.X] tmm tmm[5572]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:58975 LB_pool=t2_XXX_pool LB_addr=10.X age=72012ms
    Jan 27 00:05:48 [10.X] tmm tmm[5572]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:45281 LB_pool=t2_XXX_pool LB_addr=10.X age=38729ms
    Jan 27 00:05:50 [10.X] tmm1 tmm1[5573]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:59032 LB_pool=t2_XXX_pool LB_addr=10.X age=10473ms
    Jan 27 00:05:50 [10.X] tmm1 tmm1[5243]: Rule _temp_LB_FAILED_logging_rule : LB_FAILED EVENT! vs=t2_XXX_vs local_addr=10.X client=10.X:59028 LB_pool=t2_XXX_pool LB_addr=10.X age=11962ms

    So that's overnight, with logs from multiple LTMs going to multiple members (the XXXs obscure that fact, though) via a forwarding vs on a different pair of LTMs. Can you explain this huge delay before the LB fails??
  • Hi Chris,

    See the LB_FAILED wiki page for details. What do you have the max syn retransmits set to on your TCP profile? Let me know if the LB_FAILED info doesn't match up with what you're seeing in your TCP profile(s).

    http://devcentral.f5.com/wiki/iRules.lb_failed.ashx

    LB_FAILED is triggered when LTM is ready to send the request to a pool member and one hasn’t been chosen (the system failed to select a pool or a pool member), is unreachable (when no route to the target exists), or is non-responsive (fails to respond to a connection request).

    If the target fails to respond to a connection request, the "Maximum Syn Retransmissions" option in the TCP profile will affect the amount of time before LB_FAILED is triggered.

    When a client doesn't receive a response to the SYN, there is a defined algorithm for the specified number of retries. The first retransmission, if there is no response, typically occurs after 3 seconds, and the typical back-off algorithm doubles the wait time after each failed attempt.

     
    ...

    LTM's default tcp profile sets "Maximum Syn Retransmissions" to 4, so with the default setting, LB_FAILED would be triggered if the server didn't respond within 45 seconds:

    1st SYN: 0
    2nd SYN: +3 seconds
    3rd SYN: +6 seconds
    4th SYN: +12 seconds
    5th SYN: +24 seconds
    ======================
    LB_FAILED: 45 seconds
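
    For what it's worth, that doubling back-off can be sanity-checked with a few lines of plain Tcl (a hypothetical helper, not an iRule; it just assumes the 3-second initial retransmit and doubling behaviour described above):

        # Estimate how long until LB_FAILED fires for a given
        # "Maximum Syn Retransmissions" value, assuming a 3 second initial
        # retransmit interval that doubles after each failed attempt.
        proc lb_failed_estimate {max_syn_retrans} {
            set wait 3
            set total 0
            for {set i 0} {$i < $max_syn_retrans} {incr i} {
                incr total $wait
                set wait [expr {$wait * 2}]
            }
            return $total
        }

        # lb_failed_estimate 4  => 45 seconds (the default, matching the table above)
        # lb_failed_estimate 3  => 21 seconds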

    Aaron
  • I have retransmits set to 3, but I think that table is not at all correct. I did a test on a dev system on 10.2.0 and just saw retries (tcpdump) every 3 seconds until it hit the max, so with a fake pool member which would never connect, LB_FAILED always fired at 12001ms (well... ish). No sign of an incremental back-off whatsoever.

    We've found that these blips are apparently all on members on a single physical host (but with multiple IPs), and so far these seem to be only on Solaris 10 v490 boxes, which are a significant minority of the estate, and we can also see all HTTP traffic go AWOL for this brief time period... very strange. Feels ARP-y to me, but who knows... But it doesn't look like an LTM / TMOS issue at heart.

    Can you think of a scenario where these LB_FAILED events would be firing at such vague times, when the members appear to be freezing in some way? I'm thinking it would need to be a RST, as if it were not a RST but just nothing coming back from the server, then the LB_FAILEDs would still be firing based on the retry intervals.