Forum Discussion

Matt_Breedlove_
Apr 25, 2009

Help with avoiding Reselect bug in LB_Failed

Hi All,

 

Running BIG-IP 9.3.1 Build 37.1

 

I was planning on using LB::reselect in LB_FAILED, then I saw the known issue on the reference page about it causing a system crash. Yikes. Here is the link I am referencing:
http://devcentral.f5.com/wiki/default.aspx/iRules/lb__reselect

 

The VS has a cookie insert persistence profile with a 17-minute timeout.

 

I am also concerned that, even if it doesn't crash the BIG-IP, it will reselect the same failed member. The whole reason I use "LB::reselect pool" instead of just "pool" is to make sure I get a member that is up.

 

Here are two working versions of the rule. I am concerned about a few things:

 

1) Can I really not safely use LB::reselect and/or LB::mode inside an LB_FAILED event without crashing my BIG-IP? What is the alternative, to use only "pool"? Doesn't using "pool" alone instead of "LB::reselect pool" risk sending a request to a member node that is actually down? The wiki page makes it sound like that.

 

2) Are there context rules as to what commands/statements you can issue inside of events? Can I call HTTP::redirect inside of LB events with impunity, or should I only call HTTP:: commands inside of CLIENT_ACCEPTED, HTTP_REQUEST, and HTTP_RESPONSE?

 

3) In the first rule I am using a pool of static HTML web servers for the maintenance window page; in the second I am using a redirect to an external URL that hosts the page. Is one preferred over the other? Should I just code the maintenance window page (a very simple, short HTML page of a few lines) into the iRule itself using HTTP::respond and avoid having to use new pools or new external URLs? What is the performance impact?

 

4) Do the various uses of "persist none" below make sense? Are they unnecessary with HTTP::redirect?

 

First rule

 

      
 when CLIENT_ACCEPTED {
   set lb_retrys 0
   set clip [IP::client_addr]
   log "Route connections to the VIP from a member node back to itself (if it's the caller)"
   if { [LB::status pool webpool_80 member $clip 80] eq "up" } {
     pool webpool_80 member $clip 80
   } elseif { [active_members webpool_80] >= 4 } {
     LB::mode rr
     LB::reselect pool webpool_80
   } else {
     HTTP::uri /
     persist none
     pool maintpool_80
   }
 }
 when LB_FAILED {
   if { $lb_retrys < [active_members webpool_80] } {
     persist none
     LB::mode rr
     LB::reselect pool webpool_80
   } else {
     HTTP::uri /
     persist none
     LB::mode rr
     LB::reselect pool maintpool_80
   }
   incr lb_retrys
 }
 

 

Here is the other rule, which redirects to an external URL. I was still planning on using LB::reselect here, as I can't figure out a safe way to implement the retry logic inside LB_FAILED without it.

 

Second rule

 

      
 when CLIENT_ACCEPTED {
   set lb_retrys 0
   set clip [IP::client_addr]
   log "Route connections to the VIP from a member node back to itself (if it's the caller)"
   if { [LB::status pool webpool_80 member $clip 80] eq "up" } {
     pool webpool_80 member $clip 80
   } elseif { [active_members webpool_80] >= 4 } {
     LB::mode rr
     LB::reselect pool webpool_80
   } else {
     persist none
     HTTP::redirect http://maintenance_window_page
   }
 }
 when LB_FAILED {
   if { $lb_retrys < [active_members webpool_80] } {
     persist none
     LB::mode rr
     LB::reselect pool webpool_80
   } else {
     persist none
     HTTP::redirect http://maintenance_window_page
   }
   incr lb_retrys
 }
 

 

Can someone provide feedback on these issues and the iRules so far?

 

Thanks

 

Matt

3 Replies

  • Hi Matt,

     

     

    See below for feedback:

     

     

    "I am also concerned that, even if it doesn't crash the BIG-IP, it will reselect the same failed member. The whole reason I use "LB::reselect pool" instead of just "pool" is to make sure I get a member that is up."

     

     

    You could mark the current pool member down using LB::down. I think selecting a pool in LB_FAILED with the pool command does just that: it selects the pool. It wouldn't do anything to retry the request, so no new request would be made and no response would be sent to the client. I could be wrong on this, but I've only seen LB::reselect used, as it forces a new selection and a retry.

     

     

    1) Can I really not safely use LB::reselect and/or LB::mode inside an LB_FAILED event without crashing my BIG-IP? What is the alternative, to use only "pool"? Doesn't using "pool" alone instead of "LB::reselect pool" risk sending a request to a member node that is actually down? The wiki page makes it sound like that.

     

     

    It looks like TMM crashes when the reselected pool member is down or sends a reset. This combination of events might not happen frequently, but when it does the result is significant. I'd check with F5 Support to see if there is a hotfix available for 9.3.1. If not, see if they can build one for you. 9.3.1 might be two major versions back, but it's still under support for the better part of a year.

     

     

     

    https://support.f5.com/kb/en-us/solutions/public/8000/700/sol8724.html

     

     

    This is the result of a known issue. When you use the LB::detach and LB::reselect commands simultaneously within an LB_FAILED event, and the target pool member is unreachable or rejects the connection, double freeing causes a system crash.

     

     

    The double freeing of the connection resources on the server side occurs because the LB::detach command frees the connection resource initially. When the LB::reselect command initiates, it attempts to free the connection resource before it attempts to reselect another pool member.

     

     

     

     

    2) Are there context rules as to what commands/statements you can issue inside of events? Can I call HTTP::redirect inside of LB events with impunity, or should I only call HTTP:: commands inside of CLIENT_ACCEPTED, HTTP_REQUEST, and HTTP_RESPONSE?

     

     

    CLIENT_ACCEPTED is triggered when the TCP connection has been established to the VS. The HTTP headers aren't parsed until HTTP_REQUEST. So you can't use any HTTP:: commands until HTTP_REQUEST. I'm not sure why you couldn't call HTTP::redirect from LB_SELECTED, but per the wiki page you can't. If this is actually correct, you could use LB::select in HTTP_REQUEST to make a load balancing selection and then use an HTTP:: command. You should be able to use HTTP::redirect from LB_FAILED.
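
    As a rough sketch of that event split, the decision to redirect can be made in HTTP_REQUEST, where the HTTP headers have been parsed, with LB_FAILED as the fallback. The pool name webpool_80, the threshold of 4 and the redirect URL are taken from the rules in the question; using HTTP::redirect in LB_FAILED relies on the assumption stated above that it is valid there, so test it on your version first.

    ```tcl
    when HTTP_REQUEST {
        # HTTP:: commands are usable here because the request has been parsed.
        if { [active_members webpool_80] < 4 } {
            HTTP::redirect "http://maintenance_window_page"
        }
    }

    when LB_FAILED {
        # Assumption: HTTP::redirect is permitted in LB_FAILED on this version.
        HTTP::redirect "http://maintenance_window_page"
    }
    ```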

     

     

    3) In the first rule I am using a pool of static HTML web servers for the maintenance window page; in the second I am using a redirect to an external URL that hosts the page. Is one preferred over the other? Should I just code the maintenance window page (a very simple, short HTML page of a few lines) into the iRule itself using HTTP::respond and avoid having to use new pools or new external URLs? What is the performance impact?

     

     

    There isn't much difference between using an HTTP redirect compared with selecting a new pool in terms of performance. The former would probably result in the client establishing a new TCP connection to the VS whereas selecting the sorry pool wouldn't. Also, selecting the sorry pool and not sending the client an HTTP redirect would mean the client doesn't see an update on the URI. If you go with the sorry pool option, you would want to make sure the sorry pool server sets appropriate caching headers to prevent the client, search engine spider, or intermediate proxy server from caching the sorry content.

     

     

    4) Do the various uses of "persist none" below make sense? Are they unnecessary with HTTP::redirect?

     

     

    It's not necessary, as there is no pool selection being made.

     

     

    Also, in the first rule, if you're always rewriting the URI to / then the root document can't reference any images, CSS files or other documents on the same VS, because the iRule will rewrite those requests to / as well. You can get around this by putting the server content in a specific directory (like /maintenance/) and then only rewriting the URI to / if it doesn't already start with /maintenance/.
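
    A minimal sketch of that exception, assuming the sorry content lives under /maintenance/ on the maintpool_80 servers and that fewer than 4 active members in webpool_80 means maintenance mode (both taken from the first rule, not verified against your setup):

    ```tcl
    when HTTP_REQUEST {
        if { [active_members webpool_80] < 4 } {
            # Leave requests for /maintenance/ assets alone so the sorry
            # page can load its images and CSS; rewrite everything else to /.
            if { !([HTTP::uri] starts_with "/maintenance/") } {
                HTTP::uri "/"
            }
            persist none
            pool maintpool_80
        }
    }
    ```

    The rewrite is done in HTTP_REQUEST rather than CLIENT_ACCEPTED because HTTP::uri needs a parsed request.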

     

     

    It would be faster to serve the maintenance content from LTM itself. It eliminates the need to have specific servers designated to serve sorry content. But it's a bit more complicated to configure and update the setup on LTM compared with a standard web server. Also, the content you serve is stored in LTM memory and takes resources away from other functions.
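
    If you go that way, something along these lines might work. It is only a sketch: the HTML body, the 503 status, and the Cache-Control and Retry-After values are illustrative choices of mine, and the pool name and threshold come from the rules in the question.

    ```tcl
    when HTTP_REQUEST {
        if { [active_members webpool_80] < 4 } {
            # Serve the sorry page straight from LTM. Braces keep Tcl from
            # performing substitution inside the HTML.
            HTTP::respond 503 content {<html><head><title>Maintenance</title></head><body><p>The site is down for maintenance. Please try again later.</p></body></html>} "Content-Type" "text/html" "Cache-Control" "no-cache" "Retry-After" "300"
        }
    }
    ```

    The no-cache header is there for the same reason as the caching note under question 3: it keeps clients and proxies from holding on to the sorry content.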

     

     

    Aaron
  • Thanks for explaining that. Here is the modified rule

    Can you validate the use of lb_retrys (not really a retry but a reselect) in the iRule below?

     
     when CLIENT_ACCEPTED {
       set lb_retrys 0
       set maint 0
       set clip [IP::client_addr]
       if { [LB::status pool pool_80 member $clip 80] eq "up" } {
         pool pool_80 member $clip 80
       } elseif { [active_members pool_80] >= 4 } {
         LB::mode rr
         LB::reselect pool pool_80
       } else {
         set maint 1
       }
     }

     when HTTP_REQUEST {
        if { $maint == 1 } {
           HTTP::redirect http://maintenance_window_page
        }
     }

     when LB_SELECTED {
       if { $lb_retrys >= 1 } {
         LB::mode rr
         LB::reselect pool pool_80
       }
     }

     when LB_FAILED {
       if { $lb_retrys < [active_members pool_80] } {
         persist none
         LB::mode rr
         LB::reselect pool pool_80
       } else {
         HTTP::redirect http://maintenance_window_page
       }
       incr lb_retrys
     }