
werner_v_113449
Dec 21, 2015

Pending rule event HTTP_REQUEST aborted

Hi,

We are currently experiencing some issues with an iRule we are using. We got this iRule from a consultant and installed it more than a year ago. It always worked well, but recently we started getting complaints.

The iRule checks traffic coming in to our reverse proxies. If the same client repeatedly requests the same URI, we check it against a threshold (at most 50 requests in a 20-second window; 200 for known proxies). We did this to get some protection against behavior we had experienced before.

For the past week we have been getting complaints from people accessing one specific URL via our reverse proxies. It works for most people, but some clients can't access it. After checking, we noticed that we were getting the "aborted" messages for exactly the client IPs that were complaining.

I'm not an expert in programming iRules. From other articles I've seen that this could be related to the connection no longer being present in the connection table. But I'm puzzled how a connection could be dropped before the HTTP_REQUEST event fires, and why we have only started seeing this recently.

Does anybody have an idea what is causing this behavior in this iRule?

Here is the iRule:

 

when RULE_INIT {
    # METRICS FOR ALL IP CLIENTS
    set static::maxRate 50
    set static::windowSecs 20

    # METRICS FOR KNOWN PROXY ONLY
    set static::maxRateProxy 200
    set static::windowSecsProxy 20

    # SET BLOCKING METHOD: DROP OR REJECT
    # 0=reject 1=drop
    set static::blocking 0

    # DEBUG FLAG: 0 off, 1 on
    # logs to /var/log/ltm
    # be aware that every GET request will be logged! use only if needed
    set static::ratelimit_debug 0
}



when HTTP_REQUEST {
    # check which URI to apply rate limiting to, no specific HTTP method
    if { [class match [string tolower [HTTP::uri]] starts_with [URI::basename [virtual name]]-ratelimit-uri] } {
        if { $static::ratelimit_debug > 0 } { log local0. "HTTP-RATE-LIMITING: vs=[virtual name] client_ip=[IP::client_addr] uri=[HTTP::uri]" }

        # whitelist: do nothing
        if { [class match [IP::client_addr] equals [URI::basename [virtual name]]-ratelimit-whitelist] } {
            return
        }

        # set variables for readability
        set limiter [string tolower [HTTP::uri]]
        set clientip_limitervar [IP::client_addr]:$limiter
        set get_count [table key -count -subtable $clientip_limitervar]

        # known proxy: apply proxy metrics
        if { [class match [IP::client_addr] equals [URI::basename [virtual name]]-ratelimit-proxy] } {
            # main condition
            if { $get_count < $static::maxRateProxy } {
                incr get_count 1
                table set -subtable $clientip_limitervar $get_count $clientip_limitervar indefinite $static::windowSecsProxy
            } else {
                log local0. "HTTP-RATE-LIMITING: vs=[virtual name] proxy=yes client_ip=[IP::client_addr] has exceeded the number of allowed requests maxrate=$static::maxRateProxy time_window=$static::windowSecsProxy uri=[HTTP::uri]"
                if { $static::blocking > 0 } { drop } else { reject }
                return
            }
        } else {
            # any other clients: apply standard metrics
            # main condition
            if { $get_count < $static::maxRate } {
                incr get_count 1
                table set -subtable $clientip_limitervar $get_count $clientip_limitervar indefinite $static::windowSecs
            } else {
                log local0. "HTTP-RATE-LIMITING: vs=[virtual name] client_ip=[IP::client_addr] has exceeded the number of allowed requests maxrate=$static::maxRate time_window=$static::windowSecs uri=[HTTP::uri]"
                if { $static::blocking > 0 } { drop } else { reject }
                return
            }
        }
    }
}
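For readers less fluent in iRules, the per-request decision the rule above makes can be restated as a small Python function. This is only a sketch: the function name and parameters are hypothetical, the three data groups (<vs>-ratelimit-uri, <vs>-ratelimit-whitelist, <vs>-ratelimit-proxy) are modeled as plain Python collections, and the current subtable entry count is passed in rather than read from [table].

```python
from typing import Optional

# Values copied from RULE_INIT in the posted iRule.
MAX_RATE, MAX_RATE_PROXY = 50, 200
BLOCKING = 0  # 0 = reject, 1 = drop


def rate_limit_decision(uri, client_ip, count, limited_prefixes,
                        whitelist, proxies) -> Optional[str]:
    """Return None if the request passes, else "reject" or "drop".

    Hypothetical helper. limited_prefixes, whitelist and proxies stand in for
    the three data groups; count is the number of live entries in this
    client's "<ip>:<lowercased uri>" subtable.
    """
    # class match ... starts_with: only listed URI prefixes are rate limited
    if not any(uri.lower().startswith(p) for p in limited_prefixes):
        return None
    if client_ip in whitelist:
        return None                      # whitelisted clients are never limited
    limit = MAX_RATE_PROXY if client_ip in proxies else MAX_RATE
    if count < limit:
        return None                      # under the threshold: record it and allow
    return "drop" if BLOCKING > 0 else "reject"
```

Note that the whitelist check happens before any table lookup, so whitelisted clients never create subtable entries.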

 


  • Hi Werner,

    This message may occur when an iRule uses commands that pause (suspend) Tcl processing, such as your [table] commands. In your case the iRule may be suspended until the answer is received.

    If the underlying TCP connection breaks in between for whatever reason (e.g. the [table] query is sent, Tcl is suspended, and then the connection is terminated), this warning message is logged when Tcl tries to resume the previously suspended code.

    The cause of this problem is not the [table] command itself. It's just that the TCP session closes at the wrong moment and makes the [table] command complain. Most of the time this warning message can safely be ignored...

    Yeah, I know my answer doesn't solve your problem. But the error message is not the "reason" for the connection problems; it's more or less just a "reaction" to them.

    Cheers, Kai
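The sequence Kai describes can be illustrated with a small Python asyncio analogy. This is not how TMM is implemented: ClientConnection, table_lookup, http_request_handler and the 0.05 s lookup delay are all hypothetical. The point is only the ordering: the handler suspends while waiting for the lookup, the client closes the connection in the meantime, and when the handler resumes there is no connection left to act on, which is the point where a message like the one in the thread title gets logged.

```python
import asyncio


class ClientConnection:
    """Hypothetical stand-in for the client-side TCP connection."""

    def __init__(self):
        self.open = True


async def table_lookup(delay):
    # Stand-in for a [table] query: the handler yields until the answer arrives.
    await asyncio.sleep(delay)
    return 0


async def http_request_handler(conn, log):
    count = await table_lookup(0.05)  # the handler is suspended here
    if not conn.open:
        # The connection went away while we were suspended; nothing to act on.
        log.append("Pending rule event HTTP_REQUEST aborted")
        return
    log.append(f"processed, count={count}")


async def run(close_mid_lookup):
    conn, log = ClientConnection(), []
    task = asyncio.create_task(http_request_handler(conn, log))
    if close_mid_lookup:
        await asyncio.sleep(0.01)  # let the handler start its lookup first
        conn.open = False          # client closes while the lookup is pending
    await task
    return log
```

Calling asyncio.run(run(False)) returns ["processed, count=0"], while asyncio.run(run(True)) returns the abort message, even though the lookup itself behaved identically in both runs.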

  • Thanks for the answer. We're trying to reproduce the problem and capture a tcpdump of what's happening, but it doesn't occur that often.

     

    So the connection opened to the virtual server must be getting closed quite quickly, then? We use the standard TCP profiles, so the idle timeout hasn't been changed (300 sec). That would mean the client or server closes the connection almost immediately; otherwise the connection would remain in the connection table and the table command wouldn't complain, right?

     

    greetings, werner

     

  • Hi Werner,

     

    Since the HTTP_REQUEST event was complaining, I strongly believe it's a client-side connection termination, because no server-side connection is in use yet at that point.

     

    Regarding the use of the default TCP profile, you may want to take a look at this lovely article: https://devcentral.f5.com/articles/stop-using-the-base-tcp-profile

     

    BTW: Your posted iRule contains logic that could exhaust all of your available memory very quickly. Have you experienced memory problems in the past or during these events?

     

    Cheers, Kai

     

  • Hi Kai,

     

    We are using a vCMP guest on a VIPRION 2400 platform. TMM memory looks fine to me: it stays below 20% of the allocated TMM memory and I don't see spikes in usage. I think iRules use TMM memory for storing variables and tables, correct? I didn't check the memory assigned to the OS.

     

    We are removing the iRule from our production environment. The iRule is also present in our acceptance environment, so we are trying to reproduce the issue there. Traffic volume on production is rather high, so I want to avoid taking traces there.

     

    I agree that the client is probably dropping the connection. The load balancer setup is used to distribute traffic across reverse proxies, and those reverse proxies provide access to different websites of our company.

     

    The aborted connections are only seen for one URL. (Other URLs are also accessed via the reverse proxy and use the same iRule, but nothing is seen there.) That specific URL works for some people but not for others (50% of users are failing). I suspect it's linked to client parameters, but I'll need a tcpdump for more info.

     

    greetings, werner

     

    • Kai_Wilke
      Your iRule performs a sort of "sliding-window rate limit" and stores the requested URI string twice for each [table] record in TMM memory. So if someone sends lots of long but bogus URI strings, the available TMM memory gets exhausted very quickly. But if memory wasn't a problem in the past, then keep it and cross your fingers that nobody exploits this functionality soon... :-)

      BTW: To consume only half the memory for each tracked URI (without changing the functionality at all), you may want to ask your consultant to change the code to...

          table set -subtable $clientip_limitervar $get_count "1" indefinite $static::windowSecsProxy

      (and the same change with $static::windowSecs in the standard-client branch).

      BTW2: In addition, your counter mechanism has some flaws that result in inaccurate tracking of rate limits. In the end the lovely (but very memory-intensive) "sliding window" mechanic is degraded to a rather simple "interval-based counter" mechanic (which would consume much less memory, but doesn't track as accurately). To get the sliding-window mechanic fixed, you may want to ask your consultant to change the code to...

          # incr get_count 1   (remove this line or comment it out)
          table set -subtable $clientip_limitervar [clock clicks] "1" indefinite $static::windowSecsProxy

      For further information on your sliding-window flaw: https://devcentral.f5.com/questions/sliding-window-irule-block-too-many-requests

      Cheers, Kai
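The key-reuse problem behind BTW2 can be reproduced with a toy Python model of a single ip:uri subtable. Everything here is an assumption for illustration: the function names are hypothetical, each key simply maps to an expiry time, and setting an existing key is assumed to replace the entry and restart its lifetime. With a limit of 3, a 20 s window and one request every 0.5 s, the counter-keyed version (key = count + 1, as in the posted iRule) starts writing over keys that are still live once the oldest entry expires, so in this run the live count never climbs back to the limit and far more than 3 requests pass within one window. Giving every request its own key, as [clock clicks] does on BIG-IP, keeps it at 3 in any 20 s span.

```python
def make_limiter(max_rate, window, unique_keys):
    """Toy model of one ip:uri subtable; entries map key -> expiry time."""
    entries = {}

    def handle(now, req_id):
        # Expire entries whose lifetime has run out (TMM does this on its own).
        for key in [k for k, exp in entries.items() if exp <= now]:
            del entries[key]
        count = len(entries)               # like [table key -count -subtable ...]
        if count >= max_rate:
            return False                   # reject / drop
        # Posted iRule: key = count + 1. Kai's fix: a unique key per request
        # ([clock clicks] on BIG-IP; a request index here, which cannot collide).
        key = req_id if unique_keys else count + 1
        # Assumption: setting an existing key replaces it and restarts its lifetime.
        entries[key] = now + window
        return True

    return handle


def allowed_times(times, max_rate, window, unique_keys):
    handle = make_limiter(max_rate, window, unique_keys)
    return [t for i, t in enumerate(times) if handle(t, i)]


def busiest_window(allowed, window):
    """Most allowed requests inside any half-open span [s, s + window)."""
    return max(sum(1 for u in allowed if s <= u < s + window) for s in allowed)


times = [i * 0.5 for i in range(121)]      # one request every 0.5 s for 60 s
fixed = allowed_times(times, 3, 20, unique_keys=True)
posted = allowed_times(times, 3, 20, unique_keys=False)
```

In this model, fixed admits exactly 3 requests at the start of each 20 s period, while posted admits every request from t = 20 s onwards.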
  • Hi Kai,

     

    He left our company some time ago, so I'm trying to get up to date with iRules myself :-). Basic iRules aren't really an issue, but many of the commands used here are new to me.

     

    From what I see in this iRule, a subtable is created for each combination of client IP and URI. Entries in the subtable get a lifetime of 20 sec. On each HTTP_REQUEST the iRule counts the entries in that subtable, and the lifetime ensures that entries are forced out of the table after the defined interval.

     

    From your first remark, I understand there is no point in storing the client IP and URI as the value in the subtable (since the subtables are already created per client IP and URI). The command you gave just puts the value "1" into the table, so we consume less memory per subtable because we store less information. Correct?

     

    The 2nd remark is basically to stop using the counter value as the key and use [clock clicks] as the key instead. As older entries expire, the count drops back and the iRule starts overwriting existing entries in the subtable (i.e. the same key gets reused several times; using [clock clicks] as the key avoids this).

     

    I'm going to make the changes on the acceptance setup and add some logging there for further checking. We'll deploy to production as well, but this gives me the opportunity to get acquainted with this iRule.

     

    We are still investigating the issue with the aborted connections, but it looks more and more linked to specific client networks (it's always the same client IPs that get aborted). So it looks like some specific client network setting is causing this. We're trying to identify a client and start testing with them.

     

    greetings, werner
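Werner's reading of remark 1 can be checked with rough arithmetic. The Python helper below is hypothetical and counts only the bytes of the record values, not TMM's per-entry overhead or the keys; the client address 192.0.2.10 and the 1000-character URI are made-up example values. With 50 records (the maxRate limit) in one subtable, storing the "<ip>:<uri>" string as the value costs 50 × 1011 = 50,550 bytes, versus 50 bytes when storing "1".

```python
def subtable_value_bytes(client_ip, uri, records, value_is_name=True):
    """Bytes used by record values in one subtable (ignores TMM's own overhead).

    value_is_name=True mirrors the posted iRule, which stores "<ip>:<uri>" as
    the value of every record; False mirrors Kai's suggestion of storing "1".
    """
    subtable_name = f"{client_ip}:{uri.lower()}"
    value = subtable_name if value_is_name else "1"
    return records * len(value)


long_uri = "/" + "a" * 999                 # hypothetical 1000-character URI
before = subtable_value_bytes("192.0.2.10", long_uri, 50)
after = subtable_value_bytes("192.0.2.10", long_uri, 50, value_is_name=False)
```

Because every distinct URI matching a rate-limited prefix gets its own subtable, a client sending many long, distinct URIs multiplies the "before" figure, bounded only by the 20 s lifetime of the entries.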

     

    • Kai_Wilke
      You got my remarks 1 and 2 right. It's an easy change to implement, and it just makes certain things work as they should. Also, I hadn't noticed the rather small 20-sec timeframe setting. In that case the maximum amount of memory held is somewhat limited and would most likely not be enough to take down your entire LTM. Seems safe to me then... Good luck finding the root cause of your problem. My gut feeling says that fragmented TCP datagrams may be the root cause of this specific network's connection problems (e.g. MSS values that are too large) :-)
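To make the MSS hunch concrete, here is the standard arithmetic for IPv4 without IP or TCP options: a full segment carrying MSS bytes of payload becomes an IP packet of MSS + 40 bytes. The helper names and the 1400-byte path MTU (for example, a tunnel somewhere on the client's path) are assumptions for illustration, not values observed in this thread.

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options (timestamps etc. would add more)


def packet_size(mss):
    """IPv4 packet size of a full-sized TCP segment."""
    return mss + IPV4_HEADER + TCP_HEADER


def fits_path(mss, path_mtu):
    """True if a full-sized segment crosses the path without fragmentation."""
    return packet_size(mss) <= path_mtu


def clamped_mss(path_mtu):
    """Largest MSS whose full-sized segments still fit the path MTU."""
    return path_mtu - IPV4_HEADER - TCP_HEADER
```

If a client advertises an MSS of 1460 but its path only carries 1400-byte packets, full-sized segments need fragmentation, or are dropped when DF is set and the ICMP "fragmentation needed" replies do not get back; an MSS clamped to 1360 would fit.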