Forum Discussion

mtobkes_64700 (Nimbostratus)
May 12, 2010

Rate-Limiting Crawlers

Hi, I found this iRule here that limits requests to 1 request per n seconds. How can I change it to allow n requests per second instead, e.g. 5 requests per second?



when RULE_INIT {
    array set ::active_crawlers { }
    set ::min_interval 1
    set ::rate_limit_message "You've been rate limited for sending more than 1 request every $::min_interval seconds."
}

when HTTP_REQUEST {
    set user_agent [string tolower [HTTP::header "User-Agent"]]
    if { [matchclass $user_agent contains $::Crawlers] } {
        # Throttle crawlers.
        set curr_time [clock seconds]
        if { [info exists ::active_crawlers($user_agent)] } {
            if { $::active_crawlers($user_agent) < $curr_time } {
                set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
            } else {
                # block it somehow
                HTTP::respond 503 content $::rate_limit_message
            }
        } else {
            set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
        }
    }
}

7 Replies

  • For cleaner, more accurate rate-limiting, check out the table command article series that covers this in depth:


  Click Here
  • Thanks for the link. However, I'm only running v9.4.7. What options does that leave me?



    Thanks again,


  • Check this version of the dns flood protection rule, the bones of the rate limiting are there:


  Click Here



  • I've modified the iRule I found to limit crawlers. I want to allow ::max_req_count requests per ::min_interval seconds, but I'm getting a TCL error in my logs. Can someone help me figure out what the problem is? The error is:



    TCL error: googlebot_rate-limit_vb5 HTTP_REQUEST - invalid command name ::active_crawlersmozilla/4.0 compatible msie 7.0 windows nt 5.1 gtb6.4 .net clr 1.1.4322 .net clr 2.0.50727 .net clr 3.0.4506.2152 .net clr 3.5.30729 while executing ::active_crawlers$user_agent $curr_time




    when RULE_INIT {
        array set ::active_crawlers { }
        # min_interval is the minimum amount of seconds
        set ::min_interval 10
        # max_req_count variable is the maximum amount of request per min_interval
        set ::max_req_count 3
        set ::rate_limit_message "You've been rate limited for sending more than $::max_req_count request every $::min_interval seconds."
    }

    when HTTP_REQUEST {
        set user_agent [string tolower [HTTP::header "User-Agent"]]
        # remove below log when we go to production
        log local0. "user agent is $user_agent"
        if { [matchclass $user_agent contains $::Crawlers] } {
            # Throttle crawlers.
            # remove below log when we go to production
            log local0. "user agent matches $user_agent"
            set curr_time [clock seconds]
            if { [info exists ::active_crawlers($user_agent)] } {
                # remove below log when we go to production
                log local0. "passed active Crawlers"
                if { [ ::active_crawlers($user_agent) < $curr_time ] } {
                    set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
                    set reqcount 1
                    # remove below log when we go to production
                    log local0. "passed set active crawlers"
                } else {
                    if { [$reqcount > $::max_req_count] } {
                        # allow 10 request then block
                        HTTP::respond 503 content $::rate_limit_message
                        # log when crawler hits more than 10 requests and block it
                        log local0. "Rate Limit Has Reached $::max_req_count Requests Per $min_interval for $user_agent"
                    } else {
                        # reqcount keeps track of request
                        set reqcount [expr {$reqcount + 1}]
                    }
                }
            } else {
                set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
                set reqcount 1
            }
        }
    }

  • Hi myles,



    Try changing this line:

    if { [ ::active_crawlers($user_agent) < $curr_time ] } {

    to:

    if { $::active_crawlers($user_agent) < $curr_time } {



  • Thanks, Aaron. I changed the line, but I now get this TCL error in my logs:



    TCL error: googlebot_rate-limit_vb5 HTTP_REQUEST - invalid command name 1273758282 while executing $::active_crawlers$user_agent $curr_time






  • Do you still have the parentheses around $user_agent and the less than sign in this line?



    if { $::active_crawlers($user_agent) < $curr_time } {



    Can you post a current copy of the iRule and the exact error message from /var/log/ltm?



    Thanks, Aaron
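
The counting logic the thread is working toward (at most ::max_req_count requests per ::min_interval seconds for each User-Agent) can be sketched in plain Tcl. This is a minimal sketch, not code from the thread: allow_request, window_end, and req_count are hypothetical names, and the current time is passed in as an argument so the snippet can be tried in tclsh. In an iRule the same steps would presumably sit inline in HTTP_REQUEST, using global arrays and [clock seconds]. The comparisons are written directly inside the if braces, without square brackets, because Tcl treats [ ... ] as command substitution and would try to run the first word as a command.

```tcl
# Hypothetical state (names are assumptions, not from the thread):
#   window_end(key) = time at which the current counting window expires
#   req_count(key)  = number of requests seen in the current window
array set window_end {}
array set req_count {}

# Return 1 if the request is within the limit, 0 if it should be refused.
proc allow_request {key now max_req interval} {
    global window_end req_count
    # No window yet, or the old one has expired: start a new window.
    if { ![info exists window_end($key)] || $window_end($key) < $now } {
        set window_end($key) [expr {$now + $interval}]
        set req_count($key) 1
        return 1
    }
    # Still inside the window: count this request, then compare to the cap.
    incr req_count($key)
    return [expr {$req_count($key) <= $max_req}]
}

# Usage: 3 requests allowed per 10-second window. The fourth request in
# the window (t=103) is refused; t=111 falls after the window and starts a new one.
foreach t {100 101 102 103 111} {
    puts "t=$t allowed=[allow_request googlebot $t 3 10]"
}
```

Keeping the counter in the same keyed store as the window expiry means the count is shared across connections, whereas a plain local variable such as reqcount would not be visible from one connection to the next.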