Forum Discussion

nrelihan_68143
Nimbostratus
Jul 21, 2011

How to omit bots & crawlers from an iRule?

Hey!

 

 

I've written a geolocation iRule that redirects users to their regional website when they visit the main domain, www.test.com.

 

So for instance if a visitor from the UK goes to www.test.com, they are redirected to www.test.co.uk.

 

Instead of using a DNS lookup to find a visitor's region, I use their IP address to do so, which is a new F5 feature.

 

So here's what I want to do:

 

 

It's a requirement that when visitors come from outside any region covered by the iRule (that is, their IP address cannot determine where to direct them), they must be prompted to select a region. The region selector page will act as a splash page. Bots and crawlers are not to be redirected and will not see the splash page.

 

 

The first part of policy 3 should be OK, as I just need to redirect the non-regional visitors to a region selector page. However, is there a way to exclude bots and crawlers from the iRule?

 

 

Thanks for any help.

 

 

Neil

 

10 Replies

  • OK, this is something like what I'm looking for:

     

     

    http://devcentral.f5.com/wiki/iRules.ControllingBots.ashx

     

     

    So I'll have to get all the bots & crawlers from the database below and insert them into the code... it's going to be a long list!

     

    Any idea whether it's possible to use a lookup table in iRules instead?

     

     

    http://www.user-agents.org/index.shtml
  • Hi Neil,

     

    You don't need to insert them into the code per se. As shown at the same URL, http://devcentral.f5.com/wiki/iRules.ControllingBots.ashx, you can use data groups, which you can add to as the list grows without altering or growing the iRule itself.
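
    A minimal sketch of that approach, assuming a string data group named "bots" (the name is only an example) whose entries are lowercase fragments of crawler User-Agent values, and assuming v10 or later, where the "class match" command is available (on v9 the equivalent would be "matchclass"):

    ```tcl
    when HTTP_REQUEST {
       # Assumption: "bots" is a string data group holding lowercase
       # User-Agent fragments such as "googlebot".
       set ua [string tolower [HTTP::header User-Agent]]

       # Bots and crawlers leave the event here, so they are never redirected.
       if { [class match $ua contains bots] } {
          return
       }

       # ... geolocation redirect logic for ordinary visitors goes here ...
    }
    ```

    New bots are then added to the data group only; the iRule itself stays unchanged.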

     

     

    Bhattman

     

     

     

  • Hey Bhattman,

     

     

    Thanks for your response, I have two questions:

     

    You mention data groups for storing the list of bots/crawlers. Will this data group live in the same place as the iRule, or where can it be stored?

     

     

    Do I need to include stars (*) before and after the bot/crawler strings in the list?

     

     

    Thanks,

     

    Neil
  • "Data Group" and "Class" are interchangeable terms (which can get confusing until you get used to it).

     

     

    Wiki Link for "Class" Command:

     

    http://devcentral.f5.com/wiki/iRules.class.ashx

     

     

    Wiki Link for "Data Group" Formatting:

     

    http://devcentral.f5.com/Tutorials/TechTips/tabid/63/articleType/ArticleView/articleId/1086448/iRules-Data-Group-Formatting-Rules.aspx

     

     

    Data Groups can exist within the BIG-IP Config file or can be placed on the local file system of the BIG-IP.

     

     

    Entries in a Data Group are absolute (they are compared as literal strings), so you may or may not need wildcards depending on how you configure your Data Group and which comparison operator you match it with.
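
    As a rough illustration of why the operator matters, here is a sketch that assumes a string data group named "bots" and the v10+ "class match" command. With "contains", an entry only has to appear somewhere in the User-Agent, so no stars are needed around it:

    ```tcl
    when HTTP_REQUEST {
       set ua [string tolower [HTTP::header User-Agent]]

       # "equals": true only if an entry is identical to the whole User-Agent string.
       set exact [class match $ua equals bots]

       # "contains": true if any entry appears anywhere in the User-Agent string,
       # so an entry such as "googlebot" needs no leading or trailing "*".
       set partial [class match $ua contains bots]

       log local0. "exact=$exact partial=$partial"
    }
    ```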

     

     

    Hope these help.
  • Thanks for the help Michael,

     

     

    So yes, as my group is quite large, I think I'll create an external data group. Is the following correct?

     

     

    /config/bigip.conf

    class bots {
       type string
       filename "/var/class/bots.class"
    }

    /var/class/bots.class

    "bots1"
    "bots2"
    "bots3"

     

     

    In the iRule, can I then just refer to this data group as "bots"?

     

    if { [matchclass [string tolower [HTTP::header User-Agent]] contains $::bots] } {

     

    ...

     

     

    thanks,

     

     

    Neil
  • I'm almost afraid to post this one for fear of being laughed off of here, but you mentioned that you wanted to use the content from user-agents.org to seed the user-agent determination. There's actually another service called useragentstring.com that offers a simple user agent lookup service. I wrote this iRule a long while back to play with HTTP::retry but never did anything with it. It _does_ work, but it's a piece of... work.

    Fun to look at though.

    It takes an incoming web request, formulates an HTTP request to the UA lookup service, gets the response, and puts it into both cookies and an in-memory array (yeah, an array -- old TCL habits die hard -- if I had to do it again, I'd use tables). It also allows through only clients whose user-agents are recognized as a "Browser" type.

    Enjoy!

    
    when RULE_INIT {
       set static::browser_id_cookie_name "x_browser"
       set static::browser_id_header_name "X-Browser-Characteristics"      
    }
    
    when CLIENT_ACCEPTED {

       # Set initial session variables to allow us to track where we are
       # in the validation of the incoming user-agent string.

       set do_lookup 1
       set browsercookie 0
       set content_collected 0
       set done_retrying 0
    }
    
    when HTTP_REQUEST {

       # Save the original pool and hostname. We will need these later to
       # rebuild our original request.

       set original_pool [LB::server pool]
       set original_host [HTTP::host]

       # First check to see if we:
       #  1. Do NOT have a cookie set that indicates we've been through validation before.
       #  2. Are not in a user-agent validation retry.
       #  3. Have no HTTP content collected from a previous run.
       #  4. Are just inside a CLIENT_ACCEPTED event where we are expected to do a lookup.
       # If these conditions are satisfied, we look up the user-agent validation service IP
       # address, save the current request, then construct an outgoing HTTP::request to the
       # service IP.
       #
       # To do this, we use the existing request as a base, but use HTTP::header sanitize to
       # strip out most data, then put back in the headers needed to connect and close.

       if { (! $done_retrying) && (! $content_collected) && ($do_lookup) && (! [HTTP::cookie exists $static::browser_id_cookie_name]) } {
          set original_request [HTTP::request]
          set uas_lookup_node [RESOLV::lookup @4.2.2.2 "www.useragentstring.com"]
          node $uas_lookup_node 80
          HTTP::uri "/?uas=[URI::encode [HTTP::header User-Agent]]&getText=all"
          HTTP::header sanitize "Accept-Encoding Connection Cookie Keep-Alive"
          HTTP::header replace Host "www.useragentstring.com"
          HTTP::header insert Connection "close"

          # If we have a browser cookie set, we forgo the outgoing lookup and keep marching.

       } elseif { ([HTTP::cookie exists $static::browser_id_cookie_name]) } {
          set browsercookie 1
          set do_lookup 0
       }
    }
    
    when HTTP_RESPONSE {

       # If we're in a user-agent lookup loop, have no content currently collected, and
       # have not yet sent our HTTP::retry, then collect the HTTP response from the
       # UA lookup service.

       if { ($do_lookup) && (! $content_collected) && (! $done_retrying) } {
          if { [HTTP::header exists Content-Length] && ([HTTP::header Content-Length] < 2048) } {
             set con_length [HTTP::header Content-Length]
          } else {
             set con_length 2048
          }
          HTTP::collect $con_length
          set content_collected 1
       }

       # If the current connection has a characteristics array and doesn't have a user-agent
       # type that's a browser, then deny access. This array is either seeded directly from
       # the UA verification response or is derived from the cookies we set.
       #
       # This will block ALL but browser clients -- including robots, crawlers, etc.

       if { ([array exists browser_characteristics]) && ($browser_characteristics(agent_type) ne "Browser") } {
          HTTP::respond 403 content "<html><head><title>Not Allowed</title></head><body><h1>403 - Not allowed</h1>Your browser type is not allowed here.</body></html>"
          # The response has been replaced, so skip the cookie handling below.
          return
       }

       # If we have a characteristics array but no cookie, formulate one and place it in
       # the outgoing response.
       #
       # This example iRule inserts the common things derived from the UA check in multiple cookies
       # named after their datafields. It doesn't have to do this, but might be helpful because the
       # web application can get the benefit of the learned information. If you don't need this or
       # like it, then comment out the foreach loop.

       if { (! $browsercookie) && ([array exists browser_characteristics]) } {
          HTTP::cookie insert name ${static::browser_id_cookie_name} value "1" domain .$original_host path /
          # Iterate over the array keys only ("array get" would also yield the values).
          foreach item [array names browser_characteristics] {
             switch $item {
                "os_type" -
                "agent_type" -
                "agent_name" -
                "agent_version" -
                "os_name" -
                "agent_language" {
                   HTTP::cookie insert name ${static::browser_id_cookie_name}_$item value "$browser_characteristics($item)" domain .$original_host path /
                }
             }
          }
          set browsercookie 1
       }
    }
    
    when HTTP_RESPONSE_DATA {

       # Determine if the response we just got was from the user-agent service. If
       # it was, then we're going to parse it into an array and then replay the original
       # HTTP request to the original pool.

       if { ($do_lookup) && ($content_collected) && (! $done_retrying) } {
          if { [HTTP::payload] contains "agent_type" } {
             # Take the second line of the payload, drop its last character,
             # and split it into "name=value" records on ";".
             set parse_payload [split [string replace [lindex [split [HTTP::payload] "\n"] 1] end end] ";"]
             set browser_array_list ""
             foreach record $parse_payload {
                if { ($record ne "") } {
                   set record [split $record "="]
                   set rtype [lindex $record 0]
                   set rvalue [lindex $record 1]
                   if { ($rvalue ne "") && ([string tolower $rvalue] ne "null") } {
                      # Build a flat name/value list suitable for "array set".
                      lappend browser_array_list $rtype $rvalue
                   }
                }
             }
             array set browser_characteristics $browser_array_list
          }
          HTTP::payload replace 0 [HTTP::payload length] ""
          set do_lookup 0
          set content_collected 0
          pool $original_pool
          HTTP::retry $original_request
          set done_retrying 1
       }
    }
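
    For the "I'd use tables" remark, a rough sketch of what caching the result in the session table might look like, assuming a version that has the "table" command (v10.1 or later). The subtable name "ua_types", the $parsed_type placeholder and the 3600-second timeout are made up for illustration and are not part of the iRule above:

    ```tcl
    when HTTP_REQUEST {
       set ua [HTTP::header User-Agent]

       # Hypothetical subtable "ua_types"; the lookup returns "" when nothing is cached.
       set agent_type [table lookup -subtable ua_types $ua]

       if { $agent_type eq "" } {
          # Not cached yet: this is where the lookup against the UA service
          # would run. Once its answer is parsed, it could be stored with:
          #    table set -subtable ua_types $ua $parsed_type 3600
          # ($parsed_type is a placeholder; 3600 is an arbitrary timeout in seconds.)
       } elseif { $agent_type ne "Browser" } {
          HTTP::respond 403 content "Your browser type is not allowed here."
       }
    }
    ```

    Unlike a per-connection array, a table entry would be shared across connections, so each distinct user-agent string would only need to be looked up once until the entry times out.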
     
  • Try clicking edit and save to have the ampersands re-rendered correctly.

     

     

    Aaron
  • That's an impressive piece of code, Joel, but I don't think I'll be implementing it this time. Perhaps when I become more experienced with iRules, I'll look into doing something similar. Thanks anyway, it's a pretty good idea though!

     

     

    With regards to my code above, I tried implementing it on my F5, but it didn't work. I found that when I edited the bigip.conf file to add the class info, my change was deleted shortly afterwards; I'm not sure what deletes it. So in the logs I can see that the iRule can't find the "bots" group.

     

     

    Any ideas?