winifred_corbet
Aug 10, 2010

How to block Googlebot/Nutch-1.0 Bot

We have this Bot that is killing us:

Googlebot/Nutch-1.0 (Prototype; http://en.wikipedia.org/wiki/Web_crawler; donotreply at prototype dot com)

We would like to block it completely. I see there are iRules to redirect bots, but is there a simple one that can block this one altogether?

1 Reply

  • Hi Winifred,

    Sure, you can use a simple iRule to send a TCP reset if the user agent header contains that string:

    when HTTP_REQUEST {

       # Check the User-Agent header value, converted to lower case
       switch -glob [string tolower [HTTP::header User-Agent]] {
          "*googlebot/nutch*" {
             # Bad UA, send a TCP reset
             reject
          }
       }
    }
    

    If you know the IP address(es) they typically make requests from, you could do this more efficiently by adding the IPs to an address data group and then using the matchclass (v9) or class (v10) command in CLIENT_ACCEPTED. This would avoid checking the User-Agent header value on every HTTP request.

    http://devcentral.f5.com/wiki/default.aspx/iRules/class

    http://devcentral.f5.com/wiki/default.aspx/iRules/matchclass
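
    As a rough sketch of the IP-based approach for v10, something like the following should work. It assumes you have already created an address data group; the name blocked_bot_ips is only a placeholder and is not from the original post. The v9 equivalent appears in a comment.

    ```tcl
    when CLIENT_ACCEPTED {

       # blocked_bot_ips is a hypothetical address data group that
       # holds the bot's source IPs or networks
       if { [class match [IP::client_addr] equals blocked_bot_ips] } {
          # Known bad source, send a TCP reset before any HTTP parsing
          reject
       }

       # v9 equivalent of the check above (note the $:: prefix):
       # if { [matchclass [IP::client_addr] equals $::blocked_bot_ips] } { reject }
    }
    ```

    Because the check runs once per connection rather than once per request, it is cheaper than the User-Agent match, but it only helps while the bot keeps using the addresses in the data group.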

    Aaron