Forum Discussion

Dianna_129659
Nimbostratus
Sep 23, 2013

How can I see if Google is hitting us?

We are experiencing many problems with Google hitting us really hard. Our Tomcat log indicates this, and it is almost crashing our ecommerce website. Is there a way that I can check F5 to see when we are being hit by Google, and then block that IP address? I am looking at the Event Application Request logs, and do not even see that we are being hit by Google, nor does the web scraping indicate that we are being hit. I appreciate any suggestions, ideas or experience. Many thanks, Dianna

 

8 Replies

  • AFAIK, Googlebot can be identified by its User-Agent header.

     

    But the user agent can be faked. And sometimes Googlebot does not identify itself, to make sure you are not tuning your site for Google.

     

    How to proceed?

     

    A first step might be to analyze your Tomcat logs to figure out what these requests have in common. Perhaps it's always the same source IP, the same user agent, the same path, or something else.

     

    Based on this, an iRule can be applied to limit the number of requests that match specific criteria. There are some table-based sample iRules for request rate limiting on DevCentral (DC).

     

    Just dropping all Google requests may have some negative business impact ...
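
To make the log-analysis step above concrete, here is a minimal Python sketch that tallies requests by source IP, user agent, and path. It assumes the Tomcat access log uses the common "combined" layout (host, ident, user, time, request line, status, bytes, referer, user agent); if your access log pattern differs, the regular expression needs adjusting. The names `summarize` and `SAMPLE_LINES` and the sample entries are made up for illustration.

```python
import re
from collections import Counter

# Assumed "combined" access log layout (adjust if your log pattern differs):
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, top=5):
    """Count requests per client IP, user agent, and path."""
    by_ip, by_agent, by_path = Counter(), Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        by_ip[m["ip"]] += 1
        by_agent[m["agent"]] += 1
        parts = m["request"].split()
        if len(parts) >= 2:
            by_path[parts[1]] += 1  # second token of "GET /path HTTP/1.1"
    return {
        "ip": by_ip.most_common(top),
        "agent": by_agent.most_common(top),
        "path": by_path.most_common(top),
    }


# Made-up sample entries using documentation-only IP ranges.
SAMPLE_LINES = [
    '192.0.2.10 - - [23/Sep/2013:10:00:01 -0400] "GET /catalog/item1 HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [23/Sep/2013:10:00:02 -0400] "GET /catalog/item2 HTTP/1.1" '
    '200 498 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [23/Sep/2013:10:00:03 -0400] "GET /cart HTTP/1.1" '
    '200 128 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for field, counts in summarize(SAMPLE_LINES).items():
        print(field, counts)
```

To run it against a real log, pass an open file object to `summarize` instead of the sample list. Whatever dominates the top of each list (one IP, one user agent, one path) is a candidate criterion for a rate-limiting rule.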

     

  • Hi Stephan. Thank you for this thoughtful reply. We recognize the negative impact of blocking Google, but the webstore really is crashing. I did not know that I could use iRules to limit Google. That sounds like a good potential solution. Thank you!

     

  • Here is a good link for implementing iRules for bots.

     

    https://devcentral.f5.com/wiki/iRules.Controlling-Bots.ashx

     

  • You could try rate limiting Google requests based on the user-agent header value, but I think as Steve Iveson pointed out on the codeshare example, it may affect your Google site ranking.

     

    It might be better to send a 503 response from the iRule when you want to block access to a search engine spider:

     

    http://googlewebmastercentral.blogspot.com/2011/01/how-to-deal-with-planned-site-downtime.html

     

    Outages that are not clearly marked as such can negatively affect a site’s reputation. ... it’s better to return a 503 HTTP result code (Service Unavailable) which tells search engine crawlers that the downtime is temporary.

     

    Aaron
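
To illustrate the logic discussed above (throttle by User-Agent, and answer with a 503 rather than dropping the request), here is a small Python model of the decision. It is not an iRule and not the DevCentral sample code; it only sketches a fixed-window counter of the kind a table-based rule might keep. The class name `UserAgentThrottle`, the limit of 10 requests per 60 seconds, and the use of a `Retry-After` header are all assumptions chosen for illustration.

```python
import time


class UserAgentThrottle:
    """Fixed-window request counter per client IP for matching user agents."""

    def __init__(self, needle="googlebot", limit=10, window=60,
                 clock=time.monotonic):
        self.needle = needle.lower()  # substring to look for in User-Agent
        self.limit = limit            # requests allowed per window (assumed)
        self.window = window          # window length in seconds (assumed)
        self.clock = clock            # injectable so the logic can be tested
        self._windows = {}            # client IP -> (window start, count)

    def check(self, client_ip, user_agent):
        """Return (status, headers): 200 passes the request, 503 defers it."""
        if self.needle not in (user_agent or "").lower():
            return 200, {}  # other clients are never throttled here
        now = self.clock()
        start, count = self._windows.get(client_ip, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired, start a new one
        count += 1
        self._windows[client_ip] = (start, count)
        if count > self.limit:
            # 503 signals a temporary condition; Retry-After hints when to return.
            retry_after = max(1, int(self.window - (now - start)))
            return 503, {"Retry-After": str(retry_after)}
        return 200, {}
```

For example, with `throttle = UserAgentThrottle(limit=2)`, the first two calls to `throttle.check("192.0.2.10", "Googlebot/2.1")` within a minute return status 200 and the third returns 503. Since the User-Agent can be faked, this only slows down clients that announce themselves as the crawler.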

     

  • Thank you, Aaron. There is much to consider, with keeping the eCommerce website open to our customers being the most important. Still, we don't want to hurt our Google ratings, etc. I appreciate the knowledge being shared. Many thanks, Dianna

     

    • boneyard
      MVP
      Hopefully this works out for you. But when things calm down, I would re-examine what is going on, because I really doubt Google is in the business of crashing sites.
  • Yesterday someone posted information about using an iRule to slow Google down instead of blocking it completely. I thought there was sample code to throttle user agents. Do you know where I can find that sample code, please?