Forum Discussion

cjunior_138458
Altostratus
Jul 24, 2014

Web scraping tuning issue

Hi, I have questions about how to apply Web Scraping protection. I read the great article "https://devcentral.f5.com/articles/more-web-scraping-bot-detection" written by John Wagnon and tried to implement this feature in my ASM policy. My homepage generates 125 requests on the first access to the site.


My understanding is that the "Grace Interval" value should be at least greater than my number of initial requests. During that interval ASM tests whether the client is a robot (here is my problem) and, if it detects one, penalizes subsequent accesses for the number of requests configured as the "Unsafe Interval". If it does not detect a robot, it allows navigation without checking for the next N requests configured as the "Safe Interval", and then returns to the validation flow. A rough sketch of how I understand this cycle is below.
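Here is a toy Python model of that Grace/Safe/Unsafe cycle as I understand it. This is only a sketch of the counters described in the article, not F5's implementation, and details such as what happens when the grace interval expires with no verdict are my assumptions:

    class WebScrapingCycle:
        # Toy model of the Grace/Safe/Unsafe request counters
        # (my assumptions, not F5 code).

        def __init__(self, grace=150, safe=2000, unsafe=100):
            self.grace, self.safe, self.unsafe = grace, safe, unsafe
            self.state, self.count = "grace", 0   # start by validating the client

        def on_request(self, verdict=None):
            # verdict: None while undecided, True = proved human, False = detected bot
            self.count += 1
            if self.state == "grace":
                if verdict is True:                          # human: stop checking for a while
                    self.state, self.count = "safe", 0
                elif verdict is False or self.count >= self.grace:
                    # assumption: no proof by the end of grace counts as a bot
                    self.state, self.count = "unsafe", 0
                return "allow"                               # grace requests still pass
            if self.state == "safe":
                if self.count >= self.safe:                  # safe interval used up,
                    self.state, self.count = "grace", 0      # go back to validating
                return "allow"
            if self.count > self.unsafe:                     # unsafe interval served,
                self.state, self.count = "grace", 1          # re-validate from this request
                return "allow"
            return "block"

With grace=150 the model lets my 125-request homepage burst through before any verdict is forced, which is why I sized the Grace Interval above the number of initial requests.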


My problem is that ASM performs a POST to test client interactivity, and this makes me lose navigation data in Google Analytics.


Well, my questions are: do the counters consider the request source IP or the trusted XFF header? Is the counter kept per client connection or globally? What are the ideal values for a site with my homepage's characteristics? What is the ideal "Safe Interval" value? I think 2000 is just too much for my case; am I wrong? Can anyone help me?


Thank you very much!


1 Reply

  • It really varies by use case; I don't think anyone can give you optimal settings. You might want to talk to your FSE or possibly your account manager and see if you can get some guidance on optimizing ASM for your environment. It's possible this may involve a Professional Services engagement, but if you're just looking to have bot detection optimized, the cost should be minimal. They will need to look at the traffic flow for your website to answer these questions.


    Alternatively, you can experiment with different values and see which ones get you the result you need; a rough scripting sketch for that follows.
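    If you want to script that experimentation rather than click through the GUI, something like the Python sketch below could work against iControl REST. Caveat: the /mgmt/tm/asm/policies endpoint exists, but the web-scraping sub-path and the property names here are assumptions on my part; check the ASM REST API reference for your version before relying on them.

        import requests

        BIGIP = "https://bigip.example.com"   # hypothetical management address
        POLICY_ID = "ABC123"                  # find yours via GET /mgmt/tm/asm/policies
        AUTH = ("admin", "password")          # use real credentials or a token

        def set_intervals(grace, safe, unsafe):
            # NOTE: the "web-scraping" sub-path and property names are guesses
            url = f"{BIGIP}/mgmt/tm/asm/policies/{POLICY_ID}/web-scraping"
            payload = {
                "graceInterval": grace,
                "safeInterval": safe,
                "unsafeInterval": unsafe,
            }
            # verify=False only for a lab box with a self-signed cert
            r = requests.patch(url, json=payload, auth=AUTH, verify=False)
            r.raise_for_status()
            return r.json()

        # e.g. start just above the 125-request homepage burst and tighten from there:
        # set_intervals(grace=150, safe=500, unsafe=100)

    Remember to apply the policy after changing it, and watch the event logs while you test so you can see which interval a given client is in.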