Web scraping tuning issue
Hi, I have some questions about how to apply Web Scraping protection. I read the great article "https://devcentral.f5.com/articles/more-web-scraping-bot-detection" written by John Wagnon and tried to implement this feature in my ASM policy, where the first visit to my homepage generates 125 requests.
My understanding is that the "Grace Interval" value should be at least as large as my number of initial requests. During the grace interval, ASM tests whether the client is a robot (this is where my problem lies) and, if it detects one, punishes subsequent requests for the number configured in "Unsafe Interval". If it does not detect a robot, it allows the next N requests configured in "Safe Interval" to pass without checking, and then returns to the validation flow.
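Just to make sure I have the cycle right, here is a rough sketch of how I understand the three intervals interacting. This is only my mental model, not F5's actual implementation; the constant names and the simulation function are my own:

```python
# Hypothetical sketch of my understanding of ASM's web-scraping
# interval cycle. Names and values are mine, not F5's API.

GRACE_INTERVAL = 200    # requests ASM uses to decide human vs. bot
SAFE_INTERVAL = 2000    # requests allowed unchecked after a human verdict
UNSAFE_INTERVAL = 100   # requests punished after a bot verdict

def simulate(num_requests, is_bot):
    """Count how many requests are challenged, passed, or punished."""
    state, remaining = "grace", GRACE_INTERVAL
    challenged = passed = punished = 0
    for _ in range(num_requests):
        if state == "grace":
            challenged += 1
            remaining -= 1
            if remaining == 0:
                # verdict at the end of the grace window
                state, remaining = (
                    ("unsafe", UNSAFE_INTERVAL) if is_bot
                    else ("safe", SAFE_INTERVAL)
                )
        elif state == "safe":
            passed += 1
            remaining -= 1
            if remaining == 0:
                state, remaining = "grace", GRACE_INTERVAL
        else:  # unsafe: punish, then return to validation
            punished += 1
            remaining -= 1
            if remaining == 0:
                state, remaining = "grace", GRACE_INTERVAL
    return challenged, passed, punished

# A human client loading my 125-request homepage stays entirely
# inside one grace window, so every request is still being evaluated:
print(simulate(125, is_bot=False))  # -> (125, 0, 0)
```

If that model is correct, a first page load that fits inside the grace interval is fully subject to the bot test, which is exactly where my problem below shows up.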
My problem is that ASM performs a POST to test client-side interactivity, and this makes me lose Google Analytics navigation data.
Well, my questions are:

- Do the counters consider the request source IP or the trusted XFF header?
- Is the counter kept per client connection, or globally?
- What are the ideal values for a site with a homepage like mine?
- What is the ideal "Safe Interval" value? I think 2000 is just too much for my case; am I wrong?

Can anyone help me?
Thank you very much!