Forum Discussion

Dave_Burnett_20
Nimbostratus
Nov 10, 2008

How to allow Search Engine Robots/Slurps through ASM?

We recently installed a pair of F5 6400s (v9.4.3) in front of our website, with ASM in blocking mode.

ASM is logging and blocking a large number of "Non-RFC compliant request" violations. Examining the violation entries shows that they come predominantly from Yahoo robots.

RFC compliance checking is a standard part of the ASM policy (which we have not changed in any way), so I would have thought that anyone running ASM on an F5 is blocking these robots unless they have non-RFC blocking turned off.

Is this indeed the case? Are other users experiencing the same issue? Does anyone know how we can allow search engine robots through the ASM to our site? Blocking them could hurt our website's search ranking.

I would be grateful for any advice or pointers.

13 Replies

  • Ido_Breger_3805
    Historic F5 Account
    I think the Yahoo robots can be identified by their IP address or range of IPs (Yahoo probably shares that information with the webmaster community for business-related reporting). You could use an iRule to route traffic sourced from these IPs to a different class that has these checks (or all checks) turned off.

    I would like to think that the chance of an attack coming from Yahoo's servers is very low.
  • In addition to what Brailsford might have to add...

    There is a check for 'Several Content-Length headers'. So if the request smuggling attack depends on more than one Content-Length header, that check should block it.

    You could try to contact Yahoo and ask why they include the LLF-Cache-Control header in their requests. I couldn't find any reference to it in any RFC or other document. I assume your web server would ignore it whether or not a value was set.

    You could also use an iRule to rewrite the LLF-Cache-Control header to a static value for requests with a Yahoo search string in the User-Agent field, or simply remove the header from all requests. Either option seems like unnecessary overhead, though, if ASM can protect against the attack with other checks.

    Aaron
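
The header-stripping idea in the reply above could look roughly like the iRule below. This is a minimal, untested sketch, not a confirmed fix: the "yahoo! slurp" match string is an assumption about how the crawler identifies itself and should be checked against the User-Agent values in your own ASM violation logs. It also assumes the iRule's HTTP_REQUEST event fires before ASM inspects the request, which is worth verifying on v9.4.3.

```tcl
# Sketch only: strip the LLF-Cache-Control header from requests whose
# User-Agent looks like Yahoo's crawler.
# Assumption: the substring "yahoo! slurp" identifies the crawler.
when HTTP_REQUEST {
    # Lower-case the User-Agent so the substring match is case-insensitive
    set ua [string tolower [HTTP::header value "User-Agent"]]

    if { $ua contains "yahoo! slurp" } {
        # Remove the header only if it is actually present
        if { [HTTP::header exists "LLF-Cache-Control"] } {
            HTTP::header remove "LLF-Cache-Control"
        }
    }
}
```

To set a static value instead of removing the header, `HTTP::header replace "LLF-Cache-Control" <value>` could take the place of the `HTTP::header remove` line. The User-Agent string is supplied by the client and is easy to spoof, so a match like this is only reasonable for a harmless change such as dropping one header, not for switching off ASM checks.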