Forum Discussion

PowerShellDon_1
Nimbostratus
Jan 07, 2016
Solved

Geo IP + Search Engine Crawlers

Hi all

 

I recently implemented GeoIP redirection for our site, sending visitors to the various language and locale versions of our corporate site. It all worked well, with some assistance from DevCentral.

 

A few weeks later we noticed Google was listing our US site rather than the UK one (where we are based) in all search results. Obviously this is because Google's crawler comes out of the US and was being redirected to oursite.com/usa/. So I turned off the redirect until I could come up with a solution.

 

My thought is to inspect the User-Agent string and, if it contains "bot", skip the redirect; otherwise follow the iRule as normal. According to http://www.useragentstring.com/pages/Crawlerlist/, matching on "bot" should be sufficient for the majority of search engines.
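Roughly what I have in mind (an untested sketch; the hostname and the /usa/ path are just placeholders for our existing redirect logic):

    when HTTP_REQUEST {
        # Skip the geo redirect for anything identifying itself as a crawler.
        if { [string tolower [HTTP::header "User-Agent"]] contains "bot" } {
            return
        }

        # Existing GeoIP logic: send US visitors to the US version of the site.
        if { [whereis [IP::client_addr] country] eq "US" } {
            HTTP::redirect "http://oursite.com/usa[HTTP::uri]"
        }
    }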

 

My question is: is this the right way to handle it? How do you do it? Will I be introducing too much overhead by inspecting the User-Agent string on every HTTP request?

 

Thanks

 

2 Replies

  • BinaryCanary_19
    Historic F5 Account

    It all depends on the volume of traffic you have generally. iRules impose some overhead, but if you're not maxing out your device, this should not be a problem. You should also consider doing this with Local Traffic Policies if you can express the logic that way, as those are more efficient than iRules.

     

    In any case, User-Agent strings can easily be forged, so as long as you're not doing anything security-critical it sounds reasonable to key off this. If you do rely on the integrity of this header, you might want to work out a way to pair the User-Agent strings with valid source IP addresses verified via DNS (a rough sketch of that idea is below), but that adds the challenge of managing the IP-to-user-agent mappings and keeping them up to date.

     

    My two-pence.
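
    (Untested sketch of that verification idea; 10.0.0.53 is a placeholder for an internal DNS resolver, and a complete check would also forward-resolve the returned name back to the source IP, as Google recommends:)

        when HTTP_REQUEST {
            if { [string tolower [HTTP::header "User-Agent"]] contains "googlebot" } {
                # Reverse (PTR) lookup of the client IP via an internal resolver.
                set ptr [RESOLV::lookup @10.0.0.53 -ptr [IP::client_addr]]
                if { ($ptr ends_with "googlebot.com") || ($ptr ends_with "google.com") } {
                    # Looks like a genuine Google crawler: skip the geo redirect.
                    return
                }
                log local0. "User-Agent claims Googlebot but PTR is '$ptr' for [IP::client_addr]"
            }
        }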

     

  • Thanks for the quick reply. We have beefy BIG-IPs (can't remember the models) and they run at about 10%, so I think we're fine on performance. Great.

     

    Nothing related to security at all... it's our main customer-facing site, not transactional. We basically want to ensure Google doesn't list the minor country-specific versions of our site as the main result when you google us.

     

    Cheers!