We all know that web site hits are a pretty useless statistic. First you have to filter out bots, which isn't always as easy as it seems, and then you're left with a pile of "this page loaded" records that tell you little about users' paths, time on site, etc.

Interestingly, far too many sites out there still publicly list hits as a "valid" statistic. Well, over at The Register they've got a short article about the UK's ASA ruling that hits are not a valid advertising statistic.

Problem is, if you filter out the bots, most sites that are updated relatively frequently have much less traffic than they think - in some cases half of it.

Which brings us to the point. While we'd love to sell BIG-IPs to everyone, don't let bots drag your site down. They're requesting pages just like regular users, and they're costing you CPU and network resources. If your site is on the edge of overload and you're starting to look at solutions like LTM, buy yourself some time: check the bots and make sure that robots.txt limits them to just what actually needs crawling. While most bots are well behaved, there are some - one from a prestigious university in the US comes to mind - that don't honor robots.txt. If you're on the edge, I'd recommend blocking those from crawling your site at all.
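If you want a quick sense of how much of your traffic is bots before you touch anything, a few lines of scripting against your access log will tell you. Here's a minimal sketch, assuming an Apache-style combined log; the log path and the user-agent substrings are just examples, so adjust them for your setup and the crawlers you actually see.

    # Rough tally of bot vs. human requests from a combined-format access log.
    # Assumes the user-agent string is the last quoted field on each line.
    from collections import Counter

    LOG_PATH = "/var/log/httpd/access_log"          # hypothetical path
    BOT_HINTS = ("bot", "slurp", "spider", "crawl")  # example substrings only

    counts = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            # Combined log format quotes the referrer and the user agent;
            # the user agent is the last quoted field on the line.
            parts = line.rsplit('"', 2)
            if len(parts) < 3:
                continue
            agent = parts[1].lower()
            counts["bot" if any(h in agent for h in BOT_HINTS) else "human"] += 1

    total = sum(counts.values()) or 1
    for kind, n in sorted(counts.items()):
        print(f"{kind}: {n} ({100 * n / total:.1f}%)")

If the bot percentage surprises you, that's your cue to tighten robots.txt before you go shopping for more capacity.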

Just remember that if Yahoo's Slurp and Googlebot can't get to your pages, you won't show up in search results - but that doesn't mean every single page of your site needs indexing. The subdirectory with your FAQs and the like often doesn't need to be crawled, so check your site and save some resources by blocking those types of paths.
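As a sketch of what that robots.txt might look like - the paths and the badly behaved crawler's name here are made up, just to show the shape of it:

    # Keep well-behaved crawlers out of the parts of the site
    # that don't need to show up in search results.
    User-agent: *
    Disallow: /faq/
    Disallow: /search/
    # Crawl-delay is non-standard; some crawlers respect it, Googlebot does not.
    Crawl-delay: 10

    # A crawler you don't want at all. This only works if it honors
    # robots.txt; otherwise block it at the server or the BIG-IP.
    User-agent: ExampleBadBot
    Disallow: /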

Then come back. We'll be here with load balancing and optimization solutions that will solve the problem today and offer you options for the future.

Don.


/reading: US News and World Report Collectors' Edition - Secrets of the Civil War