
Using an iRule to Sort Out Spiders for Network Computing

posted @ Tuesday, February 06, 2007 4:42 PM by Joe - 3181 views


In its most recent L4-L7 product review, Network Computing based its testing on the requirements of a real-world IT shop, CMP Media. The review demonstrated the importance of product flexibility and, more specifically, the power of F5’s programming language, iRules.

The Challenge


Like many communication companies, CMP Media generates part of its revenue from ads hosted on its site. To provide accurate counts for ad impressions and click-throughs, IT must filter out illegitimate traffic, such as spiders and robots that hit a site to index its content.


These sources present an interesting challenge for many shops that must serve all content but identify and track real users separately from spiders/robots. Often, it’s not acceptable to simply block certain source IP addresses, because much of this traffic helps get your content listed in search engines like Google.


At CMP, this requirement for advanced user-based routing is made even more complicated by a real-world deployment in which the company hosts multiple sites on the same IP address. Many organizations do the same, serving several websites that are all virtualized behind a single address.


To meet these requirements, a solution must first identify whether the user is a spider/robot and then identify the site being requested.


Configuration Can Get Messy Quickly


This problem can be visualized by thinking of 5 websites hosted on a single IP address. For each site there are two possible destinations: the real content, which is tracked for billing purposes, and the spider/robot content. This means there are essentially twice as many websites to manage. So, if you had 5 real websites, your traffic management device would have to treat them as 10 websites, replicating pools, nodes and many other pieces of the configuration and roughly doubling the administration for every site.
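To make the doubling concrete, here is a minimal Python sketch (not BIG-IP configuration) that lists the destinations five sites would need. The host names are hypothetical, since the article does not list CMP Media’s sites, and the "pool_" and "spider_" prefixes are borrowed from the iRule shown later in this article.

```python
# Hypothetical host names; the article does not name CMP Media's actual sites.
SITES = [
    "site1.example.com",
    "site2.example.com",
    "site3.example.com",
    "site4.example.com",
    "site5.example.com",
]


def destinations_for(sites):
    """Return the two pool names each site needs: tracked content and spider content."""
    names = []
    for host in sites:
        names.append("pool_" + host)    # real users, counted for billing
        names.append("spider_" + host)  # spiders and robots, not counted
    return names


if __name__ == "__main__":
    pools = destinations_for(SITES)
    print(len(SITES), "sites ->", len(pools), "destinations")
    for name in pools:
        print(name)
```

Every site added to the shared IP address adds two entries to this list, which is the growth the paragraph above describes.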


Sorting Out the Traffic with a 9-Line iRule


During the review, Network Computing leveraged F5 resources and DevCentral to create an iRule in just 20 minutes that accomplished the required tasks with far greater simplicity. In addition to the speed of development and the performance of the box, the real testimony from our perspective is the simplicity of the iRule and the fact that BIG-IP allowed the customer to forgo the complexity and cost of redundant configuration to meet their objective.

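rule nwc_robot_routing_rule {
   when HTTP_REQUEST {
      # No User-Agent header: treat as a spider only if the client IP is known-bad
      if { [HTTP::header User-Agent] == "" } {
         if { [matchclass [IP::remote_addr] equals $::blacklisted_clients] } {
            pool spider_[HTTP::header Host]
         }
      } elseif { [matchclass [HTTP::header User-Agent] contains $::blacklisted_useragents] } {
         # User-Agent matches an entry in the blacklisted_useragents list
         pool spider_[HTTP::header Host]
      } elseif { [string first "bot" [string tolower [HTTP::header User-Agent]]] >= 0 } {
         # User-Agent contains "bot" in any letter case
         pool spider_[HTTP::header Host]
      } else {
         # Everything else is treated as a legitimate user
         pool pool_[HTTP::header Host]
      }
   }
}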

Step by Step Explanation of the NC iRule:


This is a rule that separates business logic from configuration.


Walking through the rule line-by-line, we have this:

when HTTP_REQUEST {

This signals that this rule applies to HTTP requests. Rules can be applied to various “events”, such as HTTP requests, responses, TCP data, connection establishment, et cetera.

if { [HTTP::header User-Agent] == "" } {

This checks whether the HTTP user-agent header is missing or empty. If it is, then we execute this line of the rule:

if { [matchclass [IP::remote_addr] equals $::blacklisted_clients] } {

The iRule simply checks the client’s IP address against a list of known-bad clients stored in a list named “blacklisted_clients”. If the client does not present an HTTP user-agent header, and the client’s IP address matches one of these known-bad IP addresses, then we run the next line of the rule:

pool spider_[HTTP::header Host]

This says that we’re going to use a set of servers (a “pool” of servers) named “spider_” followed by the value of the request’s Host header, so each hosted site gets its own spider pool.

Next, if the user did have an HTTP user-agent, it’s checked by this line:

elseif { [matchclass [HTTP::header User-Agent] contains $::blacklisted_useragents] } {

The iRule simply checks whether the HTTP user-agent contains one of the entries in the list named “blacklisted_useragents”.

If the user-agent does match, the user is sent to this line of the rule, which does the same thing as the earlier appearance of this line:

pool spider_[HTTP::header Host]

Next, if the HTTP user-agent exists but did not match a blacklisted user-agent, this line runs:

elseif { [string first "bot" [string tolower [HTTP::header User-Agent]]] >= 0 } {

This checks, ignoring letter case, whether the string “bot” appears anywhere in the user-agent string.


If it does, the user is sent to the familiar rule line:

pool spider_[HTTP::header Host]

If none of the previous checks turned out to be true, then the user is deemed legitimate, and they are sent to this final line of the rule:

pool pool_[HTTP::header Host]

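This sends the user to a pool named “pool_” followed by the value of the request’s Host header, which is the pool serving that site’s real, tracked content.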
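The branches walked through above can also be modeled outside of BIG-IP, which may be handy for checking which pool a given request would land in before touching a live configuration. The following is a rough Python sketch of that decision logic, not iRule code. The function name, its parameters, and the sample addresses and user-agents are all assumptions made for illustration, and the address check only approximates how an address list behaves on the device.

```python
import ipaddress


def choose_pool(user_agent, client_ip, host, blacklisted_clients, blacklisted_useragents):
    """Model the rule's branches.

    Returns the pool name the rule would select, or None when no branch
    issues a pool command (empty User-Agent from a client that is not listed).
    """
    if user_agent == "":
        # No User-Agent: only known-bad client addresses are treated as spiders.
        addr = ipaddress.ip_address(client_ip)
        if any(addr in ipaddress.ip_network(entry) for entry in blacklisted_clients):
            return "spider_" + host
        return None
    # User-Agent contains one of the blacklisted entries (case-sensitive).
    if any(entry in user_agent for entry in blacklisted_useragents):
        return "spider_" + host
    # User-Agent contains "bot" in any letter case.
    if "bot" in user_agent.lower():
        return "spider_" + host
    # Everything else is treated as a legitimate user.
    return "pool_" + host


if __name__ == "__main__":
    bad_ips = ["10.0.0.5", "192.0.2.0/24"]   # hypothetical entries
    bad_agents = ["Slurp"]                   # hypothetical entry
    host = "www.example.com"                 # hypothetical Host header value
    samples = [
        ("", "10.0.0.5"),
        ("", "198.51.100.7"),
        ("Yahoo! Slurp", "198.51.100.7"),
        ("ExampleBot/1.0", "198.51.100.7"),
        ("Mozilla/5.0", "198.51.100.7"),
    ]
    for ua, ip in samples:
        print(repr(ua), ip, "->", choose_pool(ua, ip, host, bad_ips, bad_agents))
```

Returning None on the empty user-agent path mirrors the fact that the rule runs no pool line there. On BIG-IP, such a request would typically be handled by the virtual server’s default pool, if one is configured.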


Learn More About the Network Computing Review


You can learn more about the specific review by visiting: http://www.f5.com/communication/articles/2005/article021105.html


More about DevCentral


We welcome you to explore our site further. DevCentral was created as a community for F5 customers and partners to learn and share the iRules that provide valuable solutions.
