Applying different policies for authorized web scrapers
I'm working on a method for dealing with external web scrapers in my organization. Some web scrapers are allowed, some aren't. My task is to define rules to authorize the "good" bots and block the rest, without impacting normal user traffic.
I'm thinking about implementing a local traffic policy iRule that routes "good" bots to one of two security policies specific to those bots, each with an increased rate ceiling, while still using a default security policy to catch the unauthorized bots. Basically I'm trying to convince the ASM that the good bots AREN'T bots, as long as the bot handlers throttle themselves to stay under the rate ceiling. I think this is possible by turning off Bot Detection and tuning the Session Opening and/or Session Transactions Anomaly settings. Does this sound right?
The main thing I haven't figured out is whether this setup will let me apply the bot-specific security policy ONLY to the specified "good" bots, while still applying the default security policy, with more standard anti-scraping settings, to all other traffic. Will this iRule accomplish this? Also, does it look like it would be really performance intensive for the ASM?
Portions of code shamelessly stolen from others on this site. Any suggestions or criticism are very welcome.
when HTTP_REQUEST {
    # set start and end time of the outside-business-hours window
    set start_time "20:00"
    set end_time "05:30"
    # convert start/end times to seconds since the epoch for easier date comparisons
    # (both are resolved against today's date, so $start is later than $end)
    set start [clock scan $start_time]
    set end [clock scan $end_time]
    # get the current time in seconds since the Unix epoch (1970-01-01)
    set now [clock seconds]
    # only do the next section if it's an authorized bot, otherwise the request should go to the default security policy
    # authorized_bots is a data group of addresses known to belong to authorized bot handlers
    # not relying on ASM_REQUEST_VIOLATION or ASM_REQUEST_DONE to decide if it's a bot - just going by IP
    if { [class match [IP::client_addr] equals authorized_bots] } {
        # currently outside business hours?
        # the window wraps past midnight, so test "after start OR before end"
        if { $now >= $start or $now < $end } {
            # check if bot is scraping the app it's authorized for
            # if it's authorized to scrape that app, send to the high volume security policy
            # have to check for the app's URI path as well as its dependencies that aren't under the app root dir
            if { ([HTTP::uri] starts_with "/app") or ([HTTP::uri] starts_with "/dependency1") or ([HTTP::uri] starts_with "/dependency2") } {
                # use the security policy with a higher rate ceiling for bot detection
                ASM::enable /Common/auth_scrape_high_volume
            } else {
                # the URI doesn't match - the bot isn't authorized for this URI
                drop
            }
        } else {
            # if we get here, it's currently within business hours
            # check if bot is scraping the app it's authorized for
            # if it's authorized to scrape that app, send to the low volume security policy
            if { ([HTTP::uri] starts_with "/app") or ([HTTP::uri] starts_with "/dependency1") or ([HTTP::uri] starts_with "/dependency2") } {
                # use the security policy with a lower rate ceiling for bot detection
                ASM::enable /Common/auth_scrape_low_volume
            } else {
                # the URI doesn't match - the bot isn't authorized for this URI
                drop
            }
        }
    }
}
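One thing worth checking in isolation is the time window. It runs from 20:00 to 05:30, so it crosses midnight, and a plain "now is after start AND before end" test can never be true once both times are resolved against the same date. Below is a minimal sketch in plain Tcl of a wrap-around check that can be tried in tclsh before it goes anywhere near the ASM. The proc names hhmm_to_minutes and in_window are hypothetical helpers for illustration, not iRule built-ins, and treating the end time as exclusive is an assumption.

```tcl
# Hypothetical helper: convert "HH:MM" to minutes since midnight.
proc hhmm_to_minutes {hhmm} {
    # scan with %d reads decimal, so "08" and "09" are not treated as octal
    if {[scan $hhmm "%d:%d" h m] != 2} {
        error "expected HH:MM, got \"$hhmm\""
    }
    return [expr {$h * 60 + $m}]
}

# Hypothetical helper: return 1 if now falls inside [start, end), else 0.
# Handles windows that wrap past midnight (start later than end).
proc in_window {now_hhmm start_hhmm end_hhmm} {
    set now   [hhmm_to_minutes $now_hhmm]
    set start [hhmm_to_minutes $start_hhmm]
    set end   [hhmm_to_minutes $end_hhmm]
    if {$start <= $end} {
        # same-day window: both bounds must hold
        return [expr {$now >= $start && $now < $end}]
    }
    # overnight window: late evening OR early morning
    return [expr {$now >= $start || $now < $end}]
}

puts [in_window "23:15" "20:00" "05:30"]   ;# prints 1 (late evening)
puts [in_window "04:00" "20:00" "05:30"]   ;# prints 1 (early morning)
puts [in_window "12:00" "20:00" "05:30"]   ;# prints 0 (business hours)
```

Inside the iRule, the current time could be fed in with [clock format [clock seconds] -format "%H:%M"], which uses the BIG-IP's local time zone. Defining procs in an iRule assumes a BIG-IP version that supports proc and call; on older versions the same comparison can be inlined in the HTTP_REQUEST event.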