Forum Discussion
Deb_Allen_18
Nov 30, 2007
Historic F5 Account
I think you mostly just need to move the release outside of the loop. Otherwise you will always be releasing on the first match.
I'd also adjust the loop to extract the URI string into a variable, then flip the matchclass comparison so it looks for the host in the URI string, and negate it to eliminate the empty "if" body. (Please check that I got that logic right, because I wasn't really clear which condition should result in the masked URI string.)
Also, for regex operations you have to limit collection to a 1MB max payload, so I'd modify the code as below to cap the collection size at 1048576 bytes when the Content-Length header is missing or its value is larger.
Here is an adjusted iRule with those changes:
when HTTP_REQUEST {
    # Don't allow data to be chunked
    if { [HTTP::version] eq "1.1" } {
        if { [HTTP::header is_keepalive] } {
            HTTP::header replace "Connection" "Keep-Alive"
        }
        HTTP::version "1.0"
    }
}
when HTTP_RESPONSE {
    # Only check responses that are a text content type
    # (text/html, text/xml, text/plain, etc).
    if { [HTTP::header "Content-Type"] starts_with "text/" } {
        # Get the content length so we can request the data to be
        # processed in the HTTP_RESPONSE_DATA event.
        if { [HTTP::header exists "Content-Length"] && [HTTP::header "Content-Length"] < 1048577 } {
            set content_length [HTTP::header "Content-Length"]
        } else {
            set content_length 1048576
        }
        log local0.info "Content Length: $content_length"
        if { $content_length > 0 } {
            HTTP::collect $content_length
        }
    }
}
when HTTP_RESPONSE_DATA {
    # Find ALL the possible URLs in one pass
    log local0.info "Time for some regex action baby"
    set url_indices [regexp -all -inline -indices {^((http[s]?):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^?\s]+)(.*)?([\w\-]+)?$} [HTTP::payload]]
    log local0.info "url_indices: $url_indices"
    foreach url_idx $url_indices {
        set url_start [lindex $url_idx 0]
        set url_end [lindex $url_idx 1]
        set url_len [expr {$url_end - $url_start + 1}]
        log local0.info "url_start: $url_start url_end: $url_end url_len: $url_len"
        set url_address [string range [HTTP::payload] $url_start $url_end]
        log local0.info "url_address: $url_address"
        # Check to see if URL is not part of allowed hosts data group
        if { !([matchclass $url_address contains $::valid_hosts]) } {
            # If not a valid URL, then mask out URLs with X's
            HTTP::payload replace $url_start $url_len [string repeat "X" $url_len]
        }
    }
    HTTP::release
}
As for your regex, I'd say you don't want to start with ^ or end with $, since a URI could start or end mid-line. Even with just the start and end anchors removed, I can see what look like some issues with too-greedy wildcards. You mention you want to replace hrefs, but the regex doesn't look for the string "href", and it would most likely be a good optimization to lock down the regex as much as possible. If you can give a better definition of what you want to mask, we can refine the expression.
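To give a rough idea of what a more locked-down expression might look like, here is a sketch in plain Tcl that you can run in tclsh. It is not a tested iRule. The sample payload, the valid_hosts list (standing in for your data group), and the href pattern itself are all assumptions on my part, so adjust them to whatever you actually need to mask. One thing it illustrates: with -all -inline -indices, regexp returns an index pair for the whole match followed by one pair per capture group, so the loop has to step through the result in groups.

```tcl
# Hypothetical sample payload; in the iRule this would be [HTTP::payload].
set payload {<a href="http://good.example.com/a">ok</a> <a href="http://bad.example.net/b">no</a>}

# Hypothetical allow-list standing in for the valid_hosts data group.
set valid_hosts {good.example.com}

# No ^ or $ anchors, so a match can start mid-line.
# The single capture group is the URL inside href="...".
set pattern {href="(https?://[^"\s]+)"}

# Each match yields two index pairs here: the whole match, then the one
# capture group. Step through the list two pairs at a time.
foreach {whole_idx url_idx} [regexp -all -inline -indices -nocase $pattern $payload] {
    foreach {url_start url_end} $url_idx break
    set url_len [expr {$url_end - $url_start + 1}]
    set url [string range $payload $url_start $url_end]

    # Treat the URL as allowed if it contains any allowed host name.
    set allowed 0
    foreach host $valid_hosts {
        if {[string first $host $url] >= 0} {
            set allowed 1
            break
        }
    }

    # Mask with the same number of characters, so the indices of the
    # remaining matches stay valid.
    if {!$allowed} {
        set payload [string replace $payload $url_start $url_end [string repeat "X" $url_len]]
    }
}

puts $payload
```

In the iRule you would swap the host loop for your matchclass check and the string replace for HTTP::payload replace, as in the rule above.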
HTH
/deb