Regsub always returns a 1 never 0

I've tried running the posted script, which uses regsub to search for Social Security Numbers in the form xxx-xx-xxxx. I've tried several different permutations of this but can never get anything but a value of "1" for $new_response1.
Click here to see the link to the iRule:
http://devcentral.f5.com/Default.aspx?TabID=29&newsType=ArticleView&articleId=25


The line in the last section:
 
if {$new_response1 !=0} {
then replace content...
}


This always evaluates to 1, even if there is no SSN found. I'm guessing that someone used != 0 to get it to work, since the value is never set to 0, i.e. not even when no SS# is found.

Answers to this Question

USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Yes, that example is wrong. Here is the correct HTTP_RESPONSE_DATA:
 
when HTTP_RESPONSE_DATA {
    set payload [HTTP::payload [HTTP::payload length]]
    set ssnx "xxx-xx-xxxx"
    # Find the SSNs
    if { [regsub -all {\d{3}-\d{2}-\d{4}} $payload $ssnx new_response] > 0 } {
        # Replace the content if there were any matches
        HTTP::payload replace 0 [HTTP::payload length] $new_response
    }
}

Basically, the variable "new_response" always contains the original payload, except that it has been modified per the regsub. The regsub command itself returns the count of matching ranges that were found and replaced, so that return value (not the contents of the variable) is what should be compared against 0.
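
To see the difference between regsub's return value and the variable it fills in, here is a small sketch that can be run in plain tclsh, outside an iRule. The sample strings and variable names are made up for illustration.

```tcl
set pattern {\d{3}-\d{2}-\d{4}}
set with_ssn "name: Jo, ssn: 111-22-3333, alt: 012-34-5678"
set without_ssn "name: Jo, no identifiers here"

# regsub returns the number of substitutions; the result string goes into the named variable
set n1 [regsub -all $pattern $with_ssn "xxx-xx-xxxx" out1]
set n2 [regsub -all $pattern $without_ssn "xxx-xx-xxxx" out2]

puts "$n1 -> $out1"
puts "$n2 -> $out2"
```

With no match the count is 0 and the output variable is simply a copy of the input, which is why testing the variable against 0 never behaves as intended.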

Thanks for catching and pointing this mistake out. I will get the article corrected.
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER


I get basically the same result. Even if I have an SSN in the HTTP payload, this falls through to the else statement, i.e. no SSN found.
The same thing happens if there is no SSN in the payload. Finding
the SSNs was never a problem, but now it doesn't look like it's even finding them.

when HTTP_RESPONSE_DATA {
set payload [HTTP::payload [HTTP::payload length]]
set mcnx "xxMASTERCARDxx"
set visanx "xxxxxVISAxxxxx"
set amexnx "xxxxxAMEXxxxxx"
set ssnx xxx-xx-xxxx
# Find the SSNumbers
if { [regsub -all {/d{3}-/d{2}-/d{4}} $payload ssnx new_response] > 0 } {
log local0. "Outbound SSNumber Alert!!!"
log local0. $new_response
}
else {
log local0. "NO SSN Found!"
log local0. $new_response
}


Here is the output with the SSN:

Jul 12 00:36:35 tmm tmm[744]: Rule creditcard_detector <HTTP_RESPONSE_DATA>: NO SSN Found!
Jul 12 00:36:35 tmm tmm[744]: Rule creditcard_detector <HTTP_RESPONSE_DATA>: File: 4k.htm, Block 0000/0004.................................. 0000 00: 012-34-56789ABCDEFABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghyjkl 111-22-3333 0000 01: 0123456789ABCDEFABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghyjkl 0000 02: 0123456789ABCDEFABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghyjkl 0000
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Perhaps it's because the regex isn't right. It looks like you have forward slashes / instead of backslashes \:
 
{/d{3}-/d{2}-/d{4}}

should be:
 
{\d{3}-\d{2}-\d{4}}

USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Thank you. That looks like it works. \ and / look the same to me at this point in my day! Is there a way for me to
grab the SSN's or Credit Card Numbers (I'm adding those) to variables
so I can process them further e.g. log them, do a MOD10 check for
valid credit card numbers, etc.

Thx
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
You can use the Tcl command 'regexp' to match a regular expression and have it return the matched portions.

So, for example,
 
set card_nums [regexp -all -inline {\d{4}-\d{4}-\d{4}-\d{4}} $payload]
if { $card_nums ne "" } {
log "Found credit card numbers: $card_nums"
}
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER

Thanks, that seems to work. I'm trying to log the entire page
if I find an SSN/CCNUM, but it only seems to log up to a certain
number of bytes, so large pages get cut off. I don't see any setting anywhere for the maximum log message size. Can this be set?

USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Unfortunately, we have followed the standard for syslog (RFC3164) which states:

4.1 syslog Message Parts

The full format of a syslog message seen on the wire has three
discernable parts. The first part is called the PRI, the second part
is the HEADER, and the third part is the MSG. The total length of
the packet MUST be 1024 bytes or less. There is no minimum length of
the syslog message although sending a syslog packet with no contents
is worthless and SHOULD NOT be transmitted.

Therefore, the only way for you to log more than 1024 bytes would be through multiple log statements. Also be sure to read this post:
Click here - http://devcentral.f5.com/default.aspx?tabid=28&view=topic&forumid=5&postid=2921
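
As a rough illustration of the "multiple log statements" idea, here is a tclsh sketch that cuts a string into fixed-size pieces. The helper name split_chunks and the chunk size are assumptions for the example, not anything provided by iRules. Since the 1024-byte limit covers the whole syslog packet including PRI and HEADER, a real chunk size would need to be somewhat smaller than 1024 (900 in the comment below is only a guess).

```tcl
# Hypothetical helper: split text into pieces of at most $size characters.
proc split_chunks {text size} {
    set chunks {}
    set len [string length $text]
    for {set i 0} {$i < $len} {incr i $size} {
        lappend chunks [string range $text $i [expr {$i + $size - 1}]]
    }
    return $chunks
}

# In an iRule, each piece would go to its own log statement, for example:
#   foreach piece [split_chunks [HTTP::payload] 900] { log local0. $piece }
foreach piece [split_chunks "abcdefghij" 4] {
    puts $piece
}
```

Whether proc is available inside an iRule depends on the version you are running; if it is not, the same for loop can be inlined in the event handler.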
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Thanks.

I've searched DevCentral for any credit card number parsers but
can't find anything other than the SSN iRule. Do you have
anything more relevant to CCNs? I've used the SSN iRule
to develop a CCN parser, but I see that it's CPU hungry and
not very efficient. It works, but it's a first shot at it. Also,
I have a programming background but am new to TCL.
I'm basically doing the following:

when HTTP_RESPONSE_DATA {

set payload [HTTP::payload [HTTP::payload length]]

and so on...

then the CC# searches and logging:

set card_nums [regexp -all -inline {5[1-5]\d{14}} $payload]
if { $card_nums ne "" } {
    log "Possible MasterCard Number(s) Found: $card_nums Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}
set card_nums [regexp -all -inline {3[4-7]\d{13}} $payload]
if { $card_nums ne "" } {
    log "Possible AmericanExpress Card Number(s) Found: $card_nums Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}
set card_nums [regexp -all -inline {4\d{15}} $payload]
if { $card_nums ne "" } {
    log "Possible Visa Card Number(s) Found: $card_nums Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}
set card_nums [regexp -all -inline {6011\d{12}} $payload]
if { $card_nums ne "" } {
    log "Possible Discover Card Number(s) Found: $card_nums Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}



I'm sure this could be more efficient since I need to do a regexp for
each credit card vendor (more than one in some cases). Do you
have any recommendations on this? Ideally this is what I would like to do:


1. Find all instances of credit card numbers for multiple vendors in server responses (there could be more than one CCN per response), as above.
2. Perform a MOD10 check on each CCN to see whether it's a valid CCN (I have a TCL MOD10 check that I run manually).
3. Either log the valid CC numbers and/or possibly wipe them from the response (just the ones that passed the MOD10 check, which eliminates false positives).

Easy right


USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
So far, I like where this is going - more complicated!

First, I would like to point out that [HTTP::payload] is equivalent to [HTTP::payload [HTTP::payload length]], but obviously less overhead as one less command is being evaluated.

There are (at least) two routes of optimization we could take here:
A) Stick with regex, but improve the way it's extracting.
B) Switch to more of a strings/scanf pattern based approach

Of these two methods, I'm ultimately unsure which will yield the best result but I'd probably stick with regex (I'll leave this for you to explore). Also, I can show how to include a LUHN check so you can log valid/invalid CC #'s.

So, on to the optimizations:

A) The first thing you should look at is how to combine all the regexp's into one. This can be accomplished with regular expression grouping and logical or's. So, your code might look like this:
 
# Find ALL the possible credit card numbers in one pass
set card_nums [regexp -all -inline {(?:3[4-7]\d{13})|(?:4\d{15})|(?:5[1-5]\d{14})|(?:6011\d{12})} $payload]

# Now iterate over each one and check, categorize and log it
foreach cardnum $card_nums {
set cclen [string length $cardnum]
set double [expr {($cclen & 1) + 1}]
set chksum 0
set i 0
while { $i < $cclen } {
set c [string index $cardnum $i]
if {[incr i] & $double} {
if {[incr c $c] >= 10} {incr c -9}
}
incr chksum $c
}
switch [string index $cardnum 0] {
3 { set type AmericanExpress }
4 { set type Visa }
5 { set type MasterCard }
6 { set type Discover }
default { set type Unknown }
}
if { ($chksum%10) == 0 } {
set isCard valid
} else {
set isCard invalid
}
log local0. "Found $isCard $type CC# $cardnum - Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}


B) Use a non-regular expression, scanf-based matching to find the credit card numbers:
 
set i 0
set len [string length $payload]
while { $i < $len } {
if {[scan [string range $payload $i $len] {%[^0-9]%n%[0-9]%n} junk spos cc epos] != 4} {
break
}
set cclen [expr {$epos - $spos}]
switch -glob $cc {
3[4-7]????????????? { set type AmericanExpress }
4??????????????? { set type Visa }
5[1-5]?????????????? { set type MasterCard }
6011???????????? { set type Discover }
default { set type Unknown }
}
incr i $epos
# Insert LUHN check here or do "lappend card_nums $cc" and check later
}

Note: Above examples have not been tested.
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Oh, I forgot to add that if you want to scrub the CC#, then you'll likely want to change the regexp to use -indices, which (together with -all -inline) returns a list of the start and end index of each match. You can then use string range to extract each card number for validation, and use the indices with the command "HTTP::payload replace <offset> <length> <replacement_string>" to scrub out the card number (replacing just sections of the payload is going to be more efficient than replacing the entire thing with a modified $payload variable).
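
A minimal tclsh sketch of the indices approach described above. It uses string replace on a local variable as a stand-in for HTTP::payload replace (an assumption so that it runs outside an iRule), and the sample payload is made up.

```tcl
set payload "card 4111111111111111 end"

# Each element of the result is a {start end} pair for one match
set matches [regexp -all -inline -indices {4\d{15}} $payload]

foreach idx $matches {
    set start [lindex $idx 0]
    set end   [lindex $idx 1]
    set len   [expr {$end - $start + 1}]
    # In an iRule this step would be: HTTP::payload replace $start $len <mask>
    set payload [string replace $payload $start $end [string repeat x $len]]
}
puts $payload
```

Because each mask has the same length as the text it replaces, the indices collected up front remain valid for the later matches. A mask of a different length would shift them.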
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
gwegener,

This is a GREAT candidate for the iRules contest (Hint Hint Hint...). Once you get the CCN MOD10 check included in the Rule ( You'll have to include something beyond what unRuleY has supplied you with ), I'd suggest you submit it to the contest (Click here).

Unfortunately, since unRuleY is one of the judges (and an F5 employee), he's not able to enter himself. Sorry unRuleY.
-Joe
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Thank you. I tried combining the regsub's and it looks like
it reduced the CPU load 5-10% compared to doing sequential searches. Furthermore, if I add expressions to the combined regsub, it appears to have a minimal impact on CPU vs adding another sequential search!
Good call!

I was just testing your MOD10 check and it looks OK for MC and VISA but
does not work for AMEX. I've tried 2 valid amex card numbers and it flags them both as invalid. I've looked at the code but quite frankly I'm still working on deciphering it. Can you please take a look?

Thx
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
I did some more troubleshooting on this. Best I can tell it looks like valid AMEX numbers are flagged as invalid and invalid numbers are flagged as valid, at least with a few test card numbers.
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER

I logged the checksum for a valid Amex card number (which I x'd out). I'm
sure it's valid. This is a log entry:

Jul 13 20:07:21 tmm tmm[733]: Rule CC_Parser_03 <HTTP_RESPONSE_DATA>: Found invalid AmericanExpress CC# xxxxxxxxxxxxxxx Checksum (49%10)- Client SourceIP: 10.254.101.1 Accessing URI: /plhomepage_ALL.htm via ServerIP: 10.254.105.14
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
unRuleY is right, this is some fun stuff!

Give this a try. I don't think unRuleY's code took into account even and odd digits in the MOD10 calculation. Here's a modified version of unRuleY's rule that should work:

 # Find ALL the possible credit card numbers in one pass   
set card_nums [regexp -all -inline {(?:3[4-7]\d{13})|(?:4\d{15})|(?:5[1-5]\d{14})|(?:6011\d{12})} $payload]

foreach cardnum $card_nums {
set cclen [string length $cardnum]
set parity [expr {$cclen % 2}]
set chksum 0
set i 0
set isCard "invalid"
while { $i < $cclen } {
set c [string index $cardnum $i]
if {($i % 2) == $parity} {
if {[incr c $c] >= 10} {
incr c -9
}
}
incr chksum $c
incr i
}

switch [string index $cardnum 0] {
3 { set type AmericanExpress }
4 { set type Visa }
5 { set type MasterCard }
6 { set type Discover }
default { set type Unknown }
}

if { ($chksum%10) == 0 } {
set isCard valid
}
log local0. "Found $isCard $type CC# $cardnum - Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}


Or, here's a bit more exhaustive check based on the following Luhn Algorithm (Click here)

The algorithm proceeds in three steps. Firstly, every second digit, beginning with the next-to-rightmost and proceeding to the left, is doubled. If that result is greater than nine, its digits are summed (which is equivalent, for any number in the range 10 through 18, to subtracting 9 from it). Thus, a 2 becomes 4 and a 7 becomes 5. Secondly, all the digits are summed. Finally, the result is divided by 10. If the remainder is zero, the original number is valid.


function checkLuhn(string purportedCC) {
    int sum := 0
    int nDigits := length(purportedCC)
    int parity := nDigits modulus 2
    for i from 0 to nDigits - 1 {
        int digit := integer(purportedCC[i])
        if i modulus 2 = parity
            digit := digit × 2
        if digit > 9
            digit := digit - 9
        sum := sum + digit
    }
    return (sum modulus 10) = 0
}


with a few extra sanity checks on the first digit and length for each card type.

 # Find ALL the possible credit card numbers in one pass   
set card_nums [regexp -all -inline {(?:3[4-7]\d{13})|(?:4\d{15})|(?:5[1-5]\d{14})|(?:6011\d{12})} $payload]
foreach cardnum $card_nums {
set cclen [string length $cardnum]
set isCard "invalid"
set type "Unknown"

# See if card number is 13, 15, or 16 digits,
# The only legal size for credit cards
if { ($cclen == 13) || ($cclen == 15) || ($cclen == 16) } {

# Calculate
set chksum 0
set nDigits $cclen
set parity [expr {$nDigits % 2}]
set i 0;
while { $i < $nDigits } {
set digit [string index $cardnum $i]
if { $parity == ($i % 2) } {
set digit [expr {$digit * 2}]
}
if { $digit > 9 } {
incr digit -9
}
incr chksum $digit
incr i
}

if { ($chksum%10) == 0 } {
set isCard valid
}

# figure out card type
set first_digit [string index $cardnum 0]

# VISA is 13 or 16 chars starting with a 4
if { (4 == $first_digit) && ((13 == $cclen) || (16 == $cclen)) } {
set type Visa
} elseif { (3 == $first_digit) && (15 == $cclen) } {
set type AmericanExpress
} elseif { (5 == $first_digit) && (16 == $cclen) } {
set type MasterCard
} elseif { (6 == $first_digit) && (16 == $cclen) } {
set type Discover
}
}

log local0. "Found $isCard $type CC# $cardnum - Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}


I'll leave it to you to try to optimize this

*Standard Disclaimer, this rule is not fully tested...

Cheers!

-Joe
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
I found the problem with my version of the LUHN algorithm. I was trying to avoid the use of % as it is a relatively expensive operation compared to & (by a factor of about 10). Here is the corrected version:
 
# Find ALL the possible credit card numbers in one pass
set card_nums [regexp -all -inline {(?:3[4-7]\d{13})|(?:4\d{15})|(?:5[1-5]\d{14})|(?:6011\d{12})} $payload]

# Now iterate over each one and check, categorize and log it
foreach cardnum $card_nums {
    set cclen [string length $cardnum]
    set double [expr {$cclen & 1}]
    set chksum 0
    for { set i 0 } { $i < $cclen } { incr i } {
        set c [string index $cardnum $i]
        if {($i & 1) == $double} {
            if {[incr c $c] >= 10} {incr c -9}
        }
        incr chksum $c
    }
    switch [string index $cardnum 0] {
        3 { set type AmericanExpress }
        4 { set type Visa }
        5 { set type MasterCard }
        6 { set type Discover }
        default { set type Unknown }
    }
    if { ($chksum % 10) == 0 } {
        set isCard valid
    } else {
        set isCard invalid
    }
    log local0. "Found $isCard $type CC# $cardnum - Client SourceIP: $clientip Accessing URI: $clienturi via ServerIP: $serverip"
}

I found a buddy with an Amex card and now have tested this with both my Visa cards and his Amex number and it appears to be working for both now.

Also, there is no need to check the lengths or anything beyond the first digit, since the regular expression only matches numbers with the expected prefixes and lengths.
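
For anyone who wants to try the checksum outside an iRule, here is the same parity logic as a standalone tclsh sketch. The proc name luhn_valid is hypothetical, and the sample numbers are commonly published test numbers rather than real cards. As far as I can tell, the first version failed for Amex because it tested [incr i] against a mask of (cclen & 1) + 1. For a 15-digit number that mask is 2, which tests the wrong bit, so the doubling no longer alternated digit by digit.

```tcl
# Hypothetical helper: returns 1 when the digit string passes the Luhn (MOD10) check.
proc luhn_valid {cardnum} {
    set cclen [string length $cardnum]
    # Positions whose low bit equals the length's low bit get doubled,
    # which lines up with "every second digit from the next-to-rightmost".
    set parity [expr {$cclen & 1}]
    set chksum 0
    for {set i 0} {$i < $cclen} {incr i} {
        set c [string index $cardnum $i]
        if {($i & 1) == $parity} {
            # Double the digit; subtract 9 when the result has two digits
            if {[incr c $c] >= 10} {incr c -9}
        }
        incr chksum $c
    }
    return [expr {($chksum % 10) == 0}]
}

puts [luhn_valid 4111111111111111]   ;# 16 digits
puts [luhn_valid 378282246310005]    ;# 15 digits
puts [luhn_valid 378282246310006]    ;# last digit altered
```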
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
As usual, unRuleY out-optimizes me...

I'd go with his solution

-Joe
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
I did some testing and it looks like the latest code works for MC/VISA/AMEX card numbers. I'll be digging in some more over the next days and weeks. Since I'm still getting up to speed on TCL/iRules, I may tap your expertise again on this topic...

Thank you very much for your help with this, I really appreciate it.
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Here is an improved example of the SSN scrubber which uses regexp -indices to only replace the specific portions of the payload. This has significantly better performance (I also changed the check for matching uris to use a class instead of a single if check):

class scrub_uris {
    "/cgi-bin",
    "/account"
}
rule ssn_scrubber {
    when HTTP_REQUEST {
        if { [matchclass [HTTP::uri] starts_with $::scrub_uris] } {
            set scrub_content 1
            # Don't allow data to be chunked
            if { [HTTP::version] eq "1.1" } {
                HTTP::version "1.0"
            }
        } else {
            set scrub_content 0
        }
    }
    when HTTP_RESPONSE {
        if { $scrub_content } {
            if { [HTTP::header exists "Content-Length"] } {
                set content_length [HTTP::header "Content-Length"]
            } else {
                set content_length 4294967295
            }
            if { $content_length > 0 } {
                HTTP::collect $content_length
            }
        }
    }
    when HTTP_RESPONSE_DATA {
        # Find the SSNs
        set ssn_indices [regexp -all -inline -indices {\d{3}-\d{2}-\d{4}} [HTTP::payload]]
        # Scrub the SSNs from the response
        foreach ssn_idx $ssn_indices {
            set ssn_start [lindex $ssn_idx 0]
            set ssn_len [expr {[lindex $ssn_idx 1] - $ssn_start + 1}]
            HTTP::payload replace $ssn_start $ssn_len "xxx-xx-xxxx"
        }
    }
}
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
FYI -

I was testing your latest credit card scrubber iRule and I noticed that the AMEX search portion looks incorrect. From what I can find publicly, AMEX cards can begin with 34 or 37. It looks like you are flagging 3[4-7] in the iRule. Should it be
3[4||7] instead?
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
You are correct. It's the same issue for Mastercard that starts with 51 or 55, not 51 through 55. I'll update the CreditCardScrubber sample in the iRules CodeShare section of the wiki.

http://devcentral.f5.com/wiki/default.aspx/iRules/CreditCardScrubber.html


-Joe
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER

I think Mastercard is 51-55 (51 through 55):

http://www.beachnet.com/~hstiles/cardtype.html
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Correct again. I was going by this page (Click here) and I guess the commas and dashes are just too much for my mind to take in today

The wiki sample has been updated with the Amex fix.
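
For reference, the prefix patterns discussed in this thread can be compared in tclsh. This is only a sketch with made-up sample numbers. In a bracket expression the alternatives are listed without a separator, so "34 or 37" would be written 3[47]. As written, 3[4||7] would also accept a literal | character.

```tcl
# Anchored patterns so that each test looks at one whole candidate number
set amex_range {^3[4-7]\d{13}$}
set amex_exact {^3[47]\d{13}$}

foreach num {341111111111111 351111111111111 371111111111111} {
    puts "$num range=[regexp $amex_range $num] exact=[regexp $amex_exact $num]"
}

# A pipe inside brackets is an ordinary character, not an "or"
puts [regexp {^3[4||7]$} "3|"]
```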

Thanks for the corrections - we really appreciate it!

-Joe
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
It looks like this iRule would check all returned objects/content (i.e. gifs, jpeg, html, css, etc.) for credit card numbers. Would it be possible to modify it to check only a limited number of content types e.g. html ?

USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
Good point. I've updated the wiki sample with a check to only test text-based responses (i.e. Content-Type starts_with "text/"). If you need to be more specific, you could easily create a data group containing the content types you want to check for and replace my conditional with a matchclass command comparing the Content-Type to the members of the data group.

-Joe
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
I tried running the latest script using matchclass but can't get it working. I first created a data group (string) called "ContentTypes" and added text/html, text/css, etc. - and replaced:

when HTTP_RESPONSE {
# Only check responses that are a text content type
# (text/html, text/xml, text/plain, etc).
if { [HTTP::header "Content-Type"] starts_with "text/" } {

with

when HTTP_RESPONSE {
# Only check responses that in the data group "ContentTypes"
# (text/html, text/xml, text/plain, etc).
if { [matchclass [HTTP::header "Content-Types] matches $::ContentTypes] } {

I tried a few variations but can't get it to work.
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
I believe the header is "Content-Type", not "Content-Types". Also, you aren't terminating the header name with a quote. That could be where your problem lies. I would use something like this:

  ...
if { [matchclass [HTTP::header "Content-Type"] equals $::ContentTypes] } {


If that still doesn't work, the docs for matchclass say that you should use the data group first, but I've been told it should work both ways. You can try this as well.

  ...
if { [matchclass $::ContentTypes equals [HTTP::header "Content-Type"]] } {


If it still doesn't work, I'd throw in some logging as to what the value of [HTTP::header "Content-Type"] is.

-Joe
USER ACCEPTED ANSWER & F5 ACCEPTED ANSWER
I found a bug -- sorta -- in this iRule!

This tends to match against FedEx tracking numbers, so if you're generating those and sending them out, you'll find a few digits, then a string of X characters, then a few more digits. Perhaps you need to add a whitespace border to the rule?
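
One possible way to add such a border, sketched in tclsh rather than tested in an iRule: Tcl's regular expressions have a \y constraint that matches only at the beginning or end of a word, so a 16-digit pattern wrapped in \y will not match in the middle of a longer run of digits. The sample payload below is made up, and the long number only stands in for a tracking number.

```tcl
set card 4111111111111111
# The "tracking" value embeds the same 16 digits inside a longer digit string
set payload "card $card, tracking 79${card}234"

set loose  [regexp -all -inline {4\d{15}} $payload]
set strict [regexp -all -inline {\y4\d{15}\y} $payload]

puts "loose:  [llength $loose] match(es)"
puts "strict: [llength $strict] match(es)"
```

Note that \y treats letters and underscores as word characters too, so a number glued directly to text (for example ID4111...) would also be skipped. Whether that is acceptable depends on the content being scanned.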