I recently had the privilege of watching an exercise in string handling unfold on an internal list, and it deserves a wider audience in the community.

The Problem

URLs for this particular application arrive in this format:

/session/636971/abcd/hlsa/dcwiuh/index.html
/session/128766/ssrg/hlsb/yuyt/yutyu/index.html
/session/128766/dwlmhlsb/ewfwef/index.html
/session/122623/wqs/wedew/mhlsb/ewfwef/uiy/index.html

The useful data sits between /session/[nnn] and /index.html, so the values we want to extract are, respectively:

/abcd/hlsa/dcwiuh
/ssrg/hlsb/yuyt/yutyu
/dwlmhlsb/ewfwef
/wqs/wedew/mhlsb/ewfwef/uiy

The Solutions

There are almost always multiple solutions to a problem, and this scenario is no different. Each approach is shown below. The proc wrapper is only there for testing; the code within the proc is the actual solution for each approach.

Approach 1: Split & Lists

proc splitjoin {arg} {
  # convert: /session/1111/abcd/efg/hij/index.html
  # to: { {} session 1111 abcd efg hij index.html }
  set sl [split $arg {/}]
  # remove leading { {} session 1111 } and trailing index.html from $sl,
  # then produce / followed by remainder of $sl joined with /
  set result "/[join [lrange $sl 3 [expr { [llength $sl] -2}]] {/}]"
}
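
To make the list manipulation easier to follow, here is a minimal sketch that walks the first sample URL through the same steps one at a time. The variable names url and middle are only illustrative, and the end-1 index shorthand stands in for the expr arithmetic above; it is not the code that was timed.

```tcl
# Walk one sample URL through split, lrange, and join.
set url /session/636971/abcd/hlsa/dcwiuh/index.html

# Splitting on / yields a leading empty element because the path starts with /.
set sl [split $url /]
# sl is now: {} session 636971 abcd hlsa dcwiuh index.html

# Keep everything from index 3 up to, but not including, the last element.
set middle [lrange $sl 3 end-1]
# middle is now: abcd hlsa dcwiuh

# Re-join with / and restore the leading slash.
set result "/[join $middle /]"
puts $result
```

Run in tclsh, this should print /abcd/hlsa/dcwiuh.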

Approach 2a: Scan with all variables

proc scana {arg} {
  scan $arg {/%[^/]/%[^/]%s} a b c
  set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}
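
As an illustration of what each conversion captures, the sketch below runs the same scan format against the second sample URL and prints the three variables before trimming. The names url and cut are assumptions made for this example only.

```tcl
# Show what each scan conversion captures for one sample URL.
set url /session/128766/ssrg/hlsb/yuyt/yutyu/index.html

# %[^/] reads a run of non-slash characters; %s reads the rest of the string
# (up to the first whitespace, which these URLs do not contain).
scan $url {/%[^/]/%[^/]%s} a b c
puts "a=$a b=$b c=$c"

# Drop the final /index.html by cutting just before the last slash in c.
set cut [string last / $c]
set result [string range $c 0 [expr {$cut - 1}]]
puts $result
```

Here a and b hold session and 128766, which are never used again. That is the waste that Approach 2b removes.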

Approach 2b: Scan with the necessary variable

proc scanb {arg} {
  # split the path, skipping ‘session’ and ‘[nnn]’ and setting c to ‘/[xxx]/[yyy]’
  scan $arg {/%*[^/]/%*[^/]%s} c
  # remove the set of characters following the last slash (i.e., ‘/[yyy]’) in c
  set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}
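
The difference from Approach 2a is the * flag, which tells scan to match a field but not assign it. A small sketch, using a hypothetical helper named demo_suppress so the variables stay local, shows that only c is created:

```tcl
# Hypothetical helper: reports whether a and b exist after a suppressed scan.
proc demo_suppress {url} {
    # %*[^/] matches the field but assigns nothing, so only c is set.
    scan $url {/%*[^/]/%*[^/]%s} c
    return [list [info exists a] [info exists b] $c]
}

puts [demo_suppress /session/128766/dwlmhlsb/ewfwef/index.html]
```

In tclsh this should print 0 0 /dwlmhlsb/ewfwef/index.html, confirming that the two skipped fields never become variables.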

Approach 3: Regular Expressions

proc regex1 {arg} {
  # after match, $whole is /session/[nnn]/[xxx]/ and $result is /[xxx];
  # regexp itself returns 1 on a match, so the extracted path lives in $result
  regexp {^/[^/]+/[^/]+(/.+)/} $arg whole result
}
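
To see what the pattern captures, this sketch runs the same regexp outside the proc against the longest sample URL. The variable names url and matched are illustrative only.

```tcl
# Inspect the match count and both match variables for one sample URL.
set url /session/122623/wqs/wedew/mhlsb/ewfwef/uiy/index.html

# The greedy .+ runs as far as it can while still leaving a final / to match,
# so the capture group ends just before the last slash in the string.
set matched [regexp {^/[^/]+/[^/]+(/.+)/} $url whole result]

puts $matched
puts $whole
puts $result
```

This should print 1, then /session/122623/wqs/wedew/mhlsb/ewfwef/uiy/, then /wqs/wedew/mhlsb/ewfwef/uiy.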

The Results

In order from least to most efficient:

Approach                          Command                      Time per iteration
Regular Expressions               time {regex1 $x} 100000      17.22531 microseconds
Scan with all variables           time {scana $x} 100000       7.999566 microseconds
Split & Lists                     time {splitjoin $x} 100000   6.90115 microseconds
Scan with the necessary variable  time {scanb $x} 100000       6.12924 microseconds

Here, $x is the longest of the sample URLs at the top of this article. I ran these tests in tclsh on BIG-IP LTM VE 11.2.1 running on an ESXi 4.1 installation. Actual numbers in iRules will likely differ, but the performance of these commands relative to one another shouldn't vary much. Many thanks to F5ers Vernon Wells, Cameron Jenkins, and Ken Wong for the source information!
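
For anyone who wants to repeat the comparison, here is a rough sketch of a harness that defines the four procs from this article and times each one in a loop. The foreach loop, the iterations variable, and the output formatting are my own assumptions rather than the original test setup, and the numbers it prints will depend on the machine and Tcl version.

```tcl
# The four procs as shown in the article.
proc splitjoin {arg} {
  set sl [split $arg {/}]
  set result "/[join [lrange $sl 3 [expr { [llength $sl] -2}]] {/}]"
}
proc scana {arg} {
  scan $arg {/%[^/]/%[^/]%s} a b c
  set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}
proc scanb {arg} {
  scan $arg {/%*[^/]/%*[^/]%s} c
  set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}
proc regex1 {arg} {
  regexp {^/[^/]+/[^/]+(/.+)/} $arg whole result
}

# The longest sample URL, as used for the published numbers.
set x /session/122623/wqs/wedew/mhlsb/ewfwef/uiy/index.html
set iterations 100000

# time reports the average number of microseconds per iteration.
foreach cmd {regex1 scana splitjoin scanb} {
    puts [format "%-10s %s" $cmd [time [list $cmd $x] $iterations]]
}
```

Note that regex1 returns the match count rather than the extracted path, which is fine for timing but worth remembering if the procs are reused elsewhere.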