TLS Fingerprinting to profile SSL/TLS clients without decryption

In the F5 SIRT we are always looking for new and better ways to profile incoming traffic and sort the wheat from the chaff or, in our case, legitimate traffic from illegitimate traffic, so that we can apply some kind of blocklist or allowlist. If we can do that automatically based on some thresholds, all the better.

If the traffic is being decrypted – either in front of or on the BIG-IP – and it is some easily inspectable format like HTTP then we have plenty of tools at our disposal and could, for example, look for unusual User-Agent strings or other artefacts. There are times, though, when we need to look at something lower down the OSI model to classify the traffic, either because the traffic isn’t being decrypted or because the adversary is employing a wide range of L7 evasions (picking from a large list of potentially valid User-Agent strings for example).

Often, though, malicious tools are built upon a standard set of lower-layer libraries or have hard coded TLS handshake parameters, and that’s where fingerprinting traffic at the TLS layer has great power.

F5 already has a solution for TLS fingerprinting (https://devcentral.f5.com/s/articles/tls-fingerprinting-a-method-for-identifying-a-tls-client-without-decrypting-24598) based on work by Lee Brotherston, and that works extremely well when you want to consume the fingerprints directly in some way – for example, have a Class full of fingerprints and use matches to direct traffic – but I wanted a solution that would produce a smaller, easier to handle hash of the fingerprint that could be easily counted in tables or shipped off to a log aggregator. Enter JA3!

Originally developed at Salesforce, but now an open-source methodology (https://github.com/salesforce/ja3), JA3 produces a nice easy-to-use hash. For example, the hash of the handshake used by Trickbot malware is:

6734f37431670b3ab4292b8f60f29984

As I say, you can then either use that fingerprint directly (for example you could count how many times you’ve seen that fingerprint in any given space of time and rate limit specific client groups) or use HSL to send it off to a remote logger.
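
To make the counting idea concrete, here is a minimal offline sketch in Python rather than an iRule. It counts sightings of each JA3 hash in fixed time windows and reports when a hash crosses a threshold. The class name, the window length and the threshold are all hypothetical values picked for illustration; on a BIG-IP you would keep the counters in a table instead.

```python
import time
from collections import defaultdict


class FingerprintRateCounter:
    """Count sightings of each JA3 hash in fixed time windows (illustrative only)."""

    def __init__(self, window_seconds=10, threshold=100):
        # Both values are assumptions; tune them to your own traffic.
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._window_start = defaultdict(float)
        self._count = defaultdict(int)

    def seen(self, ja3_hash, now=None):
        """Record one handshake; return True once the hash exceeds the threshold."""
        now = time.time() if now is None else now
        if now - self._window_start[ja3_hash] >= self.window_seconds:
            # The previous window has expired, so start a fresh one for this hash
            self._window_start[ja3_hash] = now
            self._count[ja3_hash] = 0
        self._count[ja3_hash] += 1
        return self._count[ja3_hash] > self.threshold


if __name__ == "__main__":
    counter = FingerprintRateCounter(window_seconds=10, threshold=2)
    for _ in range(4):
        print(counter.seen("6734f37431670b3ab4292b8f60f29984"))
```

A fixed window is the simplest choice and is easy to reason about; a sliding window would smooth out bursts that straddle a window boundary, at the cost of a little more state per hash.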

So to get into the iRules themselves, we’ll have two parts: a library rule and a rule that calls this library on each TLS handshake. Credit where credit is due, these are both heavily based on Kevin Stewart’s earlier work as well as the JA3 algorithm.

Library Rule

Create the following iRule PROC – remember, whatever you call it in the TMUI is what appears in the first part of the [call] statement. Our call statement says [call Library-Rule::fingerprintTLS] so you’d want to save it as Library-Rule – if you already have Kevin’s rule deployed you’ll need to adjust to accommodate, of course.

## Library-Rule

## JA3 TLS Fingerprint Procedure #################
##
## Author: Aaron Brailsford, 06/2020
## Based on the TLS Fingerprinting iRule by Kevin Stewart @ https://devcentral.f5.com/s/articles/tls-fingerprinting-a-method-for-identifying-a-tls-client-without-decrypting-24598
## Derived from Lee Brotherston's "tls-fingerprinting" project @ https://github.com/LeeBrotherston/tls-fingerprinting
## Purpose: to identify the user agent based on unique characteristics of the TLS ClientHello message
## Input:
##   Full TCP payload collected in CLIENT_DATA event of a TLS handshake ClientHello message
##   Record length (rlen)
##   TLS inner version (sslversion)
##############################################
proc fingerprintTLS { payload rlen sslversion } {

  ## The first 43 bytes of a ClientHello message are the record type, TLS versions, some length values, the
  ## handshake type and the 32-byte client random. We should already know the parts we need from the calling
  ## iRule. We're also going to be walking the packet, so the field_offset variable will be used to track where we are.
  set field_offset 43

  ## The first value in the payload after the offset is the session ID, which may be empty. Grab the session ID length
  ## value and move the field_offset variable that many bytes forward to skip it.
  binary scan ${payload} @${field_offset}c sessID_len
  set field_offset [expr {${field_offset} + 1 + ${sessID_len}}]

  ## The next value in the payload is the ciphersuite list length (how big the ciphersuite list is, in bytes).
  binary scan ${payload} @${field_offset}S cipherList_len

  ## Now that we have the ciphersuite list length, let's offset the field_offset variable to skip over the length (2) bytes
  ## and go get the ciphersuite list.
  set field_offset [expr {${field_offset} + 2}]
  binary scan ${payload} @${field_offset}S[expr {${cipherList_len} / 2}] cipherlist_decimal

  ## Next is the compression method length and compression method. First move field_offset to skip past the ciphersuite
  ## list, then grab the compression method length. Then move field_offset past the length (1 byte).
  ## Finally, move field_offset past the compression method bytes.
  set field_offset [expr {${field_offset} + ${cipherList_len}}]
  binary scan ${payload} @${field_offset}c compression_len
  set field_offset [expr {${field_offset} + 1}]
  set field_offset [expr {${field_offset} + ${compression_len}}]

  ## We should be in the extensions section now, so we're going to just run through the remaining data and
  ## pick out the extensions as we go. But first let's make sure there's more record data left, based on
  ## the current field_offset vs. rlen.
  if { [expr {${field_offset} < ${rlen}}] } {
    ## There's extension data, so let's go get it. Skip the first 2 bytes that are the extensions length
    set field_offset [expr {${field_offset} + 2}]

    ## Make a variable to store the extension types we find
    set extensions_list ""

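    ## rlen doesn't count the 5-byte record header, so the payload actually ends at rlen + 5. An extension header
    ## is 4 bytes (type + length), so the last one can start no later than rlen + 1. Pad rlen by 1 byte to match.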
    set rlen [expr {${rlen} + 1}]

    while { [expr {${field_offset} <= ${rlen}}] } {
      ## Grab the first 2 bytes to determine the extension type
      binary scan ${payload} @${field_offset}S ext
      set ext [expr {$ext & 0xFFFF}]

      ## Store the extension in the extensions_list variable
      lappend extensions_list ${ext}

      ## Increment field_offset past the 2 bytes of the extension type
      set field_offset [expr {${field_offset} + 2}]

      ## Grab the 2 bytes of extension length
      binary scan ${payload} @${field_offset}S ext_len

      ## Increment field_offset past the 2 bytes of the extension length
      set field_offset [expr {${field_offset} + 2}]

      ## Look for specific extension types in case these need to increment the field_offset (and because we need their values)
      switch $ext {
        "11" {
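          ## ec_point_format - the extension data is a 1 byte list length followed by 1 byte format values
          ## Grab each format as its own integer so that multiple formats are joined as e.g. 0-1-2
          binary scan ${payload} @[expr {${field_offset} + 1}]c[expr {${ext_len} - 1}] ext_data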
          set ec_point_format ${ext_data}
        }
        "10" {
          ## elliptic_curves - there's another 2 bytes after length
          ## Grab the extension data
          binary scan ${payload} @[expr {${field_offset} + 2}]S[expr {(${ext_len} - 2) / 2}] ext_data
          set elliptic_curves ${ext_data}
        }
        default {
          ## Grab the otherwise unknown extension data
          binary scan ${payload} @${field_offset}H[expr {${ext_len} * 2}] ext_data
        }
      }

      ## Increment the field_offset past the extension data length. Repeat this loop until we reach rlen (the end of the payload)
      set field_offset [expr {${field_offset} + ${ext_len}}]
    }
  }

  ## Now let's compile all of that data.
  ## binary scan's S format returns signed 16-bit integers, so the cipherlist and elliptic curve values need
  ## masking with 0xFFFF to return the unsigned integers we need
  set cipd [list]
  foreach cipher ${cipherlist_decimal} {
    lappend cipd [expr {${cipher} & 0xFFFF}]
  }
  set cipd_str [join ${cipd} "-"]
  set ecur_list [list]
  if { [info exists elliptic_curves] } {
    foreach curve ${elliptic_curves} {
      lappend ecur_list [expr {${curve} & 0xFFFF}]
    }
  }
  if { ( [info exists extensions_list] ) and ( ${extensions_list} ne "" ) } { set exte [join ${extensions_list} "-"] } else { set exte "" }
  set ecur [join ${ecur_list} "-"]
  if { ( [info exists ec_point_format] ) and ( ${ec_point_format} ne "" ) } { set ecfp [join ${ec_point_format} "-"] } else { set ecfp "" }

  set ja3_str "${sslversion},${cipd_str},${exte},${ecur},${ecfp}"
  ## binary scan [md5 ${ja3_str}] H* ja3_digest

  ## Un-comment this line to display the fingerprint string in the LTM log for troubleshooting
  #log local0. "ja3 = ${ja3_str}"

  return ${ja3_str}
}

Exactly as in Kevin’s rules, the PROC takes the full TCP payload as input along with the record length and the “inner” TLS version. The PROC then walks the payload, extracting the values we need, and passes the resulting JA3 string back to the calling iRule, which calculates the MD5 hash.
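
If you want to check the output of the PROC offline, say against a ClientHello pulled from a packet capture, the same walk can be sketched in Python. This is my own rough equivalent and not part of the iRule: the function names and the skip_grease flag are made up for illustration, and it assumes the payload holds one complete TLS record. The JA3 README says GREASE values are ignored, which is what the optional flag is for; with the flag left off, every value seen is kept.

```python
import hashlib
import struct


def is_grease(value):
    """True for the reserved GREASE values 0x0A0A, 0x1A1A ... 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_from_client_hello(payload, skip_grease=False):
    """Return (ja3_string, md5_hex) for a raw TLS record holding a ClientHello."""
    rtype, _outer_ver, rlen, hs_type = struct.unpack_from("!BHHB", payload, 0)
    if rtype != 22 or hs_type != 1:
        raise ValueError("not a TLS ClientHello")
    (version,) = struct.unpack_from("!H", payload, 9)
    end = 5 + rlen   # rlen does not include the 5-byte record header
    offset = 43      # record header + handshake header + version + random

    # Session ID: 1-byte length, then the ID itself
    offset += 1 + payload[offset]

    # Cipher suites: 2-byte length in bytes, then 2-byte values
    (cipher_len,) = struct.unpack_from("!H", payload, offset)
    offset += 2
    ciphers = struct.unpack_from("!%dH" % (cipher_len // 2), payload, offset)
    offset += cipher_len

    # Compression methods: 1-byte length, then the methods
    offset += 1 + payload[offset]

    extensions, curves, point_formats = [], [], []
    if offset < end:
        offset += 2  # skip the total extensions length
        while offset + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", payload, offset)
            offset += 4
            extensions.append(ext_type)
            if ext_type == 10:
                # elliptic_curves: 2-byte list length, then 2-byte curve IDs
                count = (ext_len - 2) // 2
                curves = struct.unpack_from("!%dH" % count, payload, offset + 2)
            elif ext_type == 11:
                # ec_point_format: 1-byte list length, then 1-byte formats
                point_formats = payload[offset + 1:offset + ext_len]
            offset += ext_len

    lists = [ciphers, extensions, curves]
    if skip_grease:
        lists = [[v for v in lst if not is_grease(v)] for lst in lists]
    lists.append(point_formats)
    parts = [str(version)] + ["-".join(str(v) for v in lst) for lst in lists]
    ja3 = ",".join(parts)
    return ja3, hashlib.md5(ja3.encode("ascii")).hexdigest()
```

Feeding the same ClientHello bytes to this function and to the iRule gives you two strings to compare, which is a quick way to spot a parsing difference before you start trusting the hashes.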

Create the caller iRule

Now there’s just one thing left to do, again just as with Kevin’s rules. This iRule has enough code to detect and collect the entire TLS ClientHello and then send the fingerprint payload off to the PROC. This is really just an example – if you’re using this then you want to put your logic, or code to ship the hash off to a logger, between the “Do Something here” lines.

when CLIENT_ACCEPTED {
  ## Collect the TCP payload
  TCP::collect
}
when CLIENT_DATA {
  ## Get the TLS packet type and versions
  if { ! [info exists rlen] } {
    ## We actually only need the record type (rtype), record length (rlen), handshake type (hs_type) and 'inner' SSL version (inner_sslver) here
    ## But it's easiest to parse them all out of the payload along with the bytes we don't need (outer_sslver & rilen)
    binary scan [TCP::payload] cSScH6S rtype outer_sslver rlen hs_type rilen inner_sslver

    if { ( ${rtype} == 22 ) and ( ${hs_type} == 1 ) } {
      ## This is a TLS ClientHello message (22 = TLS handshake, 1 = ClientHello)

      ## Call the fingerprintTLS proc
      set ja3_fingerprint [call Library-Rule::fingerprintTLS [TCP::payload] ${rlen} ${inner_sslver}]
      binary scan [md5 ${ja3_fingerprint}] H* ja3_digest

### Do Something here ###
      log local0. "[IP::client_addr]:[TCP::client_port] ja3 ${ja3_fingerprint}->${ja3_digest}"
### Do Something here ###

    }
  }

  # Collect the rest of the record if necessary
  if { [TCP::payload length] < $rlen } {
    TCP::collect $rlen
  }

  ## Release the payload
  TCP::release
}

Caveats and considerations

The first thing to mention is that binary operations in iRules are very computationally expensive – to the extent that on a system under attack you might not be able to deploy something like this if the BIG-IP is already under significant stress. That said, we are most often in situations where the BIG-IP isn’t under stress but the back end applications are and in those situations, these rules can be very handy indeed.

Because we aren’t comparing hashes against a pre-computed or pre-compiled list (as the work by Kevin & Lee does) there is no worry about having an incomplete hash table, but it does mean that you aren’t getting any 0th-second intelligence until you’ve had a chance to look at traffic volumes per hash and collect some information.
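
As a starting point for that kind of look at volumes per hash, here is a small Python sketch that tallies JA3 hashes from log lines. It assumes the lines were produced by the example log statement in the caller iRule (client address and port, the word ja3, then the JA3 string, an arrow and the MD5 hash); if you change that log format you will need to change the pattern to match. The function name is my own.

```python
import re
from collections import Counter

# Matches the example log format from the caller iRule:
#   <client_ip>:<client_port> ja3 <ja3_string>-><md5_hex>
LOG_PATTERN = re.compile(r"(\S+):(\d+) ja3 (\S*)->([0-9a-f]{32})")


def tally_ja3(lines):
    """Return a Counter of JA3 hashes found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            counts[match.group(4)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python tally_ja3.py < /var/log/ltm
    for ja3_hash, count in tally_ja3(sys.stdin).most_common(10):
        print(count, ja3_hash)
```

The ten most common hashes are usually enough to see whether one client population dominates the traffic, and the same Counter can be keyed on the JA3 string (group 3) if you would rather look at the raw handshake parameters.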

Hopefully you find these rules useful and if you have any feedback let us know!

Published Jul 23, 2020

Version 1.0