#infosec #adcfw

My previous post was a quick analysis of the US-Cert Vulnerability #903934, coined the “Post Of Doom.”  Much of the buzz has been about ASP.NET’s vulnerability to the attack, but that’s more because Microsoft has gone out of its way to quickly issue and publicize a patch.  Microsoft has plenty of company; essentially all major back end web platforms are vulnerable including Ruby, PHP, Java, Oracle, et al. The DevCentral community cultivated an iRule to mitigate the issue for all back-end platforms. Now that the initial buzz (and panic) has passed, let’s take a closer look at the specific mechanics of the vulnerability.

Modern web applications include web forms with hundreds of name+value pairs, many of them hidden, that communicate data between the browser and the application. All of these variables are managed, almost universally, by placing them into a language construct called a dictionary. This dictionary is implemented using a hash table, which has all sorts of wonderful characteristics such as fast insertion and fast random lookups that you’d like if you were looking up, say, a word in the dictionary. That hash table, alas, is where the problem comes in.

Example Form with two variables.

 
First name: "text" name="firstname" />

Last name: "text" name="lastname" />

The hosting platform, Java in this case, takes these parameters from the web browser and stores them in the dictionary. The application on the back end can then ask the hosting platform to look up the parameter’s value in its hash table-backed dictionary.

out.println(request.getParameter("firstname"));

Internally, the hosting platform will have hashed the name of the parameters into this table using a fast hash function, like Dan Bernstein’s “times 33” function, used by PHP 5 (ASP.NET and Java use slight variants of this).

/* Efficient hash function created by Daniel J. Bernstein - DJBX33A */
hash_t DJBX33A (const unsigned char *key)
{
hash_t h=0;

while
(*key) h=33*h + *key++;
 return h;
}

Even if you can’t read C code, you can tell that this is a really simple, fast hash function. The algorithm’s simplicity is also its Achilles’ heel; hash collisions will be easy to create and this will allow attackers to overwhelm the hash table dictionary with identically-hashed keys, leading to a degradation of performance as the CPU spends its time managing the collisions.

In their paper, Klink and Wälde show that for simple hashes like the Bernstein hash, when the hash value is only 32 bits (as opposed to 64 bits), collisions can be found quickly (faster than brute force) using a shortcut called “equivalent substrings.”

Some hash functions have the property that if two strings collide, e.g. hash('string1') = hash('string2'), then hashes having this substring at the same position collide as well, e.g. hash('prefixstring1postfix') = hash('prefixstring2postfix'). If for example 'Ez' and 'FY' collide under a hash function with this property, then 'EzEz', 'EzFY', 'FYEz', 'FYFY' collide as well. An observing reader may notice that this is very similar to binary counting from zero to four. Using this knowledge, an attacker can construct arbitrary numbers of collisions (2^n for 2*n-sized strings in this example).

Using the actual code above, here are the hash values for the Ez, FY, and G8.

DJBX33A (“Ez”) = 2399
DJBX33A (“FY”) = 2399
DJBX33A (“G8”) = 2399

According to the “Equivalent Substrings” technique, what’s also true is that these strings are equivalent:

DJBX33A (“EzEzEz”) = 34653
DJBX33A (“EzEzFY”) = 34653
DJBX33A (“EzFYFY”) = 34653
DJBX33A (“FYFYFY”) = 34653

See how easy it is to create colliding form variables when the hash function is susceptible to equivalent strings? More advanced functions, like the MD5 and the SHA hashes, don’t have this characteristic. Ron Rivest once said that the change of a single bit in the input to MD5 results in a change of 50% of the bits in the output. MD5 would be a better choice than these simple Bernstein hashes, but MD5 has its own collision vulnerabilities and shouldn’t necessarily be used anymore. SHA-1 might be a better choice yet, but of course it is much more computationally expensive than multiplying by 33.

Klink and Wälde also detail another shortcut called “meet-in-the-middle,” which does a divide-and-conquer approach to brute-force. Meet-in-the-middle isn’t as fast or easy as equivalent substrings, but it can reduce the key space to the square root of the hash size and would be applicable to either MD5 or SHA.

What is the ultimate solution? In 2007, after MD5 was shown to be much more susceptible to collisions than previously thought, Haveli and Krawczyk authored a paper showing how to modify existing hash functions to be collision resistant. One straightforward technique detailed in their paper is to simply take a global random string and XOR it over the input data before hashing. If the global random string is never non-deterministic and can remain hidden from the attackers (perhaps by being changed periodically), this can secure the hash functions against collision, even for simple hashes like the Bernstein hashes. The use of Target Collision Resistant (TCR) hash functions is very likely the best solution to this problem and has already been implemented by the Ruby Security Team. Other vendors, including Tomcat and Microsoft, simply limit the number of incoming variables.

Here is a collision-resistant version of the above Bernstein hash:

/* DJBX33A, XORing random data to make target collision resistant */
hash_t DJBX33A_TCR(const unsigned char *key)
{
hash_t h=0;
 int i=0;

while(*key) h=33*h + (*key++ ^ rdata[i++ % RMAX]);
 return h;
}

The resulting hash data changes, breaking the equivalent strings attack:

DJBX33A_TCR (“Ez”) = 4659 
DJBX33A_TCR (“FY”) = 4595

Note, these values will change every time the random data changes.

Until all the vendors implement target collision resistant hash functions, the temporary mitigations recommended by the authors are designed to limit post size (the default is 8 MB on many of these systems) and to limit the number of form variables to something reasonable. The DevCentral iRule we talked about last week does both, and consequently protects against this attack no matter which hash function is used on the back-end platform.