posted on Friday, July 16, 2010 4:08 AM
Detecting bots requires more than a simple USER_AGENT check today…
Anyone who’s taken an artificial intelligence class in college or grad school knows all about the Turing Test. If you aren’t familiar with the concept, it is a “test proposed by Alan Turing in his 1950 paper Computing Machinery and Intelligence, which opens with the words: ‘I propose to consider the question, "Can machines think?"’” Traditional Turing Tests always involve three players, and the goal is to fool a human interviewer so that the interviewer cannot determine which of the other two players is human and which is a computer. There are variations on this theme, but they almost always focus on “fooling” an interviewer about some aspect of the human behavior the machine is attempting to imitate.
Common understanding has it that the purpose of the Turing Test is not specifically to determine whether a computer is able to fool an interrogator into believing that it is a human, but rather whether a computer could imitate a human.[44] While there is some dispute whether this interpretation was intended by Turing — Sterrett believes that it was[43] and thus conflates the second version with this one, while others, such as Traiger, do not[41] — this has nevertheless led to what can be viewed as the "standard interpretation." In this version, player A is a computer and player B a person of either gender. The role of the interrogator is not to determine which is male and which is female, but which is a computer and which is a human.[45]
-- Wikipedia, Turing Test
Over the past decade, as the web has grown more connected and intelligent, so too have the bots that crawl its voluminous pages, indexing the web so that search engines like Google and Bing can be useful. At the same time have come the evil bots: the scripts and automated attempts to exploit vulnerabilities and find holes in software that give malicious miscreants access to data and systems they are not authorized to reach. While a web application firewall and secure software development lifecycle practices can help detect or prevent an attempted exploit, neither is necessarily very good at determining whether a request is coming from a bot (machine) or a real user.
Given the very real threat posed by bots, it’s becoming increasingly important for organizations to detect these automated digital rodents and prevent them from accessing web applications, especially business-critical ones. The trick, however, is determining which requests come from bots and which come from real users. It’s tricky not only because the determination is difficult to make with a high degree of confidence, but also because it must be made on demand, in real time.
What organizations need is a sort of “on-demand Turing test” that can sort out the bots from the not bots.
HUMAN INTERACTION HEURISTICS
In a recent release of F5’s web application firewall, ASM (Application Security Manager), there is a very neat little “trick” that provides just what we’ve described: an on-demand Turing test. This capability of ASM attempts to determine whether the client is human by monitoring client-side interaction events. The assumption, of course, is that a bot will not interact with the page, whereas a human must click the mouse or type in order to use the application.
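The post doesn't spell out the exact heuristics ASM applies, so the following is only a minimal sketch of the idea, assuming the injected client-side script reports back whether it ran and which interaction events it observed. `ClientReport`, `looks_human`, the event list, and the `min_events` threshold are all hypothetical illustrations, not ASM's API or algorithm.

```python
from dataclasses import dataclass

# Hypothetical set of browser event names treated as evidence of a human user.
HUMAN_EVENT_TYPES = {"mousemove", "mousedown", "click", "keydown", "touchstart", "scroll"}


@dataclass
class ClientReport:
    """Hypothetical summary posted back by an injected client-side script."""
    script_executed: bool  # did the injected script run at all?
    events: list[str]      # names of the events the script observed on the page


def looks_human(report: ClientReport, min_events: int = 2) -> bool:
    """Return True if the report shows enough human-style interaction.

    A client that never ran the script (typical of simple scrapers that do not
    execute JavaScript), or that produced too few interaction events, is
    treated as a probable bot.
    """
    if not report.script_executed:
        return False
    interactions = sum(1 for name in report.events if name in HUMAN_EVENT_TYPES)
    return interactions >= min_events
```

For example, a report of `["mousemove", "click", "keydown"]` passes, while an empty event list, or a client whose script never executed, is flagged. A real deployment would need far richer signals (timing, event ordering, replay resistance), since a determined bot author can synthesize events.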
The intention behind this feature is to let administrators configure policies that act on the determination. This is particularly helpful in defending against web scraping and other automated attack tools that use scripts and bots to attack sites.
How does it work?
Because ASM is deployed as part of a unified application delivery platform, it can take advantage of the system’s underlying foundational technologies. In this case, that foundation is TMOS, a full proxy that can intercept, inspect, and transform application data. It is application aware and can interact in real time with the messages exchanged between the client and the server. In much the same way as we leverage this capability to inject the Gomez JavaScript code needed to monitor any application for which BIG-IP provides application delivery services, we can inject JavaScript into the responses sent to clients that gathers the information necessary to determine whether the client is a human being or an automated system.
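To make the injection step concrete, here is a minimal Python sketch of what a proxy might do to an HTML response body: insert a `<script>` tag just before the closing `</body>`. The function name and script URL are hypothetical; this is an illustration of the general technique, not how TMOS or ASM is implemented or configured.

```python
import re

# Hypothetical path for the injected detection script; not an actual ASM URL.
DETECTION_SCRIPT_URL = "/_hypothetical/human-check.js"

# Matches a closing body tag, case-insensitively (e.g. </body>, </BODY >).
_BODY_CLOSE = re.compile(rb"</body\s*>", re.IGNORECASE)


def inject_detection_script(body: bytes, content_type: str,
                            script_url: str = DETECTION_SCRIPT_URL) -> bytes:
    """Insert a <script> tag just before the last </body> of an HTML response.

    Non-HTML responses, and HTML without a closing </body>, pass through unchanged.
    The script URL is assumed to be plain ASCII.
    """
    if not content_type.lower().startswith("text/html"):
        return body
    matches = list(_BODY_CLOSE.finditer(body))
    if not matches:
        return body
    insert_at = matches[-1].start()
    tag = f'<script src="{script_url}"></script>'.encode("ascii")
    return body[:insert_at] + tag + body[insert_at:]
```

A real proxy would also have to adjust the Content-Length header and cope with compressed or chunked bodies, which this sketch deliberately ignores.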
Based on the result of the determination, the administrator can take action. Rejecting the connection or redirecting to a honeypot are common actions, but as with all TMOS-enabled solutions in the BIG-IP family, the options are limited mainly by organizational need. iRules (network-side scripting) can be employed to provide additional custom actions, logging, and so on, or a predefined set of actions can easily be applied. This also provides some measure of future-proofing against miscreants modifying their bots and automated scripts: the heuristics used to distinguish human from bot can evolve without upgrading the system, requiring only modification of the injected scripts. The ability to leverage iRules further allows organizations to develop their own methods that may or may not be based on ASM’s included functionality.
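The policy side can be pictured as a mapping from verdict to action. The sketch below is hypothetical Python, not iRules syntax and not ASM's configuration model; the `Verdict` values, `HONEYPOT_URL`, and the action functions are illustrative assumptions. Responses are modelled as simple `(status, headers)` tuples to keep it short.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Verdict(Enum):
    HUMAN = "human"
    BOT = "bot"
    UNKNOWN = "unknown"  # e.g. the injected script has not reported back yet


# Hypothetical honeypot location; an organization would choose its own.
HONEYPOT_URL = "https://honeypot.example.com/"

# (HTTP status, response headers)
Response = Tuple[int, Dict[str, str]]
Action = Callable[[str], Response]


def allow(path: str) -> Response:
    return (200, {})  # pass the request through to the application


def reject(path: str) -> Response:
    return (403, {})


def redirect_to_honeypot(path: str) -> Response:
    return (302, {"Location": HONEYPOT_URL})


# Default policy. UNKNOWN is allowed so that humans whose script has not yet
# reported back are not blocked; stricter sites might reject instead.
DEFAULT_POLICY: Dict[Verdict, Action] = {
    Verdict.HUMAN: allow,
    Verdict.BOT: redirect_to_honeypot,
    Verdict.UNKNOWN: allow,
}


def enforce(verdict: Verdict, path: str,
            policy: Dict[Verdict, Action] = DEFAULT_POLICY) -> Response:
    """Look up and run the action the policy assigns to this verdict."""
    return policy[verdict](path)
```

Swapping in a custom callable, for instance one that logs the request path before calling `reject`, is loosely analogous to the custom actions and logging that iRules make possible.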
This is not a panacea; no web application security solution is. But it is more than most web application security solutions offer today, and it is something that generally isn’t built into the application itself, for a variety of reasons. It certainly isn’t called out in the OWASP Top Ten list of web application security risks. There are legitimate uses for bots, after all, so using such a system to unilaterally deny or permit requests must be carefully balanced against the typical usage patterns and needs of the organization.
Regardless, it is another tool in the information security toolbox that can be applied to better secure applications against “users” of malicious intent, and it’s certainly moving web application security in the right direction by being as context-aware as possible when attempting to respond to a request for a resource.