DevCentral > Weblogs > Lori MacVittie - Two Different Socks
 Why it's so hard to secure JavaScript
posted on Friday, September 12, 2008 4:49 AM

The discussion yesterday on JavaScript and security got me thinking about why it is that there are no good options other than script management add-ons like NoScript for securing JavaScript.

In a compiled language there may be multiple ways to write a loop, but the underlying object code generated is the same. A loop is a loop, regardless of how it's represented in the language. Security products that insert shims into the stack, run as a proxy on the server, or reside in the network can look for anomalies in that object code. This is the basis for many types of network security - IDS, IPS, AVS, intelligent firewalls. They look for anomalies in signatures and if they find one they consider it a threat.

While the execution of a loop in an interpreted language is also the same regardless of how it's represented, it looks different to security devices because it's often text-based as is the case with JavaScript and XML. There are only two good options for externally applying security to languages that are interpreted on the client: pattern matching/regex and parsing.

Pattern matching and regular expressions provide minimal value, at best, for securing client-side interpreted languages, because there is an enormous number of ways to write the same piece of code.

As we learned from preventing SQL injection and XSS, attackers are easily able to avoid detection by these systems by simply adding white space, removing white space, using encoding tricks, and just generally finding a new permutation of their code.
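To make that concrete, here is a minimal sketch of how little it takes to slip past a signature. The pattern, the payload list, and the helper names (`SIGNATURE`, `is_flagged`, `normalize`) are all made up for illustration and are not taken from any real product.

```python
import re
from urllib.parse import unquote

# Hypothetical signature of the kind a pattern-matching filter might use.
SIGNATURE = re.compile(r"<script>alert\(")

payloads = [
    "<script>alert(1)</script>",          # textbook form
    "<script >alert(1)</script>",         # white space added inside the tag
    "<SCRIPT>alert(1)</SCRIPT>",          # different case
    "<script>\nalert (1)</script>",       # white space added inside the code
    "%3Cscript%3Ealert(1)%3C/script%3E",  # URL-encoded
    "<img src=x onerror=alert(1)>",       # a different permutation entirely
]

def is_flagged(text):
    """True if the naive signature matches the text as delivered."""
    return SIGNATURE.search(text) is not None

def normalize(text):
    """Undo the cheap tricks: decode, lowercase, strip all white space."""
    return re.sub(r"\s+", "", unquote(text).lower())

raw_results = [is_flagged(p) for p in payloads]
normalized_results = [is_flagged(normalize(p)) for p in payloads]

for payload, raw, norm in zip(payloads, raw_results, normalized_results):
    print(f"{payload!r:45} raw={raw} normalized={norm}")
```

Normalizing the input first recovers the white space, case, and encoding variants, but the last payload never contained the signature to begin with, so it still gets through.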

Parsing is, of course, the best answer. As 7rans noted yesterday regarding the Billion More Laughs JavaScript hack, if you control the stack, you control the execution of the code. Similarly, if you parse the data you can get it into a format more akin to that of a compiled language and then you can secure it. That's the reasoning behind XML threat defense, or XML firewalls. In fact, all SOA and XML security devices necessarily parse the XML they are protecting - because that's the only way to know whether or not some typical XML attacks, like the Billion Laughs attack, are present.
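As a rough illustration of why the entity declarations have to be parsed, the sketch below builds a Billion Laughs style document and works out how large the final entity would become without ever expanding it. The helper names (`build_laughs`, `expanded_length`) and the `lol` entity names are assumptions for this example, and the regular expressions stand in for a real DTD parser.

```python
import re

# Simplified stand-ins for a real DTD parser: double-quoted values only.
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r"&(\w+);")

def build_laughs(levels, fanout=10):
    """Build a document whose entity lol<levels> nests `levels` deep."""
    decls = ['<!ENTITY lol0 "lol">']
    for i in range(1, levels + 1):
        refs = f"&lol{i - 1};" * fanout
        decls.append(f'<!ENTITY lol{i} "{refs}">')
    return ("<!DOCTYPE lolz [" + "".join(decls) + "]>"
            + f"<lolz>&lol{levels};</lolz>")

def expanded_length(name, entities, cache=None):
    """Length of entity `name` after full expansion, computed arithmetically.

    Assumes no entity refers to itself, directly or indirectly.
    """
    if cache is None:
        cache = {}
    if name not in cache:
        value = entities[name]
        literal = len(ENTITY_REF.sub("", value))  # text outside references
        nested = sum(expanded_length(ref, entities, cache)
                     for ref in ENTITY_REF.findall(value) if ref in entities)
        cache[name] = literal + nested
    return cache[name]

doc = build_laughs(9)
entities = dict(ENTITY_DECL.findall(doc))
size = expanded_length("lol9", entities)
print(f"document: {len(doc)} characters, expands to {size:,} characters")
```

The document on the wire is well under a kilobyte, and nothing in its raw text looks alarming. The problem only shows up once the declarations are understood as structure, which is the point of parsing.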

But this implementation comes at a price: performance. Parsing XML is compute intensive, and it necessarily adds latency. Every device you add into the delivery path that must parse the XML to route it, secure it, or transform it adds latency and increases response time, which decreases overall application performance. This is one of the primary reasons most XML-focused solutions prefer to use a streaming parser. Streaming parser performance is much better than a full DOM parser, and still provides the opportunity to validate the XML and find malicious code. It isn't a panacea, however, as there are still some situations where streaming can't be used - primarily when transformation is involved.
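The appeal of streaming can be sketched with Python's standard `xml.sax` module: the handler sees each element as it arrives and can reject a document the moment a limit is exceeded, without building a tree first. The depth limit and the names `DepthLimitHandler` and `stream_check` are assumptions chosen for this example, not a description of how any particular XML firewall works.

```python
import xml.sax

class DepthLimitHandler(xml.sax.ContentHandler):
    """Streaming check that aborts as soon as nesting gets too deep."""

    def __init__(self, max_depth):
        super().__init__()
        self.max_depth = max_depth
        self.depth = 0
        self.elements_seen = 0

    def startElement(self, name, attrs):
        self.depth += 1
        self.elements_seen += 1
        if self.depth > self.max_depth:
            # Raising here stops the parse; the rest is never examined.
            raise xml.sax.SAXException(f"nesting deeper than {self.max_depth}")

    def endElement(self, name):
        self.depth -= 1

def stream_check(xml_bytes, max_depth=8):
    """Return (accepted, elements_seen) for the given document."""
    handler = DepthLimitHandler(max_depth)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except xml.sax.SAXException:
        return False, handler.elements_seen
    return True, handler.elements_seen

shallow = b"<order><item>1</item><item>2</item></order>"
deep = b"<a>" * 50 + b"</a>" * 50

print(stream_check(shallow))
print(stream_check(deep))
```

A transformation, by contrast, may need to see later parts of the document before it can emit earlier ones, which is why it generally cannot work on a single forward pass like this.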

We know this already, and we also know that JavaScript and client-side interpreted languages in general are far more prevalent than XML. Parsing JavaScript externally to determine whether it contains malicious code would certainly make it more secure, but it would also likely degrade application performance severely. We also know that streaming JavaScript isn't a solution because, unlike an XML document, JavaScript is not confined. JavaScript is delimited, certainly, but it isn't confined to the HEAD of an HTML document. It can be anywhere in the document, and often is.
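A quick way to see the "anywhere in the document" problem is to walk a small page and record every place script can hide. This sketch uses Python's standard `html.parser`. The class name `ScriptLocator` and the sample page are invented for the example, and it only looks for three common locations: script blocks, `on*` event handler attributes, and `javascript:` URLs.

```python
from html.parser import HTMLParser

class ScriptLocator(HTMLParser):
    """Record each spot in an HTML document that carries JavaScript."""

    def __init__(self):
        super().__init__()
        self.found = []          # (kind, location, code) tuples
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            if not value:
                continue
            if name.startswith("on"):
                self.found.append(("handler", f"{tag}.{name}", value))
            elif (name in ("href", "src")
                  and value.strip().lower().startswith("javascript:")):
                self.found.append(("url", f"{tag}.{name}", value))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script and data.strip():
            self.found.append(("block", "script", data.strip()))

page = """<html><head><script>var a = 1;</script></head>
<body onload="init()">
<p>Hello</p><script>document.title = "x";</script>
<a href="javascript:void(0)">link</a>
</body></html>"""

locator = ScriptLocator()
locator.feed(page)
locator.close()
for kind, where, code in locator.found:
    print(f"{kind:8} {where:12} {code}")
```

Even this toy page has script in four places, only one of which is in the HEAD, so an external device has to parse the HTML before it can begin to parse the JavaScript.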

Worse, JavaScript can self-modify at run-time - and often does. That means that the security threat may not be in the syntax or the code when it's delivered to the client, but it might appear once the script is executed. Not only would an intermediate security device need to parse the JavaScript, it would need to execute it in order to properly secure it.
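Here is a toy illustration of that gap. The JavaScript is held as a plain string, and the sketch compares a scan of the delivered text against a scan after one tiny bit of evaluation, joining adjacent string literals. The script, the watched term, and the helper `fold_string_concat` are assumptions for this example. It handles only this one trivial case, which is the point: anything built from a loop, a character code table, or a network response would need real execution.

```python
import re

# Matches "abc" + "def" where neither literal contains quotes or escapes.
CONCAT = re.compile(r'"([^"\\]*)"\s*\+\s*"([^"\\]*)"')

def fold_string_concat(js):
    """Repeatedly join adjacent double-quoted literals until none remain."""
    previous = None
    while previous != js:
        previous = js
        js = CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', js)
    return js

# Hypothetical delivered script: the sensitive name never appears intact.
delivered = 'var target = "docu" + "ment.coo" + "kie";'
watched = "document.cookie"

folded = fold_string_concat(delivered)
seen_in_delivered = watched in delivered
seen_after_folding = watched in folded

print(folded)
print(seen_in_delivered, seen_after_folding)
```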

While almost all web application security solutions - ours included - are capable of finding specific attacks like XSS and SQL injection that are hidden within JavaScript, none are able to detect and prevent JavaScript code-based exploits unless they can be identified by a specific signature or pattern. And as we've just established, that's no guarantee the exploits won't morph and change as soon as they can be prevented.

That's why browser add-ons like NoScript are so popular: JavaScript security today is binary. Allow or deny. Period. There's no real in between. There is no JavaScript proxy that parses and rejects malicious script, no solution that proactively scans JavaScript for code-based exploits, no external answer to the problem. That means we have to rely on the browser developers not only to write a good browser with all the bells and whistles we like, but also to make it secure.

I am not aware of any security solution that currently parses out JavaScript before it's delivered to the client. If there are any out there, I'd love to hear about them.


Feedback


9/12/2008 11:08 AM
I am not aware of any security solution that currently parses out JavaScript before it's delivered to the client. If there are any out there, I'd love to hear about them.

WebCleaner does, and so do NoScript's anti-XSS filters.
See http://hackademix.net/2008/09/12/firewall-makers-vs-evil-javascript/ for my full answer to this interesting post.
Giorgio Maone

9/12/2008 12:20 PM
This article is wrong from beginning to end. I will start with the major conclusion, but I doubt that I will have enough time to correct every error.

There are things like Caja, FBJS, and ADsafe, which define safe subsets of JavaScript and enforce them by parsing the JavaScript. But they exclude features that people use frequently, so you can't use them to run arbitrary JavaScript; you have to write the JavaScript with the intent of being Cajoled or whatever.

There are lots of ways to represent loops in the underlying object code, and compilers use lots of them. IDSes do not, as far as I know, look for anomalies in object code, because nobody, as far as I know, has any idea how to do that. There are IDSes that look for anomalies in network activity and system call activity, and there are IDSes that scan object code, but the ones I'm familiar with scan object code by using blacklists of known-evil code segments, not by scanning for anomalies. If there are IDSes that do some kind of anomaly scanning on object code, I'd be interested in hearing about them. Email me: kragen@canonical.org.

The fundamental problem we're talking about is that blacklisting is only effective against known attacks — not attacks against known vulnerabilities, but known attacks. That's true whether we're talking about blacklisting of binary code, system calls, spammers' IP addresses, or fragments of JavaScript.

There's a lot of interesting work in JavaScript in between "allow" and "deny". The first instance is the browser itself, which (when it's working properly, and servers aren't vulnerable to XSS and CSRF and so on) can run JavaScript programs without granting them access to your whole computer, or even your whole browser. Firefox is itself written largely in JavaScript, but still without allowing random web pages to take over the whole browser. And in addition to things like the browser itself and Caja, there are things like GreaseMonkey, which runs "user scripts" with privileges greater than those available to scripts in arbitrary web pages, but still less than those available to the scripts of the chrome of the browser.

Caja and FBJS are particularly interesting because their purpose is to permit many different pieces of JavaScript to collaborate with one another without being vulnerable to, or relying on, one another. I haven't looked into them enough to see if they succeed at this.
Kragen Javier Sitaker

9/12/2008 12:35 PM
@Kragen

IDSes are looking for anomalies in the stream. The stream is often text, and in some cases it's executable content - such as when an e-mail is carrying an attachment, or an attachment in a SOAP message, or a download via HTTP. In those cases they are most certainly looking for known signatures indicating problems inside executable content.

Is it *really* the object code? No. I skipped a step in there, my deepest apologies. It doesn't change the basic fact that they're signature based and can't really do anything about interpreted code.

FBJS is in a way different category than generalized JavaScript. ADsafe is a library that runs as part of the page; in other words, it's on the browser. I assume Caja is similar. These are not *external*; they are still libraries, on the client, running in the confines of the browser.

The general point is that no external - as in *external to the browser* - solution exists that parses/verifies/contains JavaScript in the manner that solutions exist for XML, HTML, and other text-based languages.

But I am glad to see pointers to *some* option for JavaScript, though FBJS is too specific to FB, and the documentation on ADsafe isn't enough to convince me. That's a failure of the documentation, however, not a condemnation of the library.
Lori MacVittie

12/2/2008 6:58 PM
Why not try to implement data tainting or information flow to guard against XSS exploits?
Alex