HTTP Pipelining: A security risk without real performance benefits
By Lori MacVittie, Two Different Socks (DevCentral Weblogs)
Posted on Thursday, April 02, 2009 3:30 AM

Everyone wants web sites and applications to load faster, and there’s no shortage of folks out there looking for ways to do just that. But all that glitters is not gold, and not all acceleration techniques actually do much to accelerate the delivery of web sites and applications. Worse, some actually incur risk by leaving servers open to exploitation.

A BRIEF HISTORY

Back in the day when HTTP was still evolving, someone came up with the concept of persistent connections. See, in ancient times (when administrators still wore togas in the data center), HTTP 1.0 required one TCP connection for every object on a page. That was okay until pages started comprising ten, twenty, and more objects. So someone added an HTTP header, Connection: Keep-Alive, which basically told the server not to close the TCP connection until (a) the browser told it to or (b) it hadn’t heard from the browser for X seconds (a timeout). Persistent connections eventually became the default behavior when HTTP 1.1 was written and became a standard.

I told you it was a brief history.
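
To make the persistent-connection defaults concrete, here is a minimal Python sketch. The function name connection_persists is hypothetical (not part of any library), and the logic is simplified: it only looks at the message’s HTTP version and its Connection header, ignoring proxies and other edge cases.

```python
from typing import Optional

def connection_persists(http_version: str, connection_header: Optional[str]) -> bool:
    """Hypothetical helper: does the TCP connection stay open after this exchange?

    HTTP/1.0: closed unless "Connection: keep-alive" was sent.
    HTTP/1.1: kept open unless "Connection: close" was sent.
    """
    # The Connection header is a comma-separated, case-insensitive token list.
    tokens = set()
    if connection_header:
        tokens = {t.strip().lower() for t in connection_header.split(",")}
    if http_version == "HTTP/1.1":
        # Persistent by default in HTTP 1.1.
        return "close" not in tokens
    # Treat HTTP 1.0 (and older) as one request per connection by default.
    return "keep-alive" in tokens
```

Under this model, an HTTP 1.0 exchange is closed unless Keep-Alive is negotiated, while an HTTP 1.1 exchange stays open unless someone asks for it to be closed.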

This capability is known as a persistent connection, because the connection persists across multiple requests. It is not the same as pipelining, though the two are closely related. Pipelining takes the concept of persistent connections and throws HTTP’s traditional request-then-reply lockstep out the window. The general line of thought goes like this:

“Whoa. What if we just shoved all the requests from a page at the server and then waited for them all to come back rather than doing it one at a time? We could make things even faster!”

Tada! HTTP pipelining.

In technical terms, the browser initiates HTTP pipelining by opening a connection to the server and sending multiple requests without waiting for a response to each one. Once the requests are all sent, the browser starts listening for responses. This is considered an acceleration technique because shoving all the requests at the server at once saves the RTT (round-trip time) that would otherwise be spent waiting for a response after each request is sent.
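
A rough sketch of what a pipelining client does on the wire, in Python using only the standard socket module. The helpers build_pipeline and send_pipelined are hypothetical names for illustration; the sketch assumes plain HTTP (no TLS), bodiless GET requests, and a server that honors pipelining. Only idempotent requests such as GET should be pipelined, and a real client would also need to parse the byte stream to split the responses apart.

```python
import socket
from typing import List

def build_pipeline(host: str, paths: List[str]) -> bytes:
    """Hypothetical helper: concatenate GET requests to send back to back."""
    requests = []
    for i, path in enumerate(paths):
        headers = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        # Ask the server to close after the last response so the reader
        # below can treat end-of-file as "all responses received".
        if i == len(paths) - 1:
            headers.append("Connection: close")
        requests.append("\r\n".join(headers) + "\r\n\r\n")
    return "".join(requests).encode("ascii")

def send_pipelined(host: str, port: int, paths: List[str]) -> bytes:
    """Hypothetical helper: write every request without waiting, then read until EOF."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_pipeline(host, paths))  # no pause between requests
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection after the last response
                break
            chunks.append(data)
    # Responses arrive concatenated, in the same order as the requests.
    return b"".join(chunks)
```

For example, send_pipelined("example.com", 80, ["/", "/style.css"]) would write both requests in a single sendall call and return the raw, concatenated responses. Marking only the last request with Connection: close is a shortcut that lets the reader stop at end-of-file instead of parsing Content-Length headers.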

WHY IT JUST DOESN’T MATTER ANYMORE (AND MAYBE NEVER DID)

Unfortunately, pipelining was conceived and implemented before broadband connections were widely used to access the Internet. Back then, the RTT was significant enough to hurt application and web site performance, and pipelining improved the overall user experience. Today, however, most folks access the Internet at a comfortable speed, and the RTT’s impact on most web applications’ performance is relatively low, despite the increasing number of objects per page.
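
A back-of-the-envelope model helps show where the savings come from. This Python sketch (the function page_load_time and its parameters are hypothetical, and the model deliberately ignores bandwidth, connection setup, and browsers’ use of multiple parallel connections) compares fetching objects one at a time over a single persistent connection with pipelining them on that same connection.

```python
def page_load_time(n_objects: int, rtt: float, service: float, pipelined: bool) -> float:
    """Hypothetical model: seconds to fetch n objects over ONE persistent connection.

    rtt:     network round-trip time in seconds
    service: server processing time per request in seconds (done serially)
    """
    if pipelined:
        # All requests leave at once: a single RTT, but server work is still serial.
        return rtt + n_objects * service
    # Lockstep request/response: every object pays the full RTT.
    return n_objects * (rtt + service)
```

In this model pipelining saves (n_objects - 1) × rtt, so the benefit scales with latency: the same page gains far more over a high-latency dial-up or satellite link than over a low-latency broadband one.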

There is no arguing, however, that some reduction in load time is better than none. And anyone who has had to access the Internet over high-latency links can tell you that anything that makes that experience faster has got to be a Good Thing. So what’s the problem?

The problem is that the server doesn’t actually treat pipelining any differently than regular old persistent connections. In fact, the HTTP 1.1 specification requires that a “server MUST send its responses to those requests in the same order that the requests were received.” In other words, the responses are returned serially, even though some web servers may actually process the requests in parallel. Because the server MUST return responses in order, it has to do some extra processing to comply with this part of the HTTP 1.1 specification: it has to queue up the responses and make certain they are returned in the proper order, which essentially negates the performance gained by reducing the number of round trips.
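
The queuing the specification forces on the server can be sketched as a reorder buffer. In this hypothetical Python generator, in_order, each finished response is tagged with the sequence number of the request that produced it; responses that finish early are held until every earlier response has been released.

```python
from typing import Dict, Iterable, Iterator, Tuple

def in_order(completions: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Hypothetical reorder buffer: release responses in request order.

    `completions` yields (sequence_number, response) pairs in the order the
    server finishes them; sequence numbers start at 0 in request-arrival order.
    """
    pending: Dict[int, str] = {}  # finished, but an earlier response is still missing
    next_seq = 0
    for seq, response in completions:
        pending[seq] = response
        # Flush every response whose predecessors have all been released.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1
```

For instance, if three pipelined requests finish in the order third, first, second, the generator holds the third response until the first two have been released.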

Depending on the order in which requests are sent, a request requiring particularly lengthy processing (say, a database query) sent early in the pipeline could actually degrade performance, because all the responses behind it have to wait for the lengthy one to finish before they can be sent back.
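
The head-of-line effect can be quantified with a small Python sketch. The function send_times is hypothetical and assumes all pipelined requests arrive at once and are processed fully in parallel, so the only added delay comes from the in-order delivery rule.

```python
from typing import List

def send_times(durations: List[float]) -> List[float]:
    """Hypothetical model: when each pipelined response can leave the server.

    Request i finishes at durations[i] (parallel processing, common start at 0),
    but in-order delivery means response i cannot go out before response i-1.
    """
    times: List[float] = []
    earliest = 0.0
    for d in durations:
        earliest = max(earliest, d)  # wait for the slowest request ahead of this one
        times.append(earliest)
    return times
```

With made-up durations of 0.5 seconds for a database query and 0.01 seconds for two static files, putting the query first makes every response leave at 0.5 seconds; putting it last lets the static files go out at 0.01 seconds.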

Application intermediaries such as proxies, application delivery controllers, and general load balancers can and do support pipelining, but they, too, adhere to the protocol specification and return responses in the order in which the requests were received. This server-side limitation actually inhibits a potentially significant boost in performance, because we know that processing dynamic requests takes longer than processing requests for static content. If this limitation were removed, the server could become more efficient and the user could experience non-trivial improvements in performance. Or, if intermediaries were smart enough to rearrange requests so that their execution was optimized (I seem to recall being required to design and implement a solution to a similar problem in graduate school), we’d maintain the performance benefits gained by pipelining. But that would require an understanding of the application that goes far beyond what even today’s most intelligent application delivery controllers can provide.
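
An intermediary can process pipelined requests concurrently and still honor the ordering rule. The sketch below uses Python’s asyncio; fake_backend and proxy_pipeline are hypothetical stand-ins, with asyncio.sleep simulating backend processing time. asyncio.gather returns results in the order its awaitables were passed, not the order in which they complete.

```python
import asyncio
from typing import List, Tuple

async def fake_backend(path: str, delay: float) -> str:
    """Hypothetical backend server; `delay` mimics processing time in seconds."""
    await asyncio.sleep(delay)
    return f"response for {path}"

async def proxy_pipeline(requests: List[Tuple[str, float]]) -> List[str]:
    """Hypothetical proxy: forward pipelined requests concurrently.

    Results come back in request order (as HTTP 1.1 requires), even though
    faster backend calls finish before slower ones.
    """
    tasks = [fake_backend(path, delay) for path, delay in requests]
    return await asyncio.gather(*tasks)
```

Even when the first request is the slowest, its response still comes back first, and the total wait is roughly the slowest backend call rather than the sum of all of them.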

THE SILVER LINING

At this point it may be fairly disappointing to learn that HTTP pipelining today does not deliver as significant a performance gain as it might at first seem to offer (except over high-latency links like satellite or dial-up, which are rapidly dwindling in usage). But that may very well be a good thing.

As miscreants have become more sophisticated about exploiting protocols and not just application code, they’ve learned to take advantage of the protocol to “trick” servers into believing their requests are legitimate, even though the desired result is usually malicious. In the case of pipelining, it would be a simple thing to exploit the capability to enact a layer 7 DoS attack on the server in question. Because pipelining assumes that requests will be sent one after the other and that the client will not wait for responses until the end, the server would have a difficult time distinguishing between a client attempting to consume resources and one making legitimate requests.

Consider that the server has no understanding of a “page”. It understands individual requests. It has no way of knowing that a “page” consists of only 50 objects, so a client pipelining the maximum number of requests allowed per connection (by default 100 in Apache, set by MaxKeepAliveRequests) may not look out of the ordinary. Several clients opening connections and pipelining hundreds or thousands of requests every second, without caring whether they receive any of the responses, could quickly consume the server’s resources or available bandwidth and deny service to legitimate users.
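
One way to bound this kind of abuse is to track per-connection limits. The PipelineGuard class below is a hypothetical Python sketch, not a feature of any particular server: max_requests mirrors a per-connection request cap like the Apache default of 100, and max_unsent_bytes (an assumed, illustrative threshold) limits how much response data can pile up for a client that keeps sending requests but never reads.

```python
from dataclasses import dataclass

@dataclass
class PipelineGuard:
    """Hypothetical per-connection limits against request flooding."""
    max_requests: int = 100            # cap on requests per connection
    max_unsent_bytes: int = 1_000_000  # illustrative cap on buffered response data
    requests_seen: int = 0
    unsent_bytes: int = 0

    def on_request(self) -> bool:
        """Count a parsed request; False means close the connection instead."""
        self.requests_seen += 1
        return self.requests_seen <= self.max_requests

    def on_response_queued(self, size: int) -> bool:
        """Account for a buffered response; False means the client is not reading."""
        self.unsent_bytes += size
        return self.unsent_bytes <= self.max_unsent_bytes

    def on_bytes_sent(self, size: int) -> None:
        """Record bytes the client actually accepted from the socket."""
        self.unsent_bytes = max(0, self.unsent_bytes - size)
```

A server loop would call on_request for each parsed request, on_response_queued before buffering a response, and on_bytes_sent after each successful write, closing the connection whenever a check returns False.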

So perhaps the fact that pipelining is not really all that useful to most folks is a good thing, as server administrators can disable the feature without too much concern and thereby mitigate the risk of the feature being leveraged as an attack method against them.
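
Short of turning off persistent connections entirely, a server could decline to serve pipelined requests. The Python sketch below shows one hypothetical policy: if more than one complete request is already waiting in the read buffer before the first response is sent, answer the first and then close the connection, leaving a well-behaved client to resend the rest on a new connection. The helper names are illustrative, and counting header terminators only works for bodiless requests such as GET.

```python
def count_request_heads(buffer: bytes) -> int:
    """Hypothetical helper: count complete request header blocks in a read buffer.

    Simplified: assumes bodiless requests (e.g. GET), so each request ends
    at the blank line (CRLF CRLF) that terminates its headers.
    """
    return buffer.count(b"\r\n\r\n")

def should_close_after_response(buffer: bytes) -> bool:
    """Hypothetical policy: if the client pipelined (more than one request
    arrived before the first response), serve one request and then close."""
    return count_request_heads(buffer) > 1
```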

Pipelining as it is specified and implemented today is more of a security risk than it is a performance enhancement. There are, however, tweaks to the specification that could be made in the future that might make it more useful. Those tweaks do not address the potential security risk, though, so given that there are so many other optimization and acceleration techniques that improve performance without incurring any measurable security risk, perhaps we should simply let sleeping dogs lie.

IMAGES COURTESY WIKIPEDIA COMMONS

Feedback


4/2/2009 5:40 AM
So what was the point of requiring servers to respond to requests in order? Seems silly.
mike

4/2/2009 6:17 AM
@mike

I found a good discussion on the subject on a mailing list while researching (and of course I can't find it now, grrr...). There was a good technical reason behind the decision, at least at the time. Whether it is still valid is up for debate.

I will keep looking for the discussion and if/when I find it I'll post a link to it.

Lori
Lori MacVittie

4/23/2009 3:39 AM
Jedi Mind Tricks: HTTP Request Smuggling
Lori MacVittie

4/10/2009 4:07 PM
Lori, in your description you make one assumption that is not required: even though a client may pipeline HTTP requests, the server need not process the requests in parallel. Pipelining achieves the desired goal of hiding network RTTs even if the server processes requests in series (it's just that the requests arrive at the server much sooner than without pipelining).

We use pipelining all the time, and see tremendous benefit, even from browsers over DSL links. Try out Firebug etc. if you'd like to confirm for yourself.
pipelining works

4/29/2009 3:47 AM
Lori probably sees it through the eyes of a user (client), not through the eyes of a heavily loaded web server that has to open and close thousands of connections per second, or a firewall that has maybe a hundred heavily loaded web servers behind it and REALLY HATES extra connections :p The clients don't see much difference, nor does a web server that handles 100 requests per second or so.
Anonymous

1/4/2010 3:40 PM
Wouldn't pipelining be simpler and more efficient if there was a matching request id in the request's and response's header?

You could throw as many pending requests as needed on one socket and wait to receive a response with a matching request id.
JeffCyr

1/5/2010 10:14 AM
@JeffCyr

You mean something more along the lines of Google SPDY? It probably would be at that. Though that might complicate the client as it doesn't currently support that kind of a model.

macvittie

1/5/2010 12:50 PM
@macvittie

It's the first time I've heard about SPDY; they nicely address the concerns I had with HTTP limitations.

Thanks.
JeffCyr