Monday, April 19, 2010

When doGet is not really doGet, and why it is dangerous

The HTTP/1.1 specification defines eight methods: OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE and CONNECT. HttpServlet gives you a handler for seven of them (everything except CONNECT), and when you write a servlet you decide which ones to implement. For example, if I implement doPost, it is invoked when an HTTP POST request is sent to my server. Simple enough. I don't expect any other handler to be executed when I implement only doPost.

Now, if we look at the definition of the HEAD method in the HTTP specification, we see:

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response.

Now what happens if you implement doGet and don't implement doHead? I would expect the servlet container to return a 405 Method Not Allowed status, as it does for any other HTTP method that I did not implement. What really happens is that the container calls doGet instead and drops the body (if you wrote anything to it). I can understand why it was implemented this way: according to the specification, the functionality still holds if we drop the body. We still return the HTTP headers and everybody is happy...
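To make the fallback concrete, here is a minimal sketch of it in plain Java. It does not use the real servlet API: ModelResponse, ModelServlet and ExpensiveServlet are hypothetical stand-ins that I made up for illustration, and the default doHead here only approximates what the stock HttpServlet appears to do (run doGet against a response that discards the body, then report the length).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a servlet response: headers plus byte counters
// for whatever the handler writes as the body.
class ModelResponse {
    final Map<String, String> headers = new LinkedHashMap<>();
    final boolean discardBody;
    int bytesSent = 0;      // bytes that would reach the client
    int bytesDiscarded = 0; // bytes that were generated and then thrown away

    ModelResponse(boolean discardBody) {
        this.discardBody = discardBody;
    }

    void write(byte[] data) {
        if (discardBody) {
            bytesDiscarded += data.length;
        } else {
            bytesSent += data.length;
        }
    }
}

// Hypothetical base class modelling the default HEAD handling.
abstract class ModelServlet {
    abstract void doGet(ModelResponse resp);

    // Run doGet against a response that throws the body away, keeping only
    // the headers and the length of the discarded body.
    void doHead(ModelResponse resp) {
        ModelResponse noBody = new ModelResponse(true);
        doGet(noBody);
        resp.headers.putAll(noBody.headers);
        resp.headers.put("Content-Length", String.valueOf(noBody.bytesDiscarded));
    }
}

// A servlet that implements only doGet, as most people do.
class ExpensiveServlet extends ModelServlet {
    int bodyGenerations = 0;

    @Override
    void doGet(ModelResponse resp) {
        resp.headers.put("Content-Type", "text/plain");
        bodyGenerations++; // stands in for the expensive CPU / IO / memory work
        resp.write("expensive payload".getBytes(StandardCharsets.UTF_8));
    }
}

class HeadFallbackDemo {
    public static void main(String[] args) {
        ExpensiveServlet servlet = new ExpensiveServlet();
        ModelResponse resp = new ModelResponse(false);
        servlet.doHead(resp); // a HEAD request arrives
        System.out.println("body generations: " + servlet.bodyGenerations);
        System.out.println("bytes sent: " + resp.bytesSent);
        System.out.println("headers: " + resp.headers);
    }
}
```

In this model a single HEAD request still runs the body generation once, even though no body bytes are sent to the client.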

So what is the problem?

The problem is performance. If generating the body is a CPU-, I/O- or memory-intensive operation, you want to do it as rarely as possible. What you really should do in this case is implement doHead so that it only calculates and sets the HTTP headers for the request, and implement doGet so that it sets the same headers and also generates and writes the body. Because the container falls back to doGet by default, it is very easy to overlook this, implement only doGet (a very common thing to do) and never think about implementing anything else.
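The split described above could be sketched roughly like this. Again this is plain Java rather than the servlet API: ReportResponse and ReportHandler are hypothetical names, the header values are placeholders, and in a real servlet the two handlers would be doHead and doGet taking HttpServletRequest and HttpServletResponse.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a servlet response.
class ReportResponse {
    final Map<String, String> headers = new LinkedHashMap<>();
    byte[] body = new byte[0];
}

// Hypothetical handler that keeps header calculation separate from the body.
class ReportHandler {
    int bodyGenerations = 0;

    // Cheap part shared by HEAD and GET: metadata only.
    private void setHeaders(ReportResponse resp) {
        resp.headers.put("Content-Type", "text/plain");
        // Placeholder value; a real handler would look this up cheaply.
        resp.headers.put("Last-Modified", "Mon, 19 Apr 2010 00:00:00 GMT");
    }

    // Expensive part: only GET should ever reach this.
    private byte[] generateBody() {
        bodyGenerations++;
        return "expensive report".getBytes(StandardCharsets.UTF_8);
    }

    void doHead(ReportResponse resp) {
        setHeaders(resp); // headers only, no body work
    }

    void doGet(ReportResponse resp) {
        setHeaders(resp); // same headers as HEAD
        resp.body = generateBody();
    }
}

class SeparateHeadDemo {
    public static void main(String[] args) {
        ReportHandler handler = new ReportHandler();

        ReportResponse head = new ReportResponse();
        handler.doHead(head);
        System.out.println("after HEAD, body generations: " + handler.bodyGenerations);

        ReportResponse get = new ReportResponse();
        handler.doGet(get);
        System.out.println("after GET, body generations: " + handler.bodyGenerations);
        System.out.println("same headers: " + head.headers.equals(get.headers));
    }
}
```

Sharing one setHeaders method between the two handlers is one way to keep the HEAD headers in step with the GET headers. This sketch leaves out Content-Length, on the assumption that it cannot be known without generating the body.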

For browser-based applications this is fine, because when writing a web UI you usually implement only doGet and doPost. But servlets are a very low-level API, so they are often used not for a regular user-facing UI but for web services or other machine-to-machine processing. In those cases I would expect the container not to behave this way.

By the way, if you implement only doPost, for example, and execute HEAD on the same servlet, you get a 405 Method Not Allowed status, as expected. So the only confusion happens when you implement doGet.

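I even looked through the Servlet 2.5 spec itself to see whether the spec authors mention this behavior anywhere, but could not find anything. The HttpServlet Javadoc for doGet does note that overriding it also supports HEAD automatically, so this looks like a convenience built into the implementation rather than something the spec requires :)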

I guess that in some cases this could even lead to memory problems if the servlet container implementation is poor...

Beware...

5 comments:

  1. Why does it matter? HEAD is almost never called, right?

  2. HEAD is used when you don't actually need a resource's contents.
    It is useful for checking the characteristics of a resource without downloading it, which saves bandwidth.
    Basically, some other implementation may call your servlet, and it may decide to send a HEAD request before the real GET (like sending a spy to see what type of resource it is, its size and other interesting information). This is what happened in my case: I had to write a servlet that is called by another server which I don't own. My doGet method is a very resource-intensive call, so running it twice is a big waste.

  3. That seems like sensible behaviour, IMO. It could perhaps be better documented, but it does comply with the HTTP standard and it does ensure that your HEAD is identical to your GET except for the content.

    If you can return identical headers without the overhead then that's great - separate it out and implement the doHead() method. If you can't return identical headers without the overhead then the only way to do it is to generate everything and drop the body, which is exactly what they do.

  4. I know what HEAD is for, I'm just saying that it doesn't get called often. When I look at the Apache logs for the web sites I run, only GET and POST get called. None of the others get called. Ever.

    If it never gets called, why worry about it?

  5. It is called, maybe just not in your scenario. In our scenario it is called each time before the actual GET: the client (another server) executes it to see whether it really needs to call the expensive GET method.
