Now, if we look at the definition of the HEAD method in the HTTP specification, we see:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response.
So what is the problem?
The problem is performance. If generating the body is a CPU-, I/O-, or memory-intensive operation, you want to do it as rarely as possible. In that case you should implement the doHead method so that it only calculates and sets the HTTP headers for the request, and implement doGet so that it sets the same headers and then generates and writes the body. Because the servlet container handles HEAD this way by default, it is very easy to overlook: you implement only doGet (which is a very common thing to do) and never think of implementing anything else.
For browser-based applications this is fine, because a web UI usually needs only the doGet and doPost methods. But servlets are often not used for a regular user-facing UI. The API is very low level for that, so it is used for web services or other kinds of processing. In those cases I would not expect the container to behave the way it does.
By the way, if you implement only doPost, for example, and send a HEAD request to the same servlet, you get a 405 Method Not Allowed status, as expected. So the confusion arises only when you implement doGet.
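Here is a minimal sketch of that split. It is a plain-Java stand-in, not real servlet code: the class `ExpensiveResource`, the header values, and the `bodyGenerations` counter are all made up for illustration. In an actual servlet, `doHead` and `doGet` would be overrides of the `HttpServlet` methods taking an `HttpServletRequest` and an `HttpServletResponse`.

```java
import java.util.Map;

// Plain-Java stand-in for a servlet; no servlet API types are used here.
class ExpensiveResource {
    // Counts how often the costly body generation runs (illustration only).
    int bodyGenerations = 0;

    // Cheap step: set headers from metadata, without building the body.
    // The header values are hypothetical placeholders.
    void writeHeaders(Map<String, String> headers) {
        headers.put("Content-Type", "text/plain");
        headers.put("ETag", "\"v1\"");
    }

    // Expensive step: stands in for the CPU / IO / memory intensive work.
    String generateBody() {
        bodyGenerations++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("row ").append(i).append('\n');
        }
        return sb.toString();
    }

    // Counterpart of doHead: headers only, the body is never generated.
    void doHead(Map<String, String> headers) {
        writeHeaders(headers);
    }

    // Counterpart of doGet: the same headers, plus the generated body.
    void doGet(Map<String, String> headers, StringBuilder body) {
        writeHeaders(headers);
        body.append(generateBody());
    }
}
```

Calling `doHead` leaves `bodyGenerations` at 0, while `doGet` fills in the same headers and increments the counter once. Keeping the header logic in one shared method is what keeps the HEAD and GET headers identical.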
I even looked at the Servlet 2.5 specification itself to see whether its authors mention this behavior anywhere, but could not find anything. Hence this is just the container's optimization :)
I guess that in some cases this could even lead to some memory problems if the servlet container implementation is poor...
Beware...

Why does it matter? HEAD is almost never called, right?
HEAD is used when you don't actually need a file's contents.
This is useful to check characteristics of a resource without actually downloading it, thus saving bandwidth.
Basically, other implementations may call your servlet, and they may decide to call the HEAD method before invoking the real GET method (like sending a spy to see what type of resource it is, its size, and other interesting information). This is what happened in my case: I had to write a servlet that is called by another server which I don't own. My doGet method is a very resource-intensive call, so running it twice is a big waste.
That seems like sensible behaviour, IMO. It could perhaps be better documented, but it does meet with the HTTP standard and it does ensure that your HEAD is identical to your GET except for the content.
If you can return identical headers without the overhead then that's great - separate it out and implement the doHead() method. If you can't return identical headers without the overhead then the only way to do it is to generate everything and drop the body, which is exactly what they do.
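To make the "generate everything and drop the body" approach concrete, here is a simplified model of it. As far as I know, the stock `HttpServlet.doHead` does something along these lines by running `doGet` against a response wrapper that discards the body. The classes below (`CountingDiscardStream`, `DefaultHeadModel`) are my own stand-ins, not the real servlet API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Simplified "no body" stream: counts the bytes written and stores nothing.
class CountingDiscardStream extends OutputStream {
    private long count = 0;

    @Override
    public void write(int b) {
        count++;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        count += len;
    }

    long getCount() {
        return count;
    }
}

// Hypothetical model of a servlet that implements only doGet.
class DefaultHeadModel {
    // Counts how often the expensive generation runs (illustration only).
    int bodyGenerations = 0;

    // Stand-in for doGet: does the expensive work and writes it to the stream.
    void doGet(OutputStream out) {
        bodyGenerations++;
        try {
            for (int i = 0; i < 1000; i++) {
                out.write(("row " + i + "\n").getBytes(StandardCharsets.US_ASCII));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the default HEAD handling: run doGet in full, throw the
    // bytes away, and keep only their count as the Content-Length value.
    long defaultHead() {
        CountingDiscardStream sink = new CountingDiscardStream();
        doGet(sink);
        return sink.getCount();
    }
}
```

In this model a HEAD request yields an accurate content length, but `bodyGenerations` still goes up by one, so the expensive work is paid for on every HEAD as well as every GET.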
I know what HEAD is for, I'm just saying that it doesn't get called often. When I look at apache logs for the web sites I run, only GET and POST get called. None of the others get called. Ever.
If it never gets called, why worry about it?
It is called. Maybe not in your scenario. In fact, in our scenario it is called every time before the actual GET: the client (another server) executes it to see whether it really needs to call the expensive GET method.