modperl_handling_head_requests

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



To: modperl <modperl@apache.org>
From: Robin Berjon <robin@knowscape.com>
Subject: handling HEAD requests
Date: Fri, 15 Dec 2000 20:58:50 +0100

Hi,

I'm working on a modperl site that doesn't presently handle HEAD requests
properly (it returns the entire content). That's definitely wasteful
(especially seeing how browsers bang on the site with HEAD requests), isn't
compliant, and makes telnet debugging a pain.

I know how to detect a HEAD request and return correctly. The problem is
that there are a *lot* of content handlers, and that would require patching
them all.
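For reference, the per-handler fix would be something like this (untested
sketch; generate_body() stands in for whatever builds the page):

    use Apache::Constants qw(OK);

    sub handler {
        my $r = shift;
        $r->content_type('text/html');
        $r->send_http_header;
        # header_only() is true for HEAD: stop after the headers
        return OK if $r->header_only;
        $r->print(generate_body($r));   # made-up page builder
        return OK;
    }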

From the discussion we had around it, it seems that there are two
solutions: either put the check in the TransHandler (it knows which URLs
map to which content handlers), which would immediately return a 404 or a
200, or put it in the part of the logic that deals with rendering the
content (all content handlers use it), so that it would return only the
headers and skip the actual content.
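The TransHandler variant would be roughly (untested sketch;
url_is_registered() stands in for whatever maps URLs to handlers):

    use Apache::Constants qw(DECLINED NOT_FOUND DONE);

    sub handler {
        my $r = shift;
        # anything that isn't a HEAD goes through untouched
        return DECLINED unless $r->header_only;
        # no registered content handler for this URL: 404 right away
        return NOT_FOUND unless url_is_registered($r->uri);
        # otherwise answer the HEAD ourselves with a bare 200
        $r->content_type('text/html');
        $r->send_http_header;
        return DONE;   # the response is complete, skip later phases
    }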

Choosing between the two is hard. The advantage of the TransHandler
solution is that it would avoid the potentially costly processing (lots of
db requests) that happens in the content handlers. Its problem is that some
of these content handlers might decide to return 404s themselves, or
redirects, or 403s, etc., if certain conditions that require processing are
not met. Thus the TransHandler would return false positives, because it
only knows whether there is a registered handler, not what that handler
would actually return. The advantage of putting the HEAD check in the
rendering code is that it already knows the return code for the request,
since it runs at the end of the content phase, and would thus be able to
return correct answers as well as add Content-Length headers, which a
certain number of search engines will probably like very much. However, all
the costly processing will have happened for (next to) nothing, so we
wouldn't be saving much.
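In code, that second option is just a tweak to the shared rendering
routine, roughly (untested sketch; render() is whatever common code all
the handlers end up calling):

    use Apache::Constants qw(OK);

    sub render {
        my ($r, $body) = @_;
        # the body has been generated anyway, so we can send an
        # exact Content-Length either way
        $r->header_out('Content-Length' => length $body);
        $r->send_http_header;
        # for HEAD, stop after the headers
        $r->print($body) unless $r->header_only;
        return OK;
    }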

Does anyone else have experience in dealing with such problems, or ideas
on which choice is best?

===

To: Robin Berjon <robin@knowscape.com>
From: Perrin Harkins <perrin@primenet.com>
Subject: Re: handling HEAD requests
Date: Fri, 15 Dec 2000 12:16:42 -0800 (PST)

On Fri, 15 Dec 2000, Robin Berjon wrote:
> I'm working on a modperl site that doesn't presently handle HEAD
> requests properly (it returns the entire content).

If all the information you need to generate a given page is in the URL,
you can also let mod_proxy cache it and handle the HEAD requests for you.
Even if these pages depend on cookies, you can use mod_rewrite in the
proxy server to put them into the URL before requesting the page from the
mod_perl server, creating a unique and cacheable URL.
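Something like this in the proxy's config, say (untested sketch; the
cookie name, URL layout and backend host are all made up):

    # fold a cookie value into the URL so each variant gets its
    # own cacheable address, then proxy to the mod_perl backend
    RewriteEngine On
    RewriteCond   %{HTTP_COOKIE}  sessionid=([^;]+)
    RewriteRule   ^/app/(.*)$  http://backend.example.com/app/%1/$1  [P]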

Of course you could also do this caching on the mod_perl server yourself
and let Apache handle these pages with core (i.e. as static), but that
sounds like more work.
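The do-it-yourself variant being, roughly: after generating a page, write
it under the DocumentRoot so Apache core serves the next hit (and its
HEADs) directly (untested sketch; the layout is made up):

    use File::Path qw(mkpath);

    sub cache_page {
        my ($r, $body) = @_;
        # mirror the request URI under the static tree
        my $file = $r->document_root . $r->uri;
        (my $dir = $file) =~ s{/[^/]+\z}{};
        mkpath($dir);
        open my $fh, '>', $file or return;
        print $fh $body;
        close $fh;
    }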

===

To: Perrin Harkins <perrin@primenet.com>
From: Robin Berjon <robin@knowscape.com>
Subject: Re: handling HEAD requests
Date: Fri, 15 Dec 2000 21:38:40 +0100

At 12:16 15/12/2000 -0800, Perrin Harkins wrote:
>On Fri, 15 Dec 2000, Robin Berjon wrote:
>> I'm working on a modperl site that doesn't presently handle HEAD
>> requests properly (it returns the entire content).
>
>If all the information you need to generate a given page is in the URL,
>you can also let mod_proxy cache it and handle the HEAD requests for you.  
>Even if these pages depend on cookies, you can use mod_rewrite in the
>proxy server to put them into the URL before requesting the page from the
>mod_perl server, creating a unique and cacheable URL.
>
>Of course you could also do this caching on the mod_perl server yourself
>and let Apache handle these pages with core (i.e. as static), but that
>sounds like more work.

That's indeed how I would handle it (and in fact do handle it in other
cases) if it were possible. However, the great majority of those pages are
generated from a database that changes very, very frequently, so caching
or writing static pages wouldn't really be efficient in most cases. There's
no telling when the db changes (which rules out caching), and writing all
the possible pages to disk is, well, probably not advisable in this case
(lots of data and possible combinations).

===

