modperl-discussion_of_apache_cachecontent_proposal_new_cpan_upload

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.

To: modperl@apache.org
From: Paul Lindner <lindner@inuus.com>
Subject: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 08:19:09 -0800

Hi,

I would like to propose a new Apache module before I send it off to
CPAN.  The name chosen is Apache::CacheContent.

It's pretty generic code, and is intended to be subclassed.  It
handles the gory details of caching a page to disk and serving it up
until it expires.  

It's derived from work done on the mod_perl Developer's Cookbook, so
it's already been reviewed by a number of people.

I've attached a README below.  To download it go to
http://www.modperlcookbook.org/code.html




NAME
    Apache::CacheContent - PerlFixupHandler class that caches dynamic
    content

SYNOPSIS
    * Make your method handler a subclass of Apache::CacheContent
    * Allow your web server process to write into portions of your document
    root.
    * Add a ttl() subroutine (optional)
    * Add directives to your httpd.conf that are similar to these:
      PerlModule MyHandler

      # dynamic url
      <Location /dynamic>
        SetHandler perl-script
        PerlHandler MyHandler->handler
      </Location>

      # cached URL
      <Location /cached>
        SetHandler perl-script
        PerlFixupHandler MyHandler->disk_cache
        # CacheTTL is in minutes
        PerlSetVar CacheTTL 120
      </Location>

DESCRIPTION
    Note:          This code is derived from the *Cookbook::CacheContent*
                   module, available as part of "The mod_perl Developer's
                   Cookbook"

    The Apache::CacheContent module implements a PerlFixupHandler that helps
    you to write handlers that can automatically cache generated web pages
    to disk. This is a definite performance win for sites that end up
    generating the exact same content for many users.

    The module is written to use Apache's built-in file handling routines to
    efficiently serve data to clients. This means that your code will not
    need to worry about HTTP/1.X, byte ranges, if-modified-since, HEAD
    requests, etc. It works by writing files into your DocumentRoot, so be
    sure that your web server process can write there.

    To use this you MUST use mod_perl method handlers. This means that your
    version of mod_perl must support method handlers (the argument
    EVERYTHING=1 to the mod_perl build will do this). Next you'll need to
    have a content-generating mod_perl handler. If it isn't a method handler,
    modify the *handler* subroutine to read:

      sub handler ($$) {
        my ($class, $r) = @_;
        # ... generate the page for $r->uri here ...
      }

    Next, make your handler a subclass of *Apache::CacheContent* by adding
    an ISA entry:

      @MyHandler::ISA = qw(Apache::CacheContent);

    You may need to modify your handler code to only look at the *uri* of
    the request. Remember, the cached content is independent of any query
    string or form elements.

    After this is done, you can activate your handler. To use your handler
    in a fully dynamic way, configure it as a PerlHandler in your httpd.conf,
    like this:

      PerlModule MyHandler
      <Location /dynamic>
        SetHandler perl-script
        PerlHandler MyHandler->handler
      </Location>

    So requests to *http://localhost/dynamic/foo.html* will call your
    handler method directly. This is great for debugging and testing the
    module. To activate the caching mechanism configure httpd.conf as
    follows:

      PerlModule MyHandler
      <Location /cached>
        SetHandler perl-script
        PerlFixupHandler MyHandler->disk_cache
        # CacheTTL is in minutes
        PerlSetVar CacheTTL 120
      </Location>

    Now, when you access URLs like *http://localhost/cached/foo.html*, the
    content will be generated and stored in the file
    *DocumentRoot*/cached/foo.html. Subsequent requests for the same URL will
    return the cached content until it expires according to the *CacheTTL*
    setting.
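A minimal sketch of what "generated and stored in the file" can involve, written as plain Perl so it runs outside Apache. It is not the module's code: `store_cached_page` is a hypothetical helper, and writing to a temporary file and then rename()-ing it into place is an assumed precaution so that a concurrent request never serves a half-written page.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(mkpath);
use File::Temp qw(tempfile);

# Hypothetical helper: map a request URI such as /cached/foo.html onto a
# file below $root (e.g. the DocumentRoot) and store $content there.
sub store_cached_page {
    my ($root, $uri, $content) = @_;

    # Refuse URIs that try to climb out of $root via "..".
    die "unsafe URI: $uri\n" if $uri =~ m{(?:^|/)\.\.(?:/|$)};

    my $file = "$root$uri";
    my $dir  = dirname($file);
    mkpath($dir) unless -d $dir;

    # Write to a temporary file in the same directory, then rename() it into
    # place, so readers see either the old page or the complete new one.
    my ($fh, $tmp) = tempfile('.cache-XXXXXX', DIR => $dir);
    print {$fh} $content or die "write $tmp: $!\n";
    close $fh or die "close $tmp: $!\n";
    chmod 0644, $tmp or die "chmod $tmp: $!\n";   # tempfile() creates 0600
    rename $tmp, $file or die "rename $tmp -> $file: $!\n";

    return $file;
}
```

rename() is only atomic within a single filesystem, which is why the temporary file is created in the target directory rather than in /tmp.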

    For further customization you can write your own *ttl* function that can
    dynamically change the caching time based on the current request.
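For illustration, a hypothetical ttl() override. The `my ($self, $r) = @_` calling convention matches the ttl() in eg/CacheWeather.pm; the per-URI policy, the /cached/news/ prefix and the fallback of 120 are assumptions. The value is returned in the same unit as CacheTTL (minutes, per the README). `$r->uri` and `$r->dir_config` are standard mod_perl 1.x request methods; the small FakeRequest class only stands in for them so the policy can be tried outside Apache.

```perl
package MyHandler;
use strict;
use warnings;

# In a real server this package would also define the content-generating
# handler() method; only the ISA entry and ttl() are shown here.
our @ISA = ('Apache::CacheContent');

# Hypothetical policy: news pages expire after 5 units, everything else
# uses the CacheTTL value from httpd.conf (falling back to 120).
sub ttl {
    my ($self, $r) = @_;
    return 5 if $r->uri =~ m{^/cached/news/};
    return $r->dir_config('CacheTTL') || 120;
}

# Minimal stand-in for the Apache request object, providing only the two
# methods ttl() calls, so the policy can be exercised outside Apache.
package FakeRequest;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub uri        { $_[0]{uri} }
sub dir_config { $_[0]{config}{ $_[1] } }

package main;
```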

AUTHOR
    Paul Lindner <lindner@inuus.com>, Geoffrey Young, Randy Kobes

SEE ALSO
    The example mod_perl method handler CacheWeather (eg/CacheWeather.pm).

    The mod_perl Developer's Cookbook

===


To: <lindner@inuus.com>, <modperl@apache.org>
From: "Perrin Harkins" <perrin@elem.com>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 12:31:15 -0500

> I would like to propose a new Apache module before I send it off to
> CPAN.  The name chosen is Apache::CacheContent.

This is very cool.  I was planning to write one of these, and now I don't
have to.  Your implementation is short and interesting.  I was planning to
do it with a PerlFixupHandler and an Apache::Filter module to capture the
output.  While that approach wouldn't require the use of method handlers, I
think yours may be easier for newbies because it doesn't require them to
understand as many modules.  The only real advantage of using Apache::Filter
is that it would work well with existing Registry scripts.

A couple of other C's for your R:

A cache defines parameters that constitute a unique request.  Your cache
currently only handles the filename from the request as a parameter.  It
would be nice to also handle query args, POST data, and arbitrary headers
like cookies or language choices.  You could even support an optional
request_keys method for handlers which would let people generate their own
unique key based on their analysis of the request.

Doing this would mean you would need to generate filenames based on the
unique keys (probably by hashing, as in Cache::FileCache) and do an internal
redirect to that file if available when someone sends a request that
matches.
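A sketch of the hashed-key idea in plain Perl. `cache_key` is a hypothetical helper, not part of Apache::CacheContent or Cache::FileCache; it assumes the query arguments and the relevant headers have already been pulled out of the request into hashes (under mod_perl 1.x, `my %args = $r->args;` gives the query arguments).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: build a cache file name from everything that makes a
# request unique: the URI, the query/form arguments and selected headers.
sub cache_key {
    my ($uri, $args, $headers) = @_;

    # Sort names so that ?a=1&b=2 and ?b=2&a=1 map to the same key.
    my @parts = ($uri);
    for my $name (sort keys %{ $args || {} }) {
        push @parts, "arg:$name=$args->{$name}";
    }
    for my $name (sort keys %{ $headers || {} }) {
        push @parts, 'hdr:' . lc($name) . "=$headers->{$name}";
    }

    # Hash the joined parts so the result is a fixed-length,
    # filesystem-safe name.
    my $digest = md5_hex(join "\0", @parts);

    # Spread files over subdirectories named after the first two hex digits.
    return join '/', substr($digest, 0, 2), $digest;
}
```

Sorting the names makes the key independent of argument order; the two-character prefix directory keeps any one directory from growing too large, similar in spirit to the hashed directories Cache::FileCache uses.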

Another thing that might be nice would be to store the TTL with the file
rather than making the handler give it to you again each time.  This is done
in mod_proxy by putting an Expires header in the file and reading it before
sending the file, but you could also store them in a dbm or something.
Support for sending Expires headers automatically would also be useful.
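A sketch of storing the TTL alongside the file. Instead of mod_proxy's embedded Expires header or a dbm, this assumes a hypothetical sidecar file "<page>.expires" holding the expiry time in epoch seconds; `write_expiry`, `is_fresh` and `http_date` are illustrative names. `http_date` formats an RFC 1123 date suitable for an Expires header (HTTP::Date's `time2str` from libwww-perl does the same job).

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date (RFC 1123), e.g. for an Expires header.
# Day and month names are spelled out to avoid locale-dependent strftime().
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Hypothetical scheme: record the expiry time next to the cached page in
# "<file>.expires", so the TTL is computed once, when the page is written.
sub write_expiry {
    my ($file, $ttl_seconds, $now) = @_;
    $now = time unless defined $now;
    open my $fh, '>', "$file.expires" or die "open $file.expires: $!\n";
    print {$fh} $now + $ttl_seconds, "\n";
    close $fh or die "close $file.expires: $!\n";
    return $now + $ttl_seconds;
}

# True while the cached copy is still fresh; false if expired or unknown.
sub is_fresh {
    my ($file, $now) = @_;
    $now = time unless defined $now;
    open my $fh, '<', "$file.expires" or return 0;
    my $expires = <$fh>;
    close $fh;
    return 0 unless defined $expires && $expires =~ /^(\d+)/;
    return $now < $1;
}
```

Under mod_perl 1.x the header could then be sent with something like `$r->header_out(Expires => http_date($expires))`.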

When I first thought about this problem, I wanted to do it the way Vignette
StoryServer does: by having people link to the cached files directly and
making the content generating code be the 404 handler for those files.  That
gives the best possible performance for cached files, since no
PerlFixupHandler needs to run.  The downside is that then you need an
external process to go through and clean up expired files.  It's also hard
to handle complex cache criteria like query args.  StoryServer does it by
having really crazy generated file names and processing all the links to
files on the way out so that they use the cached file names.  Pretty ugly.
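For the 404-handler design, the "external process" could be a cron job along these lines. `sweep_cache` is hypothetical and assumes a file's age can be judged from its modification time, i.e. that each page was written when it was generated.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical sweeper: delete cached files whose modification time is more
# than $ttl_seconds in the past, and return the names of the removed files.
sub sweep_cache {
    my ($cache_dir, $ttl_seconds, $now) = @_;
    $now = time unless defined $now;
    my @removed;
    find(
        sub {
            return unless -f $_;            # skip directories
            my $mtime = (stat _)[9];        # reuse the stat from -f
            if ($now - $mtime > $ttl_seconds) {
                unlink $_ and push @removed, $File::Find::name;
            }
        },
        $cache_dir,
    );
    return @removed;
}
```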

I know you guys are pushing to get the book done, so don't feel pressured to
address this stuff now.  I think the current module looks more than good
enough for an initial CPAN release.

===

To: modperl@apache.org
From: Bill Moseley <moseley@hank.org>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 06 Dec 2001 10:04:26 -0800

At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:

Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?

BTW -- I think where the docs are cached should be configurable.  I don't
like the idea of the document root writable by the web process.

===

To: Bill Moseley <moseley@hank.org>
From: Tatsuhiko Miyagawa <miyagawa@edge.co.jp>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Fri, 07 Dec 2001 03:26:48 +0900

On Thu, 06 Dec 2001 10:04:26 -0800
Bill Moseley <moseley@hank.org> wrote:

> BTW -- I think where the docs are cached should be configurable.  I don't
> like the idea of the document root writable by the web process.

Maybe:

  Alias /cached /tmp/cache


===

To: lindner@inuus.com
From: Tatsuhiko Miyagawa <miyagawa@edge.co.jp>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Fri, 07 Dec 2001 03:28:19 +0900

On Thu, 6 Dec 2001 08:19:09 -0800
Paul Lindner <lindner@inuus.com> wrote:

> I've attached a README below.  To download it go to
> http://www.modperlcookbook.org/code.html

Nice one. Here's a patch to make the sample code work :)


--- CacheContent.pm~    Thu Dec  6 22:11:35 2001
+++ CacheContent.pm     Fri Dec  7 03:23:39 2001
@@ -6,6 +6,7 @@
 @Apache::CacheContent::ISA = qw(Apache);

 use Apache;
+use Apache::Log;
 use Apache::Constants qw(OK SERVER_ERROR DECLINED);
 use Apache::File ();

--- eg/CacheWeather.pm~ Thu Dec  6 08:10:09 2001
+++ eg/CacheWeather.pm  Fri Dec  7 03:24:14 2001
@@ -8,7 +8,7 @@

 use strict;

-@CacheWeather::ISA = qw(Cookbook::CacheContent);
+@CacheWeather::ISA = qw(Apache::CacheContent);

 sub ttl {
   my($self, $r) = @_;


===

To: Bill Moseley <moseley@hank.org>
From: Paul Lindner <lindner@inuus.com>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 10:33:09 -0800

On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
> At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
> 
> Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?

Apache::CacheContent gives you more control over the caching process
and keeps the expiration headers from leaking to the browser.  Or
maybe you want to dynamically control the TTL?

sub ttl {
  ...
  if ($load_avg > 5) {
     return 60 * 5;
  } else {
     return 60;
  }
}
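The message leaves open where `$load_avg` comes from. One assumed source, Linux only, is /proc/loadavg; the helper names below are hypothetical, and the 60 / 300 values are copied from the example above without asserting their unit.

```perl
use strict;
use warnings;

# Hypothetical helper: read the 1-minute load average. /proc/loadavg exists
# on Linux only; elsewhere this returns undef.
sub current_load_avg {
    open my $fh, '<', '/proc/loadavg' or return undef;
    my $line = <$fh>;
    close $fh;
    return parse_load_avg($line);
}

# A /proc/loadavg line looks like "0.42 0.35 0.30 1/123 4567"; take field one.
sub parse_load_avg {
    my ($line) = @_;
    return undef unless defined $line && $line =~ /^\s*([\d.]+)/;
    return $1 + 0;
}

# The policy from the example: cache longer when the machine is busy.
sub ttl_for_load {
    my ($load_avg) = @_;
    return 60 unless defined $load_avg;   # unknown load: use the short TTL
    return $load_avg > 5 ? 60 * 5 : 60;
}
```

Inside a subclass, ttl() could then simply `return ttl_for_load(current_load_avg());`.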

> BTW -- I think where the docs are cached should be configurable.  I don't
> like the idea of the document root writable by the web process.

That's the price you pay for this functionality.  Because we use
Apache's native file serving code we need a url->directory mapping
somewhere.

Of course you don't need to make the entire docroot writable, just the
directory corresponding to your script.

===
To: Paul Lindner <lindner@inuus.com>
From: Matt Sergeant <matt@sergeant.org>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 18:45:48 +0000 (GMT)

On Thu, 6 Dec 2001, Paul Lindner wrote:

> On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
> > At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
> >
> > Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
>
> Apache::CacheContent gives you more control over the caching process
> and keeps the expiration headers from leaking to the browser.  Or
> maybe you want to dynamically control the TTL?
>
> sub ttl {
>   ...
>   if ($load_avg > 5) {
>      return 60 * 5;
>   } else {
>      return 60;
>   }
> }

While a ttl might be useful to some projects, others I'm sure would prefer
a per-hit check, so you can say "Yes, this thing has changed now".
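A sketch of such a per-hit check, comparing source files in addition to a plain TTL: the cached page is reused only while it is within its TTL and newer than the files it was built from. `cache_is_valid` and the idea of passing "source" files are assumptions, not something Apache::CacheContent provides.

```perl
use strict;
use warnings;

# Hypothetical per-hit check: the cached page is reusable only if it exists,
# is younger than $ttl_seconds, and is newer than every file it was built
# from (e.g. a data file or template that an editor may have just changed).
sub cache_is_valid {
    my ($cached, $ttl_seconds, $now, @sources) = @_;
    $now = time unless defined $now;

    my $cache_mtime = (stat $cached)[9];
    return 0 unless defined $cache_mtime;            # nothing cached yet
    return 0 if $now - $cache_mtime > $ttl_seconds;  # ordinary expiry

    for my $src (@sources) {
        my $src_mtime = (stat $src)[9];
        return 0 if defined $src_mtime && $src_mtime > $cache_mtime;
    }
    return 1;
}
```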

===

To: lindner@inuus.com
From: Bill Moseley <moseley@hank.org>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 06 Dec 2001 10:47:35 -0800

At 10:33 AM 12/06/01 -0800, Paul Lindner wrote:
>On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
>> At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
>> 
>> Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
>
>Apache::CacheContent gives you more control over the caching process
>and keeps the expiration headers from leaking to the browser.

Ok, I see.

>Or maybe you want to dynamically control the TTL?

Would you still use it with a front-end lightweight server?  Even with
caching, a mod_perl server is still used to send the cached file (possibly
over 56K modem), right?

===

To: Bill Moseley <moseley@hank.org>
From: Paul Lindner <lindner@inuus.com>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 11:19:58 -0800

On Thu, Dec 06, 2001 at 10:47:35AM -0800, Bill Moseley wrote:
> >> Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
> >
> >Apache::CacheContent gives you more control over the caching process
> >and keeps the expiration headers from leaking to the browser.
> 
> Ok, I see.
> 
> >Or maybe you want to dynamically control the TTL?
> 
> Would you still use it with a front-end lightweight server?  Even with
> caching, a mod_perl server is still used to send the cached file (possibly
> over 56K modem), right?

You definitely want a proxy-cache in front of your mod_perl server.

One thing that I like about this module is that you can control the
server-side caching of content separate from the client/browser cache.

So, on to the RFC.  Is the name acceptable for the Apache::* namespace?

I will endeavor to add any functionality that makes it worthy :)

For example, I think adding a virtual method that generates the
filename might be useful.

===

To: modperl@apache.org
From: Igor Sysoev <is@rambler-co.ru>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 23:13:50 +0300 (MSK)

On Thu, 6 Dec 2001, Paul Lindner wrote:

> On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
> > At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
> > 
> > Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
> 
> Apache::CacheContent gives you more control over the caching process
> and keeps the expiration headers from leaking to the browser.  Or
> maybe you want to dynamically control the TTL?

You can use mod_accel to cache a flexible backend:
ftp://ftp.lexa.ru/pub/apache-rus/contrib/mod_accel-1.0.7.tar.gz

mod_accel understands the standard "Expires" and "Cache-Control" headers
and a special "X-Accel-Expires" header (which is not sent to the client).
It can also ignore the "Expires" and "Cache-Control" headers
from the backend and set the expiration with the AccelDefaultExpire directive.

Compared to mod_proxy, mod_accel reads the backend response
and closes the connection to the backend as soon as possible.
There is no 2-second backend lingering-close timeout
with big responses and slow clients. A big response means one bigger than
the frontend kernel's TCP send buffer: 16K on FreeBSD and 64K on Linux by default.
mod_accel also reads the whole POST body before connecting to the backend.

mod_accel can ignore the client's "Pragma: no-cache",
"Cache-Control: no-cache" and even "Authorization" headers.
mod_accel can be told not to pass some URLs to the backend.
mod_accel allows tuning of various buffer sizes and timeouts.
mod_accel can cache responses with cookie-dependent content.
mod_accel can use busy locks and can limit the number of connections to the backend.
mod_accel allows simple fault tolerance with DNS-balanced backends.
mod_accel logs various information about request processing.
mod_accel can invalidate the cache on a per-URL basis.

mod_accel has only two drawbacks: too much memory per connection
(inherited from Apache) and Russian-only documentation.

===

To: Paul Lindner <lindner@inuus.com>
From: Andrew Ho <andrew@tellme.com>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 12:55:25 -0800 (PST)

Hello,

PL>That's the price you pay for this functionality.  Because we use
PL>Apache's native file serving code we need a url->directory mapping
PL>somewhere.
PL>
PL>Of course you don't need to make the entire docroot writable, just the
PL>directory corresponding to your script.

Apologies if this is obvious--I haven't downloaded and tried this module
yet. But would it not be possible to specify a separate directory
altogether and make it serveable (<Directory ...> ... Allow from all ...)?
If so perhaps it'd be easy to add this as a configurable parameter.

In general it is a fine idea to not make the DocumentRoot writeable by the
web user. In fact, I believe it is a good policy that the web user should
be able to write only to a small subset of controlled locations.

===

To: Andrew Ho <andrew@tellme.com>
From: Paul Lindner <lindner@inuus.com>
Subject: Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler
Date: Thu, 6 Dec 2001 13:06:12 -0800

On Thu, Dec 06, 2001 at 12:55:25PM -0800, Andrew Ho wrote:
> Hello,
> 
> PL>That's the price you pay for this functionality.  Because we use
> PL>Apache's native file serving code we need a url->directory mapping
> PL>somewhere.
> PL>
> PL>Of course you don't need to make the entire docroot writable, just the
> PL>directory corresponding to your script.
> 
> Apologies if this is obvious--I haven't downloaded and tried this module
> yet. But would it not be possible to specify a separate directory
> altogether and make it serveable (<Directory ...> ... Allow from all ...)?
> If so perhaps it'd be easy to add this as a configurable parameter.

Yes, you can do this using the regular Apache directives:

# mkdir /var/cache/www/mydir
# chown apache /var/cache/www/mydir
# vi /etc/httpd/conf/httpd.conf
....

<Directory /var/cache/www/mydir>
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Alias /mydir/ /var/cache/www/mydir/

> In general it is a fine idea to not make the DocumentRoot writeable by the
> web user. In fact, I believe it is a good policy that the web user should
> be able to write only to a small subset of controlled locations.

Yes, I agree totally!  I'll add a warning to the docs to make sure
that people do not inadvertently misconfigure their servers.

===
the rest of The Pile (a partial mailing list archive)
doom@kzsu.stanford.edu