modperl_caching_static_dynamic_content

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



To: modperl@apache.org
From: Philip Mak <pmak@aaanime.net>
Subject: Dynamic content that is static
Date: Fri, 22 Dec 2000 21:08:48 -0500 (EST)

Hi everyone,

I have been going over the modperl tuning guide and the suggestions that
people on this list sent me earlier. I've reduced MaxClients to 33 (each
httpd process takes up 3-4% of my memory, so that's how much I can fit
without swapping) so if the web server overloads again, at least it won't
take the machine down with it.

Running a non-modperl apache that proxies to a modperl apache doesn't seem
like it would help much because the vast majority of pages served require
modperl.

I realized something, though: Although the pages on my site are
dynamically generated, they are really static. Their content doesn't
change unless I change the files on the website. (For example,
http://www.animewallpapers.com/wallpapers/ccs.htm depends on header.asp,
footer.asp, series.dat and index.inc. If none of those files change, the
content of ccs.htm remains the same.)

So, it would probably be more efficient if I had a /src directory and a
/html directory. The /src directory could contain my modperl files and a
Makefile that knows the dependencies; when I type "make", it will evaluate
the modperl files and parse them into plain HTML files in the /html
directory.

Does anyone have any suggestions on how to implement this? Is there an
existing tool for doing this? How can I evaluate modperl/Apache::ASP files
from the command line?
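
A minimal sketch of the make-style check asked about above, in Perl. It only answers "is this page out of date?" by comparing modification times. The sub name needs_rebuild is made up for illustration, and actually regenerating the page is left out.

```perl
use strict;
use warnings;

# Return true if $target is missing or older than any of its dependencies.
# Dies if a dependency cannot be found, much as make would complain.
sub needs_rebuild {
    my ($target, @deps) = @_;
    return 1 unless -e $target;                 # never built yet
    my $target_mtime = (stat($target))[9];      # element 9 is the mtime
    for my $dep (@deps) {
        my @st = stat($dep) or die "cannot stat $dep: $!\n";
        return 1 if $st[9] > $target_mtime;     # dependency is newer
    }
    return 0;
}
```

Called as needs_rebuild('html/ccs.htm', 'src/header.asp', 'src/footer.asp', 'src/series.dat', 'src/index.inc'), this would return true once any of the four source files has a later timestamp than the generated page. The html/ and src/ paths simply follow the directory layout proposed above.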

===

To: Philip Mak <pmak@aaanime.net>
From: "G.W. Haywood" <ged@www.jubileegroup.co.uk>
Subject: Re: Dynamic content that is static
Date: Sat, 23 Dec 2000 02:24:53 +0000 (GMT)

Hi there,

On Fri, 22 Dec 2000, Philip Mak wrote:

> I realized something, though: Although the pages on my site are
> dynamically generated, they are really static.

You're not alone.

> Does anyone have any suggestions on how to implement this? Is there an
> existing tool for doing this? How can I evaluate modperl/Apache::ASP files
> from the command line?

You could use 'lynx -source URI >filename'.

===
To: Philip Mak <pmak@aaanime.net>
From: Edward Moon <em@mooned.org>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 18:40:29 -0800 (PST)

Not necessarily.

You can use mod_proxy to cache the dynamically generated pages on the
lightweight apache.

Check out <http://perl.apache.org/guide/strategy.html#Apache_s_mod_proxy>
for details on what headers you'll need to set for caching to work.

===
To: Edward Moon <em@mooned.org>
From: Philip Mak <pmak@aaanime.net>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 21:45:40 -0500 (EST)

On Fri, 22 Dec 2000, Edward Moon wrote:

> > Running a non-modperl apache that proxies to a modperl apache doesn't seem
> > like it would help much because the vast majority of pages served require
> > modperl.
>
> Not necessarily.
> 
> You can use mod_proxy to cache the dynamically generated pages on the
> lightweight apache.

I thought about this... but I'm not sure how I would tell the lightweight
Apache to refresh its cache when a file gets changed. I suppose I could
graceful restart it, but the other webmasters of the site do not have root
access. (Or is there another way? Is it possible to teach Apache or Squid 
that ccs.htm depends on header.asp, footer.asp, series.dat and index.inc?)

Also, does this mess up the REMOTE_HOST variable, or is Apache smart
enough to replace that with X-Forwarded-For when the forwarded traffic is
being sent from a local privileged process?

===

To: <modperl@apache.org>
From: brian d foy <comdog@panix.com>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 21:51:55 -0500 (EST)

On Fri, 22 Dec 2000, Philip Mak wrote:

> So, it would probably be more efficient if I had a /src directory and a
> /html directory. The /src directory could contain my modperl files and a
> Makefile that knows the dependencies; when I type "make", it will evaluate
> the modperl files and parse them into plain HTML files in the /html
> directory.

i've had great success with squid in http accelerator mode.  we squeezed a
factor of 100 in speed with just that. :)

however, i have been talking to a few people about something like a
mod_makefile. :)

===
To: modperl@apache.org
From: "Dave Seidel" <dave@superluminal.com>
Subject: Re: Dynamic content that is static
Date: 22 Dec 2000 22:21:17 EST

>  I realized something, though: Although the pages on my site are
>  dynamically generated, they are really static. Their content doesn't
>  change unless I change the files on the website. (For example,
>  http://www.animewallpapers.com/wallpapers/ccs.htm depends on header.asp,
>  footer.asp, series.dat and index.inc. If none of those files change, the
>  content of ccs.htm remains the same.)

You might want to consider using a tool other than mod_perl in this situation. 
There are preprocessor/compiler-type tools such as htmlpp or WML (both written in
Perl), or you can build the pages in PHP (e.g.) and compile them into static
pages with the command-line version.  I've used both htmlpp and PHP, and both
work well, and I drive them both with make.

I don't know if either Mason or Embperl offer static compilation, but Mason has
caching and I believe that Embperl is getting caching.  AxKit is also very
cool, and caches.

===
To: Dave Seidel <dave@superluminal.com>
From: Dave Rolsky <autarch@urth.org>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 21:28:25 -0600 (CST)

On 22 Dec 2000, Dave Seidel wrote:

> I don't know if either Mason or Embperl offer static compilation, but Mason has
> caching and I believe that Embperl is getting caching.  AxKit is also very
> cool, and caches.

Using Mason to generate a set of HTML pages would not be too terribly
difficult.

If someone is interested in doing this and needs some guidance I'd be
happy to help.  It would be nice to include an offline site builder with
Mason (or as a separate project).

===
To: <modperl@apache.org>
From: "John Michael" <johnm@acadiacom.net>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 22:29:16 -0600

I may be out of my realm here.  I use mostly perl for everything and have
done similar things.  Create a directory tree with the source files.  In the
source files use something like %%INCLUDE_HEADER%% for each part of the page
that changes and have the script use flat text files for the build.  Have
the script traverse the tree of the source files writing the output to the
html directory.  Whenever you update the flat text files, just run the script
from the command line or write it to run from the web with a password.

Mason does something similar I believe.
You can even have the script write in the %%INCLUDES%% dynamically if you
take in the input and assign it like so.
$$Var = $value;  instead of  $input{'key'} = $value;
Then do the substitutions like so.

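# Symbolic references: each name in @variables must be a package
# variable (not a "my" variable), and "use strict 'refs'" must be off.
foreach (@variables){
     $template_txt =~ s/%%\Q$_\E%%/$$_/gi;
}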

Works great for me.  You can then make any change to the source.html page
and the flat text file without having to change the script.
John Michael
Not a mod perl solution, but it will work.
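
The substitution above relies on symbolic references, which only work on package variables and with strict refs disabled. A variation that keeps the values in a hash also works under "use strict". This is a sketch with a hypothetical fill_template sub, not the script described above.

```perl
use strict;
use warnings;

# Replace each %%NAME%% placeholder with the matching value from %$vars.
# Placeholder names are matched case-insensitively by upper-casing them.
# Unknown placeholders are left untouched so they are easy to spot.
sub fill_template {
    my ($text, $vars) = @_;
    $text =~ s{%%(\w+)%%}{
        my $key = uc $1;
        exists $vars->{$key} ? $vars->{$key} : "%%$1%%";
    }ge;
    return $text;
}

print fill_template("<title>%%TITLE%%</title>\n", { TITLE => 'Wallpapers' });
```

Doing all replacements in a single pass also means a value that happens to contain %%SOMETHING%% is not expanded a second time.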


===
To: Philip Mak <pmak@aaanime.net>
From: Edward Moon <em@mooned.org>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 22:06:54 -0800 (PST)

You should check out the documentation on mod_proxy to see what it's
capable of: <http://httpd.apache.org/docs/mod/mod_proxy.html>

You can specify expiration values and be assured that cached files older
than expiry will be deleted.

So, for example, if you know that your content gets updated every 48 hours
you can specify 'CacheMaxExpire 48' and force the proxy server to
retrieve a new copy every 48 hours.

You can also set headers within a dynamic document that specifies an
expiration time. Check out the link in my previous e-mail for more info.
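
As a rough illustration of setting an expiration time from a dynamic page, here is one way to build an Expires header value in Perl. The helper names http_date and expires_header are invented for this sketch, and the 48-hour lifetime simply mirrors the example above. How the header gets attached to the response depends on the framework in use.

```perl
use strict;
use warnings;

# Fixed English names, so the result does not depend on the locale.
my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date, always in GMT.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec);
}

# Build the header line for a page that may be cached for $hours hours.
sub expires_header {
    my ($now, $hours) = @_;
    return 'Expires: ' . http_date($now + $hours * 3600);
}

print expires_header(time, 48), "\n";
```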

===

To: modperl@apache.org
From: Bill Moseley <moseley@hank.org>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 23:41:42 -0800

At 09:08 PM 12/22/00 -0500, Philip Mak wrote:
>I realized something, though: Although the pages on my site are
>dynamically generated, they are really static. Their content doesn't
>change unless I change the files on the website.

This doesn't really help with your ASP files, but have you looked at ttree
in the Template Toolkit distribution?

The problem, AFAIK, is that ttree looks only at the top level
documents and not included templates.  I started to look at
Template::Provider to see if there was an easy way to write out dependency
information to a file, and then stat all those files every five minutes
from a cron job and if anything changes, touch the top level files and then
run ttree again.

I'd like this because I'm generating cobranded pages with mod_perl, and
many of the pages are really static content.
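
The cron-job idea above could be sketched roughly as follows. This assumes the dependency information has already been written out somehow, which is the open problem mentioned above. The sub stale_targets and both hash layouts are hypothetical and not part of the Template Toolkit. The // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# $deps  : top-level file => [ files it includes ]
# $mtime : file           => modification time (epoch seconds)
# Returns the top-level files with at least one include newer than themselves.
sub stale_targets {
    my ($deps, $mtime) = @_;
    my @stale;
    for my $target (sort keys %$deps) {
        my $t = $mtime->{$target} // 0;   # unknown files count as very old
        push @stale, $target
            if grep { ($mtime->{$_} // 0) > $t } @{ $deps->{$target} };
    }
    return @stale;
}
```

The returned names could then be passed to Perl's built-in utime (utime undef, undef, @stale sets both timestamps to the current time) before running ttree again.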


===

To: Philip Mak <pmak@aaanime.net>
From: Ask Bjoern Hansen <ask@valueclick.com>
Subject: Re: Dynamic content that is static
Date: Fri, 22 Dec 2000 23:44:56 -0800 (PST)

On Fri, 22 Dec 2000, Philip Mak wrote:

> > > Running a non-modperl apache that proxies to a modperl apache doesn't seem
> > > like it would help much because the vast majority of pages served require
> > > modperl.
> >
> > Not necessarily.
> > 
> > You can use mod_proxy to cache the dynamically generated pages on the
> > lightweight apache.
> 
> I thought about this... but I'm not sure how I would tell the lightweight
> Apache to refresh its cache when a file gets changed. I suppose I could
> graceful restart it, but the other webmasters of the site do not have root
> access. (Or is there another way? Is it possible to teach Apache or Squid 
> that ccs.htm depends on header.asp, footer.asp, series.dat and index.inc?)

I don't know with Apache::ASP, but it can probably be done. With
HTML::Embperl it would be pretty trivial.

You could probably get quite a gain with having a squid or a
mod_proxy process in front. Both because they would slurp up the
data and feed it to slow clients and because if you could cache the
documents for just a minute or so it might save quite some hits on
the backend.
 
> Also, does this mess up the REMOTE_HOST variable, or is Apache smart
> enough to replace that with X-Forwarded-For when the forwarded traffic is
> being sent from a local privileged process?

Have a look at ftp://ftp.netcetera.dk/pub/apache/mod_proxy_add_forward.c
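
Once the front-end proxy adds an X-Forwarded-For header, the back end still has to decide when to believe it. A minimal sketch of that decision follows. The sub client_ip is a hypothetical helper, and treating 127.0.0.1 as the only trusted proxy is an assumption that fits a front end and back end running on the same machine.

```perl
use strict;
use warnings;

# Proxies whose X-Forwarded-For header we are willing to believe.
# Assumption: the lightweight front end connects from localhost.
my %TRUSTED_PROXY = ('127.0.0.1' => 1);

# Return the address to treat as the client (IPv4 only in this sketch).
sub client_ip {
    my ($remote_addr, $forwarded_for) = @_;

    # Anyone can send this header, so ignore it unless the request
    # really arrived through our own proxy.
    return $remote_addr unless $TRUSTED_PROXY{$remote_addr};
    return $remote_addr unless defined($forwarded_for);

    # The header may hold a comma-separated chain; the last entry is
    # the one appended by the proxy that connected to us.
    my ($last) = $forwarded_for =~ /([^,\s]+)\s*$/;
    return $remote_addr
        unless defined($last) && $last =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;
    return $last;
}
```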


===

To: Philip Mak <pmak@aaanime.net>
From: Matt Sergeant <matt@sergeant.org>
Subject: Re: Dynamic content that is static
Date: Sat, 23 Dec 2000 09:32:56 +0000 (GMT)

On Fri, 22 Dec 2000, Philip Mak wrote:

> I realized something, though: Although the pages on my site are
> dynamically generated, they are really static. Their content doesn't
> change unless I change the files on the website. (For example,
> http://www.animewallpapers.com/wallpapers/ccs.htm depends on header.asp,
> footer.asp, series.dat and index.inc. If none of those files change, the
> content of ccs.htm remains the same.)

That's just the sort of layout AxKit is great for. It's basically how Take23
works - even though it looks like a dynamically generated site, it's not.
Everything just comes from static files, and when those files change,
AxKit recompiles the pages and caches the results (with some
intelligence associated with the parameters used to determine if this is
the same view of that URL).

===
To: brian d foy <comdog@panix.com>
From: barries <barries@slaysys.com>
Subject: Re: Dynamic content that is static
Date: Sat, 23 Dec 2000 07:57:07 -0500

On Fri, Dec 22, 2000 at 09:51:55PM -0500, brian d foy wrote:
> 
> however, i have been talking to a few people about something like a
> mod_makefile. :)

I've used this approach successfully on a lower volume site where it
was taking lots of time to build the final HTML but the data sources
didn't change much.  <plug>I have a module (Slay::Maker) I use for
exactly this purpose that takes a "makefile" written in Perl and I use
that to rebuild the pages, and if no page needs to be rebuilt, I can 304
the result</plug>.  A mod_makefile would be even nicer, being written in
C.

If you're looking for Perlish makes, Nick Ing-Simmons also has a
Make.pm, there's a Makepp project out there, and <plug>I have an unreleased
but releasable Make.pm that supports most of the GNU constructs</plug>.

Putting squid or something in front of a heavily trafficked site (and
remembering to flush all or part of its cache when you change the back
end) and using a makefile approach on the backend to avoid reading &
writing lots of rarely updated data every page view should both help.
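
The "304 the result" step can be sketched as a simple comparison. This assumes the If-Modified-Since header has already been parsed into epoch seconds (for example with str2time from HTTP::Date) and that the time of the page's last rebuild is known. The name response_status is made up.

```perl
use strict;
use warnings;

# Decide between a full response (200) and "Not Modified" (304).
# $last_built        : epoch time the page was last rebuilt
# $if_modified_since : epoch time from the request header, or undef
sub response_status {
    my ($last_built, $if_modified_since) = @_;
    return 200 unless defined $if_modified_since;   # unconditional request
    return $last_built <= $if_modified_since ? 304 : 200;
}
```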

===

To: Philip Mak <pmak@aaanime.net>, modperl@apache.org
From: Joshua Chamas <ayafm@yahoo.com>
Subject: Re: Dynamic content that is static
Date: Mon, 25 Dec 2000 16:36:47 -0800 (PST)

Apache::ASP has a cgi/asp script in the distribution that 
I use to generate apache-asp.org and chamas.com.  It's 
a bit rough but works for static HTML generation from
ASP scripts.  Also you can consider using a combination
of mod_proxy and specific headers like Expires to cache
your content for X time like 1 hour.  

I have never done the latter, and you should use 
lwp-request -ed to see what headers your site is really 
sending, as ASP sends a no-cache pragma by default which 
you'll need to override, I forget which header at this time.  
The proxy method should work generally in this case, though
I like generating content that can be offline.

===

To: "'Philip Mak'" <pmak@aaanime.net>, <modperl@apache.org>
From: "Christian Gilmore" <cgilmore@tivoli.com>
Subject: RE: Dynamic content that is static
Date: Thu, 28 Dec 2000 13:09:50 -0600

You might want to take a look at Strudel. It is a project people from my
last job were working on: http://www.research.att.com/~mff/strudel/.

===

To: modperl@apache.org
From: Elman Vagif Abdullaev <eva2e@neon.mail.virginia.edu>
Subject: Re: dynamic cache allocation
Date: Tue, 9 Jan 2001 20:04:09 -0500 (EST)

Well, right now proxy caching has static cache allocation, i.e. it caches
whatever the last request was asking for. But what I mean by dynamic
caching is to have some sort of script or module that will check if the
data stored on the proxy server is outdated. It will check if the data in
the database that was used to generate the response to some query was
changed and therefore data in proxy is not valid anymore. If it was
changed then we again go to database and generate the response. If the
database was not changed we can just give whatever was stored in the
database to the requesting user.
I am just wondering if something like that was done already since it would
speed up the internet.

===

To: Elman Vagif Abdullaev <eva2e@neon.mail.virginia.edu>
From: Perrin Harkins <perrin@primenet.com>
Subject: Re: dynamic cache allocation
Date: Wed, 10 Jan 2001 15:00:53 -0800 (PST)

On Tue, 9 Jan 2001, Elman Vagif Abdullaev wrote:
> dynamic caching is to have some sort of script or module that will
> check if the data stored on the proxy server is outdated. It will
> check if the data in the database that was used to generate the
> response to some query was changed and therefore data in proxy is not
> valid anymore. If it was changed then we again go to database and
> generate the response.

That only makes sense if you can test whether or not the data was changed
much more quickly than you can actually re-generate the page.  In most
cases, the point of the cache is to avoid going to the database in the
first place.  A better idea might be to have a background process that
checks your database every few minutes to determine what has changed and
deletes anything in your proxy cache that contains that data.  Users may
get slightly out of date data using this method, but it's
much faster than going to the database every time.

By far the simplest approach is to just set a time-to-live (TTL) on each
page using Expires headers and let mod_proxy or Squid worry about removing
the page after that.  But the approach you choose must be appropriate to
the type of data you have and how often it changes.

===

