This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
Date: Thu, 21 Dec 2000 14:21:10 +0800 To: mod_perl list <modperl@apache.org> From: Gunther Birznieks <gunther@extropia.com> Subject: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory FYI -- Sam just posted this to the speedycgi list just now. >X-Authentication-Warning: www.newlug.org: majordom set sender to >owner-speedycgi@newlug.org using -f >To: speedycgi@newlug.org >Subject: [speedycgi] Speedycgi scales better than mod_perl with scripts >that contain un-shared memory >Date: Wed, 20 Dec 2000 20:18:37 -0800 >From: Sam Horrocks <sam@daemoninc.com> >Sender: owner-speedycgi@newlug.org >Reply-To: speedycgi@newlug.org > >Just a point in speedy's favor, for anyone interested in performance tuning >and scalability. > >A lot of mod_perl performance tuning involves trying to keep from creating >"un-shared" memory - that is, memory that a script uses while handling >a request that is private to that script. All perl scripts use some >amount of un-shared memory - anything derived from user input to the >script via queries or posts, for example, has to be un-shared because it >is unique to that run of that script. > >You can read all about mod_perl shared memory issues at: > > http://perl.apache.org/guide/performance.html#Sharing_Memory > >The underlying problem in mod_perl is that apache likes to spread out >web requests to as many httpd's, and therefore as many mod_perl interpreters, >as possible, using an LRU selection process for picking httpd's. For >static web-pages where there is almost zero un-shared memory, the selection >process doesn't matter much. But when you load in a perl script with >un-shared memory, it can really bog down the server. > >In SpeedyCGI's case, all perl memory is un-shared because there's no >parent to pre-load any of the perl code into memory. 
It could benefit >somewhat from reducing this amount of un-shared memory if it had such >a feature, but the fact that SpeedyCGI chooses backends using an MRU >selection process means that it is much less prone to problems that >un-shared memory can cause. > >I wanted to see how this played out in real benchmarks, so I wrote the >following test script that uses un-shared memory: > >use CGI; >$x = 'x' x 50000; # Use some un-shared memory (*not* a memory leak) >my $cgi = CGI->new(); >print $cgi->header(); >print "Hello "; >print "World"; > >I then ran ab to benchmark how well mod_speedycgi did versus mod_perl >on this script. When using no concurrency ("ab -c 1 -n 10000"), >mod_speedycgi and mod_perl come out about the same. However, by >increasing the concurrency level, I found that mod_perl performance drops >off drastically, while mod_speedycgi's does not. In my case, at about level >100, the rps number drops by 50% and the system starts paging to disk >while using mod_perl, whereas the mod_speedycgi numbers stay at about >the same level. > >The problem is that at a high concurrency level, mod_perl is using lots >and lots of different perl-interpreters to handle the requests, each >with its own un-shared memory. It's doing this due to its LRU design. >But with SpeedyCGI's MRU design, only a few speedy_backends are being used, >because as much as possible it tries to use the same interpreter over and >over and not spread out the requests to lots of different interpreters. >Mod_perl is using lots of perl-interpreters, while speedycgi is only using >a few. mod_perl is requiring that lots of interpreters be in memory in >order to handle the requests, whereas speedy only requires a small number >of interpreters to be in memory. And this is where the paging comes in - >at a high enough concurrency level, mod_perl starts using lots of memory >to hold all of those interpreters, eventually running out of real memory, >and at that point it has to start paging. 
And when the paging starts, >the performance really nose-dives. > >With SpeedyCGI, at the same concurrency level, the total memory >requirements for all the interpreters are much, much smaller. Eventually, >under a large enough load and with enough un-shared memory, SpeedyCGI >would probably have to start paging too. But due to its design, the point >at which SpeedyCGI will start doing this is at a much higher level than >with mod_perl. === Date: Thu, 21 Dec 2000 02:01:48 -0600 Message-ID: <20001221020150-r01010600-10f24e81@10.0.0.2> From: "Ken Williams" <ken@forum.swarthmore.edu> To: "mod_perl list" <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Well then, why doesn't somebody just make an Apache directive to control how hits are divvied out to the children? Something like NextChild most-recent NextChild least-recent NextChild (blah...) but more well-considered in name. Not sure whether a config directive would do it, or whether it would have to be a startup command-line switch. Or maybe a directive that can only happen in a startup config file, not a .htaccess file. === Date: Thu, 21 Dec 2000 00:41:18 -0800 From: Perrin Harkins <perrin@primenet.com> Reply-To: perrin@primenet.com To: Gunther Birznieks <gunther@extropia.com>, Sam Horrocks <sam@daemoninc.com> CC: mod_perl list <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Gunther Birznieks wrote: > Sam just posted this to the speedycgi list just now. [...] > >The underlying problem in mod_perl is that apache likes to spread out > >web requests to as many httpd's, and therefore as many mod_perl interpreters, > >as possible using an LRU selection process for picking httpd's. Hmmm... this doesn't sound right. 
I've never looked at the code in Apache that does this selection, but I was under the impression that the choice of which process would handle each request was an OS-dependent thing, based on some sort of mutex. Take a look at this: http://httpd.apache.org/docs/misc/perf-tuning.html Doesn't that appear to be saying that whichever process gets into the mutex first will get the new request? In my experience running development servers on Linux, it always seemed as if the requests would continue going to the same process until a request came in when that process was already busy. As I understand it, the implementation of "wake-one" scheduling in the 2.4 Linux kernel may affect this as well. It may then be possible to skip the mutex and use unserialized accept for single socket servers, which will definitely hand process selection over to the kernel. > >The problem is that at a high concurrency level, mod_perl is using lots > >and lots of different perl-interpreters to handle the requests, each > >with its own un-shared memory. It's doing this due to its LRU design. > >But with SpeedyCGI's MRU design, only a few speedy_backends are being used > >because as much as possible it tries to use the same interpreter over and > >over and not spread out the requests to lots of different interpreters. > >Mod_perl is using lots of perl-interpreters, while speedycgi is only using > >a few. mod_perl is requiring that lots of interpreters be in memory in > >order to handle the requests, whereas speedy only requires a small number > >of interpreters to be in memory. This test - building up unshared memory in each process - is somewhat suspect, since in most setups I've seen, there is a very significant amount of memory being shared between mod_perl processes. Regardless, the explanation here doesn't make sense to me. 
If we assume that each approach is equally fast (as Sam seems to say earlier in his message) then it should take an equal number of speedycgi and mod_perl processes to handle the same concurrency. That leads me to believe that what's really happening here is that Apache is pre-forking a bit over-zealously in response to a sudden surge of traffic from ab, and thus has extra unused processes sitting around waiting, while speedycgi is avoiding this situation by waiting for someone to try and use the processes before forking them (i.e. no pre-forking). The speedycgi way causes a brief delay while new processes fork, but doesn't waste memory. Does this sound like a plausible explanation to folks? This is probably all a moot point on a server with a properly set MaxClients and Apache::SizeLimit that will not go into swap. I would expect mod_perl to have the advantage when all processes are fully-utilized because of the shared memory. It would be cool if speedycgi could somehow use a parent process model and get the shared memory benefits too. Speedy seems like it might be more attractive to ISPs, and it would be nice to increase interoperability between the two projects. === Date: Thu, 21 Dec 2000 08:40:47 +0000 (GMT) From: Matt Sergeant <matt@sergeant.org> To: Ken Williams <ken@forum.swarthmore.edu> cc: mod_perl list <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory On Thu, 21 Dec 2000, Ken Williams wrote: > Well then, why doesn't somebody just make an Apache directive to control how > hits are divvied out to the children? Something like > > NextChild most-recent > NextChild least-recent > NextChild (blah...) > > but more well-considered in name. Not sure whether a config directive > would do it, or whether it would have to be a startup command-line > switch. Or maybe a directive that can only happen in a startup config > file, not a .htaccess file. 
Probably nobody wants to do it because Apache 2.0 fixes this "bug". === Date: Thu, 21 Dec 2000 19:38:45 +0800 To: Sam Horrocks <sam@daemoninc.com>, perrin@primenet.com From: Gunther Birznieks <gunther@extropia.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Cc: mod_perl list <modperl@apache.org> I think you could actually make speedycgi even better for shared memory usage by creating a special directive which would indicate to speedycgi to preload a series of modules, and then to tell SpeedyCGI to fork that "master" backend process with the preloaded modules and hand control over to the forked process whenever you need to launch a new process. Then speedy would potentially have the best of both worlds. Sorry I cross-posted your thing. But I do think it is a problem of mod_perl also, and I am happily using speedycgi in production on at least one commercial site where mod_perl could not be installed so easily because of infrastructure issues. I believe your mechanism of round-robining among MRU perl interpreters is actually also accomplished by ActiveState's PerlEx (based on Apache::Registry but using multithreaded IIS and a pool of interpreters). A method similar to this will be used in Apache 2.0, when Apache is multithreaded and therefore can control within program logic which Perl interpreter gets called from a pool of Perl interpreters. It just isn't so feasible right now in Apache 1.x to do this. And sometimes people forget that mod_perl came about primarily for writing handlers in Perl, not as an application environment, although it is very good for the latter as well. I think SpeedyCGI needs more advocacy from the mod_perl group because, put simply, speedycgi is way easier to set up and use than mod_perl and will likely get more PHP people using Perl again. 
If more people rely on Perl for their fast websites, then you will get more people looking for more power, and by extension more people using mod_perl. Whoops... here we go with the advocacy thing again. === To: modperl@apache.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory From: Joe Schaefer <joe@sunstarsys.com> Date: 21 Dec 2000 09:53:06 -0500 [ Sorry for accidentally spamming people on the list. I was ticked off by this "benchmark", and accidentally forgot to clean up the reply names. I won't let it happen again :( ] Matt Sergeant <matt@sergeant.org> writes: > On Thu, 21 Dec 2000, Ken Williams wrote: > > > Well then, why doesn't somebody just make an Apache directive to control how > > hits are divvied out to the children? Something like > > > > NextChild most-recent > > NextChild least-recent > > NextChild (blah...) > > > > but more well-considered in name. Not sure whether a config directive > > would do it, or whether it would have to be a startup command-line > > switch. Or maybe a directive that can only happen in a startup config > > file, not a .htaccess file. > > Probably nobody wants to do it because Apache 2.0 fixes this "bug". > KeepAlive On :) All kidding aside, the problem with modperl is memory consumption, and to use modperl seriously, you currently have to code around that (preloading commonly used modules like CGI, or running it in a frontend/backend config similar to FastCGI.) FastCGI and modperl are fundamentally different technologies. Both have the ability to accelerate CGI scripts; however, modperl can do quite a bit more than that. Claimed benchmarks that are designed to exploit this memory issue are quite silly, especially when the actual results are never revealed. It's overzealous advocacy or FUD, depending on which side of the fence you are sitting on. 
=== To: Gunther Birznieks <gunther@extropia.com> Cc: modperl@apache.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory From: Joe Schaefer <joe@sunstarsys.com> Date: 21 Dec 2000 10:37:28 -0500 Gunther Birznieks <gunther@extropia.com> writes: > But instead he crafted an experiment to show that in this particular case > (and some applications do satisfy this case) SpeedyCGI has a particular > benefit. And what do I have to do to repeat it? Unlearn everything in Stas' guide? > > This is why people use different tools for different jobs -- because > architecturally they are designed for different things. SpeedyCGI is > designed in a different way from mod_perl. What I believe Sam is saying is > that there is a particular real-world scenario where SpeedyCGI likely has > better performance benefits to mod_perl. Sure, and that's why some people use it. But to say "Speedycgi scales better than mod_perl with scripts that contain un-shared memory" is to me quite similar to saying "SUV's are better than cars since they're safer to drive drunk in." > > Discouraging the posting of experimental information like this is where the > FUD will lie. This isn't an advertisement in ComputerWorld by Microsoft or > Oracle, it's a posting on a mailing list. Open for discussion. Maybe I'm wrong about this, but I didn't see any mention of the apparatus used in his experiment. I only saw what you posted, and your post had only anecdotal remarks of results without detailing any config info. I'm all for free and open discussions because they can point to interesting new ideas. However, some attempt at full disclosure (comments on the config used are as important as anecdotal remarks about the results) is necessary so objective opinions can be formed. 
=== Date: Thu, 21 Dec 2000 23:24:43 +0800 To: Joe Schaefer <joe@sunstarsys.com>, modperl@apache.org From: Gunther Birznieks <gunther@extropia.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory At 09:53 AM 12/21/00 -0500, Joe Schaefer wrote: >[ Sorry for accidentally spamming people on the > list. I was ticked off by this "benchmark", > and accidentally forgot to clean up the reply > names. I won't let it happen again :( ] Not sure what you mean here. Some people like the duplicate reply names especially as the mod_perl list is still a bit slow on responding. I know I prefer to see replies to my messages ASAP and they tend to come faster if I am CCed on the list. >All kidding aside, the problem with modperl is memory consumption, >and to use modperl seriously, you currently have to code around >that (preloading commonly used modules like CGI, or running it in >a frontend/backend config similar to FastCGI.) FastCGI and modperl >are fundamentally different technologies. Both have the ability >to accelerate CGI scripts; however, modperl can do quite a bit >more than that. > >Claimed benchmarks that are designed to exploit this memory issue >are quite silly, especially when the actual results are never >revealed. It's overzealous advocacy or FUD, depending on which >side of the fence you are sitting on. I think I get your point on the first paragraph. But the 2nd paragraph is odd. Are you classifying the original post as being overzealous advocacy or FUD? I don't think I would classify it as such. I could see it bordering on FUD if there was one benchmark which Sam produced and he just posted "SpeedyCGI is faster than mod_perl" without providing any details. But instead he crafted an experiment to show that in this particular case (and some applications do satisfy this case) SpeedyCGI has a particular benefit. 
This is why people use different tools for different jobs -- because architecturally they are designed for different things. SpeedyCGI is designed in a different way from mod_perl. What I believe Sam is saying is that there is a particular real-world scenario where SpeedyCGI likely has better performance than mod_perl. Discouraging the posting of experimental information like this is where the FUD will lie. This isn't an advertisement in ComputerWorld by Microsoft or Oracle, it's a posting on a mailing list. Open for discussion. === Date: Thu, 21 Dec 2000 11:11:03 -0500 To: "mod_perl list" <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory >>>>> "KW" == Ken Williams <ken@forum.swarthmore.edu> writes: KW> Well then, why doesn't somebody just make an Apache directive to KW> control how hits are divvied out to the children? Something like If memory serves, mod_perl 2.0 uses a most-recently-used strategy to pull perl interpreters from the thread pool. It sounds to me like with apache 2.0 in thread mode and mod_perl 2.0 you get the same effect of using the proxy front end that we currently need. === Date: Thu, 21 Dec 2000 11:06:54 -0600 From: "Keith G. Murphy" <keithmur@mindspring.com> To: mod_perl list <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Perrin Harkins wrote: > [cut] > > Doesn't that appear to be saying that whichever process gets into the > mutex first will get the new request? In my experience running > development servers on Linux it always seemed as if the requests > would continue going to the same process until a request came in when > that process was already busy. > Is it possible that the persistent connections utilized by HTTP 1.1 just made it look that way? Would happen if the clients were MSIE. Even recent Netscape browsers only use 1.0, IIRC. 
(I was recently perplexed by differing performance between MSIE and NS browsers hitting my system until I realized this.) === To: perrin@primenet.com cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org> cc: speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Date: Thu, 21 Dec 2000 02:50:28 -0800 > Gunther Birznieks wrote: > > Sam just posted this to the speedycgi list just now. > [...] > > >The underlying problem in mod_perl is that apache likes to spread out > > >web requests to as many httpd's, and therefore as many mod_perl interpreters, > > >as possible using an LRU selection process for picking httpd's. > > Hmmm... this doesn't sound right. I've never looked at the code in > Apache that does this selection, but I was under the impression that the > choice of which process would handle each request was an OS dependent > thing, based on some sort of mutex. > > Take a look at this: http://httpd.apache.org/docs/misc/perf-tuning.html > > Doesn't that appear to be saying that whichever process gets into the > mutex first will get the new request? I would agree that whichever process gets into the mutex first will get the new request. That's exactly the problem I'm describing. What you are describing here is first-in, first-out behaviour, which implies LRU behaviour. Processes 1, 2, 3 are running. 1 finishes and requests the mutex, then 2 finishes and requests the mutex, then 3 finishes and requests the mutex. So when the next three requests come in, they are handled in the same order: 1, then 2, then 3 - this is FIFO or LRU. This is bad for performance. > In my experience running > development servers on Linux it always seemed as if the requests > would continue going to the same process until a request came in when > that process was already busy. No, they don't. They go round-robin (or LRU as I say it). 
Try this simple test script: use CGI; my $cgi = CGI->new; print $cgi->header(); print "mypid=$$\n"; With mod_perl you constantly get different pids. With mod_speedycgi you usually get the same pid. This is a really good way to see the LRU/MRU difference that I'm talking about. Here's the problem - the mutex in apache is implemented using a lock on a file. It's left up to the kernel to decide which process to give that lock to. Now, if you're writing a unix kernel and implementing this file locking code, what implementation would you use? Well, this is a general-purpose thing - you have 100 or so processes all trying to acquire this file lock. You could give out the lock randomly or in some ordered fashion. If I were writing the kernel, I would give it out in a round-robin fashion (or to the least-recently-used process, as I referred to it before). Why? Because otherwise one of those processes may starve waiting for this lock - it may never get the lock unless you do it in a fair (round-robin) manner. The kernel doesn't know that all these httpd's are exactly the same. The kernel is implementing a general-purpose file-locking scheme and it doesn't know whether one process is more important than another. If it's not fair about giving out the lock, a very important process might starve. Take a look at fs/locks.c (I'm looking at linux 2.3.46). In there is the comment: /* Insert waiter into blocker's block list. * We use a circular list so that processes can be easily woken up in * the order they blocked. The documentation doesn't require this but * it seems like the reasonable thing to do. */ static void locks_insert_block(struct file_lock *blocker, struct file_lock *waiter) > As I understand it, the implementation of "wake-one" scheduling in the > 2.4 Linux kernel may affect this as well. It may then be possible to > skip the mutex and use unserialized accept for single socket servers, > which will definitely hand process selection over to the kernel. 
If the kernel implemented the queueing for multiple accepts using a LIFO instead of a FIFO, and apache used this method instead of file locks, then that would probably solve it. Just found this on the net on this subject: http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0455.html http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0453.html > > >The problem is that at a high concurrency level, mod_perl is using lots > > >and lots of different perl-interpreters to handle the requests, each > > >with its own un-shared memory. It's doing this due to its LRU design. > > >But with SpeedyCGI's MRU design, only a few speedy_backends are being used > > >because as much as possible it tries to use the same interpreter over and > > >over and not spread out the requests to lots of different interpreters. > > >Mod_perl is using lots of perl-interpreters, while speedycgi is only using > > >a few. mod_perl is requiring that lots of interpreters be in memory in > > >order to handle the requests, whereas speedy only requires a small number > > >of interpreters to be in memory. > > This test - building up unshared memory in each process - is somewhat > suspect since in most setups I've seen, there is a very significant > amount of memory being shared between mod_perl processes. My message and testing concern un-shared memory only. If all of your memory is shared, then there shouldn't be a problem. But a point I'm making is that with mod_perl you have to go to great lengths to write your code so as to avoid unshared memory. My claim is that with mod_speedycgi you don't have to concern yourself as much with this. You can concentrate more on the application and less on performance tuning. > Regardless, > the explanation here doesn't make sense to me. If we assume that each > approach is equally fast (as Sam seems to say earlier in his message) > then it should take an equal number of speedycgi and mod_perl processes > to handle the same concurrency. 
I don't assume that each approach is equally fast under all loads. They were about the same at concurrency level 1, but at higher concurrency levels they weren't. I am saying that since SpeedyCGI uses MRU to allocate requests to perl interpreters, it winds up using a lot fewer interpreters to handle the same number of requests. On a single-CPU system, of course, at some point all the concurrency has to be serialized. mod_speedycgi and mod_perl take different approaches before getting to that point. mod_speedycgi tries to use as small a number of unix processes as possible, while mod_perl tries to use a very large number of unix processes. > That leads me to believe that what's really happening here is that > Apache is pre-forking a bit over-zealously in response to a sudden surge > of traffic from ab, and thus has extra unused processes sitting around > waiting, while speedycgi is avoiding this situation by waiting for > someone to try and use the processes before forking them (i.e. no > pre-forking). The speedycgi way causes a brief delay while new > processes fork, but doesn't waste memory. Does this sound like a > plausible explanation to folks? I don't think it's pre-forking. When I ran my tests I would always run them twice, and take the results from the second run. The first run was just to "prime the pump". I tried reducing MinSpareServers, and this did help mod_perl get a higher concurrency number, but it would still run into a wall where speedycgi would not. > This is probably all a moot point on a server with a properly set > MaxClients and Apache::SizeLimit that will not go into swap. Please let me know what you think I should change. So far my benchmarks only show one trend, but if you can tell me specifically what I'm doing wrong (and it's something reasonable), I'll try it. I don't think SizeLimit is the answer - my process isn't growing. It's using the same 50k of un-shared memory over and over. 
I believe that with speedycgi you don't have to lower the MaxClients setting, because it's able to handle a larger number of clients, at least in this test. In other words, if with mod_perl you had to turn away requests, but with mod_speedycgi you did not, that would just prove that speedycgi is more scalable. Now you could tell me "don't use unshared memory", but that's outside the bounds of the test. The whole test concerns unshared memory. > I would > expect mod_perl to have the advantage when all processes are > fully-utilized because of the shared memory. Maybe. There must be a benchmark somewhere that would show off mod_perl's advantages in shared memory. Maybe a 100,000-line perl program or something like that - it would have to be something where mod_perl is using *lots* of shared memory, because keep in mind that there are still going to be a whole lot fewer SpeedyCGI processes than there are mod_perl processes, so you would really have to go overboard in the shared-memory department. > It would be cool if speedycgi could somehow use a parent process > model and get the shared memory benefits too. > Speedy seems like it > might be more attractive to > ISPs, and it would be nice to increase > interoperability between the two > projects. Thanks. And please, I'm not trying to start a speedy vs mod_perl war. My original message was only to the speedycgi list, but now that it's on mod_perl I think I have to reply there too. But, there is a need for a little good PR on speedycgi's side, and I was looking for that. I would rather just see mod_perl fixed if that's possible. But the last time I brought up this issue (maybe a year ago) I was unable to convince the people on the mod_perl list that this problem even existed. 
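Sam's LRU-versus-MRU argument can be reproduced in a toy simulation. The sketch below (Python, purely illustrative; the worker count, request count, and overlap parameter are invented, not measurements from these tests) hands each request either to the longest-idle worker (FIFO, roughly Apache's behaviour as described above) or to the most-recently-freed one (LIFO, SpeedyCGI's MRU), and counts how many distinct workers ever get touched - i.e. how many copies of un-shared memory would exist:

```python
# Toy model of idle-worker selection.  LRU takes the longest-idle
# worker (front of the queue); MRU takes the most-recently-freed
# worker (back of the queue, i.e. a stack).
from collections import deque

def simulate(policy, n_workers=100, n_requests=10000, overlap=5):
    """Return how many distinct workers ever handle a request.

    `overlap` crudely models concurrency: a worker returns to the
    idle pool `overlap` requests after it was picked.
    """
    idle = deque(range(n_workers))
    busy = deque()          # (release_time, worker), in completion order
    used = set()
    for t in range(n_requests):
        while busy and busy[0][0] <= t:
            idle.append(busy.popleft()[1])   # worker becomes idle again
        if not idle:
            continue                         # all workers busy; skip
        w = idle.popleft() if policy == "lru" else idle.pop()
        used.add(w)
        busy.append((t + overlap, w))
    return len(used)

print("LRU distinct workers:", simulate("lru"))  # every worker cycles in
print("MRU distinct workers:", simulate("mru"))  # only a hot handful
```

Under this model LRU eventually cycles every configured worker into service, while MRU keeps reusing a set about the size of the actual concurrency - which is the claimed source of the memory and paging difference.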
=== Date: Thu, 21 Dec 2000 21:16:08 +0100 (CET) From: Stas Bekman <stas@stason.org> To: Sam Horrocks <sam@daemoninc.com> Cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Folks, your discussion is not short of wrong statements that can be easily proved, but I don't find it useful. Instead please read: http://perl.apache.org/~dougm/modperl_2.0.html#new To quote the most relevant part: "With 2.0, mod_perl has much better control over which PerlInterpreters are used for incoming requests. The interpreters are stored in two linked lists, one for available interpreters, one for busy. When needed to handle a request, one is taken from the head of the available list and put back into the head of the list when done. This means if you have, say, 10 interpreters configured to be cloned at startup time, but no more than 5 are ever used concurrently, those 5 continue to reuse Perl's allocations, while the other 5 remain much smaller, but ready to go if the need arises." Of course you should read the rest. So the moment mod_perl 2.0 hits the shelves, this possible benefit of speedycgi over mod_perl becomes irrelevant. I think this more or less summarizes this thread. And Gunther, nobody is trying to stop people from expressing their opinions here; it's just that different people express their feelings in different ways - that's the way the open list goes... :) so please keep on forwarding things that you find interesting. I don't think anybody here is relieved when you are busy and not posting, as you seem to suggest -- I believe that your posts are very interesting and you shouldn't discourage yourself from keeping on doing that. Those who don't like your posts don't have to read them. Hope you are all having fun and getting ready for the holidays :) I'm going to buy my ski equipment soonish! 
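The pool behaviour that quote describes - take an interpreter from the head of the available list, put it back at the head when done - is simply a stack, which is exactly MRU selection. A minimal sketch of the idea (mine, not mod_perl 2.0's actual C implementation; the class and method names are invented for illustration):

```python
# Stack-style interpreter pool: head-in/head-out reproduces the MRU
# behaviour quoted from the mod_perl 2.0 design notes.
class InterpreterPool:
    def __init__(self, n):
        # index 0 plays the role of the "head" of the available list
        self.available = list(range(n))
        self.busy = set()

    def take(self):
        interp = self.available.pop(0)    # take from the head
        self.busy.add(interp)
        return interp

    def release(self, interp):
        self.busy.discard(interp)
        self.available.insert(0, interp)  # put back at the head

pool = InterpreterPool(10)
# Requests that never overlap all land on interpreter 0:
for _ in range(5):
    i = pool.take()
    pool.release(i)
print(i)  # 0
```

Because a released interpreter goes back on top, non-overlapping requests keep hitting the same hot interpreter while the rest stay cold - "much smaller, but ready to go if the need arises."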
=== To: Stas Bekman <stas@stason.org> cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Date: Thu, 21 Dec 2000 14:34:46 -0800 > Folks, your discussion is not short of wrong statements that can be easily > proved, but I don't find it useful. I don't follow. Are you saying that my conclusions are wrong, but you don't want to bother explaining why? Would you agree with the following statement? Under apache-1, speedycgi scales better than mod_perl with scripts that contain un-shared memory === Date: Fri, 22 Dec 2000 08:45:25 +0800 To: Stas Bekman <stas@stason.org>, Sam Horrocks <sam@daemoninc.com> From: Gunther Birznieks <gunther@extropia.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Cc: mod_perl list <modperl@apache.org>, speedycgi@newlug.org At 09:16 PM 12/21/00 +0100, Stas Bekman wrote: [much removed] >So the moment mod_perl 2.0 hits the shelves, this possible benefit >of speedycgi over mod_perl becomes irrelevant. I think this more or less >summarizes this thread. I think you are right about the summarization. However, I also think it's unfair for people here to pin too many hopes on mod_perl 2.0. First Apache 2.0 has to be fully released. It's still in Alpha! Then, mod_perl 2.0 has to be released. I haven't seen any realistic timelines that indicate to me that these will be released and stable for production use in only a few months' time. And Apache 2.0 has been worked on for years. I first saw a talk on Apache 2.0's architecture at the first ApacheCon 2 years ago! To be fair, back then they were using Mozilla's NPR, which I think they learned from, threw away, and rewrote from scratch after all (to become APR). But still, the point is that it's been a long time and probably will be a while yet. 
Who in their right mind would pin their business or production database on the hope that mod_perl 2.0 comes out in a few months? I don't think anyone would. Sam has a solution that works now, is open source, and provides benefits for some types of web applications that mod_perl and Apache are not as efficient at. As people interested in Perl, we should be embracing these alternatives, not telling people to wait for new versions of software that may not come out soon.

If there is a problem with mod_perl advocacy, it's that it is precisely too mod_perl centric. Mod_perl is a niche crowd with a high learning curve. I think the technology mod_perl offers is great, but as has been said before, the problem is that people are leaving Perl for PHP. If more people had easier ways to implement their simple apps in Perl yet be as fast as PHP, fewer people would go to PHP. Those Perl people would eventually discover mod_perl's power as they require it, and then they would take the step to "upgrade" to the power of handlers, away from the "missing link". But without that "missing link" to make things easy for people to move from PHP to Perl, Perl will miss something very crucial to maintaining its standing as the "de facto language for web applications". 3 years ago, I think it would be accurate to say Perl apps drove 95% of the dynamic web. Sadly, I believe (anecdotally) that this is no longer true.

SpeedyCGI is not "THE" missing link, but I see it as a crucial part of this link between newbies and mod_perl. This is why I believe that mod_perl and its documentation should have a section (even if tiny) on this stuff, so that people will know that if they find mod_perl too hard, there are alternatives that are less powerful, yet provide at least enough power to beat PHP. I also see SpeedyCGI as already being more ISP-friendly than mod_perl for hosting casual users of Perl.
Different apps use a different backend engine by default, so the problem of virtual hosts screwing each other over by accident is gone for the casual user. There is still some room for improvement (e.g. memory is likely still an issue with different backends)...

Anyway, these are just my feelings. I really shouldn't be spending time on posting this as I have some deadlines to meet. But I felt these were still important points to make that I think some people may be missing here. :)

===
Date: Fri, 22 Dec 2000 01:48:47 +0100 (CET)
From: Stas Bekman <stas@stason.org>
To: Sam Horrocks <sam@daemoninc.com>
Cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
In-Reply-To: <14214.977438086@daemonweb.daemoninc.com>

On Thu, 21 Dec 2000, Sam Horrocks wrote:

> > Folks, your discussion is not short of wrong statements that can be easily
> > proved, but I don't find it useful.
>
> I don't follow. Are you saying that my conclusions are wrong, but
> you don't want to bother explaining why?
>
> Would you agree with the following statement?
>
> Under apache-1, speedycgi scales better than mod_perl with
> scripts that contain un-shared memory

I don't know. It's easy to give a simple example and claim to be better. So far, whoever tried to show by benchmarks that he is better was most often proved wrong, since the technologies in question have so many features that I believe no benchmark will prove any of them absolutely superior or inferior. Therefore I said that trying to claim that your grass is greener is doomed to fail if someone has time on his hands to prove you wrong. Well, we don't have this time. Therefore I'm not trying to prove you wrong or right. Gunther's point in the original forward was to show things that mod_perl may need to adopt to make it better.
Doug already explained in his paper that the MRU approach has already been implemented in mod_perl-2.0. You can read about it in the link that I attached and the part that I quoted. So your conclusions about MRU are correct and we have it implemented already (well, very soon now :).

I apologize if my original reply was misleading. I'm not saying that benchmarks are bad. What I'm saying is that it's very hard to benchmark things which are different. You benefit the most from benchmarking when you take the initial code/product, benchmark it, then try to improve the code and benchmark again to see whether that gave you any improvement. That's the area where benchmarks rule and are fair, because you test the same thing. You can read more of my rambling about benchmarks in the guide.

So if you find some cool features in other technologies that mod_perl might adopt and benefit from, don't hesitate to tell the rest of the gang.

----

Something that I'd like to comment on: I find it a bad practice to quote one sentence from a person's post and follow up on it. Someone from the list sent me this email (SB> == me):

SB> I don't find it useful

and followed up. Why not use a single letter:

SB> I

and follow up? It's so much easier to flame on things taken out of their context. It has happened more than once that people did this to each other here on the list; I think I did too. So please be more careful when taking things out of context. Thanks a lot, folks!

===
To: Gunther Birznieks <gunther@extropia.com>
cc: speedycgi@newlug.org
cc: perrin@primenet.com, mod_perl list <modperl@apache.org>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Date: Thu, 21 Dec 2000 16:56:54 -0800

I've put your suggestion on the todo list. It certainly wouldn't hurt to have that feature, though I think memory sharing becomes a much, much smaller issue once you switch to MRU scheduling.
At the moment I think SpeedyCGI has more pressing needs though - for example multiple scripts in a single interpreter, and an NT port.

> I think you could actually make speedycgi even better for shared memory
> usage by creating a special directive which would indicate to speedycgi to
> preload a series of modules. And then to tell speedycgi to fork that
> "master" backend process with the preloaded modules and hand control over to
> that forked process whenever you need to launch a new process.
>
> Then speedy would potentially have the best of both worlds.
>
> Sorry I cross-posted your thing. But I do think it is a problem of mod_perl
> also, and I am happily using speedycgi in production on at least one
> commercial site where mod_perl could not be installed so easily because of
> infrastructure issues.
>
> I believe your mechanism of round-robining among MRU perl interpreters is
> actually also accomplished by ActiveState's PerlEx (based on
> Apache::Registry but using multithreaded IIS and a pool of interpreters). A
> method similar to this will be used in Apache 2.0 when Apache is
> multithreaded and therefore can control within program logic which Perl
> interpreter gets called from a pool of Perl interpreters.
>
> It just isn't so feasible right now in Apache 1.0 to do this. And sometimes
> people forget that mod_perl came about primarily for writing handlers in
> Perl, not as an application environment, although it is very good for the
> latter as well.
>
> I think SpeedyCGI needs more advocacy from the mod_perl group because, put
> simply, speedycgi is way easier to set up and use than mod_perl and will
> likely get more PHP people using Perl again. If more people rely on Perl
> for their fast websites, then you will get more people looking for more
> power, and by extension more people using mod_perl.
>
> Whoops... here we go with the advocacy thing again.
> Later,
> Gunther
>
> At 02:50 AM 12/21/2000 -0800, Sam Horrocks wrote:
> > > Gunther Birznieks wrote:
> > > > Sam just posted this to the speedycgi list just now.
> > [...]
> > > > > The underlying problem in mod_perl is that apache likes to spread out web requests to as many httpd's, and therefore as many mod_perl interpreters, as possible using an LRU selection process for picking httpd's.
> >
> > > Hmmm... this doesn't sound right. I've never looked at the code in Apache that does this selection, but I was under the impression that the choice of which process would handle each request was an OS dependent thing, based on some sort of mutex.
> > >
> > > Take a look at this: http://httpd.apache.org/docs/misc/perf-tuning.html
> > >
> > > Doesn't that appear to be saying that whichever process gets into the mutex first will get the new request?
> >
> > I would agree that whichever process gets into the mutex first will get the new request. That's exactly the problem I'm describing. What you are describing here is first-in, first-out behaviour, which implies LRU behaviour.
> >
> > Processes 1, 2, 3 are running. 1 finishes and requests the mutex, then 2 finishes and requests the mutex, then 3 finishes and requests the mutex. So when the next three requests come in, they are handled in the same order: 1, then 2, then 3 - this is FIFO or LRU. This is bad for performance.
> >
> > > In my experience running development servers on Linux it always seemed as if the requests would continue going to the same process until a request came in when that process was already busy.
> >
> > No, they don't. They go round-robin (or LRU as I say it).
> >
> > Try this simple test script:
> >
> > use CGI;
> > my $cgi = CGI->new;
> > print $cgi->header();
> > print "mypid=$$\n";
> >
> > With mod_perl you constantly get different pids. With mod_speedycgi you usually get the same pid.
> > This is a really good way to see the LRU/MRU difference that I'm talking about.
> >
> > Here's the problem - the mutex in apache is implemented using a lock on a file. It's left up to the kernel to decide which process to give that lock to.
> >
> > Now, if you're writing a unix kernel and implementing this file locking code, what implementation would you use? Well, this is a general purpose thing - you have 100 or so processes all trying to acquire this file lock. You could give out the lock randomly or in some ordered fashion. If I were writing the kernel I would give it out in a round-robin fashion (or to the least-recently-used process, as I referred to it before). Why? Because otherwise one of those processes may starve waiting for this lock - it may never get the lock unless you do it in a fair (round-robin) manner.
> >
> > The kernel doesn't know that all these httpd's are exactly the same. The kernel is implementing a general-purpose file-locking scheme and it doesn't know whether one process is more important than another. If it's not fair about giving out the lock, a very important process might starve.
> >
> > Take a look at fs/locks.c (I'm looking at linux 2.3.46). In there is the comment:
> >
> > /* Insert waiter into blocker's block list.
> >  * We use a circular list so that processes can be easily woken up in
> >  * the order they blocked. The documentation doesn't require this but
> >  * it seems like the reasonable thing to do.
> >  */
> > static void locks_insert_block(struct file_lock *blocker, struct file_lock *waiter)
> >
> > > As I understand it, the implementation of "wake-one" scheduling in the 2.4 Linux kernel may affect this as well. It may then be possible to skip the mutex and use unserialized accept for single socket servers, which will definitely hand process selection over to the kernel.
> > If the kernel implemented the queueing for multiple accepts using a LIFO instead of a FIFO and apache used this method instead of file locks, then that would probably solve it.
> >
> > Just found this on the net on this subject:
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0455.html
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0453.html
> >
> > > > > The problem is that at a high concurrency level, mod_perl is using lots and lots of different perl-interpreters to handle the requests, each with its own un-shared memory. It's doing this due to its LRU design. But with SpeedyCGI's MRU design, only a few speedy_backends are being used because as much as possible it tries to use the same interpreter over and over and not spread out the requests to lots of different interpreters. Mod_perl is using lots of perl-interpreters, while speedycgi is only using a few. mod_perl is requiring that lots of interpreters be in memory in order to handle the requests, whereas speedy only requires a small number of interpreters to be in memory.
> >
> > > This test - building up unshared memory in each process - is somewhat suspect since in most setups I've seen, there is a very significant amount of memory being shared between mod_perl processes.
> >
> > My message and testing concern un-shared memory only. If all of your memory is shared, then there shouldn't be a problem.
> >
> > But a point I'm making is that with mod_perl you have to go to great lengths to write your code so as to avoid unshared memory. My claim is that with mod_speedycgi you don't have to concern yourself as much with this. You can concentrate more on the application and less on performance tuning.
> >
> > > Regardless, the explanation here doesn't make sense to me.
> > > If we assume that each approach is equally fast (as Sam seems to say earlier in his message) then it should take an equal number of speedycgi and mod_perl processes to handle the same concurrency.
> >
> > I don't assume that each approach is equally fast under all loads. They were about the same at concurrency level 1, but at higher concurrency levels they weren't.
> >
> > I am saying that since SpeedyCGI uses MRU to allocate requests to perl interpreters, it winds up using a lot fewer interpreters to handle the same number of requests.
> >
> > On a single-CPU system, of course, at some point all the concurrency has to be serialized. mod_speedycgi and mod_perl take different approaches before getting to that point. mod_speedycgi tries to use as small a number of unix processes as possible, while mod_perl tries to use a very large number of unix processes.
> >
> > > That leads me to believe that what's really happening here is that Apache is pre-forking a bit over-zealously in response to a sudden surge of traffic from ab, and thus has extra unused processes sitting around waiting, while speedycgi is avoiding this situation by waiting for someone to try and use the processes before forking them (i.e. no pre-forking). The speedycgi way causes a brief delay while new processes fork, but doesn't waste memory. Does this sound like a plausible explanation to folks?
> >
> > I don't think it's pre-forking. When I ran my tests I would always run them twice, and take the results from the second run. The first run was just to "prime the pump".
> >
> > I tried reducing MinSpareServers, and this did help mod_perl get a higher concurrency number, but it would still run into a wall where speedycgi would not.
> >
> > > This is probably all a moot point on a server with a properly set MaxClients and Apache::SizeLimit that will not go into swap.
> > Please let me know what you think I should change. So far my benchmarks only show one trend, but if you can tell me specifically what I'm doing wrong (and it's something reasonable), I'll try it.
> >
> > I don't think SizeLimit is the answer - my process isn't growing. It's using the same 50k of un-shared memory over and over.
> >
> > I believe that with speedycgi you don't have to lower the MaxClients setting, because it's able to handle a larger number of clients, at least in this test. In other words, if with mod_perl you had to turn away requests, but with mod_speedycgi you did not, that would just prove that speedycgi is more scalable.
> >
> > Now you could tell me "don't use unshared memory", but that's outside the bounds of the test. The whole test concerns unshared memory.
> >
> > > I would expect mod_perl to have the advantage when all processes are fully-utilized because of the shared memory.
> >
> > Maybe. There must be a benchmark somewhere that would show off mod_perl's advantages in shared memory. Maybe a 100,000 line perl program or something like that - it would have to be something where mod_perl is using *lots* of shared memory, because keep in mind that there are still going to be a whole lot fewer SpeedyCGI processes than there are mod_perl processes, so you would really have to go overboard in the shared-memory department.
> >
> > > It would be cool if speedycgi could somehow use a parent process model and get the shared memory benefits too. Speedy seems like it might be more attractive to ISPs, and it would be nice to increase interoperability between the two projects.
> >
> > Thanks. And please, I'm not trying to start a speedy vs mod_perl war. My original message was only to the speedycgi list, but now that it's on mod_perl I think I have to reply there too.
> > But, there is a need for a little good PR on speedycgi's side, and I was looking for that. I would rather just see mod_perl fixed if that's possible. But the last time I brought up this issue (maybe a year ago) I was unable to convince the people on the mod_perl list that this problem even existed.
> >
> > Sam

===
To: speedycgi@newlug.org
cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>
cc: Stas Bekman <stas@stason.org>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Date: Thu, 21 Dec 2000 17:26:39 -0800

I really wasn't trying to work backwards from a benchmark. It was more of an analysis of the design, and the benchmarks bore it out. It's sort of like coming up with a theory in science - if you can't get any experimental data to back up the theory, you're in big trouble. But if you can at least point out the existence of some experiments that are consistent with your theory, it means your theory may be true. The best would be to have other people do the same tests and see if they see the same trend. If no-one else sees this trend, then I'd really have to re-think my analysis.

Another way to look at it - as you say below, MRU is going to be in mod_perl-2.0. And what is the reason for that? If there were no performance difference between LRU and MRU, why would the author bother to switch to MRU? So, I'm saying there must be some benchmarks somewhere that point out this difference - if there weren't any real-world difference, why bother even implementing MRU? I claim that my benchmarks point out this difference between MRU and LRU, and that's why my benchmarks show better performance with speedycgi than with mod_perl.
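The LRU-versus-MRU effect Sam describes can be sanity-checked with a toy simulation (an assumed model, not either product's actual scheduling code): idle workers are picked FIFO (Apache's fair accept mutex, which Sam calls LRU) or LIFO (SpeedyCGI's MRU), and we count how many distinct workers ever run a request - each of which would end up carrying its own un-shared memory.

```python
from collections import deque

def simulate(policy, n_workers=10, n_rounds=1000, concurrency=3):
    """Return how many distinct workers ever handle a request.

    policy='lru': finished workers rejoin the *tail* of the idle queue
                  (FIFO, round-robin - the accept-mutex effect Sam describes).
    policy='mru': finished workers rejoin the *head* (LIFO - SpeedyCGI style).
    """
    idle = deque(range(n_workers))
    touched = set()
    for _ in range(n_rounds):
        busy = [idle.popleft() for _ in range(concurrency)]
        touched.update(busy)
        for w in busy:
            if policy == 'mru':
                idle.appendleft(w)   # most recently used is picked next
            else:
                idle.append(w)       # least recently used is picked next
    return len(touched)

print(simulate('lru'))  # LRU cycles through all 10 workers
print(simulate('mru'))  # MRU only ever touches 3
```

Under a load that rarely needs more than a few concurrent workers, LRU spreads requests over every worker while MRU concentrates them on a few - which is the memory-footprint difference the benchmarks in this thread are arguing about.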
===
From: Perrin Harkins <perrin@primenet.com>
To: Sam Horrocks <sam@daemoninc.com>
Cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org
Date: Thu, 21 Dec 2000 17:38:37 -0800 (PST)

Hi Sam,

> Processes 1, 2, 3 are running. 1 finishes and requests the mutex, then
> 2 finishes and requests the mutex, then 3 finishes and requests the mutex.
> So when the next three requests come in, they are handled in the same order:
> 1, then 2, then 3 - this is FIFO or LRU. This is bad for performance.

Thanks for the explanation; that makes sense now. So, I was right that it's OS dependent, but most OSes use a FIFO approach which leads to LRU selection in the mutex. Unfortunately, I don't see that being fixed very simply, since it's not really Apache doing the choosing. Maybe it will be possible to do something cool with the wake-one stuff in Linux 2.4 when that comes out. By the way, how are you doing it? Do you use a mutex routine that works in LIFO fashion?

> > In my experience running
> > development servers on Linux it always seemed as if the requests
> > would continue going to the same process until a request came in when
> > that process was already busy.
>
> No, they don't. They go round-robin (or LRU as I say it).

Keith Murphy pointed out that I was seeing the result of persistent HTTP connections from my browser. Duh.

> But a point I'm making is that with mod_perl you have to go to great
> lengths to write your code so as to avoid unshared memory. My claim is that
> with mod_speedycgi you don't have to concern yourself as much with this.
> You can concentrate more on the application and less on performance tuning.

I think you're overstating the case a bit here. It's really easy to take advantage of shared memory with mod_perl - I just add a 'use Foo' to my startup.pl! It can be hard for newbies to understand, but there's nothing difficult about implementing it.
I often get 50% or more of my application shared in this way. That's a huge savings.

> I don't assume that each approach is equally fast under all loads. They
> were about the same at concurrency level 1, but at higher concurrency levels
> they weren't.

Well, certainly not when mod_perl started swapping... Actually, there is a reason why MRU could lead to better performance (as opposed to just saving memory): caching of allocated memory. The first time Perl sees lexicals it has to allocate memory for them, so if you re-use the same interpreter you get to skip this step, and that should give some kind of performance benefit.

> I am saying that since SpeedyCGI uses MRU to allocate requests to perl
> interpreters, it winds up using a lot fewer interpreters to handle the
> same number of requests.

What I was saying is that it doesn't make sense for one to need fewer interpreters than the other to handle the same concurrency. If you have 10 requests at the same time, you need 10 interpreters. There's no way speedycgi can do it with fewer, unless it actually makes some of them wait. That could be happening, due to the fork-on-demand model, although your warmup round (priming the pump) should take care of that.

> I don't think it's pre-forking. When I ran my tests I would always run
> them twice, and take the results from the second run. The first run
> was just to "prime the pump".

That seems like it should do it, but I still think you could only have more processes handling the same concurrency on mod_perl if some of the mod_perl processes are idle or some of the speedycgi requests are waiting.

> > This is probably all a moot point on a server with a properly set
> > MaxClients and Apache::SizeLimit that will not go into swap.
>
> Please let me know what you think I should change. So far my
> benchmarks only show one trend, but if you can tell me specifically
> what I'm doing wrong (and it's something reasonable), I'll try it.
Try setting MinSpareServers as low as possible and setting MaxClients to a value that will prevent swapping. Then set ab for a concurrency equal to your MaxClients setting.

> I believe that with speedycgi you don't have to lower the MaxClients
> setting, because it's able to handle a larger number of clients, at
> least in this test.

Maybe what you're seeing is an ability to handle a larger number of requests (as opposed to clients) because of the performance benefit I mentioned above. I don't know how hard ab tries to make sure you really have n simultaneous clients at any given time.

> In other words, if with mod_perl you had to turn
> away requests, but with mod_speedycgi you did not, that would just
> prove that speedycgi is more scalable.

Are the speedycgi+Apache processes smaller than the mod_perl processes? If not, the maximum number of concurrent requests you can handle on a given box is going to be the same.

> Maybe. There must be a benchmark somewhere that would show off
> mod_perl's advantages in shared memory. Maybe a 100,000 line perl
> program or something like that - it would have to be something where
> mod_perl is using *lots* of shared memory, because keep in mind that
> there are still going to be a whole lot fewer SpeedyCGI processes than
> there are mod_perl processes, so you would really have to go overboard
> in the shared-memory department.

Well, I get tons of use out of shared memory without even trying. If you can find a way to implement it in speedycgi, I think it would be very beneficial to your users.

> I would rather just see mod_perl fixed if that's
> possible.

Because this has more to do with the OS than Apache and is already fixed in mod_perl 2, I doubt anyone will feel like messing with it before that gets released. Your experiment demonstrates that the MRU approach has value, so I'll be looking forward to trying it out with mod_perl 2.
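Perrin's allocation-caching point earlier in this message - a reused interpreter skips the first-time allocation of lexicals - can be folded into the same kind of toy model (all costs hypothetical, chosen only for illustration): charge each worker a one-time warm-up cost on its first request, then compare FIFO (LRU) against LIFO (MRU) worker selection.

```python
from collections import deque

ALLOC_COST = 50  # hypothetical one-time cost of allocating lexicals
RUN_COST = 1     # hypothetical cost of actually running the script

def total_cost(policy, n_workers=10, n_rounds=200, concurrency=3):
    """Total cost of serving n_rounds * concurrency requests when a
    worker pays ALLOC_COST once, on its first request, then RUN_COST
    per request. policy is 'lru' (FIFO pick) or 'mru' (LIFO pick)."""
    idle = deque(range(n_workers))
    warmed = set()
    cost = 0
    for _ in range(n_rounds):
        busy = [idle.popleft() for _ in range(concurrency)]
        for w in busy:
            if w not in warmed:       # cold worker: pay the warm-up
                cost += ALLOC_COST
                warmed.add(w)
            cost += RUN_COST
            if policy == 'mru':
                idle.appendleft(w)    # warm worker is reused first
            else:
                idle.append(w)        # warm worker goes to the back
    return cost

print(total_cost('lru'), total_cost('mru'))  # MRU pays fewer warm-ups
```

In this model both policies do the same per-request work; MRU just warms 3 workers instead of 10, which is the (modest) speed benefit Perrin is hypothesizing, separate from the memory savings.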
===
Date: Thu, 21 Dec 2000 22:39:50 -0600
From: "Ken Williams" <ken@forum.swarthmore.edu>
To: "Perrin Harkins" <perrin@primenet.com>
Cc: "mod_perl list" <modperl@apache.org>, speedycgi@newlug.org
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

perrin@primenet.com (Perrin Harkins) wrote:
>Hi Sam,
[snip]
>> I am saying that since SpeedyCGI uses MRU to allocate requests to perl
>> interpreters, it winds up using a lot fewer interpreters to handle the
>> same number of requests.
>
>What I was saying is that it doesn't make sense for one to need fewer
>interpreters than the other to handle the same concurrency. If you have
>10 requests at the same time, you need 10 interpreters. There's no way
>speedycgi can do it with fewer, unless it actually makes some of them
>wait.

Well, there is one way, though it's probably not a huge factor. If mod_perl indeed manages the child-farming in such a way that too much memory is used, then each process might slow down as memory becomes scarce, especially if you start swapping. Then if each request takes longer, your child pool is more saturated with requests, and you might have to fork a few more kids.

So in a sense, I think you're both correct. If "concurrency" means the number of requests that can be handled at once, both systems are necessarily (and trivially) equivalent. This isn't a very useful measurement, though; a more useful one is how many children (or perhaps how much memory) will be necessary to handle a given number of incoming requests per second, and with this metric the two systems could perform differently.

===
To: Ken Williams <ken@forum.swarthmore.edu>
Cc: mod_perl list <modperl@apache.org>, speedycgi@newlug.org
Date: Thu, 21 Dec 2000 22:07:10 -0800 (PST)
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

On Thu, 21 Dec 2000, Ken Williams wrote:

> So in a sense, I think you're both correct.
> If "concurrency" means the number of requests that can be handled at once, both systems are necessarily (and trivially) equivalent. This isn't a very useful measurement, though; a more useful one is how many children (or perhaps how much memory) will be necessary to handle a given number of incoming requests per second, and with this metric the two systems could perform differently.

Yes, well put. And that actually brings me back around to my original hypothesis, which is that once you reach the maximum number of interpreters that can be run on the box before swapping, it no longer makes a difference if you're using LRU or MRU. That's because all interpreters are busy all the time, and the RAM for lexicals has already been allocated in all of them. At that point, it's a question of which system can fit more interpreters in RAM at once, and I still think mod_perl would come out on top there because of the shared memory. Of course most people don't run their servers at full throttle, and at less than total saturation I would expect speedycgi to use less RAM and possibly be faster.

So I guess I'm saying exactly the opposite of the original assertion: mod_perl is more scalable if you define "scalable" as maximum requests per second on a given machine, but speedycgi uses fewer resources at less than peak loads, which would make it more attractive for ISPs and other people who use their servers for multiple tasks. This is all hypothetical and I don't have time to experiment with it until after the holidays, but I think the logic is correct.
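The saturation argument above is just arithmetic: at full throttle the box holds whichever system fits more interpreters in RAM, and shared pages lower the per-interpreter cost. A back-of-the-envelope sketch (all figures invented purely for illustration, not measurements of either system):

```python
def max_interpreters(ram_mb, per_proc_mb, shared_mb=0):
    """How many interpreters fit in RAM before swapping, when shared_mb
    of each process's footprint is a single copy shared by all of them."""
    unshared = per_proc_mb - shared_mb        # per-process private cost
    return int((ram_mb - shared_mb) // unshared)

RAM = 512  # MB, hypothetical box

# mod_perl: 10 MB processes, 6 MB shared via preloading in startup.pl
modperl_cap = max_interpreters(RAM, per_proc_mb=10, shared_mb=6)

# speedycgi: same 10 MB footprint, but nothing shared (no preloading parent)
speedy_cap = max_interpreters(RAM, per_proc_mb=10)

print(modperl_cap, speedy_cap)  # sharing fits far more interpreters
```

With these made-up numbers mod_perl fits over twice as many interpreters at saturation, which is Perrin's point; below saturation the MRU side simply never forks most of its interpreters, which is Sam's.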
===
From: "Jeremy Howard" <jh_lists@fastmail.fm>
To: "Perrin Harkins" <perrin@primenet.com>, "Sam Horrocks" <sam@daemoninc.com>
Cc: "Gunther Birznieks" <gunther@extropia.com>, "mod_perl list" <modperl@apache.org>, <speedycgi@newlug.org>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Date: Fri, 22 Dec 2000 17:38:19 +1100

Perrin Harkins wrote:
> What I was saying is that it doesn't make sense for one to need fewer
> interpreters than the other to handle the same concurrency. If you have
> 10 requests at the same time, you need 10 interpreters. There's no way
> speedycgi can do it with fewer, unless it actually makes some of them
> wait. That could be happening, due to the fork-on-demand model, although
> your warmup round (priming the pump) should take care of that.

I don't know if Speedy fixes this, but one problem with mod_perl v1 is that if, for instance, a large POST request is being uploaded, it ties up a whole perl interpreter while the transaction is occurring. This is at least one place where a Perl interpreter should not be needed.

Of course, this could be overcome if an HTTP accelerator is used that takes the whole request before passing it to a local httpd, but I don't know of any proxies that work this way (AFAIK they all pass the packets on as they arrive).

===
Date: Fri, 22 Dec 2000 07:51:47 +0000 (GMT)
From: Matt Sergeant <matt@sergeant.org>
To: Sam Horrocks <sam@daemoninc.com>
cc: Stas Bekman <stas@stason.org>, Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, <speedycgi@newlug.org>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

On Thu, 21 Dec 2000, Sam Horrocks wrote:

> > Folks, your discussion is not short of wrong statements that can be easily
> > proved, but I don't find it useful.
>
> I don't follow. Are you saying that my conclusions are wrong, but
> you don't want to bother explaining why?
> Would you agree with the following statement?
>
> Under apache-1, speedycgi scales better than mod_perl with
> scripts that contain un-shared memory

NO! When you can write a trans handler or an auth handler with speedy, then I might agree with you. Until then I must insist you add "mod_perl Apache::Registry scripts" or something to that effect.

===
Date: Fri, 22 Dec 2000 11:18:32 -0600
From: "Keith G. Murphy" <keithmur@mindspring.com>
To: mod_perl list <modperl@apache.org>
CC: Perrin Harkins <perrin@primenet.com>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Perrin Harkins wrote:
> Keith Murphy pointed out that I was seeing the result of persistent HTTP
> connections from my browser. Duh.

I must mention that, having seen your postings here over a long period, anytime I can make you say "duh", my week is made. Maybe the whole month. That issue can be confusing. It was especially so for me when IE did it and Netscape did not...

Let's make everyone switch to IE, and mod_perl looks good again! :-b

===
To: "Jeremy Howard" <jh_lists@fastmail.fm>
Cc: modperl@apache.org
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
From: Joe Schaefer <joe@sunstarsys.com>
Date: 22 Dec 2000 22:17:06 -0500

"Jeremy Howard" <jh_lists@fastmail.fm> writes:

> Perrin Harkins wrote:
> > What I was saying is that it doesn't make sense for one to need fewer
> > interpreters than the other to handle the same concurrency. If you have
> > 10 requests at the same time, you need 10 interpreters. There's no way
> > speedycgi can do it with fewer, unless it actually makes some of them
> > wait. That could be happening, due to the fork-on-demand model, although
> > your warmup round (priming the pump) should take care of that.
A backend server can realistically handle multiple frontend requests, since the frontend server must stick around until the data has been delivered to the client (at least that's my understanding of the lingering-close issue that was recently discussed at length here). Hypothetically speaking, if a "FastCGI-like"[1] backend can deliver its content faster than the apache (front-end) server can "proxy" it to the client, you won't need as many to handle the same (front-end) traffic load. As an extreme hypothetical example, say that over a 5 second period you are barraged with 100 modem requests that typically would take 5s each to service. This means (sans lingerd :) that at the end of your 5 second period, you have 100 active apache children around. But if new requests during that 5 second interval were only received at 20/second, and your "FastCGI-like" server could deliver the content to apache in one second, you might only have forked 50-60 "FastCGI-like" new processes to handle all 100 requests (forks take a little time :). Moreover, an MRU design allows the transient effects of a short burst of abnormally heavy traffic to dissipate quickly, and IMHO that's its chief advantage over LRU. To return to this hypothetical, suppose that immediately following this short burst, we maintain a sustained traffic of 20 new requests per second. Since it takes 5 seconds to deliver the content, that amounts to a sustained concurrency level of 100. The "FastCGI-like" backend may have initially reacted by forking 50-60 processes, but with MRU only 20-30 processes will actually be handling the load, and this reduction would happen almost immediately in this hypothetical. This means that the remaining transient 20-30 processes could be quickly killed off or _moved to swap_ without adversely affecting server performance.
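The burst-dissipation claim can be illustrated with a toy simulation (my own sketch, not from the thread, with made-up numbers): fire a burst of simultaneous requests, then sustain a lower arrival rate, and count how many distinct workers each selection policy keeps in rotation after the burst.

```python
def simulate(policy, arrivals, service=2):
    """Serve requests on reusable workers; fork a new worker when none is free.
    policy: "LRU" picks the worker idle longest, "MRU" the one freed last."""
    free = []       # (time_freed, worker_id) for idle workers
    busy = []       # (time_done, worker_id) for workers serving a request
    next_id = 0
    served_by = []
    for t in arrivals:
        # Return workers whose request has finished by time t to the free pool.
        free += [(d, w) for d, w in busy if d <= t]
        busy = [(d, w) for d, w in busy if d > t]
        if free:
            free.sort()
            _, w = free.pop(0 if policy == "LRU" else -1)
        else:
            w, next_id = next_id, next_id + 1   # fork on demand
        busy.append((t + service, w))
        served_by.append(w)
    return served_by

# A burst of 10 simultaneous requests, then one new request per tick.
arrivals = [0] * 10 + list(range(1, 41))
for policy in ("LRU", "MRU"):
    after_burst = simulate(policy, arrivals)[10:]
    print(policy, "distinct workers after the burst:", len(set(after_burst)))
```

In this toy model LRU keeps every burst-forked worker in rotation, while MRU settles onto a couple of workers almost immediately, leaving the rest idle and safe to kill or page out.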
Again, this is all purely hypothetical - I don't have benchmarks to back it up ;) > I don't know if Speedy fixes this, but one problem with mod_perl v1 is that > if, for instance, a large POST request is being uploaded, this takes a whole > perl interpreter while the transaction is occurring. This is at least one > place where a Perl interpreter should not be needed. > > Of course, this could be overcome if an HTTP Accelerator is used that takes > the whole request before passing it to a local httpd, but I don't know of > any proxies that work this way (AFAIK they all pass the packets as they > arrive). I posted a patch to modproxy a few months ago that specifically addresses this issue. It has a ProxyPostMax directive that changes its behavior to a store-and-forward proxy for POST data (it also enabled keepalives on the browser-side connection if they were enabled on the frontend server.) It does this by buffering the data to a temp file on the proxy before opening the backend socket. It's straightforward to make it buffer to a portion of RAM instead- if you're interested I can post another patch that does this also, but it's pretty much untested. [1] I've never used SpeedyCGI, so I've refrained from specifically discussing it. Also, a mod_perl backend server using Apache::Registry can be viewed as "FastCGI-like" for the purpose of my argument. === Date: Sat, 23 Dec 2000 11:28:18 +0800 To: Joe Schaefer <joe@sunstarsys.com>, "Jeremy Howard" <jh_lists@fastmail.fm> From: Gunther Birznieks <gunther@extropia.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscriptsthat contain un-shared memory At 10:17 PM 12/22/2000 -0500, Joe Schaefer wrote: >"Jeremy Howard" <jh_lists@fastmail.fm> writes: > >[snipped] >I posted a patch to modproxy a few months ago that specifically >addresses this issue.
It has a ProxyPostMax directive that changes >its behavior to a store-and-forward proxy for POST data (it also enabled >keepalives on the browser-side connection if they were enabled on the >frontend server.) > >It does this by buffering the data to a temp file on the proxy before >opening the backend socket. It's straightforward to make it buffer to >a portion of RAM instead- if you're interested I can post another patch >that does this also, but it's pretty much untested. Cool! Are these patches now incorporated in the core mod_proxy if we download it off the web? Or do we troll through the mailing list to find the patch? (Similar question about the forwarding of remote user patch someone posted last year). === From: "Jeremy Howard" <jh_lists@fastmail.fm> To: <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscriptsthat contain un-shared memory Date: Sat, 23 Dec 2000 15:36:52 +1100 Joe Schaefer wrote: > "Jeremy Howard" <jh_lists@fastmail.fm> writes: > > I don't know if Speedy fixes this, but one problem with > > mod_perl v1 is that if, for instance, a large POST > > request is being uploaded, this takes a whole perl > > interpreter while the transaction is occurring. This is > > at least one place where a Perl interpreter should not > > be needed. > > Of course, this could be overcome if an HTTP Accelerator > > is used that takes the whole request before passing it > > to a local httpd, but I don't know of any proxies that > > work this way (AFAIK they all pass the packets as they > > arrive). > I posted a patch to modproxy a few months ago that > specifically addresses this issue. It has a ProxyPostMax > directive that changes its behavior to a > store-and-forward proxy for POST data (it also enabled > keepalives on the browser-side connection if they were > enabled on the frontend server.)
FYI, this patch is at: http://www.mail-archive.com/modperl@apache.org/msg11072.html === Date: Fri, 22 Dec 2000 23:57:36 -0800 (PST) From: Ask Bjoern Hansen <ask@valueclick.com> To: Sam Horrocks <sam@daemoninc.com> cc: Stas Bekman <stas@stason.org>, Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory On Thu, 21 Dec 2000, Sam Horrocks wrote: > > Folks, your discussion is not short of wrong statements that can be easily > > proved, but I don't find it useful. > > I don't follow. Are you saying that my conclusions are wrong, but > you don't want to bother explaining why? > > Would you agree with the following statement? > > Under apache-1, speedycgi scales better than mod_perl with > scripts that contain un-shared memory Maybe; but for one thing the feature set seems to be very different as others have pointed out. Secondly, the test that was originally quoted didn't have much to do with reality and showed that whoever made it didn't have much experience with setting up real-world high traffic systems with mod_perl. === Date: Sat, 23 Dec 2000 16:27:34 +0000 (GMT) From: Nigel Hamilton <nigel@e1mail.com> To: speedycgi@newlug.org cc: Sam Horrocks <sam@daemoninc.com>, Stas Bekman <stas@stason.org>, Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales a better benchmark Hi, I think some of the 'threatened' replies to this thread speak volumes more than any benchmark. Sam has come up with a cool technology .... it will help bridge the technology adoption gap between traditional perl CGI + mod_perl - especially for ISP's. Well done Sam!
=== From: Sam Horrocks <sam@daemoninc.com> To: Perrin Harkins <perrin@primenet.com> cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Thu, 04 Jan 2001 04:56:34 -0800 Sorry for the late reply - I've been out for the holidays. > By the way, how are you doing it? Do you use a mutex routine that works > in LIFO fashion? Speedycgi uses separate backend processes that run the perl interpreters. The frontend processes (the httpd's that are running mod_speedycgi) communicate with the backends, sending over the request and getting the output. Speedycgi uses some shared memory (an mmap'ed file in /tmp) to keep track of the backends and frontends. This shared memory contains the queue. When backends become free, they add themselves at the front of this queue. When the frontends need a backend they pull the first one from the front of this list. > > > I am saying that since SpeedyCGI uses MRU to allocate requests to perl > > interpreters, it winds up using a lot fewer interpreters to handle the > > same number of requests. > > What I was saying is that it doesn't make sense for one to need fewer > interpreters than the other to handle the same concurrency. If you have > 10 requests at the same time, you need 10 interpreters. There's no way > speedycgi can do it with fewer, unless it actually makes some of them > wait. That could be happening, due to the fork-on-demand model, although > your warmup round (priming the pump) should take care of that. What you say would be true if you had 10 processors and could get true concurrency. But on single-cpu systems you usually don't need 10 unix processes to handle 10 requests concurrently, since they get serialized by the kernel anyways. I'll try to show how mod_perl handles 10 concurrent requests, and compare that to mod_speedycgi so you can see the difference.
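The queue discipline described here (backends add themselves at the front, frontends pull from the front) is a LIFO free-list, which is exactly what makes the selection MRU. A minimal sketch of the idea in Python (an illustration only; the real SpeedyCGI implementation lives in an mmap'ed file shared between processes):

```python
from collections import deque

class BackendQueue:
    """LIFO free-list: the backend freed most recently is handed out first."""
    def __init__(self):
        self._free = deque()

    def release(self, backend):
        # A backend that finishes a request adds itself at the FRONT.
        self._free.appendleft(backend)

    def acquire(self):
        # A frontend pulls from the FRONT; None means "fork a new backend".
        return self._free.popleft() if self._free else None

q = BackendQueue()
q.release("b1")
q.release("b2")              # freed after b1, so it sits at the front
assert q.acquire() == "b2"   # most recently used backend is reused first
assert q.acquire() == "b1"
assert q.acquire() is None   # queue empty: the frontend would fork a backend
```

Because hot backends keep getting reused from the front, the cold ones drift toward the back of the list and stay idle.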
For mod_perl, let's assume we have 10 httpd's, h1 through h10, when the 10 concurrent requests come in. h1 has acquired the mutex, and h2-h10 are waiting (in order) on the mutex. Here's how the cpu actually runs the processes:

  h1 accepts
  h1 releases the mutex, making h2 runnable
  h1 runs the perl code and produces the results
  h1 waits for the mutex
  h2 accepts
  h2 releases the mutex, making h3 runnable
  h2 runs the perl code and produces the results
  h2 waits for the mutex
  h3 accepts
  ...

This is pretty straightforward. Each of h1-h10 runs the perl code exactly once. They may not run exactly in this order since a process could get pre-empted, or blocked waiting to send data to the client, etc. But regardless, each of the 10 processes will run the perl code exactly once. Here's the mod_speedycgi example - it too uses httpd's h1-h10, and they all take turns running the mod_speedycgi frontend code. But the backends, where the perl code is, don't have to all be run fairly - they use MRU instead. I'll use b1 and b2 to represent 2 speedycgi backend processes, already queued up in that order.
Here's a possible speedycgi scenario:

  h1 accepts
  h1 releases the mutex, making h2 runnable
  h1 sends a request to b1, making b1 runnable
  h2 accepts
  h2 releases the mutex, making h3 runnable
  h2 sends a request to b2, making b2 runnable
  b1 runs the perl code and sends the results to h1, making h1 runnable
  b1 adds itself to the front of the queue
  h3 accepts
  h3 releases the mutex, making h4 runnable
  h3 sends a request to b1, making b1 runnable
  b2 runs the perl code and sends the results to h2, making h2 runnable
  b2 adds itself to the front of the queue
  h1 produces the results it got from b1
  h1 waits for the mutex
  h4 accepts
  h4 releases the mutex, making h5 runnable
  h4 sends a request to b2, making b2 runnable
  b1 runs the perl code and sends the results to h3, making h3 runnable
  b1 adds itself to the front of the queue
  h2 produces the results it got from b2
  h2 waits for the mutex
  h5 accepts
  h5 releases the mutex, making h6 runnable
  h5 sends a request to b1, making b1 runnable
  b2 runs the perl code and sends the results to h4, making h4 runnable
  b2 adds itself to the front of the queue

This may be hard to follow, but hopefully you can see that the 10 httpd's just take turns using b1 and b2 over and over. So, the 10 concurrent requests end up being handled by just two perl backend processes. Again, this is simplified. If the perl processes get blocked, or pre-empted, you'll end up using more of them. But generally, the LIFO will cause SpeedyCGI to sort-of settle into the smallest number of processes needed for the task. The difference between the two approaches is that the mod_perl implementation forces unix to use 10 separate perl processes, while the mod_speedycgi implementation sort-of decides on the fly how many different processes are needed. > > Please let me know what you think I should change. So far my > > benchmarks only show one trend, but if you can tell me specifically > > what I'm doing wrong (and it's something reasonable), I'll try it.
> > Try setting MinSpareServers as low as possible and setting MaxClients to a > value that will prevent swapping. Then set ab for a concurrency equal to > your MaxClients setting. I previously had set MinSpareServers to 1 - it did help mod_perl get to a higher level, but didn't change the overall trend. I found that setting MaxClients to 100 stopped the paging. At concurrency level 100, both mod_perl and mod_speedycgi showed similar rates with ab. Even at higher levels (300), they were comparable. But, to show that the underlying problem is still there, I then changed the hello_world script and doubled the amount of un-shared memory. And of course the problem then came back for mod_perl, although speedycgi continued to work fine. I think this shows that mod_perl is still using quite a bit more memory than speedycgi to provide the same service. > > I believe that with speedycgi you don't have to lower the MaxClients > > setting, because it's able to handle a larger number of clients, at > > least in this test. > > Maybe what you're seeing is an ability to handle a larger number of > requests (as opposed to clients) because of the performance benefit I > mentioned above. I don't follow. > I don't know how hard ab tries to make sure you really > have n simultaneous clients at any given time. I do know that the ab "-c" option does seem to have an effect on the tests I've been running. > > In other words, if with mod_perl you had to turn > > away requests, but with mod_speedycgi you did not, that would just > > prove that speedycgi is more scalable. > > Are the speedycgi+Apache processes smaller than the mod_perl > processes? If not, the maximum number of concurrent requests you can > handle on a given box is going to be the same. The size of the httpds running mod_speedycgi, plus the size of speedycgi perl processes is significantly smaller than the total size of the httpd's running mod_perl. 
The reason for this is that only a handful of perl processes are required by speedycgi to handle the same load, whereas mod_perl uses a perl interpreter in all of the httpds. === To: speedycgi@newlug.org cc: Ken Williams <ken@forum.swarthmore.edu>, mod_perl list <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Thu, 04 Jan 2001 05:03:26 -0800 I don't agree. SpeedyCGI handles the same load with a whole lot fewer perl interpreters, thus reducing the memory requirements significantly. See my previous post for a more detailed explanation. > On Thu, 21 Dec 2000, Ken Williams wrote: > > So in a sense, I think you're both correct. If "concurrency" means > > the number of requests that can be handled at once, both systems are > > necessarily (and trivially) equivalent. This isn't a very useful > > measurement, though; a more useful one is how many children (or > > perhaps how much memory) will be necessary to handle a given number of > > incoming requests per second, and with this metric the two systems > > could perform differently. > > Yes, well put. And that actually brings me back around to my original > hypothesis, which is that once you reach the maximum number of > interpreters that can be run on the box before swapping, it no longer > makes a difference if you're using LRU or MRU. That's because all > interpreters are busy all the time, and the RAM for lexicals has already > been allocated in all of them. At that point, it's a question of which > system can fit more interpreters in RAM at once, and I still think > mod_perl would come out on top there because of the shared memory. Of > course most people don't run their servers at full throttle, and at less > than total saturation I would expect speedycgi to use less RAM and > possibly be faster.
> > So I guess I'm saying exactly the opposite of the original assertion: > mod_perl is more scalable if you define "scalable" as maximum requests per > second on a given machine, but speedycgi uses fewer resources at less than > peak loads which would make it more attractive for ISPs and other people > who use their servers for multiple tasks. > > This is all hypothetical and I don't have time to experiment with it until > after the holidays, but I think the logic is correct. > === To: "Jeremy Howard" <jh_lists@fastmail.fm> cc: "Perrin Harkins" <perrin@primenet.com>, "Gunther Birznieks" <gunther@extropia.com>, "mod_perl list" <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscriptsthat contain un-shared memory Date: Thu, 04 Jan 2001 05:20:43 -0800 This is planned for a future release of speedycgi, though there will probably be an option to set a maximum number of bytes that can be buffered before the frontend contacts a perl interpreter and starts passing over the bytes. Currently you can do this sort of acceleration with script output if you use the "speedy" binary (not mod_speedycgi), and you set the BufsizGet option high enough so that it's able to buffer all the output from your script. The perl interpreter will then be able to detach and go handle other requests while the frontend process waits for the output to drain. > Perrin Harkins wrote: > > What I was saying is that it doesn't make sense for one to need fewer > > interpreters than the other to handle the same concurrency. If you have > > 10 requests at the same time, you need 10 interpreters. There's no way > > speedycgi can do it with fewer, unless it actually makes some of them > > wait. That could be happening, due to the fork-on-demand model, although > > your warmup round (priming the pump) should take care of that.
> > > I don't know if Speedy fixes this, but one problem with mod_perl v1 is that > if, for instance, a large POST request is being uploaded, this takes a whole > perl interpreter while the transaction is occurring. This is at least one > place where a Perl interpreter should not be needed. > > Of course, this could be overcome if an HTTP Accelerator is used that takes > the whole request before passing it to a local httpd, but I don't know of > any proxies that work this way (AFAIK they all pass the packets as they > arrive). === From: "Les Mikesell" <lesmikesell@home.com> To: "Perrin Harkins" <perrin@primenet.com>, "Sam Horrocks" <sam@daemoninc.com> Cc: "Gunther Birznieks" <gunther@extropia.com>, "mod_perl list" <modperl@apache.org>, <speedycgi@newlug.org> References: <18795.978612994@daemonweb.daemoninc.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Thu, 4 Jan 2001 08:24:08 -0600 "Sam Horrocks" <sam@daemoninc.com> wrote: > > Are the speedycgi+Apache processes smaller than the mod_perl > > processes? If not, the maximum number of concurrent requests you can > > handle on a given box is going to be the same. > > The size of the httpds running mod_speedycgi, plus the size of speedycgi > perl processes is significantly smaller than the total size of the httpd's > running mod_perl. That would be true if you only ran one mod_perl'd httpd, but can you give a better comparison to the usual setup for a busy site where you run a non-mod_perl lightweight front end and let mod_rewrite decide what is proxied through to the larger mod_perl'd backend, letting apache decide how many backends you need to have running? > The reason for this is that only a handful of perl processes are required by > speedycgi to handle the same load, whereas mod_perl uses a perl interpreter > in all of the httpds. I always see at least a 10-1 ratio of front-to-back end httpd's when serving over the internet.
One effect that is difficult to benchmark is that clients connecting over the internet are often slow and will hold up the process that is delivering the data even though the processing has been completed. The proxy approach provides some buffering and allows the backend to move on more quickly. Does speedycgi do the same? === Date: Thu, 4 Jan 2001 17:15:35 +0100 From: Roger Espel Llima <espel@iagora.net> To: Jeremy Howard <jh_lists@fastmail.fm> Cc: modperl@apache.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscriptsthat contain un-shared memory "Jeremy Howard" <jh_lists@fastmail.fm> wrote: > A backend server can realistically handle multiple frontend requests, since > the frontend server must stick around until the data has been delivered > to the client (at least that's my understanding of the lingering-close > issue that was recently discussed at length here). I won't enter the {Fast,Speedy}-CGI debates, having never played with these, but the picture you're painting about delivering data to the clients is just a little bit too bleak. With a frontend/backend mod_perl setup, the frontend server sticks around for a second or two as part of the lingering_close routine, but it doesn't have to wait for the client to finish reading all the data. Fortunately enough, spoonfeeding data to slow clients is handled by the OS kernel. === Date: Thu, 04 Jan 2001 20:47:22 -0800 From: Perrin Harkins <perrin@primenet.com> To: Sam Horrocks <sam@daemoninc.com> CC: mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Hi Sam, I think we're talking in circles here a bit, and I don't want to diminish the original point, which I read as "MRU process selection is a good idea for Perl-based servers." Your tests showed that this was true. Let me just try to explain my reasoning. 
I'll define a couple of my base assumptions, in case you disagree with them. - Slices of CPU time doled out by the kernel are very small - so small that processes can be considered concurrent, even though technically they are handled serially. - A set of requests can be considered "simultaneous" if they all arrive and start being handled in a period of time shorter than the time it takes to service a request. Operating on these two assumptions, I say that 10 simultaneous requests will require 10 interpreters to service them. There's no way to handle them with fewer, unless you queue up some of the requests and make them wait. I also say that if you have a top limit of 10 interpreters on your machine because of memory constraints, and you're sending in 10 simultaneous requests constantly, all interpreters will be used all the time. In that case it makes no difference to the throughput whether you use MRU or LRU. > What you say would be true if you had 10 processors and could get > true concurrency. But on single-cpu systems you usually don't need > 10 unix processes to handle 10 requests concurrently, since they get > serialized by the kernel anyways. I think the CPU slices are smaller than that. I don't know much about process scheduling, so I could be wrong. I would agree with you if we were talking about requests that were coming in with more time between them. Speedycgi will definitely use fewer interpreters in that case. > I found that setting MaxClients to 100 stopped the paging. At concurrency > level 100, both mod_perl and mod_speedycgi showed similar rates with ab. > Even at higher levels (300), they were comparable. That's what I would expect if both systems have a similar limit of how many interpreters they can fit in RAM at once. Shared memory would help here, since it would allow more interpreters to run. By the way, do you limit the number of SpeedyCGI processes as well? 
it seems like you'd have to, or they'd start swapping too when you throw too many requests in. > But, to show that the underlying problem is still there, I then changed > the hello_world script and doubled the amount of un-shared memory. > And of course the problem then came back for mod_perl, although speedycgi > continued to work fine. I think this shows that mod_perl is still > using quite a bit more memory than speedycgi to provide the same service. I'm guessing that what happened was you ran mod_perl into swap again. You need to adjust MaxClients when your process size changes significantly. > > > I believe that with speedycgi you don't have to lower the MaxClients > > > setting, because it's able to handle a larger number of clients, at > > > least in this test. > > > > Maybe what you're seeing is an ability to handle a larger number of > > requests (as opposed to clients) because of the performance benefit I > > mentioned above. > > I don't follow. When not all processes are in use, I think Speedy would handle requests more quickly, which would allow it to handle n requests in less time than mod_perl. Saying it handles more clients implies that the requests are simultaneous. I don't think it can handle more simultaneous requests. > > Are the speedycgi+Apache processes smaller than the mod_perl > > processes? If not, the maximum number of concurrent requests you can > > handle on a given box is going to be the same. > > The size of the httpds running mod_speedycgi, plus the size of speedycgi > perl processes is significantly smaller than the total size of the httpd's > running mod_perl. > > The reason for this is that only a handful of perl processes are required by > speedycgi to handle the same load, whereas mod_perl uses a perl interpreter > in all of the httpds. I think this is true at lower levels, but not when the number of simultaneous requests gets up to the maximum that the box can handle. 
At that point, it's a question of how many interpreters can fit in memory. I would expect the size of one Speedy + one httpd to be about the same as one mod_perl/httpd when no memory is shared. With sharing, you'd be able to run more processes. === To: Roger Espel Llima <espel@iagora.net>, modperl@apache.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscriptsthat contain un-shared memory From: Joe Schaefer <joe+apache@sunstarsys.com> Date: 05 Jan 2001 00:53:05 -0500 Roger Espel Llima <espel@iagora.net> writes: > "Jeremy Howard" <jh_lists@fastmail.fm> wrote: I'm pretty sure I'm the person whose words you're quoting here, not Jeremy's. > > A backend server can realistically handle multiple frontend requests, since > > the frontend server must stick around until the data has been delivered > > to the client (at least that's my understanding of the lingering-close > > issue that was recently discussed at length here). > > I won't enter the {Fast,Speedy}-CGI debates, having never played > with these, but the picture you're painting about delivering data to > the clients is just a little bit too bleak. It's a "hypothetical", and I obviously exaggerated the numbers to show the advantage of a front/back end architecture for "comparative benchmarks" like these. As you well know, the relevant issue is the percentage of time spent generating the content relative to the entire time spent servicing the request. If you don't like seconds, rescale it to your favorite time window. > With a frontend/backend mod_perl setup, the frontend server sticks > around for a second or two as part of the lingering_close routine, > but it doesn't have to wait for the client to finish reading all the > data. Fortunately enough, spoonfeeding data to slow clients is > handled by the OS kernel. Right- relative to the time it takes the backend to actually create and deliver the content to the frontend, a second or two can be an eternity. 
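The "percentage of time spent generating the content" argument can be made concrete with Little's law (my framing, with hypothetical numbers, not figures from the thread): the average number of busy processes equals the arrival rate times the time each request holds the process.

```python
def busy_processes(arrival_rate, hold_time):
    """Little's law, L = lambda * W: average number of processes held busy."""
    return arrival_rate * hold_time

rate = 20             # requests per second (hypothetical)
frontend_hold = 2.0   # a frontend waits out delivery plus lingering close
backend_hold = 0.2    # a backend holds the request only while generating it

print(busy_processes(rate, frontend_hold))  # frontends kept busy (about 40)
print(busy_processes(rate, backend_hold))   # backends kept busy (about 4)
```

With these made-up hold times the front-to-back ratio works out to 10:1, the same order as the ratio Les reports seeing on real sites.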
=== From: Sam Horrocks <sam@daemoninc.com> To: "Les Mikesell" <lesmikesell@home.com> cc: "Perrin Harkins" <perrin@primenet.com>, "Gunther Birznieks" <gunther@extropia.com>, "mod_perl list" <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Fri, 05 Jan 2001 04:28:59 -0800 > > > Are the speedycgi+Apache processes smaller than the mod_perl > > > processes? If not, the maximum number of concurrent requests you can > > > handle on a given box is going to be the same. > > > > The size of the httpds running mod_speedycgi, plus the size of speedycgi > > perl processes is significantly smaller than the total size of the httpd's > > running mod_perl. > > That would be true if you only ran one mod_perl'd httpd, but can you > give a better comparison to the usual setup for a busy site where > you run a non-mod_perl lightweight front end and let mod_rewrite > decide what is proxied through to the larger mod_perl'd backend, > letting apache decide how many backends you need to have > running? The fundamental differences would remain the same - even in the mod_perl backend, the requests will be spread out over all the httpd's that are running, whereas speedycgi would tend to use fewer perl interpreters to handle the same load. But with this setup, the mod_perl backend could probably be set to run fewer httpds because it doesn't have to wait on slow clients. And the fewer httpd's you run with mod_perl the smaller your total memory. > > The reason for this is that only a handful of perl processes are required by > > speedycgi to handle the same load, whereas mod_perl uses a perl interpreter > > in all of the httpds. > > I always see at least a 10-1 ratio of front-to-back end httpd's when serving > over the internet. 
One effect that is difficult to benchmark is that clients > connecting over the internet are often slow and will hold up the process > that is delivering the data even though the processing has been completed. > The proxy approach provides some buffering and allows the backend > to move on more quickly. Does speedycgi do the same? There are plans to make it so that SpeedyCGI does more buffering of the output in memory, perhaps eliminating the need for a caching frontend webserver. It works now only for the "speedy" binary (not mod_speedycgi) if you set the BufsizGet value high enough. Of course you could add a caching webserver in front of the SpeedyCGI server just like you do with mod_perl now. So yes you can do the same with speedycgi now. === To: perrin@primenet.com cc: mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Sat, 06 Jan 2001 04:32:34 -0800 From: Sam Horrocks <sam@daemoninc.com> > Let me just try to explain my reasoning. I'll define a couple of my > base assumptions, in case you disagree with them. > > - Slices of CPU time doled out by the kernel are very small - so small > that processes can be considered concurrent, even though technically > they are handled serially. Don't agree. You're equating the model with the implementation. Unix processes model concurrency, but when it comes down to it, if you don't have more CPU's than processes, you can only simulate concurrency. Each process runs until it either blocks on a resource (timer, network, disk, pipe to another process, etc), or a higher priority process pre-empts it, or it's taken so much time that the kernel wants to give another process a chance to run. > - A set of requests can be considered "simultaneous" if they all arrive > and start being handled in a period of time shorter than the time it > takes to service a request. That sounds OK.
> Operating on these two assumptions, I say that 10 simultaneous requests > will require 10 interpreters to service them. There's no way to handle > them with fewer, unless you queue up some of the requests and make them > wait. Right. And that waiting takes place: - In the mutex around the accept call in the httpd - In the kernel's run queue when the process is ready to run, but is waiting for other processes ahead of it. So, since there is only one CPU, then in both cases (mod_perl and SpeedyCGI), processes spend time waiting. But what happens in the case of SpeedyCGI is that while some of the httpd's are waiting, one of the earlier speedycgi perl interpreters has already finished its run through the perl code and has put itself back at the front of the speedycgi queue. And by the time that Nth httpd gets around to running, it can re-use that first perl interpreter instead of needing yet another process. This is why it's important that you don't assume that Unix is truly concurrent. > I also say that if you have a top limit of 10 interpreters on your > machine because of memory constraints, and you're sending in 10 > simultaneous requests constantly, all interpreters will be used all the > time. In that case it makes no difference to the throughput whether you > use MRU or LRU. This is not true for SpeedyCGI, because of the reason I give above. 10 simultaneous requests will not necessarily require 10 interpreters. > > What you say would be true if you had 10 processors and could get > > true concurrency. But on single-cpu systems you usually don't need > > 10 unix processes to handle 10 requests concurrently, since they get > > serialized by the kernel anyways. > > I think the CPU slices are smaller than that. I don't know much about > process scheduling, so I could be wrong. I would agree with you if we > were talking about requests that were coming in with more time between > them. Speedycgi will definitely use fewer interpreters in that case. 
This url: http://www.oreilly.com/catalog/linuxkernel/chapter/ch10.html says the default timeslice is 210ms (1/5th of a second) for Linux on a PC. There's also lots of good info there on Linux scheduling. > > I found that setting MaxClients to 100 stopped the paging. At concurrency > > level 100, both mod_perl and mod_speedycgi showed similar rates with ab. > > Even at higher levels (300), they were comparable. > > That's what I would expect if both systems have a similar limit of how > many interpreters they can fit in RAM at once. Shared memory would help > here, since it would allow more interpreters to run. > > By the way, do you limit the number of SpeedyCGI processes as well? it > seems like you'd have to, or they'd start swapping too when you throw > too many requests in. SpeedyCGI has an optional limit on the number of processes, but I didn't use it in my testing. > > But, to show that the underlying problem is still there, I then changed > > the hello_world script and doubled the amount of un-shared memory. > > And of course the problem then came back for mod_perl, although speedycgi > > continued to work fine. I think this shows that mod_perl is still > > using quite a bit more memory than speedycgi to provide the same service. > > I'm guessing that what happened was you ran mod_perl into swap again. > You need to adjust MaxClients when your process size changes > significantly. Right, but this also points out how difficult it is to get mod_perl tuning just right. My opinion is that the MRU design adapts more dynamically to the load. > > > > I believe that with speedycgi you don't have to lower the MaxClients > > > > setting, because it's able to handle a larger number of clients, at > > > > least in this test. > > > > > > Maybe what you're seeing is an ability to handle a larger number of > > > requests (as opposed to clients) because of the performance benefit I > > > mentioned above. > > > > I don't follow. 
> > When not all processes are in use, I think Speedy would handle requests > more quickly, which would allow it to handle n requests in less time > than mod_perl. Saying it handles more clients implies that the requests > are simultaneous. I don't think it can handle more simultaneous > requests. Don't agree. > > > Are the speedycgi+Apache processes smaller than the mod_perl > > > processes? If not, the maximum number of concurrent requests you can > > > handle on a given box is going to be the same. > > > > The size of the httpds running mod_speedycgi, plus the size of speedycgi > > perl processes is significantly smaller than the total size of the httpd's > > running mod_perl. > > > > The reason for this is that only a handful of perl processes are required by > > speedycgi to handle the same load, whereas mod_perl uses a perl interpreter > > in all of the httpds. > > I think this is true at lower levels, but not when the number of > simultaneous requests gets up to the maximum that the box can handle. > At that point, it's a question of how many interpreters can fit in > memory. I would expect the size of one Speedy + one httpd to be about > the same as one mod_perl/httpd when no memory is shared. With sharing, > you'd be able to run more processes. I'd agree that the size of one Speedy backend + one httpd would be the same or even greater than the size of one mod_perl/httpd when no memory is shared. But because the speedycgi httpds are small (no perl in them) and the number of SpeedyCGI perl interpreters is small, the total memory required is significantly smaller for the same load. === Date: Sat, 06 Jan 2001 13:35:01 -0800 From: Perrin Harkins <perrin@primenet.com> Reply-To: perrin@primenet.com To: Sam Horrocks <sam@daemoninc.com> CC: mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Sam Horrocks wrote: > Don't agree. 
You're equating the model with the implementation. > Unix processes model concurrency, but when it comes down to it, if you > don't have more CPUs than processes, you can only simulate concurrency. [...] > This url: > > http://www.oreilly.com/catalog/linuxkernel/chapter/ch10.html > > says the default timeslice is 210ms (1/5th of a second) for Linux on a PC. > There's also lots of good info there on Linux scheduling. Thanks for the info. This makes much more sense to me now. It sounds like using an MRU algorithm for process selection is automatically finding the sweet spot in terms of how many processes can run within the space of one request and coming close to the ideal of never having unused processes in memory. Now I'm really looking forward to getting MRU and shared memory in the same package and seeing how high I can scale my hardware. === Date: Sat, 06 Jan 2001 16:46:30 -0500 From: Buddy Lee Haystack <haystack@email.rentzone.org> To: perrin@primenet.com Cc: Sam Horrocks <sam@daemoninc.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Does this mean that mod_perl's memory hunger will be curbed in the future using some of the neat tricks in Speedycgi? Perrin Harkins wrote: > > Sam Horrocks wrote: > > Don't agree. You're equating the model with the implementation. > > Unix processes model concurrency, but when it comes down to it, if you > > don't have more CPUs than processes, you can only simulate concurrency. > [...] > > This url: > > > > http://www.oreilly.com/catalog/linuxkernel/chapter/ch10.html > > > > says the default timeslice is 210ms (1/5th of a second) for Linux on a PC. > > There's also lots of good info there on Linux scheduling. > > Thanks for the info. This makes much more sense to me now.
It sounds > like using an MRU algorithm for process selection is automatically > finding the sweet spot in terms of how many processes can run within the > space of one request and coming close to the ideal of never having > unused processes in memory. Now I'm really looking forward to getting > MRU and shared memory in the same package and seeing how high I can > scale my hardware. === Date: Sat, 06 Jan 2001 13:47:51 -0800 From: Perrin Harkins <perrin@primenet.com> To: haystack@email.rentzone.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Buddy Lee Haystack wrote: > Does this mean that mod_perl's memory hunger will be curbed > in the future using some of the neat tricks in Speedycgi? Yes. The upcoming mod_perl 2 (running on Apache 2) will use MRU to select threads. Doug demoed this at ApacheCon a few months back. === From: "Les Mikesell" <lesmikesell@home.com> To: <perrin@primenet.com>, "Sam Horrocks" <sam@daemoninc.com> Cc: "mod_perl list" <modperl@apache.org>, <speedycgi@newlug.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Sat, 6 Jan 2001 15:56:44 -0600 "Sam Horrocks" <sam@daemoninc.com> wrote: > Right, but this also points out how difficult it is to get mod_perl > tuning just right. My opinion is that the MRU design adapts more > dynamically to the load. How would this compare to apache's process management when using the front/back end approach? > I'd agree that the size of one Speedy backend + one httpd would be the > same or even greater than the size of one mod_perl/httpd when no memory > is shared. But because the speedycgi httpds are small (no perl in them) > and the number of SpeedyCGI perl interpreters is small, the total memory > required is significantly smaller for the same load. Likewise, it would be helpful if you would always make the comparison to the dual httpd setup that is often used for busy sites.
I think it must really boil down to the efficiency of your IPC vs. access to the full apache environment. === Date: Sat, 06 Jan 2001 14:08:56 -0800 From: Joshua Chamas <joshua@chamas.com> To: Sam Horrocks <sam@daemoninc.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Sam Horrocks wrote: > Don't agree. You're equating the model with the implementation. > Unix processes model concurrency, but when it comes down to it, if you > don't have more CPUs than processes, you can only simulate concurrency. Hey Sam, nice module. I just installed your SpeedyCGI for a good 'ol HelloWorld benchmark & it was a snap, well done. I'd like to add to the numbers below that a fair benchmark would be between mod_proxy in front of a mod_perl server and mod_speedycgi, as it would be a similar memory saving model ( this is how we often scale mod_perl )... both models would end up forwarding back to a smaller set of persistent perl interpreters. However, I did not do such a benchmark, so SpeedyCGI loses out a bit for the extra layer it has to do :( This is based on the suite at http://www.chamas.com/bench/hello.tar.gz, but I have not included the speedy test in that yet.
-- Josh

Test Name                      Test File  Hits/sec  Total Hits  Total Time  sec/Hits  Bytes/Hit
-----------------------------  ---------  --------  ----------  ----------  --------  ---------
Apache::Registry v2.01 CGI.pm  hello.cgi  451.9     27128 hits  60.03 sec   0.002213  216 bytes
Speedy CGI                     hello.cgi  375.2     22518 hits  60.02 sec   0.002665  216 bytes

Apache Server Header Tokens
---------------------------
(Unix) Apache/1.3.14 OpenSSL/0.9.6 PHP/4.0.3pl1 mod_perl/1.24 mod_ssl/2.7.1

=== To: speedycgi@newlug.org cc: perrin@primenet.com, "mod_perl list" <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Sat, 06 Jan 2001 14:37:34 -0800 From: Sam Horrocks <sam@daemoninc.com> > > Right, but this also points out how difficult it is to get mod_perl > > tuning just right. My opinion is that the MRU design adapts more > > dynamically to the load. > > How would this compare to apache's process management when > using the front/back end approach? Same thing applies. The front/back end approach does not change the fundamentals. > > I'd agree that the size of one Speedy backend + one httpd would be the > > same or even greater than the size of one mod_perl/httpd when no memory > > is shared. But because the speedycgi httpds are small (no perl in them) > > and the number of SpeedyCGI perl interpreters is small, the total memory > > required is significantly smaller for the same load. > > Likewise, it would be helpful if you would always make the comparison > to the dual httpd setup that is often used for busy sites. I think it must > really boil down to the efficiency of your IPC vs. access to the full > apache environment. The reason I don't include that comparison is that it's not fundamental to the differences between mod_perl and speedycgi or LRU and MRU that I have been trying to point out. Regardless of whether you add a frontend or not, the mod_perl process selection remains LRU and the speedycgi process selection remains MRU.
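The total-memory argument quoted above can be made concrete with back-of-envelope arithmetic. The process sizes below are invented purely for illustration (not measurements from either system); only the shape of the comparison matters:

```python
# Hypothetical per-process sizes, for illustration only:
#   mod_perl httpd   ~10 MB  (a perl interpreter embedded in every httpd)
#   plain httpd       ~1 MB  (no perl in it)
#   speedy backend   ~10 MB  (one standalone perl interpreter)
httpds = 100    # frontend processes needed to hold the connections
backends = 10   # perl interpreters MRU actually keeps busy

mod_perl_total = httpds * 10                  # perl lives in every httpd
speedy_total = httpds * 1 + backends * 10     # perl only in the backends

print(mod_perl_total, "MB vs", speedy_total, "MB")
```

With these assumed numbers the mod_perl configuration needs 1000 MB against SpeedyCGI's 200 MB: the saving comes not from any single process being smaller, but from perl living in only a handful of processes.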
=== To: Joshua Chamas <joshua@chamas.com> cc: perrin@primenet.com, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Sat, 06 Jan 2001 15:58:27 -0800 From: Sam Horrocks <sam@daemoninc.com> A few things: - In your results, could you add the speedycgi version number (2.02), and the fact that this is using the mod_speedycgi frontend. The fork/exec frontend will be much slower on hello-world so I don't want people to get the wrong idea. You may want to benchmark the fork/exec version as well. - You may be able to eke out a little more performance by setting MaxRuns to 0 (infinite). This is set for mod_speedycgi using the SpeedyMaxRuns directive, or on the command-line using "-r0". This setting is similar to the MaxRequestsPerChild setting in apache. - My tests show mod_perl/speedy much closer than yours do, even with MaxRuns at its default value of 500. Maybe you're running on a different OS than I am - I'm using Redhat 6.2. I'm also running one rev lower of mod_perl in case that matters. > Hey Sam, nice module. I just installed your SpeedyCGI for a good 'ol > HelloWorld benchmark & it was a snap, well done. I'd like to add to the > numbers below that a fair benchmark would be between mod_proxy in front > of a mod_perl server and mod_speedycgi, as it would be a similar memory > saving model ( this is how we often scale mod_perl )... both models would > end up forwarding back to a smaller set of persistent perl interpreters. > > However, I did not do such a benchmark, so SpeedyCGI loses out a > bit for the extra layer it has to do :( This is based on the > suite at http://www.chamas.com/bench/hello.tar.gz, but I have not > included the speedy test in that yet.
> > -- Josh
> >
> > Test Name                      Test File  Hits/sec  Total Hits  Total Time  sec/Hits  Bytes/Hit
> > -----------------------------  ---------  --------  ----------  ----------  --------  ---------
> > Apache::Registry v2.01 CGI.pm  hello.cgi  451.9     27128 hits  60.03 sec   0.002213  216 bytes
> > Speedy CGI                     hello.cgi  375.2     22518 hits  60.02 sec   0.002665  216 bytes
> >
> > Apache Server Header Tokens
> > ---------------------------
> > (Unix) Apache/1.3.14 OpenSSL/0.9.6 PHP/4.0.3pl1 mod_perl/1.24 mod_ssl/2.7.1

=== From: "Les Mikesell" <lesmikesell@home.com> To: <speedycgi@newlug.org>, "Sam Horrocks" <sam@daemoninc.com> Cc: <perrin@primenet.com>, "mod_perl list" <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Sat, 6 Jan 2001 23:10:02 -0600 "Sam Horrocks" <sam@daemoninc.com> wrote: > > > Right, but this also points out how difficult it is to get mod_perl > > > tuning just right. My opinion is that the MRU design adapts more > > > dynamically to the load. > > > > How would this compare to apache's process management when > > using the front/back end approach? > > Same thing applies. The front/back end approach does not change the > fundamentals. It changes them drastically in the world of slow internet connections, but perhaps not much in artificial benchmarks or LAN use. I think you can reduce the problem to: How much time do you spend in non-perl apache code vs. how much time you spend in perl code. and the solution to: Only use the memory footprint of perl for the minimal time it is needed. If your I/O is slow and your program complexity minimal, the bulk of the wall-clock time is spent in i/o wait by non-perl apache code. Using a front-end proxy greatly reduces this time (and correspondingly the ratio of time spent in non-perl code) for the backend where it matters because you are tying up a copy of perl in memory.
Likewise, increasing the complexity of the perl code will reduce this ratio, reducing the potential for saving memory regardless of what you do, so benchmarking a trivial perl program will likely be misleading. > > > I'd agree that the size of one Speedy backend + one httpd would be the > > > same or even greater than the size of one mod_perl/httpd when no memory > > > is shared. But because the speedycgi httpds are small (no perl in them) > > > and the number of SpeedyCGI perl interpreters is small, the total memory > > > required is significantly smaller for the same load. > > > > Likewise, it would be helpful if you would always make the comparison > > to the dual httpd setup that is often used for busy sites. I think it must > > really boil down to the efficiency of your IPC vs. access to the full > > apache environment. > > The reason I don't include that comparison is that it's not fundamental > to the differences between mod_perl and speedycgi or LRU and MRU that > I have been trying to point out. Regardless of whether you add a > frontend or not, the mod_perl process selection remains LRU and the > speedycgi process selection remains MRU. I don't think I understand what you mean by LRU. When I view the Apache server-status with ExtendedStatus On, it appears that the backend server processes recycle themselves as soon as they are free instead of cycling sequentially through all the available processes. Did you mean to imply otherwise or are you talking about something else? === Date: Sat, 06 Jan 2001 23:51:34 -0800 From: Joshua Chamas <joshua@chamas.com> To: Sam Horrocks <sam@daemoninc.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Sam Horrocks wrote: > > A few things: > > - In your results, could you add the speedycgi version number (2.02), > and the fact that this is using the mod_speedycgi frontend. 
The version numbers are gathered at runtime, so for mod_speedycgi, this would get picked up if you registered it in the Apache server header that gets sent out. I'll list the test as mod_speedycgi. > The fork/exec frontend will be much slower on hello-world so I don't > want people to get the wrong idea. You may want to benchmark > the fork/exec version as well. > If it's slower then what's the point :) If mod_speedycgi is the faster way to run it, then that should be good enough, no? If you would like to contribute that test to the suite, please do so. > - You may be able to eke out a little more performance by setting > MaxRuns to 0 (infinite). This is set for mod_speedycgi using the > SpeedyMaxRuns directive, or on the command-line using "-r0". > This setting is similar to the MaxRequestsPerChild setting in apache. > Will do. > - My tests show mod_perl/speedy much closer than yours do, even with > MaxRuns at its default value of 500. Maybe you're running on > a different OS than I am - I'm using Redhat 6.2. I'm also running > one rev lower of mod_perl in case that matters. > I'm running the same thing, RH 6.2, I don't know if the mod_perl rev matters, but what often does matter is that I have 2 CPUs in my box, so my results often look different from other people's. === Date: Mon, 08 Jan 2001 09:50:22 -0600 From: "Keith G. Murphy" <keithmur@mindspring.com> To: mod_perl list <modperl@apache.org> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Les Mikesell wrote: > [cut] > > I don't think I understand what you mean by LRU. When I view the > Apache server-status with ExtendedStatus On, it appears that > the backend server processes recycle themselves as soon as they > are free instead of cycling sequentially through all the available > processes. Did you mean to imply otherwise or are you talking > about something else? > Be careful here.
Note my message earlier in the thread about the misleading effect of persistent connections (HTTP 1.1). Perrin Harkins noted in another thread that it had fooled him as well as me. Not saying that's what you're seeing, just take it into account. (Quick-and-dirty test: run Netscape as the client browser; do you still see the same thing?) === Date: Sun, 14 Jan 2001 12:40:00 +0800 To: Sam Horrocks <sam@daemoninc.com>, perrin@primenet.com From: Gunther Birznieks <gunther@extropia.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Cc: mod_perl list <modperl@apache.org>, speedycgi@newlug.org I have just gotten around to reading this thread I've been saving for a rainy day. Well, it's not rainy, but I'm finally getting to it. Apologies to those who hate it when people don't snip their reply mails, but I am including it so that the entire context is not lost. Sam (or others who may understand Sam's explanation), I am still confused by this explanation of MRU helping when there are 10 processes serving 10 requests at all times. I understand MRU helping when the processes are not at max, but I don't see how it helps when they are at max utilization. It seems to me that if the wait is the same for mod_perl backend processes and speedyCGI processes, that it doesn't matter if some of the speedycgi processes cycle earlier than the mod_perl ones because all 10 will always be used. I did read and reread (once) the snippets about modeling concurrency and the httpd waiting on an accept... But I still don't understand how MRU helps when all the processes would be in use anyway. At that point they all have an equal chance of being called. Could you clarify this with a simpler example? Maybe 4 processes and a sample timeline of what happens to those when there are enough requests to keep all 4 busy all the time for speedyCGI and a mod_perl backend?
=== To: Gunther Birznieks <gunther@extropia.com> cc: perrin@primenet.com, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Wed, 17 Jan 2001 03:19:46 -0800 From: Sam Horrocks <sam@daemoninc.com> I think the major problem is that you're assuming that just because there are 10 constant concurrent requests, that there have to be 10 perl processes serving those requests at all times in order to get maximum throughput. The problem with that assumption is that there is only one CPU - ten processes cannot all run simultaneously anyways, so you don't really need ten perl interpreters. I've been trying to think of better ways to explain this. I'll try to explain with an analogy - it's sort-of lame, but maybe it'll give you a mental picture of what's happening. To eliminate some confusion, this analogy doesn't address LRU/MRU, nor waiting on other events like network or disk i/o. It only tries to explain why you don't necessarily need 10 perl-interpreters to handle a stream of 10 concurrent requests on a single-CPU system. You own a fast-food restaurant. The players involved are: Your customers. These represent the http requests. Your cashiers. These represent the perl interpreters. Your cook. You only have one. This represents your CPU. The normal flow of events is this: A cashier gets an order from a customer. The cashier goes and waits until the cook is free, and then gives the order to the cook. The cook then cooks the meal, taking 5-minutes for each meal. The cashier waits for the meal to be ready, then takes the meal and gives it to the customer. The cashier then serves another customer. The cashier/customer interaction takes a very small amount of time. The analogy is this: An http request (customer) arrives. It is given to a perl interpreter (cashier).
A perl interpreter must wait for all other perl interpreters ahead of it to finish using the CPU (the cook). It can't serve any other requests until it finishes this one. When its turn arrives, the perl interpreter uses the CPU to process the perl code. It then finishes and gives the results over to the http client (the customer). Now, say in this analogy you begin the day with 10 customers in the store. At each 5-minute interval thereafter another customer arrives. So at time 0, there is a pool of 10 customers. At time +5, another customer arrives. At time +10, another customer arrives, ad infinitum. You could hire 10 cashiers in order to handle this load. What would happen is that the 10 cashiers would fairly quickly get all the orders from the first 10 customers simultaneously, and then start waiting for the cook. The 10 cashiers would queue up. Cashier #1 would put in the first order. Cashiers 2-10 would wait their turn. After 5-minutes, cashier number 1 would receive the meal, deliver it to customer #1, and then serve the next customer (#11) that just arrived at the 5-minute mark. Cashier #1 would take customer #11's order, then queue up and wait in line for the cook - there will be 9 other cashiers already in line, so the wait will be long. At the 10-minute mark, cashier #2 would receive a meal from the cook, deliver it to customer #2, then go on and serve the next customer (#12) that just arrived. Cashier #2 would then go and wait in line for the cook. This continues on through all the cashiers in order 1-10, then repeating, 1-10, ad infinitum. Now even though you have 10 cashiers, most of their time is spent waiting to put in an order to the cook. Starting with customer #11, all customers will wait 50-minutes for their meal. When customer #11 comes in he/she will immediately get to place an order, but it will take the cashier 45-minutes to wait for the cook to become free, and another 5-minutes for the meal to be cooked.
Same is true for customer #12, and all customers from then on. Now, the question is, could you get the same throughput with fewer cashiers? Say you had 2 cashiers instead. The 10 customers are there waiting. The 2 cashiers take orders from customers #1 and #2. Cashier #1 then gives the order to the cook and waits. Cashier #2 waits in line for the cook behind cashier #1. At the 5-minute mark, the first meal is done. Cashier #1 delivers the meal to customer #1, then serves customer #3. Cashier #1 then goes and stands in line behind cashier #2. At the 10-minute mark, cashier #2's meal is ready - it's delivered to customer #2 and then customer #4 is served. This continues on with the cashiers trading off between serving customers. Does the scenario with two cashiers go any more slowly than the one with 10 cashiers? No. When the 11th customer arrives at the 5-minute mark, what he/she sees is that customer #3 is just now putting in an order. There are 7 other people there waiting to put in orders. Customer #11 will wait 40 minutes until he/she puts in an order, then wait another 10 minutes for the meal to arrive. Same is true for customer #12, and all others arriving thereafter. The only difference between the two scenarios is the number of cashiers, and where the waiting is taking place. In the first scenario, each customer puts in their order immediately, then waits 50 minutes for it to arrive. In the second scenario each customer waits 40 minutes to put in their order, then waits another 10 minutes for it to arrive. What I'm trying to show with this analogy is that no matter how many "simultaneous" requests you have, they all have to be serialized at some point because you only have one CPU. Either you can serialize them before they get to the perl interpreter, or afterward. Either way you wait on the CPU, and you get the same throughput. Does that help?
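The analogy lends itself to a quick numerical check. Below is a sketch (my own illustration, not from the thread, using the analogy's numbers: ordering is instantaneous, one cook, 5-minute meals, 10 customers at opening plus one more every 5 minutes) confirming that every customer's completion time is identical whether there are 2 cashiers or 10:

```python
def completion_times(n_cashiers, n_customers=20, cook_time=5):
    """Finish time (minutes) of each customer's meal.

    One cook (the CPU) prepares meals strictly sequentially; a cashier
    (a perl interpreter) is tied up from taking an order until delivering
    the meal; taking the order itself costs no time.
    """
    # 10 customers at opening, then one more every 5 minutes.
    arrivals = [0] * 10 + [cook_time * i for i in range(1, n_customers - 9)]
    cashier_free = [0.0] * n_cashiers   # when each cashier is next available
    cook_free = 0.0                     # when the cook is next free
    done = []
    for arrive in arrivals:
        i = min(range(n_cashiers), key=lambda j: cashier_free[j])
        order = max(arrive, cashier_free[i])        # wait for a free cashier
        finish = max(order, cook_free) + cook_time  # then wait for the cook
        cook_free = finish
        cashier_free[i] = finish    # cashier delivers, then takes a new order
        done.append(finish)
    return done

# Two cashiers achieve exactly the same completion times as ten:
print(completion_times(2) == completion_times(10))  # True
```

The cook never idles in either case, so throughput is fixed by the cook alone; the cashier count only moves the waiting between the order queue and the cook queue, which is the point of the analogy.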
=== Date: Wed, 17 Jan 2001 23:05:13 +0800 To: Sam Horrocks <sam@daemoninc.com> From: Gunther Birznieks <gunther@extropia.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory I guess as I get older I start to slip technically. :) This helps me a bit, but it doesn't really help me understand the final argument (that MRU is still going to help on a fully loaded system). With some modification, I guess I am thinking that the cook is really the OS and the CPU is really the oven. But the hamburgers on an Intel oven have to be timesliced instead of left to cook and then after it's done the next hamburger is put on. So if we think of meals as Perl requests, the reality is that not all meals take the same amount of time to cook. A quarter pounder surely takes longer than your typical paper thin McDonald's Patty. The fact that a customer requests a meal that takes longer to cook than another one is relatively random. In fact in the real world, it is likely to be random. This means that it's possible for all 10 meals to be cooking but the 3rd meal gets done really fast, so another customer gets time sliced to use the oven for their meal -- which might be a long meal. In your testing, perhaps the problem is that you are benchmarking with a homogeneous process. So of course you are seeing this behavior that makes it look like serializing 10 connections is just the same wait as time slicing them and therefore an MRU algorithm works better (of course it works better, because you keep releasing the systems in order)... But in the world where the 3rd or 5th or 6th process may finish sooner and release sooner than others, then an MRU algorithm doesn't matter. And actually a process that finishes in 10 seconds shouldn't have to wait until a process that takes 30 seconds to complete has finished.
And all 10 interpreters are in use at the same time, serving all requests and randomly popping off the queue and starting again where no MRU or LRU algorithm will really help. It's all the same. Anyway, maybe I am still not really getting it. Even with the fast food analogy. Maybe it is time to throw in the network time and other variables that seemed to make a difference in Perrin's understanding of how you were approaching the explanation. I am now curious -- on a fully loaded system of max 10 processes, did you see that SpeedyCGI scaled better than mod_perl on your benchmarks? Or are we still just speculating? === Date: Wed, 17 Jan 2001 11:08:09 -0500 From: Buddy Lee Haystack <haystack@email.rentzone.org> To: Sam Horrocks <sam@daemoninc.com> Cc: Gunther Birznieks <gunther@extropia.com>, perrin@primenet.com, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory I have a wide assortment of queries on a site, some of which take several minutes to execute, while others execute in less than one second. If I understand this analogy correctly, I'd be better off with the current incarnation of mod_perl because there would be more cashiers around to serve the "quick cups of coffee" that many customers request at my diner. Is this correct? === To: Gunther Birznieks <gunther@extropia.com> cc: perrin@primenet.com, mod_perl list <modperl@apache.org>, speedycgi@newlug.org Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl withscripts that contain un-shared memory Date: Wed, 17 Jan 2001 15:37:00 -0800 From: Sam Horrocks <sam@daemoninc.com> > I guess as I get older I start to slip technically. :) This helps me a bit, > but it doesn't really help me understand the final argument (that MRU is > still going to help on a fully loaded system). > > With some modification, I guess I am thinking that the cook is really the > OS and the CPU is really the oven.
But the hamburgers on an Intel oven have > to be timesliced instead of left to cook and then after it's done the next > hamburger is put on. > > So if we think of meals as Perl requests, the reality is that not all meals > take the same amount of time to cook. A quarter pounder surely takes longer > than your typical paper thin McDonald's Patty. > > The fact that a customer requests a meal that takes longer to cook than > another one is relatively random. In fact in the real world, it is likely > to be random. This means that it's possible for all 10 meals to be cooking > but the 3rd meal gets done really fast, so another customer gets time > sliced to use the oven for their meal -- which might be a long meal. I don't like your mods to the analogy, because they don't model how a CPU actually works. Even if the cook == the OS and the oven == the CPU, the oven *must* work on tasks sequentially. If you look at the assembly language for your Intel CPU you won't see anything about it doing multi-tasking. It does adds, subtracts, stores, loads, jumps, etc. It executes code sequentially. You must model this somewhere in your analogy if it's going to be accurate. So I'll modify your analogy to say the oven can only cook one thing at a time. Now, what you could do is have the cook take one of the longer meals (the 10 minute meatloaf) out of the oven in order to cook something small, then put the meatloaf back later to finish cooking. But the oven does *not* cook things in parallel. Remember that things have to cook for a very long time before they get timesliced -- 210ms is a long time for a CPU, and that's the default timeslice on a Linux PC. If we say the oven cooks things sequentially, it doesn't really change the overall results that I had in the previous example. The cook just puts things in the oven sequentially, in the order in which they were received from the cashiers - this represents the run queue in the OS. 
But the cashiers still sit there and wait for the meals from the cook, and the cook just stands there waiting for the oven to cook meals sequentially.

> In your testing, perhaps the problem is that you are benchmarking with a homogeneous process. So of course you are seeing this behavior that makes it look like serializing 10 connections is just the same wait as time slicing them and therefore an MRU algorithm works better (of course it works better, because you keep releasing the systems in order)...
>
> But in the world where the 3rd or 5th or 6th process may finish sooner and release sooner than others, then an MRU algorithm doesn't matter. And actually a process that finishes in 10 seconds shouldn't have to wait until a process that takes 30 seconds to complete has finished.

No, homogeneity (or the lack of it) wouldn't make a difference. Those 3rd, 5th or 6th processes run only *after* the 1st and 2nd have finished using the CPU. And at that point you could re-use those interpreters that 1 and 2 were using.

> And all 10 interpreters are in use at the same time, serving all requests and randomly popping off the queue and starting again where no MRU or LRU algorithm will really help. It's all the same.

If in both the MRU/LRU case there were exactly 10 interpreters busy at all times, then you're right it wouldn't matter. But don't confuse the issues - 10 concurrent requests do *not* necessarily require 10 concurrent interpreters. The MRU has an effect on the way a stream of 10 concurrent requests is handled, and MRU results in those same requests being handled by fewer interpreters.

> Anyway, maybe I am still not really getting it. Even with the fast food analogy. Maybe it is time to throw in the network time and other variables that seemed to make a difference in Perrin's understanding of how you were approaching the explanation.

Please again take a look at the first analogy. The CPU can't do multi-tasking.
Until that gets straightened out, I don't think adding more to the analogy will help. Also, I think the analogy is about to break - that's why I put in extra disclaimers at the top. It was only intended to show that 10 concurrent requests don't necessarily require 10 perl interpreters in order to achieve maximum throughput.

> I am now curious -- on a fully loaded system of max 10 processes, did you see that SpeedyCGI scaled better than mod_perl on your benchmarks? Or are we still just speculating?

It is actually possible to benchmark. Given the same concurrent load and the same number of httpds running, speedycgi will use fewer perl interpreters than mod_perl. This will usually result in speedycgi using less RAM, except under light loads, or if the amount of shared memory is extremely large. If the total amount of RAM used by the mod_perl interpreters is high enough, your system will start paging, and your performance will nosedive. Given the same load, speedycgi will just maintain the same performance because it's using less RAM.

The thing is that if you know ahead of time what your load is going to be in the benchmark, you can reduce the number of httpd's so that mod_perl handles it with the same number of interpreters as speedycgi does. But how realistic that is in the real world, I don't know. With speedycgi it just sort of adapts to the load automatically. Maybe it would be possible to come up with a better benchmark that varies the load to show how speedycgi adapts better.

Here are my results (perl == mod_perl, speedy == mod_speedycgi):

*
* Benchmarking perl
*
 3:05pm  up 5 min,  3 users,  load average: 0.04, 0.26, 0.15
This is ApacheBench, Version 1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient)...
Server Software:        Apache/1.3.9
Server Hostname:        localhost
Server Port:            80

Document Path:          /perl/hello_world
Document Length:        11 bytes

Concurrency Level:      300
Time taken for tests:   30.022 seconds
Complete requests:      2409
Failed requests:        0
Total transferred:      411939 bytes
HTML transferred:       26499 bytes
Requests per second:    80.24
Transfer rate:          13.72 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0   572 21675
Processing:    30  1201  8301
Total:         30  1773 29976

This is ApacheBench, Version 1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient)...

Server Software:        Apache/1.3.9
Server Hostname:        localhost
Server Port:            80

Document Path:          /perl/hello_world
Document Length:        11 bytes

Concurrency Level:      300
Time taken for tests:   41.872 seconds
Complete requests:      524
Failed requests:        0
Total transferred:      98496 bytes
HTML transferred:       6336 bytes
Requests per second:    12.51
Transfer rate:          2.35 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:       70  1679  8864
Processing:   300  7209 14728
Total:        370  8888 23592

*
* Benchmarking speedy
*
 3:14pm  up 3 min,  3 users,  load average: 0.14, 0.31, 0.15
This is ApacheBench, Version 1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient)...
Server Software:        Apache/1.3.9
Server Hostname:        localhost
Server Port:            80

Document Path:          /speedy/hello_world
Document Length:        11 bytes

Concurrency Level:      300
Time taken for tests:   30.175 seconds
Complete requests:      6135
Failed requests:        0
Total transferred:      1060713 bytes
HTML transferred:       68233 bytes
Requests per second:    203.31
Transfer rate:          35.15 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0   179  9122
Processing:    12   341  5710
Total:         12   520 14832

This is ApacheBench, Version 1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-1999 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient)...

Server Software:        Apache/1.3.9
Server Hostname:        localhost
Server Port:            80

Document Path:          /speedy/hello_world
Document Length:        11 bytes

Concurrency Level:      300
Time taken for tests:   30.327 seconds
Complete requests:      7034
Failed requests:        0
Total transferred:      1221795 bytes
HTML transferred:       78595 bytes
Requests per second:    231.94
Transfer rate:          40.29 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        0   237  9336
Processing:   215   405 12012
Total:        215   642 21348

Here's the hello_world script:

#!/usr/bin/speedy
## mod_perl/cgi program; iis/perl cgi; iis/perl isapi cgi
use CGI;
$x = 'x' x 65536;
my $cgi = CGI->new();
print $cgi->header();
print "Hello ";
print "World";

Here's the script I used to run the benchmarks:

#!/bin/sh
which=$1
echo "*"
echo "* Benchmarking $which"
echo "*"
uptime
httpd
sleep 5
ab -t 30 -c 300 http://localhost/$which/hello_world
ab -t 30 -c 300 http://localhost/$which/hello_world

Before running each test, I rebooted my system.
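As a sanity check on the figures above (my own arithmetic, not part of Sam's post), the reported requests-per-second numbers follow directly from complete requests divided by test duration:

```python
# Recompute ab's "Requests per second" from the raw figures above.
runs = {
    "perl run 1":   (2409, 30.022),
    "perl run 2":   (524, 41.872),
    "speedy run 1": (6135, 30.175),
    "speedy run 2": (7034, 30.327),
}
for name, (requests, seconds) in runs.items():
    print(f"{name}: {requests / seconds:.2f} requests/sec")
```

The second perl run collapses to about 12.5 requests/sec (consistent with the paging Sam describes once the interpreters exhaust RAM), while both speedy runs stay above 200 requests/sec.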
Here's the software installed:

angel: {139}# rpm -q -a |egrep -i 'mod_perl|speedy|apache'
apache-1.3.9-4
speedycgi-2.02-1
apache-devel-1.3.9-4
speedycgi-apache-2.02-1
mod_perl-1.21-2

Here are some relevant parameters from my httpd.conf:

MinSpareServers 8
MaxSpareServers 20
StartServers 10
MaxClients 150
MaxRequestsPerChild 10000
SpeedyMaxRuns 0

===

To: haystack@email.rentzone.org
cc: Gunther Birznieks <gunther@extropia.com>, perrin@primenet.com, mod_perl list <modperl@apache.org>, speedycgi@newlug.org
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Date: Wed, 17 Jan 2001 15:43:18 -0800
From: Sam Horrocks <sam@daemoninc.com>

> I have a wide assortment of queries on a site, some of which take several minutes to execute, while others execute in less than one second. If I understand this analogy correctly, I'd be better off with the current incarnation of mod_perl because there would be more cashiers around to serve the "quick cups of coffee" that many customers request at my diner.

There is no coffee. Only meals. No substitutions. :-)

If we added coffee to the menu it would still have to be prepared by the cook. Remember that you only have one CPU, and all the perl interpreters large and small must gain access to that CPU in order to run.

===

Date: Wed, 17 Jan 2001 15:55:52 -0800 (PST)
From: Perrin Harkins <perrin@primenet.com>
To: Sam Horrocks <sam@daemoninc.com>
cc: Gunther Birznieks <gunther@extropia.com>, mod_perl list <modperl@apache.org>, speedycgi@newlug.org
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

On Wed, 17 Jan 2001, Sam Horrocks wrote:

> If in both the MRU/LRU case there were exactly 10 interpreters busy at all times, then you're right it wouldn't matter. But don't confuse the issues - 10 concurrent requests do *not* necessarily require 10
> concurrent interpreters. The MRU has an effect on the way a stream of 10 concurrent requests is handled, and MRU results in those same requests being handled by fewer interpreters.

On a side note, I'm curious about how Apache decides that child processes are unused and can be killed off. The spawning of new processes is pretty aggressive on a busy server, but if the server reaches a steady state and some processes aren't needed they should be killed off. Maybe no one has bothered to make that part very efficient, since in normal circumstances most users would prefer to have extra processes waiting around than not have enough to handle a surge and have to spawn a whole bunch.

===

Date: Thu, 18 Jan 2001 03:02:11 +0100
To: Sam Horrocks <sam@daemoninc.com>
From: Christian Jaeger <christian.jaeger@sl.ethz.ch>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Hello Sam and others

If I haven't overlooked something, nobody so far really mentioned fastcgi. I'm asking myself why you reinvented the wheel. I summarize the differences I see:

+ perl scripts are more similar to standard CGI ones than with FastCGI (downside: see next point)
- it seems you can't control the request loop yourself
+ protocol is more free than the one of FastCGI (is it?)
- protocol isn't widespread (almost standard) like the one of FastCGI
- seems only to support perl (so far)
- doesn't seem to support external servers (on other machines) like FastCGI (does it?)

Question: does speedycgi run a separate interpreter for each script, or is there one process loading and calling several perl scripts? If it's a separate process for each script, then mod_perl is sure to use less memory.

As far as I understand, IF you can collect several scripts together into one interpreter and IF you do preforking, I don't see essential performance-related differences between mod_perl and speedy/fastcgi if you set up mod_perl with the proxy approach.
With mod_perl the protocol to the backends is http, with speedy it's speedy and with fastcgi it's the fastcgi protocol. (The difference between mod_perl and fastcgi is that fastcgi uses a request loop, whereas mod_perl has its handlers (sorry, I never really used mod_perl so I don't know exactly).)

I think it's a pity that during the last years there was such little interest/support for fastcgi, and now that should change with speedycgi. But why not, if the stuff that people develop can run on both and speedy is/becomes better than fastcgi.

I'm developing a web application framework (called 'Eile', you can see some outdated documentation on testwww.ethz.ch/eile, I will release a new much better version soon) which currently uses fastcgi. If I can get it to run with speedycgi, I'll be glad to release it with support for both protocols. I haven't looked very closely at it yet. One of the problems seems to be that I really depend on controlling the request loop (initialization, preforking etc. all have to be done before the application begins serving requests, and I'm also controlling exits of children myself). If you're interested in helping me solve these issues please contact me privately. The main advantages of Eile concerning resources are a) one process/interpreter runs dozens of 'scripts' (called page-processing modules), and you don't have to dispatch requests to each of them yourself, and b) my new version does preforking.

===

To: mod_perl list <modperl@apache.org>
From: Stephen Anderson <Stephen.Anderson@energis-squared.com>
Subject: RE: Fwd: [speedycgi] Speedycgi scales better than mod_perl withsc
Date: Thu, 18 Jan 2001 11:20:49 -0000

Sam Horrocks [mailto:sam@daemoninc.com] wrote:

> > With some modification, I guess I am thinking that the cook is really the OS and the CPU is really the oven. But the hamburgers on an Intel oven have to be timesliced instead of left to cook and then after it's done the next hamburger is put on.
> > So if we think of meals as Perl requests, the reality is that not all meals take the same amount of time to cook. A quarter pounder surely takes longer than your typical paper-thin McDonald's patty.

[snip]

> I don't like your mods to the analogy, because they don't model how a CPU actually works. Even if the cook == the OS and the oven == the CPU, the oven *must* work on tasks sequentially. If you look at the assembly language for your Intel CPU you won't see anything about it doing multi-tasking. It does adds, subtracts, stores, loads, jumps, etc. It executes code sequentially. You must model this somewhere in your analogy if it's going to be accurate.

(I think the analogies have lost their usefulness....)

This doesn't affect the argument, because the core of it is that:

a) the CPU will not completely process a single task all at once; instead, it will divide its time _between_ the tasks
b) tasks do not arrive at regular intervals
c) tasks take varying amounts of time to complete

Now, if (a) were true but (b) and (c) were not, then, yes, it would have the same effective result as sequential processing. Tasks that arrived first would finish first. In the real world, however, (b) and (c) are usually true, and it becomes practically impossible to predict which task handler (in this case, a mod_perl process) will complete first.

Similarly, because of the non-deterministic nature of computer systems, Apache doesn't service requests on an LRU basis; you're comparing SpeedyCGI against a straw man. Apache's servicing algorithm approaches randomness, so you need to build a comparison between forced-MRU and random choice. (Note I'm not saying SpeedyCGI _won't_ win....just that the current comparison doesn't make sense.)

Thinking about it, assuming you are, at some time, servicing requests _below_ system capacity, SpeedyCGI will always win in memory usage, and probably have an edge in handling response time.
My concern would be, does it offer _enough_ of an edge? Especially bearing in mind, if I understand, you could end up running anywhere up to 2x as many processes (n Apache handlers + n script handlers)?

> No, homogeneity (or the lack of it) wouldn't make a difference. Those 3rd, 5th or 6th processes run only *after* the 1st and 2nd have finished using the CPU. And at that point you could re-use those interpreters that 1 and 2 were using.

This, if you'll excuse me, is quite clearly wrong. See the above argument, and imagine that tasks 1 and 2 happen to take three times as long to complete as 3, and you should see that they could all end up being in the scheduling queue together. Perhaps you're considering tasks which are too small to take more than 1 or 2 timeslices, in which case, you're much less likely to want to accelerate them.

[snipping obscenely long quoted thread 8-)]

Stephen.

===

To: speedycgi@newlug.org
From: Sam Horrocks <sam@daemoninc.com>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Date: Thu, 18 Jan 2001 20:38:48 -0800

> This doesn't affect the argument, because the core of it is that:
>
> a) the CPU will not completely process a single task all at once; instead, it will divide its time _between_ the tasks
> b) tasks do not arrive at regular intervals
> c) tasks take varying amounts of time to complete
>
> Now, if (a) were true but (b) and (c) were not, then, yes, it would have the same effective result as sequential processing. Tasks that arrived first would finish first. In the real world, however, (b) and (c) are usually true, and it becomes practically impossible to predict which task handler (in this case, a mod_perl process) will complete first.

I'll agree with (b) and (c) - I ignored them to keep my analogy as simple as possible.
Again, the goal of my analogy was to show that a stream of 10 concurrent requests can be handled with the same throughput by a lot fewer than 10 perl interpreters. (b) and (c) don't really have an effect on that - they don't control the order in which processes arrive and get queued up for the CPU.

I won't agree with (a) unless you qualify it further - what do you claim is the method or policy for (a)? There's only one run queue in the kernel. The first task ready to run is put at the head of that queue, and anything arriving afterwards waits. Only if that first task blocks on a resource or takes a very long time, or a higher priority process becomes able to run due to an interrupt, is that process taken out of the queue.

It is inefficient for the unix kernel to be constantly switching very quickly from process to process, because it takes time to do context switches. Also, unless the processes share the same memory, some amount of the processor cache can get flushed when you switch processes because you're changing to a different set of memory pages. That's why it's best for overall throughput if the kernel keeps a single process running as long as it can.

> Similarly, because of the non-deterministic nature of computer systems, Apache doesn't service requests on an LRU basis; you're comparing SpeedyCGI against a straw man. Apache's servicing algorithm approaches randomness, so you need to build a comparison between forced-MRU and random choice.

Apache httpd's are scheduled on an LRU basis. This was discussed early in this thread. Apache uses a file-lock for its mutex around the accept call, and file-locking is implemented in the kernel using a round-robin (fair) selection in order to prevent starvation. This results in incoming requests being assigned to httpd's in an LRU fashion.
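Abstractly, the LRU-versus-MRU difference being argued here comes down to whether the pool of idle workers behaves as a queue or a stack. A minimal sketch (my own illustration, not code from either Apache or SpeedyCGI) of a request stream that a single CPU serializes:

```python
from collections import deque

def serve(policy, pool_size=10, requests=100):
    """Serve a stream of requests that a single CPU serializes: each
    request runs to completion before the next one is assigned.
    'lru' takes the longest-idle worker (FIFO, like round-robin
    file-lock grants); 'mru' takes the most recently used one (LIFO).
    Returns how many distinct workers ever handled a request."""
    idle = deque(range(pool_size))   # pre-spawned workers, all idle
    used = set()
    for _ in range(requests):
        worker = idle.popleft() if policy == 'lru' else idle.pop()
        used.add(worker)             # this worker now holds un-shared memory
        idle.append(worker)          # request finishes; back to the pool
    return len(used)

print(serve('lru'))   # LRU cycles through all 10 workers -> 10
print(serve('mru'))   # MRU keeps re-using one worker     -> 1
```

Under memory pressure the difference matters because every worker ever used has accumulated request-private (un-shared) pages, so the LRU pool pays for all ten interpreters while the MRU pool pays for one.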
Once the httpd's get into the kernel's run queue, they finish in the same order they were put there, unless they block on a resource, get timesliced or are pre-empted by a higher priority process.

> Thinking about it, assuming you are, at some time, servicing requests _below_ system capacity, SpeedyCGI will always win in memory usage, and probably have an edge in handling response time. My concern would be, does it offer _enough_ of an edge? Especially bearing in mind, if I understand, you could end up running anywhere up to 2x as many processes (n Apache handlers + n script handlers)?

Try it and see. I'm sure you'll run more processes with speedycgi, but you'll probably run a whole lot fewer perl interpreters and need less RAM.

Remember that the httpd's in the speedycgi case will have very little un-shared memory, because they don't have perl interpreters in them. So the processes are fairly indistinguishable, and the LRU isn't as big a penalty in that case.

This is why the original designers of Apache thought it was safe to create so many httpd's. If they all have the same (shared) memory, then creating a lot of them does not have much of a penalty. mod_perl applications throw a big monkey wrench into this design when they add a lot of unshared memory to the httpd's.

> > No, homogeneity (or the lack of it) wouldn't make a difference. Those 3rd, 5th or 6th processes run only *after* the 1st and 2nd have finished using the CPU. And at that point you could re-use those interpreters that 1 and 2 were using.
>
> This, if you'll excuse me, is quite clearly wrong. See the above argument, and imagine that tasks 1 and 2 happen to take three times as long to complete as 3, and you should see that they could all end up being in the scheduling queue together. Perhaps you're considering tasks which are too small to take more than 1 or 2 timeslices, in which case, you're much less likely to want to accelerate them.
So far, to keep things fairly simple, I've assumed each request takes less than one timeslice to run. A timeslice is fairly long on a Linux PC (210ms). But say they take two slices, and interpreters 1 and 2 get pre-empted and go back into the queue. So then requests 5/6 in the queue have to use other interpreters, and you expand the number of interpreters in use. But still, you'll wind up using the smallest number of interpreters required for the given load and timeslice. As soon as those 1st and 2nd perl interpreters finish their run, they go back at the beginning of the queue, and the 7th/8th or later requests can then use them, etc. Now you have a pool of maybe four interpreters, all being used on an MRU basis. But it won't expand beyond that set unless your load goes up or your program's CPU time requirements increase beyond another timeslice. MRU will ensure that whatever the number of interpreters in use, it is the lowest possible, given the load, the CPU-time required by the program and the size of the timeslice.

===

To: speedycgi@newlug.org
From: Sam Horrocks <sam@daemoninc.com>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Date: Thu, 18 Jan 2001 21:30:28 -0800

> Hello Sam and others
>
> If I haven't overlooked something, nobody so far really mentioned fastcgi. I'm asking myself why you reinvented the wheel. I summarize the differences I see:
>
> + perl scripts are more similar to standard CGI ones than with FastCGI (downside: see next point)

Agree.

> - it seems you can't control the request loop yourself

Yes, but what do you do with this control in fastcgi? Maybe you can do the same thing in speedycgi in a different way?

> + protocol is more free than the one of FastCGI (is it?)

I'm not sure what you mean by "more free".

> - protocol isn't widespread (almost standard) like the one of FastCGI

Correct. The speedycgi protocol has changed many times, and is documented only in the C files.
> - seems only to support perl (so far)
> - doesn't seem to support external servers (on other machines) like FastCGI (does it?)

Correct. Correct.

I'll add the following pluses for speedycgi:

+ Starts up/shuts down perl processes automatically depending on the load. Users don't have to get involved at all in starting/stopping processes.
+ Assigns requests to processes on an MRU basis

Don't know if these are also true now for fastcgi.

> Question: does speedycgi run a separate interpreter for each script, or is there one process loading and calling several perl scripts?

Currently one process == one script. I'm almost done with a version that allows multiple scripts in one process.

> If it's a separate process for each script, then mod_perl is sure to use less memory.

Depends on the number of scripts you have running at once. And you have to factor in the whole LRU/shared-memory problem in mod_perl, which is where this thread originally started.

> As far as I understand, IF you can collect several scripts together into one interpreter and IF you do preforking, I don't see essential performance-related differences between mod_perl and speedy/fastcgi if you set up mod_perl with the proxy approach.

There is the way speedy assigns requests to handlers that is different. Plus speedy runs the perl processes outside the web-server. And speedy has a CGI-only mode totally outside the webserver.

> I think it's a pity that during the last years there was such little interest/support for fastcgi, and now that should change with speedycgi. But why not, if the stuff that people develop can run on both and speedy is/becomes better than fastcgi.

I think people should use whichever one is best for their application. I'm trying to make speedy as good as possible given the time I can put into it. And I'm trying to communicate to people what it can do. Beyond that it's up to people to decide which one they want to use.
As I've mentioned on the speedycgi list, I'd like to see some sort of persistent-perl API developed so that people could write to that API, then run their script under different persistent-perl environments without changes. I think that would be better than porting scripts and modules to every persistent perl environment out there.

> I'm developing a web application framework (called 'Eile', you can see some outdated documentation on testwww.ethz.ch/eile, I will release a new much better version soon) which currently uses fastcgi. If I can get it to run with speedycgi, I'll be glad to release it with support for both protocols. I haven't looked very closely at it yet. One of the problems seems to be that I really depend on controlling the request loop (initialization, preforking etc. all have to be done before the application begins serving requests, and I'm also controlling exits of children myself). If you're interested in helping me solve these issues please contact me privately. The main advantages of Eile concerning resources are a) one process/interpreter runs dozens of 'scripts' (called page-processing modules), and you don't have to dispatch requests to each of them yourself, and b) my new version does preforking.

I think I can help - I'll send a private message.

===

To: <speedycgi@newlug.org>, "Sam Horrocks" <sam@daemoninc.com>
From: "Les Mikesell" <lesmikesell@home.com>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory
Date: Fri, 19 Jan 2001 00:13:56 -0600

"Sam Horrocks" <sam@daemoninc.com> wrote:

> There's only one run queue in the kernel. The first task ready to run is put at the head of that queue, and anything arriving afterwards waits. Only if that first task blocks on a resource or takes a very long time, or a higher priority process becomes able to run due to an interrupt, is that process taken out of the queue.
Note that any I/O request that isn't completely handled by buffers will trigger the 'blocks on a resource' clause above, which means that jobs doing any real work will complete in an order determined by something other than the cpu and not strictly serialized. Also, most of my web servers are dual-cpu so even cpu-bound processes may complete out of order.

> > Similarly, because of the non-deterministic nature of computer systems, Apache doesn't service requests on an LRU basis; you're comparing SpeedyCGI against a straw man. Apache's servicing algorithm approaches randomness, so you need to build a comparison between forced-MRU and random choice.
>
> Apache httpd's are scheduled on an LRU basis. This was discussed early in this thread. Apache uses a file-lock for its mutex around the accept call, and file-locking is implemented in the kernel using a round-robin (fair) selection in order to prevent starvation. This results in incoming requests being assigned to httpd's in an LRU fashion.

But, if you are running a front/back end apache with a small number of spare servers configured on the back end there really won't be any idle perl processes during the busy times you care about. That is, the backends will all be running or apache will shut them down and there won't be any difference between MRU and LRU (the difference would be which idle process waits longer - if none are idle there is no difference).

> Once the httpd's get into the kernel's run queue, they finish in the same order they were put there, unless they block on a resource, get timesliced or are pre-empted by a higher priority process.

Which means they don't finish in the same order if (a) you have more than one cpu, (b) they do any I/O (including delivering the output back, which they all do), or (c) some of them run long enough to consume a timeslice.

> Try it and see.
> I'm sure you'll run more processes with speedycgi, but you'll probably run a whole lot fewer perl interpreters and need less RAM.

Do you have a benchmark that does some real work (at least a dbm lookup) to compare against a front/back end mod_perl setup?

> Remember that the httpd's in the speedycgi case will have very little un-shared memory, because they don't have perl interpreters in them. So the processes are fairly indistinguishable, and the LRU isn't as big a penalty in that case.
>
> This is why the original designers of Apache thought it was safe to create so many httpd's. If they all have the same (shared) memory, then creating a lot of them does not have much of a penalty. mod_perl applications throw a big monkey wrench into this design when they add a lot of unshared memory to the httpd's.

This is part of the reason the front/back end mod_perl configuration works well, keeping the backend numbers low. The real win when serving over the internet, though, is that the perl memory is no longer tied up while delivering the output back over frequently slow connections.

===

To: Sam Horrocks <sam@daemoninc.com>
From: Perrin Harkins <perrin@primenet.com>
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts
Date: Fri, 19 Jan 2001 01:52:26 -0800

Sam Horrocks wrote:

> say they take two slices, and interpreters 1 and 2 get pre-empted and go back into the queue. So then requests 5/6 in the queue have to use other interpreters, and you expand the number of interpreters in use. But still, you'll wind up using the smallest number of interpreters required for the given load and timeslice. As soon as those 1st and 2nd perl interpreters finish their run, they go back at the beginning of the queue, and the 7th/8th or later requests can then use them, etc. Now you have a pool of maybe four interpreters, all being used on an MRU basis.
> But it won't expand beyond that set unless your load goes up or your program's CPU time requirements increase beyond another timeslice. MRU will ensure that whatever the number of interpreters in use, it is the lowest possible, given the load, the CPU-time required by the program and the size of the timeslice.

You know, I had a brief look through some of the SpeedyCGI code yesterday, and I think the MRU process selection might be a bit of a red herring. I think the real reason Speedy won the memory test is the way it spawns processes.

If I understand what's going on in Apache's source, once every second it has a look at the scoreboard and says "less than MinSpareServers are idle, so I'll start more" or "more than MaxSpareServers are idle, so I'll kill one". It only kills one per second. It starts by spawning one, but the number spawned goes up exponentially each time it sees there are still not enough idle servers, until it hits 32 per second. It's easy to see how this could result in spawning too many in response to sudden load, and then taking a long time to clear out the unnecessary ones.

In contrast, Speedy checks on every request to see if there are enough backends running. If there aren't, it spawns more until there are as many backends as queued requests. That means it never overshoots the mark.

Going back to your example up above, if Apache actually controlled the number of processes tightly enough to prevent building up idle servers, it wouldn't really matter much how processes were selected. If after the 1st and 2nd interpreters finish their run they went to the end of the queue instead of the beginning of it, that simply means they will sit idle until called for instead of some other two processes sitting idle until called for. If the systems were both efficient enough about spawning to only create as many interpreters as needed, none of them would be sitting idle and memory usage would always be as low as possible.
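Perrin's description of the spawning logic lends itself to a toy model. This is my own simplification (a hypothetical function, not Apache's actual code), using only the constants from his post: an exponential spawn ramp capped at 32 per second, and one kill per second when idle servers exceed MaxSpareServers:

```python
def apache_servers(demand, min_spare=8, max_spare=20, start=10):
    """Toy model of Apache 1.3's once-per-second maintenance loop as
    Perrin describes it: spawn 1, 2, 4, ... (capped at 32) new servers
    per check while idle servers are below MinSpareServers, but kill
    only one per check when idle servers exceed MaxSpareServers.
    `demand` gives concurrent requests at each one-second tick."""
    servers, burst, history = start, 1, []
    for want in demand:
        busy = min(want, servers)        # requests beyond capacity queue up
        idle = servers - busy
        if idle < min_spare:
            servers += burst
            burst = min(burst * 2, 32)   # exponential ramp, capped at 32/sec
        else:
            burst = 1
            if idle > max_spare:
                servers -= 1             # excess servers die one per second
        history.append(servers)
    return history

# A brief spike: 30 concurrent requests for 4 seconds, then back to 2.
print(apache_servers([30] * 4 + [2] * 26, min_spare=5, max_spare=10))
```

Feeding it that spike shows the asymmetry he describes: the server count climbs 11, 13, 17, 25 within four seconds, then needs more than ten seconds of one-per-second kills to drift back down. A per-request spawner like the one he attributes to Speedy would instead track the demand curve directly and never overshoot.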
I don't know if I'm explaining this very well, but the gist of my theory is that at any given time both systems will require an equal number of in-use interpreters to do an equal amount of work, and the differentiator between the two is Apache's relatively poor estimate of how many processes should be available at any given time. I think this theory matches up nicely with the results of Sam's tests: when MaxClients prevents Apache from spawning too many processes, both systems have similar performance characteristics. There are some knobs to twiddle in Apache's source if anyone is interested in playing with it. You can change the frequency of the checks and the maximum number of servers spawned per check. I don't have much motivation to do this investigation myself, since I've already tuned our MaxClients and process size constraints to prevent problems with our application. === To: "Les Mikesell" <lesmikesell@home.com> From: Sam Horrocks <sam@daemoninc.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Date: Fri, 19 Jan 2001 03:47:21 -0800 > > There's only one run queue in the kernel. The first task ready to run is > put > > at the head of that queue, and anything arriving afterwards waits. Only > > if that first task blocks on a resource or takes a very long time, or > > a higher priority process becomes able to run due to an interrupt is that > > process taken out of the queue. > > Note that any I/O request that isn't completely handled by buffers will > trigger the 'blocks on a resource' clause above, which means that > jobs doing any real work will complete in an order determined by > something other than the cpu and not strictly serialized. Also, most > of my web servers are dual-cpu so even cpu bound processes may > complete out of order. I think it's much easier to visualize how MRU helps when you look at one thing running at a time.
And MRU works best when every process runs to completion instead of blocking, etc. But even if the process gets timesliced, blocked, etc, MRU still degrades gracefully. You'll get more processes in use, but still the numbers will remain small. > > > Similarly, because of the non-deterministic nature of computer systems, > > > Apache doesn't service requests on an LRU basis; you're comparing > SpeedyCGI > > > against a straw man. Apache's servicing algorithm approaches randomness, > so > > > you need to build a comparison between forced-MRU and random choice. > > > > Apache httpd's are scheduled on an LRU basis. This was discussed early > > in this thread. Apache uses a file-lock for its mutex around the accept > > call, and file-locking is implemented in the kernel using a round-robin > > (fair) selection in order to prevent starvation. This results in > > incoming requests being assigned to httpd's in an LRU fashion. > > But, if you are running a front/back end apache with a small number > of spare servers configured on the back end there really won't be > any idle perl processes during the busy times you care about. That > is, the backends will all be running or apache will shut them down > and there won't be any difference between MRU and LRU (the > difference would be which idle process waits longer - if none are > idle there is no difference). If you can tune it just right so you never run out of ram, then I think you could get the same performance as MRU on something like hello-world. > > Once the httpd's get into the kernel's run queue, they finish in the > > same order they were put there, unless they block on a resource, get > > timesliced or are pre-empted by a higher priority process. > > Which means they don't finish in the same order if (a) you have > more than one cpu, (b) they do any I/O (including delivering the > output back which they all do), or (c) some of them run long enough > to consume a timeslice. > > > Try it and see.
I'm sure you'll run more processes with speedycgi, but > > you'll probably run a whole lot fewer perl interpreters and need less ram. > > Do you have a benchmark that does some real work (at least a dbm > lookup) to compare against a front/back end mod_perl setup? No, but if you send me one, I'll run it. === To: "'Sam Horrocks'" <sam@daemoninc.com>, speedycgi@newlug.org From: Stephen Anderson <Stephen.Anderson@energis-squared.com> Subject: RE: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts Date: Fri, 19 Jan 2001 12:09:35 -0000 > > This doesn't affect the argument, because the core of it is that: > > > > a) the CPU will not completely process a single task all > at once; instead, > > it will divide its time _between_ the tasks > > b) tasks do not arrive at regular intervals > > c) tasks take varying amounts of time to complete > > [snip] > I won't agree with (a) unless you qualify it further - what > do you claim > is the method or policy for (a)? I think this has been answered ... basically, resource conflicts (including I/O), interrupts, long running tasks, higher priority tasks, and, of course, the process yielding, can all cause the CPU to switch processes (which of these qualify depends very much on the OS in question). This is why, despite the efficiency of single-task running, you can usefully run more than one process on a UNIX system. Otherwise, if you ran a single Apache process and had no traffic, you couldn't run a shell at the same time - Apache would consume practically all your CPU in its select() loop 8-) > Apache httpd's are scheduled on an LRU basis. This was > discussed early > in this thread. Apache uses a file-lock for its mutex > around the accept > call, and file-locking is implemented in the kernel using a > round-robin > (fair) selection in order to prevent starvation. This results in > incoming requests being assigned to httpd's in an LRU fashion.
I'll apologise, and say, yes, of course you're right, but I do have a query: There are (IIRC) 5 methods that Apache uses to serialize requests: fcntl(), flock(), Sys V semaphores, uslock (IRIX only) and Pthreads (reliably only on Solaris). Do they _all_ result in LRU? > Remember that the httpd's in the speedycgi case will have very little > un-shared memory, because they don't have perl interpreters in them. > So the processes are fairly indistinguishable, and the LRU isn't as > big a penalty in that case. Yessss...._but_, interpreter for interpreter, won't the equivalent speedycgi have roughly as much unshared memory as the mod_perl one? I've had a lot of (dumb) discussions with people who complain about the size of Apache+mod_perl without realising that the interpreter code's all shared, and with pre-loading a lot of the perl code can be too. While I _can_ see speedycgi having an advantage (because it's got a much better overview of what's happening, and can intelligently manage the situation), I don't think it's as large as you're suggesting. I think this needs to be intensively benchmarked to answer that.... > other interpreters, and you expand the number of interpreters in use. > But still, you'll wind up using the smallest number of interpreters > required for the given load and timeslice. As soon as those 1st and > 2nd perl interpreters finish their run, they go back at the beginning > of the queue, and the 7th/8th or later requests can then > use them, etc. > Now you have a pool of maybe four interpreters, all being > used on an MRU > basis. But it won't expand beyond that set unless your load > goes up or > your program's CPU time requirements increase beyond another > timeslice. > MRU will ensure that whatever the number of interpreters in use, it > is the lowest possible, given the load, the CPU-time required by the > program and the size of the timeslice. Yep...no arguments here. SpeedyCGI should result in fewer interpreters.
I will say that there are a lot of convincing reasons to follow the SpeedyCGI model rather than the mod_perl model, but I've generally thought that the increase in that kind of performance that can be obtained is sufficiently minimal as to not warrant the extra layer... thoughts, anyone? Stephen. === To: <modperl@apache.org> From: Matt Sergeant <matt@sergeant.org> Subject: RE: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts Date: Fri, 19 Jan 2001 12:14:45 +0000 (GMT) There seems to be a lot of talk here, and analogies, and zero real-world benchmarking. Now it seems to me from reading this thread, that speedycgi would be better where you run 1 script, or only a few scripts, and mod_perl might win where you have a large application with hundreds of different URLs with different code being executed on each. That may change with the next release of speedy, but then lots of things will change with the next major release of mod_perl too, so it's irrelevant until both are released. And as well as that, speedy still suffers (IMHO) in that it still follows the CGI scripting model, whereas mod_perl offers a much more flexible environment and feature-rich API (the Apache API). What's more, I could never build something like AxKit in speedycgi, without resorting to hacks like mod_rewrite to hide nasty URLs. At least that's my conclusion from first appearances. Either way, both solutions have their merits. Neither is going to totally replace the other. What I'd really like to do though is sum up this thread in a short article for take23. I'll see if I have time on Sunday to do it. === To: perrin@primenet.com From: Sam Horrocks <sam@daemoninc.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory Date: Fri, 19 Jan 2001 04:53:04 -0800 > You know, I had a brief look through some of the SpeedyCGI code yesterday, > and I think the MRU process selection might be a bit of a red herring.
> I think the real reason Speedy won the memory test is the way it spawns > processes. Please take a look at that code again. There's no smoke and mirrors, no red herrings. Also, I don't look at the benchmarks as "winning" - I am not trying to start a mod_perl vs speedy battle here. Gunther wanted to know if there were "real benchmarks", so I reluctantly put them up. Here's how SpeedyCGI works (this is from version 2.02 of the code): When the frontend starts, it tries to quickly grab a backend from the front of the be_wait queue, which is a LIFO. This is in speedy_frontend.c, get_a_backend() function. If there aren't any idle be's, it puts itself onto the fe_wait queue. Same file, get_a_backend_hard(). If this fe (frontend) is at the front of the fe_wait queue, it "takes charge" and starts looking to see if a backend needs to be spawned. This is part of the "frontend_ping()" function. It will only spawn a be if no other backends are being spawned, so only one backend gets spawned at a time. Every frontend in the queue drops into a sigsuspend and waits for an alarm signal. The alarm is set for 1 second. This is also in get_a_backend_hard(). When a backend is ready to handle code, it goes and looks at the fe_wait queue and if there are fe's there, it sends a SIGALRM to the one at the front, and sets the sent_sig flag for that fe. This is done in speedy_group.c, speedy_group_sendsigs(). When a frontend wakes on an alarm (either due to a timeout, or due to a be waking it up), it looks at its sent_sig flag to see if it can now grab a be from the queue. If so it does that. If not, it runs various checks then goes back to sleep. In most cases, you should get a be from the LIFO right at the beginning in the get_a_backend() function. Unless there aren't enough be's running, or something is killing them (bad perl code), or you've set the MaxBackends option to limit the number of be's.
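The heart of what Sam describes - pop the most recently used idle backend off the be_wait LIFO, and spawn only when none are idle - can be boiled down to a few lines. A hedged sketch in Python rather than the actual C in speedy_frontend.c; the class and the `release` method name are invented for illustration:

```python
# Minimal model of the be_wait LIFO described above: the frontend pops the
# most recently used idle backend, and only spawns when none are idle.
# (Illustrative only -- the real logic lives in speedy_frontend.c.)

class BackendPool:
    def __init__(self):
        self.be_wait = []                 # idle backends, newest at the end

    def get_a_backend(self):
        if self.be_wait:
            return self.be_wait.pop()     # LIFO: most recently used first
        return self.spawn_backend()       # none idle: spawn exactly one

    def release(self, be):
        self.be_wait.append(be)           # request done; back onto the LIFO

    def spawn_backend(self):
        return object()                   # stand-in for forking a backend

pool = BackendPool()
first = pool.get_a_backend()              # pool is empty, so one is spawned
pool.release(first)
again = pool.get_a_backend()
print(first is again)                     # True: the same backend is reused
```

Because released backends go onto the top of the stack, serial requests keep hitting the same backend, which is why the same pid shows up over and over in the manual test.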
> If I understand what's going on in Apache's source, once every second it > has a look at the scoreboard and says "less than MinSpareServers are > idle, so I'll start more" or "more than MaxSpareServers are idle, so > I'll kill one". It only kills one per second. It starts by spawning > one, but the number spawned goes up exponentially each time it sees > there are still not enough idle servers, until it hits 32 per second. > It's easy to see how this could result in spawning too many in response > to sudden load, and then taking a long time to clear out the unnecessary > ones. > > In contrast, Speedy checks on every request to see if there are enough > backends running. If there aren't, it spawns more until there are as > many backends as queued requests. Speedy does not check on every request to see if there are enough backends running. In most cases, the only thing the frontend does is grab an idle backend from the lifo. Only if there are none available does it start to worry about how many are running, etc. > That means it never overshoots the mark. You're correct that speedy does try not to overshoot, but mainly because there's no point in overshooting - it just wastes swap space. But that's not the heart of the mechanism. There truly is a LIFO involved. Please read that code again, or run some tests. Speedy could overshoot by far, and the worst that would happen is that you would get a lot of idle backends sitting in virtual memory, which the kernel would page out, and then at some point they'll time out and die. Unless of course the load increases to a point where they're needed, in which case they would get used. If you have speedy installed, you can manually start backends yourself and test. Just run "speedy_backend script.pl &" to start a backend. If you start lots of those on a script that says 'print "$$\n"', then run the frontend on the same script, you will still see the same pid over and over. 
This is the LIFO in action, reusing the same process over and over. === To: Sam Horrocks <sam@daemoninc.com> From: Perrin Harkins <perrin@primenet.com> Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts Date: Fri, 19 Jan 2001 16:00:52 -0800 (PST) On Fri, 19 Jan 2001, Sam Horrocks wrote: > > You know, I had a brief look through some of the SpeedyCGI code yesterday, > > and I think the MRU process selection might be a bit of a red herring. > > I think the real reason Speedy won the memory test is the way it spawns > > processes. > > Please take a look at that code again. There's no smoke and mirrors, > no red-herrings. I didn't mean that MRU isn't really happening, just that it isn't the reason why Speedy is running fewer interpreters. > Also, I don't look at the benchmarks as "winning" - I > am not trying to start a mod_perl vs speedy battle here. Okay, but let's not be so polite that we fail to acknowledge when someone is onto a better way of doing things. Stealing good ideas from other projects is a time-honored open source tradition. > Speedy does not check on every request to see if there are enough > backends running. In most cases, the only thing the frontend does is > grab an idle backend from the lifo. Only if there are none available > does it start to worry about how many are running, etc. Sorry, I got a lot of the details about what Speedy is doing wrong. However, it still sounds like it has a more efficient approach than Apache in terms of managing process spawning. > You're correct that speedy does try not to overshoot, but mainly > because there's no point in overshooting - it just wastes swap space. > But that's not the heart of the mechanism. There truly is a LIFO > involved. Please read that code again, or run some tests.
Speedy > could overshoot by far, and the worst that would happen is that you > would get a lot of idle backends sitting in virtual memory, which the > kernel would page out, and then at some point they'll time out and die. When you spawn a new process it starts out in real memory, doesn't it? Spawning too many could use up all the physical RAM and send a box into swap, at least until it managed to page out the idle processes. That's what I think happened to mod_perl in this test. > If you start lots of those on a script that says 'print "$$\n"', then > run the frontend on the same script, you will still see the same pid > over and over. This is the LIFO in action, reusing the same process > over and over. Right, but I don't think that explains why fewer processes are running. Suppose you start 10 processes, and then send in one request at a time, and that request takes one time slice to complete. If MRU works perfectly, you'll get process 1 over and over again handling the requests. LRU will use process 1, then 2, then 3, etc. But both of them have 9 processes idle and one in use at any given time. The 9 idle ones should either be killed off, or ideally never have been spawned in the first place. I think Speedy does a better job of preventing unnecessary process spawning. One alternative theory is that keeping the same process busy instead of rotating through all 10 means that the OS can page out the other 9 and thus use less physical RAM. Anyway, I feel like we've been putting you on the spot, and I don't want you to feel obligated to respond personally to all the messages on this thread. I'm only still talking about it because it's interesting and I've learned a couple of things about Linux and Apache from it. If I get the chance this weekend, I'll try some tests of my own. - Perrin ===
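Perrin's 10-process thought experiment is easy to model: with strictly serial requests, an MRU pool (idle list used as a stack) touches a single process, while an LRU pool (idle list used as a queue) cycles through all ten. A toy sketch under his stated assumptions (one request at a time, each finishing before the next arrives):

```python
from collections import deque

# Model of the 10-process example above: count how many distinct processes
# handle strictly serial requests when the idle pool is a stack (MRU)
# versus a queue (LRU). Illustrative only.

def distinct_processes_used(n_procs, n_requests, mru):
    idle = deque(range(n_procs))
    touched = set()
    for _ in range(n_requests):
        proc = idle.pop() if mru else idle.popleft()  # MRU=stack, LRU=queue
        touched.add(proc)
        idle.append(proc)       # request finished; process is idle again
    return len(touched)

print(distinct_processes_used(10, 100, mru=True))   # 1: one hot process
print(distinct_processes_used(10, 100, mru=False))  # 10: all stay warm
```

Either way nine processes sit idle at any instant; MRU just concentrates the work on one, so the OS can page the other nine out - which is exactly the alternative theory Perrin offers for the lower physical RAM usage.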