This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
Date: Sat, 10 Aug 1996 13:57:20 +0600
From: Tim Pierce <twpierce@midway.uchicago.edu>
Subject: Re: why you don't want to work on a regexp to strip html tags

Right, comment stripping makes it even trickier.

On second glance, I see Tom C. has written a "striphtml" script to do what you want -- go to http://www.perl.com/perl and look for his "web-hacking scripts".

I may have overestimated the risk of finding a closing `>' in the middle of a double-quoted string. But the problem of dealing with broken html is really a thorny one.

===
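To make the quoted-string concern concrete, here is a minimal Python sketch. It is not Tom C.'s "striphtml" script; the function names and the sample input are made up for illustration. It compares a naive tag-stripping pattern with one that consumes quoted attribute values whole. Neither attempts comment stripping.

```python
import re

# Naive pattern: a tag is "<", anything that is not ">", then ">".
NAIVE_TAG = re.compile(r"<[^>]*>")

# Quote-aware pattern (an illustrative assumption, not a full parser):
# inside a tag, a double- or single-quoted string is consumed whole,
# so a ">" inside the quotes does not end the tag.
QUOTED_TAG = re.compile(r"""<(?:"[^"]*"|'[^']*'|[^'">])*>""")


def strip_naive(html):
    """Remove tags with the naive pattern."""
    return NAIVE_TAG.sub("", html)


def strip_quote_aware(html):
    """Remove tags, treating quoted attribute values as single units."""
    return QUOTED_TAG.sub("", html)


# Hypothetical input with a ">" inside a double-quoted attribute value.
sample = '<img src="a.gif" alt="x > y">caption'
print(repr(strip_naive(sample)))        # the tail of the tag survives
print(repr(strip_quote_aware(sample)))  # only the text survives
```

On this input the naive pattern stops at the `>` inside the alt text and leaves part of the tag behind, while the quote-aware pattern removes the whole tag. The quote-aware pattern still depends on quotes being balanced: a tag with an unterminated quote is not matched at all, which is one face of the broken-html problem.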