This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
Date: Wed, 27 Sep 2000 20:10:26 -0700
From: Joey Hess <joey@kitenet.net>
To: svlug@svlug.org
Subject: Re: [svlug] Re: Netscape doesn't work

Rick Moen wrote:
> My _God_. Look at all that crud! You let a Web browser near _that_
> demented pile of table-ridden excreta from a misplaced DTP wretch?

I've figured out how to handle this type of html.

joey@kite:~>cat test.pl
use HTML::Sanitizer;
$s=HTML::Sanitizer->new(
	javascript => 0,
	comment => 0,
	title => [],
	h1 => [],
	h2 => [],
	h3 => [],
	h4 => [],
	h5 => [],
	p => [],
	hr => [],
	li => [],
	ol => [],
	ul => [],
	br => [],
	b => [],
	i => [],
	em => [],
	strong => [],
	a => [qw{href name}],
	blockquote => [],
	pre => [],
	br => [],
	div => [],
	tt => [],
	form => [qw{action method}],
	input => [qw{type name value}],
	table => [qw{border summary}],
	tr => [],
	th => [],
	td => [],
	dl => [],
	dt => [],
	dd => [],
	img => [qw{alt src}],
	textarea => [qw{name rows cols wrap}],
);
print $s->sanitize(join '', <>);
joey@kite:~>perl test.pl ~/torture.html
<title>Alteon WebSystems Intelligent Webworking</title>
<table>
<tr>
<td>
<table>
<tr>
<td><img alt="" src="/images/logo_main_700.gif"></td>
</tr>
<tr>
<td><br><br></td>
</tr>
<tr>
<td>
<table border="0">
<tr>
<td><b>|</b></td>
<td><b><a href="/main.asp">English</a></b></td>
<td><b>|</b></td>
<td><b><a href="/chinese.asp"><img src="/images/Chinese.gif"></a></b></td>
<td><b>|</b></td>
<td><b><a href="/german.asp">Deutsch</a></b></td>
<td><b>|</b></td>
<td><b><a href="/spanish.asp">Espa