svlug_demoronizing_ms_html

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



From: kmself@ix.netcom.com
Date: Tue, 19 Dec 2000 15:14:22 -0800
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: [svlug] MS Word 2000 HTML rationalizer?

Cross-platform documentation project, partner in crime uses MS Word.

"HTML" output does not print in Netscape 4.7x, crashes Mozilla, is
printable (sans formatting) from Lynx and w3m, which may be a blessing
in disguise.

I'm familiar with John Walker's demoroniser; however, Word2k appears to
have taken noncompliance to entirely new heights.

Is anyone familiar with a postprocessor which will dump rational,
simple HTML from Word2K output?

===

Date: Tue, 19 Dec 2000 16:28:25 -0800
From: Brian Bilbrey <bilbrey@orbdesigns.com>
To: kmself@ix.netcom.com
Cc: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?


On Tue, Dec 19, 2000 at 03:14:22PM -0800, kmself@ix.netcom.com wrote:
 [snip]
> Is anyone familiar with a postprocessor which will dump rational,
> simple HTML from Word2K output?

You might try importing the W2K output into SO (aka OpenOffice) -
while it doesn't print, it does allow you to re-export in HTML. I
haven't tried this specifically, but was looking at the SO / O2K
compatibility stuff a few weeks back, and found that the W2K
documents imported fairly well (though there are a couple of kinks
with tables??? or was that KWord that had such problems with the
tables? Hmmm.) 

I certainly cain't vouch for the compliance or non- of the SO HTML
output, but it can't be worse, can it?

.brian

-- 
bilbrey@orbdesigns.com    www.orbdesigns.com
"You have not experienced Shakespeare until you have read 
 it in the original Klingon."    Gorkon: Stardate 9522.6, STVI



===

Date: Tue, 19 Dec 2000 16:28:16 -0800
From: Aaron Lehmann <aaronl@vitelus.com>
To: kmself@ix.netcom.com
Cc: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

On Tue, Dec 19, 2000 at 03:14:22PM -0800, kmself@ix.netcom.com wrote:
> Is anyone familiar with a postprocessor which will dump rational,
> simple HTML from Word2K output?

Have you tried wvHTML? There's a CGI version at
http://www.freeviewer.com/.


===


From: kmself@ix.netcom.com
Date: Tue, 19 Dec 2000 16:33:33 -0800
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

Brian Bilbrey (bilbrey@orbdesigns.com) wrote:

> On Tue, Dec 19, 2000 at 03:14:22PM -0800, kmself@ix.netcom.com wrote:
>  [snip]
> > Is anyone familiar with a postprocessor which will dump rational,
> > simple HTML from Word2K output?

> You might try importing the W2K output into SO (aka OpenOffice) -
> while it doesn't print, it does allow you to re-export in HTML. I
> haven't tried this specifically, but was looking at the SO / O2K
> compatibility stuff a few weeks back, and found that the W2K
> documents imported fairly well (though there are a couple of kinks
> with tables??? or was that KWord that had such problems with the
> tables? Hmmm.)

> I certainly cain't vouch for the compliance or non- of the SO HTML
> output, but it can't be worse, can it?

Good idea....but.

I'm not sure if it's worse.  It's certainly not much (if any) better.

===

Date: Tue, 19 Dec 2000 16:58:06 -0800
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?
From: Rick Moen <rick@linuxmafia.com>

begin  Brian Bilbrey quotation:

> I certainly cain't vouch for the compliance or non- of the SO HTML
> output, but it can't be worse, can it?

It's pretty brain-dead.  I had to do a tremendous amount of
post-StarOffice pruning, on a MS-Word2k document I recently tried that
with.  It's http://linuxmafia.com/pub/jordan/Humor/abridged.html ,
actually.  Pity I threw away the StarOffice-generated HTML mess it
started out being:  It was really wretched.

===

Date: Tue, 19 Dec 2000 17:16:38 -0800
From: Brian Bilbrey <bilbrey@orbdesigns.com>
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

On Tue, Dec 19, 2000 at 04:58:06PM -0800, Rick Moen wrote:
> begin  Brian Bilbrey quotation:
> 
> > I certainly cain't vouch for the compliance or non- of the SO HTML
> > output, but it can't be worse, can it?
> 
> It's pretty brain-dead.  I had to do a tremendous amount of
> post-StarOffice pruning, on a MS-Word2k document I recently tried that
> with.  It's http://linuxmafia.com/pub/jordan/Humor/abridged.html ,
> actually.  Pity I threw away the StarOffice-generated HTML mess it
> started out being:  It was really wretched.

I can draw a line with those two data points. Bad idea discarded. Be
interested to hear of any successes - as I migrate more functions at
work to Linux, I start to get the inquiries about how we might go
about going whole-hog away from MS, while retaining the ability to
collaborate with our customers.  

Tom and I also found this challenging when working with IDG... Hmmm.

===

Date: Tue, 19 Dec 2000 17:24:51 -0800
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?
From: Rick Moen <rick@linuxmafia.com>

begin  Brian Bilbrey quotation:

> I can draw a line with those two data points. Bad idea discarded. Be
> interested to hear of any successes - as I migrate more functions at
> work to Linux, I start to get the inquiries about how we might go
> about going whole-hog away from MS, while retaining the ability to
> collaborate with our customers.  

Don't forget:  Microsoft is the company that did its best to sabotage
the RTF format, when it discovered that far too many people were using
it for meaningful formatted-text compatibility.  (I seem to recall them
doing this in MS Office 4.2, but it could have been Office 95.)

===

From: duperron@charter.net (Vince Duperron)
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?
To: kmself@ix.netcom.com
Date: Tue, 19 Dec 2000 19:17:14 -0600 (CST)
Cc: svlug@svlug.org (Silicon Valley Users Group)

Hello;

This isn't quite on topic (but close).

Have you checked out http://www.antiword.org ?

===


Date: Tue, 19 Dec 2000 18:03:10 -0800
From: hvrietsc@yahoo.com
To: Brian Bilbrey <bilbrey@orbdesigns.com>
Cc: kmself@ix.netcom.com, Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

I've had some good results along this line:

On Windoze: create the document with Word.
The rest on Linux:
  1. Load the .doc into StarOffice.
  2. Save as HTML.

Netscape seems to render it just fine.

===


Date: Wed, 20 Dec 2000 02:48:05 -0500
From: Bill Jonas <bill@billjonas.com>
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

On Tue, Dec 19, 2000 at 04:58:06PM -0800, Rick Moen wrote:
> It's pretty brain-dead.  I had to do a tremendous amount of
> post-StarOffice pruning, on a MS-Word2k document I recently tried that
> with.  It's http://linuxmafia.com/pub/jordan/Humor/abridged.html ,

Hmm, I had to create a resume in the company template for my new job.  Of
course, the attachment was one of those dreaded ".dot" files.  The Word
template itself was nothing fancy, so YMMV, but AbiWord imported "pretty
okay" and the HTML output was fairly reasonable HTML.  AbiWord's starting
to get pretty not bad.

(The epilogue is that HTML that looked "close" wasn't good enough, and I
had to borrow someone's machine to do it in Word, if you're interested.
<rant>You'd think that an Internet consulting company that wanted resumes
to show potential clients would leverage the power of the 'net itself and,
say, put HTML versions in a password-protected area of the site, and
create a password for a client so they could peruse them online...</rant>)

===

Date: Wed, 20 Dec 2000 12:58:41 -0800 (PST)
From: Deirdre Saoirse <deirdre@deirdre.net>
To: Brian Bilbrey <bilbrey@orbdesigns.com>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

On Tue, 19 Dec 2000, Brian Bilbrey wrote:

> You might try importing the W2K output into SO (aka OpenOffice) -
> while it doesn't print, it does allow you to re-export in HTML. I
> haven't tried this specifically, but was looking at the SO / O2K
> compatibility stuff a few weeks back, and found that the W2K documents
> imported fairly well (though there are a couple of kinks with
> tables??? or was that KWord that had such problems with the tables?
> Hmmm.)

Unfortunately, one of the issues we discovered at the office was this
scenario:

1) User saves a doc in Word.
2) User makes changes, which are fast-saved.
3) Doc is imported into Star Office.

The problem is that you'll more likely see doc #1 than doc #2.

And, as I'm about to start a Master's degree in creative writing, and as
everyone sends their documents in Word and as realistic critiques are a
significant part of my grade....

I am going to be using MacOS on the desktop -- with Word -- for the next
two years. 

Feel my pain.

I myself will be using my ancient, but still personal favorite, Word 5.1a
to format my own documents -- after composing and editing them in html on
bbedit (yes, I use CVS for revision control on fiction and prefer html for
that).

For one thing, Star Office, for all its faults, has one that is
particularly annoying: it is incapable of printing headers and footers in
the standard manuscript format.

At least I'll be able to run MacOS X and get *some* of the advantages of
BSD out of the experience.

===

Date: Wed, 20 Dec 2000 14:15:31 -0800
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?
From: Rick Moen <rick@linuxmafia.com>

begin  Dire Red quotation:
 
> Unfortunately, one of the issues we discovered at the office was this
> scenario:
> 
> 1) User saves a doc in Word.
> 2) User makes changes, which are fast-saved.
> 3) Doc is imported into Star Office.
> 
> The problem is that you'll more likely see doc #1 than doc #2.

Kill the [l]user.  Problem solved.

===

To: Bill Jonas <bill@billjonas.com>,
Subject: Re: [svlug] MS Word 2000 HTML rationalizer? 
Date: Wed, 20 Dec 2000 15:18:31 -0800
From: J C Lawrence <claw@kanga.nu>

On Wed, 20 Dec 2000 02:48:05 -0500 
Bill Jonas <bill@billjonas.com> wrote:

> <rant>You'd think that an Internet consulting company that wanted
> resumes to show potential clients would leverage the power of the
> 'net itself and, say, put HTML versions in a password-protected
> area of the site, and create a password for a client so they could
> peruse them online...</rant>)

I follow a simple rule: I don't work for or with companies that
either require MS-based files (as opposed to, say, flat text, PDF, or
HTML), or, more simply, which consider their time so much more
valuable than mine.

===

Date: Wed, 20 Dec 2000 16:12:38 -0800 (PST)
From: Deirdre Saoirse <deirdre@deirdre.net>
To: Rick Moen <rick@linuxmafia.com>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

On Wed, 20 Dec 2000, Rick Moen wrote:

> begin  Dire Red quotation:
>  
> > Unfortunately, one of the issues we discovered at the office was this
> > scenario:
> > 
> > 1) User saves a doc in Word.
> > 2) User makes changes, which are fast-saved.
> > 3) Doc is imported into Star Office.
> > 
> > The problem is that you'll more likely see doc #1 than doc #2.
> 
> Kill the [l]user.  Problem solved.

When the [l]user in question is a customer who wants to spend money, it's
not so easily rationalised. :)

While our company uses a lot of Linux, almost none of our customers do.
Also, outside of Engineering, you rarely see Linux on the desktop.

===

Date: Wed, 20 Dec 2000 16:14:54 -0800
From: Rick Moen <rick@linuxmafia.com>
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

begin  Dire Red quotation:

> When the [l]user in question is a customer who wants to spend money, it's
> not so easily rationalised. :)

Take his money, and _then_ kill him.  See .signature block.

-- 
Cheers,                                     The Viking's Reminder:
Rick Moen                                   Pillage first, _then_ burn.
rick@linuxmafia.com


===

From: kmself@ix.netcom.com
Date: Thu, 21 Dec 2000 00:52:05 -0800
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?


kmself@ix.netcom.com (kmself@ix.netcom.com) wrote:

> Cross-platform documentation project, partner in crime uses MS Word.

> "HTML" output does not print in Netscape 4.7x, crashes Mozilla, is
> printable (sans formatting) from Lynx and w3m, which may be a blessing
> in disguise.

> I'm familiar with John Walker's demoroniser; however, Word2k appears to
> have taken noncompliance to entirely new heights.

> Is anyone familiar with a postprocessor which will dump rational,
> simple HTML from Word2K output?

From the DocBook mailing list, kudos to Dave Pawson for suggesting
'tidy'.  It has a somewhat eccentric argument syntax -- you apparently
*have* to feed it a config file -- but it nicely trimmed all the crap
out of the monster which had landed on my doorstep.
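
A config file of the kind described above might look roughly like the
sketch below. The option names (word-2000, clean, indent, wrap,
tidy-mark) are ones Tidy documents, but the values chosen, the file name
tidy-word.conf, and the helper function names are assumptions made for
illustration; they are not the settings actually used on the document in
question. Python is used only to write the file and assemble the command.

```python
import shutil
import subprocess
from pathlib import Path

# Option names are documented Tidy options; the values are assumptions.
TIDY_CONFIG = """\
word-2000: yes
clean: yes
indent: auto
wrap: 72
tidy-mark: no
"""


def write_tidy_config(path="tidy-word.conf"):
    """Write the config text to `path` (hypothetical file name) and return it."""
    config = Path(path)
    config.write_text(TIDY_CONFIG)
    return config


def tidy_command(config, source):
    """Build the argument list for: tidy -config <config> <source>."""
    return ["tidy", "-config", str(config), str(source)]


if __name__ == "__main__":
    config = write_tidy_config()
    source = Path("msdoc.html")  # hypothetical input file
    if shutil.which("tidy") and source.exists():
        # Tidy writes the cleaned markup to stdout; capture it in a file.
        with open("gooddoc.html", "wb") as out:
            subprocess.run(tidy_command(config, source), stdout=out)
    else:
        print("tidy or msdoc.html not found; config written to", config)
```

Keeping the options in a file means the same settings can be reused for
every Word export in the project rather than retyped per document.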

===

Date: Fri, 22 Dec 2000 20:33:13 -0800 (PST)
From: fdj <mrlocomojo@yahoo.com>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?
To: Silicon Valley Users Group <svlug@svlug.org>

fyi - 
	HTML Tidy is a wonderful little command line tool
that can clean up and convert your html.  Tidy is
endorsed by the w3c.  It does have an option that can
be placed in a .rc-style file or invoked on the
command line to clean up word2000 documents.  From the
html Tidy page <http://www.w3.org/People/Raggett/tidy/>:


    word-2000: bool
    If set to yes, Tidy will go to great pains to strip out all the
    surplus stuff Microsoft Word 2000 inserts when you save Word
    documents as "Web pages". The default is no. Note that Tidy
    doesn't yet know what to do with VML markup from Word, but in
    future I hope to be able to map VML to SVG.

	The above would be invoked as:

    tidy --word-2000 true msdoc.html > gooddoc.html

	This will not only correct broken html (a la msword or
hand-coded hanging tags, open tags), it will produce
warnings about recommended standards that are not
complied with, such as using an ALT attribute with an IMG.

	No custom setup files are required.

	Tidy will also do pretty-printing with indenting,
making all your tags the same case, etc....

	It has limited support for php, and facilitates
creating custom tags.

	Finally, tidy is an excellent tool to aid you in the
move from html to xml, as it has options to produce
both xml and xhtml from html documents.  

	I realize that someone else on the list mentioned
tidy, but I'm not sure they did it justice.  It is an
excellent html validator, and a whole lot more.
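
For batch use, the command-line form above can be wrapped in a few lines
of script. The following is a rough sketch, not part of Tidy itself: the
function names and file names are hypothetical, and it assumes a tidy
binary on the PATH that follows the usual exit-status convention
(0 = clean, 1 = warnings, 2 = errors).

```python
import shutil
import subprocess


def word2000_command(source):
    """Argument list equivalent to: tidy --word-2000 yes <source>."""
    return ["tidy", "--word-2000", "yes", source]


def clean_word_html(source, dest):
    """Run tidy on `source`, write its stdout to `dest`, return the exit status.

    Assumed convention: 0 = no problems, 1 = warnings, 2 = errors.
    """
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy is not on PATH")
    with open(dest, "wb") as out:
        result = subprocess.run(
            word2000_command(source),
            stdout=out,
            stderr=subprocess.PIPE,  # tidy reports its diagnostics on stderr
        )
    return result.returncode
```

A caller would typically treat a status of 0 or 1 as usable output and
inspect the captured diagnostics by hand when the status is 2.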

===

From: kmself@ix.netcom.com
Date: Sat, 23 Dec 2000 00:29:47 -0800
To: Silicon Valley Users Group <svlug@svlug.org>
Subject: Re: [svlug] MS Word 2000 HTML rationalizer?

fdj (mrlocomojo@yahoo.com) wrote:

> fyi
> 	HTML Tidy is a wonderful little command line tool
> that can clean up and convert your html.  Tidy is
> endorsed by the w3c.  It does have an option that can
> be placed in a .rc-style file or invoked on the
> command line to clean up word2000 documents.  From the
> html Tidy page <http://www.w3.org/People/Raggett/tidy/>:

Found it, commented earlier.

It did a bang-up job on one file.  On the second, not only does MS HTML
not validate, but it kills the validator.  Go figure.

To appear shortly at an ecommerce vendor's release near you:
either DocBook generated materials, or something which looks
suspiciously as if it's been passed through w3m and pr.

Now how would I know that....?

===

To: Erik Steffl <steffl@bigfoot.com>
Subject: Re: [svlug] PS2 -> USB 
Date: Sat, 23 Dec 2000 11:15:30 -0800
From: J C Lawrence <claw@kanga.nu>

Erik Steffl <steffl@bigfoot.com> wrote:

>   neither does logitech, for some reason they make USB wireless
> keyboard&mouse combo but urge you to use provided USB->ps/2
> convertors:-) kinda strange.

Happily I've found there are several vendors of USB->PS/2 converters,
a la:

  http://www.provantage.com/FP_48274.HTM

Suggesting that my model Ms and I will be able to happily survive
the move to USB.


===
