Pod to Html

                                        November 9, 2004  

What's the big deal about converting pod to html? Isn't that a solved problem by now? Of course. Many times. And an unsolved problem is only a little worse than a problem solved too many times...

My mission, should you choose to accept it (and I don't know why you would), was to extract the pod from a project with multiple *.pm/*.pod files and convert it into a tree of html files.

The main feature I wanted was conversion of L<> links to other modules in the project into relative html links; and in general I was interested in doing this relatively simple job without having to put too much work into the process.
If you're looking for a quick recommendation, I settled on Pod::Simple::HTMLBatch, which did the job without too many rough spots. My impression is that Sean M. Burke is the man to watch in this field, and with any luck he will succeed in what seems like the cursed project of replacing Pod::Html...

Also, Mark Overmeer looks like he's got some interesting ideas going with his OODoc (some custom extensions to pod markup to deal with large OO projects).

Table of Contents

 Pod::Html / pod2html by Tom Christiansen
 pods2html by Steven McDougall
 Pod::Tree::HTML by Steven McDougall
 Pod::HtmlTree 0.97 by Mike Schilli
 Pod::Simple::HTML by Sean M. Burke
 DocSet by Stas Bekman
 My pick: Pod::Simple::HTMLBatch by Sean M. Burke
 OODoc by Mark Overmeer
 Marek::Pod::HTML by Marek Rouchal
 Pod::POM by Andy Wardley
 But wait, there's more:
     PodToHTML
     Pod::HTML
     Pod::HtmlEasy
 Appendix 1: Quotable Quotes


Pod::Html / pod2html by Tom Christiansen, included in the perl core
Description: The original. Works on one file at a time, and handles cross-references as absolute hrefs (prefix set with --htmlroot).

Advantages: Universally available. The first thing everyone looks at.

Disadvantages: No one seems to like it, all agree it needs revision, and starting from scratch is preferred to maintenance. Outputs html that doesn't validate (if I remember right). File-oriented, making it less useful with large projects.

SYNOPSIS
           use Pod::Html;
           pod2html("pod2html",
                    "--podpath=lib:ext:pod:vms",
                    "--podroot=/usr/src/perl",
                    "--htmlroot=/perl/nmanual",
                    "--libpods=perlfunc:perlguts:perlvar:perlrun:perlop",
                    "--recurse",
                    "--infile=foo.pod",
                    "--outfile=/perl/nmanual/foo.html");

pods2html by Steven McDougall, uses Pod::Tree::HTML, HTML::Stream
pods2html is a front end for Pod::Tree::HTML, which does the actual translation one-file-at-a-time (though internally I believe it redundantly traverses the entire tree to do link resolution correctly). The pods2html script uses File::Find to traverse the tree of pod (PODdir), and writes the output to a parallel tree of html (HTMLdir).



Works on a directory tree (not file-oriented)

Does the main job I was after, turns L<> links into relative html links.

However, the relative links it generates are a little peculiar, going all the way back up to the package root before dropping back down to the destination.

It has a --base option which is supposed to be like the --htmlroot of pod2html, but doesn't seem to do anything.
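
For comparison, here is a minimal sketch of the kind of relative link I was hoping for: the shortest path from one generated page to another, without first climbing to the package root. The helper names page_for_module and relative_href are my own inventions for illustration (they are not part of pods2html or Pod::Tree::HTML), and the sketch assumes every page lives under a single html root with one directory level per "::" in the module name.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Hypothetical helper: map a module name to its page, relative to the html root.
sub page_for_module {
    my ($module) = @_;
    return join('/', split /::/, $module) . '.html';
}

# Hypothetical helper: shortest relative href from one page to another.
# Both arguments are page paths relative to the html root.
sub relative_href {
    my ($from_page, $to_page) = @_;
    # Anchor both paths at a fake root so abs2rel never consults the
    # current working directory.
    my (undef, $from_dir) = File::Spec::Unix->splitpath("/$from_page");
    return File::Spec::Unix->abs2rel("/$to_page", $from_dir);
}

my $from = page_for_module('My::App::Config');
print relative_href($from, page_for_module('My::App::Util')), "\n";  # sibling page
print relative_href($from, page_for_module('My::Other')),     "\n";  # one level up
```

File::Spec::Unix is used on purpose, whatever the local platform: hrefs always use forward slashes.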

SYNOPSIS
       "pods2html" ["--base" url] ["--css" url] ["--index" title]
       ["--"["no"]"toc"] ["--hr" level] ["--bgcolor" #rrggbb] ["--text"
       #rrggbb] PODdir HTMLdir

Pod::Tree::HTML 1.10 by Steven McDougall, uses Pod::Tree::PerlPod, Pod::Tree::PerlMap
Quoted from the documentation for Pod::Tree::PerlPod:
"Pod::Tree::PerlPod" translates Perl PODs to HTML. It does a recursive subdirectory search through $perl_dir to find PODs.

"Pod::Tree::PerlPod" generates a top-level index of all the PODs that it finds, and writes it to HTML_dir"/pod.html".
"Pod::Tree::PerlPod" generates and uses an index of the PODs that it finds to construct HTML links. Other modules can also use this index.

From the documentation for "Pod::Tree::HTML":
"Pod::Tree::HTML" reads a POD and translates it to HTML. The source and destination are fixed when the object is created. Options are provided for controlling details of the translation.

The "translate" method does the actual translation.

Quoting the docs again:

"Pod::Tree::PerlPod" indexes PODs by the base name of the POD file. To link to perlsub.pod, write L<perlsub>


As of this writing, PerlPod.pm does not pass perl -c as shipped (Doh!): "my" variable $pods masks earlier declaration in same scope at ... PerlPod.pm line 151.

SYNOPSIS
         $perl_map = new Pod::Tree::PerlMap;
         $perl_pod = new Pod::Tree::PerlPod $perl_dir, $HTML_dir, $perl_map, %opts;

         $perl_pod->scan;
         $perl_pod->index;
         $perl_pod->translate;

         $top = $perl_pod->get_top_entry;

         # Alternately:

         use Pod::Tree::HTML;

         $source   =  "file.pod";
         $dest     =  "file.html";
         $html     =   new Pod::Tree::HTML $source, $dest, %options;
         $html->translate;

Pod::HtmlTree 0.97 by Mike Schilli, uses Pod::Html
Quoting the docs:
... like to navigate between all those manual pages in your distribution and even view their source code?

... traverses your module's distribution directory finds all *.pm files recursively and calls "pod2html()" on them, hereby resolving all POD links (L<...> style).

It then saves the nicely formatted HTML files under "docs/html" and updates each "SEE ALSO" section to contain links to every other *.pm file in your module's distribution.

Automatically fills in a blank SEE ALSO section for you (a nice touch).

That includes a link to the source code itself (which would be a very nice touch if it worked, but it doesn't seem to link to the right place).


Always puts the output in a subdirectory of the pod directory called "docs/html".

Uses pod2html internally, so the quality of the html output is not likely to be great.

Fixing up html links later in post-processing seems a little cheesy to me (I'm looking for decent pod processing modules so I can avoid doing things like that).

Doesn't do relative html links, generates funky HREF strings with a "/./" stuck in after the $htmlroot string. If $htmlroot is blank, the HREFs begin with "/" and that isn't good for much.

Processes L<> links even if they're indented (technically that should mean it's a code block deserving <PRE></PRE> tags).
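
To make that last point concrete, here is a rough sketch of the distinction a converter is supposed to draw. Per perlpod, a paragraph that starts with a space or tab is verbatim, so link markup inside it should be left alone. The function classify_paragraphs is a hypothetical name of my own, and this is a deliberate simplification: a real parser also has to deal with =begin/=end regions, =cut, and text outside of pod blocks.

```perl
use strict;
use warnings;

# Hypothetical helper: split pod text into paragraphs (separated by blank
# lines) and label each one as 'command', 'verbatim', or 'ordinary'.
sub classify_paragraphs {
    my ($pod) = @_;
    my @out;
    for my $para (split /\n(?:[ \t]*\n)+/, $pod) {
        next unless length $para;
        my $kind = $para =~ /^[ \t]/     ? 'verbatim'   # indented: code block
                 : $para =~ /^=[a-zA-Z]/ ? 'command'    # =head1, =item, ...
                 :                         'ordinary';  # only here are L<> links live
        push @out, [ $kind, $para ];
    }
    return @out;
}

my $pod = join "\n",
    '=head1 SEE ALSO',
    '',
    'See L<My::Module> for details.',
    '',
    '    my $doc = "see L<Not::A::Link>";',
    '';

for my $p (classify_paragraphs($pod)) {
    my ($kind, $text) = @$p;
    my ($first_line) = split /\n/, $text;
    print "$kind: $first_line\n";
}
```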


SYNOPSIS
         use Pod::HtmlTree qw(pod2htmltree);
         pod2htmltree($httproot);

Pod::Simple::HTML by Sean M. Burke, uses Pod::Simple and Pod::Simple::PullParser
Pod::Simple::PullParser is described:
This class is for using Pod::Simple to build a Pod processor -- but one that uses an interface based on a stream of token objects, instead of based on events.



The documentation has even more TODOs scattered around than in Pod::Simple::HTMLBatch.

SYNOPSIS
        TODO  ((sic))
         perl -MPod::Simple::HTML -e \
          "exit Pod::Simple::HTML->filter(shift)->errors_seen" \
          thingy.pod

DocSet by Stas Bekman, uses Pod::POM
From the documentation for DocSet: This package builds a docset from sources in different formats. The generated documents can be all nicely interlinked and to have the same look and feel.
Currently it knows to handle input formats: * POD * HTML
and knows to generate: * HTML * PS * PDF

Used in production on a high visibility project with massive amounts of documentation: mod_perl.
Can merge together project docs with source written in pod and/or html.

Documentation is weak. You're expected to clone the "examples" directory and figure it out. The configuration file is perl code that defines a massive hash of complex structures. You need to edit a lot of template toolkit templates manually to get something that doesn't look like the mod_perl web site.

SYNOPSIS
         docset_build [options] base_full_path relative_to_base_confer_file_location

       Options:

         -h    this help
         -v    verbose
         -i    podify pseudo-pod items (s/^* /=item */)
         -s    create the splitted html version (not implemented)
         -t    create tar.gz (not implemented)
         -p    generate PS file
         -d    generate PDF file
         -f    force a complete rebuild
         -a    print available hypertext anchors (not implemented)
         -l    perform L<> links validation in pod docs
         -e    slides mode (for presentations) (not implemented)
         -m    executed from Makefile (forces rebuild,
                                       no PS/PDF file,
                                       no tgz archive!)

Pod::Simple::HTMLBatch by Sean M. Burke, uses Pod::Simple
Not a sub-class of Pod::Simple::HTML. Works on a tree of files ("Batch").

Recently written, under active maintenance.

Sean M. Burke hangs out on pod-people@perl.org and answers questions.

Does the main job I was after, turns L<> links into relative html links with minimal fuss.

Used by the search.cpan.org site.

The documentation is a little sketchy (as of this writing, there are lots of gaps labeled TODO).

The appearance of the contents page it generates isn't great -- one alphabetized linear list for each top-level directory (I just link past it to use an introductory page of my own). Options do exist to customize (or skip) the contents page generation (I haven't tried them, myself).

There's a bunch of javascript folderol for doing user selectable style sheets, the supreme coolness of which is not immediately apparent to me. (There's a partially documented option -- $batchconv->add_css( $url ); which I would guess would override that stuff.)

SYNOPSIS
   use Pod::Simple::HTMLBatch;
   my $batchconv = Pod::Simple::HTMLBatch->new;
   $batchconv->verbose(3);
   $batchconv->batch_convert( [ $in_dir ], $out_dir );

From the command line (to get html format docs for all installed modules):

   mkdir html_docs; cd html_docs
   perl -mPod::Simple::HTMLBatch -ePod::Simple::HTMLBatch::go @INC .
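
A slightly fuller sketch, for anyone who shares my lack of enthusiasm for the style sheet machinery. The wrapper convert_tree is a hypothetical name of my own. css_flurry and javascript_flurry are methods documented in the Pod::Simple::HTMLBatch releases I'm aware of, but I'm assuming here that your installed version has them, and that the output directory already exists.

```perl
use strict;
use warnings;
use Pod::Simple::HTMLBatch;

# Hypothetical wrapper: convert every pod-bearing file under $in_dir into a
# parallel tree of html under $out_dir (which is assumed to exist already).
sub convert_tree {
    my ($in_dir, $out_dir, $css_url) = @_;
    my $batchconv = Pod::Simple::HTMLBatch->new;
    $batchconv->verbose(0);              # keep quiet
    $batchconv->css_flurry(0);           # assumption: skips the bundled style sheets
    $batchconv->javascript_flurry(0);    # assumption: skips the style-switching javascript
    $batchconv->add_css($css_url) if defined $css_url;
    $batchconv->batch_convert( [ $in_dir ], $out_dir );
    return;
}

# Usage: perl convert_tree.pl lib html_docs [style.css]
convert_tree(@ARGV) if @ARGV >= 2;
```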



OODoc 0.90 by Mark Overmeer, (a large but self-contained bundle)
"extends pod with some keywords to be able to document error messages, inheritance, and examples (it is the step from visual markup to logical markup) in your code, but can also accept plain pod..."

Could be a good idea: it's occurred to me that pod and OOP aren't the best mix (e.g. documentation for a sub-class doesn't describe inherited methods, you need to look elsewhere to learn about all of them, following the inheritance chain up the docs).

Haven't played with it yet.
Markup uses special extensions of its own, which may slow adoption.

SYNOPSIS
  See  OODoc::Parser::Markov



Marek::Pod::HTML by Marek Rouchal, uses Pod::Parser and Pod::Checker (both in the core library); and HTML::Entities, HTML::TreeBuilder
Written about five years ago as a candidate to replace Pod::HTML (and pod2html); works on one or more files; has optional ToC generation. Comes with a "mpod2html" script that acts as a front end.

From the documentation for "mpod2html":
An important note: mpod2html will cross-link only those documents that are processed in one conversion session. The benefit is that you will get only working hyperlinks, no "dead ends". The downside is that you cannot simply convert one additional Pod and everything will be nicely cross-linked.
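
The policy described in that note boils down to something like the following sketch: a module name becomes a hyperlink only if it is among the documents converted in the current run, and otherwise stays plain text. The function render_module_link and the %converted table are hypothetical, made up for illustration; I haven't checked how mpod2html actually implements this.

```perl
use strict;
use warnings;

# Hypothetical sketch: link only to modules converted in this session.
# $converted is a hash ref mapping module name => href of its page.
sub render_module_link {
    my ($name, $converted) = @_;
    return $name unless exists $converted->{$name};   # unknown: no dead-end link
    return sprintf '<a href="%s">%s</a>', $converted->{$name}, $name;
}

my %converted = (
    'My::App'       => 'My/App.html',
    'My::App::Util' => 'My/App/Util.html',
);

print render_module_link('My::App::Util',  \%converted), "\n";  # becomes a link
print render_module_link('LWP::UserAgent', \%converted), "\n";  # stays plain text
```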

I gather the author was trying to play nice with the community, and was working in the then-current style (e.g. using the Pod::Parser module, officially blessed into the core library). He's succeeded in getting a number of related modules placed in the perl core.

And yet, this module seems to have been abandoned (it still resides in the preliminary "Marek" namespace).

It requires that you give it both the file system version of a module name and also the package name defined inside the file (usually right at the top)... why doesn't it just read the package name for you?

That task was pushed out to the mpod2html script, which makes the module itself much less practical to work with on its own.
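
Reading the package name yourself isn't much work. A minimal sketch, assuming the first package statement in the file is the one you want (package_name_of is a hypothetical helper of my own, not something Marek::Pod::HTML provides, and a "package" line sitting inside a pod block would fool it):

```perl
use strict;
use warnings;

# Hypothetical helper: return the first package name declared in a file,
# or undef if none is found before __END__ / __DATA__.
sub package_name_of {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    while (my $line = <$fh>) {
        last if $line =~ /^__(?:END|DATA)__\b/;
        return $1 if $line =~ /^\s*package\s+([A-Za-z_]\w*(?:::\w+)*)\s*;/;
    }
    return;
}

# The result could then be used to build the file => package hash that
# pod2html() expects in the SYNOPSIS below, e.g.:
#   my %pods = map { $_ => package_name_of($_) } @files;
print package_name_of($ARGV[0]), "\n" if @ARGV && -f $ARGV[0];
```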

SYNOPSIS
  use Marek::Pod::HTML;
  pod2html( { -dir => 'html' },
    { '/usr/lib/perl5/Pod/HTML.pm' => 'Pod::HTML' });

 Alternately:
       mpod2html [ -converter module ]
                 [ -suffix suffix ]
                 [ -filesuffix suffix ] 
                 [ -dir path ] 
                 [ -libpods pod1,pod2,... ] 
                 [ -(no)localtoc ]
                 [ -(no)navigation ]
                 [ -(no)toc ]
                 [ -tocname filename ]
                 [ -toctitle title ]
                 [ -(no)idx ]
                 [ -idxopt options ]
                 [ -idxname filename ]
                 [ -idxtitle title ]
                 [ -(no)ps ]
                 [ -psdir path ]
                 [ -psfont font ]
                 [ -papersize format ]
                 [ -(no)inc ]
                 [ -(no)script ]
                 [ -(no)warnings ]
                 [ -(no)verbose ]
                 [ -(no)banner ]
                 [ -stylesheet link ]
                 [ dir1 , dir2 , ... ]
                 [ pod1 , pod2 , ... ]



Pod::POM by Andy Wardley, uses Pod::POM::Constants, Pod::POM::Nodes and Pod::POM::View::Pod.
Quoting the docs:
This module implements a parser to convert Pod documents into a simple object model form known hereafter as the Pod Object Model. The object model is generated as a hierarchical tree of nodes, each of which represents a different element of the original document. The tree can be walked manually and the nodes examined, printed or otherwise manipulated. In addition, Pod::POM supports and provides view objects which can automatically traverse the tree, or section thereof, and generate an output representation in one form or another.

A script is provided for converting Pod documents to other format by using the view objects provided. The pom2 script should be called with two arguments, the first specifying the output format, the second the input filename. ...

Used by Stas Bekman's DocSet.

Andy Wardley himself explains that Pod::POM's advantage is "flexibility in being able to customise the generated output"

Cute touch: if you replace "pom2" with symlinks pom2html and pom2text, it will determine the output format from the name of the symlink.
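
The symlink trick is easy to borrow for your own scripts. Here is a sketch of the general idea, not the actual pom2 source (which I haven't quoted here); format_from_name is a hypothetical helper of my own:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical sketch: pick an output format from the name the script was
# invoked as, so that "pom2html" implies html and "pom2text" implies text.
sub format_from_name {
    my ($program_name) = @_;
    my $base = basename($program_name);
    return $base =~ /^pom2(\w+)$/ ? $1 : undef;
}

my $format = format_from_name($0);
$format = shift @ARGV unless defined $format;   # plain "pom2": format is the first argument
print "output format: ", (defined $format ? $format : '(none given)'), "\n";
```
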
I almost missed this one: it may be condemned to obscurity because it doesn't have "Html" in its name.

Doesn't seem to make any attempt at converting L<> linkage to html links.

File oriented: need to write your own recursive descent and filename crunching code.
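
The "recursive descent and filename crunching" might look something like this sketch. The helper names plan_conversions and convert_with_pom are my own, and the layout (one html file per *.pm/*.pod file, mirroring the source tree) is an assumption about what you want. The Pod::POM calls are the ones shown in its SYNOPSIS (parse_file, and Pod::POM::View::HTML->print).

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;

# Hypothetical helper: list [source, destination] pairs for every .pm/.pod
# file under $pod_root, mirroring the directory layout under $html_root.
sub plan_conversions {
    my ($pod_root, $html_root) = @_;
    my @plan;
    find({
        no_chdir => 1,                      # $_ holds the full path
        wanted   => sub {
            return unless -f $_ && /\.(?:pm|pod)\z/;
            my $rel = File::Spec->abs2rel($_, $pod_root);
            (my $html_rel = $rel) =~ s/\.(?:pm|pod)\z/.html/;
            push @plan, [ $_, File::Spec->catfile($html_root, $html_rel) ];
        },
    }, $pod_root);
    @plan = sort { $a->[0] cmp $b->[0] } @plan;
    return @plan;
}

# Hypothetical helper: convert one file with Pod::POM (loaded on demand).
sub convert_with_pom {
    my ($src, $dest) = @_;
    require Pod::POM;
    require Pod::POM::View::HTML;
    require File::Path;
    require File::Basename;
    my $parser = Pod::POM->new;
    my $pom    = $parser->parse_file($src) or die $parser->error();
    File::Path::mkpath(File::Basename::dirname($dest));
    open my $out, '>', $dest or die "Cannot write $dest: $!";
    print {$out} Pod::POM::View::HTML->print($pom);
    close $out or die "Cannot close $dest: $!";
    return;
}

# Usage: perl pom_tree.pl lib html_docs
if (@ARGV == 2) {
    convert_with_pom(@$_) for plan_conversions(@ARGV);
}
```

Note that this still does nothing about L<> links between the pages, which is the part I actually cared about.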

SYNOPSIS

    $ pom2 text My/Module.pm > README
    $ pom2 html My/Module.pm > ~/public_html/My/Module.html

    # Alternately:

    use Pod::POM;

    my $parser = Pod::POM->new(\%options);

    # parse from a text string
    my $pom = $parser->parse_text($text)
        || die $parser->error();

    # parse from a file specified by name or filehandle
    my $pom = $parser->parse_file($file)
        || die $parser->error();

    # parse from text or file 
    my $pom = $parser->parse($text_or_file)
        || die $parser->error();

    # examine any warnings raised
    foreach my $warning ($parser->warnings()) {
        warn $warning, "\n";
    }

    # print table of contents using each =head1 title
    foreach my $head1 ($pom->head1()) {
        print $head1->title(), "\n";
    }

    # print each section
    foreach my $head1 ($pom->head1()) {
        print $head1->title(), "\n";
        print $head1->content();
    }

    # print the entire document as HTML
    use Pod::POM::View::HTML;
    print Pod::POM::View::HTML->print($pom);

    # create custom view
    package My::View;
    use base qw( Pod::POM::View::HTML );

    sub view_head1 {
        my ($self, $item) = @_;
        return '<h1>',
               $item->title->present($self),
               "</h1>\n",
               $item->content->present($self);
    }
    
    package main;
    print My::View->print($pom);

But wait, there's more

There are more ways to do it than are dreamt of in your philosophies, Horatio. And if you want to jump in the game yourself, there's plenty of room in the namespace. Still up for grabs: Pod::HtML, Pod::HtMl, Pod::HtmL...

Appendix 1: Quotable Quotes

Let us turn back the clock to the golden years of the perl 5.6 era:

perldelta - what's new for perl v5.6.x:

   As of release 5.6.0 of Perl, Pod::Parser is now the
   officially sanctioned "base parser code" recommended for
   use by all pod2xxx translators.  Pod::Text (pod2text)
   and Pod::Man (pod2man) have already been converted to
   use Pod::Parser and efforts to convert Pod::HTML
   (pod2html) are already underway.

Too bad something went wrong with that effort...

But hey, at least now it's on the todo list:

perl-5.8.5 - perltodo:

   POD -> HTML conversion still sucks

   Which is crazy given just how simple POD purports to be, and how simple HTML
   can be.

Neglect not the perl pod-people list, if you have any interest in this subject at all: pod-people archive. Some miscellaneous messages quoted from perl.pod-people follow...

Some promises from Sean M. Burke, from just this year (2004):

From: sburke[at]cpan.org (Sean M. Burke)
Date: Thu, 15 Jul 2004 12:28:46 -0800
Subject: Re: perltodo - POD - HTML conversion still sucks (I think that  not!)

Michael G Schwern wrote:

>Instead of just putting in a new POD -> HTML conversion module and leaving
>the old one around, consider gutting POD::Html and making it a thin
>wrapper around a cleaner module (such as POD::HtmlEasy).  Clean up the
>old messes.

I've already got that basically done for the next release of Pod-Simple.

Jumping back a few years, to 2002:


From: rra[at]stanford.edu (Russ Allbery)
Date: Mon, 22 Jul 2002 00:30:01 -0700
To: Dave Storrs 
Subject: Re: Pod::Html question: L<> with text

Dave Storrs writes:

> But whenever I try that, it tells me that it "could not resolve link"
> and spits up (instead of a link, I just get <EM> tags).  I poked at the
> source a bit and then wrote the following patch to Pod::Html.pm, but it
> seems like I'm probably missing something.  Is there a better way to do
> this?

The problem that you're running into is that pod2html in the Perl
distribution needs some serious loving attention.  It's currently quite a
bit behind the curve compared to pod2text or pod2man.

I've been tempted a few times to write a new one for my own purposes, as
none of the other POD to HTML translators out there quite do what I want,
but as there are already something like four of them, I've held off since
it feels like a waste of energy.  Parsing POD into HTML can require
different techniques than the translators I've already written and is
better suited for a tree-style parse, and there are apparently new POD
parsers coming up that will make that easier.

In the meantime, if you're just trying to convert your own documents, I'd
poke around in CPAN and try one of the other POD to HTML translators and
see if they work better.  If you're trying to get things working for
people who are just using the version that comes with Perl, you may be out
of luck for the time being, but with any luck a better converter will be
in the next release of Perl.  (It's possible people fixed some things for
5.8.0, but I'm pretty sure that pod2html is still rather behind the
curve.)

===

From: sburke[at]cpan.org (Sean M. Burke)
Date: Mon, 22 Jul 2002 03:11:50 -0600
To: Russ Allbery <rra[at]stanford.edu>, Dave Storrs <dstorrs[at]dstorrs.com>
Subject: Re: Pod::Html question: L<> with text

Russ Allbery wrote:
>[...]The problem that you're running into is that pod2html in the Perl
>distribution needs some serious loving attention.[...]

Well, not so much attention, as total replacement.  The current Pod::Html 
code is, frankly, the worst (semi-)working code that I've ever seen written 
in any high-level language -- with the one exception of the "Universal 
Bulletin Board" source code.

Incidentally, now that my book is finally done, I've been poking at my 
mostly-done perlpodspec-compliant Pod parser (to replace Pod::Parser), and 
it's going surprisingly well.  The first thing that I mean to do with it 
(as a proof of concept, notably) is write a new pod2html.  I think I 
mentioned this a few days ago, so sorry if I'm repeating myself.

>[...]Parsing POD into HTML can require different techniques than the 
>translators I've already written and is better suited for a tree-style 
>parse, and there are apparently new POD parsers coming up that will make 
>that easier.[...]

Yes, much much easier -- as easy as it should have been from the 
beginning!  The new Pod parser essentially makes the difference between a 
tree view and a token view a merely superficial interface question, instead 
of a substantial difference.

===

From: sburke[at]cpan.org (Sean M. Burke)
Date: Mon, 22 Jul 2002 05:06:25 -0600
To: Dave Storrs <dstorrs[at]dstorrs.com>, <pod-people[at]perl.org>
Subject: Re: Pod::Html question: L<> with text

Dave Storrs wrote:

>[...]From what I see in the man page, I should be able to do the following:
>        Please click L<here|http://archive.develooper.com> for the
>        archives
>[...]

Actually, no; that L<text|scheme:...> syntax is expressly forbidden.
As perlpod says:
<<
Or you can link to a web page:
* L<scheme:...>
Links to an absolute URL. For example, L<http://www.perl.org/>. But note
that there is no corresponding L<text|scheme:...> syntax, for various
reasons.
>>

If your perlpod doesn't say that, see
http://public.activestate.com/cgi-bin/perlbrowse?filename=pod%2Fperlpod.pod&action=print
or, the real scary stuff:
http://public.activestate.com/cgi-bin/perlbrowse?filename=pod%2Fperlpodspec.pod&action=print


So instead of anything involving the forbidden L<here|http://...> syntax, try something like:
        The archives are at L<http://archive.develooper.com>


Joseph Brenner, 09 Nov 2004