Programming Perl

Chapter 26. Plain Old Documentation

One of the principles underlying Perl's design is that simple things should be simple, and hard things should be possible. Documentation should be simple.

Perl supports a simple text markup format called pod that can stand on its own or be freely intermixed with your source code to create embedded documentation. Pod can be converted to many other formats for printing or viewing, or you can just read it directly, because it's plain.

Pod is not as expressive as languages like XML, LaTeX, troff(1), or even HTML. This is intentional: we sacrificed that expressiveness for simplicity and convenience. Some text markup languages make authors write more markup than text, which makes writing harder than it has to be, and reading next to impossible. A good format, like a good movie score, stays in the background without causing distraction.

Getting programmers to write documentation is almost as hard as getting them to wear ties. Pod was designed to be so easy to write that even a programmer could do it--and would. We don't claim that pod is sufficient for writing a book, although it was sufficient for writing this one.

26.1. Pod in a Nutshell

Most document formats require the entire document to be in that format. Pod is more forgiving: you can embed pod in any sort of file, relying on pod translators to extract the pod. Some files consist entirely of 100% pure pod. But other files, notably Perl programs and modules, may contain dollops of pod sprinkled about wherever the author feels like it. Perl simply skips over the pod text when parsing the file for execution.

The Perl lexer knows to begin skipping when, at a spot where it would ordinarily find a statement, it instead encounters a line beginning with an equal sign and an identifier, like this:

=head1 Here There Be Pods!

That text, along with all remaining text up through and including a line beginning with =cut, will be ignored. This allows you to intermix your source code and your documentation freely, as in:

=item snazzle

The snazzle() function will behave in the most spectacular
form that you can possibly imagine, not even excepting
cybernetic pyrotechnics.

=cut

sub snazzle {
    my $arg = shift;
    ...;    # rest of the body elided (Perl 5.12+ "yada yada" statement)
}

=item razzle

The razzle() function enables autodidactic epistemology generation.

=cut

sub razzle {
    print "Epistemology generation unimplemented on this platform.\n";
}

For more examples, look at any standard or CPAN Perl module. They're all supposed to come with pod, and nearly all do, except for the ones that don't.
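Because the compiler discards each stretch of pod before it compiles anything, a file laid out this way runs exactly as if the pod were absent. The following is a minimal, complete script in the same style; the functions double() and shout() are hypothetical stand-ins invented for illustration, not part of any real module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

demo - pod and code interleaved in one file (illustrative only)

=over 4

=item double

Returns twice its argument.

=cut

sub double {
    my $n = shift;
    return 2 * $n;
}

=item shout

Returns its argument in upper case, followed by an exclamation mark.

=back

=cut

sub shout {
    my $s = shift;
    return uc($s) . '!';
}

# The compiler never saw the pod above, so both subs are defined normally.
print double(21), "\n";     # prints 42
print shout('pod'), "\n";   # prints POD!
```

A pod translator reading the same file sees the mirror image: one =over list holding two =item entries, with the code between them thrown away.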

Since pod is recognized by the Perl lexer and thrown out, you may also use an appropriate pod directive to quickly comment out an arbitrarily large section of code. Use a =for pod block to comment out one paragraph, or a =begin/=end pair for a larger section. We'll cover the syntax of those pod directives later. Remember, though, that in both cases, you're still in pod mode afterwards, so you need to =cut back to the compiler.

print "got 1\n";

=for commentary
This paragraph alone is ignored by anyone except the
mythical "commentary" translator.  When it's over, you're
still in pod mode, not program mode.
print "got 2\n";


=cut

# ok, real program again
print "got 3\n";

=begin comment 

print "got 4\n";

all of this stuff
here will be ignored
by everyone

print "got 5\n";

=end comment 

=cut

print "got 6\n";

This will print out that it got 1, 3, and 6. Remember that these pod directives can't go just anywhere. You have to put them only where the parser is expecting to see a new statement, not just in the middle of an expression or at other arbitrary locations.
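The switching rule can be approximated line by line: a line that starts with = and a letter puts you in pod, and =cut takes you back out. The sketch below is a simplified model under that assumption. It ignores the real requirement that the directive appear where a statement could start, and it ignores here-documents. The helper code_lines() is hypothetical. Applied to the example above, it keeps exactly the print statements for 1, 3, and 6.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines the compiler would see, using a
# simplified rule: any line beginning with "=" and a letter is a pod
# directive; "=cut" ends pod mode, every other directive starts (or stays in) it.
sub code_lines {
    my ($source) = @_;
    my $in_pod = 0;
    my @code;
    for my $line (split /\n/, $source) {
        if ($line =~ /^=([a-zA-Z]\w*)/) {
            $in_pod = $1 ne 'cut';
            next;    # directive lines themselves are never code
        }
        push @code, $line unless $in_pod;
    }
    return @code;
}

# The example from the text, held in a here-document so Perl treats it as data.
my $example = <<'EOT';
print "got 1\n";

=for commentary
This paragraph alone is ignored by anyone except the
mythical "commentary" translator.  When it's over, you're
still in pod mode, not program mode.
print "got 2\n";


=cut

# ok, real program again
print "got 3\n";

=begin comment

print "got 4\n";

all of this stuff
here will be ignored
by everyone

print "got 5\n";

=end comment

=cut

print "got 6\n";
EOT

my @seen = map { /got (\d)/ ? $1 : () } code_lines($example);
print "@seen\n";    # prints 1 3 6
```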

From the viewpoint of Perl, all pod markup is thrown out, but from the viewpoint of pod translators, it's the code that is thrown out. Pod translators view the remaining text as a sequence of paragraphs separated by blank lines. All the standard pod translators parse pod the same way, using a common parsing module: the standard Pod::Parser module when this was written, though the standard translators have since moved to Pod::Simple. They differ only in their output, since each translator specializes in one output format.

There are three kinds of paragraphs: verbatim paragraphs, command paragraphs, and prose paragraphs.
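Telling the three kinds apart takes only the first character of each paragraph, as the subsections that follow explain. Here is a minimal sketch, assuming paragraphs are separated by lines that are empty or contain only whitespace. The helper classify_paragraphs() is hypothetical and is not part of any pod module.

```perl
use strict;
use warnings;

# Hypothetical helper: split pod text on blank lines and label each
# paragraph by its first character.
sub classify_paragraphs {
    my ($pod) = @_;
    my @kinds;
    for my $para (split /\n\s*\n/, $pod) {
        next unless length $para;
        if    ($para =~ /^[ \t]/)     { push @kinds, 'verbatim' }
        elsif ($para =~ /^=[a-zA-Z]/) { push @kinds, 'command' }
        else                          { push @kinds, 'prose' }
    }
    return @kinds;
}

my $pod = <<'EOT';
=head1 SYNOPSIS

    use Snazzle;
    snazzle($arg);

The snazzle() function is described below.

=cut
EOT

print join(' ', classify_paragraphs($pod)), "\n";
# prints: command verbatim prose command
```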

26.1.1. Verbatim Paragraphs

Verbatim paragraphs are used for literal text that you want to appear as is, such as snippets of code. A verbatim paragraph must be indented; that is, it must begin with a space or tab character. The translator should reproduce it exactly, typically in a constant width font, with tabs assumed to be on eight-column boundaries. There are no special formatting escapes, so you can't play font games to italicize or embolden. A < character means a literal <, and nothing else.
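Expanding tabs on eight-column boundaries is the one change a translator is expected to make to verbatim text. The following sketch shows that step using expand() from the core Text::Tabs module, whose tab stop defaults to 8. The sample lines are made up for illustration.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and unexpand()

# Sample verbatim lines containing tabs (illustrative data).
my @verbatim = ("\tmy \$x = 1;", "ab\tcd");

# expand() replaces each tab with enough spaces to reach the next
# multiple of $Text::Tabs::tabstop, which is 8 unless you change it.
my @expanded = expand(@verbatim);

print "[$_]\n" for @expanded;
# prints:
# [        my $x = 1;]
# [ab      cd]
```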

26.1.2. Pod Directives

All pod directives start with = followed by an identifier. This may be followed by any amount of arbitrary text that the directive can use however it pleases. The only syntactic requirement is that the text must all be one paragraph. Currently recognized directives (sometimes called pod commands) are:

=head1
=head2
...

The =head1, =head2,... directives produce headings at the level specified. The rest of the text in the paragraph is treated as the heading description. These are similar to the .SH and .SS section and subsection headers in man(7), or to <H1>...</H1> and <H2>...</H2> tags in HTML. In fact, that's exactly what those translators convert these directives into.
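A heading directive carries its level in its name and its text in the rest of the paragraph, so translating it is nearly mechanical. Here is a minimal sketch of the HTML side. The function heading_to_html() is hypothetical. It does no HTML escaping and does not interpret pod sequences inside the heading, although it does join a heading paragraph that wraps onto several lines.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one =headN paragraph into an HTML heading.
# No HTML escaping and no handling of pod sequences such as B<...>.
sub heading_to_html {
    my ($para) = @_;
    my ($level, $text) = $para =~ /^=head(\d)\s+(.*\S)/s
        or return;           # not a heading directive
    $text =~ s/\s+/ /g;      # a heading paragraph may wrap onto several lines
    return "<h$level>$text</h$level>";
}

print heading_to_html("=head1 Here There Be Pods!"), "\n";
# prints <h1>Here There Be Pods!</h1>
```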

=cut

The =cut directive indicates the end of a stretch of pod. (There might be more pod later in the document, but if so it will be introduced with another pod directive.)

=pod

The =pod directive does nothing beyond telling the compiler to lay off parsing code through the next =cut. It's useful for adding another paragraph to the document if you're mixing up code and pod a lot.

=over NUMBER
=item SYMBOL
=back

The =over directive starts a section specifically for the generation of a list using the =item directive. At the end of your list, use =back to end it. The NUMBER, if provided, hints to the formatter how many spaces to indent. Some formatters aren't rich enough to respect the hint, while others are too rich to respect it, insofar as it's difficult when working with proportional fonts to make anything line up merely by counting spaces. (However, four spaces is generally construed as enough room for bullets or numbers.)

The actual type of the list is indicated by the SYMBOL on the individual items. Here is a bulleted list:

=over 4

=item *

Mithril armor

=item *

Elven cloak

=back

And a numbered list:

=over 4

=item 1.

First, speak "friend".

=item 2.

Second, enter Moria.

=back

And a named list:

=over 4

=item armor()

Description of the armor() function

=item chant()

Description of the chant() function

=back

You may nest lists of the same or different types, but some basic rules apply: don't use =item outside an =over/=back block; use at least one =item inside an =over/=back block; and perhaps most importantly, keep the type of the items consistent within a given list. Either use =item * for each item to produce a bulleted list, or =item 1., =item 2., and so on to produce a numbered list, or use =item foo, =item bar, and so on to produce a named list. If you start with bullets or numbers, stick with them, since formatters are allowed to use the first =item type to decide how to format the list.
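The consistency rule is easy to check mechanically. The following sketch sorts each =item label into one of three kinds (bullet, number, or name) and reports any item whose kind differs from the first one. item_kind() and list_problems() are hypothetical helpers. The classification is an assumption based on the examples above: a lone * is a bullet, digits with an optional trailing period are a number, and anything else is a name.

```perl
use strict;
use warnings;

# Hypothetical classifier for the text that follows =item.
sub item_kind {
    my ($label) = @_;
    return 'bullet' if $label eq '*';
    return 'number' if $label =~ /^\d+\.?$/;
    return 'named';
}

# Hypothetical checker: given the labels of one =over/=back block,
# return a list of complaints (an empty list means the list is consistent).
sub list_problems {
    my @labels = @_;
    return ('no =item inside =over/=back') unless @labels;
    my $first = item_kind($labels[0]);
    return map  { my $kind = item_kind($_);
                  "=item $_ is $kind, but the list began as $first" }
           grep { item_kind($_) ne $first } @labels;
}

print "$_\n" for list_problems('*', '*');            # prints nothing
print "$_\n" for list_problems('1.', '2.', '3.');    # prints nothing
print "$_\n" for list_problems('*', '2.');
# prints: =item 2. is number, but the list began as bullet
```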

As with everything in pod, the result is only as good as the translator. Some translators pay attention to the particular numbers (or letters, or Roman numerals) following the =item, and others don't. The current pod2html translator, for instance, is quite cavalier: it strips out the sequence indicators entirely without looking at them to infer what sequence you're using, then wraps the entire list inside <OL> and </OL> tags so that the browser can display it as an ordered list in HTML. This is not to be construed as a feature; it may eventually be fixed.

=for TRANSLATOR
=begin TRANSLATOR
=end TRANSLATOR

=for, =begin, and =end let you include special sections to be passed through unaltered, but only to particular formatters. Formatters that recognize their own names, or aliases for their names, in TRANSLATOR pay attention to that directive; all other formatters ignore it completely. The directive =for specifies that just the rest of this paragraph is destined for a particular translator.

=for html
<p> This is a <flash>raw</flash> <small>HTML</small> paragraph </p>

The paired =begin and =end directives work similarly to =for, but instead of accepting a single paragraph only, they treat all text between matched =begin and =end as destined for a particular translator. Some examples:

=begin html

<br>Figure 1.<IMG SRC="figure1.png"><br>

=end html

=begin text

  ---------------
  |  foo        |
  |        bar  |
  ---------------

^^^^ Figure 1. ^^^^

=end text

Values of TRANSLATOR commonly accepted by formatters include roff, man, troff, nroff, tbl, eqn, latex, tex, html, and text. Some formatters will accept some of these as synonyms. No translator accepts comment--that's just the customary word for something to be ignored by everybody. Any unrecognized word would serve the same purpose. While writing this book, we often left notes for ourselves under the directive =for later.

Note that =begin and =end do nest, but only in the sense that the outermost matched set causes everything in the middle to be treated as nonpod, even if it happens to contain other =word directives. That is, as soon as any translator sees =begin foo, it will either ignore or process everything down to the corresponding =end foo.
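A translator's handling of these directives amounts to a filter keyed on its own name. The following minimal sketch of that filter assumes the pod is split into blank-line-separated paragraphs. raw_for() is a hypothetical helper. Once inside a =begin region, it looks only for the matching =end, which gives the outermost-pair behavior just described.

```perl
use strict;
use warnings;

# Hypothetical filter: return the raw text that =for and =begin/=end
# address to $target, dropping everything meant for other translators.
sub raw_for {
    my ($pod, $target) = @_;
    my ($inside, @out);    # $inside names the open =begin region, if any
    for my $para (split /\n\s*\n/, $pod) {
        if (defined $inside) {
            # Only the matching =end closes the region; anything else,
            # including other =begin lines, is just content.
            if ($para =~ /^=end\s+\Q$inside\E\s*$/) { undef $inside }
            elsif ($inside eq $target)             { push @out, $para }
        }
        elsif ($para =~ /^=begin\s+(\S+)/)       { $inside = $1 }
        elsif ($para =~ /^=for\s+(\S+)\s+(.*)/s) { push @out, $2 if $1 eq $target }
    }
    return @out;
}

my $pod = <<'EOT';
=for html <p>A raw HTML paragraph</p>

=begin html

<br>Figure 1.<IMG SRC="figure1.png"><br>

=end html

=begin text

  ^^^^ Figure 1. ^^^^

=end text
EOT

print "$_\n" for raw_for($pod, 'html');    # the two HTML chunks
print "$_\n" for raw_for($pod, 'text');    # the ASCII-art caption
my @none = raw_for($pod, 'comment');
print scalar(@none), "\n";                 # prints 0
```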

26.1.3. Pod Sequences

The third type of paragraph is simply "flowed" text. That is, if a paragraph doesn't start with either whitespace or an equals sign, it's taken as a plain paragraph: regular text that's typed in with as few frills as possible. Newlines are treated as equivalent to spaces. It's largely up to the translator to make it look nice, because programmers have more important things to do. It is assumed that translators will apply certain common heuristics--see the section "Pod Translators and Modules" later in this chapter.

You can do some things explicitly, however. Inside either ordinary paragraphs or heading/item directives (but not in verbatim paragraphs), you may use special sequences to adjust the formatting. These sequences always start with a single capital letter followed by a left angle bracket, and extend through the matching (not necessarily the next) right angle bracket. Sequences may contain other sequences.

Here are the sequences defined by pod:

I<text>

Italicized text, used for emphasis, book titles, names of ships, and manpage references such as "perlpod(1)".

B<text>

Emboldened text, used almost exclusively for command-line switches and sometimes for names of programs.

C<text>

Literal code, probably in a fixed-width font like Courier. Not needed on simple items that the translator should be able to infer as code, but you should put it anyway.

S<text>

Text with nonbreaking spaces. Often surrounds other sequences.

L<name>

A cross reference (link) to a name:

L<name>

Manual page

L<name/ident>

Item in manual page

L<name/"sec">

Section in other manual page

L<"sec">

Section in this manual page (the quotes are optional)

L</"sec">

Ditto

The next five sequences are the same as those above, but the output will be only text, with the link information hidden as in HTML:

L<text|name>
L<text|name/ident>
L<text|name/"sec">
L<text|"sec">
L<text|/"sec">

The text cannot contain the characters / and |, and should contain < or > only in matched pairs.
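Taking an L<> sequence apart is mostly a matter of splitting on | and /. The following sketch assumes the contents of the sequence have already been extracted. parse_link() and the hash keys it returns are hypothetical. It does not distinguish an item (ident) from a section. It also treats a bare unquoted word as a page name, since L<sec> and L<name> look the same without quotes.

```perl
use strict;
use warnings;

# Hypothetical parser for the text between L< and >.
# Returns a hash ref with any of the keys: text, name, section.
sub parse_link {
    my ($inner) = @_;
    my %link;
    # Optional "text|" prefix; the text may not contain / or |.
    $link{text} = $1 if $inner =~ s{^([^|/]*)\|}{};
    if ($inner =~ /^"(.*)"$/) {              # L<"sec">: section in this page
        $link{section} = $1;
    }
    else {
        my ($name, $rest) = split m{/}, $inner, 2;
        $link{name} = $name if defined $name && length $name;
        if (defined $rest) {                 # name/ident, name/"sec", /"sec"
            $rest =~ s/^"(.*)"$/$1/;         # drop the optional quotes
            $link{section} = $rest;
        }
    }
    return \%link;
}

for my $l ('perlpod', 'perlfunc/open', 'perlsyn/"Loop Control"',
           '"Pod Sequences"', 'the open function|perlfunc/open') {
    my $r = parse_link($l);
    print "$l => ", join(', ', map { sprintf '%s=%s', $_, $r->{$_} } sort keys %$r), "\n";
}
```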

F<pathname>

Used for filenames. This is traditionally rendered the same as I<text>.

X<entry>

An index entry of some sort. As always, it's up to the translator to decide what to do. The pod specification doesn't dictate that.

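E<escape>

A named character, similar to HTML escapes:

E<lt>

A literal <

E<gt>

A literal >

E<sol>

A literal /

E<verbar>

A literal |

E<number>

The character with that number, read as decimal unless it has a leading 0x (hexadecimal) or a leading 0 (octal)

E<htmlname>

A nonnumeric HTML entity, such as E<eacute>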

Z<>

A zero-width character. This is nice for putting in front of sequences that might confuse something. For example, if you had a line in regular prose that had to start with an equals sign, you could write that as:

Z<>=can you see

or for something with a "From" in it, so the mailer doesn't put a > in front:

Z<>From here on out...

Most of the time, you'll need only a single set of angle brackets to delimit one of these pod sequences. Sometimes, however, you will want to put a < or > inside a sequence. (This is particularly common when using a C<> sequence to provide a constant-width font for a snippet of code.) As with all things in Perl, there is more than one way to do it. One way is simply to represent the offending angle brackets with E<lt> and E<gt> sequences:

C<$a E<lt>=E<gt> $b>

This produces "$a <=> $b".
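Resolving E<> escapes amounts to a table lookup plus a little number parsing. This minimal sketch handles only the named escapes lt, gt, sol, and verbar, plus numeric codes in decimal, 0x-prefixed hexadecimal, or 0-prefixed octal, following perlpod. The function expand_escapes() is hypothetical. HTML entity names such as iacute are deliberately left untouched rather than looked up.

```perl
use strict;
use warnings;

my %NAMED = (lt => '<', gt => '>', sol => '/', verbar => '|');

# Hypothetical helper: replace E<...> escapes in a string with the
# characters they stand for; unknown names are left as they are.
sub expand_escapes {
    my ($text) = @_;
    $text =~ s{E<([^<>]+)>}{
        my $e = $1;
        exists $NAMED{$e}             ? $NAMED{$e}
        : $e =~ /^0[xX][0-9a-fA-F]+$/ ? chr(hex $e)    # hexadecimal
        : $e =~ /^0[0-7]+$/           ? chr(oct $e)    # octal
        : $e =~ /^[0-9]+$/            ? chr($e)        # decimal
        :                               "E<$e>";       # e.g. an HTML name
    }ge;
    return $text;
}

print expand_escapes('$a E<lt>=E<gt> $b'), "\n";    # prints $a <=> $b
print expand_escapes('E<0x41>E<66>E<0103>'), "\n";  # prints ABC
```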

A more readable, and perhaps more "plain", way is to use an alternate set of delimiters that doesn't require the angle brackets to be escaped. Doubled angle brackets (C<< stuff >>) may be used, provided there is whitespace immediately following the opening delimiter and immediately preceding the closing one. For example, the following will work:

C<< $a <=> $b >>

You may use as many repeated angle brackets as you like so long as you have the same number of them on both sides, and you make sure that whitespace immediately follows the last < of the left side and immediately precedes the first > of the right side. So the following will also work:

C<<< $a <=> $b >>>
C<<<< $a <=> $b >>>>

All these end up spitting out $a <=> $b in a constant-width font.
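You can confirm that the spellings are interchangeable by running them through a real formatter. This sketch uses the core Pod::Text module. In current versions of Perl, Pod::Text is built on Pod::Simple and so accepts output_string() and parse_string_document(). The exact layout, such as the indentation and the quotes Pod::Text puts around C<> text, depends on the Pod::Text version, so rely only on the rendered text itself. The helper render() is a hypothetical name.

```perl
use strict;
use warnings;
use Pod::Text;

# Render one pod paragraph to plain text and return the result as a string.
sub render {
    my ($para) = @_;
    my $out = '';
    my $parser = Pod::Text->new;
    $parser->output_string(\$out);
    $parser->parse_string_document("=pod\n\n$para\n");
    return $out;
}

for my $para ('C<$a E<lt>=E<gt> $b>',
              'C<< $a <=> $b >>',
              'C<<< $a <=> $b >>>',
              'C<<<< $a <=> $b >>>>') {
    print render($para);    # each one renders the text $a <=> $b
}
```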

The extra whitespace inside on either end goes away, so you should leave whitespace on the outside if you want it. Also, the two inside chunks of extra whitespace don't overlap, so if the first thing being quoted is >>, it isn't taken as the closing delimiter:

The C<< >> >> right shift operator.

This produces "The >> right shift operator."

Note that pod sequences do nest. That means you can write "The I<Santa MarE<iacute>a> left port already" to produce "The Santa María left port already", or "B<touch> S<B<-t> I<time>> I<file>" to produce "touch -t time file", and expect this to work properly.
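Nesting works because each sequence runs from its opening letter and bracket to its own matching closing bracket, so a formatter can handle the contents recursively. The following sketch is a hypothetical mini-formatter that renders single-bracket sequences as HTML. It does not handle the doubled-bracket forms or hexadecimal and octal E<> codes. Other letters, L<> included, simply keep their contents. It also assumes every < and > in the text belongs to a sequence. The function names and HTML tag choices are illustrative assumptions.

```perl
use strict;
use warnings;

my %TAG = (B => 'b', I => 'i', C => 'code', F => 'i');

# Turn one sequence (letter plus already-formatted contents) into HTML.
sub render_seq {
    my ($code, $inner) = @_;
    if ($code eq 'E') {          # E<iacute> -> &iacute;  E<233> -> &#233;
        return $inner =~ /^\d+$/ ? "&#$inner;" : "&$inner;";
    }
    if ($code eq 'S') {          # spaces become nonbreaking
        (my $nb = $inner) =~ s/ /&nbsp;/g;
        return $nb;
    }
    return '' if $code eq 'Z';   # zero-width: contributes nothing
    return exists $TAG{$code} ? "<$TAG{$code}>$inner</$TAG{$code}>" : $inner;
}

# Parse from offset $$pos up to the '>' that closes the current sequence
# (or the end of the string), formatting nested sequences recursively.
sub parse_until_close {
    my ($s, $pos) = @_;
    my $out = '';
    while ($$pos < length $s) {
        if (substr($s, $$pos, 2) =~ /^([A-Z])<$/) {
            my $code = $1;
            $$pos += 2;                              # skip "X<"
            my $inner = parse_until_close($s, $pos);
            $$pos++;                                 # skip the matching ">"
            $out .= render_seq($code, $inner);
        }
        elsif (substr($s, $$pos, 1) eq '>') {
            return $out;                             # caller consumes the ">"
        }
        else {
            $out .= substr($s, $$pos++, 1);
        }
    }
    return $out;
}

sub format_pod {
    my ($text) = @_;
    my $pos = 0;
    return parse_until_close($text, \$pos);
}

print format_pod('B<touch> S<B<-t> I<time>> I<file>'), "\n";
# prints <b>touch</b> <b>-t</b>&nbsp;<i>time</i> <i>file</i>
print format_pod('The I<Santa MarE<iacute>a> left port already'), "\n";
# prints The <i>Santa Mar&iacute;a</i> left port already
```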




Copyright © 2001 O'Reilly & Associates. All rights reserved.