Alternatives for Generating Output (CGI Programming with Perl)

5.4.2. Here Documents

As we have seen in earlier examples, Perl supports a feature called here documents that lets you express a large block of content separately within your code. To create a here document, use << followed by the token that marks the end of the here document; the content begins on the next line and runs until a line containing only that token. You can enclose the token in single or double quotes, and the content will be evaluated as if it were a string within those quotes. In other words, if you use single quotes, variables will not be interpolated. If you omit the quotes, it acts as though you had used double quotes.
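The effect of quoting the token is easiest to see side by side. The following sketch is not one of the book's examples; the variable $name and the token names are made up purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "World";   # hypothetical variable, for illustration only

# Double-quoted token: variables are interpolated
my $double = <<"END_DOUBLE";
Hello, $name!
END_DOUBLE

# Bare token: treated the same as a double-quoted token
my $bare = <<END_BARE;
Hello, $name!
END_BARE

# Single-quoted token: the text is taken literally
my $single = <<'END_SINGLE';
Hello, $name!
END_SINGLE

print $double, $bare, $single;
```

Run as written, this should print "Hello, World!" twice, followed by the literal text "Hello, $name!".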

Here is the example from the previous section, rewritten to use a here document:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $timestamp = localtime;

print <<END_OF_MESSAGE;
Content-type: text/html

<html>
  <head>
    <title>The Time</title>
  </head>
  
  <body bgcolor="#ffffff">
    <h2>Current Time</h2>
    <hr>
    <p>The current time according to this system is: 
    <b>$timestamp</b></p>
  </body>
</html>
END_OF_MESSAGE

This is much cleaner than using lots of print statements, and it allows us to indent the HTML content. The result is that this is much easier to read and to update. You could have accomplished something similar by using one print statement and putting all the content inside one pair of double quotes, but then you would have had to precede each double quote in the HTML with a backslash, and for complicated HTML documents this could get tedious.

Another solution is to use Perl's qq// operator with a different delimiter, such as ~. You must find a delimiter that will not appear in the HTML, and remember that if your content includes JavaScript, it may contain many characters that plain HTML would not. Here documents are generally a safer solution.
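As a rough sketch of the qq// approach, the fragment below uses ~ as the delimiter. The HTML is shortened from the earlier example, and the variable $html is introduced here only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $timestamp = localtime;

# qq~...~ behaves like a double-quoted string, so $timestamp is
# interpolated and the double quotes in the HTML need no backslashes.
my $html = qq~<body bgcolor="#ffffff">
  <p>The current time according to this system is:
  <b>$timestamp</b></p>
</body>
~;

print $html;
```

If a tilde did appear in the content, for instance in a URL such as /~user/, it would have to be written as \~, which is exactly the kind of bookkeeping a here document avoids.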

One drawback to using here documents is that they do not easily indent, so they may look odd inside blocks of otherwise cleanly indented code. Tom Christiansen and Nathan Torkington address this issue in the Perl Cookbook (O'Reilly & Associates, Inc.). The following solutions are adapted from their discussion.

If you do not care about extra leading whitespace in your HTML output, you can simply indent everything. You can also indent the ending token if you quote it and include the indent in the name, as in the following example. Although this is more readable, it may be less maintainable: if the indentation changes, you must adjust the name of the token to match:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $timestamp = localtime;
display_document( $timestamp );

sub display_document {
    my $timestamp = shift;
    
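    # Header lines must begin in the first column and be followed by a
    # truly empty line, so print them outside the indented here document.
    print "Content-type: text/html\n\n";
    
    print <<"    END_OF_MESSAGE";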
      <html>
        <head>
          <title>The Time</title>
        </head>
        
        <body bgcolor="#ffffff">
          <h2>Current Time</h2>
          <hr>
          <p>The current time according to this system is: 
          <b>$timestamp</b></p>
        </body>
      </html>
    END_OF_MESSAGE
}

One problem with indenting HTML here documents is that the extra indentation is sent to the client. You can solve this problem by creating a function that "unindents" your text. If you wish to remove all indentation, this is simple; if you want to maintain your HTML's indentation, this is more complex. The challenge is determining the amount of indentation to remove: what portion belongs to the content and what part is incidental to your script? You could assume the first line contains the smallest indent, but this would not work if you were only printing the end of an HTML document, for example, when the last line would probably contain the smallest indent.
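The simple case, removing all indentation, might look like the following sketch. The name strip_indent is hypothetical (it is not part of Perl or CGI.pm), and the HTML fragment is shortened for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_indent is a hypothetical helper name chosen for this sketch.
sub strip_indent {
    my $text = shift;
    $text =~ s/^[ \t]+//gm;   # delete leading spaces and tabs on every line
    return $text;
}

# Defining the sub before its use lets us call it without parentheses.
my $page = strip_indent <<"    END_OF_PAGE";
      <html>
        <body>
          <p>Hello</p>
        </body>
      </html>
    END_OF_PAGE

print $page;
```

Every line of the output starts in the first column, so the nesting of the HTML is no longer visible to anyone viewing the source.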

In the following code the unindent subroutine looks at all of the lines being printed, finds the smallest indent, and removes that amount from all of the lines:

sub unindent;

sub display_document {
    my $timestamp = shift;
    
    print unindent <<"    END_OF_MESSAGE";
      Content-type: text/html
      
      <html>
        <head>
          <title>The Time</title>
        </head>
        
        <body bgcolor="#ffffff">
          <h2>Current Time</h2>
          <hr>
          <p>The current time according to this system is: 
          <b>$timestamp</b></p>
        </body>
      </html>
    END_OF_MESSAGE
}

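sub unindent {
    local $_ = shift;
    # Capture the leading whitespace of every non-blank line. With
    # consistent indentation, the shortest run sorts first.
    my( $indent ) = sort /^([ \t]*)\S/gm;
    # Remove that much whitespace from the start of each line.
    s/^$indent//gm;
    return $_;
}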

Predeclaring the unindent function, as we do on the first line, allows us to omit parentheses when we use it. This solution, of course, increases the amount of work the server must do for each request, so it would not be appropriate on a heavily used server. Also keep in mind that each additional space increases the number of bytes you must transfer and the user must download, so you may actually want to strip all leading whitespace instead. After all, users probably care more about the page downloading faster than how it looks if they view the source code.
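Versions of Perl released after this discussion was written offer a built-in alternative: starting with Perl 5.26, writing <<~ instead of << allows the closing token to be indented and strips that same indentation from each line of content. The sketch below assumes such a Perl is available; the footer subroutine and its HTML are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.026;   # indented here documents (<<~) need Perl 5.26 or later

# footer is a hypothetical subroutine that prints only the end of a page.
sub footer {
    # The whitespace in front of the closing token is removed from
    # every line, so deeper lines keep their relative indentation.
    return <<~END_OF_FOOTER;
            <hr>
          </body>
        </html>
        END_OF_FOOTER
}

print footer();
```

Because the amount removed is taken from the closing token rather than from the first line, this also handles the case where the last line of content has the smallest indent. No helper function runs per request, so the extra work moves from run time to compile time.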

Overall, here documents are not a bad solution for large chunks of HTML, but they do not offer CGI.pm's advantages, especially the ability to have your HTML code verified syntactically. It's much harder to forget to close an HTML tag with CGI.pm than it is with a here document. Also, you must often build HTML programmatically. For example, you may read records from a database and add a row to a table for each record. In these cases, when you are working with small chunks of HTML, CGI.pm is much easier to work with than here documents.
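To make the trade-off concrete, here is a sketch of the table example done in plain Perl with a here document rather than CGI.pm. The @records data is a made-up stand-in for rows fetched from a database, and the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records standing in for the result of a database query
my @records = (
    { name => 'Alice', email => 'alice@example.com' },
    { name => 'Bob',   email => 'bob@example.com'   },
);

# Add one table row per record
my $rows = "";
foreach my $record ( @records ) {
    $rows .= "  <tr><td>$record->{name}</td><td>$record->{email}</td></tr>\n";
}

# Drop the finished rows into the surrounding markup
my $table = <<END_OF_TABLE;
<table border="1">
  <tr><th>Name</th><th>Email</th></tr>
$rows</table>
END_OF_TABLE

print $table;
```

Every tr and td here must be opened and closed by hand, and real data would also need its HTML special characters escaped before being interpolated.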

Using CGI.pm's methods for outputting HTML generates strong reactions in developers. Some love it; others don't. Don't worry if it doesn't match your needs; we will look at a whole class of alternatives in the next chapter.
