CGI Programming with Perl

6.2. Server Side Includes

Many times we want to create a web page that contains very little dynamic information. It seems like a lot of work to write a full-fledged application just to display a single piece of dynamic information, such as the current date and time, a file's modification time, or the user's IP address, in an otherwise static document. Fortunately, there is a tool included with most web servers called Server Side Includes, or SSI.

SSI allows us to embed special directives in our HTML documents to execute other programs or insert various pieces of data such as environment variables and file statistics. While SSI technically has nothing to do with CGI, it is an important tool for incorporating dynamic information, as well as output from CGI programs, into otherwise static documents. You should be aware of its abilities and limitations because in some cases it can provide a simpler and more efficient solution than a CGI script.

For example, say you want to have a web page display the last date it was modified. You could create a CGI script to display the file and use Perl's -M operator to determine the age of the file. However, it's much simpler to enable SSI and include the following line:

Last modified: <!--#echo var="LAST_MODIFIED" -->

The terms within the HTML comment are an SSI command. When the browser requests this document from a web server, the server parses it and returns the result (see Figure 6-1). In this case, it replaces the SSI command with a timestamp reflecting the last time this document was modified. The server does not automatically parse all files looking for SSI directives, but only documents that are associated with SSI. We will look at how to configure this in the next section.

NOTE

Note that SSI cannot parse CGI output; it only parses otherwise static HTML files. The new architecture in Apache 2.0 should eventually support SSI parsing of CGI output if the CGI outputs a particular Content-type header. Other web servers do not support this.

Because the SSI engine is compiled into the web server, it is many times more efficient than a CGI script. However, SSI commands are limited and can only handle basic tasks; in one sense this simplicity is good because SSI is very easy to learn. HTML designers with no programming experience can easily add SSI commands to their documents. Later in this chapter we'll see how other template solutions provide more powerful alternatives aimed at developers.
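To make the comparison with CGI concrete, here is a minimal sketch of a script that does the same job as the one-line LAST_MODIFIED directive: it sends a static page and appends its modification time. The path and the last_modified helper are hypothetical names chosen for illustration. The sketch uses stat rather than -M, since -M reports a file's age in days instead of a timestamp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location of the static page; adjust for your own server.
my $page = "/usr/local/apache/htdocs/index.html";

# Return a file's last modification time as a readable string,
# or undef if the file cannot be examined.
sub last_modified {
    my $path  = shift;
    my $mtime = ( stat $path )[9];
    return defined $mtime ? scalar localtime $mtime : undef;
}

print "Content-type: text/html\n\n";

if ( open my $fh, "<", $page ) {
    print while <$fh>;    # send the page itself, line by line
    close $fh;
    print "<P>Last modified: ", last_modified( $page ), "</P>\n";
} else {
    print "<P>[Could not read the requested page]</P>\n";
}
```

Even in this small form, the script has to open, read, and print the whole document itself, which is the work the SSI directive leaves to the server.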

Figure 6-1. Server side includes

6.2.1. Configuration

The server must know which files to parse for SSI commands. We'll see how to configure the Apache web server in this section. If you are using another web server, it should be equally easy to configure; refer to its documentation.

You have the following options with SSI:

To enable SSI for a particular directory or directories, add Includes as an option in each directory. If you wish to enable SSI throughout your web site for all files ending in .shtml, then add the following to httpd.conf (or access.conf if used):

<Location />
...
Options     Includes
AddHandler  server-parsed .shtml
...
</Location>

Note that your configuration files probably have other lines between the <Location /> and </Location> tags as well as other entries for Options; you can leave these as they are.

You are not restricted to using the .shtml extension; you can have the server parse all HTML documents with this directive:

AddHandler server-parsed .html

However, you should do this only if all of your pages are dynamic because parsing each HTML document increases the amount of work the web server must do and reduces performance.

You should also add the following lines to httpd.conf outside any Location or Directory tags (or srm.conf, if used):

DirectoryIndex   index.html index.shtml
AddType          text/html     .shtml

The DirectoryIndex directive tells the server which files to look for when a URL refers to a directory: it serves index.html if that file exists and falls back to index.shtml otherwise. The AddType directive tells the server that the media type of parsed files is HTML instead of the default, which is typically plain text.

We'll look at the syntax of SSI commands in a moment, but one particular SSI command, exec, allows you to execute external applications and include the output in your document. You may not wish to enable this for security reasons; you may not wish to give HTML authors the same level of trust in this regard that you give to CGI developers. Also, if you do enable exec and you have a CGI script on your site that creates static HTML files from users' input (as some popular guestbook and message board CGI scripts do), make sure that SSI is not enabled for files created by this CGI script. If someone using this CGI script enters the following and SSI tags are not removed by the CGI application, then their malicious command will be executed the first time their comment is read:

<!--#exec cmd="/bin/rm -rf *" -->

This would delete every file and subdirectory beneath the command's working directory that the server has permission to remove. The following could be just as disastrous on a Windows server:

<!--#exec cmd="del /f /s /q c:\" -->

Most CGI scripts that generate files such as this create them with a .html extension, so you would not want to enable exec and configure the web server to parse all .html files. Note that this is not as much of a concern if CGI scripts are not allowed to generate .html files.
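One way a guestbook-style CGI script could neutralize SSI tags in user input is to escape HTML metacharacters before writing the text to a file. The following is a minimal sketch under that assumption; escape_html is a hypothetical helper name, not part of any script discussed here.

```perl
use strict;
use warnings;

# Escape the characters that are significant in HTML so that user input
# is stored as inert text rather than as markup or SSI directives.
sub escape_html {
    my $text = shift;
    $text =~ s/&/&amp;/g;     # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $comment = 'Nice site! <!--#exec cmd="/bin/rm -rf *" -->';
print escape_html( $comment ), "\n";
# prints: Nice site! &lt;!--#exec cmd=&quot;/bin/rm -rf *&quot; --&gt;
```

Escaping is used here instead of deleting anything that looks like an SSI tag because a single deletion pass can be defeated by nesting one tag inside another, while an escaped "<" can never begin a directive.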

To enable SSI without enabling the exec tag, use the following option instead of Includes:

Options     IncludesNoExec

Older versions of Apache and other web servers actually required that the CGI script execution also be enabled in order to use the exec command:

Options     Includes ExecCGI

As you'll recall from Chapter 1, "Getting Started", there are good reasons to restrict CGI scripts to particular directories. Previously you had to choose between enabling CGI script execution and disallowing the exec command. Fortunately, this restriction has been lifted: you can now enable the exec command while disallowing CGI execution.

6.2.2. Format

Now let's see what SSI can do for us. All SSI directives have the following syntax:

<!--#element attribute="value" attribute="value" ... -->

Table 6-1 lists the available SSI commands. In this chapter, we will discuss each of these directives in detail.

Table 6-1. Server Side Include Commands

| Element | Attribute | Description |
|---|---|---|
| echo | var | Displays the value of environment variables, special SSI variables and any user-defined variables. |
| include | | Inserts the contents of a particular file into the current document. |
| | file | Path of the file relative to the current directory; you cannot use an absolute path or reference files outside the document root; the file contents are included directly into the page with no additional processing. |
| | virtual | Virtual path (URL) relative to the document root; the server interprets the path just as if it were another HTTP request, so you can use this attribute to insert the results of a CGI program or another SSI document. |
| fsize | | Inserts the size of a file. |
| | file | Path of the file relative to the current directory. |
| | virtual | Virtual path (URL) relative to the document root. |
| flastmod | file | Inserts the last modification date and time for a specified file. |
| exec | | Executes external programs and inserts the output in current document (unless SSI has been configured with IncludesNoExec). |
| | cmd | Path to any executable application relative to the current directory. |
| | cgi | Virtual path to a CGI program; however, you cannot pass a query string -- if you want to pass a query string, use #include virtual="..." instead. |
| printenv | | Displays a list of environment variables and their values. |
| set | var | Sets the value for a new or existing environment variable; the variable only lasts throughout the current request (but it is available to CGI scripts or other SSI documents included in this document). |
| if, elif | expr | Starts conditional. |
| else | | Starts the "else" part of the conditional. |
| endif | | Ends conditional. |
| config | | Modifies various aspects of SSI. |
| | errmsg | Default error message. |
| | sizefmt | Format for size of the file. |
| | timefmt | Format for date and time. |

6.2.3. Environment Variables

You can insert the values of environment variables in an otherwise static HTML document. Here is an example of a document that will contain the server name, the user's remote host, and the current local date and time:

<HTML>
<HEAD>
    <TITLE>Welcome!</TITLE>
</HEAD>
<BODY>
<H1>Welcome to my server at <!--#echo var="SERVER_NAME"-->...</H1>
<HR>
Dear user from <!--#echo var="REMOTE_HOST"-->,
<P>
There are many links to various CGI documents throughout the Web,
so feel free to explore.
<P>
<HR>
<ADDRESS>Webmaster (<!--#echo var="DATE_LOCAL"-->)</ADDRESS>
</BODY>
</HTML>

In this example, we use the echo SSI command with the var attribute to display the hostname or IP address of the server, the remote host name, and the local time. All environment variables that are available to CGI programs are also available to SSI directives. There are also a few variables that are exclusively available for use in SSI directives, such as DATE_LOCAL, which contains the current local time. Another is DATE_GMT, which contains the time in Greenwich Mean Time:

The current GMT time is: <!--#echo var="DATE_GMT" -->

Here is another example that uses some of these exclusive SSI environment variables to output information about the current document:

<H2>File Summary</H2>
<HR>
The document you are viewing is:  <!--#echo var="DOCUMENT_NAME"-->,
which you can access at a later time by opening the URL:
<!--#echo var="DOCUMENT_URI"-->.
<HR>
Document last modified on <!--#echo var="LAST_MODIFIED"-->.

This will display the name, URL, and modification time for the current HTML document.

For a listing of CGI environment variables, refer to Chapter 3, "The Common Gateway Interface". Table 6-2 shows the additional variables available to SSI pages.

Table 6-2. Additional Variables Available to SSI Pages

| Environment Variable | Description |
|---|---|
| DOCUMENT_NAME | The current filename |
| DOCUMENT_URI | Virtual path to the file |
| QUERY_STRING_UNESCAPED | Unencoded query string with all shell metacharacters escaped with "\" |
| DATE_LOCAL | Current date and time in the local time zone |
| DATE_GMT | Current date and time in GMT |
| LAST_MODIFIED | Last modification date and time for the file requested by the browser |

6.2.4. Tailoring SSI Output

The config command allows you to select the manner in which error messages, file size information, and date and time are displayed. For example, if you use the include command to insert a nonexistent file, the server will output a default error message like the following:

[an error occurred while processing this directive]

By using the config command, you can modify the default error message. If you want to set the message to "[error-contact webmaster]" you can use the following:

<!--#config errmsg="[error-contact webmaster]" -->

You can also set the file size format that the server uses when displaying information with the fsize command. For example, this command:

<!--#config sizefmt="abbrev" -->

will force the server to display the file size rounded to the nearest kilobyte (KB) or megabyte (MB). You can use the argument "bytes" to set the display as a byte count:

<!--#config sizefmt="bytes" -->

Here is how you can change the time format:

<!--#config timefmt="%D (day %j) at %r" -->
My signature was last modified on: 
<!--#flastmod virtual="/address.html"-->.

The output will look like this:

My signature was last modified on: 09/22/97 (day 265) at 07:17:39 PM

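The %D format inserts the date in mm/dd/yy format, %j inserts the day of the year, and %r inserts the time in hh:mm:ss AM|PM format. Because these formats are applied here to the output of flastmod, they describe the file's modification time rather than the current time. Table 6-3 lists all the date and time formats you can use.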

Table 6-3. Time and Date Formats

| Format | Value | Example |
|---|---|---|
| %a | Day of the week abbreviation | Sun |
| %A | Day of the week | Sunday |
| %b | Month name abbreviation | Jan |
| %B | Month name | January |
| %d | Date | 01 (not 1) |
| %D | Date as %m/%d/%y | 06/23/95 |
| %e | Date | 1 |
| %H | 24-hour clock hour | 13 |
| %I | 12-hour clock hour | 01 |
| %j | Decimal day of the year | 360 |
| %m | Month number | 11 |
| %M | Minutes | 08 |
| %p | AM or PM | AM |
| %r | Time as %I:%M:%S %p | 07:17:39 PM |
| %S | Seconds | 09 |
| %T | 24-hour time as %H:%M:%S | 16:55:15 |
| %U | Week of the year (also %W) | 49 |
| %w | Day of the week number | 5 |
| %y | Year of the century | 95 |
| %Y | Year | 1995 |
| %Z | Time zone | EST |
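These format codes are the ones used by the C library's strftime routine, so a timefmt string can be tried out from Perl before it goes into a page. The sketch below assumes Perl's standard POSIX module; format_mtime is a hypothetical helper name, and the script simply formats its own modification time.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format a file's modification time with an SSI-style timefmt string.
# Returns undef if the file cannot be examined.
sub format_mtime {
    my ( $format, $path ) = @_;
    my $mtime = ( stat $path )[9];
    return undef unless defined $mtime;
    return strftime( $format, localtime $mtime );
}

# $0 is the path of this script, used here only as a convenient test file.
my $stamp = format_mtime( "%D (day %j) at %r", $0 );
print "This script was last modified on: ",
      ( defined $stamp ? $stamp : "[unknown]" ), "\n";
```

The exact text produced by formats such as %r and %Z depends on the system's locale and time zone settings, so the output may differ slightly from the examples in the table.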

6.2.5. Including Boilerplates

There are times when you will have certain information that you repeat in numerous documents on the server such as a copyright notice, the webmaster's email address, etc. Instead of maintaining this information separately in each file, you can include one file that has all of this information. It is much easier to update a single file if this information changes (for example, you may need to update the copyright notice at the beginning of next year). Example 6-1 shows an example of such a file that itself contains SSI commands (note the .shtml extension).

Example 6-1. footer.shtml

<HR>
<P><FONT SIZE="-1">
Copyright 1999-2000 by My Company, Inc.<BR>
Please report any problems to
  <A HREF="mailto:<!--#echo var="SERVER_ADMIN"-->">
  <!--#echo var="SERVER_ADMIN"--></A>.<BR>
This document was last modified on <!--#echo var="LAST_MODIFIED"-->.<BR>
</FONT></P>

It may look messy to include an SSI command within another HTML tag, but don't worry about this being invalid HTML because the web server will parse it before it sends it to the client. You may also wonder which file the server uses to determine the LAST_MODIFIED variable when this file is included in another file. LAST_MODIFIED is set once by the server for the file that the client requested. If that file includes other files, such as footer.shtml, LAST_MODIFIED will still refer to the original file, so this footer will do what we want.

Because included files are not complete HTML documents (they have no <HTML>, <HEAD>, or <BODY> tags), it can be easier to maintain these files if you differentiate them by creating a standard extension for them or keeping them in a particular directory. In our example we'll create a folder called /includes in the document root and place footer.shtml there. We can then include the file by adding the following line to other .shtml files:

<!--#include virtual="/includes/footer.shtml" -->

This SSI command will be replaced with a footer containing a copyright notice, the email address of the server administrator, and the modification date of the file requested.

You can also use the file attribute instead of virtual to reference the file, but file has limitations. You cannot use absolute paths, the web server does no processing on the requested file (e.g., for CGI scripts or other SSI commands), and you may not reference files outside the document root. This last restriction prevents someone from including a file like /etc/passwd in an HTML document (since it's possible that someone is able to upload files to a server without otherwise having access to this file). Given these restrictions, it's typically easier to simply use virtual.

6.2.6. Executing CGI Programs

You can use Server Side Includes to embed the results of an entire CGI program into a static HTML document by using either exec cgi or include virtual. This is convenient for those times when you want to display just one piece of dynamic data, such as:

This page has been accessed 9387 times.

Let's look at an example of inserting output from CGI programs. Suppose you have a simple CGI program that keeps track of the number of visitors. You can call it from an HTML document with the include SSI command:

This page has been accessed
<!--#include virtual="/cgi/counter.cgi"--> times.

We can include this tag in any SSI-enabled HTML page on our web server; each page will have its own count. We don't need to pass any variables to tell the CGI which URL we need the count for; the DOCUMENT_URI environment variable will contain the URL of the original document requested. Even though this is not a standard CGI environment variable, the additional SSI variables are provided to CGI scripts invoked via SSI.
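To check which of the additional SSI variables actually reach a script on your own server, a small diagnostic script can be included from a test page in the same way. This is only a sketch: describe_env is a hypothetical helper, and the variable names are those listed in Table 6-2.

```perl
use strict;
use warnings;

# The additional SSI variables listed in Table 6-2.
my @ssi_vars = qw(
    DOCUMENT_NAME DOCUMENT_URI QUERY_STRING_UNESCAPED
    DATE_LOCAL DATE_GMT LAST_MODIFIED
);

# Build a "NAME = value" line for each SSI variable found in the
# given hash reference (normally \%ENV).
sub describe_env {
    my $env = shift;
    return join "", map {
        my $value = defined $env->{$_} ? $env->{$_} : "(not set)";
        "$_ = $value\n";
    } @ssi_vars;
}

print "Content-type: text/plain\n\n";
print describe_env( \%ENV );
```

Run from the command line, every variable is reported as "(not set)". Wrapping the include directive in <PRE> tags on the test page keeps the lines from running together in the browser.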

The code behind an access counter is quite short. A Berkeley DB hash file on the server contains a count of the number of visitors that have accessed each document we're tracking. Whenever a user visits the document, the SSI directive in that document calls a CGI program that reads the numerical value stored in the data file, increments it, and outputs it. The counter is shown in Example 6-2.

Example 6-2. counter.cgi

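#!/usr/bin/perl -wT

use strict;
use Fcntl qw( :DEFAULT :flock );    # O_RDWR and O_CREAT, plus LOCK_EX
use DB_File;

use constant COUNT_FILE => "/usr/local/apache/data/counter/count.dbm";
my %count;
# DOCUMENT_URI is set by the server when this script is invoked via SSI
my $url = $ENV{DOCUMENT_URI};
local *DBM;

print "Content-type: text/plain\n\n";

if ( my $db = tie %count, "DB_File", COUNT_FILE, O_RDWR | O_CREAT ) {
    # Lock the underlying file so concurrent requests do not lose updates
    my $fd = $db->fd;
    open DBM, "+<&=$fd" or die "Could not dup DBM for lock: $!";
    flock DBM, LOCK_EX or die "Could not lock DBM: $!";
    undef $db;
    $count{$url} = 0 unless exists $count{$url};
    my $num_hits = ++$count{$url};
    untie %count;    # flush the new count before releasing the lock
    close DBM;       # closing the handle releases the lock
    print "$num_hits\n";
} else {
    print "[Error processing counter data]\n";
}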

Don't worry about how we access the hash file; we'll discuss this in Chapter 10, "Data Persistence". Note that we output the media type. You must do this for included files even though the header is not returned to the client. An important thing to note is that a CGI program called by an SSI directive cannot output anything other than text because this data is embedded within the document that invoked the directive. As a result, it doesn't matter whether you output a content type of text/plain or text/html, as the browser will interpret the data with the media type of the calling document. Needless to say, your CGI program cannot output graphic images or other binary data.




Copyright © 2001 O'Reilly & Associates. All rights reserved.