

 
   

 

Apache: The Definitive Guide, 3rd Edition
By Ben Laurie, Peter Laurie
   
Publisher : O'Reilly
Pub Date : December 2002
ISBN : 0-596-00203-3

Updated to cover the changes in Apache's latest release, 2.0, as well as Apache 1.3, this useful guide discusses how to obtain, set up, secure, modify, and troubleshoot the Apache software on both Unix and Windows systems. In addition to covering the installation and configuration of mod_perl and Tomcat, the book examines PHP, Cocoon, and other new technologies that are associated with the Apache web server.


Copyright
    Preface
      Who Wrote Apache, and Why?
      The Demonstration Code
      Conventions Used in This Book
      Organization of This Book
      Acknowledgments
   
    Chapter 1.  Getting Started
      Section 1.1.  What Does a Web Server Do?
      Section 1.2.  How Apache Works
      Section 1.3.  Apache and Networking
      Section 1.4.  How HTTP Clients Work
      Section 1.5.  What Happens at the Server End?
      Section 1.6.  Planning the Apache Installation
      Section 1.7.  Windows?
      Section 1.8.  Which Apache?
      Section 1.9.  Installing Apache
      Section 1.10.  Building Apache 1.3.X Under Unix
      Section 1.11.  New Features in Apache v2
      Section 1.12.  Making and Installing Apache v2 Under Unix
      Section 1.13.  Apache Under Windows
   
    Chapter 2.  Configuring Apache: The First Steps
      Section 2.1.  What's Behind an Apache Web Site?
      Section 2.2.  site.toddle
      Section 2.3.  Setting Up a Unix Server
      Section 2.4.  Setting Up a Win32 Server
      Section 2.5.  Directives
      Section 2.6.  Shared Objects
   
    Chapter 3.  Toward a Real Web Site
      Section 3.1.  More and Better Web Sites: site.simple
      Section 3.2.  Butterthlies, Inc., Gets Going
      Section 3.3.  Block Directives
      Section 3.4.  Other Directives
      Section 3.5.  HTTP Response Headers
      Section 3.6.  Restarts
      Section 3.7.  .htaccess
      Section 3.8.  CERN Metafiles
      Section 3.9.  Expirations
   
    Chapter 4.  Virtual Hosts
      Section 4.1.  Two Sites and Apache
      Section 4.2.  Virtual Hosts
      Section 4.3.  Two Copies of Apache
      Section 4.4.  Dynamically Configured Virtual Hosting
   
    Chapter 5.  Authentication
      Section 5.1.  Authentication Protocol
      Section 5.2.  Authentication Directives
      Section 5.3.  Passwords Under Unix
      Section 5.4.  Passwords Under Win32
      Section 5.5.  Passwords over the Web
      Section 5.6.  From the Client's Point of View
      Section 5.7.  CGI Scripts
      Section 5.8.  Variations on a Theme
      Section 5.9.  Order, Allow, and Deny
      Section 5.10.  DBM Files on Unix
      Section 5.11.  Digest Authentication
      Section 5.12.  Anonymous Access
      Section 5.13.  Experiments
      Section 5.14.  Automatic User Information
      Section 5.15.  Using .htaccess Files
      Section 5.16.  Overrides
   
    Chapter 6.  Content Description and Modification
      Section 6.1.  MIME Types
      Section 6.2.  Content Negotiation
      Section 6.3.  Language Negotiation
      Section 6.4.  Type Maps
      Section 6.5.  Browsers and HTTP 1.1
      Section 6.6.  Filters
   
    Chapter 7.  Indexing
      Section 7.1.  Making Better Indexes in Apache
      Section 7.2.  Making Our Own Indexes
      Section 7.3.  Imagemaps
      Section 7.4.  Image Map Directives
   
    Chapter 8.  Redirection
      Section 8.1.  Alias
      Section 8.2.  Rewrite
      Section 8.3.  Speling
   
    Chapter 9.  Proxying
      Section 9.1.  Security
      Section 9.2.  Proxy Directives
      Section 9.3.  Apparent Bug
      Section 9.4.  Performance
      Section 9.5.  Setup
   
    Chapter 10.  Logging
      Section 10.1.  Logging by Script and Database
      Section 10.2.  Apache's Logging Facilities
      Section 10.3.  Configuration Logging
      Section 10.4.  Status
   
    Chapter 11.  Security
      Section 11.1.  Internal and External Users
      Section 11.2.  Binary Signatures, Virtual Cash
      Section 11.3.  Certificates
      Section 11.4.  Firewalls
      Section 11.5.  Legal Issues
      Section 11.6.  Secure Sockets Layer (SSL)
      Section 11.7.  Apache's Security Precautions
      Section 11.8.  SSL Directives
      Section 11.9.  Cipher Suites
      Section 11.10.  Security in Real Life
      Section 11.11.  Future Directions
   
    Chapter 12.  Running a Big Web Site
      Section 12.1.  Machine Setup
      Section 12.2.  Server Security
      Section 12.3.  Managing a Big Site
      Section 12.4.  Supporting Software
      Section 12.5.  Scalability
      Section 12.6.  Load Balancing
   
    Chapter 13.  Building Applications
      Section 13.1.  Web Sites as Applications
      Section 13.2.  Providing Application Logic
      Section 13.3.  XML, XSLT, and Web Applications
   
    Chapter 14.  Server-Side Includes
      Section 14.1.  File Size
      Section 14.2.  File Modification Time
      Section 14.3.  Includes
      Section 14.4.  Execute CGI
      Section 14.5.  Echo
      Section 14.6.  Apache v2: SSI Filters
   
    Chapter 15.  PHP
      Section 15.1.  Installing PHP
      Section 15.2.  Site.php
   
    Chapter 16.  CGI and Perl
      Section 16.1.  The World of CGI
      Section 16.2.  Telling Apache About the Script
      Section 16.3.  Setting Environment Variables
      Section 16.4.  Cookies
      Section 16.5.  Script Directives
      Section 16.6.  suEXEC on Unix
      Section 16.7.  Handlers
      Section 16.8.  Actions
      Section 16.9.  Browsers
   
    Chapter 17.  mod_perl
      Section 17.1.  How mod_perl Works
      Section 17.2.  mod_perl Documentation
      Section 17.3.  Installing mod_perl — The Simple Way
      Section 17.4.  Modifying Your Scripts to Run Under mod_perl
      Section 17.5.  Global Variables
      Section 17.6.  Strict Pregame
      Section 17.7.  Loading Changes
      Section 17.8.  Opening and Closing Files
      Section 17.9.  Configuring Apache to Use mod_perl
   
    Chapter 18.  mod_jserv and Tomcat
      Section 18.1.  mod_jserv
      Section 18.2.  Tomcat
      Section 18.3.  Connecting Tomcat to Apache
   
    Chapter 19.  XML and Cocoon
      Section 19.1.  XML
      Section 19.2.  XML and Perl
      Section 19.3.  Cocoon
      Section 19.4.  Cocoon 1.8 and JServ
      Section 19.5.  Cocoon 2.0.3 and Tomcat
      Section 19.6.  Testing Cocoon
   
    Chapter 20.  The Apache API
      Section 20.1.  Documentation
      Section 20.2.  APR
      Section 20.3.  Pools
      Section 20.4.  Per-Server Configuration
      Section 20.5.  Per-Directory Configuration
      Section 20.6.  Per-Request Information
      Section 20.7.  Access to Configuration and Request Information
      Section 20.8.  Hooks, Optional Hooks, and Optional Functions
      Section 20.9.  Filters, Buckets, and Bucket Brigades
      Section 20.10.  Modules
   
    Chapter 21.  Writing Apache Modules
      Section 21.1.  Overview
      Section 21.2.  Status Codes
      Section 21.3.  The Module Structure
      Section 21.4.  A Complete Example
      Section 21.5.  General Hints
      Section 21.6.  Porting to Apache 2.0
   
    Appendix A.  The Apache 1.x API
      Section A.1.  Pools
      Section A.2.  Per-Server Configuration
      Section A.3.  Per-Directory Configuration
      Section A.4.  Per-Request Information
      Section A.5.  Access to Configuration and Request Information
      Section A.6.  Functions
   
    Colophon
    Index

Copyright

Copyright © O'Reilly & Associates, Inc.

Printed in the United States of America.

Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly & Associates books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of an Appaloosa horse and the topic of Apache is a trademark of O'Reilly & Associates, Inc.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface

Apache: The Definitive Guide, Third Edition, is principally about the Apache web-server software. We explain what a web server is and how it works, but our assumption is that most of our readers have used the World Wide Web and understand in practical terms how it works, and that they are now thinking about running their own servers and sites.

This book takes the reader through the process of acquiring, compiling, installing, configuring, and modifying Apache. We exercise most of the package's functions by showing a set of example sites that take a reasonably typical web business — in our case, a postcard publisher — through a process of development and increasing complexity. However, we have deliberately tried to make each site as simple as possible, focusing on the particular feature being described. Each site is pretty well self-contained, so that the reader can refer to it while following the text without having to disentangle the meat from extraneous vegetables. If desired, it is possible to install and run each site on a suitable system.

Perhaps it is worth saying what this book is not. It is not a manual, in the sense of formally documenting every command — such a manual exists on the Apache site and has been much improved with Versions 1.3 and 2.0; we assume that if you want to use Apache, you will download it and keep it at hand. Rather, if the manual is a road map that tells you how to get somewhere, this book tries to be a tourist guide that tells you why you might want to make the journey.

In passing, we do reproduce some sections of the web site manual simply to save the reader the trouble of looking up the formal definitions as she follows the argument. Occasionally, we found the manual text hard to follow and in those cases we have changed the wording slightly. We have also interspersed comments as seemed useful at the time.

This is not a book about HTML or creating web pages, or one about web security or even about running a web site. These are all complex subjects that should be either treated thoroughly or left alone. As a result, a webmaster's library might include books on the following topics:

  • The Web and how it works

  • HTML — formal definitions, what you can do with it

  • How to decide what sort of web site you want, how to organize it, and how to protect it

  • How to implement the site you want using one of the available servers (for instance, Apache)

  • Handbooks on Java, Perl, and other languages

  • Security

Apache: The Definitive Guide is just one of the six or so possible titles in the fourth category.

Apache is a versatile package and is becoming more versatile every day, so we have not tried to illustrate every possible combination of commands; that would require a book of a million pages or so. Rather, we have tried to suggest lines of development that a typical webmaster could follow once an understanding of the basic concepts is achieved.

We realized from our own experience that the hardest stage of learning how to use Apache in a real-life context is right at the beginning, where the novice webmaster often has to get Apache, a scripting language, and a database manager to collaborate. This can be very puzzling. In this new edition we have therefore included a good deal of new material which tries to take the reader up these conceptual precipices. Once the collaboration is working, development is much easier. These new chapters are not intended to be an experts' account of, say, the interaction between Apache, Perl, and MySQL — but a simple beginners' guide, explaining how to make these things work with Apache. In the process we make some comments, from our own experience, on the merits of the various software products from which the user has to choose.

As with the first and second editions, writing the book was something of a race with Apache's developers. We wanted to be ready as soon as Version 2 was stable, but not before the developers had finished adding new features.

In many of the examples that follow, the motivation for what we make Apache do is simple enough and requires little explanation (for example, the different index formats in Chapter 7). Elsewhere, we feel that the webmaster needs to be aware of wider issues (for instance, the security issues discussed in Chapter 11) before making sensible decisions about his site's configuration, and we have not hesitated to branch out to deal with them.

Who Wrote Apache, and Why?

Apache gets its name from the fact that it consists of some existing code plus some patches. The FAQ thinks that this is cute; others may think it's the sort of joke that gets programmers a bad name. A more responsible group thinks that Apache is an appropriate title because of the resourcefulness and adaptability of the American Indian tribe.

(FAQ is netspeak for Frequently Asked Questions. Most sites/subjects have an FAQ file that tells you what the thing is, why it is, and where it's going. It is perfectly reasonable for the newcomer to ask for the FAQ to look up anything new to her, and indeed this is a sensible thing to do, since it reduces the number of questions asked. Apache's FAQ can be found at http://www.apache.org/docs/FAQ.html.)

You have to understand that Apache is free to its users and is written by a team of volunteers who do not get paid for their work. Whether they decide to incorporate your or anyone else's ideas is entirely up to them. If you don't like what they do, feel free to collect a team and write your own web server or to adapt the existing Apache code — as many have.

The first web server was built by the British physicist Tim Berners-Lee at CERN, the European Centre for Nuclear Research at Geneva, Switzerland. The immediate ancestor of Apache was built by the U.S. government's NCSA, the National Center for Supercomputing Applications. Because this code was written with (American) taxpayers' money, it is available to all; you can, if you like, download the source code in C from http://www.ncsa.uiuc.edu, paying due attention to the license conditions.

There were those who thought that things could be done better, and in the FAQ for Apache (at http://www.apache.org), we read:

...Apache was originally based on code and ideas found in the most popular HTTP server of the time, NCSA httpd 1.3 (early 1995).

That phrase "of the time" is nice. It usually refers to good times back in the 1700s or the early days of technology in the 1900s. But here it means back in the deliquescent bogs of a few years ago!

While the Apache site is open to all, Apache is written by an invited group of (we hope) reasonably good programmers. One of the authors of this book, Ben, is a member of this group.

Why do they bother? Why do these programmers, who presumably could be well paid for doing something else, sit up nights to work on Apache for our benefit? There is no such thing as a free lunch, so they do it for a number of typically human reasons. One might list, in no particular order:

  • They want to do something more interesting than their day job, which might be writing stock control packages for BigBins, Inc.

  • They want to be involved on the edge of what is happening. Working on a project like this is a pretty good way to keep up-to-date. After that comes consultancy on the next hot project.

  • The more worldly ones might remember how, back in the old days of 1995, quite a lot of the people working on the web server at NCSA left for a thing called Netscape and became, in the passage of the age, zillionaires.

  • It's fun. Developing good software is interesting and amusing, and you get to meet and work with other clever people.

  • They are not doing the bit that programmers hate: explaining to end users why their treasure isn't working and trying to fix it in 10 minutes flat. If you want support on Apache, you have to consult one of several commercial organizations (see Appendix A), who, quite properly, want to be paid for doing the work everyone loathes.

The Demonstration Code

The code for the demonstration web sites referred to throughout the book is available at http://www.oreilly.com/catalog/apache3/. It contains the requisite README file with installation instructions and other useful information. The contents of the download are organized into two directories:

install/

This directory contains scripts to install the sample sites:

install

Run this script to install the sites.

install.conf

Unix configuration file for install.

installwin.conf

Win32 configuration file for install.

sites/

This directory contains the sample sites used in the book.

Conventions Used in This Book

This section covers the various conventions used in this book.

Typographic Conventions

Constant width

Used for HTTP headers, status codes, MIME content types, directives in configuration files, commands, options/switches, functions, methods, variable names, and code within body text

Constant width bold

Used in code segments to indicate input to be typed in by the user

Constant width italic

Used for replaceable items in code and text

Italic

Used for filenames, pathnames, newsgroup names, Internet addresses (URLs), email addresses, variable names (except in examples), terms being introduced, program names, subroutine names, CGI script names, hostnames, usernames, and group names

Icons

[Unix icon]

Text marked with this icon applies to the Unix version of Apache.

[Win32 icon]

Text marked with this icon applies to the Win32 version of Apache.

This icon designates a note relating to the surrounding text.

 

This icon designates a warning related to the surrounding text.

Pathnames

We use the text convention .../ to indicate your path to the demonstration sites, which may well be different from ours. For instance, on our Apache machine, we kept all the demonstration sites in the directory /usr/www. So, for example, our path would be /usr/www/site.simple. You might want to keep the sites somewhere other than /usr/www, so we refer to the path as .../site.simple.

Don't type .../ into your computer. The attempt will upset it!
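As a rough illustration of how the convention reads in practice, here is a sketch of one configuration line as it might be written for the authors' layout. The htdocs subdirectory is an assumption made for this example and is not stated in the text above; substitute your own location for /usr/www.

```
# The text would refer to this site as .../site.simple.
# With the demonstration sites kept in /usr/www, ".../" stands for "/usr/www/".
# NOTE: the "htdocs" subdirectory is assumed here for illustration only.
DocumentRoot /usr/www/site.simple/htdocs
```

If you keep the sites elsewhere, only the leading part of the path changes; the site.simple portion stays the same.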

Directives

Apache is controlled through roughly 150 directives. For each directive, a formal explanation is given in the following format:

Directive  

Syntax
Where used
 

An explanation of the directive is located here.

So, for instance, we have the following directive:

ServerAdmin  

ServerAdmin email address
Server config, virtual host
 

ServerAdmin gives the email address for correspondence. Apache includes this address in the error messages it generates, so the user has someone to write to in case of problems.

The Where used line explains the appropriate environment for the directive. This will become clearer later.
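To make the Where used line a little more concrete, the following sketch shows ServerAdmin appearing in both of the contexts listed for it: once in the main server configuration and once inside a virtual host block. The email addresses, hostname, and IP address are placeholders invented for this example, not values from the demonstration sites.

```
# Server config context: applies to the main server.
# (Addresses and names below are hypothetical placeholders.)
ServerAdmin webmaster@example.com

# Virtual host context: overrides the address for this host only.
<VirtualHost 192.0.2.10>
    ServerName www.example.com
    ServerAdmin sales-webmaster@example.com
</VirtualHost>
```

A directive whose Where used line does not list a context is not meant to be placed there.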

Organization of This Book

The chapters that follow and their contents are listed here:

Chapter 1

Covers web servers, how Apache works, TCP/IP, HTTP, hostnames, what a client does, what happens at the server end, choosing a Unix version, and compiling and installing Apache under both Unix and Win32.

Chapter 2

Discusses getting Apache to run, creating Apache users, runtime flags, permissions, and site.simple.

Chapter 3

Introduces a demonstration business, Butterthlies, Inc.; some HTML; default indexing of web pages; server housekeeping; and block directives.

Chapter 4

Explains how to connect web sites to network addresses, including the common case where more than one web site is hosted at a given network address.

Chapter 5

Explains controlling access, collecting information about clients, cookies, DBM control, digest authentication, and anonymous access.

Chapter 6

Covers content and language arbitration, type maps, and expiration of information.

Chapter 7

Discusses better indexes, index options, your own indexes, and imagemaps.

Chapter 8

Describes Alias, ScriptAlias, and the amazing Rewrite module.

Chapter 9

Covers remote proxies and proxy caching.

Chapter 10

Explains Apache's facilities for tracking activity on your web sites.

Chapter 11

Explores the many aspects of protecting an Apache server and its content from uninvited guests and intruders, including user validation, binary signatures, virtual cash, certificates, firewalls, packet filtering, secure sockets layer (SSL), legal issues, patent rights, national security, and Apache-SSL directives.

Chapter 12

Explains best practices for running large sites, including support for multiple content-creators, separating test sites from production sites, and integrating the site with other Internet technologies.

Chapter 13

Explores the options available for using Apache to host automatically changing content and interactive applications.

Chapter 14

Explains using runtime commands in your HTML and XSSI — a more secure server-side include.

Chapter 15

Explains how to install and configure PHP, with an example for connecting it to MySQL.

Chapter 16

Demonstrates aliases, logs, HTML forms, a shell script, a CGI script in Perl, environment variables, and using MySQL through Perl and Apache.

Chapter 17

Demonstrates how to install, configure, and use the mod_perl module for efficient processing of Perl applications.

Chapter 18

Explains how to install mod_jserv and Tomcat for supporting Java in the Apache environment.

Chapter 19

Explains how to use XML in conjunction with Apache and how to install and configure the Cocoon set of tools for presenting XML content.

Chapter 20

Explores the foundations of the Apache 2.0 API.

Chapter 21

Describes how to create Apache modules using the Apache 2.0 Apache Portable Runtime, including how to port modules from 1.3 to 2.0.

Appendix A

Describes pools; per-server, per-directory, and per-request information; functions; warnings; and parsing.

In addition, the Apache Quick Reference Card provides an outline of Apache 1.3 and 2.0 syntax.

Acknowledgments

First, thanks to Robert S. Thau, who gave the world the Apache API and the code that implements it, and to the Apache Group, who worked on it before and have worked on it since. Thanks to Eric Young and Tim Hudson for giving SSLeay to the Web.

Thanks to Bryan Blank, Aram Mirzadeh, Chuck Murcko, and Randy Terbush, who read early drafts of the first edition text and made many useful suggestions; and to John Ackermann, Geoff Meek, and Shane Owenby, who did the same for the second edition. For the third edition, we would like to thank our reviewers Evelyn Mitchell, Neil Neely, Lemon, Dirk-Willem van Gulik, Richard Sonnen, David Reid, Joe Johnston, Mike Stok, and Steven Champeon.

We would also like to offer special thanks to Andrew Ford for giving us permission to reprint his Apache Quick Reference Card.

Many thanks to Simon St.Laurent, our editor at O'Reilly, who patiently turned our text into a book — again. The two layers of blunders that remain are our own contribution.

And finally, thanks to Camilla von Massenbach and Barbara Laurie, who have continued to put up with us while we rewrote this book.