Apache: The Definitive Guide, 3rd Edition
By Ben Laurie, Peter Laurie
Publisher: O'Reilly
Pub Date: December 2002
ISBN: 0-596-00203-3
Updated to cover the changes in Apache's latest release,
2.0, as well as Apache 1.3, this useful guide discusses how to obtain, set
up, secure, modify, and troubleshoot the Apache software on both Unix and
Windows systems. In addition to covering the installation and configuration
of mod_perl and Tomcat, the book examines PHP, Cocoon, and other new
technologies that are associated with the Apache web server.
Copyright
Preface
Who Wrote Apache, and Why?
The Demonstration Code
Conventions Used in This Book
Organization of This Book
Acknowledgments
Chapter 1. Getting Started
Section 1.1. What Does a Web Server Do?
Section 1.2. How Apache Works
Section 1.3. Apache and Networking
Section 1.4. How HTTP Clients Work
Section 1.5. What Happens at the Server End?
Section 1.6. Planning the Apache Installation
Section 1.7. Windows?
Section 1.8. Which Apache?
Section 1.9. Installing Apache
Section 1.10. Building Apache 1.3.X Under Unix
Section 1.11. New Features in Apache v2
Section 1.12. Making and Installing Apache v2 Under Unix
Section 1.13. Apache Under Windows
Chapter 2. Configuring Apache: The First Steps
Section 2.1. What's Behind an Apache Web Site?
Section 2.2. site.toddle
Section 2.3. Setting Up a Unix Server
Section 2.4. Setting Up a Win32 Server
Section 2.5. Directives
Section 2.6. Shared Objects
Chapter 3. Toward a Real Web Site
Section 3.1. More and Better Web Sites: site.simple
Section 3.2. Butterthlies, Inc., Gets Going
Section 3.3. Block Directives
Section 3.4. Other Directives
Section 3.5. HTTP Response Headers
Section 3.6. Restarts
Section 3.7. .htaccess
Section 3.8. CERN Metafiles
Section 3.9. Expirations
Chapter 4. Virtual Hosts
Section 4.1. Two Sites and Apache
Section 4.2. Virtual Hosts
Section 4.3. Two Copies of Apache
Section 4.4. Dynamically Configured Virtual Hosting
Chapter 5. Authentication
Section 5.1. Authentication Protocol
Section 5.2. Authentication Directives
Section 5.3. Passwords Under Unix
Section 5.4. Passwords Under Win32
Section 5.5. Passwords over the Web
Section 5.6. From the Client's Point of View
Section 5.7. CGI Scripts
Section 5.8. Variations on a Theme
Section 5.9. Order, Allow, and Deny
Section 5.10. DBM Files on Unix
Section 5.11. Digest Authentication
Section 5.12. Anonymous Access
Section 5.13. Experiments
Section 5.14. Automatic User Information
Section 5.15. Using .htaccess Files
Section 5.16. Overrides
Chapter 6. Content Description and Modification
Section 6.1. MIME Types
Section 6.2. Content Negotiation
Section 6.3. Language Negotiation
Section 6.4. Type Maps
Section 6.5. Browsers and HTTP 1.1
Section 6.6. Filters
Chapter 7. Indexing
Section 7.1. Making Better Indexes in Apache
Section 7.2. Making Our Own Indexes
Section 7.3. Imagemaps
Section 7.4. Image Map Directives
Chapter 8. Redirection
Section 8.1. Alias
Section 8.2. Rewrite
Section 8.3. Speling
Chapter 9. Proxying
Section 9.1. Security
Section 9.2. Proxy Directives
Section 9.3. Apparent Bug
Section 9.4. Performance
Section 9.5. Setup
Chapter 10. Logging
Section 10.1. Logging by Script and Database
Section 10.2. Apache's Logging Facilities
Section 10.3. Configuration Logging
Section 10.4. Status
Chapter 11. Security
Section 11.1. Internal and External Users
Section 11.2. Binary Signatures, Virtual Cash
Section 11.3. Certificates
Section 11.4. Firewalls
Section 11.5. Legal Issues
Section 11.6. Secure Sockets Layer (SSL)
Section 11.7. Apache's Security Precautions
Section 11.8. SSL Directives
Section 11.9. Cipher Suites
Section 11.10. Security in Real Life
Section 11.11. Future Directions
Chapter 12. Running a Big Web Site
Section 12.1. Machine Setup
Section 12.2. Server Security
Section 12.3. Managing a Big Site
Section 12.4. Supporting Software
Section 12.5. Scalability
Section 12.6. Load Balancing
Chapter 13. Building Applications
Section 13.1. Web Sites as Applications
Section 13.2. Providing Application Logic
Section 13.3. XML, XSLT, and Web Applications
Chapter 14. Server-Side Includes
Section 14.1. File Size
Section 14.2. File Modification Time
Section 14.3. Includes
Section 14.4. Execute CGI
Section 14.5. Echo
Section 14.6. Apache v2: SSI Filters
Chapter 15. PHP
Section 15.1. Installing PHP
Section 15.2. Site.php
Chapter 16. CGI and Perl
Section 16.1. The World of CGI
Section 16.2. Telling Apache About the Script
Section 16.3. Setting Environment Variables
Section 16.4. Cookies
Section 16.5. Script Directives
Section 16.6. suEXEC on Unix
Section 16.7. Handlers
Section 16.8. Actions
Section 16.9. Browsers
Chapter 17. mod_perl
Section 17.1. How mod_perl Works
Section 17.2. mod_perl Documentation
Section 17.3. Installing mod_perl — The Simple Way |
Section 17.4. Modifying Your Scripts to Run Under mod_perl
Section 17.5. Global Variables
Section 17.6. Strict Pregame
Section 17.7. Loading Changes
Section 17.8. Opening and Closing Files
Section 17.9. Configuring Apache to Use mod_perl
Chapter 18. mod_jserv and Tomcat
Section 18.1. mod_jserv
Section 18.2. Tomcat
Section 18.3. Connecting Tomcat to Apache
Chapter 19. XML and Cocoon
Section 19.1. XML
Section 19.2. XML and Perl
Section 19.3. Cocoon
Section 19.4. Cocoon 1.8 and JServ
Section 19.5. Cocoon 2.0.3 and Tomcat
Section 19.6. Testing Cocoon
Chapter 20. The Apache API
Section 20.1. Documentation
Section 20.2. APR
Section 20.3. Pools
Section 20.4. Per-Server Configuration
Section 20.5. Per-Directory Configuration
Section 20.6. Per-Request Information
Section 20.7. Access to Configuration and Request Information
Section 20.8. Hooks, Optional Hooks, and Optional Functions
Section 20.9. Filters, Buckets, and Bucket Brigades
Section 20.10. Modules
Chapter 21. Writing Apache Modules
Section 21.1. Overview
Section 21.2. Status Codes
Section 21.3. The Module Structure
Section 21.4. A Complete Example
Section 21.5. General Hints
Section 21.6. Porting to Apache 2.0
Appendix A. The Apache 1.x API
Section A.1. Pools
Section A.2. Per-Server Configuration
Section A.3. Per-Directory Configuration
Section A.4. Per-Request Information
Section A.5. Access to Configuration and Request Information
Section A.6. Functions
Colophon
Index
Copyright
Copyright © O'Reilly & Associates, Inc.
Printed in the United States of America.
Published by O'Reilly & Associates, Inc., 1005 Gravenstein
Highway North, Sebastopol, CA 95472.
O'Reilly & Associates books may be purchased for educational,
business, or sales promotional use. Online editions are also available for most
titles (http://safari.oreilly.com).
For more information, contact our corporate/institutional sales department:
(800) 998-9938 or
corporate@oreilly.com.
Nutshell Handbook, the Nutshell Handbook logo, and the
O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of
the designations used by manufacturers and sellers to distinguish their products
are claimed as trademarks. Where those designations appear in this book, and
O'Reilly & Associates, Inc. was aware of a trademark claim, the designations
have been printed in caps or initial caps. The association between the image of
an Appaloosa horse and the topic of Apache is a trademark of O'Reilly & Associates,
Inc.
While every precaution has been taken in the preparation of
this book, the publisher and authors assume no responsibility for errors or
omissions, or for damages resulting from the use of the information contained
herein.
Preface
Apache: The Definitive Guide,
Third Edition, is principally about the Apache web-server software. We explain
what a web server is and how it works, but our assumption is that most of our
readers have used the World Wide Web and understand in practical terms how it
works, and that they are now thinking about running their own servers and sites.
This book takes the reader through the process of acquiring,
compiling, installing, configuring, and modifying Apache. We exercise most of
the package's functions by showing a set of example sites that take a reasonably
typical web business — in our case, a postcard publisher — through a process of
development and increasing complexity. However, we have deliberately tried to
make each site as simple as possible, focusing on the particular feature being
described. Each site is pretty well self-contained, so that the reader can refer
to it while following the text without having to disentangle the meat from
extraneous vegetables. If desired, it is possible to install and run each site
on a suitable system.
Perhaps it is worth saying what this book is
not. It is not a manual, in the sense of
formally documenting every command — such a manual exists on the Apache site and
has been much improved with Versions 1.3 and 2.0; we assume that if you want to
use Apache, you will download it and keep it at hand. Rather, if the manual is a
road map that tells you how to get somewhere, this book tries to be a tourist
guide that tells you why you might want to make the journey.
In passing, we do reproduce some sections of the web site
manual simply to save the reader the trouble of looking up the formal
definitions as she follows the argument. Occasionally, we found the manual text
hard to follow and in those cases we have changed the wording slightly. We have
also interspersed comments as seemed useful at the time.
This is not a book about
HTML or creating web pages, or one about web security or even about running a
web site. These are all complex subjects that should be either treated
thoroughly or left alone. As a result, a webmaster's library might include books
on the following topics:
-
The Web and how it works
-
HTML — formal definitions, what you can do with it
-
How to decide what sort of web site you want, how to
organize it, and how to protect it
-
How to implement the site you want using one of the
available servers (for instance, Apache)
-
Handbooks on Java, Perl, and other languages
-
Security
Apache: The Definitive Guide
is just one of the six or so possible titles in the fourth category.
Apache is a versatile package and is becoming more versatile
every day, so we have not tried to illustrate every possible combination of
commands; that would require a book of a million pages or so. Rather, we have
tried to suggest lines of development that a typical webmaster could follow once
an understanding of the basic concepts is achieved.
We realized from our own experience that the hardest stage of
learning how to use Apache in a real-life context is right at the beginning,
where the novice webmaster often has to get Apache, a scripting language, and a
database manager to collaborate. This can be very puzzling. In this new edition
we have therefore included a good deal of new material which tries to take the
reader up these conceptual precipices. Once the collaboration is working,
development is much easier. These new chapters are not intended to be an
experts' account of, say, the interaction between Apache, Perl, and MySQL — but
a simple beginners' guide, explaining how to make these things work with Apache.
In the process we make some comments, from our own experience, on the merits of
the various software products from which the user has to choose.
As with the first and second editions, writing the book was
something of a race with Apache's developers. We wanted to be ready as soon as
Version 2 was stable, but not before the developers had finished adding new
features.
In many of the examples that follow, the motivation for what
we make Apache do is simple enough and requires little explanation (for example,
the different index formats in Chapter 7). Elsewhere, we feel that the webmaster needs to be aware of wider
issues (for instance, the security issues discussed in
Chapter 11) before making sensible decisions about his site's configuration,
and we have not hesitated to branch out to deal with them.
Who Wrote Apache, and Why?
Apache gets its name from the fact that it consists of some
existing code plus some patches. The FAQ thinks that this is cute; others may
think it's the sort of joke that gets programmers a bad name. A more responsible
group thinks that Apache is an appropriate title because of the resourcefulness
and adaptability of the American Indian tribe.
(FAQ is netspeak for Frequently Asked Questions. Most sites/subjects have an FAQ
file that tells you what the thing is, why it is, and where it's going. It is
perfectly reasonable for the newcomer to ask for the FAQ to look up anything new
to her, and indeed this is a sensible thing to do, since it reduces the number
of questions asked. Apache's FAQ can be found at
http://www.apache.org/docs/FAQ.html.)
You have to understand that Apache is free to its users and
is written by a team of volunteers who do not get paid for their work. Whether
they decide to incorporate your or anyone else's ideas is entirely up to them.
If you don't like what they do, feel free to collect a team and write your own
web server or to adapt the existing Apache code — as many have.
The first web server was built by the British physicist Tim
Berners-Lee at CERN, the European Centre for Nuclear Research at Geneva,
Switzerland. The immediate ancestor of Apache was built by the U.S. government's
NCSA, the National Center for Supercomputing Applications. Because this code was
written with (American) taxpayers' money, it is available to all; you can, if
you like, download the source code in C from
http://www.ncsa.uiuc.edu, paying due attention to the license conditions.
There were those who thought that things could be done
better, and in the FAQ for Apache (at
http://www.apache.org), we read:
...Apache was originally based on code and ideas found in
the most popular HTTP server of the time, NCSA httpd 1.3 (early 1995).
That phrase "of the time" is nice. It usually refers to good
times back in the 1700s or the early days of technology in the 1900s. But here
it means back in the deliquescent bogs of a few years ago!
While the Apache site is open to all, Apache is written by an
invited group of (we hope) reasonably good programmers. One of the authors of
this book, Ben, is a member of this group.
Why do they bother? Why do these programmers, who presumably
could be well paid for doing something else, sit up nights to work on Apache for
our benefit? There is no such thing as a free lunch, so they do it for a number
of typically human reasons. One might list, in no particular order:
-
They want to do something more interesting than their day
job, which might be writing stock control packages for BigBins, Inc.
-
They want to be involved on the edge of what is happening.
Working on a project like this is a pretty good way to keep up-to-date. After
that comes consultancy on the next hot project.
-
The more worldly ones might remember how, back in the old
days of 1995, quite a lot of the people working on the web server at NCSA left
for a thing called Netscape and became, in the passage of the age,
zillionaires.
-
It's fun. Developing good software is interesting and
amusing, and you get to meet and work with other clever people.
-
They are not doing the bit that programmers hate:
explaining to end users why their treasure isn't working and trying to fix it
in 10 minutes flat. If you want support on Apache, you have to consult one of
several commercial organizations (see
Appendix A), who, quite properly, want to be paid for doing the work
everyone loathes.
The Demonstration Code
The code for the demonstration web sites referred to
throughout the book is available at
http://www.oreilly.com/catalog/apache3/. It contains the requisite README
file with installation instructions and other useful information. The contents
of the download are organized into two directories:
- install/
-
This directory contains scripts to install the sample
sites:
- install
-
Run this script to install the sites.
- install.conf
-
Unix configuration file for
install.
- installwin.conf
-
Win32 configuration file for
install.
- sites/
-
This directory contains the sample sites used in the book.
Conventions Used in This Book
This section covers the various conventions used in this
book.
Typographic Conventions
- Constant width
-
Used for HTTP headers, status codes, MIME content types,
directives in configuration files, commands, options/switches, functions,
methods, variable names, and code within body text
- Constant width bold
-
Used in code segments to indicate input to be typed in by
the user
- Constant width italic
-
Used for replaceable items in code and text
- Italic
-
Used for filenames, pathnames, newsgroup names, Internet
addresses (URLs), email addresses, variable names (except in examples), terms
being introduced, program names, subroutine names, CGI script names,
hostnames, usernames, and group names
Icons
Text marked with this icon applies to the Unix version of
Apache.
Text marked with this icon applies to the Win32 version of
Apache.
This icon designates a note relating to the
surrounding text.
This icon designates a warning related to the
surrounding text.
Pathnames
We use the text convention .../ to indicate your path
to the demonstration sites, which may well be different from ours. For instance,
on our Apache machine, we kept all the demonstration sites in the directory /usr/www.
So, for example, our path would be /usr/www/site.simple. You might
want to keep the sites somewhere other than /usr/www, so we refer to the
path as .../site.simple.
Don't type .../ into your
computer. The attempt will upset it!
Directives
Apache is controlled through roughly 150 directives. For each
directive, a formal explanation is given in the following format:
An explanation of the directive is located here.
So, for instance, we have the following directive:
ServerAdmin email address
Server config, virtual host
ServerAdmin gives the email address for
correspondence. Apache includes this address in the error messages it generates
automatically, so the user has someone to write to in case of problems.
The Where used line explains the appropriate
environment for the directive. This will become clearer later.
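As a rough illustration of the two environments named for ServerAdmin, the fragment below sets the directive once in the main server config and once inside a virtual host. The email addresses, hostname, IP address, and DocumentRoot path are assumptions made up for this sketch; they are not taken from the demonstration sites.

```apache
# Server config: default contact address (hypothetical address)
ServerAdmin webmaster@example.com

# Virtual host: overrides the contact address for this host only.
# The IP address, hostname, and path below are illustrative assumptions.
<VirtualHost 192.168.123.2>
    ServerName www.example.com
    ServerAdmin sales-webmaster@example.com
    DocumentRoot /usr/www/site.simple/htdocs
</VirtualHost>
```

With a layout like this, requests served by the virtual host would report the second address, and everything else would fall back to the first.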
Organization of This Book
The chapters that follow and their contents are listed here:
-
Chapter 1
-
Covers web servers, how Apache works, TCP/IP, HTTP,
hostnames, what a client does, what happens at the server end, choosing a Unix
version, and compiling and installing Apache under both Unix and Win32.
-
Chapter 2
-
Discusses getting Apache to run, creating Apache users,
runtime flags, permissions, and site.simple.
-
Chapter 3
-
Introduces a demonstration business, Butterthlies, Inc.;
some HTML; default indexing of web pages; server housekeeping; and block
directives.
-
Chapter 4
-
Explains how to connect web sites to network addresses,
including the common case where more than one web site is hosted at a given
network address.
-
Chapter 5
-
Explains controlling access, collecting information about
clients, cookies, DBM control, digest authentication, and anonymous access.
-
Chapter 6
-
Covers content and language arbitration, type maps, and
expiration of information.
-
Chapter 7
-
Discusses better indexes, index options, your own indexes,
and imagemaps.
-
Chapter 8
-
Describes Alias, ScriptAlias, and the
amazing Rewrite module.
-
Chapter 9
-
Covers remote proxies and proxy caching.
-
Chapter 10
-
Explains Apache's facilities for tracking activity on your
web sites.
-
Chapter 11
-
Explores the many aspects of protecting an Apache server
and its content from uninvited guests and intruders, including user
validation, binary signatures, virtual cash, certificates, firewalls, packet
filtering, secure sockets layer (SSL), legal issues, patent rights, national
security, and Apache-SSL directives.
-
Chapter 12
-
Explains best practices for running large sites, including
support for multiple content-creators, separating test sites from production
sites, and integrating the site with other Internet technologies.
-
Chapter 13
-
Explores the options available for using Apache to host
automatically changing content and interactive applications.
-
Chapter 14
-
Explains using runtime commands in your HTML and XSSI — a
more secure server-side include.
-
Chapter 15
-
Explains how to install and configure PHP, with an example
for connecting it to MySQL.
-
Chapter 16
-
Demonstrates aliases, logs, HTML forms, a shell script, a
CGI script in Perl, environment variables, and using MySQL through Perl and
Apache.
-
Chapter 17
-
Demonstrates how to install, configure, and use the
mod_perl module for efficient processing of Perl applications.
-
Chapter 18
-
Explains how to install mod_jserv and Tomcat for supporting
Java in the Apache environment.
-
Chapter 19
-
Explains how to use XML in conjunction with Apache and how
to install and configure the Cocoon set of tools for presenting XML content.
-
Chapter 20
-
Explores the foundations of the Apache 2.0 API.
-
Chapter 21
-
Describes how to create Apache modules using the Apache Portable
Runtime of Apache 2.0, including how to port modules from 1.3 to 2.0.
-
Appendix A
-
Describes pools; per-server, per-directory, and per-request
information; functions; warnings; and parsing.
In addition, the Apache Quick Reference Card provides an
outline of Apache 1.3 and 2.0 syntax.
Acknowledgments
First, thanks to Robert S. Thau, who gave the world the
Apache API and the code that implements it, and to the Apache Group, who worked
on it before and have worked on it since. Thanks to Eric Young and Tim Hudson
for giving SSLeay to the Web.
Thanks to Bryan Blank, Aram Mirzadeh, Chuck Murcko, and Randy
Terbush, who read early drafts of the first edition text and made many useful
suggestions; and to John Ackermann, Geoff Meek, and Shane Owenby, who did the
same for the second edition. For the third edition, we would like to thank our
reviewers Evelyn Mitchell, Neil Neely, Lemon, Dirk-Willem van Gulik, Richard
Sonnen, David Reid, Joe Johnston, Mike Stok, and Steven Champeon.
We would also like to offer special thanks to Andrew Ford for
giving us permission to reprint his Apache Quick Reference Card.
Many thanks to Simon St.Laurent, our editor at O'Reilly, who
patiently turned our text into a book — again. The two layers of blunders that
remain are our own contribution.
And finally, thanks to Camilla von Massenbach and Barbara
Laurie, who have continued to put up with us while we rewrote this book.