

Apache: The Definitive Guide, 3rd Edition

Chapter 1. Getting Started

Apache is the dominant web server on the Internet today, filling a key place in the infrastructure of the Internet. This chapter explores what web servers do and why you might choose the Apache web server, examines how your web server fits into the rest of your network infrastructure, and concludes by showing you how to install Apache on a variety of systems.

1.1. What Does a Web Server Do?

The whole business of a web server is to translate a URL either into a filename (and then send that file back over the Internet) or into a program name (and then run that program and send its output back). That is the meat of what it does; all the rest is trimming.
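To make that translation concrete, here is a minimal Python sketch. It is not how Apache itself does the job: the document root /var/www/htdocs, the /cgi-bin/ prefix that marks a program, and the index.html default page are all hypothetical conventions chosen for illustration. The sketch only decides what a URL path refers to; it neither reads files nor runs programs.

```python
import posixpath
from dataclasses import dataclass

# Hypothetical layout: static files live under DOCUMENT_ROOT, and any path
# starting with CGI_PREFIX names a program in CGI_DIR instead of a file.
DOCUMENT_ROOT = "/var/www/htdocs"
CGI_PREFIX = "/cgi-bin/"
CGI_DIR = "/var/www/cgi-bin"


@dataclass
class Target:
    kind: str  # "file" (send its contents) or "program" (run it, send its output)
    path: str  # filesystem path the URL path was translated to


def translate(url_path: str) -> Target:
    # Collapse "." and ".." segments; the leading "/" means ".." can never
    # climb above the top level, so requests stay inside the chosen roots.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    if clean.startswith(CGI_PREFIX):
        # A program: the rest of the path names an executable in CGI_DIR.
        return Target("program", posixpath.join(CGI_DIR, clean[len(CGI_PREFIX):]))
    if url_path.endswith("/"):
        # A directory such as "/": fall back to a default page (assumed name).
        clean = posixpath.join(clean, "index.html")
    return Target("file", DOCUMENT_ROOT + clean)
```

With these assumptions, translate("/") yields a file target for /var/www/htdocs/index.html, while translate("/cgi-bin/hello") yields a program target for /var/www/cgi-bin/hello.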

When you fire up your browser and connect to the URL of someone's home page (say, the notional http://www.butterthlies.com/ we shall meet later on), you send a message across the Internet to the machine at that address. That machine, you hope, is up and running; its Internet connection is working; and it is ready to receive and act on your message.

URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:

<scheme>://<host>/<path>
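As a quick illustration of those three parts, Python's standard urllib.parse.urlsplit can pull a URL apart. The helper name url_parts is ours, and the second URL, with an IP address as its host and a longer path, is an invented example.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> tuple[str, str, str]:
    """Return the (scheme, host, path) parts of a URL."""
    parts = urlsplit(url)
    # netloc holds the <host> part; it may also carry a port, as in "host:8080".
    return parts.scheme, parts.netloc, parts.path


print(url_parts("http://www.butterthlies.com/"))
# ('http', 'www.butterthlies.com', '/')
print(url_parts("http://192.0.2.10/catalog/index.html"))
# ('http', '192.0.2.10', '/catalog/index.html')
```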

So, in our example, <scheme> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <path> is /, traditionally meaning the top page of the host.[2] The <host> may contain either an IP address or a name, which the browser will then convert to an IP address. Using HTTP/1.1, your browser might send the following request to the computer at that IP address:

[2] Note that since a URL has no predefined meaning, this really is just a tradition, though a pretty well entrenched one in this case.

GET / HTTP/1.1
Host: www.butterthlies.com

The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is in four parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) /; the version of the protocol we are using; and a series of headers that modify the request (in this case, a Host header, which is used for name-based virtual hosting: see Chapter 4). It is then up to the web server running on that host to make something of this message.
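On the wire, HTTP/1.1 ends each line with a carriage return and line feed (CRLF) and marks the end of the headers with a blank line. The following Python sketch splits such a request into the four parts just described. It is deliberately simplified and assumes a well-formed request: it ignores request bodies, repeated headers, and malformed input, all of which a real server such as Apache must handle. The Request class and parse_request function are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str   # e.g. GET, PUT, POST, DELETE, CONNECT
    uri: str      # e.g. "/"
    version: str  # e.g. "HTTP/1.1"
    headers: dict[str, str] = field(default_factory=dict)


def parse_request(raw: bytes) -> Request:
    # The request head ends at the first blank line (CRLF CRLF); a body, if any, follows it.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    # Request line: method, URI and protocol version separated by spaces.
    method, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        # Split on the first colon only, so "Host: example.com:8080" keeps its port.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return Request(method, uri, version, headers)


raw = b"GET / HTTP/1.1\r\nHost: www.butterthlies.com\r\n\r\n"
req = parse_request(raw)
print(req.method, req.uri, req.version, req.headers["host"])
# GET / HTTP/1.1 www.butterthlies.com
```

The Host value is what lets one machine answer for several names, which is the basis of the name-based virtual hosting mentioned above.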

The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom or just a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.
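For a feel of what "listens to the network and accepts and acts on this sort of message" means in practice, here is a toy server built on Python's standard http.server module. It is only an illustration, not a model of Apache's architecture: it answers every GET by echoing back the requested path and Host header, and the names make_body, EchoHandler, and serve, as well as the port 8080, are our own assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_body(path: str, host: str) -> bytes:
    """Build the reply text; kept separate so it can be checked without a network."""
    return f"You asked {host} for {path}\n".encode("utf-8")


class EchoHandler(BaseHTTPRequestHandler):
    # By the time do_GET runs, the base class has already parsed the
    # request line (self.path holds the URI) and the headers.
    def do_GET(self):
        body = make_body(self.path, self.headers.get("Host", ""))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    # Listen on all interfaces and answer requests until interrupted.
    HTTPServer(("", port), EchoHandler).serve_forever()
```

Calling serve() and pointing a browser at http://localhost:8080/ should show the reply. Port 8080 sidesteps the administrator privileges that Unix systems usually require for binding port 80.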

1.1.1. Criteria for Choosing a Web Server

What do we want a web server to do? It should:

1.1.2. Why Apache?

Apache has more than twice the market share of its next competitor, Microsoft. This is not just because it is freeware and costs nothing. It is also open source,[4] which means that the source code can be examined by anyone so inclined: if there are errors in it, thousands of pairs of eyes are scanning it for mistakes. Because of this constant examination by outsiders, it is substantially more reliable[5] than any commercial software product that can rely only on the scrutiny of a closed list of employees. This is particularly important in the field of security, where apparently trivial mistakes can have horrible consequences.

[4] For more on the open source movement, see Open Sources: Voices from the Open Source Revolution (O'Reilly & Associates, 1999).

[5] Netcraft also surveys the uptime of various sites. At the time of writing, the longest running site was http://wwwprod1.telia.com, which had been up for 1,386 days.

Anyone is free to take the source code and change it to make Apache do something different. In particular, Apache is extensible through an established mechanism for writing new modules (described in more detail in Chapter 20), which many people have used to introduce new features.

Apache suits sites of all sizes and types. You can run a single personal page on it or an enormous site serving millions of regular visitors. You can use it to serve static files over the Web or as a frontend to applications that generate customized responses for visitors. Some developers use Apache as a test server on their desktops, writing and trying code in a local environment before publishing it to a wider audience. Apache can be an appropriate solution for practically any situation involving HTTP.

Apache is freeware. A prospective user downloads the source code and compiles it (under Unix) or downloads the executable (for Windows) from http://www.apache.org or a suitable mirror site. Although downloading, configuring, and compiling the source code sounds difficult, it takes only about 20 minutes and is well worth the trouble. Many operating system vendors now bundle appropriate Apache binaries.

The result of Apache's many advantages is clear. There are about 75 web server software packages on the market, and their relative popularity is charted every month by Netcraft (http://www.netcraft.com). In July 2002, Netcraft's June survey of active sites, shown in Table 1-1, found that Apache ran nearly two-thirds of the sites surveyed (continuing a trend that has been apparent for several years).

Table 1-1. Active sites counted by Netcraft survey, June 2002

Developer  May 2002 sites   May 2002 %   June 2002 sites   June 2002 %
Apache         10,411,000        65.11        10,964,734         64.42
Microsoft       4,121,697        25.78         4,243,719         24.93
iPlanet           247,051         1.55           281,681          1.66
Zeus              214,498         1.34           227,857          1.34




Copyright © 2003 O'Reilly & Associates. All rights reserved.