Chapter 2. HTTP Servlet Basics

This chapter provides a quick introduction to some of the things an HTTP servlet can do. For example, an HTTP servlet can generate an HTML page, either when the servlet is accessed explicitly by name, by following a hypertext link, or as the result of a form submission. An HTTP servlet can also be embedded inside an HTML page, where it functions as a server-side include. Servlets can be chained together to produce complex effects--one common use of this technique is for filtering content. Finally, snippets of servlet code can be embedded directly in HTML pages using a new technique called JavaServer Pages.

Although the code for each of the examples in this chapter is available for download (as described in the the Preface), we would suggest that for these first examples you deny yourself the convenience of the Internet and type in the examples. It should help the concepts seep into your brain.

Don't be alarmed if we seem to skim lightly over some topics in this chapter. Servlets are powerful and, at times, complicated. The point here is to give you a general overview of how things work, before jumping in and overwhelming you with all of the details. By the end of this book, we promise that you'll be able to write servlets that do everything but make tea.

2.1. HTTP Basics

Before we can even show you a simple HTTP servlet, we need to make sure that you have a basic understanding of how the protocol behind the Web, HTTP, works. If you're an experienced CGI programmer (or if you've done any serious server-side web programming), you can safely skip this section. Better yet, you might skim it to refresh your memory about the finer points of the GET and POST methods. If you are new to the world of server-side web programming, however, you should read this material carefully, as the rest of the book is going to assume that you understand HTTP. For a more thorough discussion of HTTP and its methods, see Web Client Programmingby Clinton Wong (O'Reilly).

2.1.1. Requests, Responses, and Headers

HTTP is a simple, stateless protocol. A client, such as a web browser, makes a request, the web server responds, and the transaction is done. When the client sends a request, the first thing it specifies is an HTTP command, called a method , that tells the server the type of action it wants performed. This first line of the request also specifies the address of a document (a URL) and the version of the HTTP protocol it is using. For example:

GET /intro.html HTTP/1.0

This request uses the GET method to ask for the document named intro.html, using HTTP Version 1.0. After sending the request, the client can send optional header information to tell the server extra information about the request, such as what software the client is running and what content types it understands. This information doesn't directly pertain to what was requested, but it could be used by the server in generating its response. Here are some sample request headers:

User-Agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Accept: image/gif, image/jpeg, text/*, */*

The User-Agent header provides information about the client software, while the Accept header specifies the media (MIME) types that the client prefers to accept. (We'll talk more about request headers in the context of servlets in Chapter 4, "Retrieving Information".) After the headers, the client sends a blank line, to indicate the end of the header section. The client can also send additional data, if appropriate for the method being used, as it is with the POST method that we'll discuss shortly. If the request doesn't send any data, it ends with an empty line.

After the client sends the request, the server processes it and sends back a response. The first line of the response is a status line that specifies the version of the HTTP protocol the server is using, a status code, and a description of the status code. For example:

HTTP/1.0 200 OK

This status line includes a status code of 200, which indicates that the request was successful, hence the description "OK". Another common status code is 404, with the description "Not Found"--as you can guess, this means that the requested document was not found. Chapter 5, "Sending HTML Information", discusses common status codes and how you can use them in servlets, while Appendix C, "HTTP Status Codes", provides a complete list of HTTP status codes.

After the status line, the server sends response headers that tell the client things like what software the server is running and the content type of the server's response. For example:

Date: Saturday, 23-May-98 03:25:12 GMT
Server: JavaWebServer/1.1.1
MIME-version: 1.0
Content-type: text/html
Content-length: 1029
Last-modified: Thursday, 7-May-98 12:15:35 GMT

The Server header provides information about the server software, while the Content-type header specifies the MIME type of the data included with the response. (We'll also talk more about response headers in Chapter 5, "Sending HTML Information".) The server sends a blank line after the headers, to conclude the header section. If the request was successful, the requested data is then sent as part of the response. Otherwise, the response may contain human-readable data that explains why the server couldn't fulfill the request.

2.1.2. GET and POST

When a client connects to a server and makes an HTTP request, the request can be of several different types, called methods. The most frequently used methods are GET and POST. Put simply, the GET method is designed for getting information (a document, a chart, or the results from a database query), while the POST method is designed for posting information (a credit card number, some new chart data, or information that is to be stored in a database). To use a bulletin board analogy, GET is for reading and POST is for tacking up new material.

The GET method, although it's designed for reading information, can include as part of the request some of its own information that better describes what to get--such as an x, y scale for a dynamically created chart. This information is passed as a sequence of characters appended to the request URL in what's called a query string. Placing the extra information in the URL in this way allows the page to be bookmarked or emailed like any other. Because GET requests theoretically shouldn't need to send large amounts of information, some servers limit the length of URLs and query strings to about 240 characters.

The POST method uses a different technique to send information to the server because in some cases it may need to send megabytes of information. A POST request passes all its data, of unlimited length, directly over the socket connection as part of its HTTP request body. The exchange is invisible to the client. The URL doesn't change at all. Consequently, POST requests cannot be bookmarked or emailed or, in some cases, even reloaded. That's by design--information sent to the server, such as your credit card number, should be sent only once.

In practice, the use of GET and POST has strayed from the original intent. It's common for long parameterized requests for information to use POST instead of GET to work around problems with overly-long URLs. It's also common for simple forms that upload information to use GET because, well--why not, it works! Generally, this isn't much of a problem. Just remember that GET requests, because they can be bookmarked so easily, should not be allowed to cause damage for which the client could be held responsible. In other words, GET requests should not be used to place an order, update a database, or take an explicit client action in any way.

2.1.3. Other Methods

In addition to GET and POST, there are several other lesser-used HTTP methods. There's the HEAD method, which is sent by a client when it wants to see only the headers of the response, to determine the document's size, modification time, or general availability. There's also PUT, to place documents directly on the server, and DELETE, to do just the opposite. These last two aren't widely supported due to complicated policy issues. The TRACE method is used as a debugging aid--it returns to the client the exact contents of its request. Finally, the OPTIONS method can be used to ask the server which methods it supports or what options are available for a particular resource on the server.

Chapter 2. HTTP Servlet Basics

Contents:

2.1. HTTP Basics

2.1.1. Requests, Responses, and Headers

2.1.2. GET and POST

2.1.3. Other Methods