Chapter 9. Database Connectivity

It's hard to find a professional web site today that doesn't have some sort of database connectivity. Webmasters have hooked online front ends to all manner of legacy systems, including package tracking and directory databases, as well as many newer systems like online messaging, storefronts, and search engines. But web-database interaction comes with a price: database-backed web sites can be difficult to develop and can often exact heavy performance penalties. Still, for many web sites, especially intranet applications, database connectivity is just too useful to let go. More and more, databases are driving the Web.

This chapter introduces relational databases, the Structured Query Language (SQL) used to manipulate those databases, and the Java database connectivity (JDBC) API itself. Servlets, with their enduring life cycle, and JDBC, a well-defined database-independent database connectivity API, are an elegant and efficient solution for webmasters who need to hook their web sites to back-end databases. In fact, both of your authors started working with servlets specifically because of this efficiency and elegance. Although elsewhere in the book we have assumed that you are familiar with Java, this chapter breaks that assumption and begins with a quick course in JDBC.

The biggest advantage for servlets with regard to database connectivity is that the servlet life cycle (explained in depth in Chapter 3, "The Servlet Life Cycle") allows servlets to maintain open database connections. An existing connection can trim several seconds from a response time, compared to a CGI script that has to reestablish its connection for every invocation. Exactly how to maintain the database connection depends on the task at hand, and this chapter demonstrates several techniques appropriate for different tasks.

Another advantage of servlets over CGI and many other technologies is that JDBC is database-independent. A servlet written to access a Sybase database can, with a two-line modification or a change in a properties file, begin accessing an Oracle database (assuming none of the database calls it makes are vendor-specific). In fact, you should notice that the examples in this chapter are written to access a variety of different databases, including ODBC data sources (such as Microsoft Access), Oracle, and Sybase.

Servlets in the Middle Tier

One common place for servlets, especially servlets that access a database, is in what's called the middle tier. A middle tier is something that helps connect one endpoint to another (an applet to a database, for example) and along the way adds a little something of its own.

The most compelling reason for putting a middle tier between a client and our ultimate date source is that software in the middle tier (commonly referred to as middleware) can include business logic. Business logic abstracts complicated low-level tasks (such as updating database tables) into high-level tasks (placing and order), making the whole operation simpler and safer.

Imagine a client application that places an order. Without middleware, the application has to connect directly to the database server that stores the order records and then change the database fields to reflect the order. If the database server changes in any way (by moving to a different machine, altering its internal table structure, or changing database vendors), the client may break. Even worse, if someone makes a minor change to the client (either intentionally or accidentally), it's possible for the database to record orders without first receiving payment or to reject perfectly valid entries.

Middleware uses business logic to abstract the ordering process. Middleware accepts information about the order (for example, name, address, item, quantity, credit card number), sanity-checks the information, verifies that the credit card is valid, and enters the information into the database. Should the database change, the middleware can be updated without any changes in the client. Even if the orders database is temporarily replaced with a simple flat file order log, the middleware can present the same appearance to the client.

Middleware can improve efficiency by spreading the processing load across several back-end servers (CPU servers, database servers, file servers, directory servers, etc.). Middleware can also make more efficient use of bandwidth: instead of having a client perform the back-and-forth communication with the server over what might be a slow network connection, the client can tell the middleware what it needs and the middleware can do the work using a fast network connection and probably pooled database connections.

On the Web, middle tiers are often implemented using servlets. Servlets provide a convenient way to connect clients built using HTML forms or applets to back-end servers. A client communicates its requirements to the servlet using HTTP, and the business logic in the servlet handles the request by connecting to the back-end server. (More information on applet-servlet communication is coming up in Chapter 10, "Applet-Servlet Communication".)

Servlets sometimes use another middle tier to connect to a database. If a web browser sends an HTML form with order information to a servlet, that servlet may parse the information and make an RMI call to middleware on another machine that has the responsibility for handling all orders--from servlets as well as standalone programs. In these cases, what was once three tiers is now four tiers.

9.1. Relational Databases

In some earlier examples, we've seen servlets that used file storage on the local disk to store their persistent data. The use of a flat file is fine for a small amount of data, but it can quickly get out of control. As the amount of data grows, access times slow to a crawl. And just finding data can become quite a challenge: imagine storing the names, cities, and email addresses of all your customers in a text file. It works great for a company that is just starting out, but what happens when you have hundreds of thousands of customers and want to display a list of all your customers in Boston with email addresses ending in "aol.com"?

One of the best solutions to this problem is a Relational Database Management System (RDBMS). At the most basic level, an RDBMS organizes data into tables. These tables are organized into rows and columns, much like a spreadsheet. Particular rows and columns in a table can be related (hence the term "relational") to one or more rows and columns in another table.

One table in a relational database might contain information about customers, another might contain orders, and a third might contain information about individual items within an order. By including unique identifiers (say, customer numbers and order numbers), orders from the orders table can be linked to customer records and individual order components. Figure 9-1 shows how this might look if we drew it out on paper.

Figure 9-1. Related tables

Data in the tables can be read, updated, appended, and deleted using the Structured Query Language, or SQL, sometimes also referred to as the Standard Query Language. Java's JDBC API introduced in JDK 1.1 uses a specific subset of SQL known as ANSI SQL-2 Entry Level. Unlike most programming languages, SQL is declarative: you say what you want, and the SQL interpreter gives it to you. Other languages, like C, C++, and Java, by contrast, are essentially procedural, in that you specify the steps required to perform a certain task. SQL, while not prohibitively complex, is also rather too broad a subject to cover in great (or, indeed, merely adequate) detail here. In order to make the rest of the examples in this chapter comprehensible, though, here's a brief tutorial.

The simplest and most common SQL expression is the SELECTstatement, which queries the database and returns a set of rows that matches a set of search criteria. For example, the following SELECTstatement selects everything from the CUSTOMERStable:

SELECT * FROM CUSTOMERS

SQL keywords like SELECT and FROM and objects like CUSTOMERS are case insensitive but frequently written in uppercase. When run in Oracle's SQL*PLUS SQL interpreter, this query would produce something like the following output:

CUSTOMER_ID   NAME                          PHONE
------------- ----------------------------- ---------------
1             Bob Copier                    617 555-1212
2             Janet Stapler                 617 555-1213
3             Joel Laptop                   508 555-7171
4             Larry Coffee                  212 555-6525

More advanced statements might restrict the query to particular columns or include some specific limiting criteria:

SELECT ORDER_ID, CUSTOMER_ID, TOTAL FROM ORDERS
WHERE ORDER_ID = 4

This statement selects the ORDER_ID, CUSTOMER_ID, and TOTAL columns from all records where the ORDER_ID field is equal to 4. Here's a possible result:

 ORDER_ID CUSTOMER_ID     TOTAL
--------- ----------- ---------
        4           1     72.19

A SELECT statement can also link two or more tables based on the values of particular fields. This can be either a one-to-one relationship or, more typically, a one-to-many relation, such as one customer to several orders:

SELECT CUSTOMERS.NAME, ORDERS.TOTAL FROM CUSTOMERS, ORDERS 
WHERE ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID AND ORDERS.ORDER_ID = 4

This statement connects (or, in database parlance, joins) the CUSTOMERS table with the ORDERS table via the CUSTOMER_ID field. Note that both tables have this field. The query returns information from both tables: the name of the customer who made order 4 and the total cost of that order. Here's some possible output:

NAME                             TOTAL
-------------------------------- ---------
Bob Copier                       72.19

SQL is also used to update the database. For example:

INSERT INTO CUSTOMERS (CUSTOMER_ID, NAME, PHONE) 
  VALUES (5, "Bob Smith", "555 123-3456")
UPDATE CUSTOMERS SET NAME = "Robert Copier" WHERE CUSTOMER_ID = 1
DELETE FROM CUSTOMERS WHERE CUSTOMER_ID = 2

The first statement creates a new record in the CUSTOMERS table, filling in the CUSTOMER_ID, NAME, and PHONE fields with certain values. The second updates an existing record, changing the value of the NAME field for a specific customer. The last deletes any records with a CUSTOMER_ID of 2. Be very careful with all of these statements, especially DELETE. A DELETEstatement without a WHEREclause will remove all the records in the table!

For a good primer on relational databases and SQL, we recommend SQL for Dummies, by Allen G. Taylor (IDG Books Worldwide).

Chapter 9. Database Connectivity

Contents:

Servlets in the Middle Tier

9.1. Relational Databases

Figure 9-1. Related tables