Data Manipulation Commands (Java Enterprise in a Nutshell)

8.4. Data Manipulation Commands

Empty tables aren't very useful, and, even once they've been populated with data, we need some way of getting that data back out. The SQL data manipulation commands allow you to read data from a table and to create, update, and remove existing data.

8.4.1. SELECT

The SELECT statement is the most important statement in SQL and also the most complex. It allows you to retrieve data from a table or a set of tables. Here's the syntax:

SELECT [ DISTINCT ]
   { summary_function, ... }
   | { data_manipulation_expression, ... }
   | { column_name, ... }
FROM 
   { { table_name [ AS correlation_name ] }
   | { subquery [ AS correlation_name ] }
   | joined_tables}
[ WHERE predicate ]
[ GROUP BY column_name, ... [ HAVING group_selection_predicate ] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL ] 
     [ CORRESPONDING [ BY (column_name, ...] ]
      select_statement | { TABLE table_name } 
                       | table_value_constructor ]
[ ORDER BY {{output_column [ ASC | DESC ]}, ...}
           | {{positive_integer [ ASC | DESC ]}, ...}]

The simplest possible SELECT, one that displays all columns of all rows of a single table, looks like this:

SELECT * FROM BOOKS

If this statement is executed in a command-line SQL interpreter, the output might look like this:

TITLE                | AUTHOR           | EDITION | PRICE
---------------------+------------------+---------+-------
Me                   | Garrison Keillor |       1 | 24.99
Bleak House          | Charles Dickens  |      57 |  8.99
A Tale Of Two Cities | Charles Dickens  |     312 |  4.99

To sort the output by title, we can add an ORDERBY clause to the statement:

SELECT * FROM BOOKS ORDER BY TITLE

Now the output is:

TITLE                | AUTHOR           | EDITION | PRICE
---------------------+------------------+---------+-------
A Tale Of Two Cities | Charles Dickens  |     312 |  4.99
Bleak House          | Charles Dickens  |      57 |  8.99
Me                   | Garrison Keillor |       1 | 24.99

To select just the TITLE and AUTHOR columns:

SELECT TITLE, AUTHOR FROM BOOKS

To select a subset of records, use the WHERE clause:

SELECT * FROM BOOKS WHERE PRICE < 10.0

This returns the Charles Dickens books, but not the Garrison Keillor book. You can have multiple criteria:

SELECT * FROM BOOKS WHERE PRICE < 10.0 OR EDITION = 1

This returns all three books. If we had specified a price less than 10 and an edition equal to 1, we wouldn't have received any records back. The various predicates you can use are listed in Table 8-2. Note that not all of the more esoteric ones (such as LIKE) are supported by all databases.

Table 8-2. SQL Predicates

Operator	Meaning
`=`	Equals
`<`	Less than
`>`	Greater than
`<=`	Less than or equal to
`>=`	Greater than or equal to
`!=` or `< >`	Not equal to (some implementations may only support one of these)
`LIKE`	Wildcard match
`ISNULL`	Checks for a `null` value
`IN`	Checks to see if a value is contained within a set
`BETWEEN`	Checks to see if a date value is between two other date values

The final four predicates in the table can be used with the NOT modifier (e.g., NOTLIKE, ISNOTNULL, etc.).

8.4.1.1. String comparisons

The = operator can generally be used for exact string comparisons. The LIKE operator allows wildcard searches using two wildcard characters: % to match any number of characters and _ to match at most one character. Here's a query that selects all records that have a LAST_NAME that contains the letters "for":

SELECT LAST_NAME FROM CUSTOMERS WHERE LAST_NAME LIKE`%for%'

This matches last names like Buford, Crawford, and Trefor, but may not match Fordham, since most databases implement LIKE in a case-sensitive manner (Microsoft Access is a notable exception to this rule). Case-insensitive searches generally require a single-case version of the column or the use of a case-adjusting function (we'll discuss functions later in this chapter).

8.4.1.2. Subqueries and joins

The IN predicate allows you to check whether a value appears in another set of values. The simplest way to use this feature is in a SQL statement like this:

SELECT * FROM BOOKS WHERE PRICE IN (5.95, 4.95, 7.95)

This is simply another form for:

SELECT * FROM BOOKS WHERE PRICE = 5.95 OR PRICE = 4.95 OR PRICE = 7.95

But we can do something more interesting with IN. Let's assume that we have a table, called PRICES, that holds all the prices we want to search on. In this case, we can generate the set of values using another query, as follows:

SELECT * FROM BOOKS WHERE PRICE IN
  SELECT PRICE FROM PRICES

Note that we didn't put parentheses around the second SELECT statement: use parentheses only when you are specifying the entire list manually.

Subqueries like this are useful, but they still restrict the output columns to those of a single table.

Rather than using subqueries, two tables are normally connected via a join. A join allows a query to include references to multiple tables and to restrict the output based on the relations between those tables. The basic join is an equi-join (or inner join): data in two tables is linked based on a shared value. An order-tracking database might include a CUSTOMERS table and an ORDERStable. The CUSTOMERS table has a customer identifier (CUSTOMER_ID) as its primary key. The orders table also has a CUSTOMER_IDcolumn, although not as a primary key, since there may be more than one order per customer (see Figure 8-1 for a graphical representation of these tables). Here's the SQL to combine the two tables based on the CUSTOMER_ID column:

SELECT * FROM CUSTOMERS, ORDERS WHERE ORDERS.CUSTOMER_ID = 
   CUSTOMERS.CUSTOMER_ID

Since it's an equi-join, it works just as well the other way around:

SELECT * FROM CUSTOMERS, ORDERS WHERE CUSTOMERS.CUSTOMER_ID = 
   ORDERS.CUSTOMER_ID

In SQL-92, there is a JOIN operator that performs the same operation with a slightly different syntax:[4]

[4] Note that some databases allow you to use JOIN without the ON clause. In this case, the join operates on columns that have the same name.

SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.NAME, ORDERS.ORDER_ID, ORDERS.TOTAL
  FROM CUSTOMERS INNER JOIN ORDERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID

This example indicates the specific columns to return. Using the data shown in Figure 8-1, the output of this query might look like this:

CUSTOMER_ID | NAME       | ORDER_ID | TOTAL 
------------+------------+----------+--------
          1 | Bob Copier |        4 | 72.19 
          1 | Bob Copier |        6 | 21.07

Note that some databases require you to use INNERJOIN instead of JOIN, while others allow just JOIN because inner joins are the default join.

To join on more than one table, use multiple JOIN statements. Here we add an ITEMS table that includes the ORDER_ID from the ORDERS table:

SELECT * FROM ITEMS JOIN ORDERS JOIN CUSTOMERS

This query joins the CUSTOMERS and ORDERS tables based on CUSTOMER_ID, and the ORDERS and ITEMS tables based on ORDER_ID. The join is performed from left to right, so this three-table join goes from the largest table to the smallest table.

As useful as the JOIN keyword is, it is not required for entry-level SQL-92 implementations, so here's a three-column join using the syntax we started with:

SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.NAME, ORDERS.ORDER_ID, ORDER.TOTAL,
  ITEMS.ITEM_NO, ITEMS.COST FROM CUSTOMERS, ORDERS, ITEMS 
  WHERE ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID
  AND ITEMS.ORDER_ID = ORDERS.ORDER_ID

Again, using the data shown in Figure 8-1, the output from this query might look as follows:

CUSTOMER_ID | NAME       | ORDER_ID | TOTAL | ITEM_NO | COST 
------------+------------+----------+-------+---------+-------
          1 | Bob Copier |        4 | 72.19 |    1280 | 16.72
          1 | Bob Copier |        4 | 72.19 |    4129 | 41.10
          1 | Bob Copier |        4 | 72.19 |    3017 | 14.37

So far, all we've talked about is equi-joins, or inner joins. There are also outer joins, which do not require a matching key in both tables. An outer join includes all the records from one table and any records from another table that match the primary key of the first table. If there are no corresponding records in the second table, those columns are simply left blank in the result. Outer joins are divided into left outer and right outer joins: in a left join, the primary key table is on the left, and in a right join, it is on the right. Here's the syntax for a LEFTJOIN (or LEFTOUTERJOIN) on CUSTOMERS and ORDERS:

SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.NAME, ORDERS.ORDER_ID, ORDERS.TOTAL
  FROM CUSTOMERS LEFT JOIN ORDERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID

This includes all the customer records and order records for all the customers that have them. Here's some possible output using the data shown in Figure 8-1:

CUSTOMER_ID | NAME         | ORDER_ID | TOTAL 
------------+--------------+----------+--------
          1 | Bob Copier   |        4 | 72.19 
          1 | Bob Copier   |        6 | 21.07 
          2 | John Stapler |          |

If we were to do a RIGHTJOIN on CUSTOMERS and ORDERS, the result would be the same as an inner join, since there are no records in ORDERS that don't have a match in CUSTOMERS.

8.4.1.3. Groups

The GROUP BY clause allows you to collapse multiple records into groups with a common field. For instance, to select all the records in the BOOKS table grouped by AUTHOR:

SELECT AUTHOR FROM BOOKS GROUP BY AUTHOR

This returns one row for each distinct author in the table. This query is not really that useful though, since we can do the same thing with the DISTINCT keyword (SELECTDISTINCT). However, we can use an aggregate function on each of the groups to do something more useful:

SELECT AUTHOR, COUNT(*) FROM BOOKS GROUP BY AUTHOR

This query returns two columns: the author and the number of books by that author in the table. Here's the output, based on the BOOKS table we used earlier:

AUTHOR           | COUNT(*)
-----------------+----------
Charles Dickens  |        2
Garrison Keillor |        1

We'll talk more about aggregate functions later in this chapter.

8.4.2. INSERT

The INSERT statement loads data into a table. Here's the syntax:

INSERT INTO table_name
   [ (column_name, ...) ]
   subquery | { VALUES (val1, val2,...) } | DEFAULT VALUES

To load static data, simply specify the table and the actual data:

INSERT INTO CUSTOMERS VALUES (3, 'Tom Fax', '617 555-1214')

This statement inserts the values 3, "Tom Fax", and "617 555-1214" into the first three fields of a new row in the CUSTOMERS table. If there are more than three fields in the CUSTOMERS table, this statement fails. If you don't want to insert into every column, you can specify the columns you want to insert into:

INSERT INTO CUSTOMERS (CUSTOMER_ID, NAME) VALUES (3, 'Tom Fax')

Note, however, that this kind of statement can fail if we do not provide a value for a field that is specified as NOTNULL.

To add more than one row at a time and to add data from other tables, we can specify a subquery rather than a set of specific values. To fill the JUNKMAIL table with values from the CUSTOMERS and ADDRESSES tables, run this query:

INSERT INTO JUNKMAIL (NAME, ADDR, CITY, STATE, ZIP) 
  SELECT NAME, ADDR, CITY, STATE, ZIP FROM CUSTOMERS JOIN ADDRESSES

The database first performs a join on CUSTOMERS and ADDRESSES. It matches on the CUSTOMER_ID field and outputs the NAME field from CUSTOMERS and the other fields from ADDRESSES. The rows from the join are then inserted into the JUNKMAIL table, which can now be used to fill our mailboxes with catalogs.

8.4.3. UPDATE

The UPDATE statement modifies data in one or more existing rows. It consists of one or more SET statements and an optional WHERE clause. If the WHERE clause is not present, the operation is performed on every row in the table. Here's the syntax:

UPDATE table_name
   SET { column_name = { value | NULL | DEFAULT }, ...}
   [ { WHERE predicate }  
     | { WHERE CURRENT OF cursor_name } ]

Here's an example that updates a few fields:

UPDATE ADDRESSES
   SET ADDR = '1282 Country Club Drive', STATE='CA' WHERE CUSTOMER_ID
   = 432

This statement sets the ADDR and STATE fields of the ADDRESSEStable to particular values on all records where CUSTOMER_ID equals 432. Sometimes we do want to run an UPDATE on all records. Here's an example that makes sure all the STATE fields are in uppercase:

UPDATE ADDRESSES SET STATE = UPPER(STATE)

Note how we can use a field from the table itself in the SET statement.

The WHERECURRENTOF clause allows you to update the row at the current cursor location in a multiple-table row. This is not something that JDBC programmers need to concern themselves with, although it can be of interest to an underlying JDBC 2.0 driver.

8.4.4. DELETE

DELETE is very simple: it removes rows from a table. Here's the syntax:

DELETE FROM table_name
   [ { WHERE predicate } 
   | { WHERE CURRENT OF cursor_name } ]

To delete all the rows in ORDERS:

DELETE FROM ORDERS

To delete a specific record:

DELETE FROM ORDERS WHERE ORDER_ID = 32

Once a row has been deleted, there is no way to recover it.