Advanced DBI (Programming the Perl DBI)

This chapter covers some of the more advanced topics of using DBI, including the ability to alter the way in which the database and statement handles function on-the-fly, as well as how to use explicit transaction handling within your database. These topics are not strictly necessary for basic DBI usage, but they contain useful information that will allows you to maximize the potential of your DBI programs.

6.1. Handle Attributes and Metadata

In addition to methods associated with database and statement handles, the DBI also defines attributes for these handles that allow the developer to examine or fine-tune the environment in which the handles operate. Some attributes are unique to either database or statement handles, and some are common to both.

The attribute values of a handle can be thought of as a hash of key/value pairs, and can be manipulated in the same way as you would manipulate an ordinary hash via a reference. Here are a few examples using the AutoCommit attribute:

### Set the database handle attribute "AutoCommit" to 1 (e.g., on)
$dbh->{AutoCommit} = 1;

### Fetch the current value of "AutoCommit" from the handle
$foo = $dbh->{AutoCommit};

Fetching attributes as hash values, rather than as method calls, has the added bonus that the hash lookup can be interpolated inside double-quoted strings:

### Print the current value of "AutoCommit" from the handle
print "AutoCommit: $dbh->{AutoCommit}\n";

With AutoCommit enabled, that would print:

AutoCommit: 1

as you might expect. Actually, since AutoCommit is a boolean attribute, it would print 1 after any value that Perl considers true had been assigned to it.

After a false value was assigned, you may reasonably expect a 0 to be printed, but you might be surprised to see:

AutoCommit:

That's because Perl uses an internal representation of false that is both a numeric zero and an empty string at the same time. When used in a string context, the empty string is printed. In a numeric context, the zero is used.

When getting or setting an attribute value, the DBI automatically checks that the attribute name you are using and generates an error if it's not known.[55] Similarly, any attempts to set a read-only attribute will result in an error. Be aware, however, that these errors are reported using die() regardless of the setting of the RaiseError attribute, and are thus potentially fatal. That's another good reason to use eval {...} blocks, as we discussed in Chapter 4, "Programming with the DBI ".

[55]Driver-specific attributes, e.g., those that start with a lowercase letter, are a special case. Any get or set of a driver-specific attribute that hasn't been handled by the driver is handled by the DBI without error. That makes life easier for driver developers. On the other hand, you need to take extra care with the spelling.

A statement handle is known as a child, or kid, of its parent database handle. Similarly, database handles are themselves children of their parent driver handle. Child handles inherit some attribute values from parent handles. The rules for this behavior are defined in a common-sense manner and are as follows:

A statement handle will inherit (copy) the current values of certain attributes from its parent database handle.

If that new statement handle then has its attribute values altered, this affects neither the parent database handle nor any other statement handles. The changes are contained entirely within the altered statement handle.

Changes to attributes within a database handle do not affect any of its existing child statement handles. The database handle attribute changes only affect future statement handles created from that database handle.

The DBI specification in Appendix A, "DBI Specification " should be consulted for complete information on which attributes are inherited.

6.1.1. Passing Attributes to DBI Methods

Handles carry with them their set of current attribute values that methods often use to control how they behave. Many methods are defined to also accept an optional reference to a hash of attribute values.

This is primarily an escape mechanism for driver developers and their users, and so does not always work in the way you might think. For example, you might expect this code:

$dbh->{RaiseError} = 1;
...
$dbh->do( $sql_statement, undef, { RaiseError => 0 } ); # WRONG

to turn off RaiseError for the do() method call. But it doesn't! Attribute parameters are ignored by the DBI on all database handle and statement handle method calls. You don't even get a warning that the attribute has been ignored.

If they're ignored, then what's the point in having them? Well, the DBI itself ignores them, but the DBD driver that processed the method call may not. Or then again, it may! Attribute hash parameters to methods are hints to the driver and typically only usefully hold driver-specific attributes.[56]

[56]It's possible that a future version of the DBI may look for certain non-driver-specific attributes, such as RaiseError.

That doesn't apply to the DBI->connect() method call because it's not a driver method, it's a DBI method. Its attribute hash parameter, \%attr , is used to set the attributes of the newly created database handle. We gave some examples using RaiseError in Chapter 4, "Programming with the DBI ", and we give more in the following section.

6.1.2. Connecting with Attributes

One of Perl's many catch phrases is "there's more than one way to do it," and the DBI is no exception. In addition to being able to set attributes on a handle by simple assignment and by the attribute parameter of the connect() method (as shown earlier), the DBI provides another way.

You can include attribute assignments in the data source name parameter of the connect() method. For example:

$dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
    RaiseError => 1
});

can also be expressed as:

$dbh = DBI->connect( "dbi:Oracle(RaiseError=>1):archaeo", '', '');

You can't have any space before the opening parenthesis or after the closing one before the colon, but you can have spaces within the parentheses. You can also use just = instead of => if you prefer. If you want to set more than one attribute then use a comma to separate each one.

The attribute settings in the data source name parameter take precedence over those in the attribute parameter. This can be very handy when you want to override a hardcoded attribute setting, such as PrintError. For example, this code will leave PrintError on:

$dbh = DBI->connect( "dbi:Oracle(PrintError=>1):archaeo", '', '', {
    PrintError => 0
});

But what's the point of just hardcoding the attribute setting in two different places? This example is not very useful as it stands, but we could let the application accept the data source name parameter from the command line as an option, or leave it empty and use the DBI_DSN environment variable. That makes the application much more flexible.

6.1.3. The Significance of Case

You may have noticed that some attribute names use all uppercase letters, like NUM_OF_FIELDS, while others use mixed case letters, like RaiseError. If you've seen any descriptions of individual database drivers you may have also noticed some attribute names that use all lowercase letters, like ado_conn and ora_type.

There is a serious method behind the apparently inconsistent madness. The letter case used for attribute names is significant and plays an important part in the portability of DBI scripts and the extensibility of the DBI itself. The letter case of the attribute name is used to signify who defined the meaning of that name and its values, as follows:

UPPER_CASE

Attribute names that use only uppercase letters and underscores are defined by external standards, such as ISO SQL or ODBC.

The statement handle TYPE attribute is a good example here. It's an uppercase attribute because the values it returns are the standard portable datatype numbers defined by ISO SQL and ODBC, and not the nonportable native database datatype numbers.

MixedCase

Attribute names that start with an uppercase letter but include lowercase letters are defined by the DBI specification.

lower_case

Attribute names that start with a lowercase letters are defined by individual database drivers. These are known as driver-specific attributes.

Because the meanings are assigned by driver authors without any central control, it's important that two driver authors don't pick the same name for attributes with different behaviors. To ensure this, driver-specific attributes all begin with a prefix that identifies the particular driver. For example, DBD::ADO attributes all begin with ado_, DBD::Informix attributes begin with ix_, etc.

For example, most drivers provide a driver-specific version of the statement handle TYPE attribute that returns the native database datatype numbers instead of the standard ones. DBD::Oracle calls it ora_type, DBD::Ingres calls it ing_ingtype, and DBD::mysql calls it mysql_type. The prefix also makes it easier to find driver-specific code in applications when maintaining them.

Driver-specific attributes play an important role in the DBI. They are an escape valve. They let drivers expose more of the special functionality and information that they have available without having to fit it inside the fairly narrow DBI straitjacket.

6.1.4. Common Attributes

Common attributes are those that can be queried and set within both database and statement handles. This section discusses some of the most commonly used attributes, including:

PrintError

The DBI specification in Appendix A, "DBI Specification " provides a complete list of all the common attributes defined within the DBI.

6.1.5. Database Handle Attributes

Database handle attributes are specific to database handles and are not valid for other types of handles. They include:

AutoCommit

The DBI Specification in Appendix A, "DBI Specification " provides a complete list of all the database handle attributes defined within the DBI. We'll discuss statement handle attributes in a moment, but first we'll explore database metadata.

6.1.6. Database Metadata

Database metadata is high-level information, or "data about data," stored within a database describing that database. This information is extremely useful for dynamically building SQL statements or even generating dynamic views of the database contents.

The metadata stored by a database, and the way in which it's stored, varies widely between different database systems. Most major systems provide a system catalog , consisting of a set of tables and views that can be queried to get information about all the entities in the database, including tables and views. There are two common problems with trying to query the system catalog directly: they can be complex and difficult to query, and the queries are not portable to other types of database.

The DBI should provide a range of handy methods to access this information in a portable way, and one day it will. However, currently it only provides two methods that can be executed against a valid database handle to extract entity metadata from the database.

The first of these methods is called tables() , and simply returns an array containing the names of tables and views within the database defined by the relevant database handle. The following code illustrates the use of this method:

### Connect to the database
my $dbh = DBI->connect( 'dbi:Oracle:archaeo', 'stones', 'stones' );

### Get a list of tables and views
my @tables = $dbh->tables();

### Print 'em out
foreach my $table ( @tables ) {
    print "Table: $table\n";
}

Connecting to a MySQL database would generate:

Table: megaliths
Table: media
Table: site_types

However, connecting to an Oracle database would generate:

Table: STONES.MEGALITHS
Table: STONES.MEDIA
Table: STONES.SITE_TYPES

In both cases, if the database contains other tables, they'd be included in the output.

Oracle stores all names in uppercase by default, so that explains one of the differences, but what about the "STONES." that's been prefixed to each table name?

Oracle, like most other big database systems, supports the concept of schemas . A schema is a way of grouping together related tables and other database objects into a named collection. In Oracle each user gets their own schema with the same name as the user. (Not all databases that support schemas take this approach.)

If an Oracle user other than stones wanted to refer to the media table then, by default, they would need to fully qualify the table name by adding the stones schema name, e.g., stones.media. If they didn't then the database would think they were refering to a media table in their own schema.

So, the leading STONES in the output is the name of the schema that the tables are defined in. Returning the fully qualified table names is important because the tables() method will return the names of all the tables owned by all the users that it can discover.

The other method used to retrieve database metadata is called table_info() , and returns more detailed information about the tables and views stored within the database.

When invoked, table_info() returns a prepared and executed statement handle that can be used to fetch information on the tables and views in the database. Each row fetched from this statement handle contains at least the following fields in the order listed:[60]

[60]Database drivers are free to include additional columns of information in the result data.

TABLE_QUALIFIER: This field contains the table qualifier identifier. In most cases this will be undef (NULL).
TABLE_OWNER: This field contains the name of the owner of the table. If your database does not support multiple schema or table owners, this field will contain undef (NULL).
TABLE_NAME: This field contains the name of the table and should never be undef.
TABLE_TYPE: This field contains the ``type'' of entity signified by this row. The possible values include TABLE, VIEW, SYSTEM TABLE, GLOBAL TEMPORARY, LOCAL TEMPORARY, ALIAS, SYNONYM, or some database driver-specific identifier.
REMARKS: This field contains a description or comment about the table. This field may be undef (NULL).

Therefore, if we wished to list some basic information on the tables contained within the current schema or database, we can write the following program that uses table_info() to retrieve all the table information, then formats the output:

#!/usr/bin/perl -w
#
# ch06/dbhdump: Dumps information about a SQL statement.

use DBI;

### Connect to the database
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
    RaiseError => 1
} );

### Create a new statement handle to fetch table information
my $tabsth = $dbh->table_info();

### Print the header
print "Qualifier  Owner     Table Name                       Type   Remarks\n";
print "=========  ========  ===============================  =====  =======\n\n";

### Iterate through all the tables...
while ( my ( $qual, $owner, $name, $type, $remarks ) = 
        $tabsth->fetchrow_array() ) {

    ### Tidy up NULL fields
    foreach ($qual, $owner, $name, $type, $remarks) {
        $_ = "N/A" unless defined $_;
    }
    
    ### Print out the table metadata...
    printf "%-9s  %-9s %-32s %-6s %s\n", $qual, $owner, $name, $type, $remarks;
}

exit;

Running this program against our megalithic database on an Oracle database produces the following output:

Qualifier  Owner     Table Name                       Type   Remarks
=========  ========  ===============================  =====  =======

N/A        STONES    MEDIA                            TABLE  N/A
N/A        STONES    MEGALITHS                        TABLE  N/A
N/A        STONES    SITE_TYPES                       TABLE  N/A

This form of metadata is not tremendously useful, as it lists only metadata about the objects within the database, and not the structure of the objects themselves (such as table column names). Extracting the structure of each table or view within the database requires us to look to a different type of metadata, which is available via statement handles .

6.1.7. Statement Handle Attributes or Statement Metadata

Statement handle attributes are specific to statement handles, and inherit any inheritable attributes from their parent database handle. Many statement handle attributes are defined as being read-only because they simply describe the prepared statement or its results.

In theoretical terms, these attributes should be defined when the statement handle is prepared, but in practical terms, you should only rely on the attribute values after the statement handle has been both prepared and executed. Similarly, with a few drivers, fetching all the data from a SELECT statement or explicitly invoking the finish() method against a statement handle may cause the values of the statement handle attributes to be no longer available.

The DBI specification in Appendix A, "DBI Specification " provides a complete list of all the statement handle attributes defined within the DBI.

Statement

This attribute contains the statement string passed to the prepare() method.

NUM_OF_FIELDS

This attribute is set to contain the number of columns that will be returned by a SELECT statement. For example:

$sth = $dbh->prepare( "
            SELECT name, location, mapref
            FROM megaliths
          " );
$sth->execute();
print "SQL statement contains $sth->{NUM_OF_FIELDS} columns\n";

Non-SELECT statements will contain the attribute value of zero. This allows you to quickly determine whether or not the statement is a SELECT statement.

NAME

NAME_uc

NAME_lc

The NAME attribute contains the names of the selected columns within the statement. The attribute value is actually a reference to an array, with length equal to the number of fields in the original statement.

For example, you can list all the column names of a table like this:

$sth = $dbh->prepare( "SELECT * FROM megaliths" );
$sth->execute();
for ( $i = 1 ; $i <= $sth->{NUM_OF_FIELDS} ; $i++ ) {
    print "Column $i is called $sth->{NAME}->[$i-1]\n";
}

The names contained within the attribute array are the column names returned by the underlying database.

There are two additional attributes relating to the column names. NAME_uc contains the same column names as the NAME attribute, but with any lowercase characters converted to uppercase. Similarly the NAME_lc attribute has any uppercase characters converted to lowercase. Generally these attributes should be used in preference to NAME.

TYPE

The TYPE attribute contains a reference to an array of integer values representing the international standard values for the respective datatypes. The array of integers has a length equal to the number of columns selected within the original statement, and can be referenced in a similar way to the NAME attribute example shown earlier.

The standard values for common types are:

SQL_CHAR             1
SQL_NUMERIC          2
SQL_DECIMAL          3
SQL_INTEGER          4
SQL_SMALLINT         5
SQL_FLOAT            6
SQL_REAL             7
SQL_DOUBLE           8
SQL_DATE             9
SQL_TIME            10
SQL_TIMESTAMP       11
SQL_VARCHAR         12
SQL_LONGVARCHAR     -1
SQL_BINARY          -2
SQL_VARBINARY       -3
SQL_LONGVARBINARY   -4
SQL_BIGINT          -5
SQL_TINYINT         -6
SQL_BIT             -7
SQL_WCHAR           -8
SQL_WVARCHAR        -9
SQL_WLONGVARCHAR   -10

While these numbers are fairly standard,[61] the way drivers map their native types to these standard types varies greatly. Native types that don't correspond well to one of these types may be mapped into the range officially reserved for use by the Perl DBI: -9999 to -9000.

[61]Some are ISO standard, others are Microsoft ODBC de facto standard. See ftp://jerry.ece.umassd.edu/isowg3/dbl/SQL_Registry and search for "SQL Data Types," or the types names of interest, on http://search.microsoft.com/us/dev/ and browse the results.

PRECISION

The PRECISION attribute contains a reference to an array of integer values that represent the defined length or size of the columns in the SQL statement.

There are two general ways in which the precision of a column is calculated. String datatypes, such as CHAR and VARCHAR, return the maximum length of the column. For example, a column defined within a table as:

location        VARCHAR2(1000)

would return a precision value of 1000.

Numeric datatypes are treated slightly differently in that the number of significant digits is returned. This may have no direct relationship with the space used to store the number. Oracle, for example, stores numbers with 38 digits of precision but uses a variable length internal format of between 1 and 21 bytes.

For floating-point types such as REAL, FLOAT, and DOUBLE, the maximum ``display size'' can be up to seven characters greater than the precision due to concatenated sign, decimal point, the letter ``E,'' a sign, and two or three exponent digits.

SCALE

The SCALE attribute contains a reference to an array of integer values that represents the number of decimal places in the column. This is obviously only of any real use with floating-point numbers. Integers and non-numeric datatypes will return zero.

NULLABLE

The NULLABLE attribute contains a reference to an array of integer values that tells us whether or not a column may contain a NULL value. The elements of the attribute array each contain one of three values:

0: The column cannot contain a NULL value.
1: The column can contain a NULL value.
2: It is unknown if the column can contain a null value.

NUM_OF_PARAMS

Common uses for these statement handle attributes are to format and display data fetched from queries dynamically and to find out information about the tables stored within the database.

The following script performs the latter operation by first creating a statement handle that fetches information on all tables, as discussed earlier in Section 6.1.6, "Database Metadata ", and then iterating through each table listing the table structure via the statement metadata:

#!/usr/bin/perl -w
#
# ch06/tabledump: Dumps information about all the tables.

use DBI;

### Connect to the database
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
    RaiseError => 1
});

### Create a new statement handle to fetch table information
my $tabsth = $dbh->table_info();

### Iterate through all the tables...
while ( my ( $qual, $owner, $name, $type ) = $tabsth->fetchrow_array() ) {

    ### The table to fetch data for
    my $table = $name;
    
    ### Build the full table name with quoting if required
    $table = qq{"$owner"."$table"} if defined $owner;
    
    ### The SQL statement to fetch the table metadata
    my $statement = "SELECT * FROM $table";
    
    print "\n";
    print "Table Information\n";
    print "=================\n\n";
    print "Statement:     $statement\n";
    
    ### Prepare and execute the SQL statement
    my $sth = $dbh->prepare( $statement );
    $sth->execute();
    
    my $fields = $sth->{NUM_OF_FIELDS};
    print "NUM_OF_FIELDS: $fields\n\n";
    
    print "Column Name                     Type  Precision  Scale  Nullable?\n";
    print "------------------------------  ----  ---------  -----  ---------\n\n";
    
    ### Iterate through all the fields and dump the field information
    for ( my $i = 0 ; $i < $fields ; $i++ ) {
    
        my $name = $sth->{NAME}->[$i];
    
        ### Describe the NULLABLE value
        my $nullable = ("No", "Yes", "Unknown")[ $sth->{NULLABLE}->[$i] ];
        ### Tidy the other values, which some drivers don't provide
        my $scale = $sth->{SCALE}->[$i];
        my $prec  = $sth->{PRECISION}->[$i];
        my $type  = $sth->{TYPE}->[$i];
    
        ### Display the field information
        printf "%-30s %5d      %4d   %4d   %s\n",
                $name, $type, $prec, $scale, $nullable;
    }
    
    ### Explicitly deallocate the statement resources
    ### because we didn't fetch all the data
    $sth->finish();
}

exit;

When executed against our megalithic database, the following output is displayed:

Table Information
=================

Statement: SELECT * FROM STONES.MEDIA
NUM_OF_FIELDS: 5

Column Name                     Type  Precision  Scale  Nullable?
------------------------------  ----  ---------  -----  ---------

ID                                 3         38      0  No
MEGALITH_ID                        3         38      0  Yes
URL                               12       1024      0  Yes
CONTENT_TYPE                      12         64      0  Yes
DESCRIPTION                       12       1024      0  Yes

Table Information
=================

Statement: SELECT * FROM STONES.MEGALITHS
NUM_OF_FIELDS: 6

Column Name                     Type  Precision  Scale  Nullable?
------------------------------  ----  ---------  -----  ---------

ID                                 3         38      0  No
NAME                              12        512      0  Yes
DESCRIPTION                       12       2048      0  Yes
LOCATION                          12       2048      0  Yes
MAPREF                            12         16      0  Yes
SITE_TYPE_ID                       3         38      0  Yes

Table Information
=================

Statement: SELECT * FROM STONES.SITE_TYPES
NUM_OF_FIELDS: 3
Column Name                     Type  Precision  Scale  Nullable?
------------------------------  ----  ---------  -----  ---------

ID                                 3         38      0  No
SITE_TYPE                         12        512      0  Yes
DESCRIPTION                       12       2048      0  Yes

This output shows the structural information of entities within our database. We could have achieved the same effect by querying our database's underlying system tables. This would give us more information, but would not be portable.

Chapter 6. Advanced DBI

Contents: