11.4 The Python Database API (DBAPI) 2.0
As I mentioned earlier,
the Python standard library does not come with an RDBMS interface,
but there are many free third-party modules that let your Python
programs access specific databases. Such modules mostly follow the
Python Database API 2.0 standard, also known as the DBAPI.
At the time of this writing, Python's DBAPI Special
Interest Group (SIG) was busy preparing a new version of the DBAPI
(possibly to be known as 3.0 when it is ready). Programs written
against DBAPI 2.0 should work with minimal or no changes with the
future DBAPI 3.0, although 3.0 will no doubt offer further
enhancements that future programs will be able to take advantage
of.
If your Python program runs only
on Windows, you may prefer to access databases by using
Microsoft's ADO package through
COM. For more information on using Python on
Windows, see the book Python Programming on Win32, by Mark Hammond and Andy Robinson
(O'Reilly). Since ADO and COM are platform-specific,
and this book focuses on cross-platform use of Python, I do not cover
ADO nor COM further in this book.
After importing a DBAPI-compliant module, you call the
module's connect function with
suitable parameters. connect returns an instance
of class Connection, which represents a connection
to the database. This instance supplies commit and
rollback methods to let you deal with
transactions, a close method to call as soon as
you're done with the database, and a
cursor method that returns an instance of class
Cursor. This instance supplies the methods and
attributes that you'll use for all database
operations. A DBAPI-compliant module also supplies exception classes,
descriptive attributes, factory functions, and
type-description attributes.
11.4.1 Exception Classes
A
DBAPI-compliant module supplies exception classes
Warning, Error, and several
subclasses of Error. Warning
indicates such anomalies as data truncation during insertion.
Error's subclasses indicate
various kinds of errors that your program can encounter when dealing
with the database and the DBAPI-compliant module that interfaces to
it. Generally, your code uses a statement of the form:
try:
...
except module.Error, err:
...
in order to trap all database-related errors that you need to handle
without terminating.
11.4.2 Thread Safety
When a DBAPI-compliant module has an
attribute threadsafety that is greater than
0, the module is asserting some specific level of
thread safety for database interfacing. Rather than relying on this,
it's safer and more portable to ensure that a single
thread has exclusive access to any given external resource, such as a
database, as outlined in Chapter 14.
11.4.3 Parameter Style
A DBAPI-compliant module has an
attribute paramstyle that identifies the style of
markers to use as placeholders for parameters. You insert such
markers in SQL statement strings that you pass to methods of
Cursor instances, such as method
execute, in order to use runtime-determined
parameter values. Say, for example, that you need to fetch the rows
of database table ATABLE where field
AFIELD equals the current value of Python
variable x. Assuming the cursor instance
is named c, you could perform this task by
using Python's string formatting operator
% as follows:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=%r' % x)
However, this is not the recommended approach. This approach
generates a different statement string for each value of
x, requiring such statements to be parsed
and prepared anew each time. With parameter substitution, you pass to
execute a single statement string, with a
placeholder instead of the parameter value. This lets
execute perform parsing and preparation just once,
giving potentially better performance. For example, if a
module's paramstyle attribute is
'qmark', you can express the above query as:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=?', [x])
The read-only attribute paramstyle is meant to
inform your program about the way to use parameter substitution with
that module. The possible values of paramstyle
are:
- format
-
The marker is %s, as in string formatting. A query
looks like:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=%s', [x])
- named
-
The marker is :name and
parameters are named. A query look like:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=:x', {'x':x})
- numeric
-
The marker is :n,
giving the parameter's number. A query looks like:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=:1', [x])
- pyformat
-
The marker is
%(name)s
and parameters are named. A query looks like:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=%(x)s', {'x':x})
- qmark
-
The marker is ?. A query looks like:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=?', [x])
When paramstyle does not imply named parameters,
the second argument of method execute is a
sequence. When parameters are named, the second argument of method
execute is a dictionary.
11.4.4 Factory Functions
Parameters passed to the database via placeholders must typically be
of the right type. This means Python numbers (integers or
floating-point values), strings (plain or Unicode), and
None to represent SQL NULL.
Python has no specific types to represent dates, times, and binary
large objects (BLOBs). A DBAPI-compliant module supplies factory
functions to build such objects. The types used for this purpose by
most DBAPI-compliant modules are those supplied by module
mxDateTime, covered in Chapter 12, and strings or buffer types for BLOBs. The
factory functions are as follows.
Returns an object representing the given
string of bytes as a BLOB.
Returns an object representing the specified date.
Returns an object representing the date that is
s seconds after the epoch of module
time, covered in Chapter 12. For
example, DateFromTicks(time.time( )) is
today's date.
Returns an object representing the specified time.
Returns an object representing the time that is
s seconds after the epoch of module
time, covered in Chapter 12. For
example, TimeFromTicks(time.time( )) is the
current time.
Timestamp(year,month,day,hour,minute,second)
|
|
Returns an object representing the specified date and time.
Returns an object representing the date and time that is
s seconds after the epoch of module
time, covered in Chapter 12. For
example, TimestampFromTicks(time.time( )) is the
current date and time.
11.4.5 Type Description Attributes
A Cursor
instance's attribute description
describes the types and other characteristics of each column of a
query. Each column's type (the second item of the
tuple describing the column) equals one of the following attributes
of the DBAPI-compliant module:
- BINARY
-
Describes columns containing BLOBs
- DATETIME
-
Describes columns containing dates, times, or both
- NUMBER
-
Describes columns containing numbers of any kind
- ROWID
-
Describes columns containing a row-identification number
- STRING
-
Describes columns containing text of any kind
A cursor's description, and in particular each
column's type, is mostly useful for introspection
about the database your program is working with. Such introspection
can help you write general modules that are able to work with
databases that have different schemas, schemas that may not be fully
known at the time you are writing your code.
11.4.6 The connect Function
A DBAPI-compliant
module's connect function accepts
arguments that vary depending on the kind of database and the
specific module involved. The DBAPI standard recommends, but does not
mandate, that connect accept named arguments. In
particular, connect should at least accept
optional arguments with the following names:
- database
-
Name of the specific database to connect
- dsn
-
Data-source name to use for the connection
- host
-
Hostname on which the database is running
- password
-
Password to use for the connection
- user
-
Username for the connection
11.4.7 Connection Objects
A
DBAPI-compliant module's connect
function returns an object x that is an
instance of class Connection.
x supplies the following methods.
Terminates the database connection and releases all related
resources. Call close as soon as
you're done with the database, since keeping
database connections uselessly open can be a serious resource drain
on the system.
Commits the current transaction in the database. If the database does
not support transactions,
x.commit( ) is an
innocuous no-op.
Returns a new instance of class Cursor, covered
later in this section.
Rolls back the current transaction in
the database. If the database does not support transactions,
x.rollback( ) raises an
exception. The DBAPI recommends, but does not mandate, that for
databases that do not support transactions class
Connection supplies no rollback
method, so that x.rollback(
) raises AttributeError. You can test
whether transaction support is present with
hasattr(x,'rollback').
11.4.8 Cursor Objects
A
Connection instance provides a
cursor method that returns an object
c that is an instance of class
Cursor. A SQL cursor represents the set of results
of a query and lets you work with the records in that set, in
sequence, one at a time. A cursor as modeled by the DBAPI is a richer
concept, since it also represents the only way in which your program
executes SQL queries in the first place. On the other hand, a DBAPI
cursor allows you only to advance in the sequence of results (some
relational databases, but not all, also provide richer cursors that
are able to go backward as well as forward), and does not support the
SQL clause WHERE CURRENT
OF CURSOR. These limitations of
DBAPI cursors enable DBAPI-compliant modules to provide cursors even
on RDBMSes that provide no real SQL cursors at all. An instance of
class Cursor c supplies
many attributes and methods; the most frequently used ones are
documented here.
Closes the cursor and releases all related resources.
A read-only attribute that is a sequence of seven-item tuples, one
per column in the last query executed:
name, typecode, displaysize, internalsize, precision, scale, nullable c.description is
None if the last operation on
c was not a query or returned no usable
description of the columns involved. A cursor's
description is mostly useful for introspection about the database
your program is working with. Such introspection can help you write
general modules that are able to work with databases that have
different schemas, including schemas that may not be fully known at
the time you are writing your code.
c.execute(statement,parameters=None)
|
|
Executes a SQL statement on the database
with the given parameters.
parameters is a sequence when the
module's paramstyle is
'format', 'numeric', or
'qmark', and a dictionary when
'named' or 'pyformat'.
c.executemany(statement,*parameters)
|
|
Executes a SQL
statement on the database, once for each
item of the given parameters.
parameters is a sequence of sequences when
the module's paramstyle is
'format', 'numeric', or
'qmark', and a sequence of dictionaries when
'named' or 'pyformat'. For
example, the statement:
c.executemany('UPDATE atable SET x=? WHERE y=?',
(12,23),(23,34)) that uses a module whose paramstyle is
'qmark' is equivalent to, but probably faster
than, the two statements:
c.execute('UPDATE atable SET x=12 WHERE y=23')
c.execute('UPDATE atable SET x=23 WHERE y=34')
Returns all remaining result rows from
the last query as a sequence of tuples. Raises an exception if the
last operation was not a SELECT query.
Returns up to
n remaining result rows from the last
query as a sequence of tuples. Raises an exception if the last
operation was not a SELECT query.
Returns the next result row from the last query as a tuple. Raises an
exception if the last operation was not a SELECT
query.
A read-only attribute that specifies
the number of rows fetched or affected by the last operation, or
-1 if the module is unable to determine this
value.
11.4.9 DBAPI-Compliant Modules
Whatever
relational database you want to use, there's at
least one (and often more than one) DBAPI-compliant module that you
can download from the Internet. All modules listed in the following
sections, except mxODBC, have liberal licenses
that are mostly similar to Python's license (the SAP
DB, however, is licensed under GPL) and that let you use them freely
in either open source or closed source programs.
mxODBC can be used freely for noncommercial
purposes, but you must purchase a license for any commercial use.
There are so many relational databases that it's
impossible to list them all, but here are some of the most popular
ones:
- ODBC
-
Open DataBase Connectivity (ODBC) is a
popular standard that lets you connect to many different relational
databases, including ones not otherwise supported by DBAPI-compliant
modules, such as Microsoft Jet (also known as the Access database).
The Windows distribution of Python contains an
odbc module, but the module is unsupported and
complies to an older version of the DBAPI, not to the current version
2.0. On either Unix or Windows, use mxODBC,
available at http://www.lemburg.com/files/Python/mxODBC.html.
mxODBC's
paramstyle is 'qmark'. Its
connect function accepts three optional arguments,
named dsn,
user, and
password.
- Oracle
-
Oracle is a
widespread, commercial RDBMS. To interface to Oracle, I recommend
module DCOracle2, available at http://www.zope.org/Members/matt/dco2.
DCOracle2's
paramstyle is 'numeric'. Its
connect function accepts a single optional,
unnamed argument string with the syntax:
'user/password@service'
- Microsoft SQL Server
-
To interface
to Microsoft SQL Server, I recommend module
mssqldb, available at http://www.object-craft.com.au/projects/mssql/.
mssqldb's
paramstyle is 'qmark'. Its
connect function accepts three arguments, named
dsn, user, and
passwd, as well as an optional
database argument.
- DB2
-
For IBM DB/2, try
module DB2, available at ftp://people.linuxkorea.co.kr/pub/DB2/.
DB2's
paramstyle is 'format'. Its
connect function accepts three optional arguments,
named dsn, uid,
and pwd.
- MySQL
-
MySQL is a
widespread, open source RDBMS. To interface to MySQL, try
MySQLdb, available at http://sourceforge.net/projects/mysql-python.
MySQLdb's
paramstyle is 'format'. Its
connect function accepts four optional arguments,
named db, host,
user, and
passwd.
- PostgreSQL
-
PostgreSQL is
an excellent open source RDBMS. To interface to PostgreSQL, I
recommend psycopg, available at http://initd.org/Software/psycopg.
psycopg's
paramstyle is 'pyformat'. Its
connect function accepts a single mandatory
argument, named dsn, with the syntax:
'host=host dbname=dbname user=username password=password'
- SAP DB
-
SAP DB, once
known as Adabas, is a powerful RDBMS that used to be closed source,
but is now open source. SAP DB comes with
sapdbapi, available at http://www.sapdb.org/sapdbapi.html, as well
as other useful Python modules.
sapdbapi's
paramstyle is 'pyformat'. Its
connect function accepts three mandatory
arguments, named user,
password, and
database, and an optional argument named
host.
11.4.10 Gadfly
Gadfly,
available at http://gadfly.sf.net, is not an interface to
some other RDBMS, but rather a complete RDBMS engine written in
Python. Gadfly supports a large subset of standard SQL. For example,
Gadfly lacks NULL, but it does support
VIEW, which is a crucial SQL feature that engines
such as MySQL still lack at the time of this writing. Gadfly can run
as a daemon server, to which clients connect with TCP/IP.
Alternatively, you can run the Gadfly engine directly in your
application's process, if you don't
need other processes to be able to access the same database
concurrently.
The gadfly module has several discrepancies from
the DBAPI 2.0 covered in this chapter because Gadfly implements a
variant of the older DBAPI 1.0. The concepts are quite close, but
several details differ. The main differences are:
gadfly does not supply custom exception classes,
so Gadfly operations that fail raise normal Python exceptions, such
as IOError, NameError, etc.
gadfly does not supply a
paramstyle attribute. However, the module behaves
as if it supplied a paramstyle of
'qmark'.
gadfly does not supply a function named
connect; use the gadfly.gadfly
or gadfly.client.gfclient functions instead.
gadfly does not supply factory functions for data
types.
Gadfly cursors do not supply the executemany
method. Instead, in the specific case in which the SQL statement is
an INSERT, the execute method
optionally accepts as its second argument a list of tuples and
inserts all the data.
Gadfly cursors do not supply the rowcount method.
The gadfly module supplies the following functions.
gadfly.gadfly(dbname,dirpath)
|
|
Returns a connection object for the database named
dbname, which must have been previously
created in the directory indicated by string
dirpath. The database engine runs in the
same process as your application.
gadfly.client.gfclient(policyname, port, password, host)
|
|
Returns a connection object for the database served by a
gfserve process on the given
host and port.
policyname identifies the level of access
required, and is often 'admin' to specify
unlimited access.
|