Chapter 19. Extending Python

So far in this book, we've been using Python as it comes out of the box. We have used interfaces to services outside Python, and coded extensions as Python modules. But we haven't added any external services beyond the built-in set. For many users, this makes perfect sense: such standalone programming is one of the main ways people apply Python. As we've seen, Python comes with batteries included -- interfaces to system tools, Internet protocols, GUIs, filesystems, and much more.

But for many systems, Python's ability to integrate with C-compatible components is a crucial feature of the language. In fact, Python's role as an extension and interface language in larger systems is one of the reasons for its popularity and why it is often called a "scripting" language. Its design supports hybrid systems that mix components written in a variety of programming languages. Because different languages have different strengths, being able to pick and choose on a component-by-component basis is a powerful concept. You can add Python to the mix anywhere you need an easy-to-use and flexible language tool.

For instance, compiled languages such as C and C++ are optimized for speed of execution, but are complex to program -- for developers, but especially for end users. Because Python is optimized for speed of development, using Python scripts to control or customize software components written in C or C++ can yield more flexible systems and dramatically faster development modes. Systems designed to delegate customizations to Python scripts don't need to be shipped with full source code and don't require end users to learn complex or proprietary languages. Moreover, moving selected components of a pure Python program to C can optimize program performance.

19.1.1 Integration Topics

The last two technical chapters of this book introduce Python's tools for interfacing to the outside world, and discuss both its ability to be used as an embedded language tool in other systems and its interfaces for extending Python scripts with new modules and types implemented in C-compatible languages. I'll also summarize other integration techniques that are less C-specific, such as COM and JPython.

When you mix Python with C components, either Python or C can be "on top." Because of that, there are two distinct integration APIs:

· The extending interface for running C extensions from Python programs

· The embedding interface for running Python code from C programs

This chapter covers extending, and the next explores embedding. Some systems use only one scheme, but many use both. For instance, embedded Python code run from C can also use linked-in C extensions to interface with the enclosing application. And in callback-based systems, C code accessed through extending interfaces may later use embedding techniques to run Python callback handlers. Python has an open and reentrant architecture that lets you mix languages arbitrarily.

Before we get into details, I should mention that Python/C integration is a big topic -- in principle, the entire set of extern C functions in the Python system makes up its runtime interface. The next two chapters concentrate only on the tools commonly used to implement integration with external modules. For additional examples beyond this book and its CD (view CD-ROM content online at http://examples.oreilly.com/python2), see the Python source code itself; its Modules and Objects directories are a wealth of code resources. Most of the Python built-ins we have used in this book -- from simple things such as integers and strings to more advanced tools such as files, system calls, Tkinter, and DBM -- utilize integration APIs and can be studied in Python's source code distribution.

These chapters assume that you know basic C programming concepts. If you don't, you won't miss much by skipping or skimming these chapters. Typically, C developers code the extending and embedding interfaces of a system, and others do the bulk of the system's programming with Python alone. But if you know enough about C programming to recognize a need for an extension language, you probably already have the required background knowledge for this chapter. The good news in both chapters is that much of the complexity inherent in integrating Python with a static compiled language like C can be automated with tools such as SWIG in the extension domain, and higher-level APIs in the embedding world.

19.2 C Extensions Overview

Because Python itself is coded in C today, compiled Python extensions can be coded in any language that is C-compatible in terms of call stacks and linking. That includes C, but also C++ with appropriate "extern C" declarations (which are automatically provided in Python header files). Python extensions coded in a C-compatible language can take two forms:

· C modules, which look and feel to their clients like Python module files

· C types, which behave like standard built-in types (numbers, lists, etc.)

Generally, C modules are used to implement flat function libraries, and C types are used to code objects that generate multiple instances. Because built-in types are really just precoded C extension types, your C extension types can do anything that built-in types can: method calls, addition, indexing, slicing, and so on.^[1] In the current Python release, though, types are not quite classes -- you cannot customize types by coding a Python subclass unless you add "wrapper" classes as frontend interfaces to the type. More on this later.

Both C modules and types register their operations with the Python interpreter as C function pointers. In all cases, the C layer is responsible for converting arguments passed from Python to C form, and converting results from C to Python form. Python scripts simply import C extensions and use them as though they were really coded in Python; C code does all the translation work.

C modules and types are also responsible for communicating errors back to Python, detecting errors raised by Python API calls, and managing garbage-collector reference counters on objects retained by the C layer indefinitely -- Python objects held by your C code won't be garbage-collected as long as you make sure their reference counts don't fall to zero. C modules and types may either be linked to Python statically (at build time) or dynamically (when first imported).

19.3 A Simple C Extension Module

At least that's the short story; we need to turn to some code to make this more concrete. C types generally export a C module with a constructor function. Because of that (and because they are simpler), let's start off by studying the basics of C module coding with a quick example.

When you add new or existing C components to Python, you need to code an interface (or "glue") logic layer in C that handles cross-language dispatching and data translation. The C source file in Example 19-1 shows how to code one by hand. It implements a simple C extension module named hello for use in Python scripts, with a function named message that simply returns its input string argument with extra text prepended.

Example 19-1. PP2E\Integrate\Extend\Hello\hello.c

/********************************************************************

 * A simple C extension module for Python, called "hello"; compile

 * this into a ".so" on python path, import and call hello.message;

 ********************************************************************/

#include <Python.h>

#include <string.h>

/* module functions */

static PyObject *                                 /* returns object */

message(PyObject *self, PyObject *args)           /* self unused in modules */

{                                                 /* args from python call */

    char *fromPython, result[64];

    if (! PyArg_Parse(args, "(s)", &fromPython))  /* convert Python -> C */

        return NULL;                              /* null=raise exception */

    else {

        strcpy(result, "Hello, ");                /* build up C string */

        strncat(result, fromPython, sizeof(result) - strlen(result) - 1);  /* add Python string, truncated to fit */

        return Py_BuildValue("s", result);        /* convert C -> Python */

    }

}

/* registration table  */

static struct PyMethodDef hello_methods[] = {

    {"message", message, 1},       /* method name, C func ptr, always-tuple */

    {NULL, NULL}                   /* end of table marker */

};

/* module initializer */

void inithello(  )                       /* called on first import */

{                                      /* name matters if loaded dynamically */

    (void) Py_InitModule("hello", hello_methods);   /* mod name, table ptr */

}

Ultimately, Python code will call this C file's message function with a string object and get a new string object back. First, though, it has to be somehow linked into the Python interpreter. To use this C file in a Python script, compile it into a dynamically loadable object file (e.g., hello.so on Linux) with a makefile like the one listed in Example 19-2, and drop the resulting object file into a directory listed on your PYTHONPATH module search path setting exactly as though it were a .py or .pyc file.^[2]

Example 19-2. PP2E\Integrate\Extend\Hello\makefile.hello

#############################################################

# Compile hello.c into a shareable object file on Linux,

# to be loaded dynamically when first imported by Python.

# MYPY is the directory where your Python header files live.

#############################################################

PY = $(MYPY)

hello.so: hello.c

        gcc hello.c -g -I$(PY)/Include -I$(PY) -fpic -shared -o hello.so

clean:

        rm -f hello.so core

This is a Linux makefile (other platforms will vary); to use it to build the extension module, simply type make -f makefile.hello at your shell. Be sure to include the path to Python's install directory with -I flags to access Python include (a.k.a. "header") files. When compiled this way, Python automatically loads and links the C module when it is first imported by a Python script.

Finally, to call the C function from a Python program, simply import module hello and call its hello.message function with a string:

[mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ make -f makefile.hello

[mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ python

>>> import hello                                   # import a C module

>>> hello.message('world')                         # call a C function

'Hello, world'

>>> hello.message('extending')

'Hello, extending'

And that's it -- you've just called an integrated C module's function from Python. The most important thing to notice here is that the C function looks exactly as if it were coded in Python. Python callers send and receive normal string objects from the call; the Python interpreter handles routing calls to the C function, and the C function itself handles Python/C data conversion chores.

In fact, there is little to distinguish hello as a C extension module at all, apart from its filename. Python code imports the module and fetches its attributes as if it had been written in Python. C extension modules even respond to dir calls as usual, and have the standard module and filename attributes (though the filename doesn't end in a .py or .pyc this time around):

>>> dir(hello)                                     # C module attributes

['__doc__', '__file__', '__name__', 'message']

>>> hello.__name__, hello.__file__

('hello', './hello.so')

>>> hello.message                                  # a C function object

<built-in function message>

>>> hello                                          # a C module object

<module 'hello' from './hello.so'>
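If you haven't built hello.so yet, you can see much the same effect with a C-coded module that ships with Python. The following is only a sketch under some assumptions: it targets a current CPython 3 interpreter rather than the release used for this book's listings, it uses the standard math module as a stand-in for hello, and the describe and py_message names are invented here for illustration.

```python
import math
import types


def describe(func):
    """Report whether a callable is coded in C or in Python (CPython only)."""
    if isinstance(func, types.BuiltinFunctionType):
        return "C-coded"
    if isinstance(func, types.FunctionType):
        return "Python-coded"
    return "other"


def py_message(label):
    """A pure-Python equivalent of the C message function."""
    return "Hello, " + label


# Both kinds of function are called the same way by their clients.
print(describe(math.sqrt), math.sqrt(16.0))
print(describe(py_message), py_message("world"))
print(repr(math.sqrt))
```

Callers cannot tell the two apart from the call syntax alone; only the type of the function object gives the implementation language away.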

Like any module in Python, you can also access the C extension from a script file. The Python file in Example 19-3, for instance, imports and uses the C extension module.

Example 19-3. PP2E\Integrate\Extend\Hello\hellouse.py

import hello

print hello.message('C')

print hello.message('module ' + hello.__file__)

for i in range(3):

    print hello.message(str(i))

Run this script as any other -- when the script first imports module hello, Python automatically finds the C module's .so object file in a directory on PYTHONPATH and links it into the process dynamically. All of this script's output represents strings returned from the C function in file hello.c :

[mark@toy ~/.../PP2E/Integrate/Extend/Hello]$ python hellouse.py

Hello, C

Hello, module ./hello.so

Hello, 0

Hello, 1

Hello, 2

19.3.1 Compilation and Linking

Now that I've shown you the somewhat longer story, let's fill in the rest of the details. You always must compile and somehow link C extension files like the hello.c example with the Python interpreter to make them accessible to Python scripts, but there is some flexibility on how you go about doing so. For example, the following rule could be used to compile this C file on Linux too:

hello.so: hello.c

    gcc hello.c -c -g -fpic -I$(PY)/Include -I$(PY) -o hello.o

    gcc -shared hello.o -o hello.so

    rm -f hello.o

To compile the C file into a shareable object file on Solaris, you might instead say something like this:

hello.so: hello.c

    cc hello.c -c -KPIC -o hello.o

    ld -G hello.o -o hello.so

    rm hello.o

On other platforms, the commands differ further still. Because compiler options vary widely, you'll have to consult your C or C++ compiler's documentation or Python's extension manuals for platform- and compiler-specific details. The point is to determine how to compile a C source file into your platform's notion of a shareable or dynamically loaded object file. Once you have, the rest is easy; Python supports dynamic loading of C extensions on all major platforms today.

19.3.1.1 Dynamic binding

Technically, what I've been showing you so far is called "dynamic binding," and represents one of two ways to link compiled C extensions with the Python interpreter. Since the alternative, "static binding," is more complex, dynamic binding is almost always the way to go. To bind dynamically, simply:

1. Compile hello.c into a shareable object file

2. Put the object file in a directory on Python's module search path

That is, once you've compiled the source code file into a shareable object file, simply copy or move the object file to a directory listed in PYTHONPATH. It will be automatically loaded and linked by the Python interpreter at runtime when the module is first imported anywhere in the Python process (e.g., from the interactive prompt, a standalone or embedded Python program, or a C API call).

Notice that the only non-static name in the hello.c example C file is the initialization function. Python calls this function by name after loading the object file, so its name must be a C global and should generally be of the form "initX", where "X" is both the name of the module in Python import statements and the name passed to Py_InitModule. All other names in C extension files are arbitrary, because they are accessed by C pointer, not by name (more on this later). The name of the C source file is arbitrary too -- at import time, Python cares only about the compiled object file.

19.3.1.2 Static binding

Under static binding, extensions are added to the Python interpreter permanently. This is more complex, though, because you must rebuild Python itself, and hence need access to the Python source distribution (an interpreter executable won't do). To link this example statically, add a line like:

hello ~/PP2E/Integrate/Extend/Hello/hello.c

to the Modules/Setup configuration file in the Python source code tree. Alternatively, you can copy your C file to the Modules directory (or add a link to it there with an ln command) and add a line to Setup like hello hello.c.

Then, rebuild Python itself by running a make command at the top level of the Python source tree. Python reconstructs its own makefiles to include the module you added to Setup, such that your code becomes part of the interpreter and its libraries. In fact, there's really no distinction between C extensions written by Python users and services that are a standard part of the language; Python is built with this same interface. The full format of module declaration lines looks like this (but see the Modules/Setup configuration file for more details):

<module> ... [<sourceOrObjectFile> ...] [<cpparg> ...] [<library> ...]

Under this scheme, the name of the module's initialization function must match the name used in the Setup file, or you'll get linking errors when you rebuild Python. The name of the source or object file doesn't have to match the module name; the leftmost name is the resulting Python module's name.

19.3.1.3 Static versus dynamic binding

Static binding works on any platform and requires no extra makefile to compile extensions. It can be useful if you don't want to ship extensions as separate files, or if you're on a platform without dynamic linking support. Its downsides are that you need to update the Python Setup configuration file and rebuild the Python interpreter itself, so you must therefore have the full source distribution of Python to use static linking at all. Moreover, all statically linked extensions are always added to your interpreter, whether or not they are used by a particular program. This can needlessly increase the memory needed to run all Python programs.

With dynamic binding, you still need Python include files, but can add C extensions even if all you have is a binary Python interpreter executable. Because extensions are separate object files, there is no need to rebuild Python itself or to access the full source distribution. And because object files are only loaded on demand in this mode, it generally makes for smaller executables too -- Python loads into memory only the extensions actually imported by each program run. In other words, if you can use dynamic linking on your platform, you probably should.
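To see which scheme a given interpreter used for a module, you can ask Python itself. This is a rough sketch, assuming a current CPython 3 interpreter (not the release this book's listings target); binding_of is a hypothetical helper name, and the classification is deliberately coarse.

```python
import importlib
import importlib.machinery
import sys


def binding_of(name):
    """Classify a module as statically or dynamically bound C code."""
    if name in sys.builtin_module_names:
        return "static"                      # compiled into the interpreter
    module = importlib.import_module(name)
    filename = getattr(module, "__file__", None) or ""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    if filename.endswith(suffixes):
        return "dynamic"                     # shareable object file on the path
    return "not a C extension"


for name in ("sys", "math", "json"):
    print(name, "->", binding_of(name))
```

Whether a module such as math shows up as static or dynamic depends on how your Python was built, which is exactly the choice described above.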

19.3.2 Anatomy of a C Extension Module

Though simple, the hello.c example illustrates the structure common to all C modules. This structure can vary somewhat, but this file consists of fairly typical boilerplate code:

Python header files

The C file first includes the standard Python.h header file (from the installed Python Include directory). This file defines almost every name exported by the Python API to C, and serves as a starting point for exploring the API itself.

Method functions

The file then defines a function to be called from the Python interpreter in response to calls in Python programs. C functions receive two Python objects as input, and send either a Python object back to the interpreter as the result, or a NULL to trigger an exception in the script (more on this later). In C, a PyObject* represents a generic Python object pointer; you can use more specific type names, but don't always have to. C module functions can all be declared C "static" (local to the file), because Python calls them by pointer, not name.

Registration table

Near the end, the file provides an initialized table (array) that maps function names to function pointers (addresses). Names in this table become module attribute names that Python code uses to call the C functions. Pointers in this table are used by the interpreter to dispatch C function calls. In effect, the table "registers" attributes of the module. A NULL entry terminates the table.

Initialization function

Finally, the C file provides an initialization function, which Python calls the first time this module is imported into a Python program. This function calls the API function Py_InitModule to build up the new module's attribute dictionary from the entries in the registration table and create an entry for the C module on the sys.modules table (described in Chapter 12). Once so initialized, calls from Python are routed directly to the C function through the registration table's function pointers.
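The effect of that one-time initialization is visible from the Python side. Here is a small sketch, again assuming a current CPython 3 interpreter and using the standard math module in place of hello: the second import finds the existing entry in sys.modules instead of running the initializer again.

```python
import importlib
import sys

first = importlib.import_module("math")    # module initializer runs at most once
second = importlib.import_module("math")   # later imports reuse the table entry

print(first is second)
print(sys.modules["math"] is first)
print("sqrt" in dir(first))                # registered names become attributes
```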

19.3.3 Data conversions

C module functions are responsible for converting Python objects to and from C datatypes. In Example 19-1, message gets two Python input objects passed from the Python interpreter: args is a Python tuple holding the arguments passed from the Python caller (the values listed in parentheses in a Python program), and self is ignored; it is useful only for extension types (discussed later in this chapter).

After finishing its business, the C function can return any of the following to the Python interpreter: a Python object (known in C as PyObject*), for an actual result; a Python None (known in C as Py_None), if the function returns no real result; or a C NULL pointer, to flag an error and raise a Python exception.

There are distinct API tools for handling input conversions (Python to C) and output conversions (C to Python). It's up to C functions to implement their call signatures (argument lists and types) by using these tools properly.

19.3.3.1 Python to C: Using Python argument lists

When the C function is run, the arguments passed from a Python script are available in the args Python tuple object. The API function PyArg_Parse (and PyArg_ParseTuple, its cousin that assumes it is converting a tuple object) is probably the easiest way to extract and convert passed arguments to C form.

PyArg_Parse takes a Python object, a format string, and a variable-length list of C target addresses. It converts the items in the tuple to C datatype values according to the format string, and stores the results in the C variables whose addresses are passed in. The effect is much like C's scanf string function. For example, the hello module converts a passed-in Python string argument to a C char* using the s convert code:

PyArg_Parse(args, "(s)", &fromPython)      # or PyArg_ParseTuple(args, "s",...

To handle multiple arguments, simply string format codes together and include corresponding C targets for each code in the string. For instance, to convert an argument list holding a string, an integer, and another string to C, say this:

PyArg_Parse(args, "(sis)", &s1, &i, &s2)   # or PyArg_ParseTuple(args, "sis",...

To verify that no arguments were passed, use an empty format string like this: PyArg_Parse(args, "()"). This API call checks that the number and types of the arguments passed from Python match the format string in the call. If there is a mismatch, it sets an exception and returns zero to C (more on errors below).

19.3.3.2 Python to C: Using Python return values

As we'll see in Chapter 20, Embedding Python, API functions may also return Python objects to C as results when Python is being run as an embedded language. Converting Python return values in this mode is almost the same as converting Python arguments passed to C extension functions, except that Python return values are not always tuples. To convert returned Python objects to C form, simply use PyArg_Parse. Unlike PyArg_ParseTuple, this call takes the same kinds of arguments but doesn't expect the Python object to be a tuple.

19.3.3.3 C to Python: Returning values to Python

There are two ways to convert C data to Python objects: by using type-specific API functions, or the general object-builder function Py_BuildValue. The latter is more general, and is essentially the inverse of PyArg_Parse, in that Py_BuildValue converts C data to Python objects according to a format string. For instance, to make a Python string object from a C char*, the hello module uses an s convert code:

return Py_BuildValue("s", result)            # "result" is a C char[] or char*

More specific object constructors can be used instead:

return PyString_FromString(result)           # same effect

Both calls make a Python string object from a C character array pointer. See the now-standard Python extension and runtime API manuals for an exhaustive list of such calls available. Besides being easier to remember, though, Py_BuildValue has syntax that allows you to build lists in a single step, described next.

19.3.3.4 Common conversion codes

With a few exceptions, PyArg_Parse(Tuple) and Py_BuildValue use the same conversion codes in format strings. A list of all supported conversion codes appears in Python's extension manuals. The most commonly used are shown in Table 19-1; the tuple, list, and dictionary formats can be nested.

Table 19-1. Common Python/C Data Conversion Codes
Format-String Code	C Datatype	Python Object Type
`s`	`char*`	String
`s#`	`char*, int`	String, length
`i`	`int`	Integer
`l`	`long int`	Integer
`c`	`char`	String
`f`	`float`	Floating-point
`d`	`double`	Floating-point
`O`	`PyObject*`	Raw (unconverted) object
`O&`	`&converter`, `void*`	Converted object (calls converter)
`(`items`)`	Targets or values	Nested tuple
`[`items`]`	Series of arguments/values	List
`{`items`}`	Series of `key,value` arguments	Dictionary

These codes are mostly what you'd expect (e.g., i maps between a C int and a Python integer object), but here are a few usage notes on this table's entries:

· Pass in the address of a char* for s codes when converting to C, not the address of a char array: Python copies out the address of an existing C string (and you must copy it to save it indefinitely on the C side: use strdup).

· The O code is useful to pass raw Python objects between languages; once you have a raw object pointer, you can use lower-level API tools to access object attributes by name, index and slice sequences, and so on.

· The O& code lets you pass in C converter functions for custom conversions. This comes in handy for special processing to map an object to a C datatype not directly supported by conversion codes (for instance, when mapping to or from an entire C struct or C++ class-instance). See the extensions manual for more details.

· The last two entries, [...] and {...}, are currently supported only by Py_BuildValue: you can construct lists and dictionaries with format strings, but can't unpack them. Instead, the API includes type-specific routines for accessing sequence and mapping components given a raw object pointer.

PyArg_Parse supports some extra codes, which must not be nested in tuple formats ((...)):

|

The remaining arguments are all optional (varargs). The C targets are unchanged if arguments are missing in the Python tuple. For instance, si|sd requires two arguments but allows up to four.

:

The function name follows, for use in error messages set by the call (argument mismatches). Normally Python sets the error message to a generic string.

;

A full error message follows, running to the end of the format string.

This format code list isn't exhaustive, and the set of convert codes may expand over time; refer to Python's extension manual for further details.

19.3.4 Error Handling

When you write C extensions, you need to be aware that errors can occur on either side of the languages fence. The following sections address both possibilities.

19.3.4.1 Raising Python exceptions in C

C extension module functions return a C NULL value for the result object to flag an error. When control returns to Python, the NULL result triggers a normal Python exception in the Python code that called the C function. To name an exception, C code can also set the type and extra data of the exceptions it triggers. For instance, the PyErr_SetString API function sets the exception object to a Python object and sets the exception's extra data to a character string:

PyErr_SetString(ErrorObject, message)

We will use this in the next example to be more specific about exceptions raised when C detects an error. C modules may also set a built-in Python exception; for instance, returning NULL after saying this:

PyErr_SetString(PyExc_IndexError, "index out-of-bounds")

raises a standard Python IndexError exception with the message string data. When an error is raised inside a Python API function, both the exception object and its associated "extra data" are automatically set by Python; there is no need to set them again in the calling C function. For instance, when an argument-passing error is detected in the PyArg_Parse function, the hello module just returns NULL to propagate the exception to the enclosing Python layer, instead of setting its own message.
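From the Python side, exceptions set in C are caught like any others. The sketch below assumes a current CPython 3 interpreter, where list.pop and len happen to be coded in C; call_and_report is a hypothetical helper written for this illustration, not part of any API.

```python
def call_and_report(func, *args):
    """Call func; return (exception name, message) if it raises, else None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return None


# The C code behind list.pop sets an IndexError and returns NULL.
print(call_and_report([].pop))

# An argument mismatch is detected in the C layer and reported as a TypeError.
print(call_and_report(len))

# A well-formed call raises nothing.
print(call_and_report(len, "abc"))
```

The exact message text is chosen by the C code that sets the exception, so it may differ between Python releases.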

19.3.4.2 Detecting errors that occur in Python

Python API functions may be called from C extension functions, or from an enclosing C layer when Python is embedded. In either case, C callers simply check the return value to detect errors raised in Python API functions. For pointer result functions, Python returns NULL pointers on errors. For integer result functions, Python generally returns a status code of -1 to flag an error and a 0 or positive value on success. (PyArg_Parse is an exception to this rule: it returns 0 when it detects an error.) To make your programs robust, you should check return codes for error indicators after most Python API calls; some calls can fail for reasons you may not have expected (e.g., memory overflow).

19.3.5 Reference Counts

The Python interpreter uses a reference-count scheme to implement garbage collection. Each Python object carries a count of the number of places it is referenced; when that count reaches zero, Python reclaims the object's memory space automatically. Normally, Python manages the reference counts for objects behind the scenes; Python programs simply make and use objects without concern for managing storage space.

When extending or embedding Python, though, integrated C code is responsible for managing the reference counts of the Python objects it uses. How important this becomes depends on how many raw Python objects a C module processes and which Python API functions it calls. In simple programs, reference counts are of minor, if any, concern; the hello module, for instance, makes no reference-count management calls at all.

When the API is used extensively, however, this task can become significant. In later examples, we'll see calls of these forms show up:

· Py_INCREF(obj) increments an object's reference count.

· Py_DECREF(obj) decrements an object's reference count (reclaim if zero).

· Py_XINCREF(obj) is similar to Py_INCREF(obj), but ignores a NULL object pointer.

· Py_XDECREF(obj) is similar to Py_DECREF(obj), but ignores a NULL object pointer.

C module functions are expected to return either an object with an incremented reference count, or NULL to signal an error. As a general rule, API functions that create new objects increment their reference counts before returning them to C; unless a new object is to be passed back to Python, the C program that creates it should eventually decrement the object's counts. In the extending scenario, things are relatively simple; argument object reference counts need not be decremented, and new result objects are passed back to Python with their reference counts intact.

The upside of reference counts is that Python will never reclaim a Python object held by C as long as C increments the object's reference count (or doesn't decrement the count on an object it owns). Although it requires counter management calls, Python's garbage collector scheme is fairly well-suited to C integration.
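You can watch these counters move from Python itself, which may help when debugging C code that holds on to objects. This is a sketch that assumes CPython (sys.getrefcount is specific to it, and the absolute numbers it reports vary by release); only the change in the count is meaningful here, and count_change_when_held is a name made up for the example.

```python
import sys


def count_change_when_held():
    """Return how an object's count changes when a container holds it."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]                      # like Py_INCREF: one more reference
    while_held = sys.getrefcount(obj)
    del holder                          # like Py_DECREF: reference released
    after = sys.getrefcount(obj)
    return while_held - before, after - before


print(count_change_when_held())
```

A C extension that stores an object pointer without a matching Py_INCREF is in the opposite situation: the count never goes up, so the object can be reclaimed while C still points to it.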

19.4 The SWIG Integration Code Generator

But don't do that. I'm introducing C extension basics so you understand the underlying structure, but today, C extensions are usually better and more easily implemented with the SWIG integration code generator.

SWIG -- the Simplified Wrapper and Interface Generator -- is an open source system created by Dave Beazley. It uses C and C++ type declarations to generate complete C extension modules that integrate existing libraries for use in Python scripts. The generated C extension modules are complete: they automatically handle data conversion, error protocols, reference-count management, and more.

That is, SWIG automatically generates all the "glue" code needed to plug C and C++ components into Python programs; simply compile its output and your extension work is done. You still have to manage compilation and linking details, but the rest of the C extension task is done by SWIG.

19.4.1 A Simple SWIG Example

For instance, instead of writing all that C code in the prior section, write the C function you want to use from Python without any Python integration logic at all, as though it is to be used from C alone. This is illustrated in Example 19-4.

Example 19-4. PP2E\Integrate\Extend\HelloLib\hellolib.c

/*********************************************************************

 * A simple C library file, with a single function, "message",

 * which is to be made available for use in Python programs.

 * There is nothing about Python here--this C function can be

 * called from a C program, as well as Python (with glue code).

 *********************************************************************/

#include <string.h>

#include <hellolib.h>

static char result[64];                  /* this isn't exported */

char *

message(char *label)                     /* this is exported */

{

    strcpy(result, "Hello, ");           /* build up C string */

    strcat(result, label);               /* add passed-in label */

    return result;                       /* return a temporary */

}

While you're at it, define the usual C header file to declare the function externally, as shown in Example 19-5. This is probably overkill, but will prove a point.

Example 19-5. PP2E\Integrate\Extend\HelloLib\hellolib.h

/********************************************************************

 * Define hellolib.c exports to the C namespace, not to Python

 * programs--the latter is defined by a method registration

 * table in a Python extension module's code, not by this .h;

 ********************************************************************/

extern char *message(char *label);

Now, instead of all the Python extension glue code shown in the prior section, simply write a SWIG type declarations input file, as in Example 19-6.

Example 19-6. PP2E\Integrate\Extend\Swig\hellolib.i

/******************************************************

 * Swig module description file, for a C lib file.

 * Generate by saying "swig -python hellolib.i".

 ******************************************************/

%module hellowrap

%{

#include <hellolib.h>

%}

extern char *message(char*);    /* or: %include "../HelloLib/hellolib.h"   */

                                /* or: %include hellolib.h, and use -I arg */

This file spells out the C function's type signature. In general, SWIG scans files containing ANSI C and C++ declarations. Its input file can take the form of an interface description file (usually with an .i suffix), or a C/C++ header or source file. Interface files like this one are the most common input form; they can contain comments in C or C++ format, type declarations just like standard header files, and SWIG directives that all start with %. For example:

· %module sets the module's name as known to Python importers.

· %{...%} encloses code added to the generated wrapper file verbatim.

· extern statements declare exports in normal ANSI C/C++ syntax.

· %include makes SWIG scan another file (-I flags give search paths).

In this example, SWIG could also be made to read the hellolib.h header file directly. But one of the advantages of writing special SWIG input files like hellolib.i is that you can pick and choose which functions are wrapped and exported to Python; scanning a library's entire header file wraps everything it defines.
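
To make the layout of an interface file concrete, here is a small Python sketch that assembles one from its parts. The make_interface helper is hypothetical (it is not part of SWIG); it only strings together the three directives described above, and wraps exactly the declarations you hand it.

```python
def make_interface(module, header, declarations):
    """Build the text of a SWIG interface (.i) file.

    module       -- name Python scripts will import (%module)
    header       -- C header copied into the wrapper verbatim (%{ ... %})
    declarations -- extern declarations for just the functions to wrap
    """
    lines = ['%module ' + module,
             '%{',
             '#include <%s>' % header,
             '%}']
    lines.extend(declarations)         # only these functions get wrappers
    return '\n'.join(lines) + '\n'

text = make_interface('hellowrap', 'hellolib.h',
                      ['extern char *message(char*);'])
print(text)
```

Listing declarations by hand like this is what lets you export one function from a large library instead of everything its header declares.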

SWIG is really a utility that you run from your build scripts, not a programming language, so there is not much more to show here. Simply add a step to your makefile that runs SWIG, and compile its output to be linked with Python. Example 19-7 shows one way to do it on Linux.

Example 19-7. PP2E\Integrate\Extend\Swig\makefile.hellolib-swig

###############################################################

# Use SWIG to integrate hellolib.c for use in Python programs.

###############################################################

# unless you've run make install

SWIG = ./myswig

PY   = $(MYPY)

LIB  = ../HelloLib

# the library plus its wrapper

hellowrap.so: hellolib_wrap.o $(LIB)/hellolib.o

        ld -shared hellolib_wrap.o $(LIB)/hellolib.o -o hellowrap.so

# generated wrapper module code

hellolib_wrap.o: hellolib_wrap.c $(LIB)/hellolib.h

        gcc hellolib_wrap.c -c -g -I$(LIB) -I$(PY)/Include -I$(PY)

hellolib_wrap.c: hellolib.i

        $(SWIG) -python -I$(LIB) hellolib.i

# C library code (in another directory)

$(LIB)/hellolib.o: $(LIB)/hellolib.c $(LIB)/hellolib.h

        gcc $(LIB)/hellolib.c -c -g -I$(LIB) -o $(LIB)/hellolib.o

clean:

        rm -f *.o *.so core

force:

        rm -f *.o *.so core hellolib_wrap.c hellolib_wrap.doc

When run on the hellolib.i input file by this makefile, SWIG generates two files:

· hellolib_wrap.doc is a text summary of the functions in the module.

· hellolib_wrap.c is the generated C extension module glue code file.^[3]

This makefile simply runs SWIG, compiles the generated C glue code file into an .o object file, and then combines it with hellolib.c's compiled object file to produce hellowrap.so. The latter is the dynamically loaded C extension module file, and the one to place in a directory on your Python module search path (or "." if you're working in the directory where you compile).

Assuming you've got SWIG set to go, run the makefile to generate and compile wrappers for the C function. Here is the build process running on Linux:

[mark@toy ~/.../PP2E/Integrate/Extend/Swig]$ make -f makefile.hellolib-swig

./myswig -python -I../HelloLib hellolib.i

Generating wrappers for Python

gcc hellolib_wrap.c -c -g -I../HelloLib  ...more text deleted here...

ld -shared hellolib_wrap.o ../HelloLib/hellolib.o -o hellowrap.so

And once you've run this makefile, you are finished. The generated C module is used exactly like the manually coded version shown before, except that SWIG has taken care of the complicated parts automatically:

[mark@toy ~/.../PP2E/Integrate/Extend/Swig]$ python

>>> import hellowrap                           # import the glue+library file

>>> hellowrap.__file__                         # cwd always searched on imports

'./hellowrap.so'

>>> hellowrap.message('swig world')

'Hello, swig world'

In other words, once you learn how to use SWIG, you can largely forget all the integration coding details introduced in this chapter. In fact, SWIG is so adept at generating Python glue code that it's usually much easier and less error-prone to code C extensions for Python as purely C or C++-based libraries first, and later add them to Python by running their header files through SWIG, as demonstrated here.
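
Because the generated module is imported like any other, a script can be written and tested before the extension has been built. The sketch below assumes nothing beyond the hellowrap name used above; the pure-Python message function is a stand-in I've added for illustration, and it does not imitate the C version's fixed 64-byte buffer.

```python
try:
    import hellowrap                   # the SWIG-built extension, if present
    message = hellowrap.message
except ImportError:
    def message(label):
        """Pure-Python stand-in that builds the same string as hellolib.c."""
        return 'Hello, ' + label

print(message('swig world'))
```

Client code calls message the same way in either case, so swapping in the compiled module later requires no changes to the callers.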

19.4.2 SWIG Details

Of course, you must have SWIG before you can run SWIG; it's not part of Python itself. Unless it is already on your system, fetch SWIG off the Web (or find it at http://examples.oreilly.com/python2) and build it from its source code. You'll need a C++ compiler (e.g., g++), but the install is very simple; see SWIG's README file for more details. SWIG is a command-line program, and generally can be run just by saying this:

swig -python hellolib.i

In my build environment, things are a bit more complex because I have a custom SWIG build. I run SWIG from this csh script called myswig:

#!/bin/csh

# run custom swig install

source $PP2EHOME/Integrate/Extend/Swig/setup-swig.csh

swig $*

This file in turn sets up pointers to the SWIG install directory by loading the following csh file, called setup-swig.csh:

# source me in csh to run SWIG with an unofficial install

setenv SWIG_LIB /home/mark/PP2ndEd/dev/examples/SWIG/SWIG1.1p5/swig_lib

alias swig "/home/mark/PP2ndEd/dev/examples/SWIG/SWIG1.1p5/swig"

But you won't need either of these files if you run a make install command in the SWIG source directory to copy it to standard places.

Along the way in this chapter, I'll show you a few more SWIG-based alternatives to the remaining examples. You should consult the SWIG Python user manual for the full scoop, but here is a quick look at a few more SWIG highlights:

C++ "shadow" classes

Later in the chapter, I'll also show you how to use SWIG to integrate C++ classes for use in your Python scripts. When given C++ class declarations, SWIG generates glue code that makes C++ classes look just like Python classes in Python scripts. In fact, C++ classes are Python classes under SWIG; you get what SWIG calls a C++ "shadow" class that interfaces with a C++ coded extension module, which in turn talks to C++ classes. Because the integration's outer layer is Python classes, those classes may be subclassed in Python and their instances processed with normal Python object syntax.

Variables

Besides functions and C++ classes, SWIG can also wrap C global variables and constants for use in Python: they become attributes of an object named cvar inserted in generated modules (e.g., module.cvar.name fetches the value of C's variable name from a SWIG-generated wrapper module).

Pointers

SWIG passes pointers between languages as strings (not as special Python types) for uniformity, and to allow type safety tests. For instance, a pointer to a Vector type may look like _100f8e2_Vector_p. You normally won't care, because pointer values are not much to look at in C either. SWIG can also be made to handle output parameters and C++ references.

Structs

C structs are converted into a set of get and set accessor functions that are called to fetch and assign fields with a struct object pointer (e.g., module.Vector_fieldx_get(v) fetches C's Vector.fieldx from a Vector pointer v, like C's v->fieldx). Similar accessor functions are generated for data members and methods of C++ classes (the C++ class is roughly a struct with extra syntax), but the SWIG shadow class feature allows you to treat wrapped classes just like Python classes, instead of calling the lower-level accessor functions.
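
The relationship between accessor functions and shadow classes can be sketched in plain Python. Everything below is a stand-in: new_Vector and Vector_fieldx_set are hypothetical names patterned after the Vector_fieldx_get accessor mentioned above, a dictionary plays the role of the C struct, and the code SWIG really generates is more elaborate than this.

```python
# Stand-ins for SWIG-style accessor functions; in a real build these would
# come from the generated extension module and v would be a C pointer.
def new_Vector():
    return {'fieldx': 0}               # a dict plays the role of the C struct

def Vector_fieldx_get(v):
    return v['fieldx']                 # like C's v->fieldx

def Vector_fieldx_set(v, value):
    v['fieldx'] = value                # like C's v->fieldx = value

class Vector:
    """Shadow-style class: attribute syntax layered on accessor calls."""
    def __init__(self):
        self.__dict__['this'] = new_Vector()   # bypass __setattr__ below

    def __getattr__(self, name):               # runs only for undefined names
        if name == 'fieldx':
            return Vector_fieldx_get(self.this)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name == 'fieldx':
            Vector_fieldx_set(self.this, value)
        else:
            self.__dict__[name] = value

class LabeledVector(Vector):                   # shadow classes can be subclassed
    def describe(self):
        return 'x=%s' % self.fieldx

v = LabeledVector()
v.fieldx = 42                                  # routed to the set accessor
print(v.describe())
```

The outer layer is an ordinary Python class, which is why subclassing and normal attribute syntax work even though the data lives on the other side of the accessor functions.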

Although the SWIG examples in this book are simple, you should know that SWIG handles industrial-strength libraries just as easily. For instance, Python developers have successfully used SWIG to integrate libraries as complex as Windows extensions and commonly used graphics APIs.

SWIG can also generate integration code for other scripting languages such as Tcl and Perl. In fact, one of its underlying goals is to make components independent of scripting language choices -- C/C++ libraries can be plugged in to whatever scripting language you prefer to use (I prefer to use Python, but I might be biased). SWIG's support for things like classes seems strongest for Python, though, probably because Python is considered to be strong in the classes department. As a language-neutral integration tool, SWIG addresses some of the same goals as systems like COM and CORBA (described in Chapter 20), but provides a code-generation-based alternative instead of an object model.

You can find SWIG on this book's CD (see http://examples.oreilly.com/python2) or at its home page on the Web, http://www.swig.org. Along with full source code, SWIG comes with outstanding documentation (including a manual specifically for Python), so I won't cover all of its features in this book. The documentation also describes how to build SWIG extensions on Windows. A SWIG book is reportedly in the works as I write this, so be sure to check the books list at http://www.python.org for additional resources.

19.5 Wrapping C Environment Calls

Let's move on to a more useful application of C extension modules. The hand-coded C file in Example 19-8 integrates the standard C library's getenv and putenv shell environment variable calls for use in Python scripts.

Example 19-8. PP2E\Integrate\Extend\CEnviron\cenviron.c

/******************************************************************

 * A C extension module for Python, called "cenviron".  Wraps the

 * C library's getenv/putenv routines for use in Python programs.

 ******************************************************************/

#include <Python.h>

#include <stdlib.h>

#include <string.h>

/***********************/

/* 1) module functions */

/***********************/

static PyObject *                                   /* returns object */

wrap_getenv(PyObject *self, PyObject *args)         /* self not used */

{                                                   /* args from python */

    char *varName, *varValue;

    PyObject *returnObj = NULL;                         /* null=exception */

    if (PyArg_Parse(args, "s", &varName)) {             /* Python -> C */

        varValue = getenv(varName);                     /* call C getenv */

        if (varValue != NULL)

            returnObj = Py_BuildValue("s", varValue);   /* C -> Python */

        else

            PyErr_SetString(PyExc_SystemError, "Error calling getenv");

    }

    return returnObj;

}

static PyObject *

wrap_putenv(PyObject *self, PyObject *args)

{

    char *varName, *varValue, *varAssign;

    PyObject *returnObj = NULL;

    if (PyArg_Parse(args, "(ss)", &varName, &varValue))

    {

        /* not freed: putenv keeps the pointer it is given */

        varAssign = malloc(strlen(varName) + strlen(varValue) + 2);

        sprintf(varAssign, "%s=%s", varName, varValue);

        if (putenv(varAssign) == 0) {

            Py_INCREF(Py_None);                   /* C call success */

            returnObj = Py_None;                  /* reference None */

        }

        else

            PyErr_SetString(PyExc_SystemError, "Error calling putenv");

    }

    return returnObj;

}

/**************************/

/* 2) registration table  */

/**************************/

static struct PyMethodDef cenviron_methods[] = {

    {"getenv", wrap_getenv},

    {"putenv", wrap_putenv},        /* method name, address */

    {NULL, NULL}

};

/*************************/

/* 3) module initializer */

/*************************/

void initcenviron()                    /* called on first import */

{

    (void) Py_InitModule("cenviron", cenviron_methods);   /* mod name, table */

}

This example is less useful now than it was in the first edition of this book -- as we learned in Part I, not only can you fetch shell environment variables by indexing the os.environ table, but assigning to a key in this table automatically calls C's putenv to export the new setting to the C code layer in the process. That is, os.environ['key'] fetches the value of shell variable 'key', and os.environ['key']=value assigns a variable both in Python and C.

The second action -- pushing assignments out to C -- was added to Python releases after the first edition of this book was published. Besides demonstrating additional extension coding techniques, though, this example still serves a practical purpose: even today, changes made to shell variables by the C code linked in to a Python process are not picked up when you index os.environ in Python code. That is, once your program starts, os.environ reflects only subsequent changes made by Python code.

If you want your Python code to be truly integrated with shell settings made by your C extension modules' code, you still must rely on calls to the C library's environment tools: putenv is available as os.putenv, but getenv is not present in the Python library. This will probably rarely, if ever, be an issue; but this C extension module is not completely without purpose (at least until Guido tightens this up again).^[4]
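
The one-way nature of this synchronization is easy to observe. The sketch below is a rough illustration rather than a definitive test: it assumes a Unix-like system with a POSIX shell available to os.system, and the variable name PP2E_DEMO_SETTING and the helper shell_sees are invented for the demonstration.

```python
import os

NAME = 'PP2E_DEMO_SETTING'             # made-up variable name for this demo

def shell_sees(name, expected):
    """True if a child shell, which inherits the C-level settings, sees expected."""
    status = os.system('test "$%s" = "%s"' % (name, expected))
    return status == 0

os.environ.pop(NAME, None)             # start from a clean slate
os.putenv(NAME, 'below-python')        # change the C layer only
seen_by_python = NAME in os.environ    # os.environ was never told
seen_by_shell = shell_sees(NAME, 'below-python')

os.environ[NAME] = 'from-python'       # assignment pushes the value out to C too
synced = shell_sees(NAME, 'from-python')

print(seen_by_python, seen_by_shell, synced)
```

Here os.putenv plays the part of linked-in C code: the setting is real as far as the C layer and child processes are concerned, but indexing os.environ does not find it.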

This cenviron.c C file creates a Python module called cenviron that does a bit more than the last example -- it exports two functions, sets some exception descriptions explicitly, and makes a reference count call for the Python None object (it's not created anew, so we need to add a reference before passing it to Python). As before, to add this code to Python, compile and link into an object file; the Linux makefile in Example 19-9 builds the C source code for dynamic binding.

Example 19-9. PP2E\Integrate\Extend\Cenviron\makefile.cenviron

##################################################################

# Compile cenviron.c into cenviron.so--a shareable object file

# on Linux, which is loaded dynamically when first imported.

##################################################################

PY = $(MYPY)

cenviron.so: cenviron.c

    gcc cenviron.c -g -I$(PY)/Include -I$(PY) -fpic -shared -o cenviron.so

clean:

    rm -f *.pyc cenviron.so

To build, type make -f makefile.cenviron at your shell. To run, make sure the .so file is in a directory on Python's module path ("." works too):

[mark@toy ~/.../PP2E/Integrate/Extend/Cenviron]$ python

>>> import cenviron

>>> cenviron.getenv('USER')                # like os.environ[key] but refetched

'mark'

>>> cenviron.putenv('USER', 'gilligan')    # like os.environ[key]=value

>>> cenviron.getenv('USER')                # C sees the changes too

'gilligan'

As before, cenviron is a bona fide Python module object after it is imported, with all the usual attached information:

>>> dir(cenviron)

['__doc__', '__file__', '__name__', 'getenv', 'putenv']

>>> cenviron.__file__

'./cenviron.so'

>>> cenviron.__name__

'cenviron'

>>> cenviron.getenv

<built-in function getenv>

>>> cenviron

<module 'cenviron' from './cenviron.so'>

>>> print cenviron.getenv('HOST'), cenviron.getenv('DISPLAY')

toy :0.0

Here is an example of the problem this module addresses (but you have to pretend that the getenv calls are made by linked-in C code, not Python):

>>> import os

>>> os.environ['USER']                      # initialized from the shell

'skipper'

>>> from cenviron import getenv, putenv     # direct C library call access

>>> getenv('USER')

'skipper'

>>> putenv('USER', 'gilligan')              # changes for C but not Python

>>> getenv('USER')

'gilligan'

>>> os.environ['USER']                      # oops--does not fetch values again

'skipper'

As is, the C extension module exports a function-based interface, but you can wrap its functions in Python code that makes the interface look any way you like. For instance, Example 19-10 makes the functions accessible by dictionary indexing, and integrates with the os.environ object.

Example 19-10. PP2E\Integrate\Extend\Cenviron\envmap.py

import os

from cenviron import getenv, putenv       # get C module's methods

class EnvMapping:                         # wrap in a Python class

    def __setitem__(self, key, value):

        os.environ[key] = value           # on writes: Env[key]=value

        putenv(key, value)                # put in os.environ too

    def __getitem__(self, key):

        value = getenv(key)               # on reads: Env[key]

        os.environ[key] = value           # integrity check

        return value

Env = EnvMapping()                        # make one instance

And Example 19-11 exports the functions as qualified attribute names instead of calls. The point here is that you can graft many different sorts of interface models on top of extension functions by providing Python wrappers (an idea we'll revisit when we meet type wrappers and SWIG shadow classes later in this chapter).

Example 19-11. PP2E\Integrate\Extend\Cenviron\envattr.py

import os

from cenviron import getenv, putenv       # get C module's methods

class EnvWrapper:                         # wrap in a Python class

    def __setattr__(self, name, value):

        os.environ[name] = value          # on writes: Env.name=value

        putenv(name, value)               # put in os.environ too

    def __getattr__(self, name):

        value = getenv(name)              # on reads: Env.name

        os.environ[name] = value          # integrity check

        return value

Env = EnvWrapper()                        # make one instance
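
To see what the "integrity check" in these wrappers buys you without compiling anything, the following sketch repeats the mapping class of Example 19-10 over stand-in functions. The getenv and putenv defined here are not the cenviron versions; they are substitutes backed by a dictionary, and PP2E_DEMO_USER is a made-up variable name.

```python
import os

_c_settings = {}                       # stand-in for the C library's variable table

def getenv(name):                      # stand-in for cenviron.getenv
    return _c_settings[name]

def putenv(name, value):               # stand-in for cenviron.putenv
    _c_settings[name] = value

class EnvMapping:                      # same logic as Example 19-10
    def __setitem__(self, key, value):
        os.environ[key] = value        # on writes: update Python's table
        putenv(key, value)             # and the C layer
    def __getitem__(self, key):
        value = getenv(key)            # on reads: refetch from the C layer
        os.environ[key] = value        # resynchronize Python's table
        return value

Env = EnvMapping()
KEY = 'PP2E_DEMO_USER'
Env[KEY] = 'gilligan'
putenv(KEY, 'skipper')                 # simulate a change made by linked-in C code

stale = os.environ[KEY]                # Python's table has not noticed
fresh = Env[KEY]                       # the wrapper refetches from "C"
resynced = os.environ[KEY]             # and the read brought os.environ up to date
print(stale, fresh, resynced)
```

Reading through the wrapper always reflects the C-level value, and as a side effect repairs os.environ for any code that still indexes it directly.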

19.5.1 But Don't Do That Either -- SWIG

You can manually code extension modules like we just did, but you don't necessarily have to. Because this example really just wraps functions that already exist in standard C libraries, the entire cenviron.c C code file of Example 19-8 can be replaced with a simple SWIG input file that looks like Example 19-12.

Example 19-12. PP2E\Integrate\Extend\Swig\Environ\environ.i

/***************************************************************

 * Swig module description file, to generate all Python wrapper

 * code for C lib getenv/putenv calls: "swig -python environ.i".

 ***************************************************************/

%module environ

%{

#include <stdlib.h>

%}

extern char * getenv(const char *varname);

extern int    putenv(const char *assignment);

And you're done. Well, almost; you still need to run this file through SWIG and compile its output. As before, simply add a SWIG step to your makefile, compile its output file into a shareable object, and you're in business. Example 19-13 is a Linux makefile that does the job.

Example 19-13. PP2E\Integrate\Extend\Swig\Environ\makefile.environ-swig

# build environ.so extension from SWIG generated code

# unless you've run make install

SWIG = ../myswig

PY   = $(MYPY)

environ.so: environ_wrap.c

        gcc environ_wrap.c -g -I$(PY)/Include -I$(PY) -shared -o environ.so

environ_wrap.c: environ.i

        $(SWIG) -python environ.i

clean:

        rm -f *.o *.so core

force:

        rm -f *.o *.so core environ_wrap.c environ_wrap.doc

When run on environ.i, SWIG generates two files -- environ_wrap.doc (a list of wrapper function descriptions) and environ_wrap.c (the glue code module file). Because the functions being wrapped here live in standard linked-in C libraries, there is nothing to combine with the generated code; this makefile simply runs SWIG and compiles the wrapper file into a C extension module, ready to be imported:

[mark@toy ~/....../Integrate/Extend/Swig/Environ]$ make -f makefile.environ-swig

../myswig -python environ.i

Generating wrappers for Python

gcc environ_wrap.c -g -I/...  more...  -shared -o environ.so

And now you're really done. The resulting C extension module is linked when imported, and used as before (except that SWIG handled all the gory bits):

[mark@toy ~/....../Integrate/Extend/Swig/Environ]$ python

>>> import environ

>>> environ.getenv('USER')

'mark'

>>> environ.putenv('USER=gilligan')             # use C lib call pattern now

>>> environ.getenv('USER')

'gilligan'

>>> dir(environ)

['__doc__', '__file__', '__name__', 'getenv', 'putenv']

>>> environ.__name__, environ.__file__, environ

('environ', './environ.so', <module 'environ' from './environ.so'>)

You could also run SWIG over the C header file where getenv and putenv are defined, but that would result in wrappers for every function in the header file. With the input file coded here, you'll wrap only two library functions.

19.6 A C Extension Module String Stack

Let's kick it up another notch -- the following C extension module implements a stack of strings for use in Python scripts. Example 19-14 demonstrates additional API calls, but also serves as a basis of comparison. It is roughly equivalent to the Python stack module we met earlier in Chapter 17, but it stacks only strings (not arbitrary objects), has limited string storage and stack lengths, and is written in C.

Alas, the last point makes for a complicated program listing -- C code is never quite as nice to look at as equivalent Python code. C must declare variables, manage memory, implement data structures, and include lots of extra syntax. Unless you're a big fan of C, you should focus on the Python interface code in this file, not the internals of its functions.

Example 19-14. PP2E\Integrate\Extend\Stacks\stackmod.c

/*****************************************************

 * stackmod.c: a shared stack of character-strings;

 * a C extension module for use in Python programs;

 * linked into python libraries or loaded on import;

 *****************************************************/

#include "Python.h"             /* Python header files */

#include <stdio.h>              /* C header files */

#include <string.h>

static PyObject *ErrorObject;   /* locally-raised exception */

#define onError(message) \

       { PyErr_SetString(ErrorObject, message); return NULL; }

/******************************************************************************

* LOCAL LOGIC/DATA (THE STACK)

******************************************************************************/

#define MAXCHARS 2048

#define MAXSTACK MAXCHARS

static int  top = 0;                 /* index into 'stack' */

static int  len = 0;                 /* size of 'strings' */

static char *stack[MAXSTACK];        /* pointers into 'strings' */

static char strings[MAXCHARS];       /* string-storage area */

/******************************************************************************

* EXPORTED MODULE METHODS/FUNCTIONS

******************************************************************************/

static PyObject *

stack_push(PyObject *self, PyObject *args)       /* args: (string) */

{

    char *pstr;

    if (!PyArg_ParseTuple(args, "s", &pstr))     /* convert args: Python->C */

        return NULL;                             /* NULL triggers exception */

    if (top == MAXSTACK)                         /* python sets arg-error msg */

        onError("stack overflow")                /* iff maxstack < maxchars */

    if (len + strlen(pstr) + 1 >= MAXCHARS)

        onError("string-space overflow")

    else {

        strcpy(strings + len, pstr);             /* store in string-space */

        stack[top++] = &(strings[len]);          /* push start address */

        len += (strlen(pstr) + 1);               /* new string-space size */

        Py_INCREF(Py_None);                      /* a 'procedure' call */

        return Py_None;                          /* None: no errors */

    }

}

static PyObject *

stack_pop(PyObject *self, PyObject *args)

{                                                /* no arguments for pop */

    PyObject *pstr;

    if (!PyArg_ParseTuple(args, ""))             /* verify no args passed */

        return NULL;

    if (top == 0)

        onError("stack underflow")               /* return NULL = raise */

    else {

        pstr = Py_BuildValue("s", stack[--top]); /* convert result: C->Py */

        len -= (strlen(stack[top]) + 1);

        return pstr;                             /* return new python string */

    }                                            /* pstr ref-count++ already */

}

static PyObject *

stack_top(PyObject *self, PyObject *args)        /* almost same as item(-1) */

{                                                /* but different errors */

    PyObject *result = stack_pop(self, args);    /* get top string */

    if (result != NULL)

        len += (strlen(stack[top++]) + 1);       /* undo pop */

    return result;                               /* NULL or string object */

}

static PyObject *

stack_empty(PyObject *self, PyObject *args)      /* no args: '()' */

{

    if (!PyArg_ParseTuple(args, ""))             /* or PyArg_NoArgs */

        return NULL;

    return Py_BuildValue("i", top == 0);         /* boolean: a python int */

}

static PyObject *

stack_member(PyObject *self, PyObject *args)

{

    int i;

    char *pstr;

    if (!PyArg_ParseTuple(args, "s", &pstr))

        return NULL;

    for (i = 0; i < top; i++)                /* find arg in stack */

        if (strcmp(pstr, stack[i]) == 0)

            return PyInt_FromLong(1);        /* send back a python int */

    return PyInt_FromLong(0);                /* same as Py_BuildValue("i" */

}

static PyObject *

stack_item(PyObject *self, PyObject *args)    /* return Python string or NULL */

{                                             /* inputs = (index): Python int */

    int index;

    if (!PyArg_ParseTuple(args, "i", &index))    /* convert args to C */

        return NULL;                             /* bad type or arg count? */

    if (index < 0)

        index = top + index;                     /* negative: offset from end */

    if (index < 0 || index >= top)

        onError("index out-of-bounds")           /* return NULL = 'raise' */

    else

        return Py_BuildValue("s", stack[index]); /* convert result to Python */

}                                                /* no need to INCREF new obj */

static PyObject *

stack_len(PyObject *self, PyObject *args)     /* return a Python int or NULL */

{                                             /* no inputs */

    if (!PyArg_ParseTuple(args, ""))

        return NULL;

    return PyInt_FromLong(top);               /* wrap in python object */

}

static PyObject *

stack_dump(PyObject *self, PyObject *args)    /* not "print": reserved word */

{

    int i;

    if (!PyArg_ParseTuple(args, ""))

        return NULL;

    printf("[Stack:\n");

    for (i=top-1; i >= 0; i--)                   /* formatted output */

        printf("%d: '%s'\n", i, stack[i]);

    printf("]\n");

    Py_INCREF(Py_None);

    return Py_None;

}

/******************************************************************************

* METHOD REGISTRATION TABLE: NAME-STRING -> FUNCTION-POINTER

******************************************************************************/

static struct PyMethodDef stack_methods[] = {

 {"push",       stack_push,     1},                /* name, address */

 {"pop",        stack_pop,      1},                /* '1'=always tuple args */

 {"top",        stack_top,      1},

 {"empty",      stack_empty,    1},

 {"member",     stack_member,   1},

 {"item",       stack_item,     1},

 {"len",        stack_len,      1},

 {"dump",       stack_dump,     1},

 {NULL,         NULL}                              /* end, for initmodule */

};

/******************************************************************************

* INITIALIZATION FUNCTION (IMPORT-TIME)

******************************************************************************/

void

initstackmod()

{

    PyObject *m, *d;

    /* create the module and add the functions */

    m = Py_InitModule("stackmod", stack_methods);        /* registration hook */

    /* add symbolic constants to the module */

    d = PyModule_GetDict(m);

    ErrorObject = Py_BuildValue("s", "stackmod.error");  /* export exception */

    PyDict_SetItemString(d, "error", ErrorObject);       /* add more if need */

    /* check for errors */

    if (PyErr_Occurred())

        Py_FatalError("can't initialize module stackmod");

}

This C extension file is compiled and statically or dynamically linked with the interpreter just like in previous examples. File makefile.stack on the CD (see http://examples.oreilly.com/python2) handles the build with a rule like this:

stackmod.so: stackmod.c

    gcc stackmod.c -g -I$(PY)/Include -I$(PY) -fpic -shared -o stackmod.so

The whole point of implementing such a stack in a C extension module (apart from demonstrating API calls in a Python book) is optimization: in theory, this code should present a similar interface to the Python stack module we wrote earlier, but run considerably faster due to its C coding. The interface is roughly the same, though we've sacrificed some Python flexibility by moving to C -- there are limits on size and stackable object types:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python

>>> import stackmod                                      # load C module

>>> stackmod.push('new')                                 # call C functions

>>> stackmod.dump()                                        # dump format differs

[Stack:

0: 'new'

]

>>> for c in "SPAM": stackmod.push(c)

...

>>> stackmod.dump()

[Stack:

4: 'M'

3: 'A'

2: 'P'

1: 'S'

0: 'new'

]

>>> stackmod.len(), stackmod.top()

(5, 'M')

>>> x = stackmod.pop()

>>> x

'M'

>>> stackmod.dump()

[Stack:

3: 'A'

2: 'P'

1: 'S'

0: 'new'

]

>>> stackmod.push(99)

Traceback (innermost last):

  File "<stdin>", line 1, in ?

TypeError: argument 1: expected string, int found

Some of the C stack's type and size limitations could be removed by alternate C coding (which might eventually create something that looks and performs almost exactly like a Python built-in list). Before we check on this stack's speed, though, we'll see what can be done about also optimizing our stack classes with a C type.
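
As a basis for comparison, here is a rough Python model of the C module's behavior, limits included. It is a sketch of the logic in Example 19-14 and not the stack module from earlier in the book: the exception is a class here, string length is counted in characters, and the len and dump functions are left out to keep it short.

```python
MAXCHARS = 2048                        # same limits as the C listing
MAXSTACK = MAXCHARS

class error(Exception):                # plays the role of stackmod.error
    pass

_stack = []                            # pushed strings, bottom first
_used = 0                              # string space consumed; each string
                                       # costs one extra slot, as in C

def push(text):
    global _used
    if not isinstance(text, str):      # C's "s" format rejects non-strings
        raise TypeError('expected string, %s found' % type(text).__name__)
    if len(_stack) == MAXSTACK:
        raise error('stack overflow')
    if _used + len(text) + 1 >= MAXCHARS:
        raise error('string-space overflow')
    _stack.append(text)
    _used += len(text) + 1

def pop():
    global _used
    if not _stack:
        raise error('stack underflow')
    text = _stack.pop()
    _used -= len(text) + 1             # give the string space back
    return text

def top():
    if not _stack:
        raise error('stack underflow')
    return _stack[-1]

def item(index):
    if index < 0:
        index += len(_stack)           # negative: offset from end
    if index < 0 or index >= len(_stack):
        raise error('index out-of-bounds')
    return _stack[index]

def member(text):
    return int(text in _stack)         # 1 or 0, like the C version

def empty():
    return int(not _stack)

push('new')
for c in 'SPAM':
    push(c)
print(top(), item(0), member('P'), empty())
```

The Python version needs no registration table, argument conversion, or reference-count calls, which is most of what the C listing spends its lines on.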

19.6.1 But Don't Do That Either -- SWIG

You can manually code extension modules like this, but you don't necessarily have to. As we saw earlier, if you instead code the stack module's functions without any notion of Python integration, they can be integrated into Python automatically by running their type signatures through SWIG. I haven't coded these functions that way here, because I also need to teach the underlying Python C extension API. But if I were asked to write a C string stack for Python in any other context, I'd do it with SWIG instead.

19.7 A C Extension Type String Stack

To implement multiple-instance objects in C, you need to code a C extension type, not a module. Like Python classes, C types generate multiple-instance objects and can overload (i.e., intercept and implement) Python expression operators and type operations. Unlike classes, though, types do not support attribute inheritance by themselves -- attributes are fetched from a flat names table, not a namespace objects tree. That makes sense if you realize that Python's built-in types are simply precoded C extension types; when you ask for the list append method, for instance, inheritance never enters the picture. We can add inheritance for types by coding "wrapper" classes, but it is a manual process (more on this later).

One of the biggest drawbacks of types, though, is their size -- to implement a realistically equipped C type, you need to code lots of not-very-pretty C code, and fill out type descriptor tables with pointers to link up operation handlers. In fact, C extension types are so complex that I'm going to cut some details here. To give you a feel for the overall structure, Example 19-15 presents a C string stack type implementation, but with the bodies of all its functions stripped out. For the complete implementation, see this file on the book's CD (see http://examples.oreilly.com/python2).

This C type roughly implements the same interface as the stack classes we met earlier in Chapter 17, but imposes a few limits on the stack itself and does not support specialization by subclassing (it's a type, not a class). The stripped parts use the same algorithms as the C module in Example 19-14, but operate on the passed-in self object, which now refers to the particular type instance object being processed, just as the first argument does in class methods. In types, self is a pointer to an allocated C struct that represents a type instance object.

Example 19-15. PP2E\Integrate\Extend\Stacks\stacktyp.c

/****************************************************

 * stacktyp.c: a character-string stack data-type;

 * a C extension type, for use in Python programs;

 * stacktype module clients can make multiple stacks;

 * similar to stackmod, but 'self' is the instance,

 * and we can overload sequence operators here;

 ****************************************************/

#include "Python.h"

static PyObject *ErrorObject;      /* local exception */

#define onError(message) \

       { PyErr_SetString(ErrorObject, message); return NULL; }

/*****************************************************************************

 * STACK-TYPE INFORMATION

 *****************************************************************************/

#define MAXCHARS 2048

#define MAXSTACK MAXCHARS

typedef struct {                 /* stack instance object format */

    PyObject_HEAD                /* python header: ref-count + &typeobject */

    int top, len;

    char *stack[MAXSTACK];       /* per-instance state info */

    char strings[MAXCHARS];      /* same as stackmod, but multiple copies */

} stackobject;

/*****************************************************************************

 * INSTANCE METHODS

 *****************************************************************************/

static PyObject *             /* on "instance.push(arg)" */

stack_push(self, args)        /* 'self' is the stack instance object */

    stackobject *self;        /* 'args' are args passed to self.push method */

    PyObject    *args;

{    ...

}

static PyObject *

stack_pop(self, args)

    stackobject *self;

    PyObject    *args;        /* on "instance.pop(  )" */

{    ...

}

static PyObject *

stack_top(self, args)

    stackobject *self;

    PyObject    *args;

{    ...

}

static PyObject *

stack_empty(self, args)

    stackobject *self;

    PyObject    *args;

{    ...

}

static struct PyMethodDef stack_methods[] = {     /* instance methods */

 {"push",       stack_push,     1},               /* name/address table */

 {"pop",        stack_pop,      1},               /* like list append,sort */

 {"top",        stack_top,      1},

 {"empty",      stack_empty,    1},               /* extra ops besides optrs */

 {NULL,         NULL}                             /* end, for getattr here */

};

/*****************************************************************************

 * BASIC TYPE-OPERATIONS

 *****************************************************************************/

static stackobject *             /* on "x = stacktype.Stack(  )" */

newstackobject(  )                 /* instance constructor function */

{   ...                            /* these don't get an 'args' input */

}

static void                      /* instance destructor function */

stack_dealloc(self)              /* when reference-count reaches zero */

    stackobject *self;

{   ...                            /* do cleanup activity */

}

static int

stack_print(self, fp, flags)

    stackobject *self;

    FILE *fp;

    int flags;                   /* print self to file */

{   ...

}

static PyObject *

stack_getattr(self, name)        /* on "instance.attr" reference  */

    stackobject *self;           /* make a bound-method or member */

    char *name;

{   ...

}

static int

stack_compare(v, w)              /* on all comparisons */

    stackobject *v, *w;

{   ...

}

/*****************************************************************************

 * SEQUENCE TYPE-OPERATIONS

 *****************************************************************************/

static int

stack_length(self)

    stackobject *self;               /* called on "len(instance)" */

{   ...

}

static PyObject *

stack_concat(self, other)

    stackobject *self;               /* on "instance + other" */

    PyObject    *other;              /* 'self' is the instance */

{   ...

}

static PyObject *

stack_repeat(self, n)                /* on "instance * N" */

    stackobject *self;               /* new stack = repeat self n times */

    int n;

{   ...

}

static PyObject *

stack_item(self, index)              /* on "instance[offset]", "in/for" */

    stackobject *self;               /* return the i-th item of self */

    int index;                       /* negative index pre-adjusted */

{   ...

}

static PyObject *

stack_slice(self, ilow, ihigh)

    stackobject *self;               /* on "instance[ilow:ihigh]" */

    int ilow, ihigh;                 /* negative-adjusted, not scaled */

{   ...

}

/*****************************************************************************

 * TYPE DESCRIPTORS

 *****************************************************************************/

static PySequenceMethods stack_as_sequence = {  /* sequence supplement     */

      (inquiry)       stack_length,             /* sq_length    "len(x)"   */

      (binaryfunc)    stack_concat,             /* sq_concat    "x + y"    */

      (intargfunc)    stack_repeat,             /* sq_repeat    "x * n"    */

      (intargfunc)    stack_item,               /* sq_item      "x[i], in" */

      (intintargfunc) stack_slice,              /* sq_slice     "x[i:j]"   */

      (intobjargproc)     0,                    /* sq_ass_item  "x[i] = v" */

      (intintobjargproc)  0,                    /* sq_ass_slice "x[i:j]=v" */

};

static PyTypeObject Stacktype = {      /* main python type-descriptor */

  /* type header */                    /* shared by all instances */

      PyObject_HEAD_INIT(&PyType_Type)

      0,                               /* ob_size */

      "stack",                         /* tp_name */

      sizeof(stackobject),             /* tp_basicsize */

      0,                               /* tp_itemsize */

  /* standard methods */

      (destructor)  stack_dealloc,     /* tp_dealloc  ref-count==0  */

      (printfunc)   stack_print,       /* tp_print    "print x"     */

      (getattrfunc) stack_getattr,     /* tp_getattr  "x.attr"      */

      (setattrfunc) 0,                 /* tp_setattr  "x.attr=v"    */

      (cmpfunc)     stack_compare,     /* tp_compare  "x > y"       */

      (reprfunc)    0,                 /* tp_repr     `x`, print x  */

  /* type categories */

      0,                               /* tp_as_number   +,-,*,/,%,&,>>,...*/

      &stack_as_sequence,              /* tp_as_sequence +,[i],[i:j],len, ...*/

      0,                               /* tp_as_mapping  [key], len, ...*/

  /* more methods */

      (hashfunc)     0,                /* tp_hash    "dict[x]" */

      (ternaryfunc)  0,                /* tp_call    "x(  )"     */

      (reprfunc)     0,                /* tp_str     "str(x)"  */

};  /* plus others: see Include/object.h */

/*****************************************************************************

 * MODULE LOGIC

 *****************************************************************************/

static PyObject *

stacktype_new(self, args)                 /* on "x = stacktype.Stack(  )" */

    PyObject *self;                       /* self not used */

    PyObject *args;                       /* constructor args */

{

    if (!PyArg_ParseTuple(args, ""))      /* Module-method function */

        return NULL;

    return (PyObject *)newstackobject(  );  /* make a new type-instance object */

}                                         /* the hook from module to type... */

static struct PyMethodDef stacktype_methods[] = {

    {"Stack",  stacktype_new,  1},             /* one function: make a stack */

    {NULL,     NULL}                           /* end marker, for initmodule */

};

void

initstacktype(  )                 /* on first "import stacktype" */

{

    PyObject *m, *d;

    m = Py_InitModule("stacktype", stacktype_methods);   /* make the module, */

    d = PyModule_GetDict(m);                             /* with 'Stack' func */

    ErrorObject = Py_BuildValue("s", "stacktype.error");

    PyDict_SetItemString(d, "error", ErrorObject);       /* export exception */

    if (PyErr_Occurred(  ))

        Py_FatalError("can't initialize module stacktype");

}

19.7.1 Anatomy of a C Extension Type

Although most of file stacktyp.c is missing, there is enough here to illustrate the global structure common to C type implementations:

Instance struct

The file starts off by defining a C struct called stackobject that will be used to hold per-instance state information -- each generated instance object gets a newly malloc'd copy of the struct. It serves the same function as class instance attribute dictionaries, and contains data that was saved in global variables by the C stack module.

Instance methods

As in the module, a set of instance methods follows next; they implement method calls such as push and pop. But here, method functions process the implied instance object, passed in to the self argument. This is similar in spirit to class methods. Type instance methods are looked up in the registration table of the code listing (Example 19-15) when accessed.

Basic type operations

Next, the file defines functions to handle basic operations common to all types: creation, printing, qualification, and so on. These functions have more specific type signatures than instance method handlers. The object creation handler allocates a new stack struct, and initializes its header fields: the reference count is set to 1, and its type object pointer is set to the Stacktype type descriptor that appears later in the file.

Sequence operations

Functions for handling sequence type operations come next. Stacks respond to most sequence operators: len, +, *, [i], and [i:j]. Much like the __getitem__ class method, the stack_item indexing handler performs indexing, but it is also run for membership tests and for iterator loops. These latter two work by indexing an object repeatedly until an IndexError exception is caught by Python.

Type descriptors

The type descriptor tables (really, structs) that appear near the end of the file are the crux of the matter for types -- Python uses these tables to dispatch an operation performed on an instance object to the corresponding C handler function in this file. In fact, everything is routed through these tables; even method attribute lookups start by running a C stack_getattr function listed in the table (which in turn looks up the attribute name in a name/function-pointer table). The main Stacktype table includes a link to the supplemental stack_as_sequence table where sequence operation handlers are registered; types can provide such tables to register handlers for mapping, number, and sequence operation sets. See Python's integer and dictionary objects' source code for number and mapping examples; they are analogous to the sequence type here, but their operation tables vary.^[5]

Constructor module

Besides defining a C type, this file also creates a simple C module at the end that exports a stacktype.Stack constructor function, which Python scripts call to generate new stack instance objects. The initialization function for this module is the only C name in this file that is not static (local to the file); everything else is reached by following pointers -- from instance, to type descriptor, to C handler function.
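
The indexing-based iteration described under "Sequence operations" can be observed from pure Python. The following is a minimal sketch in Python 3 syntax; the IndexOnly class is hypothetical and is not one of the book's examples. It defines only __getitem__, yet loops and membership tests still work, because the interpreter indexes from zero upward until IndexError is raised:

```python
class IndexOnly:
    """Hypothetical sequence that supports indexing only, like stack_item."""
    def __init__(self, items):
        self._items = list(items)
        self.fetches = []                  # record every index requested

    def __getitem__(self, i):
        self.fetches.append(i)
        if i < 0 or i >= len(self._items):
            raise IndexError(i)            # tells the interpreter to stop
        return self._items[i]


seq = IndexOnly("SPAM")
looped = [c for c in seq]                  # iteration asks for 0, 1, 2, 3, then 4 fails
found = "A" in seq                         # membership asks for 0, 1, 2 and stops at a match
print(looped, found, seq.fetches)
```

The trace in seq.fetches shows the extra, failing fetch that ends the loop and the early exit of the membership test.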

Again, see the book CD (see http://examples.oreilly.com/python2) for the full C stack type implementation. But to give you the general flavor of C type methods, here is what the C type's pop function looks like; compare this with the C module's pop function to see how the self argument is used to access per-instance information in types:

static PyObject *

stack_pop(self, args)

    stackobject *self;

    PyObject *args;                            /* on "instance.pop()" */

{

    PyObject *pstr;

    if (!PyArg_ParseTuple(args, ""))           /* verify no args passed */

        return NULL;

    if (self->top == 0)

        onError("stack underflow")             /* return NULL = raise */

    else {

        pstr = Py_BuildValue("s", self->stack[--self->top]);

        self->len -= (strlen(self->stack[self->top]) + 1);

        return pstr;

    }

}
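
For comparison, the way self carries per-instance state can be mirrored in Python. This is only a sketch in Python 3 syntax, not a translation of the full stacktyp.c: the class and exception names are hypothetical, and the overflow checks in push are assumptions, since the C push body is not shown here. The pop logic follows the C function above: check for underflow, decrement top, and give back the string's space (its length plus one terminator byte).

```python
MAXCHARS = 2048                            # same limits as the C listing
MAXSTACK = MAXCHARS


class StackError(Exception):
    """Stand-in for the C file's local exception object."""


class StackSketch:
    def __init__(self):
        self.top = 0                       # number of strings stacked
        self.len = 0                       # characters used, one terminator per string
        self.stack = []                    # stands in for char *stack[MAXSTACK]

    def push(self, text):
        if not isinstance(text, str):
            raise TypeError("expected string")
        if self.top == MAXSTACK:                       # assumed check
            raise StackError("stack overflow")
        if self.len + len(text) + 1 >= MAXCHARS:       # assumed check
            raise StackError("string-space overflow")
        self.stack.append(text)
        self.top += 1
        self.len += len(text) + 1

    def pop(self):
        if self.top == 0:
            raise StackError("stack underflow")
        self.top -= 1
        text = self.stack.pop()
        self.len -= len(text) + 1          # release the string's space
        return text


a, b = StackSketch(), StackSketch()        # two instances, two separate states
a.push("new")
for c in "SPAM":
    b.push(c)
```

Each instance owns its own top and len fields, which is the role the stackobject struct plays in the C type and the role global variables played in the C module.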

19.7.2 Compiling and Running

This C extension file is compiled and dynamically or statically linked like previous examples; file makefile.stack on the CD (see http://examples.oreilly.com/python2) handles the build like this:

stacktype.so: stacktyp.c

        gcc stacktyp.c -g -I$(PY)/Include -I$(PY) -fpic -shared -o stacktype.so

Once compiled, you can import the C module and make and use instances of the C type it defines much as if it were a Python class (but without inheritance). You would normally do this from a Python script, but the interactive prompt is a convenient place to test the basics:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python

>>> import stacktype                            # import C constructor module

>>> x = stacktype.Stack(  )                       # make C type instance object

>>> x.push('new')                               # call C type methods

>>> x                                           # call C type print handler

[Stack:

0: 'new'

]

>>> x[0]                                        # call C type index handler

'new'

>>> y = stacktype.Stack(  )                       # make another type instance

>>> for c in 'SPAM': y.push(c)                  # a distinct stack object

...

>>> y

[Stack:

3: 'M'

2: 'A'

1: 'P'

0: 'S'

]

>>> z = x + y                                   # call C type concat handler

>>> z

[Stack:

4: 'M'

3: 'A'

2: 'P'

1: 'S'

0: 'new'

]

>>> y.pop(  )

'M'

>>> len(z), z[0], z[-1]                         # for loops work too (indexing)

(5, 'new', 'M')

19.7.3 Timing the C Implementations

So how did we do on the optimization front this time? Let's resurrect that timer module we wrote back in Example 17-6 to compare the C stack module and type to the Python stack module and classes we coded in Chapter 17. Example 19-16 calculates the system time in seconds that it takes to run tests on all of this book's stack implementations.

Example 19-16. PP2E\Integrate\Extend\Stacks\exttime.py

#!/usr/local/bin/python

# time the C stack module and type extensions

# versus the object chapter's Python stack implementations

from PP2E.Dstruct.Basic.timer  import test      # second count function

from PP2E.Dstruct.Basic import stack1           # python stack module

from PP2E.Dstruct.Basic import stack2           # python stack class: +/slice

from PP2E.Dstruct.Basic import stack3           # python stack class: tuples

from PP2E.Dstruct.Basic import stack4           # python stack class: append/pop

import stackmod, stacktype                      # c extension module, type

from sys import argv

rept, pushes, pops, items = 200, 200, 200, 200  # default: 200 * (600 ops)

try:

    [rept, pushes, pops, items] = map(int, argv[1:])

except: pass

print 'reps=%d * [push=%d+pop=%d+fetch=%d]' % (rept, pushes, pops, items)

def moduleops(mod):

    for i in range(pushes): mod.push('hello')   # strings only for C

    for i in range(items):  t = mod.item(i)

    for i in range(pops):   mod.pop(  )

def objectops(Maker):                           # type has no init args

    x = Maker(  )                                 # type or class instance

    for i in range(pushes): x.push('hello')     # strings only for C

    for i in range(items):  t = x[i]

    for i in range(pops):   x.pop(  )

# test modules: python/c

print "Python module:", test(rept, moduleops, stack1)

print "C ext module: ", test(rept, moduleops, stackmod), '\n'

# test objects: class/type

print "Python simple Stack:", test(rept, objectops, stack2.Stack)

print "Python tuple  Stack:", test(rept, objectops, stack3.Stack)

print "Python append Stack:", test(rept, objectops, stack4.Stack)

print "C ext type Stack:   ", test(rept, objectops, stacktype.Stack)

Running this script on Linux produces the following results. As we saw before, the Python tuple stack is slightly better than the Python in-place append stack in typical use (when the stack is only pushed and popped), but it is slower when indexed. The first test here runs 200 repetitions of 200 stack pushes and pops, or 80,000 stack operations (200 x 400); times listed are test duration seconds:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttime.py 200 200 200 0

reps=200 * [push=200+pop=200+fetch=0]

Python module: 2.09

C ext module:  0.68

Python simple Stack: 2.15

Python tuple  Stack: 0.68

Python append Stack: 1.16

C ext type Stack:    0.5

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttime.py 100 300 300 0

reps=100 * [push=300+pop=300+fetch=0]

Python module: 1.86

C ext module:  0.52

Python simple Stack: 1.91

Python tuple  Stack: 0.51

Python append Stack: 0.87

C ext type Stack:    0.38

At least when there are no indexing operations on the stack as in these two tests (just pushes and pops), the C type is only slightly faster than the best Python stack (tuples). In fact, it's almost a draw -- in these first two tests, the C type finishes less than two tenths of a second ahead (0.18 and 0.13 seconds) after 60,000 to 80,000 stack operations. It's not exactly the kind of performance difference that would generate a bug report.^[6]

The C module comes in at roughly three times faster than the Python module, but these results are flawed. The stack1 Python module tested here uses the same slow stack implementation as the Python "simple" stack (stack2). If it were recoded to use the tuple stack representation used in Chapter 17, its speed would be similar to the "tuple" figures listed here, and almost identical to the speed of the C module in the first two tests. The next two runs add indexing to the mix:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttime.py 200 200 200 50

reps=200 * [push=200+pop=200+fetch=50]

Python module: 2.17

C ext module:  0.79

Python simple Stack: 2.24

Python tuple  Stack: 1.94

Python append Stack: 1.25

C ext type Stack:    0.52

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python exttime.py

reps=200 * [push=200+pop=200+fetch=200]

Python module: 2.42

C ext module:  1.1

Python simple Stack: 2.54

Python tuple  Stack: 19.09

Python append Stack: 1.54

C ext type Stack:    0.63

But under the different usage patterns simulated in these two tests, the C type wins the race. It is about twice as fast as the best Python stack (append) when indexing is added to the test mix, as illustrated by two of the preceding test runs that ran with a nonzero fetch count. Similarly, the C module would be twice as fast as the best Python module coding in this case as well.

In other words, the fastest Python stacks are as good as the C stacks if you stick to pushes and pops, but the C stacks are roughly twice as fast if any indexing is performed. Moreover, since you have to pick one representation, if indexing is possible at all you would likely pick the Python append stack; assuming they represent the best case, C stacks would always be twice as fast.
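
The ratios behind these statements can be recomputed from the timings printed in the four runs above. The short Python 3 sketch below does only that arithmetic; the numbers are copied from the listings, and the variable names are arbitrary.

```python
# (run, best pure-Python class time, C type time), copied from the runs above
runs = [
    ("200 reps, no fetches",  0.68, 0.50),   # best Python: tuple stack
    ("100 reps, no fetches",  0.51, 0.38),   # best Python: tuple stack
    ("200 reps, 50 fetches",  1.25, 0.52),   # best Python: append stack
    ("200 reps, 200 fetches", 1.54, 0.63),   # best Python: append stack
]

# How many times faster the C type is than the best Python class in each run
ratios = {name: round(python / c_type, 2) for name, python, c_type in runs}
for name, ratio in ratios.items():
    print("%-22s C type is %.2fx faster" % (name, ratio))
```

The push/pop-only runs come out between 1.3 and 1.4, and the runs with indexing at about 2.4, which is where the "roughly twice as fast" figure comes from.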

Of course, the measured time differences are so small that in many applications you won't care. Further, the C stacks are much more difficult to program, and achieve their speed by imposing substantial functional limits; in many ways, this is not quite an apples-to-apples comparison. But as a rule of thumb, C extensions can not only integrate existing components for use in Python scripts, they can also optimize time-critical components of pure Python programs. In other scenarios, migration to C might yield an even larger speedup.

On the other hand, C extensions should generally be used only as a last resort. As we learned earlier, algorithms and data structures are often bigger influences on program performance than implementation language. The fact that Python-coded tuple stacks are just as fast as the C stacks under common usage patterns speaks volumes about the importance of data structure representation.
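
Why representation matters so much here can be sketched in a few lines. The two classes below are illustrations in Python 3 syntax, not the stack3 and stack4 modules themselves: they assume the tuple stack is a chain of (item, rest) pairs and the append stack is a plain list, and they count the work done per index instead of measuring time.

```python
class TupleStack:
    """Chain of (item, rest) pairs: cheap push/pop, linear-time indexing."""
    def __init__(self):
        self.stack = None
        self.steps = 0                     # links followed while indexing

    def push(self, item):
        self.stack = (item, self.stack)    # new pair points at the old chain

    def pop(self):
        item, self.stack = self.stack
        return item

    def __getitem__(self, index):          # offset 0 is the top of the stack
        tree = self.stack
        for _ in range(index):
            if tree is None:
                break
            tree = tree[1]                 # walk one link per offset
            self.steps += 1
        if tree is None:
            raise IndexError(index)
        return tree[0]


class AppendStack:
    """Plain list: push/pop at the end, constant-time indexing."""
    def __init__(self):
        self.stack = []
        self.steps = 0                     # direct list accesses

    def push(self, item):
        self.stack.append(item)

    def pop(self):
        return self.stack.pop()

    def __getitem__(self, index):          # offset 0 is the top here too
        self.steps += 1
        return self.stack[-1 - index]


results = {}
for maker in (TupleStack, AppendStack):
    s = maker()
    for i in range(200):
        s.push(i)
    fetched = [s[i] for i in range(200)]   # 200 fetches, as in the default test
    results[maker.__name__] = (fetched[0], fetched[-1], s.steps)
print(results)
```

With 200 items, fetching every offset follows 19,900 links in the tuple chain but needs only 200 list accesses, which is consistent with the tuple stack's collapse in the last run.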

19.7.4 Wrapping C Types in Classes

In the current Python implementation, to add inheritance to C types you must have a class somewhere. The most common way to support type customization is to introduce a wrapper class -- a Python class that does little but keep a reference to a type object and pass all operations off to the type. Because such a wrapper adds a class interface on top of the type, though, it allows the underlying type to be subclassed and extended as though the type were a class. This is illustrated in Example 19-17.

Example 19-17. PP2E\Integrate\Extend\Stacks\oopstack.py

import stacktype                                # get the C type/module

class Stack:

    def __init__(self, start=None):             # make/wrap a C type-instance

        self._base = start or stacktype.Stack(  ) # deleted when class-instance is

    def __getattr__(self, name):

        return getattr(self._base, name)        # methods/members: type-instance

    def __cmp__(self, other):

        return cmp(self._base, other)

    def __repr__(self):                         # 'print' is not really repr

        print self._base,; return ''

    def __add__(self, other):                   # operators: special methods

        return Stack(self._base + other._base)  # operators are not attributes

    def __mul__(self, n):

        return Stack(self._base * n)            # wrap result in a new Stack

    def __getitem__(self, i):

        return self._base[i]                    # 'item': index, in, for

    def __len__(self):

        return len(self._base)

This wrapper class can be used the same as the C type, because it delegates all operations to the type instance stored away in the class instance's self._base:

[mark@toy ~/.../PP2E/Integrate/Extend/Stacks]$ python

>>> import oopstack

>>> x = oopstack.Stack(  )

>>> y = oopstack.Stack(  )

>>> x.push('class')

>>> for c in "SPAM": y.push(c)

...

>>> x

[Stack:

0: 'class'

]

>>> y[2]

'A'

>>> z = x + y

>>> for s in z: print s,

...

class S P A M

>>> z.__methods__, z.__members__, z.pop(  )

(['empty', 'pop', 'push', 'top'], ['len'], 'M')

>>> type(z), type(z._base)

(<type 'instance'>, <type 'stack'>)

The point of coding such a wrapper is to better support extensions in Python. Subclasses really subclass the wrapper class, but because the wrapper is just a thin interface to the type, it's like subclassing the type itself, as in Example 19-18.

Example 19-18. PP2E\Integrate\Extend\Stacks\substack.py

from oopstack import Stack              # get the 'stub' class (C-type wrapper)

class Substack(Stack):

    def __init__(self, start=[]):       # extend the 'new' operation

        Stack.__init__(self)            # initialize stack from any sequence

        for str in start:               # start can be another stack too

            self.push(str)

    def morestuff(self):                # add a new method

        print 'more stack stuff'

    def __getitem__(self, i):           # extend 'item' to trace accesses

        print 'accessing cell', i

        return Stack.__getitem__(self, i)

This subclass extends the type (wrapper) to support an initial value at construction time, prints trace messages when indexed, and introduces a brand new morestuff method. This subclass is limited (e.g., the result of a + is a Stack, not a Substack), but proves the point -- wrappers let you apply inheritance and composition techniques we've met in this book to new types coded in C:

>>> import substack

>>> a = substack.Substack(x + y)

>>> a

[Stack:

4: 'M'

3: 'A'

2: 'P'

1: 'S'

0: 'class'

]

>>> a[3]

accessing cell 3

'A'

>>> a.morestuff(  )

more stack stuff

>>> b = substack.Substack("C" + "++")

>>> b.pop(), b.pop(  )

('+', '+')

>>> c = b + substack.Substack(['-', '-'])

>>> for s in c: print s,

...

C - -
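
The delegation pattern in Examples 19-17 and 19-18 can be tried without compiling any C. The following is a minimal sketch in Python 3 syntax that substitutes a built-in list for the C stack type; the names Wrapper and Traced are hypothetical, not files from the book's examples. It shows the same division of labor: named methods fall through __getattr__ to the wrapped object, while operators need explicit special methods.

```python
class Wrapper:
    """Thin class around a built-in list, standing in for the C type."""
    def __init__(self, base=None):
        self._base = [] if base is None else base

    def __getattr__(self, name):
        # Runs only for names not found on the instance or class,
        # so methods such as append and pop reach the wrapped list.
        return getattr(self._base, name)

    def __len__(self):                     # operators are not routed through
        return len(self._base)             # __getattr__, so define them here

    def __getitem__(self, i):
        return self._base[i]

    def __add__(self, other):
        return Wrapper(self._base + other._base)   # result is always a Wrapper


class Traced(Wrapper):
    """Subclass of the wrapper: adds an initial value and access tracing."""
    def __init__(self, start=()):
        Wrapper.__init__(self)             # sets _base before any delegation
        self.accessed = []
        for item in start:
            self.append(item)              # delegated to list.append

    def __getitem__(self, i):
        self.accessed.append(i)            # extend indexing to record accesses
        return Wrapper.__getitem__(self, i)


t = Traced("SPAM")
third = t[2]                               # Traced.__getitem__, then the list's
combined = t + Wrapper(["x"])              # concatenation via Wrapper.__add__
```

As with Substack, the result of + here is a plain Wrapper instead of a Traced instance, because __add__ names the wrapper class explicitly.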

19.7.5 But Don't Do That Either -- SWIG

You can code C types manually like this, but you don't necessarily have to. Because SWIG knows how to generate glue code for C++ classes, you can instead automatically generate all the C extension and wrapper class code required to integrate such a stack object, simply by running SWIG over an appropriate class declaration. The next section shows how.

19.8 Wrapping C++ Classes with SWIG

One of the neater tricks SWIG can perform is class wrapper generation -- given a C++ class declaration and special command-line settings, SWIG generates:

· A C++ coded Python extension module with accessor functions that interface with the C++ class's methods and members

· A Python coded wrapper class (called a "shadow" class in SWIG-speak) that interfaces with the C++ class accessor functions module

As before, simply run SWIG in your makefile to scan the C++ class declaration and compile its outputs. The end result is that by importing the shadow class in your Python scripts, you can utilize C++ classes as though they were really coded in Python. Not only can Python programs make and use instances of the C++ class, they can also customize it by subclassing the generated shadow class.

19.8.1 A Little C++ Class (But Not Too Much)

To see how this all works, we need a C++ class. To illustrate, let's code a simple one to be used in Python scripts.^[7] The following C++ files define a Number class with three methods (add, sub, display), a data member (data), and a constructor and destructor. Example 19-19 shows the header file.

Example 19-19. PP2E\Integrate\Extend\Swig\Shadow\number.h

class Number

{

public:

    Number(int start);

    ~Number(  );

    void add(int value);

    void sub(int value);

    void display(  );

    int data;

};

And Example 19-20 is the C++ class's implementation file; each method prints a message when called to trace class operations.

Example 19-20. PP2E\Integrate\Extend\Swig\Shadow\number.cxx

#include "number.h"

#include "iostream.h"

// #include "stdio.h"

Number::Number(int start) {

   data = start;

   cout << "Number: " << data << endl;    // cout and printf both work

   // printf("Number: %d\n", data);       // python print goes to stdout

}

Number::~Number(  ) {

   cout << "~Number: " << data << endl;

}

void Number::add(int value) {

   data += value;

   cout << "add " << value << endl;

}

void Number::sub(int value) {

   data -= value;

   cout << "sub " << value << endl;

}

void Number::display(  ) {

   cout << "Number = " << data << endl;

}

Just so that you can compare languages, here is how this class is used in a C++ program; Example 19-21 makes a Number object, calls its methods, and fetches and sets its data attribute directly (C++ distinguishes between "members" and "methods," while they're usually both called "attributes" in Python).

Example 19-21. PP2E\Integrate\Extend\Swig\Shadow\main.cxx

#include "iostream.h"

#include "number.h"

main(  )

{

    Number *num;

    num = new Number(1);            // make a C++ class instance

    num->add(4);                    // call its methods

    num->display(  );

    num->sub(2);

    num->display(  );

    num->data = 99;                 // set C++ data member

    cout << num->data << endl;      // fetch C++ data member

    num->display(  );

    delete num;

}

You can use the g++ command-line C++ compiler program to compile and run this code on Linux. If you don't run Linux, you'll have to extrapolate (there are far too many C++ compiler differences to list here).

[mark@toy ~/.../PP2E/Integrate/Extend/Swig/Shadow]$ g++ main.cxx number.cxx

[mark@toy ~/.../PP2E/Integrate/Extend/Swig/Shadow]$ a.out

Number: 1

add 4

Number = 5

sub 2

Number = 3

Number = 99

~Number: 99

19.8.2 Wrapping the C++ Class with SWIG

Let's get back to Python. To use the C++ Number class in Python scripts, you need to code or generate a glue logic layer between the two languages, as in prior examples. To generate that layer automatically, just write a SWIG input file like the one shown in Example 19-22.

Example 19-22. PP2E\Integrate\Extend\Swig\Shadow\number.i

/********************************************************

 * Swig module description file for wrapping a C++ class.

 * Generate by saying "swig -python -shadow number.i".

 * The C module is generated in file number_wrap.c; here,

 * module 'number' refers to the number.py shadow class.

 ********************************************************/

%module number

%{

#include "number.h"

%}

%include number.h

This interface file simply directs SWIG to read the C++ class's type signature information from the included number.h header file. This time, SWIG uses the class declaration to generate three files, and two different Python modules:

· number_wrap.doc, a simple wrapper function description file

· number_wrap.c, a C++ extension module with class accessor functions

· number.py, a Python shadow class module that wraps accessor functions
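
To make the relationship between the last two files concrete, here is a rough sketch in Python 3 syntax of how a shadow class sits on top of flat accessor functions. Nothing here is SWIG's actual output: the accessor module is simulated in pure Python with a handle table, and only the names new_Number, Number_add, and Number_data_get are taken from the generated wrapper excerpted later in this section; Number_sub and Number_data_set are assumed by analogy, and Doubler is a hypothetical subclass.

```python
import itertools

# Stand-in for the compiled accessor module. In the real system these are
# C++ functions that receive an object pointer; here a dict of handles
# plays that role (an assumption made so the sketch runs anywhere).
_objects = {}
_handles = itertools.count(1)

def new_Number(start):
    this = next(_handles)
    _objects[this] = {"data": start}
    return this

def Number_add(this, value):
    _objects[this]["data"] += value

def Number_sub(this, value):               # assumed name
    _objects[this]["data"] -= value

def Number_data_get(this):
    return _objects[this]["data"]

def Number_data_set(this, value):          # assumed name
    _objects[this]["data"] = value


class Number:
    """Shadow-style class: holds a handle and forwards every operation."""
    def __init__(self, start):
        self.this = new_Number(start)

    def add(self, value):
        Number_add(self.this, value)

    def sub(self, value):
        Number_sub(self.this, value)

    @property
    def data(self):
        return Number_data_get(self.this)

    @data.setter
    def data(self, value):
        Number_data_set(self.this, value)


class Doubler(Number):                     # customize by subclassing the shadow class
    def add(self, value):
        Number.add(self, value * 2)


num = Number(1)
num.add(4)
num.sub(2)
```

Because the shadow is an ordinary Python class, a subclass such as Doubler can override add without touching the accessor layer underneath.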

The Linux makefile shown in Example 19-23 combines the generated C++ wrapper code module with the C++ class implementation file to create numberc.so, the dynamically loaded extension module that must be in a directory on your Python module search path when imported from a Python script.

Example 19-23. PP2E\Integrate\Extend\Swig\Shadow\makefile.number-swig

###########################################################################

# Use SWIG to integrate the number.h C++ class for use in Python programs.

# Note: name "numberc.so" matters, because shadow class imports numberc.

###########################################################################

# unless you've run make install

SWIG = ../myswig

PY   = $(MYPY)

all: numberc.so number.py

# wrapper + real class

numberc.so: number_wrap.o number.o

        g++ -shared number_wrap.o number.o -o numberc.so

# generated class wrapper module

number_wrap.o: number_wrap.c number.h

        g++ number_wrap.c -c -g -I$(PY)/Include -I$(PY)

number_wrap.c: number.i

        $(SWIG) -c++ -python -shadow number.i

number.py: number.i

        $(SWIG) -c++ -python -shadow number.i

# wrapped C++ class code

number.o: number.cxx number.h

        g++ -c -g number.cxx

cxxtest:

        g++ main.cxx number.cxx

clean:

        rm -f *.pyc *.o *.so core a.out

force:

        rm -f *.pyc *.o *.so core a.out number.py number_wrap.c number_wrap.doc

As usual, run this makefile to generate and compile the necessary glue code into an extension module that can be imported by Python programs:

[mark@toy ~/....../Integrate/Extend/Swig/Shadow]$ make -f makefile.number-swig

Generating wrappers for Python

g++ number_wrap.c -c -g -I/...

g++ -c -g number.cxx

g++ -shared number_wrap.o number.o -o numberc.so

To help demystify SWIG's magic somewhat, here is a portion of the generated C++ number_wrap.c accessor functions module. You can find the full source file at http://examples.oreilly.com/python2 (or simply generate it yourself). Notice that this file defines a simple C extension module of functions that generally expect a C++ object pointer to be passed in (i.e., a "this" pointer in C++ lingo). This is a slightly different structure than Example 19-17, which wrapped a C type with a Python class instead, but the net effect is similar:

..._wrap function implementations that run C++ operation syntax...

#define new_Number(_swigarg0) (new Number(_swigarg0))

static PyObject *_wrap_new_Number(PyObject *self, PyObject *args) {

    ...body deleted...

#define Number_add(_swigobj,_swigarg0)  (_swigobj->add(_swigarg0))

static PyObject *_wrap_Number_add(PyObject *self, PyObject *args) {

    ...body deleted...

#define Number_data_get(_swigobj) ((int ) _swigobj->data)

static PyObject *_wrap_Number_data_get(PyObject *self, PyObject *args) {

    ...body deleted...

static PyMethodDef numbercMethods[] = {

         { "Number_data_get", _wrap_Number_data_get, 1 },

         { "Number_data_set", _wrap_Number_data_set, 1 },

         { "Number_display", _wrap_Number_display, 1 },

         { "Number_sub", _wrap_Number_sub, 1 },

         { "Number_add", _wrap_Number_add, 1 },

         { "delete_Number", _wrap_delete_Number, 1 },

         { "new_Number", _wrap_new_Number, 1 },

         { NULL, NULL }

};

SWIGEXPORT(void,initnumberc)(  ) {

         PyObject *m, *d;

         SWIG_globals = SWIG_newvarlink(  );

         m = Py_InitModule("numberc", numbercMethods);

         d = PyModule_GetDict(m);

On top of the accessor functions module, SWIG generates number.py, the following shadow class that Python scripts import as the actual interface to the class. This code is a bit more complicated than the wrapper class we saw in the prior section, because it manages object ownership and therefore handles new and existing objects differently. The important thing to notice is that it is a straight Python class that saves the C++ "this" pointer of the associated C++ object, and passes control to accessor functions in the generated C++ extension module:

import numberc

class NumberPtr :

    def __init__(self,this):

        self.this = this

        self.thisown = 0

    def __del__(self):

        if self.thisown == 1 :

            numberc.delete_Number(self.this)

    def add(self,arg0):

        val = numberc.Number_add(self.this,arg0)

        return val

    def sub(self,arg0):

        val = numberc.Number_sub(self.this,arg0)

        return val

    def display(self):

        val = numberc.Number_display(self.this)

        return val

    def __setattr__(self,name,value):

        if name == "data" :

            numberc.Number_data_set(self.this,value)

            return

        self.__dict__[name] = value

    def __getattr__(self,name):

        if name == "data" :

            return numberc.Number_data_get(self.this)

        raise AttributeError,name

    def __repr__(self):

        return "<C Number instance>"

class Number(NumberPtr):

    def __init__(self,arg0) :

        self.this = numberc.new_Number(arg0)

        self.thisown = 1
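
The ownership rule in the shadow class can be tried out without compiling anything. The following is a minimal sketch, written for a current Python 3 interpreter rather than the Python 2 of the listing above. The FakeNumberc class is a hypothetical stand-in for the compiled numberc module, and its handle strings are invented for illustration; only the NumberPtr and Number logic mirrors the generated code.

```python
class FakeNumberc:
    """Hypothetical stand-in for the compiled numberc accessor module."""
    def __init__(self):
        self.objects = {}      # fake pointer handle -> data member value
        self.deleted = []      # handles whose "destructor" has run
        self._next = 1

    def new_Number(self, start):
        handle = "_%x_Number_p" % self._next   # invented handle format
        self._next += 1
        self.objects[handle] = start
        return handle

    def delete_Number(self, handle):
        del self.objects[handle]
        self.deleted.append(handle)

    def Number_add(self, handle, value):
        self.objects[handle] += value

    def Number_data_get(self, handle):
        return self.objects[handle]

    def Number_data_set(self, handle, value):
        self.objects[handle] = value

numberc = FakeNumberc()

class NumberPtr:
    def __init__(self, this):
        self.this = this       # wrap an existing object: borrowed handle
        self.thisown = 0       # so never delete it
    def __del__(self):
        if self.thisown == 1:
            numberc.delete_Number(self.this)
    def add(self, arg0):
        return numberc.Number_add(self.this, arg0)
    def __setattr__(self, name, value):
        if name == "data":     # route data to the "C++" side
            numberc.Number_data_set(self.this, value)
            return
        self.__dict__[name] = value
    def __getattr__(self, name):
        if name == "data":
            return numberc.Number_data_get(self.this)
        raise AttributeError(name)

class Number(NumberPtr):
    def __init__(self, arg0):
        self.this = numberc.new_Number(arg0)   # make a new object
        self.thisown = 1                       # this wrapper owns it

owner = Number(1)
owner.add(4)
alias = NumberPtr(owner.this)          # second wrapper, same handle
alias.data = 99                        # visible through both wrappers
seen = owner.data
handle = owner.this

del alias                              # borrowed handle: nothing is deleted
after_alias = list(numberc.deleted)
del owner                              # owning wrapper: delete_Number runs
after_owner = list(numberc.deleted)    # (immediately, under CPython refcounting)
print(seen, after_alias, after_owner)
```

Only the wrapper created through Number sets thisown to 1, so the underlying object is released exactly once, no matter how many NumberPtr wrappers share its handle.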

A subtle thing: the generated C++ module file is named number_wrap.c, but the Python module name it gives in its initialization function is numberc, which is the name also imported by the shadow class. The import works because the combination of the glue code module and the C++ library file is linked into a file numberc.so such that the imported module file and initialization function names match. When using shadow classes and dynamic binding, the compiled object file's name must generally be the module name given in the .i file with an appended "c". In general, given an input file named interface.i:

%module interface

...declarations...

SWIG generates glue code file interface_wrap.c, which you should somehow compile into an interfacec.so file to be dynamically loaded on import:

swig -python -shadow interface.i

g++ -c interface.c interface_wrap.c

...more...

g++ -shared interface.o interface_wrap.o -o interfacec.so

The module name interface is reserved for the generated shadow class module, interface.py. Keep in mind that this implementation structure is subject to change at the whims of SWIG's creator, but the interface it yields should remain the same -- a Python class that shadows the C++ class, attribute for attribute.^[8]
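
The naming scheme just described can be summarized in a few lines of code. This is only a sketch of the convention as stated in this section for the SWIG release used here; the swig_shadow_names helper is hypothetical and is not part of SWIG, and the .so suffix assumes a Linux-style build like the one in Example 19-23.

```python
def swig_shadow_names(module):
    """Map a %module name to the file and symbol names described in the text.

    Hypothetical helper for illustration; SWIG itself provides no such call.
    """
    return {
        "interface_file":   module + ".i",        # SWIG input file
        "wrapper_source":   module + "_wrap.c",   # generated glue code
        "extension_module": module + "c",         # name the shadow class imports
        "shared_library":   module + "c.so",      # file loaded on import (Linux)
        "init_function":    "init" + module + "c",
        "shadow_module":    module + ".py",       # generated shadow class
    }

names = swig_shadow_names("number")
for role, name in names.items():
    print("%-17s %s" % (role, name))
```

For the number example, this yields numberc.so and initnumberc, which is why the makefile's output file name matters.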

19.8.3 Using the C++ Class in Python

Once the glue code is generated and compiled, Python scripts can access the C++ class as though it were coded in Python. Example 19-24 repeats the main.cxx file's class tests; here, though, the C++ class is used from Python.

Example 19-24. PP2E\Integrate\Extend\Swig\Shadow\main.py

from number import Number       # use C++ class in Python (shadow class)

                                # runs same tests as main.cxx C++ file

num = Number(1)                 # make a C++ class object in Python

num.add(4)                      # call its methods from Python

num.display(  )                   # num saves the C++ 'this' pointer

num.sub(2)

num.display(  )

num.data = 99                   # set C++ data member, generated __setattr__

print num.data                  # get C++ data member, generated __getattr__

num.display(  )

del num                         # runs C++ destructor automatically

Because the C++ class and its wrappers are automatically loaded when imported by the number shadow class, you run this script like any other:

[mark@toy ~/....../Integrate/Extend/Swig/Shadow]$ python main.py

Number: 1

add 4

Number = 5

sub 2

Number = 3

Number = 99

~Number: 99

Most of this output comes from the C++ class's methods, and it is the same as the main.cxx results shown in Example 19-21. If you really want to use the generated accessor functions module, you can, as shown in Example 19-25.

Example 19-25. PP2E\Integrate\Extend\Swig\Shadow\main_low.py

from numberc import *           # same test as main.cxx

                                # use low-level C accessor function interface

num = new_Number(1)

Number_add(num, 4)              # pass C++ 'this' pointer explicitly

Number_display(num)             # use accessor functions in the C module

Number_sub(num, 2)

Number_display(num)

Number_data_set(num, 99)

print Number_data_get(num)

Number_display(num)

delete_Number(num)

This script generates the same output as main.py, but there is no obvious advantage to moving from the shadow class to functions here. By using the shadow class, you get both an object-based interface to C++ and a customizable Python object. For instance, the Python module shown in Example 19-26 extends the C++ class, adding an extra print statement to the C++ add method, and defining a brand new mul method. Because the shadow class is pure Python, this works naturally.

Example 19-26. PP2E\Integrate\Extend\Swig\Shadow\main_subclass.py

from number import Number       # subclass C++ class in Python (shadow class)

class MyNumber(Number):

    def add(self, other):

        print 'in Python add...'

        Number.add(self, other)

    def mul(self, other):

        print 'in Python mul...'

        self.data = self.data * other

num = MyNumber(1)               # same test as main.cxx

num.add(4)                      # using Python subclass of shadow class

num.display()                   # add(  ) is specialized in Python

num.sub(2)

num.display(  )

num.data = 99

print num.data

num.display(  )

num.mul(2)                      # mul(  ) is implemented in Python

num.display(  )

del num

Now we get extra messages out of add calls, and mul changes the C++ class's data member automatically when it assigns self.data:

[mark@toy ~/....../Integrate/Extend/Swig/Shadow]$ python main_subclass.py

Number: 1

in Python add...

add 4

Number = 5

sub 2

Number = 3

Number = 99

in Python mul...

Number = 198

~Number: 198

In other words, SWIG makes it easy to use C++ class libraries as base classes in your Python scripts. As usual, you can import the C++ class interactively to experiment with it some more:

[mark@toy ~/....../Integrate/Extend/Swig/Shadow]$ python

>>> import numberc

>>> numberc.__file__              # the C++ class plus generated glue module

'./numberc.so'

>>> import number                 # the generated Python shadow class module

>>> number.__file__

'number.pyc'

>>> x = number.Number(2)          # make a C++ class instance in Python

Number: 2

>>> y = number.Number(4)          # make another C++ object

Number: 4

>>> x, y

(<C Number instance>, <C Number instance>)

>>> x.display(  )                   # call C++ method (like C++ x->display(  ))

Number = 2

>>> x.add(y.data)                 # fetch C++ data member, call C++ method

add 4

>>> x.display(  )

Number = 6

>>> y.data = x.data + y.data + 32         # set C++ data member

>>> y.display(  )                           # y records the C++ this pointer

Number = 42

So what's the catch? Nothing much, really, but if you start using SWIG in earnest, the biggest downside is that SWIG cannot handle every feature of C++ today. If your classes use advanced C++ tools such as operator overloading and templates, you may need to hand-code simplified class type declarations for SWIG, instead of running SWIG over the original class header files.
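
Because the shadow class is ordinary Python, one possible workaround for missing operator overloads is to expose plain methods to SWIG and add the operators back in a Python subclass. The sketch below is an illustration rather than a technique prescribed here. It is written for a current Python 3 interpreter, and its Number class is a pure-Python stand-in for the generated shadow class so that the code runs without the compiled module; OpNumber is a hypothetical name.

```python
class Number:
    """Stand-in for the SWIG-generated shadow class (assumption for illustration)."""
    def __init__(self, start):
        self.data = start
    def add(self, value):
        self.data += value
    def sub(self, value):
        self.data -= value

class OpNumber(Number):
    """Layers Python operator overloads over the wrapped add and sub methods."""
    def __add__(self, other):
        result = OpNumber(self.data)   # new object; operands are left unchanged
        result.add(int(other))
        return result
    def __sub__(self, other):
        result = OpNumber(self.data)
        result.sub(int(other))
        return result
    def __int__(self):
        return self.data               # lets int() accept OpNumber operands
    def __repr__(self):
        return "<OpNumber %d>" % self.data

x = OpNumber(2)
y = OpNumber(4)
z = x + y - 1                          # runs __add__, then __sub__
print(z, x, y)
```

With the real shadow class in place of the stand-in, each operator would still bottom out in a call to the wrapped C++ add or sub method.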

Also, SWIG's current string-based pointer representation sidesteps conversion and type-safety issues and works well in most cases, but it has sometimes been accused of creating performance or interface complications when wrapping existing libraries. SWIG development is ongoing, so you should consult the SWIG manuals and web site for more details on these and other topics.
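
To make the idea of a string-based pointer concrete, here is a small sketch in Python 3. The "_address_Type_p" layout used below is an assumption chosen for illustration, and make_pointer and parse_pointer are hypothetical helpers, not SWIG functions. The point is only that the type name travels inside the string, so a receiver can check it, at the cost of formatting and parsing text on each call.

```python
def make_pointer(address, type_name):
    """Encode an address and a type name as one string (assumed layout)."""
    return "_%x_%s_p" % (address, type_name)

def parse_pointer(text, expected_type):
    """Decode a pointer string, rejecting one of the wrong type."""
    if not (text.startswith("_") and text.endswith("_p")):
        raise TypeError("not a pointer string: %r" % text)
    hex_part, _, type_name = text[1:-2].partition("_")
    if type_name != expected_type:
        raise TypeError("expected %s pointer, got %s" % (expected_type, type_name))
    return int(hex_part, 16)

ptr = make_pointer(0x80d1a28, "Number")
address = parse_pointer(ptr, "Number")    # type matches: address recovered

try:
    parse_pointer(ptr, "Stack")           # wrong type: refused
    rejected = False
except TypeError:
    rejected = True

print(ptr, hex(address), rejected)
```

Passing such strings through Python requires no new object type, which is the convenience; the per-call string handling is one plausible source of the performance complaints mentioned above.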

In return for any such trade-offs, though, SWIG can completely obviate the need to code glue layers to access C and C++ libraries from Python scripts. If you have ever coded such layers by hand in the past, you already know that this is a very big win.

If you do go the manual route, though, consult Python's standard extension manuals for more details on both API calls used in this and the next chapter, as well as additional extension tools we don't have space to cover in this text. C extensions can run the gamut from short SWIG input files to code that is staunchly wedded to the internals of the Python interpreter; as a rule of thumb, the former survives the ravages of time much better than the latter.

Mixing Python and C++

Python's standard implementation is currently coded in C, so all the normal rules about mixing C programs with C++ programs apply to the Python interpreter. In fact, there is nothing special about Python in this context, but here are a few pointers.

When embedding Python in a C++ program, there are no special rules to follow. Simply link in the Python library and call its functions from C++. Python's header files automatically wrap themselves in extern "C" {...} declarations to suppress C++ name-mangling. Hence, the Python library looks like any other C component to C++; there is no need to recompile Python itself with a C++ compiler.

When extending Python with C++ components, Python header files are still C++-friendly, so Python API calls in C++ extensions work like any other C++ to C call. But be sure to wrap the parts of your extension code made visible to Python with extern "C" declarations so that they may be called by Python's C code. For example, to wrap a C++ class, SWIG generates a C++ extension module that declares its initialization function this way, though the rest of the module is pure C++.

The only other potential complication involves C++ static or global object constructor methods when extending. If Python (a C program) is at the top level of a system, such C++ constructors may not be run when the system starts up. This behavior may vary per compiler, but if your C++ objects are not initialized on startup, make sure that your main program is linked by your C++ compiler, not C.

If you are interested in Python/C++ integration in general, be sure to consult the C++ special interest group (SIG) pages at http://www.python.org for information about work in this domain. The CXX system, for instance, makes it easier to extend Python with C++.

^[1] Yes, every time you make an integer or string in Python, you generate a new C type instance object (whether you know it or not). This isn't as inefficient as you may think, though; as we'll see, type operations are dispatched through fast C pointers, and Python internally caches some integers and strings to avoid object creation when possible.

^[2] Because Python always searches the current working directory on imports, this chapter's examples will run from the directory you compile them in (".") without any file copies or moves. Being on PYTHONPATH matters more in larger programs and installs.

^[3] You can wade through this generated file on the book's CD (see http://examples.oreilly.com/python2) if you are so inclined. Also see file PP2E\Integrate\Extend\HelloLib\hellolib_wrapper.c on the CD for a hand-coded equivalent; it's shorter because SWIG also generates extra support code.

^[4] This code is also open to customization (e.g., it can limit the set of shell variables read and written by checking names), but you could do the same by wrapping os.environ. In fact, because os.environ is simply a Python UserDict subclass that preloads shell variables on startup, you could almost add the required getenv call to load C layer changes by simply wrapping os.environ accesses in a Python class whose __getitem__ calls getenv before passing the access off to os.environ. But you still need C's getenv call in the first place, and it's not available in os today.

^[5] Note that type descriptor layouts, like most C API tools, are prone to change over time, and you should always consult Include/object.h in the Python distribution for an up-to-date list of fields. Some new Python releases may also require that types written to work with earlier releases be recompiled to pick up descriptor changes. As always, see Python's extension manuals and its full source code distribution for more information and examples.

^[6] Interestingly, Python has gotten much faster since this book's first edition, relative to C. Back then, the C type was still almost three times faster than the best Python stack (tuples) when no indexing was performed. Today, it's almost a draw. One might infer from this that C migrations have become a third as important as they once were.

^[7] For a more direct comparison, you could translate the stack type in Example 19-15 to a C++ class too, but that yields much more C++ code than I care to show in this Python book. Moreover, such a translation would sacrifice the type's operator overloading features (SWIG does not currently map C++ operator overloads).

^[8] While I wrote this, Guido suggested a few times that a future Python release may merge the ideas of Python classes and C types more closely, and may even be rewritten in C++ to ease C++ integration in general. If and when that happens, it's possible that SWIG may use C types to wrap C++ classes, instead of the current accessor functions + Python class approach. Or not. Watch http://www.swig.org for more recent developments beyond the details presented in this book.