24.1 Extending Python with Python's C API
A Python extension module named
x resides in a dynamic library with the
same filename (x.pyd on Windows,
x.so on most Unix-like platforms) in an
appropriate directory (normally the
site-packages subdirectory of the Python library
directory). You generally build the x
extension module from a C source file x.c with
the overall structure:
#include <Python.h>

/* omitted: the body of the x module */

void
initx(void)
{
    /* omitted: the code that initializes the module named x */
}
When you have built and installed the extension module, a Python
statement import x
loads the dynamic library, then locates and calls the function named
initx, which must do
all that is needed to initialize the module object named
x.
24.1.1 Building and Installing C-Coded Python Extensions
To build and install a C-coded
Python extension module, it's simplest and most
productive to use the distribution utilities,
distutils, covered in Chapter 26. In the same directory as
x.c, place a file named
setup.py that contains at least the following
statements:
from distutils.core import setup, Extension
setup(name='x', ext_modules=[ Extension('x',sources=['x.c']) ])
From a shell prompt in this directory, you can now run:
C:\> python setup.py install
to build the module and install it so that it becomes usable in your
Python installation. The distutils perform all
needed compilation and linking steps, with the right compiler and
linker commands and flags, and copy the resulting dynamic library into
an appropriate directory, which depends on your Python installation. Your
Python code can then access the resulting module with the statement
import
x.
24.1.2 Overview of C-Coded Python Extension Modules
Your C
function initx
generally has the following overall structure:
void
initx(void)
{
    PyObject* thismod = Py_InitModule3("x", x_methods, "docstring for x");
    /* optional: calls to PyModule_AddObject(thismod, "somename", someobj)
       and other Python C API calls to finish preparing module object
       thismod and its types (if any) and other objects.
    */
}
More details are covered in
Section 24.1.4
later in this chapter. x_methods is an
array of PyMethodDef structs. Each
PyMethodDef struct in the
x_methods array describes a C function
that your module x makes available to
Python code that imports x. Each such C
function has the following overall structure:
static PyObject*
func_with_named_arguments(PyObject* self, PyObject* args, PyObject* kwds)
{
    /* omitted: body of function, which accesses arguments via the Python C
       API function PyArg_ParseTupleAndKeywords, and returns a PyObject*
       result, NULL for errors */
}
or some simpler variant, such as:
static PyObject*
func_with_positional_args_only(PyObject* self, PyObject* args)
{
    /* omitted: body of function, which accesses arguments via the Python C
       API function PyArg_ParseTuple, and returns a PyObject* result,
       NULL for errors */
}
How C-coded functions access arguments passed by Python code is
covered in Section 24.1.6
later in this chapter. How such functions build Python objects is
covered in Section 24.1.7, and how they raise or propagate exceptions back to the
Python code that called them is covered in Section 24.1.8. When your module defines
new Python types (as well as or instead of Python-callable
functions), your C code defines one or more instances of struct
PyTypeObject. This subject is covered in Section 24.1.12 later in this
chapter.
A simple example that makes use of all these concepts is shown in
Section 24.1.11 later in this chapter.
A toy-level "Hello World" example
could be as simple as:
#include <Python.h>

static PyObject*
helloworld(PyObject* self)
{
    return Py_BuildValue("s", "Hello, C-coded Python extensions world!");
}

static char helloworld_docs[] =
    "helloworld( ): return a popular greeting phrase\n";

static PyMethodDef helloworld_funcs[] = {
    {"helloworld", (PyCFunction)helloworld, METH_NOARGS, helloworld_docs},
    {NULL}
};

void
inithelloworld(void)
{
    Py_InitModule3("helloworld", helloworld_funcs,
                   "Toy-level extension module");
}
Save this as helloworld.c, and build it through
a setup.py script with
distutils. After you have run python
setup.py install, you can use the newly installed module,
for example from a Python interactive session, such as:
>>> import helloworld
>>> print helloworld.helloworld( )
Hello, C-coded Python extensions world!
>>>
24.1.3 Return Values of Python's C API Functions
All functions in the Python
C API return either an int or a
PyObject*. Most functions returning
int return 0 in case of
success, and -1 to indicate errors. Some functions
return results that are true or false: those functions return
0 to indicate false and an integer not equal to
0 to indicate true, and never indicate errors.
Functions returning PyObject* return
NULL in case of errors. See
"Exceptions" later in this chapter
for more details on how C-coded functions handle and raise errors.
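These return conventions can be observed without compiling anything. The
sketch below assumes a current CPython 3 interpreter and uses the standard
ctypes module (ctypes.pythonapi), which is not part of this chapter's
toolset, only a convenient way to call three C API functions from Python.
The class Holder is hypothetical. Note that ctypes itself turns an error
return into a Python exception, so the NULL result shows up as an
AttributeError.

```python
import ctypes

api = ctypes.pythonapi  # assumption: CPython, whose C API ctypes exposes

# Declare the C signatures so that ctypes converts arguments correctly.
api.PyObject_SetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p,
                                       ctypes.py_object]
api.PyObject_SetAttrString.restype = ctypes.c_int
api.PyObject_HasAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_HasAttrString.restype = ctypes.c_int
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object


class Holder(object):
    """Hypothetical class used only as a target for attribute calls."""


holder = Holder()
set_status = api.PyObject_SetAttrString(holder, b"answer", 42)  # 0: success
has_answer = api.PyObject_HasAttrString(holder, b"answer")      # nonzero: true
has_other = api.PyObject_HasAttrString(holder, b"other")        # 0: false
try:
    api.PyObject_GetAttrString(holder, b"other")  # the C function returns NULL
    raised = None
except AttributeError as exc:
    raised = type(exc).__name__
```

In C code you would test the int or pointer result yourself; here ctypes
performs that check after each call.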
24.1.4 Module Initialization
Function
initx must contain, at
a minimum, a call to one of the module initialization functions
supplied by the C API. You can always use the
Py_InitModule3 function.
PyObject* Py_InitModule3(char* name,PyMethodDef* methods,char* doc)
name is the C string name of the module
you are initializing (e.g., "name").
methods is an array of
PyMethodDef structures, covered next in this
chapter. doc is the C string that becomes
the docstring of the module. Py_InitModule3
returns a PyObject* that is a borrowed reference
to the new module object, as covered in Section 24.1.5 later in this chapter. In
practice, this means that you can ignore the return value if you need
to perform no more initialization operations on this module.
Otherwise, assign the return value to a C variable of type
PyObject* and continue initialization.
Py_InitModule3 initializes the module object to
contain the functions described in table
methods. Further initialization, if any,
may add other module attributes, and is generally best performed with
calls to the following convenience functions.
int PyModule_AddIntConstant(PyObject* module,char* name,int value)
Adds to module module an attribute named
name with integer value
value.
int PyModule_AddObject(PyObject* module,char* name,PyObject* value)
Adds to module module an attribute named
name with value
value and steals a reference to value, as
covered in Section 24.1.5.
int PyModule_AddStringConstant(PyObject* module,char* name,char* value)
Adds to module module an attribute named
name with string value
value.
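As a rough illustration of what these convenience functions do to a module
object, the sketch below calls two of them through ctypes.pythonapi on a
current CPython 3 interpreter (an assumption; the chapter itself calls them
from C inside initx). The module demo and the attribute names are
hypothetical. In current CPython the integer argument is declared as a C
long, which is why c_long is used. PyModule_AddObject is left out because
it steals a reference, which is awkward to demonstrate safely from Python.

```python
import ctypes
import types

api = ctypes.pythonapi  # assumption: CPython with the ctypes module available

api.PyModule_AddIntConstant.argtypes = [ctypes.py_object, ctypes.c_char_p,
                                        ctypes.c_long]
api.PyModule_AddIntConstant.restype = ctypes.c_int
api.PyModule_AddStringConstant.argtypes = [ctypes.py_object, ctypes.c_char_p,
                                           ctypes.c_char_p]
api.PyModule_AddStringConstant.restype = ctypes.c_int

# Hypothetical stand-in for the module object that initx would be preparing.
demo = types.ModuleType("demo")
int_status = api.PyModule_AddIntConstant(demo, b"MAX_ITEMS", 64)
str_status = api.PyModule_AddStringConstant(demo, b"AUTHOR", b"someone")
```

Both calls return 0 on success, and the new attributes are then visible as
demo.MAX_ITEMS and demo.AUTHOR.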
Some module initialization operations may be conveniently performed
by executing Python code with PyRun_String,
covered later in Section 24.3.4, with
the module's dictionary as both the
globals and
locals argument. If you find yourself
using PyRun_String extensively, rather than just
as an occasional convenience, consider the possibility of splitting
your extension module in two: a C-coded extension module offering
raw, fast functionality, and a Python module wrapping the C-coded
extension to provide further convenience and handy utilities.
When you do need to get a module's dictionary, use
the PyModule_GetDict function.
PyObject* PyModule_GetDict(PyObject* module)
Returns a borrowed reference to the dictionary of module
module. You should not use
PyModule_GetDict for the specific tasks supported
by the PyModule_Add functions covered earlier in
this section; I suggest using PyModule_GetDict
only for such purposes as supporting the use of
PyRun_String.
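The effect of running Python source with a module's dictionary as both
globals and locals can be sketched at the Python level. Here the built-in
exec stands in for PyRun_String (an assumption made for illustration; the C
call takes further arguments not shown here), and the module object and the
source text are hypothetical.

```python
import types

# Hypothetical module object; in C this would be the result of Py_InitModule3.
mod = types.ModuleType("x")

# Source text that adds a helper function and a constant to the module.
source = (
    "def double(n):\n"
    "    return 2 * n\n"
    "VERSION = '0.1'\n"
)

# Using the module's own dictionary for globals and locals means every name
# the code binds becomes an attribute of the module.
exec(source, mod.__dict__, mod.__dict__)

doubled = mod.double(21)
version = mod.VERSION
```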
If you need to access another module, you can import it by calling
the PyImport_Import function.
PyObject* PyImport_Import(PyObject* name)
Imports the module named in Python string object
name and returns a new reference to the
module object, like Python's __import__(name).
PyImport_Import is the highest-level, simplest,
and most often used way to import a module.
Beware, in particular, of using function
PyImport_ImportModule, which may often look more
convenient because it accepts a char* argument.
PyImport_ImportModule operates on a lower level,
bypassing any import hooks that may be in force, so extensions that
use it will be far harder to incorporate in packages such as those
built by tools py2exe and
Installer, covered in Chapter 26. Therefore, always do your importing by
calling PyImport_Import, unless you have very
specific needs and know exactly what you're doing.
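A quick way to see PyImport_Import at work is to call it through
ctypes.pythonapi, assuming a current CPython 3 interpreter. The choice of
the standard module string is arbitrary. Because the function returns a new
reference, declaring the result type as ctypes.py_object lets ctypes take
ownership of that reference.

```python
import ctypes
import sys

api = ctypes.pythonapi  # assumption: CPython with the ctypes module available

api.PyImport_Import.argtypes = [ctypes.py_object]  # a Python string object
api.PyImport_Import.restype = ctypes.py_object     # new reference to the module

imported = api.PyImport_Import("string")
same_as_sys_modules = imported is sys.modules["string"]
```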
To add functions to a module (or non-special methods to new types, as
covered later in Section 24.1.12), you
must describe the functions or methods in an array of
PyMethodDef structures, and terminate the array
with a sentinel (i.e., a structure whose fields
are all 0 or NULL).
PyMethodDef is defined as follows:
typedef struct {
    char* ml_name;        /* Python name of function or method */
    PyCFunction ml_meth;  /* pointer to C function impl */
    int ml_flags;         /* flag describing how to pass arguments */
    char* ml_doc;         /* docstring for the function or method */
} PyMethodDef;
You must cast the second field to (PyCFunction) unless the C function's
signature is exactly PyObject* function(PyObject* self, PyObject* args),
which is the typedef for PyCFunction. This signature is correct when
ml_flags is METH_O, meaning a function that accepts a single argument,
or METH_VARARGS, meaning a function that accepts positional arguments.
For METH_O, args is the only argument. For METH_VARARGS, args is a
tuple of all arguments, to be parsed with the C API function
PyArg_ParseTuple. However, ml_flags can also be METH_NOARGS, meaning a
function that accepts no arguments, or METH_KEYWORDS, meaning a function
that accepts both positional and named arguments. For METH_NOARGS, the
signature is PyObject* function(PyObject* self), without arguments. For
METH_KEYWORDS, the signature is:
PyObject* function(PyObject* self, PyObject* args, PyObject* kwds)
args is the tuple of positional arguments,
and kwds the dictionary of named
arguments. args and
kwds are parsed together with the C API
function PyArg_ParseTupleAndKeywords.
When a C-coded function implements a module's
function, the self parameter of the C function is
always NULL for any value of the
ml_flags field. When a C-coded function implements
a non-special method of an extension type, the
self parameter points to the instance on which the
method is being called.
24.1.5 Reference Counting
Python
objects live on the heap, and C code sees them via
PyObject*. Each PyObject counts
how many references to itself are outstanding, and destroys itself
when the number of references goes down to 0. To
make this possible, your code must use Python-supplied macros:
Py_INCREF to add a reference to a Python object,
and Py_DECREF to abandon a reference to a Python
object. The Py_XINCREF and
Py_XDECREF macros are like
Py_INCREF and Py_DECREF, but
you may also use them innocuously on a null pointer. The test for a
non-null pointer is implicitly performed inside the
Py_XINCREF and Py_XDECREF
macros, which saves you from needing to write out that test
explicitly.
A PyObject* p, which
your code receives by calling or being called by other functions, is
known as a new reference if the code that
supplies p has already called
Py_INCREF on your behalf. Otherwise, it is called
a borrowed reference. Your
code is said to own new references it holds, but
not borrowed ones. You can call Py_INCREF on a
borrowed reference to make it into a reference that you own; you must
do this if you need to use the reference across calls to code that
might cause the count of the reference you borrowed to be
decremented. You must always call Py_DECREF before
abandoning or overwriting references that you own, but never on
references you don't own. Therefore, understanding
which interactions transfer reference ownership and which ones rely
on reference borrowing is absolutely crucial. For most functions in
the C API, and for all functions that you write and Python calls, the
following general rules apply:
PyObject* arguments are borrowed references
A PyObject* returned as the
function's result transfers ownership
For each of the two rules, there are occasional exceptions.
PyList_SetItem and
PyTuple_SetItem steal a
reference to the item they are setting (but not to the list or tuple
object into which they're setting it). So do the
faster versions of these two functions that exist as C preprocessor
macros, PyList_SET_ITEM and
PyTuple_SET_ITEM. So does
PyModule_AddObject, covered earlier in this
chapter. There are no other exceptions to the first rule. The
rationale for these exceptions, which may help you remember them, is
that the object you're setting is most often one you
created for the purpose, so the reference-stealing semantics save you
from having to call Py_DECREF immediately
afterward.
The second rule has more exceptions than the first one: there are
several cases in which the returned PyObject* is a
borrowed reference rather than a new reference. The abstract
functions, whose names begin with PyObject_,
PySequence_, PyMapping_, and
PyNumber_, return new references. This is because
you can call them on objects of many types, and there might not be
any other reference to the resulting object that they return (i.e.,
the returned object might be created on the fly). The concrete
functions, whose names begin with PyList_,
PyTuple_, PyDict_, and so on,
return a borrowed reference when the semantics of the object they
return ensure that there must be some other reference to the returned
object somewhere.
In this chapter, I indicate all cases of exceptions to these rules
(i.e., the return of borrowed references and the rare cases of
reference stealing from arguments) regarding all functions that I
cover. When I don't explicitly mention a function as
being an exception, it means that the function follows the rules: its
PyObject* arguments, if any, are borrowed
references, and its PyObject* result, if any, is a
new reference.
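The counting itself can be watched from Python. The sketch below assumes
CPython (other implementations manage memory differently) and uses
sys.getrefcount, comparing counts only as differences because the call
itself temporarily holds a reference. The class Payload is hypothetical;
the list plays the role of a container that owns a reference to its item.

```python
import sys
import weakref


class Payload(object):
    """Hypothetical class; instances can be weakly referenced."""


obj = Payload()
probe = weakref.ref(obj)       # a weak reference does not own the object
base = sys.getrefcount(obj)

holder = [obj]                 # the list now owns one more reference
after_add = sys.getrefcount(obj) - base

del holder[0]                  # the list gives that reference up again
after_drop = sys.getrefcount(obj) - base

del obj                        # the last owned reference goes away
destroyed = probe() is None    # CPython destroys the object at count 0
```

In C, the two list operations correspond to a Py_INCREF when the container
stores the item and a Py_DECREF when it lets go of it.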
24.1.6 Accessing Arguments
A
function that has ml_flags in its
PyMethodDef set to METH_NOARGS
is called from Python with no arguments. The corresponding C function
has a signature with only one argument,
self. When ml_flags is
METH_O, Python code must call the function with
one argument. The C function's second argument is a
borrowed reference to the object that the Python caller passes as the
argument's value.
When ml_flags is METH_VARARGS,
Python code can call the function with any number of positional
arguments, which are collected as a tuple. The C
function's second argument is a borrowed reference
to the tuple. Your C code can then call the
PyArg_ParseTuple function.
int PyArg_ParseTuple(PyObject* tuple,char* format,...)
Returns 0 for errors, a value not equal to
0 for success. tuple is
the PyObject* that was the C
function's second argument.
format is a C string that describes
mandatory and optional arguments. The following arguments of
PyArg_ParseTuple are the addresses of the C
variables in which to put the values extracted from the tuple. Any
PyObject* variables among the C variables are
borrowed references. Table 24-1 lists the commonly
used code strings, of which zero or more are joined to form string
format.
Table 24-1. Format codes for PyArg_ParseTuple
Code    C type               Meaning
c       char                 A Python string of length 1 becomes a C char
d       double               A Python float becomes a C double
D       Py_Complex           A Python complex becomes a C Py_Complex
f       float                A Python float becomes a C float
i       int                  A Python int becomes a C int
l       long                 A Python int becomes a C long
L       long long            A Python int becomes a C long long (or __int64 on Windows)
O       PyObject*            Gets non-NULL borrowed reference to a Python argument
O!      type + PyObject*     Like code O, plus type checking or TypeError (see below)
O&      convert + void*      Arbitrary conversion (see below)
s       char*                Python string without embedded nulls to C char*
s#      char* + int          Any Python string to C address and length
t#      char* + int          Read-only single-segment buffer to C address and length
u       Py_UNICODE*          Python Unicode without embedded nulls to C (UTF-16)
u#      Py_UNICODE* + int    Any Python Unicode to C (UTF-16) address and length
w#      char* + int          Read-write single-segment buffer to C address and length
z       char*                Like code s, also accepts None (sets C's char* to NULL)
z#      char* + int          Like code s#, also accepts None (sets C's char* to NULL)
(...)   as per ...           A Python sequence is treated as one argument per item
|                            The following arguments are optional
:                            Format finished, followed by function name for error messages
;                            Format finished, followed by entire error message text
Code formats d to L accept
numeric arguments from Python. Python coerces the corresponding
values. For example, a code of i can correspond to
a Python float—the fractional part gets
truncated, as if built-in function int had been
called. Py_Complex is a C struct with two fields
named real and imag, both of
type double.
O is the most general format code and accepts any
argument, which you can later check and/or convert as needed. Variant
O! corresponds to two arguments in the variable
arguments: first the address of a Python type object, then the
address of a PyObject*. O!
checks that the corresponding value belongs to the given type (or any
subtype of that type) before setting the PyObject*
to point to the value. Variant O& also
corresponds to two arguments in the variable arguments: first the
address of a converter function you coded, then a
void* (i.e., any address at all). The converter
function must have signature int
convert(PyObject*,
void*). Python calls your conversion function with
the value passed from Python as the first argument and the
void* from the variable arguments as the second
argument. The conversion function must either return
0 and raise an exception (as covered in Section 24.1.8 later in this chapter) to
indicate an error, or return 1 and store whatever
is appropriate via the void* it gets.
Code format s accepts a string from Python and the
address of a char* (i.e., a
char**) among the variable arguments. It changes
the char* to point at the
string's buffer, which your C code must then treat
as a read-only, null-terminated array of chars
(i.e., a typical C string; however, your code must not modify it).
The Python string must contain no embedded null characters.
s# is similar, but corresponds to two arguments
among the variable arguments: first the address of a
char*, then the address of an
int to set to the string's
length. The Python string can contain embedded nulls, and therefore
so can the buffer to which the char* is set to
point. u and u# are similar,
but accept any Unicode string, and the C-side pointers must be
Py_UNICODE* rather than char*.
Py_UNICODE is a macro defined in
Python.h, and corresponds to the type of a
Python Unicode character in the implementation (this is often, but
not always, the same as a wchar_t in C).
t# and w# are similar to
s#, but the corresponding Python argument can be
any object of a type that respects the buffer protocol, respectively
read-only and read-write. Strings are a typical example of read-only
buffers. mmap and array
instances are typical examples of read-write buffers, and they are
also acceptable where a read-only buffer is required (i.e., for a
t#).
When one of the arguments is a Python sequence of known length, you
can use format codes for each of its items, and corresponding C
addresses among the variable arguments, by grouping the format codes
in parentheses. For example, code (ii) corresponds
to a Python sequence of two numbers, and, among the remaining
arguments, corresponds to two addresses of ints.
The format string may include a vertical bar (|)
to indicate that all following arguments are optional. You must
initialize the C variables, whose addresses you pass among the
variable arguments for later arguments, to suitable default values
before you call PyArg_ParseTuple.
PyArg_ParseTuple does not change the C variables
corresponding to optional arguments that were not passed in a given
call from Python to your C-coded function.
The format string may optionally end with
:name to indicate that
name must be used as the function name if
any error messages are needed. Alternatively, the format string may
end with ;text to
indicate that text must be used as the
entire error message if PyArg_ParseTuple detects
errors (this is rarely used).
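The handling of mandatory and optional arguments can be modeled in a few
lines of Python. The function parse_tuple below is a hypothetical toy, not
the real parser: it understands only codes i, d, and O, the | marker, and
a trailing :name, and it returns a list instead of writing through C
addresses. The list passed as variables stands in for C variables that the
caller initialized to default values before the call.

```python
def parse_tuple(args, fmt, variables):
    """Toy model of PyArg_ParseTuple for codes i, d, O plus '|' and ':name'."""
    codes, _, name = fmt.partition(":")
    required, _, optional = codes.partition("|")
    all_codes = required + optional
    low, high = len(required), len(all_codes)
    if not low <= len(args) <= high:
        raise TypeError("%s() takes %d to %d arguments (%d given)"
                        % (name or "function", low, high, len(args)))
    convert = {"i": int, "d": float, "O": lambda value: value}
    result = list(variables)   # stands in for the pre-initialized C variables
    for slot, value in enumerate(args):
        result[slot] = convert[all_codes[slot]](value)
    return result              # slots for unpassed optional arguments are untouched


only_required = parse_tuple((5,), "i|ii:span", [0, 10, 1])
with_optional = parse_tuple((5, 7.9), "i|ii:span", [0, 10, 1])
try:
    parse_tuple((), "i|ii:span", [0, 10, 1])
    message = None
except TypeError as exc:
    message = str(exc)
```

The second call also mirrors the coercion described above for code i: the
float 7.9 is truncated as if int had been called.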
A function that has ml_flags in its
PyMethodDef set to
METH_KEYWORDS accepts positional and keyword
arguments. Python code calls the function with any number of
positional arguments, which get collected as a tuple, and keyword
arguments, which get collected as a dictionary. The C
function's second argument is a borrowed reference
to the tuple, and the third one is a borrowed reference to the
dictionary. Your C code then calls the
PyArg_ParseTupleAndKeywords function.
int PyArg_ParseTupleAndKeywords(PyObject* tuple,PyObject* dict,
    char* format,char** kwlist,...)
Returns 0 for errors, a value not equal to
0 for success. tuple is
the PyObject* that was the C
function's second argument.
dict is the PyObject*
that was the C function's third argument.
format is like for
PyArg_ParseTuple, except that it cannot include
the (...) format code to parse nested sequences.
kwlist is an array of
char* terminated by a NULL
sentinel, with the names of the parameters, one after the other. For
example, the following C code:
static PyObject*
func_c(PyObject* self, PyObject* args, PyObject* kwds)
{
    static char* argnames[] = {"x", "y", "z", NULL};
    double x, y=0.0, z=0.0;
    if(!PyArg_ParseTupleAndKeywords(
            args,kwds,"d|dd",argnames,&x,&y,&z))
        return NULL;
    /* rest of function snipped */
}
is roughly equivalent to this Python code:
def func_py(x, y=0.0, z=0.0):
    x, y, z = map(float, (x,y,z))
    # rest of function snipped
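To make the calling behavior concrete, the Python equivalent can be
completed into something runnable. Returning the three converted values is
an assumption added here so that the calls have a visible result; the
original leaves the rest of the function out.

```python
def func_py(x, y=0.0, z=0.0):
    # Same conversion as the "d|dd" format: every argument becomes a float.
    x, y, z = map(float, (x, y, z))
    return x, y, z   # assumption: the snipped body is replaced by a return


positional = func_py(1, 2)      # y passed positionally, z keeps its default
by_keyword = func_py(1, z=3)    # z passed by name, y keeps its default
try:
    func_py()                   # x is mandatory, like the code before '|'
    missing_rejected = False
except TypeError:
    missing_rejected = True
```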
24.1.7 Creating Python Values
C functions that communicate with Python
must often build Python values, both to return as their
PyObject* result and for other purposes, such as
setting items and attributes. The simplest and handiest way to build
a Python value is most often with the
Py_BuildValue function.
PyObject* Py_BuildValue(char* format,...)
format is a C string that describes the
Python object to build. The following arguments of
Py_BuildValue are C values from which the result
is built. The PyObject* result is a new reference.
Table 24-2 lists the commonly used code strings, of
which zero or more are joined into string
format. Py_BuildValue
builds and returns a tuple if format
contains two or more format codes, or if
format begins with (
and ends with ). Otherwise, the result is not a
tuple. When you pass buffers, as for example in the case of format
code s#, Py_BuildValue copies
the data. You can therefore modify, abandon, or free(
) your original copy of the data after
Py_BuildValue returns.
Py_BuildValue always returns a new reference
(except for format code N). Called with an empty
format,
Py_BuildValue("") returns a new reference to
None.
Table 24-2. Format codes for Py_BuildValue
Code    C type               Meaning
c       char                 A C char becomes a Python string of length 1
d       double               A C double becomes a Python float
D       Py_Complex           A C Py_Complex becomes a Python complex
i       int                  A C int becomes a Python int
l       long                 A C long becomes a Python int
N       PyObject*            Passes a Python object and steals a reference
O       PyObject*            Passes a Python object and INCREFs it as per normal rules
O&      convert + void*      Arbitrary conversion (see below)
s       char*                C null-terminated char* to Python string, or NULL to None
s#      char* + int          C char* and length to Python string, or NULL to None
u       Py_UNICODE*          C wide (UCS-2) null-terminated string to Python Unicode, or NULL to None
u#      Py_UNICODE* + int    C wide (UCS-2) string and length to Python Unicode, or NULL to None
(...)   as per ...           Build Python tuple from C values
[...]   as per ...           Build Python list from C values
{...}   as per ...           Build Python dictionary from C values, alternating keys and values
                             (must be an even number of C values)
Code O& corresponds to two arguments among the
variable arguments: first the address of a converter function you
code, then a void* (i.e., any address at all). The
converter function must have signature PyObject*
convert(void*). Python
calls the conversion function with the void* from
the variable arguments as the only argument. The conversion function
must either return NULL and raise an exception (as
covered in Section 24.1.8 later in
this chapter) to indicate an error, or return a new reference
PyObject* built from the data in the
void*.
Code {...} builds dictionaries from an even number
of C values, alternately keys and values. For example,
Py_BuildValue("{issi}",23,"zig","zag",42) returns
a dictionary like Python's
{23:'zig','zag':42}.
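How a format string maps onto the shape of the result can be modeled in
pure Python. The function build_value below is a hypothetical toy covering
only codes i, d, and s and the three bracket forms; it is not the real
Py_BuildValue and ignores, for example, the NULL-to-None behavior of s.

```python
def build_value(fmt, *values):
    """Toy model of Py_BuildValue for codes i, d, s and (), [], {}."""
    args = iter(values)
    convert = {"i": int, "d": float, "s": str}

    def parse(pos, closer):
        items = []
        while pos < len(fmt) and fmt[pos] != closer:
            ch = fmt[pos]
            if ch in convert:
                items.append(convert[ch](next(args)))
                pos += 1
            elif ch == "(":
                inner, pos = parse(pos + 1, ")")
                items.append(tuple(inner))
            elif ch == "[":
                inner, pos = parse(pos + 1, "]")
                items.append(list(inner))
            elif ch == "{":
                inner, pos = parse(pos + 1, "}")
                # Alternate keys and values, as the {...} code requires.
                items.append(dict(zip(inner[0::2], inner[1::2])))
            else:
                raise ValueError("unsupported format code: %r" % ch)
        return items, pos + 1   # step past the closing bracket

    items, _ = parse(0, None)
    if not items:
        return None             # empty format gives None
    return items[0] if len(items) == 1 else tuple(items)


as_dict = build_value("{issi}", 23, "zig", "zag", 42)
as_tuple = build_value("ii", 1, 2)     # two codes: the result is a tuple
as_single = build_value("i", 7)        # one code: the result is not a tuple
as_one_tuple = build_value("(i)", 5)   # parentheses force a tuple
```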
Note the important difference between codes N and
O. N steals a reference from
the PyObject* corresponding value among the
variable arguments, so it's convenient when
you're building an object including a reference you
own that you would otherwise have to Py_DECREF.
O does no reference stealing, so
it's appropriate when you're
building an object including a reference you don't
own, or a reference you must also keep elsewhere.
24.1.8 Exceptions
To propagate exceptions raised from
other functions you call, return NULL as the
PyObject* result from your C function. To raise
your own exceptions, set the current-exception indicator and return
NULL. Python's built-in exception
classes (covered in Chapter 6) are globally
available, with names starting with PyExc_, such
as PyExc_AttributeError,
PyExc_KeyError, and so on. Your extension module
can also supply and use its own exception classes. The most commonly
used C API functions related to raising exceptions are the following.
PyObject* PyErr_Format(PyObject* type,char* format,...)
Raises an exception of class
type, a built-in such as
PyExc_IndexError, or an exception class created
with PyErr_NewException. Builds the associated
value from format string format, which has
syntax similar to printf's, and
the following C values indicated as variable arguments above. Returns
NULL, so your code can just call:
return PyErr_Format(PyExc_KeyError,
"Unknown key name (%s)", thekeystring);
PyObject* PyErr_NewException(char* name,PyObject* base,PyObject* dict)
Subclasses exception class base, with
extra class attributes and methods from dictionary
dict (normally NULL,
meaning no extra class attributes or methods), creating a new
exception class named name (string
name must be of the form
"modulename.classname")
and returning a new reference to the new class object. When
base is NULL, uses
PyExc_Exception as the base class. You normally
call this function during initialization of a module object
module. For example:
PyModule_AddObject(module, "error",
PyErr_NewException("mymod.error", NULL, NULL));
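What the new class looks like from Python can be checked by calling
PyErr_NewException through ctypes.pythonapi, assuming a current CPython 3
interpreter. The last two parameters are declared as c_void_p here purely
so that Python's None is passed as a C NULL; in C they are PyObject*
values. The name mymod.error is the one used above.

```python
import ctypes

api = ctypes.pythonapi  # assumption: CPython with the ctypes module available

# c_void_p for base and dict so that None is converted to a C NULL.
api.PyErr_NewException.argtypes = [ctypes.c_char_p, ctypes.c_void_p,
                                   ctypes.c_void_p]
api.PyErr_NewException.restype = ctypes.py_object  # new reference to the class

error = api.PyErr_NewException(b"mymod.error", None, None)

try:
    raise error("boom")
except Exception as exc:   # a NULL base means the class derives from Exception
    caught_args = exc.args
```

The part of the name before the dot becomes the class's __module__, and the
part after it becomes its __name__.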
PyObject* PyErr_NoMemory( )
Raises an out-of-memory error and returns
NULL, so your code can just call:
return PyErr_NoMemory( );
void PyErr_SetObject(PyObject* type,PyObject* value)
Raises an exception of class type, a
built-in such as PyExc_KeyError, or an exception
class created with PyErr_NewException, with
value as the associated value (a borrowed
reference). PyErr_SetObject is a
void function (i.e., returns no value).
PyObject* PyErr_SetFromErrno(PyObject* type)
Raises an exception of class type, a
built-in such as PyExc_OSError, or an exception
class created with PyErr_NewException. Takes all
details from global variable errno, which C
library functions and system calls set for many error cases, and the
standard C library function strerror. Returns
NULL, so your code can just call:
return PyErr_SetFromErrno(PyExc_IOError);
PyObject* PyErr_SetFromErrnoWithFilename(PyObject* type,char* filename)
Like PyErr_SetFromErrno, but also provides string
filename as part of the
exception's value. When
filename is NULL, works
like PyErr_SetFromErrno.
Your C code may want to deal with an exception and continue, as a
try/except statement would let
you do in Python code. The most commonly used C API functions related
to catching exceptions are the following.
void PyErr_Clear( )
Clears the error indicator. Innocuous if no
error is pending.
int PyErr_ExceptionMatches(PyObject* type)
Call only when an error is pending, or the
whole program might crash. Returns a value not equal to
0 when the pending exception is an instance of the
given type or any subclass of
type, or 0 when the
pending exception is not such an instance.
PyObject* PyErr_Occurred( )
Returns NULL if no error is pending, otherwise a
borrowed reference to the type of the pending exception.
(Don't use the returned value; call
PyErr_ExceptionMatches instead, in order to catch
exceptions of subclasses as well, as is normal and expected.)
void PyErr_Print( )
Call only when an error is pending, or the whole program might crash.
Outputs a standard traceback to sys.stderr, then
clears the error indicator.
If you need to process errors in highly sophisticated ways, study
other error-related functions of the C API, such as
PyErr_Fetch, PyErr_NormalizeException,
PyErr_GivenExceptionMatches, and
PyErr_Restore. However, I do not cover such
advanced and rarely needed possibilities in this book.
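The subclass-aware matching that makes PyErr_ExceptionMatches preferable to
comparing the result of PyErr_Occurred can be tried out with
PyErr_GivenExceptionMatches, which takes the exception to test as an
explicit argument and so needs no pending error. The sketch assumes a
current CPython 3 interpreter and calls the function through
ctypes.pythonapi.

```python
import ctypes

api = ctypes.pythonapi  # assumption: CPython with the ctypes module available

api.PyErr_GivenExceptionMatches.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyErr_GivenExceptionMatches.restype = ctypes.c_int

# KeyError is a subclass of LookupError, so it matches the base class.
subclass_matches = api.PyErr_GivenExceptionMatches(KeyError, LookupError)
# An exception instance is matched through its class.
instance_matches = api.PyErr_GivenExceptionMatches(KeyError("k"), LookupError)
# Unrelated classes do not match.
unrelated = api.PyErr_GivenExceptionMatches(KeyError, ValueError)
```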
24.1.9 Abstract Layer Functions
The code for a C extension typically
needs to use some Python functionality. For example, your code may
need to examine or set attributes and items of Python objects, call
Python-coded and built-in functions and methods, and so on. In most
cases, the best approach is for your code to call functions from the
abstract layer of Python's C API. These are
functions that you can call on any Python object (functions whose
names start with PyObject_), or any object within
a wide category, such as mappings, numbers, or sequences (with names
respectively starting with PyMapping_,
PyNumber_, and PySequence_).
Some of the functions callable on objects within these categories
duplicate functionality that is also available from
PyObject_ functions; in these cases, you should
use the PyObject_ function instead. I
don't cover such redundant functions in this book.
Functions in the abstract layer raise Python exceptions if you call
them on objects to which they are not applicable. All of these
functions accept borrowed references for PyObject*
arguments, and return a new reference (NULL for an
exception) if they return a PyObject* result.
The most frequently used abstract layer functions are the following.
int PyCallable_Check(PyObject* x)
True if x is callable, like
Python's
callable(x).
PyObject* PyEval_CallObject(PyObject* x,PyObject* args)
Calls callable Python object x with the
positional arguments held in tuple args.
Returns the call's result, like
Python's return
x(*args).
PyObject* PyEval_CallObjectWithKeywords(PyObject* x,PyObject* args,PyObject* kwds)
Calls callable Python object x with the
positional arguments held in tuple args
and the named arguments held in dictionary
kwds. Returns the call's
result, like Python's return
x(*args,**kwds).
int PyIter_Check(PyObject* x)
True if x supports the iterator protocol
(i.e., if x is an iterator).
PyObject* PyIter_Next(PyObject* x)
Returns the next item from iterator x.
Returns NULL without raising any exception if
x's iteration is finished
(i.e., when Python's
x.next( ) raises
StopIteration).
int PyNumber_Check(PyObject* x)
True if x supports the number protocol
(i.e., if x is a number).
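The two check functions covered so far that take any object and never
report errors are easy to try from Python. The sketch assumes a current
CPython 3 interpreter and calls them through ctypes.pythonapi; the sample
objects are arbitrary, and the results are converted with bool because only
zero versus nonzero is meaningful.

```python
import ctypes

api = ctypes.pythonapi  # assumption: CPython with the ctypes module available

for checker in (api.PyCallable_Check, api.PyNumber_Check):
    checker.argtypes = [ctypes.py_object]
    checker.restype = ctypes.c_int   # nonzero for true, 0 for false

# A built-in function and a class are callable; a string and an int are not.
callable_flags = [bool(api.PyCallable_Check(obj))
                  for obj in (len, str, "text", 42)]

# Ints and floats support the number protocol; strings and lists do not.
number_flags = [bool(api.PyNumber_Check(obj))
                for obj in (42, 3.5, "3.5", [1])]
```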
PyObject* PyObject_CallFunction(PyObject* x,char* format,...)
Calls the callable Python object x with
positional arguments described by format string
format, using the same format codes as
Py_BuildValue, covered earlier. When
format is NULL, calls
x with no arguments. Returns the
call's result.
PyObject* PyObject_CallMethod(PyObject* x,char* method,char* format,...)
Calls the method named method of Python
object x with positional arguments
described by format string format, using
the same format codes as Py_BuildValue. When
format is NULL, calls
the method with no arguments. Returns the call's
result.
int PyObject_Cmp(PyObject* x1,PyObject* x2,int* result)
|
|
Compares objects x1 and
x2 and places the result
(-1, 0, or
1) in
*result, like
Python's
result=cmp(x1,x2).
int PyObject_DelAttrString(PyObject* x,char* name)
|
|
Deletes x's attribute
named name, like Python's
del
x.name.
int PyObject_DelItem(PyObject* x,PyObject* key)
|
|
Deletes x's item with key
(or index) key, like
Python's del
x[key].
int PyObject_DelItemString(PyObject* x,char* key)
|
|
Deletes x's item with key
key, like Python's
del
x[key].
PyObject* PyObject_GetAttrString(PyObject* x,char* name)
|
|
Returns x's attribute
named name, like Python's
x.name.
PyObject* PyObject_GetItem(PyObject* x,PyObject* key)
|
|
Returns x's item with key
(or index) key, like
Python's
x[key].
PyObject* PyObject_GetItemString(PyObject* x,char* key)
|
|
Returns x's item with key
key, like Python's
x[key].
PyObject* PyObject_GetIter(PyObject* x)
|
|
Returns an iterator on x, like
Python's
iter(x).
int PyObject_HasAttrString(PyObject* x,char* name)
|
|
True if x has an attribute named
name, like Python's
hasattr(x,name).
int PyObject_IsTrue(PyObject* x)
|
|
True if x is true for Python, like
Python's
bool(x).
int PyObject_Length(PyObject* x)
|
|
Returns x's length, like
Python's
len(x).
PyObject* PyObject_Repr(PyObject* x)
|
|
Returns x's detailed
string representation, like Python's
repr(x).
PyObject* PyObject_RichCompare(PyObject* x,PyObject* y,int op)
|
|
Performs the comparison indicated by op
between x and
y, and returns the result as a Python
object. op can be
Py_EQ, Py_NE,
Py_LT, Py_LE,
Py_GT, or Py_GE, corresponding
to Python comparisons
x==y,
x!=y,
x<y,
x<=y,
x>y,
or
x>=y,
respectively.
int PyObject_RichCompareBool(PyObject* x,PyObject* y,int op)
|
|
Like PyObject_RichCompare, but returns
0 for false, 1 for true, and -1 if the comparison raises an exception.
int PyObject_SetAttrString(PyObject* x,char* name,PyObject* v)
|
|
Sets x's attribute named
name to v, like
Python's
x.name=v.
int PyObject_SetItem(PyObject* x,PyObject* key,PyObject *v)
|
|
Sets x's item with key
(or index) key to
v, like Python's
x[key]=v.
int PyObject_SetItemString(PyObject* x,char* key,PyObject *v)
|
|
Sets x's item with key
key to v, like
Python's
x[key]=v.
PyObject* PyObject_Str(PyObject* x)
|
|
Returns x's readable
string form, like Python's
str(x).
PyObject* PyObject_Type(PyObject* x)
|
|
Returns x's type object,
like Python's
type(x).
PyObject* PyObject_Unicode(PyObject* x)
|
|
Returns x's Unicode
string form, like Python's
unicode(x).
int PySequence_Contains(PyObject* x,PyObject* v)
|
|
True if v is an item in
x, like Python's
v in
x.
int PySequence_DelSlice(PyObject* x,int start,int stop)
|
|
Delete x's slice from
start to stop,
like Python's del
x[start:stop].
PyObject* PySequence_Fast(PyObject* x,const char* msg)
|
|
Returns a new reference to a tuple with the
same items as x, unless
x is already a list or tuple, in which case returns a new
reference to x. If x is not a sequence, raises
TypeError with msg as the error message. When you need to get many
items of an arbitrary sequence x,
it's fastest to call
t=PySequence_Fast(x,msg)
once, then call
PySequence_Fast_GET_ITEM(t,i)
as many times as needed, and finally call
Py_DECREF(t).
PyObject* PySequence_Fast_GET_ITEM(PyObject* x,int i)
|
|
Returns the item at index i
of x (as a borrowed reference), where
x must be the result of
PySequence_Fast,
x!=NULL, and
0<=i<PySequence_Fast_GET_SIZE(x).
Violating these conditions can cause program crashes: this approach
is optimized for speed, not for safety.
int PySequence_Fast_GET_SIZE(PyObject* x)
|
|
Returns the length of x.
x must be the result of
PySequence_Fast,
x!=NULL.
PyObject* PySequence_GetSlice(PyObject* x,int start,int stop)
|
|
Returns x's slice from
start to stop,
like Python's
x[start:stop].
PyObject* PySequence_List(PyObject* x)
|
|
Returns a new list object with the same items as
x, like Python's
list(x).
int PySequence_SetSlice(PyObject* x,int start,int stop,PyObject* v)
|
|
Sets x's slice from
start to stop
to v, like Python's
x[start:stop]=v.
Just as in the equivalent Python statement,
v must be a sequence of the same type as
x.
PyObject* PySequence_Tuple(PyObject* x)
|
|
Returns a new reference to a tuple with the
same items as x, like
Python's
tuple(x).
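The PySequence_Fast idiom described in the entries above can be pictured at the Python level. The following is only a rough sketch under stated assumptions: the helper names sequence_fast and total are hypothetical, and the code imitates the behavior described here rather than calling the C API.

```python
def sequence_fast(x):
    """Hypothetical Python analogue of PySequence_Fast."""
    if isinstance(x, list):
        return x              # a list comes back as-is, without copying
    return tuple(x)           # anything else is materialized once as a tuple


def total(x):
    """Sum the items of x after a single up-front conversion."""
    t = sequence_fast(x)      # plays the role of PySequence_Fast
    result = 0
    for i in range(len(t)):   # plays the role of PySequence_Fast_GET_SIZE
        result += t[i]        # plays the role of PySequence_Fast_GET_ITEM
    return result
```

The point of the idiom is that the conversion cost is paid once, after which every item access is a cheap indexing operation.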
The functions whose names start with PyNumber_
allow you to perform numeric operations. Unary
PyNumber functions, which take one argument
PyObject* x and return
a PyObject*, are listed in Table 24-3 with their Python equivalents.
Table 24-3. Unary PyNumber functions
Function             Python equivalent
PyNumber_Absolute    abs(x)
PyNumber_Float       float(x)
PyNumber_Int         int(x)
PyNumber_Invert      ~x
PyNumber_Long        long(x)
PyNumber_Negative    -x
PyNumber_Positive    +x
Binary PyNumber functions, which take two
PyObject* arguments x
and y and return a
PyObject*, are similarly listed in Table 24-4.
Table 24-4. Binary PyNumber functions
Function                Python equivalent
PyNumber_Add            x + y
PyNumber_And            x & y
PyNumber_Divide         x / y
PyNumber_Divmod         divmod(x, y)
PyNumber_FloorDivide    x // y
PyNumber_Lshift         x << y
PyNumber_Multiply       x * y
PyNumber_Or             x | y
PyNumber_Remainder      x % y
PyNumber_Rshift         x >> y
PyNumber_Subtract       x - y
PyNumber_TrueDivide     x / y (non-truncating)
PyNumber_Xor            x ^ y
All the binary PyNumber functions have in-place
equivalents whose names start with
PyNumber_InPlace, such as
PyNumber_InPlaceAdd and so on. The in-place
versions try to modify the first argument in-place, if possible, and
in any case return a new reference to the result, be it the first
argument (modified) or a new object. Python's
built-in numbers are immutable; therefore, when the first argument is
a number of a built-in type, the in-place versions work just the same
as the ordinary versions. Function PyNumber_Divmod
returns a tuple with two items (the quotient and the remainder) and
has no in-place equivalent.
There is one ternary PyNumber function,
PyNumber_Power.
PyObject* PyNumber_Power(PyObject* x,PyObject* y,PyObject* z)
|
|
When z is Py_None,
returns x raised to the
y power, like Python's
x**y
or equivalently
pow(x,y).
Otherwise, returns
x**y%z,
like Python's
pow(x,y,z).
The in-place version is named
PyNumber_InPlacePower.
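The difference between ordinary and in-place operations, and the two forms of the power function, can be observed from Python itself. The sketch below assumes a current Python version, where the standard operator module offers iadd as a counterpart of in-place addition; the variable names are purely illustrative.

```python
import operator

# A mutable first argument is modified in place and returned.
items = [1, 2]
same = operator.iadd(items, [3])       # like items += [3]

# An immutable first argument is left alone; a new object is returned.
n = 5
bigger = operator.iadd(n, 1)           # same result as n + 1

# Binary and ternary power.
square = pow(7, 2)                     # 7**2
modular = pow(7, 2, 5)                 # 7**2 % 5

# divmod returns a two-item tuple and has no in-place form.
quotient, remainder = divmod(17, 5)
```

In both iadd calls the caller uses the returned object, which mirrors the rule that the in-place C functions always return a new reference to the result.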
24.1.10 Concrete Layer Functions
Each specific type of Python built-in
object supplies concrete functions to operate on instances of that
type, with names starting with
Pytype_
(e.g., PyInt_ for functions related to Python
ints). Most such functions duplicate the
functionality of abstract-layer functions or auxiliary functions
covered earlier in this chapter, such as
Py_BuildValue, which can generate objects of many
types. In this section, I cover some frequently used functions from
the concrete layer that provide unique functionality or substantial
convenience or speed. For most types, you can check if an object
belongs to the type by calling
Pytype_Check,
which also accepts instances of subtypes, or
Pytype_CheckExact,
which accepts only instances of type, not
of subtypes. Signatures are as for function
PyIter_Check, covered earlier in this chapter.
PyObject* PyDict_GetItem(PyObject* x,PyObject* key)
|
|
Returns a borrowed reference to the item with key
key of dictionary
x.
PyObject* PyDict_GetItemString(PyObject* x,char* key)
|
|
Returns a borrowed reference to the item with key
key of dictionary
x.
int PyDict_Next(PyObject* x,int* pos,PyObject** k,PyObject** v)
|
|
Iterates over items in dictionary x. You
must initialize *pos to
0 at the start of the iteration:
PyDict_Next uses and updates
*pos to keep track of
its place. For each successful iteration step, returns
1; when there are no more items, returns
0. Updates
*k and
*v to point to the next
key and value respectively (borrowed references) at each step that
returns 1. You can pass either
k or v as
NULL if you are not interested in the key or
value. During an iteration, you must not change in any way the set of
x's keys, but you can
change x's values as long
as the set of keys remains identical.
int PyDict_Merge(PyObject* x,PyObject* y,int override)
|
|
Updates dictionary x by merging the items
of dictionary y into
x. override
determines what happens when a key k is
present in both x and
y: if override
is 0, then
x[k]
remains the same; otherwise
x[k]
is replaced by the value
y[k].
int PyDict_MergeFromSeq2(PyObject* x,PyObject* y,int override)
|
|
Like PyDict_Merge, except that
y is not a dictionary but a sequence of
sequences, where each subsequence has length 2 and is used as a
(key,value)
pair.
double PyFloat_AS_DOUBLE(PyObject* x)
|
|
Returns the C double
value of Python float
x, very fast, without error checking.
PyObject* PyList_New(int length)
|
|
Returns a new, uninitialized list of the given
length. You must then initialize the list,
typically by calling PyList_SET_ITEM
length times.
PyObject* PyList_GET_ITEM(PyObject* x,int pos)
|
|
Returns the item at index pos of list
x, without error checking.
int PyList_SET_ITEM(PyObject* x,int pos,PyObject* v)
|
|
Sets the item at index pos of list
x to v, without
error checking. Steals a reference to v.
Use only immediately after creating a new list
x with PyList_New.
char* PyString_AS_STRING(PyObject* x)
|
|
Returns a pointer to the internal buffer of string
x, very fast, without error checking. You
must not modify the buffer in any way, unless you just allocated it
by calling
PyString_FromStringAndSize(NULL,size).
int PyString_AsStringAndSize(PyObject* x,char** buffer,int* length)
|
|
Puts a pointer to the internal buffer of string
x in
*buffer, and
x's length in
*length. You must not
modify the buffer in any way, unless you just allocated it by calling
PyString_FromStringAndSize(NULL,size).
PyObject* PyString_FromFormat(char* format,...)
|
|
Returns a Python string built from format string
format, which has syntax similar to
printf's, and from the C
values that follow it as variable arguments.
PyObject* PyString_FromStringAndSize(char* data,int size)
|
|
Returns a Python string of length size,
copying size bytes from
data. When data
is NULL, the Python string is uninitialized, and
you must initialize it. You can get the pointer to the
string's internal buffer by calling
PyString_AS_STRING.
PyObject* PyTuple_New(int length)
|
|
Returns a new, uninitialized tuple of the given
length. You must then initialize the
tuple, typically by calling PyTuple_SET_ITEM
length times.
PyObject* PyTuple_GET_ITEM(PyObject* x,int pos)
|
|
Returns the item at index pos of tuple
x, without error checking.
int PyTuple_SET_ITEM(PyObject* x,int pos,PyObject* v)
|
|
Sets the item at index pos of tuple
x to v, without
error checking. Steals a reference to v.
Use only immediately after creating a new tuple
x with PyTuple_New.
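The rule given for PyDict_Next (values may change during an iteration, the set of keys may not) has a close counterpart in Python-level dictionary iteration. The sketch below assumes a current Python version, where the interpreter detects a change in the dictionary's size during iteration; C code calling PyDict_Next should not count on any such check.

```python
d = {'a': 1, 'b': 2}

# Changing values while iterating is fine: the set of keys is untouched.
for key in d:
    d[key] = d[key] * 10

# Adding a key while iterating changes the set of keys.
# Python-level iteration notices and raises RuntimeError.
try:
    for key in d:
        d['new_' + key] = 0
    changed_keys_detected = False
except RuntimeError:
    changed_keys_detected = True
```

When the set of keys must change, a common approach is to collect the changes in a separate container during the loop and apply them afterward.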
24.1.11 A Simple Extension Example
Example 24-1 exposes the functionality of Python C
API functions PyDict_Merge and
PyDict_MergeFromSeq2 for Python use. The
update method of dictionaries works like
PyDict_Merge with
override=1, but Example 24-1 is more general.
Example 24-1. A simple Python extension module merge.c
#include <Python.h>
static PyObject*
merge(PyObject* self, PyObject* args, PyObject* kwds)
{
    static char* argnames[] = {"x","y","override",NULL};
    PyObject *x, *y;
    int override = 0;
    if(!PyArg_ParseTupleAndKeywords(args, kwds, "O!O|i", argnames,
            &PyDict_Type, &x, &y, &override))
        return NULL;
    /* first try y as a dictionary */
    if(-1 == PyDict_Merge(x, y, override)) {
        if(!PyErr_ExceptionMatches(PyExc_TypeError))
            return NULL;
        /* not a dictionary: retry with y as a sequence of pairs */
        PyErr_Clear();
        if(-1 == PyDict_MergeFromSeq2(x, y, override))
            return NULL;
    }
    return Py_BuildValue("");
}
static char merge_docs[] = "\
merge(x,y,override=False): merge into dict x the items of dict y (or the pairs\n\
that are the items of y, if y is a sequence), with optional override.\n\
Alters dict x directly, returns None.\n\
";
static PyObject*
mergenew(PyObject* self, PyObject* args, PyObject* kwds)
{
    static char* argnames[] = {"x","y","override",NULL};
    PyObject *x, *y, *result;
    int override = 0;
    if(!PyArg_ParseTupleAndKeywords(args, kwds, "O!O|i", argnames,
            &PyDict_Type, &x, &y, &override))
        return NULL;
    /* work on a copy so that x itself is never altered */
    result = PyObject_CallMethod(x, "copy", "");
    if(!result)
        return NULL;
    if(-1 == PyDict_Merge(result, y, override)) {
        if(!PyErr_ExceptionMatches(PyExc_TypeError)) {
            Py_DECREF(result);    /* do not leak the copy on failure */
            return NULL;
        }
        PyErr_Clear();
        if(-1 == PyDict_MergeFromSeq2(result, y, override)) {
            Py_DECREF(result);    /* do not leak the copy on failure */
            return NULL;
        }
    }
    return result;
}
static char mergenew_docs[] = "\
mergenew(x,y,override=False): merge into a copy of dict x the items of dict y\n\
(or the pairs that are the items of y, if y is a sequence), with optional\n\
override. Does NOT alter x, but rather returns the modified copy as\n\
the function's result.\n\
";
static PyMethodDef funcs[] = {
    {"merge", (PyCFunction)merge, METH_KEYWORDS, merge_docs},
    {"mergenew", (PyCFunction)mergenew, METH_KEYWORDS, mergenew_docs},
    {NULL}
};
void
initmerge(void)
{
    Py_InitModule3("merge", funcs, "Example extension module");
}
This example declares as static every function and
global variable in the C source file, except
initmerge, which must be visible from the outside
to let Python call it. Since the functions and variables are exposed
to Python via the PyMethodDef structures, Python
does not need to see their names directly. Therefore, declaring them
static is best: this ensures that names
don't accidentally end up in the whole
program's global namespace, as might otherwise
happen on some platforms, possibly causing conflicts and errors.
The format string "O!O|i" passed to
PyArg_ParseTupleAndKeywords indicates that
function merge accepts three arguments from
Python: an object with a type constraint, a generic object, and an
optional integer. At the same time, the format string indicates that
the variable part of
PyArg_ParseTupleAndKeywords's
arguments must contain four addresses: in order, the address of a
Python type object, then two addresses of
PyObject* variables, and finally the address of an
int variable. The int variable
must have been previously initialized to its intended default value,
since the corresponding Python argument is optional.
And indeed, after the argnames argument,
the code passes &PyDict_Type (i.e., the
address of the dictionary type object). Then it passes the addresses
of the two PyObject* variables. Finally, it passes
the address of variable override, an
int that was previously initialized to
0, since the default, when the
override argument isn't
explicitly passed from Python, should be no overriding. If the return
value of PyArg_ParseTupleAndKeywords is
0, the code immediately returns
NULL to propagate the exception; this
automatically diagnoses most cases where Python code passes wrong
arguments to our new function merge.
When the arguments appear to be okay, the code tries
PyDict_Merge, which succeeds if
y is a dictionary. When
PyDict_Merge raises a
TypeError, indicating that
y is not a dictionary, the code clears the
error and tries again, this time with
PyDict_MergeFromSeq2, which succeeds when
y is a sequence of pairs. If that also
fails, it returns NULL to propagate the exception.
Otherwise, it returns None in the simplest way
(i.e., with return
Py_BuildValue("")) to indicate success.
Function mergenew basically duplicates
merge's functionality; however,
mergenew does not alter its arguments, but rather
builds and returns a new dictionary as the
function's result. The C API function
PyObject_CallMethod lets
mergenew call the copy method
of its first Python-passed argument, a dictionary object, and obtain
a new dictionary object that it then alters (with exactly the same
logic as function merge). It then returns the
altered dictionary as the function result (thus, no need to call
Py_BuildValue in this case).
The code of Example 24-1 must reside in a source file
named merge.c. In the same directory, create the
following script named setup.py:
from distutils.core import setup, Extension
setup(name='merge', ext_modules=[ Extension('merge',sources=['merge.c']) ])
Now, run python setup.py install at a shell
prompt in this directory. This command builds the dynamically loaded
library for the merge extension module, and copies
it to the appropriate directory, depending on your Python
installation. Now your Python code can use the module. For example:
import merge
x = {'a':1,'b':2 }
merge.merge(x,[['b',3],['c',4]])
print x # prints: {'a':1, 'b':2, 'c':4 }
print merge.mergenew(x,{'a':5,'d':6},override=1)
# prints: {'a':5, 'b':2, 'c':4, 'd':6 }
print x # prints: {'a':1, 'b':2, 'c':4 }
This example shows the difference between merge
(which alters its first argument) and mergenew
(which returns a new object and does not alter its argument). It also
shows that the second argument can be either a dictionary or a
sequence of two-item subsequences. Further, it demonstrates default
operation (where keys that are already in the first argument are left
alone) as well as the override option (where keys
coming from the second argument take precedence, as in Python
dictionaries' update method).
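For comparison, the behavior of the two extension functions can be approximated in pure Python. This is only a sketch of the semantics described above, not a transcription of the C code: the names py_merge and py_mergenew are hypothetical, and error handling is simplified.

```python
def py_merge(x, y, override=False):
    """Merge y into dict x in place; y is a dict or a sequence of pairs."""
    if not isinstance(x, dict):
        raise TypeError("x must be a dictionary")
    if isinstance(y, dict):
        pairs = list(y.items())
    else:
        pairs = [tuple(pair) for pair in y]
    for key, value in pairs:
        # Existing keys are kept unless the caller asks to override them.
        if override or key not in x:
            x[key] = value


def py_mergenew(x, y, override=False):
    """Like py_merge, but work on a copy of x and return the copy."""
    result = x.copy()
    py_merge(result, y, override)
    return result
```

Calling py_merge(x, [['b', 3], ['c', 4]]) on x = {'a': 1, 'b': 2} leaves 'b' untouched and adds 'c', matching the session shown above.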
24.1.12 Defining New Types
In your extension
modules, you often want to define new types and make them available
to Python. A type's definition is held in a large
struct named PyTypeObject. Most of the fields of
PyTypeObject are pointers to functions. Some
fields point to other structs, which in turn are blocks of pointers
to functions. PyTypeObject also includes a few
fields giving the type's name, size, and behavior
details (option flags). You can leave almost all fields of
PyTypeObject set to NULL if you
do not supply the related functionality. You can point some fields to
functions in the Python C API in order to supply certain aspects of
fundamental object functionality in standard ways.
The best way
to implement a type is to copy from the Python sources the file
Modules/xxsubtype.c, which Python supplies
exactly for such didactic purposes, and edit it.
It's a complete module with two types, subclassing
from list and dict
respectively. Another example in the Python sources,
Objects/xxobject.c, is not a complete module,
and the type in this file is minimal and old-fashioned, not using
modern recommended approaches. See http://www.python.org/dev/doc/devel/api/type-structs.html
for detailed documentation on PyTypeObject and
other related structs. File Include/object.h in
the Python sources contains the declarations of these types, as well
as several important comments that you would do well to study.
24.1.12.1 Per-instance data
To represent each instance of your type, declare a C struct that
starts, right after the opening brace, with macro
PyObject_HEAD. The macro expands into the data
fields that your struct must begin with in order to be a Python
object. Those fields include the reference count and a pointer to the
instance's type. Any pointer to your structure can
be correctly cast to a PyObject*.
The PyTypeObject struct that defines your
type's characteristics and behavior must contain the
size of your per-instance struct, as well as pointers to the C
functions you write to operate on your structure. Therefore, you
normally place the PyTypeObject toward the end of
your code, after the per-instance struct and all the functions that
operate on instances of the per-instance struct. Each
x that points to a structure starting with
PyObject_HEAD, and in particular each
PyObject* x, has a
field x->ob_type
that is the address of the PyTypeObject structure
that is x's Python type
object.
24.1.12.2 The PyTypeObject definition
Given a per-instance struct such as:
typedef struct {
    PyObject_HEAD
    /* other data needed by instances of this type, omitted */
} mytype;
the corresponding PyTypeObject struct almost
invariably begins in a way similar to:
static PyTypeObject t_mytype = {
    /* tp_head */      PyObject_HEAD_INIT(NULL) /* use NULL, for MSVC++ */
    /* tp_internal */  0,                    /* must be 0 */
    /* tp_name */      "mymodule.mytype",    /* type name with module */
    /* tp_basicsize */ sizeof(mytype),
    /* tp_itemsize */  0,                    /* 0 except variable-size type */
    /* tp_dealloc */   (destructor)mytype_dealloc,
    /* tp_print */     0,                    /* usually 0, use str instead */
    /* tp_getattr */   0,                    /* usually 0 (see getattro) */
    /* tp_setattr */   0,                    /* usually 0 (see setattro) */
    /* tp_compare */   0,                    /* see also richcompare */
    /* tp_repr */      (reprfunc)mytype_str, /* like Python's __repr__ */
    /* rest of struct omitted */
For portability to Microsoft Visual C++, the
PyObject_HEAD_INIT macro at the start of the
PyTypeObject must have an argument of
NULL. During module initialization, you must call
PyType_Ready(&t_mytype), which, among other
tasks, inserts in t_mytype the address of its type
(the type of a type is also known as a metatype), normally
&PyType_Type. Another slot in
PyTypeObject that points to another type object is
tp_base, later in the structure. In the structure
definition itself, you must have a tp_base of
NULL, again for compatibility with Microsoft
Visual C++. However, before you invoke
PyType_Ready(&t_mytype), you can optionally
set t_mytype.tp_base to the address of another
type object. When you do so, your type inherits from the other type,
just like a class coded in Python 2.2 can optionally inherit from a
built-in type. For a Python type coded in C, inheriting means that
for most fields in the PyTypeObject, if you set
the field to NULL, PyType_Ready
copies the corresponding field from the base type. A type must
specifically assert in its field tp_flags that it
is usable as a base type, otherwise no other type can inherit from
it.
The tp_itemsize field is of interest only for
types that, like tuples, have instances of different sizes, and can
determine instance size once and forever at creation time. Most types
just set tp_itemsize to 0.
Fields such as tp_getattr and
tp_setattr are generally set to
NULL because they exist only for backward
compatibility: modern types use fields tp_getattro
and tp_setattro instead. Field
tp_repr is typical of most of the following
fields, which are omitted here: the field holds the address of a
function, which corresponds directly to a Python special method
(here, __repr__). You can set the field to
NULL, indicating that your type does not supply
the special method, or else set the field to point to a function with
the needed functionality. If you set the field to
NULL, but also point to a base type from the
tp_base slot, you inherit the special method, if
any, from your base type. You often need to cast your functions to
the specific typedef type that a field needs
(here, type reprfunc for field
tp_repr) because the typedef
has a first argument PyObject*
self, while your functions, being specific to your
type, normally use more specific pointers. For example:
static PyObject* mytype_str(mytype* self) { ... /* rest omitted */
Alternatively, you can declare mytype_str with a
PyObject* self, then use a cast
(mytype*)self in the function's
body. Either alternative is acceptable, but it's
more common to locate the casts in the
PyTypeObject declaration.
24.1.12.3 Instance initialization and finalization
The task of finalizing your instances is split between two functions.
The tp_dealloc slot must never be
NULL, except for immortal types (i.e., types whose
instances are never deallocated). Python calls
x->ob_type->tp_dealloc(x)
on each instance x whose reference count
decreases to 0, and the function thus called must
release any resource held by object x,
including x's memory.
When an instance of mytype holds no other
resources that must be released (in particular, no owned references
to other Python objects that you would have to
DECREF),
mytype's destructor can be
extremely simple:
static void mytype_dealloc(PyObject *x)
{
    x->ob_type->tp_free((PyObject*)x);
}
The function in the tp_free slot has the specific
task of freeing x's
memory. In Python 2.2, the function has signature
void
name(PyObject*). In
Python 2.3, the signature has changed to void
name(void*). One way to
ensure your sources compile under both versions of Python is to put
in slot tp_free the C API function
_PyObject_Del, which has the right signature in
each version.
The task of initializing your instances is split among three
functions. To allocate memory for new instances of your type, put in
slot tp_alloc the C API function
PyType_GenericAlloc, which does absolutely minimal
initialization, clearing the newly allocated memory bytes to
0 except for the type pointer and reference count.
Similarly, you can often set field tp_new to the C
API function PyType_GenericNew. In this case, you
can perform all per-instance initialization in the function you put
in slot tp_init, which has the signature:
int init_name(PyObject *self,PyObject *args,PyObject *kwds)
The positional and named arguments to the function in slot
tp_init are those passed when calling the type to
create the new instance, just like, in Python, the positional and
named arguments to __init__ are those passed
when calling the class object. Again, as for types (classes) defined
in Python, the general rule is to do as little initialization as
possible in tp_new and as much as possible in
tp_init. Using
PyType_GenericNew for tp_new
accomplishes this. However, you can choose to define your own
tp_new for special types, such as ones that have
immutable instances, where initialization must happen earlier. The
signature is:
PyObject* new_name(PyObject *subtype,PyObject *args,PyObject *kwds)
The function in tp_new must return the newly
created instance, normally an instance of
subtype (which may be a type that inherits
from yours). The function in tp_init, on the other
hand, must return 0 for success, or
-1 to indicate an exception.
If your type is subclassable, it's important that
any instance invariants be established before the function in
tp_new returns. For example, if it must be
guaranteed that a certain field of the instance is never
NULL, that field must be set to a
non-NULL value by the function in
tp_new. Subtypes of your type might fail to call
your tp_init function; therefore such
indispensable initializations should be in tp_new
for subclassable types.
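The same division of labor exists for classes coded in Python, where __new__ corresponds to tp_new and __init__ to tp_init. The sketch below is a Python-level illustration only, with hypothetical class names, of why invariants belong in the first of the two: a subclass that never calls the base initializer still gets a usable instance.

```python
class Base(object):
    def __new__(cls, *args, **kwds):
        # Plays the role of tp_new: establish invariants here.
        self = super(Base, cls).__new__(cls)
        self.items = []          # invariant: the attribute always exists
        return self

    def __init__(self, *args, **kwds):
        # Plays the role of tp_init: the bulk of initialization.
        self.items.extend(args)


class Careless(Base):
    def __init__(self, *args, **kwds):
        pass                     # forgets to call Base.__init__


ok = Base(1, 2)
sloppy = Careless(1, 2)
```

Here sloppy.items is an empty list rather than a missing attribute, so methods of Base that rely on the invariant keep working.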
24.1.12.4 Attribute access
Access to attributes of your instances, including methods (as covered
in Chapter 5), is mediated by the functions you put
in slots tp_getattro and
tp_setattro of your
PyTypeObject struct. Normally, you put there the
standard C API functions PyObject_GenericGetAttr
and PyObject_GenericSetAttr, which implement
standard semantics. Specifically, these API functions access your
type's methods via the slot
tp_methods, pointing to a sentinel-terminated
array of PyMethodDef structs, and your
instances' members via the slot
tp_members, a similar sentinel-terminated array of
PyMemberDef structs:
typedef struct {
    char* name;   /* Python-visible name of the member */
    int type;     /* code defining the data-type of the member */
    int offset;   /* offset of the member in the per-instance struct */
    int flags;    /* READONLY for a read-only member */
    char* doc;    /* docstring for the member */
} PyMemberDef;
As an exception to the general rule that including
Python.h gets you all the declarations you need,
you have to include structmember.h explicitly in
order to have your C source see the declaration of
PyMemberDef.
type is generally
T_OBJECT for members that are
PyObject*, but many other type codes are defined
in Include/structmember.h for members that your
instances hold as C-native data (e.g., T_DOUBLE
for double or T_STRING for
char*). For example, if your per-instance struct
is something like:
typedef struct {
    PyObject_HEAD
    double datum;
    char* name;
} mytype;
to expose to Python per-instance attributes
datum (read/write) and
name (read-only), you can define the
following array and point your
PyTypeObject's
tp_members to it:
static PyMemberDef mytype_members[] = {
    {"datum", T_DOUBLE, offsetof(mytype, datum), 0, "The current datum"},
    {"name", T_STRING, offsetof(mytype, name), READONLY,
     "Name of the datum"},
    {NULL}
};
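Seen from Python code, members exposed this way behave much like the attributes of the following Python-coded class. This is only an analogy under stated assumptions: the class name MyType is hypothetical, the code targets a current Python version, and the exact exception raised by a C-coded read-only member may differ from the one shown here.

```python
class MyType(object):
    __slots__ = ('datum', '_name')   # fixed per-instance storage

    def __init__(self, datum, name):
        self.datum = float(datum)    # read/write, like the T_DOUBLE member
        self._name = name            # stored privately

    @property
    def name(self):
        """Name of the datum (read-only)."""
        return self._name


obj = MyType(1.5, 'speed')
obj.datum = 2.5                      # allowed
try:
    obj.name = 'other'               # rejected: no setter is defined
    name_is_readonly = False
except AttributeError:
    name_is_readonly = True
```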
Using PyObject_GenericGetAttr and
PyObject_GenericSetAttr for
tp_getattro and tp_setattro
also provides further possibilities, which I will not cover in detail
in this book. Field tp_getset points to a
sentinel-terminated array of PyGetSetDef structs,
the equivalent of having property instances in a
Python-coded class. If your
PyTypeObject's field
tp_dictoffset is not equal to
0, the field's value must be the
offset, within the per-instance struct, of a
PyObject* that points to a Python dictionary. In
this case, the generic attribute access API functions use that
dictionary to allow Python code to set arbitrary attributes on your
type's instances, just like for instances of
Python-coded classes.
Another dictionary is per-type, not per-instance: the
PyObject* for the per-type dictionary is slot
tp_dict of your PyTypeObject
struct. You can set slot tp_dict to
NULL, and then PyType_Ready
initializes the dictionary appropriately. Alternatively, you can set
tp_dict to a dictionary of type attributes, and
then PyType_Ready adds other entries to that same
dictionary, in addition to the type attributes you set.
It's generally easier to start with
tp_dict set to NULL, call
PyType_Ready to create and initialize the per-type
dictionary, and then, if need be, add any further entries to the
dictionary.
Field tp_flags is a long whose
bits determine your type struct's exact layout,
mostly for backward compatibility. Normally, set this field to
Py_TPFLAGS_DEFAULT to indicate that you are
defining a normal, modern type. You should set
tp_flags to
Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_GC if your type
supports cyclic garbage collection. Your type should support cyclic
garbage collection if instances of the type contain
PyObject* fields that might point to arbitrary
objects and form part of a reference loop. However, to support cyclic
garbage collection, it's not enough to add
Py_TPFLAGS_HAVE_GC to field
tp_flags; you also have to supply appropriate
functions, indicated by slots tp_traverse and
tp_clear, and register and unregister your
instances appropriately with the cyclic garbage collector. Supporting
cyclic garbage collection is an advanced subject, and I do not cover
it further in this book. Similarly, I do not cover the advanced
subject of supporting weak references.
Field tp_doc, a char*, is a
null-terminated character string that is your type's
docstring. Other fields point to structs (whose fields point to
functions); you can set each such field to NULL to
indicate that you support none of the functions of that kind. The
fields pointing to such blocks of functions are
tp_as_number, for special methods typically
supplied by numbers; tp_as_sequence, for special
methods typically supplied by sequences;
tp_as_mapping, for special methods typically
supplied by mappings; and tp_as_buffer, for the
special methods of the buffer protocol.
For example, objects that are not sequences can still support one or
a few of the methods listed in the block to which
tp_as_sequence points, and in that case the
PyTypeObject must have a
non-NULL field tp_as_sequence,
even if the block of function pointers it points to is in turn mostly
full of NULLs. For instance, dictionaries supply a
__contains__ special method so that you can
check if x in
d when d is a
dictionary. At the C code level, the method is a function pointed to
by field sq_contains, which is part of the
PySequenceMethods struct to which field
tp_as_sequence points. Therefore, the
PyTypeObject struct for the
dict type, named PyDict_Type,
has a non-NULL value for
tp_as_sequence, even though a dictionary supplies
no other field in PySequenceMethods except
sq_contains, and therefore all other fields in
*(PyDict_Type.tp_as_sequence) are
NULL.
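A Python-coded class can show the same pattern of supplying a single method from the sequence group without being a sequence. In this sketch the class name Evens is hypothetical; it defines only __contains__, so membership tests work while other sequence operations do not.

```python
class Evens(object):
    """Not a sequence, yet usable on the right of the 'in' operator."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


evens = Evens()
has_four = 4 in evens
has_five = 5 in evens

try:
    len(evens)                 # no length support was supplied
    supports_len = True
except TypeError:
    supports_len = False
```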
24.1.12.5 Type definition example
Example 24-2 is a complete Python extension module
that defines the very simple type intpair, each
instance of which holds two integers named first
and second.
Example 24-2. Defining a new intpair type
#include "Python.h"
#include "structmember.h"
/* per-instance data structure */
typedef struct {
PyObject_HEAD
int first, second;
} intpair;
static int
intpair_init(PyObject *self, PyObject *args, PyObject *kwds)
{
static char* nams[] = {"first","second",NULL};
int first, second;
if(!PyArg_ParseTupleAndKeywords(args, kwds, "ii", nams, &first, &second))
return -1;
((intpair*)self)->first = first;
((intpair*)self)->second = second;
return 0;
}
static void
intpair_dealloc(PyObject *self)
{
self->ob_type->tp_free(self);
}
static PyObject*
intpair_str(PyObject* self)
{
return PyString_FromFormat("intpair(%d,%d)",
((intpair*)self)->first, ((intpair*)self)->second);
}
static PyMemberDef intpair_members[] = {
{"first", T_INT, offsetof(intpair, first), 0, "first item" },
{"second", T_INT, offsetof(intpair, second), 0, "second item" },
{NULL}
};
static PyTypeObject t_intpair = {
PyObject_HEAD_INIT(0) /* ob_refcnt, ob_type (set by PyType_Ready) */
0, /* ob_size */
"intpair.intpair", /* tp_name */
sizeof(intpair), /* tp_basicsize */
0, /* tp_itemsize */
intpair_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_compare */
intpair_str, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
0, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
PyObject_GenericSetAttr, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT, /* tp_flags */
"two ints (first,second)", /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
0, /* tp_methods */
intpair_members, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
intpair_init, /* tp_init */
PyType_GenericAlloc, /* tp_alloc */
PyType_GenericNew, /* tp_new */
_PyObject_Del, /* tp_free */
};
void
initintpair(void)
{
static PyMethodDef no_methods[] = { {NULL} };
PyObject* this_module = Py_InitModule("intpair", no_methods);
if(this_module == NULL)
return;
/* finish initializing the type object; give up on failure */
if(PyType_Ready(&t_intpair) < 0)
return;
PyObject_SetAttrString(this_module, "intpair", (PyObject*)&t_intpair);
}
The intpair type defined in Example 24-2 offers almost no substantial benefit
compared to an equivalent definition in Python, such as:
class intpair(object):
    __slots__ = 'first', 'second'
    def __init__(self, first, second):
        self.first = first
        self.second = second
    def __repr__(self):
        return 'intpair(%s,%s)' % (self.first, self.second)
The C-coded version does ensure the two attributes are integers,
truncating float arguments as needed. For example:
import intpair
x=intpair.intpair(1.2,3.4) # x is: intpair(1,3)
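A Python-coded class could approximate this coercion by calling int on each argument. The sketch below illustrates that assumption and is not an exact equivalent of the C-coded type: it coerces only at construction time, and int accepts some arguments (such as numeric strings) that the "ii" format of the C-coded version rejects.

```python
class intpair(object):
    __slots__ = 'first', 'second'

    def __init__(self, first, second):
        # int() truncates a float toward zero, as in the example above
        self.first = int(first)
        self.second = int(second)

    def __repr__(self):
        return 'intpair(%d,%d)' % (self.first, self.second)

x = intpair(1.2, 3.4)   # x is: intpair(1,3)
```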
Each instance of the C-coded version of intpair
occupies somewhat less memory than an instance of the Python version
in the above example. However, the purpose of Example 24-2 is purely didactic: to present a C-coded
Python extension that defines a new type.
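Part of the reason the Python version stays reasonably compact is its __slots__ declaration, which, like a tp_dictoffset of 0 in the C-coded type, means that instances carry no per-instance dictionary. The following brief sketch uses the hypothetical class names SlottedPair and PlainPair, invented here for illustration, to show the observable difference.

```python
class SlottedPair(object):
    __slots__ = 'first', 'second'
    def __init__(self, first, second):
        self.first = first
        self.second = second

class PlainPair(object):
    def __init__(self, first, second):
        self.first = first
        self.second = second

s = SlottedPair(1, 2)
p = PlainPair(1, 2)

slotted_has_dict = hasattr(s, '__dict__')   # no per-instance dictionary
plain_has_dict = hasattr(p, '__dict__')     # ordinary instances have one

# Without an instance dictionary, undeclared attributes cannot be added.
try:
    s.third = 3
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False
```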