

Advanced Perl Programming


18. Extending Perl: A First Course

Thompson's rule for first-time telescope makers: "It is faster to make a four-inch mirror, then a six-inch mirror, than to make a six-inch mirror."

- Programming Pearls, Communications of the ACM, Sept. 1985

Scripting is almost always a more pleasant and productive alternative to using a systems programming language. Scripting languages aren't designed to do everything,[1] however, and there comes a time when you need to dig down to C/C++ for speed, fine-grained data structures, type safety, and access to existing libraries. The ability of languages such as Perl, Visual Basic, Python, and Tcl to integrate well with C accords them the status of serious development languages, in contrast to awk and early versions of BASIC, which were seldom used for production applications.

[1] In Perl's case, the definition of everything may be a bit hard to nail down!

In this chapter, we will examine what it takes to cement Perl and C code together and then study two tool sets that do a remarkable job of performing this binding for us. The first is a pair of tools called h2xs and xsubpp, packaged with the Perl distribution. For brevity, we will refer to this pair as XS,[2] because it involves an intermediate language of the same name. The other tool is SWIG (Simplified Wrapper and Interface Generator), written by Dave Beazley at the University of Utah.

[2] Both XSUB and XS stand for eXternal SUBroutine.

We'll cover an often-used subset of these tools' capabilities and learn that a lot can be achieved without having to know anything at all about the internal Perl API. But a number of powerful features will have to wait until the section "Meaty Extensions" in Chapter 20, Perl Internals.

This chapter requires you to have the following handy: the C::Scan and Data::Flow modules, both required by h2xs and available from CPAN, and the gd library for creating GIF files, downloadable from www.boutell.com.

18.1 Writing an Extension: Overview

Figure 18.1 shows a file called testmatrix.pl making a call to an underlying Matrix library written in C. To bind the two sets of code together, we need to have some glue code, indicated by the dark gray boxes.

Figure 18.1: Calling C from Perl


XS and SWIG both create this glue code in two files - a Perl module and a C wrapper file - and address the following issues:

Data type translation

A Perl scalar argument can be translated to a fundamental C data type such as int, double, or char * (and vice versa) with ease. Dealing with a user-defined structure such as Matrix * or Vector * is trickier. $mat in Figure 18.1 holds a C pointer to a user-defined data type. Both xsubpp and SWIG are equipped with a type-mapping facility, which allows you to write custom code for handling translations between Perl and unfamiliar C data types. You have to know some internal API before you can write typemaps, so we will visit this issue again in Chapter 20.

Memory management

Perl automatically manages the memory allocated for user-defined variables, while C expects the programmer to spell out everything. This issue is especially important when data crosses the Perl-C interface. Unfortunately, a C function's signature gives no clue about its memory management protocol; it is difficult for humans to divine it, let alone automated tools such as SWIG and XS. Let us assume that the C matrix library stores its data as a series of Vector objects internally (each row is represented as a Vector), and that matrix_get_row returns the Vector corresponding to that row. As you can see, both new_matrix and matrix_get_row return a pointer to an object, but in the first case, the caller is expected to take ownership of the object (delete it when it is no longer required), and in the latter, the matrix library owns the memory. While the extension tools provide certain default choices, you have to be constantly on the watch. You should also ensure that the appropriate function deletes the memory - free, delete, or delete[], for objects allocated by malloc or C++'s new or new[], respectively.

Perl conveniences

A simple call such as

     ($a,$b,$c) = $mat->get_row(10);

exercises Perl features such as packages, variable number of function arguments, multiple return values from functions, OO notation, the wantarray functionality, and so on. An extension should strive to make a Perl programmer feel at home.

Bootstrapping and initialization

For the C library to be called from Perl, it needs to be statically or dynamically linked into the Perl interpreter. The Perl module generated by XS and SWIG contains the code for bootstrapping and initializing the C library. (The other issues described above are handled in the C wrapper code.)
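To make the ownership distinction from the memory management discussion concrete, here is a minimal C sketch of the two conventions. The function names new_matrix and matrix_get_row come from the text; the struct layouts and the delete_matrix function are assumptions made for illustration, since the chapter does not show the library's actual definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layouts: the chapter names Matrix and Vector but does not
   show their definitions. */
typedef struct { int len; double *data; } Vector;
typedef struct { int rows; int cols; Vector *row; } Matrix;

/* Hypothetical destructor. Releases a matrix and every row it owns.
   Safe on NULL and on a partially built matrix. */
void delete_matrix(Matrix *m)
{
    if (m == NULL)
        return;
    if (m->row != NULL) {
        for (int i = 0; i < m->rows; i++)
            free(m->row[i].data);      /* free(NULL) is a no-op */
        free(m->row);
    }
    free(m);
}

/* Ownership passes to the caller, who must eventually call
   delete_matrix(). */
Matrix *new_matrix(int rows, int cols)
{
    if (rows <= 0 || cols <= 0)
        return NULL;
    Matrix *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->rows = rows;
    m->cols = cols;
    m->row = malloc((size_t)rows * sizeof *m->row);
    if (m->row == NULL) {
        free(m);
        return NULL;
    }
    for (int i = 0; i < rows; i++) {   /* make cleanup safe first */
        m->row[i].len = cols;
        m->row[i].data = NULL;
    }
    for (int i = 0; i < rows; i++) {
        m->row[i].data = calloc((size_t)cols, sizeof(double));
        if (m->row[i].data == NULL) {
            delete_matrix(m);
            return NULL;
        }
    }
    return m;
}

/* The library keeps ownership: the result points into the matrix, so the
   caller must not free it or use it after delete_matrix(). */
Vector *matrix_get_row(Matrix *m, int i)
{
    assert(m != NULL);
    if (i < 0 || i >= m->rows)
        return NULL;
    return &m->row[i];
}
```

Both functions return a bare pointer, and nothing in either signature says which one the caller must release. Glue code that freed the Vector returned by matrix_get_row would corrupt the matrix, while glue code that never called delete_matrix would leak it, which is why the tools' defaults need checking function by function.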

18.1.1 The Extension Process

C header files (such as Matrix.h) contain data structure declarations, preprocessor macros, publicly accessible variables, and function prototypes - essentially, the interface for a C library. You are typically not interested in making everything available to a Perl script; there's nothing worse than attempting C programming in Perl. In most cases, it suffices to export a subset of public functions, and some constants (which are available as initialized variables, #define's, or enums). We refer to them collectively as the public interface and extract them into a public header file.
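As a rough illustration of what such a public header might look like, the sketch below keeps only a handful of functions and shows one constant in each of the three forms mentioned. Apart from new_matrix, matrix_get_row, Matrix, and Vector, every name here (the file name Matrix_public.h, MATRIX_MAX_DIM, matrix_status, matrix_epsilon, delete_matrix, matrix_set) is hypothetical and not taken from the actual Matrix library.

```c
/* Matrix_public.h (hypothetical name): the trimmed-down interface that
   will be handed to the extension tools. */
#ifndef MATRIX_PUBLIC_H
#define MATRIX_PUBLIC_H

/* Constants, in the three forms a C library typically offers them. */
#define MATRIX_MAX_DIM 1024                        /* preprocessor macro   */
enum matrix_status { MATRIX_OK, MATRIX_ERANGE };   /* enumeration          */
extern const double matrix_epsilon;                /* initialized variable */

/* Opaque handles: a script passes these around but never looks inside,
   so the struct bodies stay in the library's private header. */
typedef struct Matrix Matrix;
typedef struct Vector Vector;

/* The subset of functions worth exposing to a script. */
Matrix *new_matrix(int rows, int cols);
void delete_matrix(Matrix *m);
Vector *matrix_get_row(Matrix *m, int row);
enum matrix_status matrix_set(Matrix *m, int row, int col, double value);

#endif /* MATRIX_PUBLIC_H */
```

Keeping the struct definitions out of this file is a design choice rather than a requirement: it leaves the script with handles and functions only, which is usually all a scripting interface needs.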

Figure 18.2 shows how the Matrix library's header file is used as input for the two sets of tools.

Figure 18.2: SWIG and XS processes


The public header file may contain complex C declarations. SWIG expects you, the extension developer, to boil the interface down to a still simpler form and express it in its interface definition language. Fortunately, this language is close enough to ANSI C and simple C++ that a large number of header files don't need any translation at all. From the interface description, SWIG generates the glue code; in the Matrix case, it will be Matrix.pm and Matrix_wrap.c. If your system supports dynamic linking (shared libraries on Unix, and DLLs on Windows), and if the Perl executable has been built to use it, all that is left to be done is to convert the glue code and your C library into a dynamic library. If dynamic linking is not an option, then a new Perl executable is generated by statically linking the Perl archive library (libperl.a on Unix or perl.lib on Microsoft Windows) with the pieces of code mentioned above.

h2xs and xsubpp take a slightly different approach. h2xs understands C header files (but not C++) and converts all constants and function prototypes to a meta language called XS. But a function declaration may still be too complex for scripting purposes, so this approach expects you to twiddle with the .xs file produced by h2xs and take the necessary steps to simplify the interface. Of course, the hand conversion is unnecessary if the interface is already simple enough. The XS language is a mixture of C and funny keywords and provides directives for you to override the glue code produced by xsubpp.

Incidentally, the code generated by both tools is quite similar, and it is perfectly acceptable to have some extensions built using the XS approach and some using SWIG. Which brings us to the question: which one should you use?

18.1.2 SWIG or XS?

Differences in SWIG's and XS's features spring from differences in their design goals. SWIG is designed to help create a scripting language wrapper over a C library and supports Python, Tcl, and Guile in addition to Perl. In contrast, XS is designed only for Perl and allows for a number of Perlisms that SWIG cannot easily generalize to the other languages.

I prefer SWIG to the XS approach because it feels a lot cleaner, is far less internals-oriented than XS is, and supports multiple languages. In addition, it has excellent support for data structures (not just functions), whereas XS supports only functions. I build C++ and Java applications for a living, so my focus is typically more on the application than on the scripting frontend - I leave the choice of scripting language to the user. Your mileage may vary.

You'll find that all the extension modules in the Perl distribution and on CPAN are currently written using XS. The chief reason is that XS comes bundled with Perl. Besides, it has supported powerful features such as typemaps since its inception, whereas SWIG has been beefed up only recently. If you have to understand or modify any of these CPAN modules, you have to know XS.

Both tools provide significant degrees of freedom to compensate for most deficiencies, so my advice is to pick one and go with it.