home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Chapter 14  TOC  Chapter 16

Chapter 15. Advanced Internet Topics

15.1 "Surfing on the Shoulders of Giants"

This chapter concludes our look at Python Internet programming by exploring a handful of Internet-related topics and packages. We've covered many Internet topics in the previous five chapters -- socket basics, client and server-side scripting tools, and programming full-blown web sites with Python. Yet we still haven't seen many of Python's standard built-in Internet modules in action. Moreover, there is a rich collection of third-party extensions for scripting the Web with Python that we have not touched on at all.

In this chapter, we explore a grab-bag of additional Internet-related tools and third-party extensions of interest to Python Internet developers. Along the way, we meet larger Internet packages, such as HTMLgen, JPython, Zope, PSP, Active Scripting, and Grail. We'll also study standard Python tools useful to Internet programmers, including Python's restricted execution mode, XML support, COM interfaces, and techniques for implementing proprietary servers. In addition to their practical uses, these systems demonstrate just how much can be achieved by wedding a powerful object-oriented scripting language such as Python to the Web.

Before we start, a disclaimer: none of these topics is presented in much detail here, and undoubtedly some interesting Internet systems will not be covered at all. Moreover, the Internet evolves at lightning speed, and new tools and techniques are certain to emerge after this edition is published; indeed, most of the systems in this chapter appeared in the five years after the first edition of this book was written, and the next five years promise to be just as prolific. As always, the standard moving-target caveat applies: read the Python library manual's Internet section for details we've skipped, and stay in touch with the Python community at http://www.python.org for information about extensions not covered due to a lack of space or a lack of clairvoyance.

15.2 Zope: A Web Publishing Framework

Zope is an open source web-application server and toolkit, written in and customizable with Python. It is a server-side technology that allows web designers to implement sites and applications by publishing Python object hierarchies on the Web. With Zope, programmers can focus on writing objects, and let Zope handle most of the underlying HTTP and CGI details. If you are interested in implementing more complex web sites than the form-based interactions we've seen in the last three chapters, you should investigate Zope: it can obviate many of the tasks that web scripters wrestle with on a daily basis.

Sometimes compared to commercial web toolkits such as ColdFusion, Zope is made freely available over the Internet by a company called Digital Creations and enjoys a large and very active development community. Indeed, many attendees at a recent Python conference were attracted by Zope, which had its own conference track. The use of Zope has spread so quickly that many Pythonistas now look to it as Python's "killer application" -- a system so good that it naturally pushes Python into the development spotlight. At the least, Zope offers a new, higher-level way of developing sites for the Web, above and beyond raw CGI scripting.[1]

15.2.1 Zope Components

Zope began life as a set of tools (part of which was named "Bobo") placed in the public domain by Digital Creations. Since then, it has grown into a large system with many components, a growing body of add-ons (called "products" in Zope parlance), and a fairly steep learning curve. We can't do it any sort of justice in this book, but since Zope is one of the most popular Python-based applications at this writing, I'd be remiss if I didn't provide a few details here.

In terms of its core components, Zope includes the following parts:

Zope Object Request Broker (ORB)

At the heart of Zope, the ORB dispatches incoming HTTP requests to Python objects and returns results to the requestor, working as a perpetually running middleman between the HTTP CGI world and your Python objects. The Zope ORB is described further in the next section.

HTML document templates

Zope provides a simple way to define web pages as templates, with values automatically inserted from Python objects. Templates allow an object's HTML representation to be defined independently of the object's implementation. For instance, values of attributes in a class instance object may be automatically plugged into a template's text by name. Template coders need not be Python coders, and vice versa.

Object database

To record data persistently, Zope comes with a full object-oriented database system for storing Python objects. The Zope object database is based on the Python pickle serialization module we'll meet in the next part of this book, but adds support for transactions, lazy object fetches (sometimes called delayed evaluation), concurrent access, and more. Objects are stored and retrieved by key, much as they are with Python's standard shelve module, but classes must subclass an imported Persistent superclass, and object stores are instances of an imported PickleDictionary object. Zope starts and commits transactions at the start and end of HTTP requests.

Zope also includes a management framework for administrating sites, as well as a product API used to package components. Zope ships with these and other components integrated into a whole system, but each part can be used on its own as well. For instance, the Zope object database can be used in arbitrary Python applications by itself.

15.2.2 What's Object Publishing?

If you're like me, the concept of publishing objects on the Web may be a bit vague at first glance, but it's fairly simple in Zope: the Zope ORB automatically maps URLs requested by HTTP into calls on Python objects. Consider the Python module and function in Example 15-1 .

Example 15-1. PP2E\Internet\Other\messages.py
"A Python module published on the Web by Zope"
 
def greeting(size='brief', topic='zope'):
    "a published Python function"
    return 'A %s %s introduction' % (size, topic)

This is normal Python code, of course, and says nothing about Zope, CGI, or the Internet at large. We may call the function it defines from the interactive prompt as usual:

C:\...\PP2E\Internet\Other>python
>>> import messages
>>> messages.greeting(  )
'A brief zope introduction'
 
>>> messages.greeting(size='short')
'A short zope introduction'
 
>>> messages.greeting(size='tiny', topic='ORB')
'A tiny ORB introduction'

But if we place this module file, along with Zope support files, in the appropriate directory on a server machine running Zope, it automatically becomes visible on the Web. That is, the function becomes a published object -- it can be invoked through a URL, and its return value becomes a response page. For instance, if placed in a cgi-bin directory on a server called myserver.net, the following URLs are equivalent to the three calls above:

http://www.myserver.net/cgi-bin/messages/greeting
http://www.myserver.net/cgi-bin/messages/greeting?size=short
http://www.myserver.net/cgi-bin/messages/greeting?size=tiny&topic=ORB

When our function is accessed as a URL over the Web this way, the Zope ORB performs two feats of magic:

·         The URL is automatically translated into a call to the Python function. The first part of the URL after the directory path (messages) names the Python module, the second (greeting) names a function or other callable object within that module, and any parameters after the ? become keyword arguments passed to the named function.

·         After the function runs, its return value automatically appears in a new page in your web browser. Zope does all the work of formatting the result as a valid HTTP response.

In other words, URLs in Zope become remote function calls, not just script invocations. The functions (and methods) called by accessing URLs are coded in Python, and may live at arbitrary places on the Net. It's as if the Internet itself becomes a Python namespace, with one module directory per server.

Zope is a server-side technology based on objects, not text streams; the main advantage of this scheme is that the details of CGI input and output are handled by Zope, while programmers focus on writing domain objects, not on text generation. When our function is accessed with a URL, Zope automatically finds the referenced object, translates incoming parameters to function call arguments, runs the function, and uses its return value to generate an HTTP response. In general, a URL like:

http://servername/dirpath/module/object1/object2/method?arg1=val1&arg2=val2

is mapped by the Zope ORB running on servername into a call to a Python object in a Python module file installed in dirpath, taking the form:

module.object1.object2.method(arg1=val1, arg2=val2)

The return value is formatted into an HTML response page sent back to the client requestor (typically a browser). By using longer paths, programs can publish complete hierarchies of objects; Zope simply uses Python's generic object-access protocols to fetch objects along the path.

As usual, a URL like those listed here can appear as the text of a hyperlink, typed manually into a web browser, or used in an HTTP request generated by a program (e.g., using Python's urllib module in a client-side script). Parameters are listed at the end of these URLs directly, but if you post information to this URL with a form instead, it works the same way:

<form action="http://www.myserver.net/cgi-bin/messages/greeting" method=POST>
    Size:  <input type=text name=size>
    Topic: <input type=text name=topic value=zope>
    <input type=submit>
</form>

Here, the action tag references our function's URL again; when the user fills out this form and presses its submit button, inputs from the form sent by the browser magically show up as arguments to the function again. These inputs are typed by the user, not hardcoded at the end of a URL, but our published function doesn't need to care. In fact, Zope recognizes a variety of parameter sources and translates them all into Python function or method arguments: form inputs, parameters at the end of URLs, HTTP headers and cookies, CGI environment variables, and more.

This just scratches the surface of what published objects can do, though. For instance, published functions and methods can use the Zope object database to save state permanently, and Zope provides many more advanced tools such as debugging support, precoded HTTP servers for use with the ORB, and finer-grained control over responses to URL requestors.

For all things Zope, visit http://www.zope.org. There, you'll find up-to-date releases, as well as documentation ranging from tutorials to references to full-blown Zope example sites. Also see this book's CD (view CD-ROM content online at http://examples.oreilly.com/python2) for a copy of the Zope distribution, current as of the time we went to press.

Python creator Guido van Rossum and his Pythonlabs team of core Python developers have moved from BeOpen to Digital Creations, home of the Zope framework introduced here. Although Python itself remains an open source system, Guido's presence at Digital Creations is seen as a strategic move that will foster future growth of both Zope and Python.

15.3 HTMLgen: Web Pages from Objects

One of the things that makes CGI scripts complex is their inherent dependence on HTML: they must embed and generate legal HTML code to build user interfaces. These tasks might be easier if the syntax of HTML were somehow removed from CGI scripts and handled by an external tool.

HTMLgen is a third-party Python tool designed to fill this need. With it, programs build web pages by constructing trees of Python objects that represent the desired page and "know" how to format themselves as HTML. Once constructed, the program asks the top of the Python object tree to generate HTML for itself, and out comes a complete, legally formatted HTML web page.

Programs that use HTMLgen to generate pages need never deal with the syntax of HTML; instead, they can use the higher-level object model provided by HTMLgen and trust it to do the formatting step. HTMLgen may be used in any context where you need to generate HTML. It is especially suited for HTML generated periodically from static data, but can also be used for HTML creation in CGI scripts (though its use in the CGI context incurs some extra speed costs). For instance, HTMLgen would be ideal if you run a nightly job to generate web pages from database contents. HTMLgen can also be used to generate documents that don't live on the Web at all; the HTML code it produces works just as well when viewed offline.

15.3.1 A Brief HTMLgen Tutorial

We can't investigate HTMLgen in depth here, but let's look at a few simple examples to sample the flavor of the system. HTMLgen is shipped as a collection of Python modules that must be installed on your machine; once it's installed, you simply import objects from the HTMLgen module corresponding to the tag you wish to generate, and make instances:

C:\Stuff\HTMLgen\HTMLgen>python
>>> from HTMLgen import *
>>> p = Paragraph("Making pages from objects is easy\n")
>>> p
<HTMLgen.Paragraph instance at 7dbb00>
>>> print p
<P>Making pages from objects is easy
</P>

Here, we make a HTMLgen.Paragraph object (a class instance), passing in the text to be formatted. All HTMLgen objects implement __str__ methods and can emit legal HTML code for themselves. When we print the Paragraph object, it emits an HTML paragraph construct. HTMLgen objects also define append methods, which do the right thing for the object type; Paragraphs simply add appended text to the end of the text block:

>>> p.append("Special < characters > are & escaped")
>>> print p
<P>Making pages from objects is easy
Special &lt; characters &gt; are &amp; escaped</P>

Notice that HTMLgen escaped the special characters (e.g., &lt; means <) so that they are legal HTML; you don't need to worry about writing either HTML or escape codes yourself. HTMLgen has one class for each HTML tag; here is the List object at work, creating an ordered list:

>>> choices = ['python', 'tcl', 'perl']
>>> print List(choices)
<UL>
<LI>python
<LI>tcl
<LI>perl
</UL>

In general, HTMLgen is smart about interpreting data structures you pass to it. For instance, embedded sequences are automatically mapped to the HTML code for displaying nested lists:

>>> choices = ['tools', ['python', 'c++'], 'food', ['spam', 'eggs']]
>>> l = List(choices)
>>> print l
<UL>
<LI>tools
    <UL>
    <LI>python
    <LI>c++
    </UL>
<LI>food
    <UL>
    <LI>spam
    <LI>eggs
    </UL>
</UL>

Hyperlinks are just as easy: simply make and print an Href object with the link target and text. (The text argument can be replaced by an image, as we'll see later in Example 15-3.)

>>> h = Href('http://www.python.org', 'python')
>>> print h
<A HREF="http://www.python.org">python</A>

To generate HTML for complete pages, we create one of the HTML document objects, append its component objects, and print the document object. HTMLgen emits a complete page's code, ready to be viewed in a browser:

>>> d = SimpleDocument(title='My doc')
>>> p = Paragraph('Web pages made easy')
>>> d.append(p)
>>> d.append(h)
>>> print d
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
 
<!-- This file generated using Python HTMLgen module. -->
<HEAD>
  <META NAME="GENERATOR" CONTENT="HTMLgen 2.2.2">
        <TITLE>My doc</TITLE>
</HEAD>
<BODY>
<P>Web pages made easy</P>
 
<A HREF="http://www.python.org">python</A>
 
</BODY> </HTML>

There are other kinds of document classes, including a SeriesDocument that implements a standard layout for pages in a series. SimpleDocument is simple indeed: it's essentially a container for other components, and generates the appropriate wrapper HTML code. HTMLgen also provides classes such as Table, Form , and so on, one for each kind of HTML construct.

Naturally, you ordinarily use HTMLgen from within a script, so you can capture the generated HTML in a file or send it over an Internet connection in the context of a CGI application (remember, printed text goes to the browser in the CGI script environment). The script in Example 15-2 does roughly what we just did interactively, but saves the printed text in a file.

Example 15-2. PP2E\Internet\Other\htmlgen101.py
import sys
from HTMLgen import *
 
p = Paragraph('Making pages from objects is easy.\n')
p.append('Special < characters > are & escaped')
 
choices = ['tools', ['python', 'c++'], 'food', ['spam', 'eggs']]
l = List(choices)
 
s = SimpleDocument(title="HTMLgen 101")
s.append(Heading(1, 'Basic tags'))
s.append(p)
s.append(l)
s.append(HR(  ))
s.append(Href('http://www.python.org', 'Python home page'))
 
if len(sys.argv) == 1:
    print s                 # send html to sys.stdout or real file
else:
    open(sys.argv[1], 'w').write(str(s))

This script also uses the HR object to format a horizontal line, and Heading to insert a header line. It either prints HTML to the standard output stream (if no arguments are listed) or writes HTML to an explicitly named file; the str built-in function invokes object __str__ methods just as print does. Run this script from the system command line to make a file, using one of the following:

C:\...\PP2E\Internet\Other>python htmlgen101.py > htmlgen101.html
C:\...\PP2E\Internet\Other>python htmlgen101.py htmlgen101.html

Either way, the script's output is a legal HTML page file, which you can view in your favorite browser by typing the output filename in the address field or clicking on the file in your file explorer. Either way, it will look a lot like Figure 15-1.

Figure 15-1. Viewing htmlgen101.py output in a browser

figs/ppy2_1501.gif

See file htmlgen101.html in the examples distribution if you wish to inspect the HTML generated to describe this page directly (it looks much like the prior document's output). Example 15-3 shows another script that does something less hardcoded: it constructs a web page to display its own source code.

Example 15-3. PP2E\Internet\Other\htmlgen101-b.py
import sys
from HTMLgen import *
d = SimpleDocument(title="HTMLgen 101 B")
 
# show this script
text = open('htmlgen101-b.py', 'r').read(  )
d.append(Heading(1, 'Source code')) 
d.append(Paragraph( PRE(text) )) 
 
# add gif and links 
site  = 'http://www.python.org'
gif   = 'PythonPoweredSmall.gif'
image = Image(gif, alt='picture', align='left', hspace=10, border=0)
 
d.append(HR(  ))
d.append(Href(site, image))
d.append(Href(site, 'Python home page'))
 
if len(sys.argv) == 1:
    print d  
else:
    open(sys.argv[1], 'w').write(str(d))

We use the PRE object here to specify preformatted text, and the Image object to generate code to display a GIF file on the generated page. Notice that HTML tag options such as alt and align are specified as keyword arguments when making HTMLgen objects. Running this script and pointing a browser at its output yields the page shown in Figure 15-2; the image at the bottom is also a hyperlink, because it was embedded inside an Href object.

Figure 15-2. Viewing htmlgen101-b.py output in a browser

figs/ppy2_1502.gif

And that (along with a few nice advanced features) is all there is to using HTMLgen. Once you become familiar with it, you can construct web pages by writing Python code, without ever needing to manually type HTML tags again. Of course, you still must write code with HTMLgen instead of using a drag-and-drop page layout tool, but that code is incredibly simple and supports the addition of more complex programming logic where needed to construct pages dynamically.

In fact, now that you're familiar with HTMLgen, you'll see that many of the HTML files shown earlier in this book could have been simplified by recoding them to use HTMLgen instead of direct HTML code. The earlier CGI scripts could have used HTMLgen as well, albeit with additional speed overheads -- printing text directly is faster than generating it from object trees, though perhaps not significantly so (CGI scripts are generally bound to network speeds, not CPU speed).

HTMLgen is open source software, but it is not a standard part of Python and must therefore be installed separately. You can find a copy of HTMLgen on this book's CD (see http://examples.oreilly.com/python2), but the Python web site should have its current location and version. Once installed, simply add the HTMLgen path to your PYTHONPATH variable setting to gain access to its modules. For more documentation about HTMLgen, see the package itself: its html subdirectory includes the HTMLgen manual in HTML format.

15.4 JPython ( Jython): Python for Java

JPython (recently renamed "Jython") is an entirely distinct implementation of the Python programming language that allows programmers to use Python as a scripting component in Java-based applications. In short, JPython makes Python code look like Java, and consequently offers a variety of technology options inherited from the Java world. With JPython, Python code may be run as client-side applets in web browsers, as server-side scripts, and in a variety of other roles. JPython is distinct from other systems mentioned in this section in terms of its scope: while it is based on the core Python language we've seen in this book, it actually replaces the underlying implementation of that language rather than augmenting it.[2]

This section briefly explores JPython and highlights some of the reasons you may or may not want to use it instead of the standard Python implementation. Although JPython is primarily of interest to programmers writing Java-based applications, it underscores integration possibilities and language definition issues that merit the attention of all Python users. Because JPython is Java-centric, you need to know something about Java development to make the most sense of JPython, and this book doesn't pretend to teach that in the next few pages. For more details, interested readers should consult other materials, including JPython documentation at http://www.jython.org.

The JPython port is now called "Jython." Although you are likely to still see it called by its original JPython name on the Net (and in this book) for some time, the new Jython title will become more common as time goes by.

15.4.1 A Quick Introduction to JPython

Functionally speaking, JPython is a collection of Java classes that run Python code. It consists of a Python compiler, written in Java, that translates Python scripts to Java bytecodes so they can be executed by a Java virtual machine -- the runtime component that executes Java programs and is used by major web browsers. Moreover, JPython automatically exposes all Java class libraries for use in Python scripts. In a nutshell, here's what comes with the JPython system:

Python-to-Java-bytecode compiler

JPython always compiles Python source code into Java bytecode and passes it to a Java virtual machine (JVM) runtime engine to be executed. A command-line compiler program, jpythonc, is also able to translate Python source code files into Java .class and .jar files, which can then be used as Java applets, beans, servlets, and so on. To the JVM, Python code run through JPython looks the same as Java code. Besides making Python code work on a JVM, JPython code also inherits all aspects of the Java runtime system, including Java's garbage collection and security models. jpythonc also imposes Java source file class rules.

Access to Java class libraries (extending )

JPython uses Java's reflection API (runtime type information) to expose all available Java class libraries to Python scripts. That is, Python programs written for the JPython system can call out to any resident Java class automatically simply by importing it. The Python-to-Java interface is completely automatic and remarkably seamless -- Java class libraries appear as though they are coded in Python. Import statements in JPython scripts may refer to either JPython modules or Java class libraries. For instance, when a JPython script imports java.awt, it gains access to all the tools available in the awt library. JPython internally creates a "dummy" Python module object to serve as an interface to awt at import time. This dummy module consists of hooks for dispatching calls from JPython code to Java class methods and automatically converting datatypes between Java and Python representations as needed. To JPython scripts, Java class libraries look and feel exactly like normal Python modules (albeit with interfaces defined by the Java world).

Unified object model

JPython objects are actually Java objects internally. In fact, JPython implements Python types as instances of a Java PyObject class. By contrast, C Python classes and types are still distinct in the current release. For instance, in JPython, the number 123 is an instance of the PyInteger Java class, and you can specify things like [].__class__ since all objects are class instances. That makes data mapping between languages simple: Java can process Python objects automatically, because they are Java objects. JPython automatically converts types between languages according to a standard type map as needed to call out to Java libraries, and selects among overloaded Java method signatures.

API for running Python from Java (embedding )

JPython also provides interfaces that allow Java programs to execute JPython code. As for embedding in C and C++, this allows Java applications to be customized by bits of dynamically written JPython code. For instance, JPython ships with a Java PythonInterpreter class, which allows Java programs to create objects that function as Python namespaces for running Python code. Each PythonInterpreter object is roughly a Python module, with methods such as exec(a string of Python code), execfile(a Python filename), and get and set methods for assigning Python global variables. Because Python objects are really instances of a Java PyObject class, an enclosing Java layer can access and process data created by Python code naturally.

Interactive Python command line

Like the standard Python implementation, JPython comes with an interactive command line that runs code immediately after it is typed. JPython's jpython program is equivalent to the python executable we've been using in this book; without arguments, it starts an interactive session. Among other things, this allows JPython programmers to import and test class components actually written in Java. This ability alone is compelling enough to interest many Java programmers.

Interface automations

Java libraries are somewhat easier to use in JPython code than in Java. That's because JPython automates some of the coding steps Java implies. For instance, callback handlers for Java GUI libraries may be simple Python functions, even though Java coders need to provide methods in fully specified classes (Java does not have first-class function objects). JPython also makes Java class data members accessible as both Python attribute names (object.name) and object constructor keyword arguments (name=value); such Python syntax is translated into calls to getName and setName accessor methods in Java classes. We'll see these automation tricks in action in the following examples. You don't have to use any of these (and they may confuse Java programmers at first glance), but they further simplify coding in JPython, and give Java class libraries a more Python-like flavor.

The net effect of all this is that JPython allows us to write Python programs that can run on any Java-aware machine -- in particular, in the context of most web browsers. More importantly, because Python programs are translated into Java bytecodes, JPython provides an incredibly seamless and natural integration between the two languages. Both walk and talk in terms of the Java model, so calls across language boundaries are trivial. With JPython's approach, it's even possible to subclass a Java class in Python and vice versa.

So why go to all this trouble to mix Python into Java environments? The most obvious answer is that JPython makes Java components easier to use: JPython scripts are typically a fraction of the size of their Java equivalents, and much less complex. More generally, the answer is really the same as it is for C and C++ environments: Python, as an easy-to-use, object-oriented scripting language, naturally complements the Java programming language.

By now, it is clear to most people that Java is too complex to serve as a scripting or rapid-development tool. But this is exactly where Python excels; by adding Python to the mix with JPython, we add a scripting component to Java systems, exactly as we do when integrating Python with C or C++. For instance, we can use JPython to quickly prototype Java systems, test Java classes interactively, and open up Java systems for end-user customization. In general, adding Python to Java development can significantly boost programmer productivity, just as it does for C and C++ systems.

JPython Versus the Python C API

Functionally, JPython is primarily an integration system: it allows us to mix Python with Java components. We also study ways to integrate Python with C and C++ components in the next part of this book. It's worth noting that we need different techniques to integrate Python with Java (such as the JPython compiler), because Java is a somewhat closed system: it prefers an all-Java mix. The C and C++ integration tools are generally less restrictive in terms of language assumptions, and any C-compatible language components will do. Java's strictness is partly due to its security goals, but the net effect is to foster integration techniques that are specific to Java alone.

On the other hand, because Java exposes runtime type information through its reflection API, JPython can largely automate the conversions and dispatching needed to access Java components from Python scripts; Python code simply imports and calls Java components. When mixing Python with C or C++, we must provide a "glue" code layer that integrates the two languages explicitly. Some of this can be automated (with the SWIG system we'll meet later in this text). No glue code is required in JPython, however, because JPython's (and Java's) developers have done all the linkage work already, in a generic fashion. It is also possible to mix in C/C++ components with Java via its native call interface ( JNI), but this can be cumbersome and may cancel out Java's reported portability and security benefits.

15.4.2 A Simple JPython Example

Once a Python program is compiled with JPython, it is all Java: the program is translated to Java bytecodes, it uses Java classes to do its work, and there is no Python left except for the original source code. Because the compiler tool itself is also written in Java, JPython is sometimes called "100% pure Java." That label may be more profound to marketeers than programmers, though, because JPython scripts are still written using standard Python syntax. For instance, Example 15-4 is a legal JPython program, derived from an example originally written by Guido van Rossum.

Example 15-4. PP2E\Internet\Other\jpython.py
############################################
# implement a simple calculator in JPython;
# evaluation runs a full expression all at 
# once using the Python eval(  ) built-in-- 
# JPython's compiler is present at run-time
############################################
 
from java import awt                   # get access to Java class libraries
from pawt import swing                 # they look like Python modules here 
 
labels = ['0', '1', '2', '+',          # labels for calculator buttons
          '3', '4', '5', '-',          # will be used for a 4x4 grid
          '6', '7', '8', '*',
          '9', '.', '=', '/' ]
 
keys = swing.JPanel(awt.GridLayout(4, 4))     # do Java class library magic
display = swing.JTextField(  )                  # Python data auto-mapped to Java
 
def push(event):                              # callback for regular keys
    display.replaceSelection(event.actionCommand)
 
def enter(event):                             # callback for the '=' key
    display.text = str(eval(display.text))    # use Python eval(  ) to run expr
    display.selectAll(  )
 
for label in labels:                          # build up button widget grid
    key = swing.JButton(label)                # on press, invoke Python funcs
    if label == '=':
        key.actionPerformed = enter
    else:
        key.actionPerformed = push
    keys.add(key)
 
panel = swing.JPanel(awt.BorderLayout(  ))      # make a swing panel
panel.add("North", display)                   # text plus key grid in middle
panel.add("Center", keys)
swing.test(panel)                             # start in a GUI viewer

The first thing you should notice is that this is genuine Python code -- JPython scripts use the same core language that we've been using all along in this book. That's good news, both because Python is such an easy language to use and because you don't need to learn a new, proprietary scripting language to use JPython. It also means that all of Python's high-level language syntax and tools are available. For example, in this script, the Python eval built-in function is used to parse and evaluate constructed expressions all at once, saving us from having to write an expression evaluator from scratch.

15.4.3 Interface Automation Tricks

The previous calculator example also illustrates two interface automations performed by JPython: function callback and attribute mappings. Java programmers may have already noticed that this example doesn't use classes. Like standard Python and unlike Java, JPython supports but does not impose OOP. Simple Python functions work fine as callback handlers. In Example 15-4, assigning key.actionPerformed to a Python function object has the same effect as registering an instance of a class that defines a callback handler method:

def push(event):
    ...
key = swing.JButton(label)
key.actionPerformed = push

This is noticeably simpler than the more Java-like:

class handler(awt.event.ActionListener):
    def actionPerformed(self, event):
        ...
key = swing.JButton(label)
key.addActionListener(handler(  ))

JPython automatically maps Python functions to the Java class method callback model. Java programmers may now be wondering why we can assign to something named key.actionPerformed in the first place. JPython's second magic feat is to make Java data members look like simple object attributes in Python code. In abstract terms, JPython code of the form:

X = Object(argument)
X.property = value + X.property

is equivalent to the more traditional and complex Java style:

X = Object(argument)
X.setProperty(value + X.getProperty(  ))

That is, JPython automatically maps attribute assignments and references to Java accessor method calls by inspecting Java class signatures (and possibly Java BeanInfo files if used). Moreover, properties can be assigned with keyword arguments in object constructor calls, such that:

X = Object(argument, property=value)

is equivalent to both this more traditional form:

X = Object(argument)
X.setProperty(value)

as well as the following, which relies on attribute name mapping:

X = Object(argument)
X.property = value

We can combine both callback and property automation for an even simpler version of the callback code snippet:

def push(event):
    ...
key = swing.JButton(label, actionPerformed=push)

You don't need to use these automation tricks, but again, they make JPython scripts simpler, and that's most of the point behind mixing Python with Java.

15.4.4 Writing Java Applets in JPython

I would be remiss if I didn't include a brief example of JPython code that more directly masquerades as a Java applet: code that lives on a server machine but is downloaded to and run on the client machine when its Internet address is referenced. Most of the magic behind this is subclassing the appropriate Java class in a JPython script, demonstrated in Example 15-5.

Example 15-5. PP2E\Internet\Other\jpython-applet.py
#######################################
# a simple java applet coded in Python
#######################################
 
from java.applet import Applet                            # get java superclass
 
class Hello(Applet):
    def paint(self, gc):                                  # on paint callback
        gc.drawString("Hello applet world", 20, 30)       # draw text message
 
if __name__ == '__main__':                                # if run standalone
    import pawt                                           # get java awt lib
    pawt.test(Hello(  ))                                    # run under awt loop

The Python class in this code inherits all the necessary applet protocol from the standard Java Applet superclass, so there is not much new to see here. Under JPython, Python classes can always subclass Java classes, because Python objects really are Java objects when compiled and run. The Python-coded paint method in this script will be automatically run from the Java AWT event loop as needed; it simply uses the passed-in gc user-interface handle object to draw a text message.

If we use JPython's jpythonc command-line tool to compile this into a Java .class file and properly store that file on a web server, it can then be used exactly like applets written in Java. Because most web browsers include a JVM, this means that such Python scripts may be used as client-side programs that create sophisticated user-interface devices within the browser, and so on.

15.4.5 JPython Trade-offs

Depending on your background, though, the somewhat less good news about JPython is that even though the calculator and applet scripts discussed here are straight Python code, the libraries they use are different than what we've seen so far. In fact, the library calls employed are radically different. The calculator, for example, relies primarily on imported Java class libraries, not standard Python libraries. You really need to understand Java's awt and swing libraries to make sense of its code, and this library skew between language implementations becomes more acute as programs grow larger. The applet example is even more Java-bound: it depends both on Java user-interface libraries and Java applet protocols.

If you are already familiar with Java libraries, this isn't an issue at all, of course. But because most of the work performed by realistic programs is done by using libraries, the fact that most JPython code relies on very different libraries makes compatibility with standard Python less potent than it may seem at first glance. To put that more strongly, apart from very trivial core language examples, many JPython programs won't run on the standard Python interpreter, and many standard Python programs won't work under JPython.

Generally, JPython presents a number of trade-offs, partly due to its relative immaturity as of this writing. I want to point out up front that JPython is indeed an excellent Java scripting tool -- arguably the best one available, and most of its trade-offs are probably of little or no concern to Java developers. For instance, if you are coming to JPython from the Java world, the fact that Java libraries are at the heart of JPython scripts may be more asset than downside. But if you are presented with a choice between the standard and Java-based Python language implementations, some of JPython's implications are worth knowing about:

JPython is not yet fully compatible with the standard Python language

At this writing, JPython is not yet totally compatible with the standard Python language, as defined by the original C implementation. In subtle ways, the core Python language itself works differently in JPython. For example, until very recently, assigning file-like objects to the standard input sys.stdin failed, and exceptions were still strings, not class objects. The list of incompatibilities (viewable at http://www.jython.org) will likely shrink over time, but will probably never go away completely. Moreover, new language features are likely to show up later in JPython than in the standard C-based implementation.

JPython requires programmers to learn Java development too

Language syntax is only one aspect of programming. The library skew mentioned previously is just one example of JPython's dependence on the Java system. Not only do you need to learn Java libraries to get real work done in JPython, but you also must come to grips with the Java programming environment in general. Many standard Python libraries have been ported to JPython, and others are being adopted regularly. But major Python tools such as Tkinter GUIs may show up late or never in JPython (and instead are replaced with Java tools).[3] In addition, many core Python library features cannot be supported in JPython, because they would violate Java's security constraints. For example, the os.system call for running shell commands may never become available in JPython.

JPython applies only where a JVM is installed or shipped

You need the Java runtime to run JPython code. This may sound like a non-issue given the pervasiveness of the Internet, but I have very recently worked in more than one company for which delivering applications to be run on JVMs was not an option. Simply put, there was no JVM to be found at the customer's site. In such scenarios, JPython is either not an option, or will require you to ship a JVM with your application just to run your compiled JPython code. Shipping the standard Python system with your products is completely free; shipping a JVM may require licensing and fees. This may become less of a concern as robust open source JVMs appear. But if you wish to use JPython today and can't be sure that your clients will be able to run your systems in Java-aware browsers (or other JVM components), you should consider the potential costs of shipping a Java runtime system with your products.[4]

JPython doesn't support Python extension modules written in C or C++

At present, no C or C++ extension modules written to work with the C Python implementation will work with JPython. This is a major impediment to deploying JPython outside the scope of applications run in a browser. To date, the half-million-strong Python user community has developed thousands of extensions written for C Python, and these constitute much of the substance of the Python development world. JPython's current alternative is to instead expose Java class libraries and ask programmers to write new extensions in Java. But this dismisses a vast library of prior and future Python art. In principle, C extensions could be supported by Java's native call interface, but it's complex, has not been done, and can negate Java portability and security.

JPython is noticeably slower than C Python

Today, Python code generally runs slower under the JPython implementation. How much slower depends on what you test, which JVM you use to run your test, whether a just-in-time ( JIT) compiler is available, and which tester you cite. Posted benchmarks have run the gamut from 1.7 times slower than C Python, to 10 times slower, and up to 100 times slower. Regardless of the exact number, the extra layer of logic JPython requires to map Python to the Java execution model adds speed overheads to an already slow JVM and makes it unlikely that JPython will ever be as fast as the C Python implementation. Given that C Python is already slower than compiled languages like C, the additional slowness of JPython makes it less useful outside the realm of Java scripting. Furthermore, the Swing GUI library used by JPython scripts is powerful, but generally considered to be the slowest and largest of all Python GUI options. Given that Python's Tkinter library is a portable and standard GUI solution, Java's proprietary user-interface tools by themselves are probably not reason enough to use the JPython implementation.

JPython is less robust than C Python

At this writing, JPython is substantially more buggy than the standard C implementation of the language. This is certainly due to its younger age and smaller user base and varies from JVM to JVM, but you are more likely to hit snags in JPython. In contrast, C Python has been amazingly bug-free since its introduction in 1990.

JPython may be less portable than C Python

It's also worth noting that as of this writing, the core Python language is far more portable than Java (despite marketing statements to the contrary). Because of that, deploying standard Python code with the Java-based JPython implementation may actually lessen its portability. Naturally, this depends on the set of extensions you use, but standard Python runs today on everything from handheld PDAs and PCs to Cray supercomputers and IBM mainframes.

Some incompatibilities between JPython and standard Python can be very subtle. For instance, JPython inherits all of the Java runtime engine's behavior, including Java security constraints and garbage collection. Java garbage collection is not based on standard Python's reference count scheme, and therefore can automatically collect cyclic objects.[5] It also means that some common Python programming idioms won't work. For example, it's typical in Python to code file-processing loops in this form:

for filename in bigfilenamelist:
    text = open(filename).read(  )
    dostuffwith(text)  

That works because files are automatically closed when garbage-collected in standard Python, and we can be sure that the file object returned by the open call will be immediately garbage collected (it's a temporary, so there are no more references as soon as we call read). It won't work in JPython, though, because we can't be sure when the temporary file object will be reclaimed. To avoid running out of file descriptors, we usually need to code this differently for JPython:

for filename in bigfilenamelist:
    file = open(filename)
    text = file.read(  )
    dostuffwith(text)
    file.close(  )  

You may face a similar implementation mismatch if you assume that output files are immediately closed: open(name,'w').write(bytes) collects and closes the temporary file object and hence flushes the bytes out to the file under the standard C implementation of Python only, while JPython instead collects the file object at some arbitrary time in the future. In addition to such file-closing concerns, Python __del__ class destructors are never called in JPython, due to complications associated with object termination.

15.4.6 Picking Your Python

Because of concerns such as those just mentioned, the JPython implementation of the Python language is probably best used only in contexts where Java integration or web browser interoperability are crucial design concerns. You should always be the judge, of course, but the standard C implementation seems better suited to most other Python applications. Still, that leaves a very substantial domain to JPython -- almost all Java systems and programmers can benefit from adding JPython to their tool sets.

JPython allows programmers to write programs that use Java class libraries in a fraction of the code and complexity required by Java-coded equivalents. Hence, JPython excels as an extension language for Java-based systems, especially those that will run in the context of web browsers. Because Java is a standard component of most web browsers, JPython scripts will often run automatically without extra install steps on client machines. Furthermore, even Java-coded applications that have nothing to do with the Web can benefit from JPython's ease of use; its seamless integration with Java class libraries makes JPython simply the best Java scripting and testing tool available today.

For most other applications, though, the standard Python implementation, possibly integrated with C and C++ components, is probably a better design choice. The resulting system will likely run faster, cost less to ship, have access to all Python extension modules, be more robust and portable, and be more easily maintained by people familiar with standard Python.

On the other hand, I want to point out again that the trade-offs listed here are mostly written from the Python perspective; if you are a Java developer looking for a scripting tool for Java-based systems, many of these detriments may be of minor concern. And to be fair, some of JPython's problems may be addressed in future releases; for instance, its speed will probably improve over time. Yet even as it exists today, JPython clearly makes an ideal extension-language solution for Java-based applications, and offers a much more complete Java scripting solution than those currently available for other scripting languages.[6]

For more details, see the JPython package included on this book's CD (see http://examples.oreilly.com/python2), and consult the JPython home page, currently maintained at http://www.jython.org. At least one rumor has leaked concerning an upcoming JPython book as well, so check http://www.python.org for developments on this front. See also the sidebar later in this chapter about the new Python implementation for the C#/.NET environment on Windows. It seems likely that there will be three Pythons to choose from very soon (not just two), and perhaps more in the future. All will likely implement the same core Python language we've used in this text, but may emphasize alternative integration schemes, application domains, development environments, and so on.

15.5 Grail: A Python-Based Web Browser

I briefly mentioned the Grail browser near the start of Chapter 10. Many of Python's Internet tools date back to and reuse the work that went into Grail, a full-blown Internet web browser that:

·         Is written entirely in Python

·         Uses the Tkinter GUI API to implement its user interface and render pages

·         Downloads and runs Python/Tkinter scripts as client-side applets

As mentioned earlier, Grail was something of a proof-of-concept for using Python to code large-scale Internet applications. It implements all the usual Internet protocols and works much like common browsers such as Netscape and Internet Explorer. Grail pages are implemented with the Tk text widgets that we met in the GUI part of this book.

More interestingly, the Grail browser allows applets to be written in Python. Grail applets are simply bits of Python code that live on a server but are run on a client. If an HTML document references a Python class and file that live on a server machine, Grail automatically downloads the Python code over a socket and runs it on the client machine, passing it information about the browser's user interface. The downloaded Python code may use the passed-in browser context information to customize the user interface, add new kinds of widgets to it, and perform arbitrary client-side processing on the local machine. Roughly speaking, Python applets in Grail serve the same purposes as Java applets in common Internet browsers: they perform client-side tasks that are too complex or slow to implement with other technologies such as server-side CGI scripts and generated HTML.

15.5.1 A Simple Grail Applet Example

Writing Grail applets is remarkably straightforward. In fact, applets are really just Python/Tkinter programs; with a few exceptions, they don't need to "know" about Grail at all. Let's look at a short example; the code in Example 15-6 simply adds a button to the browser, which changes its appearance each time it's pressed (its bitmap is reconfigured in the button callback handler).

There are two components to this page definition: an HTML file and the Python applet code it references. As usual, the grail.html HTML file that follows describes how to format the web page when the HTML's URL address is selected in a browser. But here, the APP tag also specifies a Python applet (class) to be run by the browser. By default, the Python module is assumed to have the same name as the class and must be stored in the same location (URL directory) as the HTML file that references it. Additional APP tag options can override the applet's default location.

Example 15-6. PP2E\Internet\Other\grail.html
<HEAD>
<TITLE>Grail Applet Test Page</TITLE>
</HEAD>
<BODY>
<H1>Test an Applet Here!</H1>
Click this button!
<APP CLASS=Question>
</BODY>

The applet file referenced by the HTML is a Python script that adds widgets to the Tkinter-based Grail browser. Applets are simply classes in Python modules. When the APP tag is encountered in the HTML, the Grail browser downloads the Question.py source code module (Example 15-7) and makes an instance of its Question class, passing in the browser widget as the master (parent). The master is the hook that lets applets attach new widgets to the browser itself; applets extend the GUI constructed by the HTML in this way.

Example 15-7. PP2E\Internet\Other\Question.py
# Python applet file: Question.py
# in the same location (URL) as the html file
# that references it; adds widgets to browser;
 
from Tkinter import *
 
class Question:                          # run by grail?
    def __init__(self, parent):          # parent=browser
        self.button = Button(parent, 
                             bitmap='question',
                             command=self.action)
        self.button.pack(  )
 
    def action(self):
        if self.button['bitmap'] == 'question':
            self.button.config(bitmap='questhead')
        else:
            self.button.config(bitmap='question')
 
if __name__ == '__main__':
    root = Tk(  )                    # run standalone?
    button = Question(root)        # parent=Tk: top-level
    root.mainloop(  )

Notice that nothing in this class is Grail- or Internet-specific; in fact, it can be run (and tested) standalone as a Python/Tkinter program. Figure 15-3 is what it looks like if run standalone on Windows (with a Tk application root object as the master); when run by Grail (with the browser/page object as the master), the button appears as part of the web page instead. Either way, its bitmap changes on each press.

Figure 15-3. Running a Grail applet standalone

figs/ppy2_1503.gif

In effect, Grail applets are simply Python modules that are linked into HTML pages by using the APP tag. The Grail browser downloads the source code identified by an APP tag and runs it locally on the client during the process of creating the new page. New widgets added to the page (like the button here) may run Python callbacks on the client later, when activated by the user.

Applets interact with the user by creating one or more arbitrary Tk widgets. Of course, the previous example is artificial; but notice that the button's callback handler could do anything we can program in Python: updating persistent information, popping up new user interaction dialogs, calling C extensions, etc. However, by working in concert with Python's restricted execution mode (discussed later) applets can be prevented from performing potentially unsafe operations, like opening local files and talking over sockets.

Figure 15-4 shows a screen shot of Grail in action, hinting at what's possible with Python code downloaded to and run on a client. It shows the animated "game of life" demo; everything you see here is implemented using Python and the Tkinter GUI interface. To run the demo, you need to install Python with the Tk extension and download the Grail browser to run locally on your machine or copy it off the CD. Then point your browser to a URL where any Grail demo lives.

Figure 15-4. A Grail applet demo

figs/ppy2_1504.gif

Having said all that, I should add that Grail is no longer formally maintained, and is now used primarily for research purposes (Guido never intended for Grail to put Netscape or Microsoft out of business). You can still get it for free (find it at http://www.python.org) and use it for surfing the Web or experimenting with alternative web browser concepts, but it is not the active project it was a few years ago.

If you want to code web browser applets in Python, the more common approach today is to use the JPython system described previously to compile your scripts into Java applet bytecode files, and use Java libraries for your scripts' user-interface portions. Embedding Python code in HTML with the Active Scripting extension described later in this chapter is yet another way to integrate client-side code.

Alas, this advice may change over time too. For instance, if Tkinter is ever ported to JPython, you will be able to build GUIs in applet files with Tkinter, rather than with Java class libraries. In fact, as I wrote this, an early release of a complete Java JNI implementation of the Python built-in _tkinter module (which allows JPython scripts to import and use the Tkinter module in the standard Python library) was available on the Net at http://jtkinter.sourceforge.net. Whether this makes Tkinter a viable GUI option under JPython or not, all current approaches are subject to change. Grail, for instance, was a much more prominent tool when the first edition of this book was written. As ever, be sure to keep in touch with developments in the Python community at http://www.python.org; clairvoyance isn't all it's cracked up to be.

15.6 Python Restricted Execution Mode

In prior chapters, I've been careful to point out the dangers of running arbitrary Python code that was shipped across the Internet. There is nothing stopping a malicious user, for instance, from sending a string such as os.system('rm *') in a form field where we expect a simple number; running such a code string with the built-in eval function or exec statement may, by default, really work -- it might just delete all the files in the server or client directory where the calling Python script runs!

Moreover, a truly malicious user can use such hooks to view or download password files, and otherwise access, corrupt, or overload resources on your machine. Alas, where there is a hole, there is probably a hacker. As I've cautioned, if you are expecting a number in a form, you should use simpler string conversion tools such as int or string.atoi instead of interpreting field contents as Python program syntax with eval.

But what if you really want to run Python code transmitted over the Net? For instance, you may wish to put together a web-based training system that allows users to run code from a browser. It is possible to do this safely, but you need to use Python's restricted execution mode tools when you ask Python to run the code. Python's restricted execution mode support is provided in two standard library modules, rexec and bastion. rexec is the primary interface to restricted execution, while bastion can be used to restrict and monitor access to object attributes.

On Unix systems, you can also use the standard resource module to limit things like CPU time and memory consumption while the code is running. Python's library manual goes into detail on these modules, but let's take a brief look at rexec here.

15.6.1 Using rexec

The restricted execution mode implemented by rexec is optional -- by default, all Python code runs with full access to everything available in the Python language and library. But when we enable restricted mode, code executes in what is commonly called a "sandbox" model -- access to components on the local machine is limited. Operations that are potentially unsafe are either disallowed or must be approved by code you can customize by subclassing. For example, the script in Example 15-8 runs a string of program code in a restricted environment and customizes the default rexec class to restrict file access to a single, specific directory.

Example 15-8. PP2E\Internet\Other\restricted.py
#!/usr/bin/python
import rexec, sys
Test = 1
if sys.platform[:3] == 'win':
    SafeDir = r'C:\temp'
else:
    SafeDir = '/tmp/'
 
def commandLine(prompt='Input (ctrl+z=end) => '):
    input = ''
    while 1:
        try:
            input = input + raw_input(prompt) + '\n'
        except EOFError:
            break
    print # clear for Windows
    return input
 
if not Test:
    import cgi                         # run on the web? - code from form
    form  = cgi.FieldStorage(  )         # else input interactively to test
    input = form['input'].value
else:
    input = commandLine(  )
 
# subclass to customize default rules: default=write modes disallowed
class Guard(rexec.RExec):
    def r_open(self, name, mode='r', bufsz=-1):
        if name[:len(SafeDir)] != SafeDir:
            raise SystemError, 'files outside %s prohibited' % SafeDir
        else:
            return open(name, mode, bufsz)
 
# limit system resources (not available on Windows)
if sys.platform[:3] != 'win':
    import resource            # at most 5 cpu seconds
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
 
# run code string safely
guard = Guard(  )
guard.r_exec(input)      # ask guard to check and do opens

When we run Python code strings with this script on Windows, safe code works as usual, and we can read and write files that live in the C:\temp directory, because our custom Guard class's r_open method allows files with names beginning with "C:\temp" to proceed. The default r_open in rexec.RExec allows all files to be read, but all write requests fail. Here, we type code interactively for testing, but it's exactly as if we received this string over the Internet in a CGI script's form field:

C:\...\PP2E\Internet\Other>python restricted.py
Input (ctrl+z=end) => x = 5
Input (ctrl+z=end) => for i in range(x): print 'hello%d' % i,
Input (ctrl+z=end) => hello0 hello1 hello2 hello3 hello4
 
C:\...\PP2E\Internet\Other>python restricted.py
Input (ctrl+z=end) => open(r'C:\temp\rexec.txt', 'w').write('Hello rexec\n')
Input (ctrl+z=end) =>
 
C:\...\PP2E\Internet\Other>python restricted.py
Input (ctrl+z=end) => print open(r'C:\temp\rexec.txt', 'r').read(  )
Input (ctrl+z=end) => Hello rexec

On the other hand, attempting to access files outside the allowed directory will fail in our custom class, as will inherently unsafe things such as opening sockets, which rexec always makes out of bounds by default:

C:\...\PP2E\Internet\Other>python restricted.py
Input (ctrl+z=end) => open(r'C:\stuff\mark\hack.txt', 'w').write('BadStuff\n')
Input (ctrl+z=end) => Traceback (innermost last):
  File "restricted.py", line 41, in ?
    guard.r_exec(input)      # ask guard to check and do opens
  File "C:\Program Files\Python\Lib\rexec.py", line 253, in r_exec
    exec code in m.__dict__
  File "<string>", line 1, in ?
  File "restricted.py", line 30, in r_open
    raise SystemError, 'files outside %s prohibited' % SafeDir
SystemError: files outside C:\temp prohibited
 
C:\...\PP2E\Internet\Other>python restricted.py
Input (ctrl+z=end) => open(r'C:\stuff\mark\secret.py', 'r').read(  )
Input (ctrl+z=end) => Traceback (innermost last):
  File "restricted.py", line 41, in ?
    guard.r_exec(input)      # ask guard to check and do opens
  File "C:\Program Files\Python\Lib\rexec.py", line 253, in r_exec
    exec code in m.__dict__
  File "<string>", line 1, in ?
  File "restricted.py", line 30, in r_open
    raise SystemError, 'files outside %s prohibited' % SafeDir
SystemError: files outside C:\temp prohibited
 
C:\...\PP2E\Internet\Other>python restricted.py
Input (ctrl+z=end) => from socket import *
Input (ctrl+z=end) => s = socket(AF_INET, SOCK_STREAM)
Input (ctrl+z=end) => Traceback (innermost last):
  File "restricted.py", line 41, in ?
    guard.r_exec(input)      # ask guard to check and do opens
  ...part ommitted...  
  File "C:\Program Files\Python\Lib\ihooks.py", line 324, in load_module
    exec code in m.__dict__
  File "C:\Program Files\Python\Lib\plat-win\socket.py", line 17, in ?
    _realsocketcall = socket
NameError: socket

And what of that nasty rm * problem? It's possible in normal Python mode like everything else, but not when running in restricted mode. Python makes some potentially dangerous attributes of the os module, such as system (for running shell commands), disallowed in restricted mode:

C:\temp>python
>>> import os
>>> os.system('ls -l rexec.txt')
-rwxrwxrwa   1 0        0             13 May  4 15:45 rexec.txt
0
>>>
C:\temp>python %X%\Part2\internet\other\restricted.py
Input (ctrl+z=end) => import os
Input (ctrl+z=end) => os.system('rm *.*')
Input (ctrl+z=end) => Traceback (innermost last):
  File "C:\PP2ndEd\examples\Part2\internet\other\restricted.py", line 41, in ?
    guard.r_exec(input)      # ask guard to check and do opens
  File "C:\Program Files\Python\Lib\rexec.py", line 253, in r_exec
    exec code in m.__dict__
  File "<string>", line 2, in ?
AttributeError: system

Internally, restricted mode works by taking away access to certain APIs (imports are controlled, for example) and changing the __builtins__ dictionary in the module where the restricted code runs to reference a custom and safe version of the standard __builtin__ built-in names scope. For instance, the custom version of name __builtins_ _.open references a restricted version of the standard file open function. rexec also keeps customizable lists of safe built-in modules, safe os and sys module attributes, and more. For the rest of this story, see the Python library manual.

Restricted execution mode is not necessarily tied to Internet scripting. It can be useful any time you need to run Python code of possibly dubious origin. For instance, we will use Python's eval and exec built-ins to evaluate arithmetic expressions and input commands in a calculator program later in the book. Because user input is evaluated as executable code in this context, there is nothing preventing a user from knowingly or unknowingly entering code that can do damage when run (e.g., they might accidentally type Python code that deletes files). However, the risk of running raw code strings becomes more prevalent in applications that run on the Web, since they are inherently open to both use and abuse. Although JPython inherits the underlying Java security model, pure Python systems such as Zope, Grail, and custom CGI scripts can all benefit from restricted execution of strings sent over the Net.

15.7 XML Processing Tools

Python ships with XML parsing support in its standard library and plays host to a vigorous XML special-interest group. XML (eXtended Markup Language) is a tag-based markup language for describing many kinds of structured data. Among other things, it has been adopted in roles such as a standard database and Internet content representation by many companies. As an object-oriented scripting language, Python mixes remarkably well with XML's core notion of structured document interchange, and promises to be a major player in the XML arena.

XML is based upon a tag syntax familiar to web page writers. Python's xmllib library module includes tools for parsing XML. In short, this XML parser is used by defining a subclass of an XMLParser Python class, with methods that serve as callbacks to be invoked as various XML structures are detected. Text analysis is largely automated by the library module. This module's source code, file xmllib.py in the Python library, includes self-test code near the bottom that gives additional usage details. Python also ships with a standard HTML parser, htmllib, that works on similar principles and is based upon the sgmllib SGML parser module.

Unfortunately, Python's XML support is still evolving, and describing it is well beyond the scope of this book. Rather than going into further details here, I will instead point you to sources for more information:

Standard library

First off, be sure to consult the Python library manual for more on the standard library's XML support tools. At the moment, this includes only the xmllib parser module, but may expand over time.

PyXML SIG package

At this writing, the best place to find Python XML tools and documentation is at the XML SIG (Special Interest Group) web page at http://www.python.org (click on the "SIGs" link near the top). This SIG is dedicated to wedding XML technologies with Python, and publishes a free XML tools package distribution called PyXML. That package contains tools not yet part of the standard Python distribution, including XML parsers implemented in both C and Python, a Python implementation of SAX and DOM (the XML Document Object Model), a Python interface to the Expat parser, sample code, documentation, and a test suite.

Third-party tools

You can also find free, third-party Python support tools for XML on the Web by following links at the XML SIGs web page. These include a DOM implementation for CORBA environments (4DOM) that currently supports two ORBs (ILU and Fnorb) and much more.

Documentation

As I wrote these words, a book dedicated to XML processing with Python was on the eve of its publication; check the books list at http://www.python.org or your favorite book outlet for details.

Given the rapid evolution of XML technology, I wouldn't wager on any of these resources being up to date a few years after this edition's release, so be sure to check Python's web site for more recent developments on this front.

In fact, the XML story changed substantially between the time I wrote this section and when I finally submitted it to O'Reilly. In Python 2.0, some of the tools described here as the PyXML SIG package have made their way into a standard xml module package in the Python library. In other words, they ship and install with Python itself; see the Python 2.0 library manual for more details. O'Reilly has a book in the works on this topic called Python and XML.

15.8 Windows Web Scripting Extensions

Although this book doesn't cover the Windows-specific extensions available for Python in detail, a quick look at Internet scripting tools available to Windows programmers is in order here. On Windows, Python can be used as a scripting language for both the Active Scripting and Active Server Pages systems, which provide client- and server-side control of HTML-based applications. More generally, Python programs can also take the role of COM and DCOM clients and servers on Windows.

You should note that at this point in time, everything in this section works only on Microsoft tools, and HTML embedding runs only under Internet Explorer (on the client) and Internet Information Server (on the server). If you are interested in portability, other systems in this chapter may address your needs better (see JPython's client-side applets, PSP's server-side scripting support, and Zope's server-side object publishing model). On the other hand, if portability isn't a concern, the following techniques provide powerful ways to script both sides of a web conversation.

15.8.1 Active Scripting: Client-Side Embedding

Active Scripting is a technology that allows scripting languages to communicate with hosting applications. The hosting application provides an application-specific object model API, which exposes objects and functions for use in the scripting language programs.

In one if its more common roles, Active Scripting provides support that allows scripting language code embedded in HTML pages to communicate with the local web browser through an automatically exposed object model API. Internet Explorer, for instance, utilizes Active Scripting to export things such as global functions and user-interface objects for use in scripts embedded in HTML. With Active Scripting, Python code may be embedded in a web page's HTML between special tags; such code is executed on the client machine and serves the same roles as embedded JavaScript and VBScript.

15.8.1.1 Active Scripting basics

Unfortunately, embedding Python in client-side HTML works only on machines where Python is installed and Internet Explorer is configured to know about the Python language (by installing the win32all extension package discussed in a moment). Because of that, this technology doesn't apply to most of the browsers in cyberspace today. On the other hand, if you can configure the machines on which a system is to be delivered, this is a nonissue.

Before we get into a Python example, let's look at the way standard browser installations handle other languages embedded in HTML. By default, IE (Internet Explorer) knows about JavaScript (really, Microsoft's Jscript implementation of it) and VBScript (a Visual Basic derivative), so you can embed both of those languages in any delivery scenario. For instance, the HTML file in Example 15-9 embeds JavaScript code, the default IE scripting language on my PC.

Example 15-9. PP2E\Internet\Other\activescript-js.html
<HTML>
<BODY>
<H1>Embedded code demo: JavaScript</H1>
<SCRIPT>
 
// popup 3 alert boxes while this page is
// being constructed on client side by IE;
// JavaScript is the default script language,
// and alert is an automatically exposed name
 
function message(i) {
    if (i == 2) {
        alert("Finished!");
    } 
    else {
        alert("A JavaScript-generated alert => " + i);
    }
}
 
for (count = 0; count < 3; count += 1) {
    message(count);
}
 
</SCRIPT>
</BODY></HTML>

All the text between the <SCRIPT> and </SCRIPT> tags in this file is JavaScript code. Don't worry about its syntax -- this book isn't about JavaScript, and we'll see a simpler Python equivalent in a moment. The important thing to know is how this code is used by the browser.

When a browser detects a block of code like this while building up a new page, it locates the appropriate interpreter, tells the interpreter about global object names, and passes the code to the interpreter for execution. The global names become variables in the embedded code and provide links to browser context. For instance, the name alert in the code block refers to a global function that creates a message box. Other global names refer to objects that give access to the browser's user interface: window objects, document objects, and so on.

This HTML file can be run on the local machine by clicking on its name in a file explorer. It can also be stored on a remote server and accessed via its URL in a browser. Whichever way you start it, three pop-up alert boxes created by the embedded code appear during page construction. Figure 15-5 shows one under IE.

Figure 15-5. IE running embedded JavaScript code

figs/ppy2_1505.gif

The VBScript (Visual Basic) version of this example appears in Example 15-10, so you can compare and contrast.[7] It creates three similar pop-ups when run, but the windows say "VBScript" everywhere. Notice the Language option in the <SCRIPT> tag here; it must be used to declare the language to IE (in this case, VBScript) unless your embedded scripts speak in its default tongue. In the JavaScript version in Example 15-9, Language wasn't needed, because JavaScript was the default language. Other than this declaration, IE doesn't care what language you insert between <SCRIPT> and </SCRIPT>; in principle, Active Scripting is a language-neutral scripting engine.

Example 15-10. PP2E\Internet\Other\activescript-vb.html
<HTML>
<BODY>
<H1>Embedded code demo: VBScript</H1>
<SCRIPT Language=VBScript>
 
' do the same but with embedded VBScript;
' the Language option in the SCRIPT tag
' tells IE which interpreter to use
 
sub message(i)
    if i = 2 then 
        alert("Finished!")
    else
        alert("A VBScript-generated alert => " & i)
    end if
end sub
 
for count = 0 to 2 step 1
    message(count)
next
 
</SCRIPT>
</BODY></HTML>

So how about putting Python code in that page, then? Alas, we need to do a bit more first. Although IE is language-neutral in principle, it does support some languages better than others, at least today. Moreover, other browsers may be more rigid and not support the Active Scripting concept at all.

For example, on my machine and with my installed browser versions (IE 5, Netscape 4), the previous JavaScript example works on both IE and Netscape, but the Visual Basic version works only on IE. That is, IE directly supports VBScript and JavaScript, while Netscape handles only JavaScript. Neither browser as installed can run embedded Python code, even though Python itself is already installed on my machine. There's more to do before we can replace the JavaScript and VBScript code in our HTML pages with Python.

15.8.1.2 Teaching IE about Python

To make the Python version work, you must do more than simply installing Python on your PC. You must also register Python with IE. Luckily, this is mostly an automatic process, thanks to the work of the Python Windows extension developers; you merely need to install a package of Windows extensions.

Here's how this works. The tool to perform the registration is part of the Python Win32 extensions, which are not included in the standard Python system. To make Python known to IE, you must:

1.      First install the standard Python distribution your PC (you should have done this already by now -- simply double-click the Python self-installer).

2.      Then download and install the win32all package separately from http://www.python.org (you can also find it at http://examples.oreilly.com/python2).[8]

The win32all package includes the win32COM extensions for Python, plus the PythonWin IDE (a simple interface for editing and running Python programs, written with the MFC interfaces in win32all) and lots of other Windows-specific tools not covered in this book. The relevant point here is that installing win32all automatically registers Python for use in IE. If needed, you can also perform this registration manually by running the following Python script file located in the win32 extensions package: python\win32comext\axscript\client\pyscript.py.

Once you've registered Python with IE this way, Python code embedded in HTML works just like our JavaScript and VBScript examples -- IE presets Python global names to expose its object model and passes the embedded code to your Python interpreter for execution. Example 15-11 shows our alerts example again, programmed with embedded Python code.

Example 15-11. PP2E\Internet\Other\activescript-py.html
<HTML>
<BODY>
<H1>Embedded code demo: Python</H1>
<SCRIPT Language=Python>
 
# do the same but with python, if configured;
# embedded python code shows three alert boxes
# as page is loaded; any Python code works here,
# and uses auto-imported global funcs and objects
 
def message(i):
    if i == 2:
        alert("Finished!")
    else:
        alert("A Python-generated alert => %d" % i)
 
for count in range(3): message(count)
 
</SCRIPT>
</BODY></HTML>

Figure 15-6 shows one of the three pop-ups you should see when you open this file in IE after installing the win32all package (you can simply click on the file's icon in Windows' file explorer to open it). Note that the first time you access this page, IE may need to load Python, which could induce an apparent delay on slower machines; later accesses generally start up much faster because Python has already been loaded.

Figure 15-6. IE running embedded Python code

figs/ppy2_1506.gif

Regrettably, this example still works only on IE Version 4 and higher, and not on the Netscape browser on my machine (and reportedly fails on Netscape 6 and Mozilla as well). In other words (at least today and barring new browser releases), not only is Active Scripting a Windows-only technology, but using it as a client-side web browser tool for Python works only on machines where Python is installed and registered to IE, and even then only under IE.

It's also worth knowing that even when you do get Python to work under IE, your Python code runs in restricted mode, with limited access to machine resources (e.g., your code can't open sockets -- see the rexec module description earlier in this chapter for details). That's probably what you want when running code downloaded over the Net, and can be changed locally (the implementation is coded in Python), but it limits the utility and scope of your Python scripts embedded in HTML.

The good news is that this does work -- with a simple configuration step, Python code can be embedded in HTML and be made to run under IE, just like JavaScript and VBScript. For many applications, the Windows and IE-only constraint is completely acceptable. Active Scripting is a straightforward way to add client-side Python scripting for web browsers, especially when you can control the target delivery environment. For instance, machines running on an Intranet within a company may have well-known configurations. In such scenarios, Active Scripting lets developers apply all the power of Python in their client-side scripts.

15.8.2 Active Server Pages: Server-Side Embedding

Active Server Pages (ASPs) use a similar model: Python code is embedded in the HTML that defines a web page. But ASP is a server-side technology; embedded Python code runs on the server machine and uses an object-based API to dynamically generate portions of the HTML that is ultimately sent back to the client-side browser. As we saw in the last three chapters, Python server-side CGI scripts embed and generate HTML, and deal with raw inputs and output streams. By contrast, server-side ASP scripts are embedded in HTML and use a higher-level object model to get their work done.

Just like client-side Active Scripting, ASP requires you to install Python and the Python Windows win32all extensions package. But because ASP runs embedded code on the server, you need to configure Python only on one machine. Like CGI scripts in general, this generally makes Python ASP scripting much more widely applicable, as you don't need Python support on every client. Unlike CGI scripts, however, ASP requires you to run Microsoft's IIS (Internet Information Server) today.

15.8.2.1 A short ASP example

We can't discuss ASP in any real detail here, but here's an example of what an ASP file looks like when Python code is embedded:

<HTML><BODY>
<SCRIPT RunAt=Server Language=Python>
#
# code here is run at the server 
#
</SCRIPT>
</BODY></HTML>

As before, code may be embedded inside SCRIPT tag pairs. This time, we tell ASP to run the code at the server with the RunAt option; if omitted, the code and its tags are passed through to the client and run by IE (if configured properly). ASP also recognizes code enclosed in <% and %> delimiters and allows a language to be specified for the entire page. This form is more handy if there are multiple chunks of code in a page, as shown in Example 15-12.

Example 15-12. PP2E\Internet\Other\asp-py.asp
<HTML><BODY>
<%@ Language=Python %>
 
<%
#
# Python code here, using global names Request (input), Response (output), etc.
#
Response.Write("Hello ASP World from URL %s" % 
                          Request.ServerVariables("PATH_INFO"))
%>
</BODY></HTML>

However the code is marked, ASP executes it on the server after passing in a handful of named objects that the code may use to access input, output and server context. For instance, the automatically imported Request and Response objects give access to input and output context. The code here calls a Response.Write method to send text back to the browser on the client (much like a print statement in a simple Python CGI script), as well as Request.ServerVariables to access environment variable information. To make this script run live, you'll need to place it in the proper directory on a server machine running IIS with ASP support.

15.8.3 The COM Connection

At their core, both IE and IIS are based on the COM (Component Object Model) integration system -- they implement their object APIs with standard COM interfaces and look to the rest of the world like any other COM object. From a broader perspective, Python can be used as both a scripting and implementation language for any COM object. Although the COM mechanism used to run Python code embedded within HTML is automated and hidden, it can also be employed explicitly to make Python programs take the role of both COM clients and servers. COM is a general integration technology and is not strictly tied to Internet scripting, but a brief introduction here might help demystify some of the Active Scripting magic behind HTML embedding.

15.8.3.1 A brief introduction to COM

COM is a Microsoft technology for language-neutral component integration. It is sometimes marketed as ActiveX, partially derived from a system called OLE, and is the technological heart of the Active Scripting system we met earlier.[9] COM also sports a distributed extension known as DCOM that allows communicating objects to be run on remote machines. Implementing DCOM often simply involves running through Windows registry configuration steps to associate servers with machines on which they run.

Operationally, COM defines a standard way for objects implemented in arbitrary languages to talk to each other, using a published object model. For example, COM components can be written in and used by programs written in Visual Basic, Visual C++, Delphi, PowerBuilder, and Python. Because the COM indirection layer hides the differences between all the languages involved, it's possible for Visual Basic to use an object implemented in Python and vice versa.

Moreover, many software packages register COM interfaces to support end-user scripting. For instance, Microsoft Excel publishes an object model that allows any COM-aware scripting language to start Excel and programmatically access spreadsheet data. Similarly, Microsoft Word can be scripted through COM to automatically manipulate documents. COM's language-neutrality means that programs written in any programming language with a COM interface, including Visual Basic and Python, can be used to automate Excel and Word processing.

Of most relevance to this chapter, Active Scripting also provides COM objects that allow scripts embedded in HTML to communicate with Microsoft's Internet Explorer (on the client) and Internet Information Server (on the server). Both systems register their object models with Windows such that they can be invoked from any COM-aware language. For example, when Internet Explorer extracts and executes Python code embedded in HTML, some Python variable names are automatically preset to COM object components that give access to IE context and tools (alert in Example 15-11). Calls to such components from Python code are automatically routed through COM back to IE.

15.8.3.2 Python COM clients

With the win32all Python extension package installed, though, we can also write Python programs that serve as registered COM servers and clients, even if they have nothing to do with the Internet at all. For example, the Python program in Example 15-13 acts as a client to the Microsoft Word COM object.

Example 15-13. PP2E\Internet\Other\Com\comclient.py
####################################################################
# a COM client coded in Python: talk to MS-Word via its COM object
# model; uses either dynamic dispatch (run-time lookup/binding), 
# or the static and faster type-library dispatch if makepy.py has 
# been run; install the windows win32all extensions package to use 
# this interface; Word runs hidden unless Visible is set to 1 (and
# Visible lets you watch, but impacts interactive Word sessions);
####################################################################
 
from sys import argv
docdir = 'C:\\temp\\'
if len(argv) == 2: docdir = argv[1]              # ex: comclient.py a:\
 
from win32com.client import Dispatch             # early or late binding
word  = Dispatch('Word.Application')             # connect/start word
word.Visible = 1                                 # else word runs hidden
 
# create and save new doc file
newdoc = word.Documents.Add(  )                    # call word methods
spot   = newdoc.Range(0,0)
spot.InsertBefore('Hello COM client world!')     # insert some text
newdoc.SaveAs(docdir + 'pycom.doc')              # save in doc file
newdoc.SaveAs(docdir + 'copy.doc')   
newdoc.Close(  )
 
# open and change a doc file
olddoc = word.Documents.Open(docdir + 'copy.doc')
finder = word.Selection.Find
finder.text = 'COM'
finder.Execute(  )
word.Selection.TypeText('Automation')
olddoc.Close(  )
 
# and so on: see Word's COM interface specs

This particular script starts Microsoft Word -- known as Word.Application to scripting clients -- if needed, and converses with it through COM. That is, calls in this script are automatically routed from Python to Microsoft Word and back. This code relies heavily on calls exported by Word, which are not described in this book. Armed with documentation for Word's object API, though, we could use such calls to write Python scripts that automate document updates, insert and replace text, create and print documents, and so on.

For instance, Figure 15-7 shows the two Word .doc files generated when the previous script is run on Windows: both are new files, and one is a copy of the other with a text replacement applied. The interaction that occurs while the script runs is more interesting: because Word's Visible attribute is set to 1, you can actually watch Word inserting and replacing text, saving files, etc., in response to calls in the script. (Alas, I couldn't quite figure out how to paste a movie clip in this book.)

Figure 15-7. Word files generated by Python COM client

figs/ppy2_1507.gif

In general, Python COM client calls may be dispatched either dynamically by run- time look-ups in the Windows registry, or statically using type libraries created by running a Python utility script at development time (makepy.py). These dispatch modes are sometimes called late and early dispatch binding, respectively. Dynamic (late) dispatch skips a development step but is slower when clients are running, due to all the required look-ups.[10]

Luckily, we don't need to know which scheme will be used when we write client scripts. The Dispatch call used in Example 15-13 to connect to Word is smart enough to use static binding if server type libraries exist, or dynamic binding if they do not. To force dynamic binding and ignore any generated type libraries, replace the first line with this:

from win32com.client.dynamic import Dispatch     # always late binding

However calls are dispatched, the Python COM interface performs all the work of locating the named server, looking up and calling the desired methods or attributes, and converting Python datatypes according to a standard type map as needed. In the context of Active Scripting, the underlying COM model works the same way, but the server is something like IE or IIS (not Word), the set of available calls differs, and some Python variables are preassigned to COM server objects. The notions of "client" and "server" can become somewhat blurred in these scenarios, but the net result is similar.

15.8.3.3 Python COM servers

Python scripts can also be deployed as COM servers, and provide methods and attributes that are accessible to any COM-aware programming language or system. This topic is too complex to cover well here, but exporting a Python object to COM is mostly just a matter of providing a set of class attributes to identify the server and utilizing the proper win32com registration utility calls. Example 15-14 is a simple COM server coded in Python as a class.

Example 15-14. PP2E\Internet\Other\Com\comserver.py
################################################################
# a COM server coded in Python; the _reg_ class attributes 
# give registry parameters, and others list methods and attrs;
# for this to work, you must install Python and the win32all 
# package, this module file must live on your Python path,
# and the server must be registered to COM (see code at end);
# run pythoncom.CreateGuid(  ) to make your own _reg_clsid_ key;
################################################################
 
import sys
from   win32com.server.exception import COMException         # what to raise 
import win32com.server.util                                  # server tools
globhellos = 0
 
class MyServer:
 
    # com info settings
    _reg_clsid_      = '{1BA63CC0-7CF8-11D4-98D8-BB74DD3DDE3C}'
    _reg_desc_       = 'Example Python Server'
    _reg_progid_     = 'PythonServers.MyServer'              # external name
    _reg_class_spec_ = 'comserver.MyServer'                  # internal name
    _public_methods_ = ['Hello', 'Square']
    _public_attrs_   = ['version']
 
    # python methods
    def __init__(self):  
        self.version = 1.0
        self.hellos  = 0
    def Square(self, arg):                                   # exported methods
        return arg ** 2
    def Hello(self):                                         # global variables
        global globhellos                                    # retain state, but
        globhellos  = globhellos  + 1                        # self vars don't
        self.hellos = self.hellos + 1
        return 'Hello COM server world [%d, %d]' % (globhellos, self.hellos)
 
# registration functions
def Register(pyclass=MyServer):
    from win32com.server.register import UseCommandLine
    UseCommandLine(pyclass)
def Unregister(classid=MyServer._reg_clsid_):
    from win32com.server.register import UnregisterServer
    UnregisterServer(classid)
 
if __name__ == '__main__':       # register server if file run or clicked
    Register(  )                   # unregisters if --unregister cmd-line arg

As usual, this Python file must be placed in a directory in Python's module search path. Besides the server class itself, the file includes code at the bottom to automatically register and unregister the server to COM when the file is run:

·         To register a server, simply call the UseCommandLine function in the win32com.server.register package and pass in the Python server class. This function uses all the special class attribute settings to make the server known to COM. The file is set to automatically call the registration tools if it is run by itself (e.g., when clicked in a file explorer).

·         To unregister a server, simply pass an --unregister argument on the command line when running this file. When run this way, the script automatically calls UseCommandLine again to unregister the server; as its name implies, this function inspects command-line arguments and knows to do the right thing when --unregister is passed. You can also unregister servers explicitly with the UnregisterServer call demonstrated near the end of this script, though this is less commonly used.

Perhaps the more interesting part of this code, though, is the special class attribute assignments at the start of the Python class. These class annotations can provide server registry settings (the _reg_ attributes), accessibility constraints (the _public_ names), and more. Such attributes are specific to the Python COM framework, and their purpose is to configure the server.

For example, the _reg_class_spec_ is simply the Python module and class names separated by a period. If set, the resident Python interpreter uses this attribute to import the module and create an instance of the Python class it defines, when accessed by a client.[11]

Other attributes may be used to identify the server in the Windows registry. The _reg_clsid_ attribute, for instance, gives a globally unique identifier (GUID) for the server and should vary in every COM server you write. In other words, don't use the value in this script. Instead, do what I did to make this ID, and paste the result returned on your machine into your script:[12]

A:\>python
>>> import pythoncom
>>> pythoncom.CreateGuid(  )
<iid:{1BA63CC0-7CF8-11D4-98D8-BB74DD3DDE3C}>

GUIDs are generated by running a tool shipped with the Windows extensions package; simply import and call the pythoncom.CreateGuid( ) function and insert the returned text in the script. Windows uses the ID stamped into your network card to come up with a complex ID that is likely to be unique across servers and machines. The more symbolic Program ID string, _reg_progid_, can be used by clients to name servers too, but is not as likely to be unique.

The rest of the server class is simply pure-Python methods, which implement the exported behavior of the server; that is, things to be called from clients. Once this Python server is annotated, coded, and registered, it can be used in any COM-aware language. For instance, programs written in Visual Basic, C++, Delphi, and Python may access its public methods and attributes through COM; of course, other Python programs can also simply import this module, but the point of COM is to open up components for even wider reuse.[13]

15.8.3.3.1 Using the Python server from a Python client

Let's put this Python COM server to work. The Python script in Example 15-15 tests the server two ways: first by simply importing and calling it directly, and then by employing Python's client-side COM interfaces shown earlier to invoke it less directly. When going through COM, the PythonServers.MyServer symbolic program ID we gave the server (by setting class attribute _reg_progid_ ) can be used to connect to this server from any language (including Python).

Example 15-15. PP2E\Internet\Other\Com\comserver-test.py
################################################################
# test the Python-coded COM server from Python two ways
################################################################
 
def testViaPython(  ):                                 # test without com
    from comserver import MyServer                   # use Python class name
    object = MyServer(  )                              # works as for any class
    print object.Hello(  )
    print object.Square(8)
    print object.version
 
def testViaCom(  ):
    from win32com.client import Dispatch             # test via client-side com
    server = Dispatch('PythonServers.MyServer')      # use Windows registry name
    print server.Hello(  )                             # call public methods
    print server.Square(12)
    print server.version                             # access attributes
 
if __name__ == '__main__':
    testViaPython(  )                                  # test module, server
    testViaCom(  )                                     # com object retains state
    testViaCom(  )

If we've properly configured and registered the Python COM server, we can talk to it by running this Python test script. In the following, we run the server and client files from an MS-DOS console box (though they can usually be run by mouse clicks as well). The first command runs the server file by itself to register the server to COM; the second executes the test script to exercise the server both as an imported module (testViaPython) and as a server accessed through COM (testViaCom):

A:\>python comserver.py
Registered: PythonServers.MyServer
 
A:\>python comserver-test.py
Hello COM server world [1, 1]
64
1.0
Hello COM server world [2, 1]
144
1.0
Hello COM server world [3, 1]
144
1.0
 
A:\>python comserver.py --unregister
Unregistered: PythonServers.MyServer

Notice the two numbers at the end of the Hello output lines: they reflect current values of a global variable and a server instance attribute. Global variables in the server's module retain state as long as the server module is loaded; by contrast, each COM Dispatch (and Python class) call makes a new instance of the server class, and hence new instance attributes. The third command unregisters the server in COM, as a cleanup step. Interestingly, once the server has been unregistered, it's no longer usable, at least not through COM:

A:\>python comserver-test.py
Hello COM server world [1, 1]
64
1.0
Traceback (innermost last):
  File "comserver-test.py", line 21, in ?
    testViaCom(  )                                     # com object retains
  File "comserver-test.py", line 14, in testViaCom
    server = Dispatch('PythonServers.MyServer')      # use Windows register
  ...more deleted...
pywintypes.com_error: (-2147221005, 'Invalid class string', None, None)
15.8.3.3.2 Using the Python server from a VB client

The comserver-test.py script just listed demonstrates how to use a Python COM server from a Python COM client. Once we've created and registered a Python COM server, though, it's available to any language that sports a COM interface. For instance, Example 15-16 shows the sort of code we write to access the Python server from Visual Basic. Clients coded in other languages (e.g., Delphi or Visual C++) are analogous, but syntax and instantiation calls may vary.

Example 15-16. PP2E\Internet\Other\Com\comserver-test.bas
Sub runpyserver(  )
    ' use python server from vb client
    ' alt-f8 in word to start macro editor
    Set server = CreateObject("PythonServers.MyServer")
    hello1 = server.hello(  )
    square = server.square(32)
    pyattr = server.Version
    hello2 = server.hello(  )
    sep = Chr(10)
    Result = hello1 & sep & square & sep & pyattr & sep & hello2
    MsgBox Result
End Sub

The real trick (at least for someone as naive about VB as this author) is how to make this code go. Because VB is embedded in Microsoft Office products such as Word, one approach is to test this code in the context of those systems. Try this: start Word, press Alt and F8 together, and you'll wind up in the Word macro dialog. There, enter a new macro name, press Create, and you'll find yourself in a development interface where you can paste and run the VB code just shown.

I don't teach VB tools in this book, so you'll need to consult other documents if this fails on your end. But it's fairly simple once you get the knack -- running the VB code in this context produces the Word pop-up box in Figure 15-8, showing the results of VB calls to our Python COM server. Global variable and instance attribute values at the end of both Hello reply messages are the same this time, because we make only one instance of the Python server class: in VB, by calling CreateObject, with the program ID of the desired server.

Figure 15-8. VB client running Python COM server

figs/ppy2_1508.gif

But because we've now learned how to embed VBScript in HTML pages, another way to kick off the VB client code is to put it in a web page and rely on IE to launch it for us. The bulk of the HTML file in Example 15-17 is the same as the Basic file shown previously, but tags have been added around the code to make it a bona fide web page.

Example 15-17. PP2E\Internet\Other\Com\comserver-test-vbs.html
<HTML><BODY>
<P>Run Python COM server from VBScript embedded in HTML via IE</P>
<SCRIPT Language=VBScript>
 
Sub runpyserver(  )
    ' use python server from vb client
    ' alt-f8 in word to start macro editor
    Set server = CreateObject("PythonServers.MyServer")
    hello1 = server.hello(  )
    square = server.square(9)
    pyattr = server.Version
    hello2 = server.hello(  )
    sep = Chr(10)
    Result = hello1 & sep & square & sep & pyattr & sep & hello2
    MsgBox Result
End Sub
 
runpyserver(  )
 
</SCRIPT>
</BODY></HTML>

There is an incredible amount of routing going on here, but the net result is similar to running the VB code by itself. Clicking on this file starts Internet Explorer (assuming it is registered to handle HTML files), which strips out and runs the embedded VBScript code, which in turn calls out to the Python COM server. That is, IE runs VBScript code that runs Python code -- a control flow spanning three systems, an HTML file, a Python file, and the IE implementation. With COM, it just works. Figure 15-9 shows IE in action running the HTML file above; the pop-up box is generated by the embedded VB code as before.

Figure 15-9. IE running a VBScript client running a Python COM server

figs/ppy2_1509.gif

If your client code runs but generates a COM error, make sure that the win32all package has been installed, that the server module file is in a directory on Python's path, and that the server file has been run by itself to register the server with COM. If none of that helps, you're probably already beyond the scope of this text. (Please see additional Windows programming resources for more details.)

15.8.3.4 The bigger COM picture

So what does writing Python COM servers have to do with the Internet motif of this chapter? After all, Python code embedded in HTML simply plays the role of COM client to IE or IIS systems that usually run locally. Besides showing how such systems work their magic, I've presented this topic here because COM, at least in its grander world view, is also about communicating over networks.

Although we can't get into details in this text, COM's distributed extensions make it possible to implement Python-coded COM servers to run on machines that are arbitrarily remote from clients. Although largely transparent to clients, COM object calls like those in the preceding client scripts may imply network transfers of arguments and results. In such a configuration, COM may be used as a general client/server implementation model and a replacement for technologies such as RPC (Remote Procedure Calls).

For some applications, this distributed object approach may even be a viable alternative to Python's other client and server-side scripting tools we've studied in this part of the book. Moreover, even when not distributed, COM is an alternative to the lower-level Python/C integration techniques we'll meet later in this book.

Once its learning curve is scaled, COM is a straightforward way to integrate arbitrary components and provides a standardized way to script and reuse systems. However, COM also implies a level of dispatch indirection overhead and is a Windows-only solution at this writing. Because of that, it is generally not as fast or portable as some of the other client/server and C integration schemes discussed in this book. The relevance of such trade-offs varies per application.

As you can probably surmise, there is much more to the Windows scripting story than we cover here. If you are interested in more details, O'Reilly's Python Programming on Win32 provides an excellent presentation of these and other Windows development topics. Much of the effort that goes into writing scripts embedded in HTML involves using the exposed object model APIs, which are deliberately skipped in this book; see Windows documentation sources for more details.

The New C# Python Compiler

Late-breaking news: a company called ActiveState (http://www.activestate.com) announced a new compiler for Python after this chapter was completed. This system (tentatively titled Python.NET) is a new, independent Python language implementation like the JPython system described earlier in this chapter, but compiles Python scripts for use in the Microsoft C# language environment and .NET framework (a software component system based on XML that fosters cross-language interoperability). As such, it opens the door to other Python web scripting roles and modes in the Windows world.

If successful, this new compiler system promises to be the third Python implementation (with JPython and the standard C implementation) and an exciting development for Python in general. Among other things, the C#-based port allows Python scripts to be compiled to binary .exe files and developed within the Visual Studio IDE. As in the JPython Java-based implementation, scripts are coded using the standard Python core language presented in this text, and translated to be executed by the underlying C# system. Moreover, .NET interfaces are automatically integrated for use in Python scripts: Python classes may subclass, act as, and use .NET components.

Also like JPython, this new alternative implementation of Python has a specific target audience and will likely prove to be of most interest to developers concerned with C# and .NET framework integration. ActiveState also plans to roll out a whole suite of Python development products besides this new compiler; be sure to watch the Python and ActiveState web sites for more details.

15.9 Python Server Pages

Though still somewhat new at this writing, Python Server Pages (PSP) is a server-side technology that embeds JPython code inside HTML. PSP is a Python-based answer to other server-side embedded scripting approaches.

The PSP scripting engine works much like Microsoft's Active Server Pages (ASP, described earlier) and Sun's Java Server Pages ( JSP) specification. At the risk of pushing the acronym tolerance envelope, PSP has also been compared to PHP, a server-side scripting language embedded in HTML. All of these systems, including PSP, embed scripts in HTML and run them on the server to generate the response stream sent back to the browser on the client; scripts interact with an exposed object model API to get their work done. PSP is written in pure Java, however, and so is portable to a wide variety of platforms (ASP applications can be run only on Microsoft platforms).

PSP uses JPython as its scripting language, reportedly a vastly more appropriate choice for scripting web sites than the Java language used in Java Server Pages. Since JPython code is embedded under PSP, scripts have access to the large number of Python and JPython tools and add-ons from within PSPs. In addition, scripts may access all Java libraries, thanks to JPython's Java integration support.

We can't cover PSP in detail here; but for a quick look, Example 15-18, adapted from an example in the PSP documentation, illustrates the structure of PSPs.

Example 15-18. PP2E\Internet\Other\hello.psp
$[
# Generate a simple message page with the client's IP address
]$
<HTML><HEAD>
<TITLE>Hello PSP World</TITLE>
</HEAD>
<BODY>
$[include banner.psp]$
<H1>Hello PSP World</H1>
<BR>
$[
Response.write("Hello from PSP, %s." % (Request.server["REMOTE_ADDR"]) )
]$
<BR>
</BODY></HTML>

A page like this would be installed on a PSP-aware server machine and referenced by URL from a browser. PSP uses $[ and ]$ delimiters to enclose JPython code embedded in HTML; anything outside these pairs is simply sent to the client browser, while code within these markers is executed. The first code block here is a JPython comment (note the # character); the second is an include statement that simply inserts another PSP file's contents.

The third piece of embedded code is more useful. As in Active Scripting technologies, Python code embedded in HTML uses an exposed object API to interact with the execution context -- in this case, the Response object is used to write output to the client's browser (much like a print in a CGI script), and Request is used to access HTTP headers for the request. The Request object also has a params dictionary containing GET and POST input parameters, as well as a cookies dictionary holding cookie information stored on the client by a PSP application.

Notice that the previous example could have just as easily been implemented with a Python CGI script using a Python print statement, but PSP's full benefit becomes clearer in large pages that embed and execute much more complex JPython code to produce a response.

PSP runs as a Java servlet and requires the hosting web site to support the Java Servlet API, all of which is beyond the scope of this text. For more details about PSP, visit its web site, currently located at http://www.ciobriefings.com/psp, but search http://www.python.org for other links if this one changes over time.

15.10 Rolling Your Own Servers in Python

Most of the Internet modules we looked at in the last few chapters deal with client-side interfaces such as FTP and POP, or special server-side protocols such as CGI that hide the underlying server itself. If you want to build servers in Python by hand, you can do so either manually or by using higher-level tools.

15.10.1 Coding Solutions

We saw the sort of code needed to build servers manually in Chapter 10. Python programs typically implement servers either by using raw socket calls with threads, forks, or selects to handle clients in parallel, or by using the SocketServer module.

In either case, to serve requests made in terms of higher-level protocols such as FTP, NNTP, and HTTP, you must listen on the protocol's port and add appropriate code to handle the protocol's message conventions. If you go this route, the client-side protocol modules in Python's standard library can help you understand the message conventions used. You may also be able to uncover protocol server examples in the Demos and Tools directories of the Python source distribution and on the Net at large (search http://www.python.org). See prior chapters for more details on writing socket-based servers.

As a higher-level interface, Python also comes with precoded HTTP web protocol server implementations, in the form of three standard modules. BaseHTTPServer implements the server itself; this class is derived from the standard SocketServer.TCPServer class. SimpleHTTPServer and CGIHTTPServer implement standard handlers for incoming HTTP requests; the former handles simple web page file requests, while the latter also runs referenced CGI scripts on the server machine by forking processes.

For example, to start a CGI-capable HTTP server, simply run Python code like that shown in Example 15-19 on the server machine.

Example 15-19. PP2E\Internet\Other\webserver.py
#!/usr/bin/python
############################################
# implement a HTTP server in Python which 
# knows how to run server-side CGI scripts;
# change root dir for your server machine
############################################
 
import os
from BaseHTTPServer import HTTPServer
from CGIHTTPServer  import CGIHTTPRequestHandler
os.chdir("/home/httpd/html")                           # run in html root dir
srvraddr = ("", 80)                                    # my hostname, portnumber
srvrobj  = HTTPServer(srvraddr, CGIHTTPRequestHandler)
srvrobj.serve_forever(  )                                # run as perpetual demon

This assumes that you have appropriate permissions to run such a script, of course; see the Python library manual for more details on precoded HTTP server and request handler modules. Once you have your server running, you can access it in any web browser or by using either the Python httplib module, which implements the client side of the HTTP protocol, or the Python urllib module, which provides a file-like interface to data fetched from a named URL address (see the urllib examples in Chapter 11Chapter 11, and Chapter 13, and use a URL of the form "http://..." to access HTTP documents).

15.10.2 Packaged Solutions

Finally, you can deploy full-blown, open source, and Python-friendly web servers and tools that are freely available on the Net. These may change over time too, but here are a few current options:

Medusa, asyncore

The Medusa system (http://www.nightmare.com/medusa) is an architecture for building long-running, high-performance network servers in Python, and is used in several mission-critical systems. Beginning in Python 1.5.2, the core of Medusa is now standard in Python, in the form of the asyncore and asynchat library modules. These standard modules may be used by themselves to build high-performance network servers, based on an asynchronous, multiplexing, single-process model. They use an event loop built using the select system call presented in Chapter 10 of this book to provide concurrency without spawning threads or processes, and are well-suited to handling short-lived transactions. See the Python library for details. The complete Medusa system (not shipped with Python) also provides precoded HTTP and FTP servers; it is free for noncommercial use, but requires a license otherwise.

Zope

If you are doing any server-side work at all, be sure to consider the Zope open source web application server, described earlier in this chapter and at http://www.zope.org. Zope provides a full-featured web framework that implements an object model that is well beyond standard server-side CGI scripting. The Zope world has also developed full-blown servers (e.g., Zserver).

Mailman

If you are looking for email list support, be sure to explore the GNU mailing list manager, otherwise known as Mailman. Written in Python, Mailman provides a robust, quick, and feature-rich email discussion list tool. Mailman allows users to subscribe over the Web, supports web-based administration, and provides mail-to-news gateways and integrated spam prevention (spam of the junk mail variety, that is). At this time, http://www.list.org is the place to find more Mailman details.

Apache

If you are adventurous, you may be interested in the highly configurable Apache open source web server. Apache is one of the dominant servers used on the Web today, despite its free nature. Among many other things, it supports running Python server-side scripts in a variety of modes; see the site http://www.apache.org for details on Apache itself.

PyApache

If you use Apache, also search the Python web site for information on the PyApache Apache server module (sometimes called mod_pyapache), which embeds a Python interpreter inside Apache to speed up the process of launching Python server-side scripts. CGI scripts are passed to the embedded interpreter directly, avoiding interpreter startup costs. PyApache also opens up the possibility of scripting Apache's internal components.

mod_python

As I wrote this chapter, another package for embedding Python within the Apache web server appeared on the open source landscape: mod_python, available at http://www.modpython.org. According to its release notes, mod_python also allows Python to be embedded in Apache, with a substantial boost in performance and added flexibility. The beta release announcement for this system appeared on comp.lang.python the very week that this section was written, so check the Web for its current status.

Be sure to watch http://www.python.org for new developments on the server front, as well as late-breaking advances in Python web scripting techniques in general.

[1] Over the years, observers have also pointed to other systems as possible Python "killer applications," including Grail, Python's COM support on Windows, and JPython. I hope they're all right, and fully expect new killers to arise after this edition is published. But at the time that I write this, Zope is attracting an astonishing level of interest among both developers and investors. [back]

[2] At this writing, JPython is the second implementation of the Python language. By contrast, the standard, original implementation of Python is sometimes now referred to as "CPython," because it is implemented in ANSI C. Among other things, the JPython implementation is driving a clearer definition of the Python language itself, independent of a particular implementation's effects. A new Python implementation for Microsoft's C#/.NET environment is also on the way (see later in this chapter) and may further drive a definition of what it means to be Python. [back]

[3] But see the note at the end of the later section on Grail; an early port of Tkinter for JPython is already available on the Net. [back]

[4] Be sure you can get a JVM to develop those products too! Installing JPython on Windows 98 while writing this book proved painful, not because of JPython, but because I also had to come to grips with Java commands to run during installation, and track down and install a JVM other than the one provided by Microsoft. Depending on your platform, you may be faced with JPython's Java-dependence even before you type your first line of code. [back]

[5] But Python 2.0's garbage collector can now collect cyclic objects too. See the 2.0 release notes and Appendix A. [back]

[6] Other scripting languages have addressed Java integration by reimplementing a Java virtual machine in the underlying scripting language or by integrating their original C implementations with Java using the Java native call interface. Neither approach is anywhere near as seamless and powerful as generating real Java bytecode. [back]

[7] Again, feel free to ignore most of this example's syntax. I'm not going to teach either JavaScript or VBScript syntax in this book, nor will I tell you which of the three versions of this example is clearer (though you can probably guess my preference). The first two versions are included partly for comparison by readers with a web development background. [back]

[8] However, you may not need to download the win32all package. The ActivePython Python distribution available from ActiveState (http://www.activestate.com), for example, comes with the Windows extensions package. [back]

[9] Roughly, OLE (Object Linking and Embedding) was a precursor to COM, and Active Scripting is just a technology that defines COM interfaces for activities such as passing objects to arbitrary programming language interpreters by name. Active Scripting is not much more than COM itself with a few extensions, but acronym- and buzzword-overload seem to run rampant in the Windows development world. [back]

[10] Actually, makepy can also be executed at runtime now, so you may no longer need to manually run it during development. See the makepy documentation available in the latest Windows extensions package for breaking details. [back]

 

[11] But note that the _reg_class_spec_ attribute is no longer strictly needed, and not specifying it avoids a number of PYTHONPATH issues. Because such settings are prone to change, you should always consult the latest Windows extensions package reference manuals for details on this and other class annotation attributes. [back]

[13] But you should be aware of a few type rules. In Python 1.5.2, Python-coded COM servers must be careful to use a fixed number of function arguments, and convert passed-in strings with the str built-in function. The latter of these constraints arises because COM passes strings as Unicode strings. Because Python 1.6 and 2.0 now support both Unicode and normal strings, though, this constraint should disappear soon. When using COM as a client (i.e., code that calls COM), you may pass a string or Unicode object, and the conversion is done automatically; when coding a COM server (i.e., code called by COM), strings are always passed in as Unicode objects. [back]

[12] The A:/> prompt shows up here only because I copied the COM scripts to a floppy so I could run them on a machine with the win32all extension installed. You should be able to run from the directory where these scripts live in the examples tree. [back]

Chapter 14  TOC  Chapter 16