17.1 Testing
In this
chapter, I distinguish between two rather different kinds of testing:
unit testing and system testing. Testing is a rich and important
field, and even more distinctions could be drawn, but my goal is to
focus on the issues of most immediate importance to software
developers.
17.1.1 Unit Testing and System Testing
Unit
testing means
writing and running tests to exercise a single module or an even
smaller unit, such as a class or function.
System testing (also known
as functional testing) involves running an entire program with known
inputs. Some classic books on testing draw the distinction between
white-box testing, done with knowledge of a
program's internals, and black-box
testing, done from the outside. This classic viewpoint
parallels the modern one of unit versus system testing.
Unit and
system testing serve different goals. Unit testing proceeds apace
with development; you can and should test each unit as
you're developing it. Indeed, one modern approach is
known as test-first coding:
for each feature that your program must have, you first write unit
tests, and only then do you proceed to write code that implements the
feature. Test-first coding seems a strange approach, but it has
several advantages. For example, it ensures that you
won't omit unit tests for some feature. Further,
test-first coding is helpful because it urges you to focus first on
what tasks a certain function, class, or method should accomplish,
and to deal only afterwards with implementing that function, class,
or method. In order to test a unit, which may depend on other units
not yet fully developed, you often have to write
stubs, which are fake implementations of various
units' interfaces that give known and correct
responses in cases needed to test other units.
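For example, here is a minimal sketch of a stub (all names here are hypothetical, for illustration only): to unit-test a higher-level reporting module before the real database layer is ready, you might supply a fake class with the same interface, returning known, canned responses:
class DatabaseStub:
    """ Stub with the same interface as the real database layer. """
    def fetch_user(self, user_id):
        # Return a fixed, known response instead of querying a real server
        return {'id': user_id, 'name': 'Test User'}
The reporting module's unit tests can then use a DatabaseStub instance wherever they would otherwise need a real database connection, keeping the tests fast and their results predictable.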
System testing comes afterwards, since it requires the system to
exist with some subset of system functionality believed to be in
working condition. System testing provides a sanity check: given that
each module in the program works properly (passes unit tests), does
the whole program work? If each unit is okay but the system as a
whole is not, there is a problem with integration between units. For
this reason, system testing is also known as integration testing.
System testing is similar to running the system in production use
except that you fix the inputs in advance, so any problems you find
are easy to reproduce. The cost of failure in system testing is lower
than in production use, since outputs from system testing are not
used to make decisions, control external systems, and so on. Rather,
outputs from system testing are systematically compared with the
outputs that the system should produce given the known inputs. The
purpose of the whole procedure is to find discrepancies between what
the program should do and what the program actually does in a cheap
and reproducible way.
Failures discovered by system testing, just like system failures in
production use, reveal defects in unit tests as well as defects in
the code. Unit testing may have been insufficient; a
module's unit tests may have failed to exercise all
needed functionality of that module. In this case, the unit tests
clearly need to be beefed up.
More often, failures in system testing reveal communication problems
within the development team: a module may correctly implement a
certain interface functionality, but another module expects different
functionality. This kind of problem (an integration problem in the
strict sense) is harder to pinpoint in unit testing. In good
development practice, unit tests must run often, so it is crucial
that they run fast. It's therefore essential that
each unit can assume other units are working correctly and as
expected.
Unit tests that are run in reasonably late stages of development can
reveal integration problems if the system architecture is
hierarchical, a common and reasonable organization. In such an
architecture, lower-level modules depend on no others (except perhaps
library modules, which you can assume to be correct), and thus their
unit tests, if complete, suffice to assure correctness. Higher-level
modules depend on lower-level ones, and therefore also depend on
correct team communication about what interfaces each module expects
and supplies. Running complete unit tests on higher-level modules,
using the true lower-level modules rather than stubs, automatically
exercises the interface between modules, as well as the higher-level
modules' own code.
Unit tests for higher-level modules are thus run in two ways. You run
the tests with stubs for the lower levels during the early stages of
development when the lower-level modules are not yet ready, or,
later, when you need to check correctness of the higher levels only.
During later stages of development, you also regularly run the
higher-level modules' unit tests using the true
lower-level modules. In this way, you check the correctness of the
whole subsystem, from the higher levels downwards.
System testing is similar to running the program in normal ways. You
need special support only to ensure that known inputs are supplied
and that outputs are captured for comparison with expected outputs.
This is easy for programs whose I/O uses files, but terribly hard for
programs whose I/O relies on a GUI, network, or other communication
with independent external entities. To simulate such external
entities and make them predictable and entirely observable,
platform-dependent infrastructure is generally necessary.
Another useful piece of supporting infrastructure for system testing
is a testing framework that automates the
running of system tests, including logging of successes and failures.
Such a framework can also help testers prepare sets of known inputs
and corresponding expected outputs.
Both free and commercial programs for these purposes exist, but they
are not dependent on what programming languages are used in the
system under test. As mentioned, system testing is akin to what was
classically known as black-box testing—testing independent of
the implementation of the system under test, and therefore, in
particular, of the programming languages used for implementation.
Instead, testing frameworks usually depend on the operating system
platform on which they run, since the tasks they perform are
platform-dependent: running programs with given inputs, capturing
their outputs, and particularly simulating and capturing GUI,
network, and other interprocess communication I/O. Since frameworks
for system testing depend on the platform and not on programming
languages, I do not cover them further in this book.
17.1.2 The doctest Module
The doctest module has
the primary purpose of letting you create good usage examples in your
code's docstrings, by checking that the examples do
in fact produce the results that your docstrings show for them.
As you're developing a module, keep the docstrings
up to date, and gradually enrich them with examples. Each time part
of the module (e.g., a function) is ready, or even partially ready,
make it a habit to add examples to the docstrings. Import the module
into an interactive session, and interactively use the parts you just
developed in order to provide examples with a mix of typical cases,
limit cases, and failing cases. For this specific purpose only, use
from module
import * so that your examples
don't prefix
module. to each name
the module supplies. Copy and paste the text of the interactive
session into the docstring in your favorite editor, adjust any
mistakes, and you're almost done.
Your documentation is now enriched with examples, and readers will
have an easier time following it, assuming you chose a good mix of
examples and seasoned it wisely with non-example text. Make sure you
have docstrings, with examples, for your module as a whole, and for
each function, class, and method that the module exports. You may
skip functions, classes, and methods whose names start with
_, since, as their names indicate,
they're meant to be private implementation details;
doctest by default ignores them, and so should
most readers of your module's sources.
Examples that don't match the way your code works
are worse than useless. Documentation and comments are useful only if
they match reality. Docstrings and comments often get out of date as
code changes, and then they become misinformation, hampering rather
than helping any reader of the source. Better to have no comments and
docstrings at all than to have ones that lie.
doctest can help, at least, with the examples in
your docstrings. A failing doctest run will often
prompt you to review the whole docstring that contains the failing
examples, thus reminding you to keep the docstring's
text updated, too.
At the end of your module's source, insert the
following small snippet:
if __name__ == '__main__':
    import doctest, sys
    doctest.testmod(sys.modules[__name__])
This code calls function testmod of module
doctest on your module when you run your module as
the main program. testmod examines all relevant
docstrings (the module docstring, and docstrings of all public
functions, public classes, and public methods of public classes). In
each docstring, testmod finds all examples (by
looking for occurrences of the interpreter prompt
'>>> ', possibly
preceded by whitespace) and runs each example.
testmod checks that each
example's results are equal to the output given in
the docstring right after the example. In the case of exceptions,
testmod ignores the traceback, but checks that the
expected and observed error messages are equal.
When everything goes right, testmod terminates
silently. Otherwise, it outputs detailed messages about examples that
failed, showing expected and actual output. Example 17-1 shows a typical example of
doctest at work on a module
mod.py.
Example 17-1. Using doctest
"""
This module supplies a single function reverseWords that reverses
a string by words.
>>> reverseWords('four score and seven years')
'years seven and score four'
>>> reverseWords('justoneword')
'justoneword'
>>> reverseWords('')
''
You must call reverseWords with one argument, and it must be a string:
>>> reverseWords( )
Traceback (most recent call last):
...
TypeError: reverseWords( ) takes exactly 1 argument (0 given)
>>> reverseWords('one', 'another')
Traceback (most recent call last):
...
TypeError: reverseWords( ) takes exactly 1 argument (2 given)
>>> reverseWords(1)
Traceback (most recent call last):
...
AttributeError: 'int' object has no attribute 'split'
>>> reverseWords(u'however, unicode is all right too')
u'too right all is unicode however,'
As a side effect, reverseWords eliminates any redundant spacing:
>>> reverseWords('with redundant spacing')
'spacing redundant with'
"""
def reverseWords(astring):
words = astring.split( )
words.reverse( )
return ' '.join(words)
if _ _name_ _= ='_ _main_ _':
import doctest, sys
doctest.testmod(sys.modules[_ _name_ _])
I have snipped the tracebacks from the docstring, as is commonly
done, since doctest ignores them and they add
nothing to the explanatory value of each failing case. Apart from
this, the docstring is the copy and paste of an interactive session,
with the addition of some explanatory text and empty lines for
readability. Save this source as mod.py, and
then run it with python mod.py.
It produces no output, meaning that all examples work just right.
Also try python mod.py
-v to get an account of all tests tried and a
verbose summary at the end. Finally, try altering the example results
in the module docstring, making them incorrect, to see the messages
doctest provides for errant examples.
doctest is not meant for general-purpose unit
testing, but can nevertheless be a convenient tool for the purpose.
The recommended way to do unit testing in Python is with module
unittest, covered in the next section. However,
unit testing with doctest can be easier and faster
to set up, since it requires little more than copy and paste from an
interactive session. If you need to maintain a module that lacks unit
tests, retrofitting such tests into the module with
doctest may be a reasonable compromise.
It's certainly better to have
doctest-based unit tests than not to have any unit
tests at all, as might otherwise happen should you decide that
setting up tests properly with unittest would take
you too long.
If you do decide to use doctest for unit testing,
don't cram extra tests into your
module's docstrings. That would damage the
docstrings by making them too long and hard to read. Keep in the
docstrings the right amount and kind of examples, strictly for
explanatory purposes, just as if unit testing was not in the picture.
Instead, put the extra tests into a global variable of your module, a
dictionary named __test__. The keys in
__test__ are strings used as arbitrary test
names, and the corresponding values are strings that
doctest picks up and uses just as it uses
docstrings. The values in __test__ may also be
function and class objects, in which case doctest
examines their docstrings for tests to run. This is also a convenient
way to run doctest on objects with private names,
which doctest skips by default.
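For example, the module of Example 17-1 might end with a sketch like the following, just before the if __name__ block (the extra test shown here is hypothetical):
__test__ = {
    'redundant_spacing': '''
>>> reverseWords('  extra   spacing   everywhere  ')
'everywhere spacing extra'
''',
}
doctest then runs the examples in each value of __test__ just as it runs those in docstrings, without cluttering the docstrings that readers of your documentation see.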
17.1.3 The unittest Module
The unittest module is
the Python version of a unit-testing framework originally developed
by Kent Beck for Smalltalk. Similar and equally widespread versions
of the same framework also exist for other programming languages
(e.g., the JUnit package for Java).
To use unittest, you don't put
your testing code in the same source file as the tested module, but
instead write a separate test module per module being tested. A
popular convention is to name the test module the same as the module
being tested, with a prefix such as 'test_', and
put it in a subdirectory named test of the
directory where you keep your sources. For example, the test module
for mod.py can be
test/test_mod.py. You need a simple and
consistent naming convention to make it easy for you to write and
maintain auxiliary scripts that find and run all unit tests for a
package.
Separation between a module's source code and its
unit-testing code lets you refactor the module more easily, including
possibly recoding its functionality in C, without perturbing the
unit-testing code. Knowing that test_mod.py
stays intact, whatever changes you make to
mod.py, enhances your confidence that passing
the tests in test_mod.py indicates that
mod.py still works correctly after the changes.
A unit-testing module defines one or more subclasses of
unittest's
TestCase class. Each subclass may define a single
test case by overriding
method runTest. Better yet, the subclass may
define one or more test cases, not by overriding
runTest, but rather by defining
test-case methods, which
are methods that are callable without arguments and whose names start
with test. The subclass may also override methods
setUp, which the framework calls to prepare a new
instance for each test case, and tearDown, which
the framework calls to clean things up after each test case. Each
test-case method calls methods of class TestCase
whose names start with assert, in order to express
the conditions that the test must meet. unittest
runs the test-case methods within a TestCase
subclass in arbitrary order, running setUp just
before each test case and tearDown just after each
test case.
unittest provides other facilities, such as
grouping test cases into test suites, and other more advanced
functionality. You do not need such extras unless
you're defining a custom unit-testing framework or,
at the very least, structuring complicated testing procedures for
equally complicated packages. In almost all cases, the concepts and
details covered in this section are sufficient to perform effective
and systematic unit testing. Example 17-2 shows how
to use unittest to provide unit tests for the
module mod.py of Example 17-1.
For illustration purposes, this example uses
unittest to perform exactly the same tests that
Example 17-1 encoded as examples in docstrings using
doctest.
Example 17-2. Using unittest
""" This module tests function reverseWords provided by module mod.py. """
import unittest
import mod
class ModTest(unittest.TestCase):
def testNormalCase(self):
self.assertEqual(mod.reverseWords('four score and seven years'),
'years seven and score four')
def testSingleWord(self):
self.assertEqual(mod.reverseWords('justoneword'), 'justoneword')
def testEmpty(self):
self.assertEqual(mod.reverseWords(''), '')
def testRedundantSpacing(self):
self.assertEqual(mod.reverseWords('with redundant spacing'),
'spacing redundant with')
def testUnicode(self):
self.assertEqual(mod.reverseWords(u'unicode is all right too'),
u'too right all is unicode')
def testExactlyOneArgument(self):
self.assertRaises(TypeError, mod.reverseWords)
self.assertRaises(TypeError, mod.reverseWords, 'one', 'another')
def testMustBeString(self):
self.assertRaises((AttributeError,TypeError), mod.reverseWords, 1)
if _ _name_ _= ='_ _main_ _':
unittest.main( )
Running this module with python
test_mod.py is by default a bit more verbose than
using python mod.py to run
doctest, as in Example 17-1.
test_mod.py outputs a single
. for each test-case method it runs, then a
separator line of dashes, and finally a summary line, such as
"Ran 7 tests in 0.110s", and a
final line of "OK" if every test
was indeed okay.
Each test-case method makes one or more calls to methods whose names
start with assert (or their synonyms, whose names
start with fail). Here, only one test-case method,
testExactlyOneArgument, makes two such calls. In
more complicated cases, multiple calls to assert methods from a
single test-case method are quite common.
Even in a case as simple as this, one minor aspect shows that, for
unit testing, unittest is more powerful and
flexible than doctest. In method
testMustBeString, we pass as the first argument to
assertRaises a pair of exception classes, meaning
we accept either kind of exception. test_mod.py
therefore accepts as valid different implementations of
mod.py. It accepts the implementation in Example 17-1, which tries calling method
split on its argument, and therefore raises
AttributeError when called with an argument that
is not a string. However, it also accepts a different hypothetical
implementation, one that raises TypeError instead
when called with an argument of the wrong type. It would be possible
to code this testing functionality with doctest,
but it would be awkward and non-obvious, while
unittest makes it simple and natural.
This kind of flexibility is crucial for real-life unit tests, which
essentially act as executable specifications for their modules. You
could, pessimistically, view the need for flexibility as indicating
that the interface of the code we're testing is not
well defined. However, it's best to view the
interface as being defined with a useful amount of flexibility for
the implementer: under circumstance X
(argument of invalid type passed to function
reverseWords, in this example), either of two
things (raising AttributeError or
TypeError) is allowed to happen.
Thus, implementations with either of the different behaviors can be
correct, and the implementer can choose between them on the basis of
such considerations as performance and clarity. By viewing unit tests
as executable specifications for their modules (the modern view, and
the basis of test-first coding) rather than as white-box tests
strictly constrained to a specific implementation (as in some
traditional taxonomies of testing), the tests become a more vital
component of the software development process.
17.1.3.1 The TestCase class
With
unittest, you write test cases by subclassing
class TestCase and adding methods, callable
without arguments, whose names start with test.
Such test-case methods, in turn, call methods that your subclass
inherits from TestCase, whose names start with
assert (or their synonyms, whose names start with
fail), to indicate conditions that must hold for
the test to succeed.
Class TestCase also defines two methods that your
subclass can optionally override in order to group actions to perform
right before and right after each test-case method runs. This
doesn't exhaust
TestCase's functionality, but you
won't need the rest unless you're
developing testing frameworks or performing some similarly advanced
task. The frequently called methods in a TestCase
instance t are the following.
t.assert_(condition, msg=None)

Fails and outputs msg if condition is false; otherwise, does nothing. The underscore in the name is needed because assert is a Python keyword. failUnless is a synonym.

t.assertEqual(first, second, msg=None)

Fails and outputs msg if first != second; otherwise, does nothing. failUnlessEqual is a synonym.

t.assertNotEqual(first, second, msg=None)

Fails and outputs msg if first == second; otherwise, does nothing. failIfEqual is a synonym.

t.assertRaises(exceptionSpec, callable, *args)

Calls callable(*args). Fails if the call doesn't raise any exception. If the call raises an exception not meeting exceptionSpec, assertRaises propagates the exception. If the call raises an exception meeting exceptionSpec, assertRaises does nothing. exceptionSpec can be an exception class or a tuple of classes, just like the first argument to the except clause of a try/except statement. failUnlessRaises is a synonym.

t.fail(msg=None)

Fails unconditionally and outputs msg.

t.failIf(condition, msg=None)

Fails and outputs msg if condition is true; otherwise, does nothing.

t.setUp()

The framework calls t.setUp() just before calling a test-case method. The implementation in TestCase does nothing. This method is provided in order to let your subclass override it if it needs to perform some preparation for each test.

t.tearDown()

The framework calls t.tearDown() just after calling a test-case method. The implementation in TestCase does nothing. This method is provided in order to let your subclass override it if it needs to perform some cleanup after each test.
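For example, here is a minimal sketch (with hypothetical names) of a TestCase subclass that overrides both hooks:
import unittest

class StackTest(unittest.TestCase):
    def setUp(self):
        # Runs before each test-case method: build a fresh fixture
        self.stack = []
    def tearDown(self):
        # Runs after each test-case method: discard the fixture
        self.stack = None
    def testPush(self):
        self.stack.append(23)
        self.assertEqual(self.stack, [23])
    def testEmpty(self):
        self.failIf(self.stack)

if __name__ == '__main__':
    unittest.main()
Because setUp runs anew before each test-case method, testPush and testEmpty each see a fresh, empty list, regardless of the order in which unittest runs them.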
17.1.3.2 Unit tests dealing with large amounts of data
Unit tests must be fast, since they are
run frequently during development. Therefore, it's
best to unit-test each aspect of your modules'
functionality on small amounts of data when possible. This makes each
unit test faster, and also lets you conveniently embed all needed
data in the test's source code. When you test a
function that reads from or writes to a file object, in particular,
you normally use an instance of class cStringIO.StringIO
(covered in Chapter 10) to simulate a file object
while holding the data in memory.
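For example, here is a minimal sketch of this technique (the function under test here is hypothetical, for illustration only):
import unittest
from cStringIO import StringIO

def sumNumbers(fileobj):
    # Hypothetical function under test: sums one number per line
    return sum([int(line) for line in fileobj])

class SumNumbersTest(unittest.TestCase):
    def testSmallData(self):
        # A StringIO instance simulates a file, holding the data in memory
        simulated_file = StringIO('1\n2\n3\n')
        self.assertEqual(sumNumbers(simulated_file), 6)

if __name__ == '__main__':
    unittest.main()
No disk I/O occurs, so the test runs fast and needs no auxiliary data files.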
However, in some rare cases, it may be impossible to fully exercise a
module's functionality without supplying and/or
comparing data in quantities larger than can be reasonably embedded
in a test's source code. In such cases, your unit
test will have to rely on auxiliary external data files to hold the
data it needs to supply to the module it tests, and/or the data it
needs to compare to the tested module's output. Even
then, you're generally better off reading the data
into instances of cStringIO rather than directing
the tested module to perform actual disk I/O. Similarly, I suggest
you generally use stubs to test modules meant to interact with other
external entities, such as a database, a GUI, or some other program
over a network. It's easier for you to control all
aspects of the test when using stubs rather than real external
entities. Also, to reiterate, the speed at which you can run tests is
important, and it's invariably faster to perform
simulated operations in stubs, rather than real operations.