17.1 Testing
In this
chapter, I distinguish between two rather different kinds of testing:
unit testing and system testing. Testing is a rich and important
field, and even more distinctions could be drawn, but my goal is to
focus on the issues of most immediate importance to software
developers.
17.1.1 Unit Testing and System Testing
Unit
testing means
writing and running tests to exercise a single module or an even
smaller unit, such as a class or function.
System testing (also known
as functional testing) involves running an entire program with known
inputs. Some classic books on testing draw the distinction between
white-box testing, done with knowledge of a
program's internals, and black-box
testing, done from the outside. This classic viewpoint
parallels the modern one of unit versus system testing.
Unit and
system testing serve different goals. Unit testing proceeds apace
with development; you can and should test each unit as
you're developing it. Indeed, one modern approach is
known as test-first coding:
for each feature that your program must have, you first write unit
tests, and only then do you proceed to write code that implements the
feature. Test-first coding seems a strange approach, but it has
several advantages. For example, it ensures that you
won't omit unit tests for some feature. Further,
test-first coding is helpful because it urges you to focus first on
what tasks a certain function, class, or method should accomplish,
and to deal only afterwards with implementing that function, class,
or method. In order to test a unit, which may depend on other units
not yet fully developed, you often have to write
stubs, which are fake implementations of various
units' interfaces that give known and correct
responses in cases needed to test other units.
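For example, here is a minimal sketch of a stub (all names here are hypothetical, for illustration only): to unit-test a higher-level reporting module before the real database layer is ready, you might supply a fake class with the same interface, returning known, canned responses:
class DatabaseStub:
    """ Stub with the same interface as the real database layer. """
    def fetch_user(self, user_id):
        # Return a fixed, known response instead of querying a real server
        return {'id': user_id, 'name': 'Test User'}
The reporting module's unit tests can then use a DatabaseStub instance wherever they would otherwise need a real database connection, keeping the tests fast and their results predictable.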
System testing comes afterwards, since it requires the system to
exist with some subset of system functionality believed to be in
working condition. System testing provides a sanity check: given that
each module in the program works properly (passes unit tests), does
the whole program work? If each unit is okay but the system as a
whole is not, there is a problem with integration between units. For
this reason, system testing is also known as integration testing.
System testing is similar to running the system in production use
except that you fix the inputs in advance, so any problems you find
are easy to reproduce. The cost of failure in system testing is lower
than in production use, since outputs from system testing are not
used to make decisions, control external systems, and so on. Rather,
outputs from system testing are systematically compared with the
outputs that the system should produce given the known inputs. The
purpose of the whole procedure is to find discrepancies between what
the program should do and what the program actually does in a cheap
and reproducible way.
Failures discovered by system testing, just like system failures in
production use, reveal defects in unit tests as well as defects in
the code. Unit testing may have been insufficient; a
module's unit tests may have failed to exercise all
needed functionality of that module. In this case, the unit tests
clearly need to be beefed up.
More often, failures in system testing reveal communication problems
within the development team: a module may correctly implement a
certain interface functionality, but another module expects different
functionality. This kind of problem (an integration problem in the
strict sense) is harder to pinpoint in unit testing. In good
development practice, unit tests must run often, so it is crucial
that they run fast. It's therefore essential that
each unit can assume other units are working correctly and as
expected.
Unit tests that are run in reasonably late stages of development can
reveal integration problems if the system architecture is
hierarchical, a common and reasonable organization. In such an
architecture, lower-level modules depend on no others (except perhaps
library modules, which you can assume to be correct), and thus their
unit tests, if complete, suffice to assure correctness. Higher-level
modules depend on lower-level ones, and therefore also depend on
correct team communication about what interfaces each module expects
and supplies. Running complete unit tests on higher-level modules,
using the true lower-level modules rather than stubs, automatically
exercises the interface between modules, as well as the higher-level
modules' own code.
Unit tests for higher-level modules are thus run in two ways. You run
the tests with stubs for the lower levels during the early stages of
development when the lower-level modules are not yet ready, or,
later, when you need to check correctness of the higher levels only.
During later stages of development, you also regularly run the
higher-level modules' unit tests using the true
lower-level modules. In this way, you check the correctness of the
whole subsystem, from the higher levels downwards.
System testing is similar to running the program in normal ways. You
need special support only to ensure that known inputs are supplied
and that outputs are captured for comparison with expected outputs.
This is easy for programs whose I/O uses files, but terribly hard for
programs whose I/O relies on a GUI, network, or other communication
with independent external entities. To simulate such external
entities and make them predictable and entirely observable,
platform-dependent infrastructure is generally necessary.
Another useful piece of supporting infrastructure for system testing
is a testing framework that automates the
running of system tests, including logging of successes and failures.
Such a framework can also help testers prepare sets of known inputs
and corresponding expected outputs.
Both free and commercial programs for these purposes exist, but they
are not dependent on what programming languages are used in the
system under test. As mentioned, system testing is akin to what was
classically known as black-box testing—testing independent of
the implementation of the system under test, and therefore, in
particular, of the programming languages used for implementation.
Instead, testing frameworks usually depend on the operating system
platform on which they run, since the tasks they perform are
platform-dependent: running programs with given inputs, capturing
their outputs, and particularly simulating and capturing GUI,
network, and other interprocess communication I/O. Since frameworks
for system testing depend on the platform and not on programming
languages, I do not cover them further in this book.
17.1.2 The doctest Module
The doctest module has
the primary purpose of letting you create good usage examples in your
code's docstrings, by checking that the examples do
in fact produce the results that your docstrings show for them.
As you're developing a module, keep the docstrings
up to date, and gradually enrich them with examples. Each time part
of the module (e.g., a function) is ready, or even partially ready,
make it a habit to add examples to the docstrings. Import the module
into an interactive session, and interactively use the parts you just
developed in order to provide examples with a mix of typical cases,
limit cases, and failing cases. For this specific purpose only, use
from module
import * so that your examples
don't prefix
module. to each name
the module supplies. Copy and paste the text of the interactive
session into the docstring in your favorite editor, adjust any
mistakes, and you're almost done.
Your documentation is now enriched with examples, and readers will
have an easier time following it, assuming you chose a good mix of
examples and seasoned it wisely with non-example text. Make sure you
have docstrings, with examples, for your module as a whole, and for
each function, class, and method that the module exports. You may
skip functions, classes, and methods whose names start with
_, since, as their names indicate,
they're meant to be private implementation details;
doctest by default ignores them, and so should
most readers of your module's sources.
Examples that don't match the way your code works
are worse than useless. Documentation and comments are useful only if
they match reality. Docstrings and comments often get out of date as
code changes, and then they become misinformation, hampering rather
than helping any reader of the source. Better to have no comments and
docstrings at all than to have ones that lie.
doctest can help, at least, with the examples in
your docstrings. A failing doctest run will often
prompt you to review the whole docstring that contains the failing
examples, thus reminding you to keep the docstring's
text updated, too.
At the end of your module's source, insert the
following small snippet:
if __name__ == '__main__':
    import doctest, sys
    doctest.testmod(sys.modules[__name__])
This code calls function testmod of module
doctest on your module when you run your module as
the main program. testmod examines all relevant
docstrings (the module docstring, and docstrings of all public
functions, public classes, and public methods of public classes). In
each docstring, testmod finds all examples (by
looking for occurrences of the interpreter prompt
'>>> ', possibly
preceded by whitespace) and runs each example.
testmod checks that each
example's results are equal to the output given in
the docstring right after the example. In the case of exceptions,
testmod ignores the traceback, but checks that the
expected and observed error messages are equal.
When everything goes right, testmod terminates
silently. Otherwise, it outputs detailed messages about examples that
failed, showing expected and actual output. Example 17-1 shows a typical example of
doctest at work on a module
mod.py.
Example 17-1. Using doctest
"""
This module supplies a single function reverseWords that reverses
a string by words.
>>> reverseWords('four score and seven years')
'years seven and score four'
>>> reverseWords('justoneword')
'justoneword'
>>> reverseWords('')
''
You must call reverseWords with one argument, and it must be a string:
>>> reverseWords( )
Traceback (most recent call last):
...
TypeError: reverseWords( ) takes exactly 1 argument (0 given)
>>> reverseWords('one', 'another')
Traceback (most recent call last):
...
TypeError: reverseWords( ) takes exactly 1 argument (2 given)
>>> reverseWords(1)
Traceback (most recent call last):
...
AttributeError: 'int' object has no attribute 'split'
>>> reverseWords(u'however, unicode is all right too')
u'too right all is unicode however,'
As a side effect, reverseWords eliminates any redundant spacing:
>>> reverseWords('with redundant spacing')
'spacing redundant with'
"""
def reverseWords(astring):
words = astring.split( )
words.reverse( )
return ' '.join(words)
if _ _name_ _= ='_ _main_ _':
import doctest, sys
doctest.testmod(sys.modules[_ _name_ _])
I have snipped the tracebacks from the docstring, as is commonly
done, since doctest ignores them and they add
nothing to the explanatory value of each failing case. Apart from
this, the docstring is the copy and paste of an interactive session,
with the addition of some explanatory text and empty lines for
readability. Save this source as mod.py, and
then run it with python mod.py.
It produces no output, meaning that all examples work just right.
Also try python mod.py
-v to get an account of all tests tried and a
verbose summary at the end. Finally, try altering the example results
in the module docstring, making them incorrect, to see the messages
doctest provides for errant examples.
doctest is not meant for general-purpose unit
testing, but can nevertheless be a convenient tool for the purpose.
The recommended way to do unit testing in Python is with module
unittest, covered in the next section. However,
unit testing with doctest can be easier and faster
to set up, since it requires little more than copy and paste from an
interactive session. If you need to maintain a module that lacks unit
tests, retrofitting such tests into the module with
doctest may be a reasonable compromise.
It's certainly better to have
doctest-based unit tests than not to have any unit
tests at all, as might otherwise happen should you decide that
setting up tests properly with unittest would take
you too long.
If you do decide to use doctest for unit testing,
don't cram extra tests into your
module's docstrings. That would damage the
docstrings by making them too long and hard to read. Keep in the
docstrings the right amount and kind of examples, strictly for
explanatory purposes, just as if unit testing was not in the picture.
Instead, put the extra tests into a global variable of your module, a
dictionary named __test__. The keys in
__test__ are strings used as arbitrary test
names, and the corresponding values are strings that
doctest picks up and uses just as it uses
docstrings. The values in __test__ may also be
function and class objects, in which case doctest
examines their docstrings for tests to run. This is also a convenient
way to run doctest on objects with private names,
which doctest skips by default.
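For example, the module of Example 17-1 might end with a sketch like the following, just before the if __name__ block (the extra test shown here is hypothetical):
__test__ = {
    'redundant_spacing': '''
>>> reverseWords('  extra   spacing   everywhere  ')
'everywhere spacing extra'
''',
}
doctest then runs the examples in each value of __test__ just as it runs those in docstrings, without cluttering the docstrings that readers of your documentation see.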
17.1.3 The unittest Module
The unittest module is
the Python version of a unit-testing framework originally developed
by Kent Beck for Smalltalk. Similar and equally widespread versions
of the same framework also exist for other programming languages
(e.g., the JUnit package for Java).
To use unittest, you don't put
your testing code in the same source file as the tested module, but
instead write a separate test module per module being tested. A
popular convention is to name the test module the same as the module
being tested, with a prefix such as 'test_', and
put it in a subdirectory named test of the
directory where you keep your sources. For example, the test module
for mod.py can be
test/test_mod.py. You need a simple and
consistent naming convention to make it easy for you to write and
maintain auxiliary scripts that find and run all unit tests for a
package.
Separation between a module's source code and its
unit-testing code lets you refactor the module more easily, including
possibly recoding its functionality in C, without perturbing the
unit-testing code. Knowing that test_mod.py
stays intact, whatever changes you make to
mod.py, enhances your confidence that passing
the tests in test_mod.py indicates that
mod.py still works correctly after the changes.
A unit-testing module defines one or more subclasses of
unittest's
TestCase class. Each subclass may define a single
test case by overriding
method runTest. Better yet, the subclass may
define one or more test cases, not by overriding
runTest, but rather by defining
test-case methods, which
are methods that are callable without arguments and whose names start
with test. The subclass may also override methods
setUp, which the framework calls to prepare a new
instance for each test case, and tearDown, which
the framework calls to clean things up after each test case. Each
test-case method calls methods of class TestCase
whose names start with assert, in order to express
the conditions that the test must meet. unittest
runs the test-case methods within a TestCase
subclass in arbitrary order, running setUp just
before each test case and tearDown just after each
test case.
unittest provides other facilities, such as
grouping test cases into test suites, and other more advanced
functionality. You do not need such extras unless
you're defining a custom unit-testing framework or,
at the very least, structuring complicated testing procedures for
equally complicated packages. In almost all cases, the concepts and
details covered in this section are sufficient to perform effective
and systematic unit testing. Example 17-2 shows how
to use unittest to provide unit tests for the
module mod.py of Example 17-1.
For illustration purposes, this example uses
unittest to perform exactly the same tests that
Example 17-1 encoded as examples in docstrings using
doctest.
Example 17-2. Using unittest
""" This module tests function reverseWords provided by module mod.py. """
import unittest
import mod
class ModTest(unittest.TestCase):
def testNormalCase(self):
self.assertEqual(mod.reverseWords('four score and seven years'),
'years seven and score four')
def testSingleWord(self):
self.assertEqual(mod.reverseWords('justoneword'), 'justoneword')
def testEmpty(self):
self.assertEqual(mod.reverseWords(''), '')
def testRedundantSpacing(self):
self.assertEqual(mod.reverseWords('with redundant spacing'),
'spacing redundant with')
def testUnicode(self):
self.assertEqual(mod.reverseWords(u'unicode is all right too'),
u'too right all is unicode')
def testExactlyOneArgument(self):
self.assertRaises(TypeError, mod.reverseWords)
self.assertRaises(TypeError, mod.reverseWords, 'one', 'another')
def testMustBeString(self):
self.assertRaises((AttributeError,TypeError), mod.reverseWords, 1)
if _ _name_ _= ='_ _main_ _':
unittest.main( )
Running this module with python
test_mod.py is by default a bit more verbose than
using python mod.py to run
doctest, as in Example 17-1.
test_mod.py outputs a single
. for each test-case method it runs, then a
separator line of dashes, and finally a summary line, such as
"Ran 7 tests in 0.110s", and a
final line of "OK" if every test
was indeed okay.
Each test-case method makes one or more calls to methods whose names
start with assert (or their synonyms, whose names
start with fail). Here, only one test-case method,
testExactlyOneArgument, makes two such calls. In
more complicated cases, multiple calls to assert methods from a
single test-case method are quite common.
Even in a case as simple as this, one minor aspect shows that, for
unit testing, unittest is more powerful and
flexible than doctest. In method
testMustBeString, we pass as the first argument to
assertRaises a pair of exception classes, meaning
we accept either kind of exception. test_mod.py
therefore accepts as valid different implementations of
mod.py. It accepts the implementation in Example 17-1, which tries calling method
split on its argument, and therefore raises
AttributeError when called with an argument that
is not a string. However, it also accepts a different hypothetical
implementation, one that raises TypeError instead
when called with an argument of the wrong type. It would be possible
to code this testing functionality with doctest,
but it would be awkward and non-obvious, while
unittest makes it simple and natural.
This kind of flexibility is crucial for real-life unit tests, which
essentially act as executable specifications for their modules. You
could, pessimistically, view the need for flexibility as indicating
that the interface of the code we're testing is not
well defined. However, it's best to view the
interface as being defined with a useful amount of flexibility for
the implementer: under circumstance X
(argument of invalid type passed to function
reverseWords, in this example), either of two
things (raising AttributeError or
TypeError) is allowed to happen.
Thus, implementations with either of the different behaviors can be
correct, and the implementer can choose between them on the basis of
such considerations as performance and clarity. By viewing unit tests
as executable specifications for their modules (the modern view, and
the basis of test-first coding) rather than as white-box tests
strictly constrained to a specific implementation (as in some
traditional taxonomies of testing), the tests become a more vital
component of the software development process.
17.1.3.1 The TestCase class
With
unittest, you write test cases by subclassing
class TestCase and adding methods, callable
without arguments, whose names start with test.
Such test-case methods, in turn, call methods that your subclass
inherits from TestCase, whose names start with
assert (or their synonyms, whose names start with
fail), to indicate conditions that must hold for
the test to succeed.
Class TestCase also defines two methods that your
subclass can optionally override in order to group actions to perform
right before and right after each test-case method runs. This
doesn't exhaust
TestCase's functionality, but you
won't need the rest unless you're
developing testing frameworks or performing some similarly advanced
task. The frequently called methods in a TestCase
instance t are the following.
t.assert_(condition, msg=None)

Fails and outputs msg if condition is false; otherwise, does nothing. The underscore in the name is needed because assert is a Python keyword. failUnless is a synonym.

t.assertEqual(first, second, msg=None)

Fails and outputs msg if first != second; otherwise, does nothing. failUnlessEqual is a synonym.

t.assertNotEqual(first, second, msg=None)

Fails and outputs msg if first == second; otherwise, does nothing. failIfEqual is a synonym.

t.assertRaises(exceptionSpec, callable, *args)

Calls callable(*args). Fails if the call doesn't raise any exception. If the call raises an exception not meeting exceptionSpec, assertRaises propagates the exception. If the call raises an exception meeting exceptionSpec, assertRaises does nothing. exceptionSpec can be an exception class or a tuple of classes, just like the first argument to the except clause of a try/except statement. failUnlessRaises is a synonym.

t.fail(msg=None)

Fails unconditionally and outputs msg.

t.failIf(condition, msg=None)

Fails and outputs msg if condition is true; otherwise, does nothing.

t.setUp()

The framework calls t.setUp() just before calling a test-case method. The implementation in TestCase does nothing. This method is provided in order to let your subclass override it if it needs to perform some preparation for each test.

t.tearDown()

The framework calls t.tearDown() just after calling a test-case method. The implementation in TestCase does nothing. This method is provided in order to let your subclass override it if it needs to perform some cleanup after each test.
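For example, here is a minimal sketch (with hypothetical names) of a TestCase subclass that overrides both hooks:
import unittest

class StackTest(unittest.TestCase):
    def setUp(self):
        # Runs before each test-case method: build a fresh fixture
        self.stack = []
    def tearDown(self):
        # Runs after each test-case method: discard the fixture
        self.stack = None
    def testPush(self):
        self.stack.append(23)
        self.assertEqual(self.stack, [23])
    def testEmpty(self):
        self.failIf(self.stack)

if __name__ == '__main__':
    unittest.main()
Because setUp runs anew before each test-case method, testPush and testEmpty each see a fresh, empty list, regardless of the order in which unittest runs them.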
17.1.3.2 Unit tests dealing with large amounts of data
Unit tests must be fast, since they are
run frequently during development. Therefore, it's
best to unit-test each aspect of your modules'
functionality on small amounts of data when possible. This makes each
unit test faster, and also lets you conveniently embed all needed
data in the test's source code. When you test a
function that reads from or writes to a file object, in particular,
you normally use an instance of class cStringIO.StringIO
(covered in Chapter 10) to simulate a file object
while holding the data in memory.
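For example, here is a minimal sketch of this technique (the function under test here is hypothetical, for illustration only):
import unittest
from cStringIO import StringIO

def sumNumbers(fileobj):
    # Hypothetical function under test: sums one number per line
    return sum([int(line) for line in fileobj])

class SumNumbersTest(unittest.TestCase):
    def testSmallData(self):
        # A StringIO instance simulates a file, holding the data in memory
        simulated_file = StringIO('1\n2\n3\n')
        self.assertEqual(sumNumbers(simulated_file), 6)

if __name__ == '__main__':
    unittest.main()
No disk I/O occurs, so the test runs fast and needs no auxiliary data files.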
However, in some rare cases, it may be impossible to fully exercise a
module's functionality without supplying and/or
comparing data in quantities larger than can be reasonably embedded
in a test's source code. In such cases, your unit
test will have to rely on auxiliary external data files to hold the
data it needs to supply to the module it tests, and/or the data it
needs to compare to the tested module's output. Even
then, you're generally better off reading the data
into instances of cStringIO rather than directing
the tested module to perform actual disk I/O. Similarly, I suggest
you generally use stubs to test modules meant to interact with other
external entities, such as a database, a GUI, or some other program
over a network. It's easier for you to control all
aspects of the test when using stubs rather than real external
entities. Also, to reiterate, the speed at which you can run tests is
important, and it's invariably faster to perform
simulated operations in stubs, rather than real operations.