4.10 Functions

Most statements in a typical Python program are organized into functions. A function is a group of statements that executes upon request. Python provides many built-in functions and allows programmers to define their own functions. A request to execute a function is known as a function call. When a function is called, it may be passed arguments that specify data upon which the function performs its computation. In Python, a function always returns a result value, either None or a value that represents the results of its computation. Functions defined within class statements are also called methods. Issues specific to methods are covered in Chapter 5; the general coverage of functions in this section, however, also applies to methods.

In Python, functions are objects (values) and are handled like other objects. Thus, you can pass a function as an argument in a call to another function. Similarly, a function can return another function as the result of a call. A function, just like any other object, can be bound to a variable, an item in a container, or an attribute of an object. Functions can also be keys into a dictionary. For example, if you need to quickly find a function's inverse given the function, you could define a dictionary whose keys and values are functions and then make the dictionary bidirectional (using some functions from module math, covered in Chapter 15):

from math import sin, asin, cos, acos, tan, atan, log, exp
inverse = {sin:asin, cos:acos, tan:atan, log:exp}
for f in inverse.keys(): inverse[inverse[f]] = f

The fact that functions are objects in Python is often expressed by saying that functions are first-class objects.
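As a minimal sketch of what first-class status allows, the following binds a function to a second name, stores functions in a dictionary, and passes a function as an argument. The names apply_twice, increment, and operations are illustrative only:

```python
def apply_twice(func, value):
    # func is an ordinary argument that happens to be a function object.
    return func(func(value))

def increment(n):
    return n + 1

# Functions stored as values in a container, looked up by name.
operations = {'inc': increment, 'abs': abs}

# A second variable bound to the same function object.
alias = increment

result_a = apply_twice(increment, 5)    # 7
result_b = operations['abs'](-3)        # 3
result_c = alias(10)                    # 11
```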

4.10.1 The def Statement

The def statement is the most common way to define a function. def is a single-clause compound statement with the following syntax:

def function-name(parameters): 
    statement(s)

function-name is an identifier. It is a variable that gets bound (or rebound) to the function object when def executes.

parameters is an optional list of identifiers, called formal parameters or just parameters, that are used to represent values that are supplied as arguments when the function is called. In the simplest case, a function doesn't have any formal parameters, which means the function doesn't take any arguments when it is called. In this case, the function definition has empty parentheses following function-name.

When a function does take arguments, parameters contains one or more identifiers, separated by commas (,). In this case, each call to the function supplies values, known as arguments, that correspond to the parameters specified in the function definition. The parameters are local variables of the function, as we'll discuss later in this section, and each call to the function binds these local variables to the corresponding values that the caller supplies as arguments.

The non-empty sequence of statements, known as the function body, does not execute when the def statement executes. Rather, the function body executes later, each time the function is called. The function body can contain zero or more occurrences of the return statement, as we'll discuss shortly.

Here's an example of a simple function that returns a value that is double the value passed to it:

def double(x):
    return x*2
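To see that executing def only binds the function name and does not run the body, consider this small sketch (the list calls and the function record are hypothetical names used just for illustration):

```python
calls = []

def record():
    # This body runs only when record is called, not when def executes.
    calls.append('called')
    return len(calls)

length_after_def = len(calls)   # 0: the body has not run yet
first = record()                # 1
second = record()               # 2
```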

4.10.2 Parameters

Formal parameters that are simple identifiers indicate mandatory parameters. Each call to the function must supply a corresponding value (argument) for each mandatory parameter.

In the comma-separated list of parameters, zero or more mandatory parameters may be followed by zero or more optional parameters, where each optional parameter has the syntax:

identifier=expression

The def statement evaluates the expression and saves a reference to the value returned by the expression, called the default value for the parameter, among the attributes of the function object. When a function call does not supply an argument corresponding to an optional parameter, the call binds the parameter's identifier to its default value for that execution of the function.

Note that the same object, the default value, gets bound to the optional parameter whenever the caller does not supply a corresponding argument. This can be tricky when the default value is a mutable object and the function body alters the parameter. For example:

def f(x, y=[]):
    y.append(x)
    return y
print f(23)                # prints: [23]
print f(42)                # prints: [23,42]

The second print statement prints [23,42] because the first call to f altered the default value of y, originally an empty list [], by appending 23 to it. If you want y to be bound to a new empty list object each time f is called with a single argument, use the following:

def f(x, y=None):
    if y is None: y = []
    y.append(x)
    return y
print f(23)                # prints: [23]
print f(42)                # prints: [42]

At the end of the formal parameters, you may optionally use either or both of the special forms *identifier1 and **identifier2. If both are present, the one with two asterisks must be last. *identifier1 indicates that any call to the function may supply extra positional arguments, while **identifier2 specifies that any call to the function may supply extra named arguments (positional and named arguments are covered later in this chapter). Every call to the function binds identifier1 to a tuple whose items are the extra positional arguments (or the empty tuple, if there are none). identifier2 is bound to a dictionary whose items are the names and values of the extra named arguments (or the empty dictionary, if there are none). Here's how to write a function that accepts any number of arguments and returns their sum:

def sum(*numbers):
    result = 0
    for number in numbers: result += number
    return result
print sum(23,42)           # prints: 65

The ** form also lets you construct a dictionary with string keys in a more readable fashion than with the standard dictionary creation syntax:

def adict(**kwds): return kwds
print adict(a=23, b=42)    # prints: {'a':23, 'b':42}

Note that the body of function adict is just one simple statement, and therefore we can exercise the option to put it on the same line as the def statement. Of course, it would be just as correct (and arguably more readable) to code function adict using two lines instead of one:

def adict(**kwds):
    return kwds
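The two special forms can also be combined with ordinary parameters. Here is a minimal sketch, using a hypothetical function describe, that shows what each kind of parameter gets bound to:

```python
def describe(first, *rest, **options):
    # first is mandatory; rest collects extra positional arguments
    # in a tuple; options collects extra named arguments in a dictionary.
    return first, rest, options

full = describe(1, 2, 3, sep='-')   # (1, (2, 3), {'sep': '-'})
bare = describe(1)                  # (1, (), {})
```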

4.10.3 Attributes of Function Objects

The def statement defines some attributes of a function object. The attribute func_name, also accessible as __name__, is a read-only attribute (trying to rebind or unbind it raises a runtime exception) that refers to the identifier used as the function name in the def statement. The attribute func_defaults, which you may rebind or unbind, refers to the tuple of default values for the optional parameters (or the empty tuple, if the function has no optional parameters).

Another function attribute is the documentation string, also known as a docstring. You may use or rebind a function's docstring attribute as either func_doc or __doc__. If the first statement in the function body is a string literal, the compiler binds that string as the function's docstring attribute. A similar rule applies to classes (see Chapter 5) and modules (see Chapter 7). Docstrings most often span multiple physical lines, and are therefore normally specified in triple-quoted string literal form. For example:

def sum(*numbers):
    '''Accept arbitrary numerical arguments and return their sum.

    The arguments are zero or more numbers.  The result is their sum.'''

    result = 0
    for number in numbers: result += number
    return result

Documentation strings should be part of any Python code you write. They play a role similar to that of comments in any programming language, but their applicability is wider since they are available at runtime. Development environments and other tools may use docstrings from function, class, and module objects to remind the programmer how to use those objects. The doctest module (covered in Chapter 17) makes it easy to check that the sample code in docstrings is accurate and correct.

To make your docstrings as useful as possible, you should respect a few simple conventions. The first line of a docstring should be a concise summary of the function's purpose, starting with an uppercase letter and ending with a period. It should not mention the function's name, unless the name happens to be a natural-language word that comes naturally as part of a good, concise summary of the function's operation. If the docstring is multiline, the second line should be empty, and the following lines should form one or more paragraphs, separated by empty lines, describing the function's expected arguments, preconditions, return value, and side effects (if any). Further explanations, bibliographical references, and usage examples (to be checked with doctest) can optionally follow toward the end of the docstring.
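As one possible illustration of these conventions, here is a sketch of a hypothetical function mean whose docstring has a one-line summary, an empty second line, and a paragraph on arguments, result, and error behavior. The docstring is then read back at runtime through the __doc__ attribute:

```python
def mean(*numbers):
    """Return the arithmetic mean of the arguments.

    The arguments are one or more numbers.  The result is a float.
    Raise ValueError if no arguments are supplied.
    """
    if not numbers:
        raise ValueError('mean requires at least one argument')
    total = 0.0
    for number in numbers:
        total += number
    return total / len(numbers)

# The first line of the docstring is the concise summary.
summary = mean.__doc__.splitlines()[0]
name = mean.__name__                    # 'mean'
```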

In addition to its predefined attributes, a function object may be given arbitrary attributes. To create an attribute of a function object, bind a value to the appropriate attribute references in an assignment statement after the def statement has executed. For example, a function could count how many times it is called:

def counter():
    counter.count += 1
    return counter.count
counter.count = 0

Note that this is not common usage. More often, when you want to group together some state (data) and some behavior (code), you should use the object-oriented mechanisms covered in Chapter 5. However, the ability to associate arbitrary attributes with a function can sometimes come in handy.

4.10.4 The return Statement

The return statement in Python is allowed only inside a function body, and it can optionally be followed by an expression. When return executes, the function terminates and the value of the expression is returned. A function returns None if it terminates by reaching the end of its body or by executing a return statement that has no expression.

As a matter of style, you should not write a return statement without an expression at the end of a function body. If some return statements in a function have an expression, all return statements should have an expression. return None should only be written explicitly to meet this style requirement. Python does not enforce these stylistic conventions, but your code will be clearer and more readable if you follow them.
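The following sketch illustrates these conventions with two hypothetical functions. find_index has a return statement with an expression, so its fallback is an explicit return None; log_message never returns a value, so it simply ends without a return statement:

```python
def find_index(items, target):
    for index in range(len(items)):
        if items[index] == target:
            return index
    # Explicit, because another return in this function has an expression.
    return None

def log_message(messages, text):
    # No return statement: reaching the end of the body returns None.
    messages.append(text)

found = find_index(['a', 'b', 'c'], 'b')      # 1
missing = find_index(['a', 'b', 'c'], 'z')    # None
log = []
outcome = log_message(log, 'started')         # None
```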

4.10.5 Calling Functions

A function call is an expression with the following syntax:

function-object(arguments)

function-object may be any reference to a function object; it is most often the function's name. The parentheses denote the function-call operation itself. arguments, in the simplest case, is a series of zero or more expressions separated by commas (,), giving values for the function's corresponding formal parameters. When a function is called, the parameters are bound to these values, the function body executes, and the value of the function-call expression is whatever the function returns.

4.10.5.1 The semantics of argument passing

In traditional terms, all argument passing in Python is by value. For example, if a variable is passed as an argument, Python passes to the function the object (value) to which the variable currently refers, not the variable itself. Thus, a function cannot rebind the caller's variables. However, if a mutable object is passed as an argument, the function may request changes to that object since Python passes the object itself, not a copy. Rebinding a variable and mutating an object are totally different concepts in Python. For example:

def f(x, y):
    x = 23
    y.append(42)
a = 77
b = [99]
f(a, b)
print a, b                # prints: 77 [99, 42]

The print statement shows that a is still bound to 77. Function f's rebinding of its parameter x to 23 has no effect on f's caller, and in particular on the binding of the caller's variable, which happened to be used to pass 77 as the parameter's value. However, the print statement also shows that b is now bound to [99,42]. b is still bound to the same list object as before the call, but that object has mutated, as f has appended 42 to that list object. In either case, f has not altered the caller's bindings, nor can f alter the number 77, as numbers are immutable. However, f can alter a list object, as list objects are mutable. In this example, f does mutate the list object that the caller passes to f as the second argument by calling the object's append method.

4.10.5.2 Kinds of arguments

Arguments that are just expressions are called positional arguments. Each positional argument supplies the value for the formal parameter that corresponds to it by position (order) in the function definition.

In a function call, zero or more positional arguments may be followed by zero or more named arguments with the following syntax:

identifier=expression

The identifier must be one of the formal parameter names used in the def statement for the function. The expression supplies the value for the formal parameter of that name.

A function call must supply, via either a positional or a named argument, exactly one value for each mandatory parameter, and zero or one value for each optional parameter. For example:

def divide(divisor, dividend): return dividend // divisor
print divide(12,94)                         # prints: 7
print divide(dividend=94, divisor=12)       # prints: 7

As you can see, the two calls to divide are equivalent. You can pass named arguments for readability purposes when you think that identifying the role of each argument and controlling the order of arguments enhances your code's clarity.

A more common use of named arguments is to bind some optional parameters to specific values, while letting other optional parameters take their default values:

def f(middle, begin='init', end='finis'): return begin+middle+end
print f('tini', end='')                     # prints: inittini

Thanks to named argument end='', the caller can specify a value, the empty string '', for f's third parameter, end, and still let f's second parameter, begin, use its default value, the string 'init'.

At the end of the arguments in a function call, you may optionally use either or both of the special forms *seq and **dict. If both are present, the one with two asterisks must be last. *seq passes the items of seq to the function as positional arguments (after the normal positional arguments, if any, that the call gives with the usual simple syntax). seq may be any sequence or iterable. **dict passes the items of dict to the function as named arguments, where dict must be a dictionary whose keys are all strings. Each item's key is a parameter name, and the item's value is the argument's value.

Sometimes you want to pass an argument of the form *seq or **dict when the formal parameters use similar forms, as described earlier under Section 4.10.2. For example, using the function sum defined in that section (and shown again here), you may want to print the sum of all the values in dictionary d. This is easy with *seq:

def sum(*numbers):
    result = 0
    for number in numbers: result += number
    return result
print sum(*d.values())

However, you may also pass arguments of the form *seq or **dict when calling a function that does not use similar forms in its formal parameters.
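For instance, the function divide shown earlier has only two ordinary parameters, yet it can still be called with *seq or **dict. This is a minimal sketch; the names positional and named are illustrative:

```python
def divide(divisor, dividend):
    return dividend // divisor

positional = (12, 94)
named = {'dividend': 94, 'divisor': 12}

by_position = divide(*positional)           # 7
by_name = divide(**named)                   # 7
# A normal positional argument followed by a **dict form.
mixed = divide(12, **{'dividend': 94})      # 7
```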

4.10.6 Namespaces

A function's formal parameters, plus any variables that are bound (by assignment or by other binding statements) in the function body, comprise the function's local namespace, also known as local scope. Each of these variables is called a local variable of the function.

Variables that are not local are known as global variables (in the absence of nested definitions, which we'll discuss shortly). Global variables are attributes of the module object, as covered in Chapter 7. If a local variable in a function has the same name as a global variable, whenever that name is mentioned in the function body, the local variable, not the global variable, is used. This idea is expressed by saying that the local variable hides the global variable of the same name throughout the function body.
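A short sketch of hiding, using the hypothetical names limit, uses_global, and hides_global: the first function only reads the global variable, while the second binds a local variable of the same name and leaves the global untouched.

```python
limit = 10                  # global variable

def uses_global():
    # limit is not bound in this body, so the global is used.
    return limit + 1

def hides_global():
    # Binding limit here makes it local throughout this body.
    limit = 99
    return limit + 1

from_global = uses_global()     # 11
from_local = hides_global()     # 100
# The global limit is still 10 after both calls.
```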

4.10.6.1 The global statement

By default, any variable that is bound within a function body is a local variable of the function. If a function needs to rebind some global variables, the first statement of the function must be:

global identifiers

where identifiers is one or more identifiers separated by commas (,). The identifiers listed in a global statement refer to the global variables (i.e., attributes of the module object) that the function needs to rebind. For example, the function counter that we saw in Section 4.10.3 could be implemented using global and a global variable rather than an attribute of the function object as follows:

_count = 0
def counter():
    global _count
    _count += 1
    return _count

Without the global statement, the counter function would raise an UnboundLocalError exception because _count would be an uninitialized (unbound) local variable. Note also that while the global statement does enable this kind of programming, it is neither elegant nor advisable. As I mentioned earlier, when you want to group together some state and some behavior, the object-oriented mechanisms covered in Chapter 5 are typically the best approach.

You don't need global if the function body simply uses a global variable, including changing the object bound to that variable if the object is mutable. You need to use a global statement only if the function body rebinds a global variable. As a matter of style, you should not use global unless it's strictly necessary, as its presence will cause readers of your program to assume the statement is there for some useful purpose.

4.10.6.2 Nested functions and nested scopes

A def statement within a function body defines a nested function, and the function whose body includes the def is known as an outer function to the nested one. Code in a nested function's body may access (but not rebind) local variables of an outer function, also known as free variables of the nested function. This nested-scope access is automatic in Python 2.2 and later. To request nested-scope access in Python 2.1, the first statement of the module must be:

from __future__ import nested_scopes

The simplest way to let a nested function access a value is often not to rely on nested scopes, but rather to explicitly pass that value as one of the function's arguments. The argument's value can be bound when the nested function is defined by using the value as the default for an optional argument. For example:

def percent1(a, b, c):                # works with any version
    def pc(x, total=a+b+c): return (x*100.0) / total
    print "Percentages are", pc(a), pc(b), pc(c)

Here's the same functionality using nested scopes:

def percent2(a, b, c):                # needs 2.2 or "from future import"
    def pc(x): return (x*100.0) / (a+b+c)
    print "Percentages are", pc(a), pc(b), pc(c)

In this specific case, percent1 has a slight advantage: the computation of a+b+c happens only once, while percent2's inner function pc repeats the computation three times. However, if the outer function were rebinding its local variables between calls to the nested function, repeating this computation might be an advantage. It's therefore advisable to be aware of both approaches, and choose the most appropriate one case by case.
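The difference between the two approaches shows up when the outer function rebinds a local variable after defining the nested functions. In this sketch (all names are hypothetical), the default value is captured once when def executes, while the nested-scope access looks the variable up each time the nested function is called:

```python
def snapshot_vs_live():
    total = 10
    def frozen(x, total=total):
        # total is the default value saved when def executed: 10.
        return x + total
    def live(x):
        # total is looked up in the outer function at call time.
        return x + total
    total = 20                  # rebinding after both defs
    return frozen(1), live(1)

results = snapshot_vs_live()    # (11, 21)
```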

A nested function that accesses values from outer local variables is known as a closure. The following example shows how to build a closure without nested scopes (using a default value):

def make_adder_1(augend):             # works with any version
    def add(addend, _augend=augend): return addend+_augend
    return add

Here's the same closure functionality using nested scopes:

def make_adder_2(augend):             # needs 2.2 or "from future import"
    def add(addend): return addend+augend
    return add

Closures are an exception to the general rule that the object-oriented mechanisms covered in Chapter 5 are the best way to bundle together data and code. When you need to construct callable objects, with some parameters fixed at object construction time, closures can be simpler and more effective than classes. For example, the result of make_adder_1(7) is a function that accepts a single argument and adds 7 to that argument (the result of make_adder_2(7) behaves in just the same way). You can also express the same idea as lambda x: x+7, using the lambda form covered in the next section. A closure is a "factory" for any member of a family of functions distinguished by some parameters, such as the value of argument augend in the previous examples, and this may often help you avoid code duplication.
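Here is a minimal sketch of the factory idea, assuming Python 2.2 or later for the nested scopes. The name make_adder simply mirrors make_adder_2 above, and adders and results are illustrative:

```python
def make_adder(augend):
    def add(addend):
        return addend + augend
    return add

add7 = make_adder(7)
seven_plus_three = add7(3)                          # 10

# One factory, a whole family of functions.
adders = [make_adder(n) for n in (1, 10, 100)]
results = [add(5) for add in adders]                # [6, 15, 105]
```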

4.10.7 lambda Expressions

If a function body contains a single return expression statement, you may choose to replace the function with the special lambda expression form:

lambda parameters: expression

A lambda expression is the anonymous equivalent of a normal function whose body is a single return statement. Note that the lambda syntax does not use the return keyword. You can use a lambda expression wherever you would use a reference to a function. lambda can sometimes be handy when you want to use a simple function as an argument or return value. Here's an example that uses a lambda expression as an argument to the built-in filter function:

aList = [1,2,3,4,5,6,7,8,9]
low = 3
high = 7
filter(lambda x,l=low,h=high: h>x>l, aList)     # returns: [4, 5, 6]

As an alternative, you can always use a local def statement that gives the function object a name. You can then use this name as the argument or return value. Here's the same filter example using a local def statement:

aList = [1,2,3,4,5,6,7,8,9]
low = 3
high = 7
def test(value, l=low, h=high):
    return h>value>l
filter(test, aList)                             # returns: [4, 5, 6]
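The equivalence between the two forms can be sketched as follows. The names double_def, double_lambda, and make_scaler are hypothetical; make_scaler also shows a lambda expression used as a return value, passing factor through a default value as in the examples above:

```python
def double_def(x):
    return x * 2

# The anonymous equivalent, bound to a name only for comparison.
double_lambda = lambda x: x * 2

same = double_def(21) == double_lambda(21)      # True

def make_scaler(factor):
    # Returns an anonymous function as the result of the call.
    return lambda x, f=factor: x * f

triple = make_scaler(3)
tripled = triple(14)                            # 42
```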

4.10.8 Generators

When the body of a function contains one or more occurrences of the keyword yield, the function is called a generator. When a generator is called, the function body does not execute. Instead, calling the generator returns a special iterator object that wraps the function body, the set of its local variables (including its parameters), and the current point of execution, which is initially the start of the function.

When the next method of this iterator object is called, the function body executes up to the next yield statement, which takes the form:

yield expression

When a yield statement executes, the function is frozen with its execution state and local variables intact, and the expression following yield is returned as the result of the next method. On the next call to next, execution of the function body resumes where it left off, again up to the next yield statement. If the function body ends or executes a return statement, the iterator raises a StopIteration exception to indicate that the iterator is finished. Note that return statements in a generator cannot contain expressions, as that is a syntax error.
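The freeze-and-resume behavior can be observed with a small sketch. The generator countdown and the list trace are hypothetical names; the for statement calls the iterator's next method behind the scenes and stops when the iterator signals that it is finished:

```python
trace = []

def countdown(n):
    trace.append('started')
    while n > 0:
        yield n
        # Execution resumes here on the following call to next.
        trace.append('resumed after %d' % n)
        n -= 1
    trace.append('finished')

iterator = countdown(2)
trace_after_call = list(trace)      # []: the body has not run yet

values = []
for value in iterator:
    values.append(value)
# values is [2, 1]
# trace is ['started', 'resumed after 2', 'resumed after 1', 'finished']
```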

yield is always a keyword in Python 2.3 and later. In Python 2.2, to make yield a keyword in a source file, use the following line as the first statement in the file:

from __future__ import generators

In Python 2.1 and earlier, you cannot define generators.

Generators are often handy ways to build iterators. Since the most common way to use an iterator is to loop on it with a for statement, you typically call a generator like this:

for avariable in somegenerator(arguments):

For example, say that you want a sequence of numbers counting up from 1 to N and then down to 1 again. A generator helps:

def updown(N):
    for x in xrange(1,N): yield x
    for x in xrange(N,0,-1): yield x
for i in updown(3): print i                   # prints: 1 2 3 2 1

Here is a generator that works somewhat like the built-in xrange function, but returns a sequence of floating-point values instead of a sequence of integers:

def frange(start, stop, step=1.0):
    while start < stop:
        yield start
        start += step

frange is only somewhat like xrange, because, for simplicity, it makes arguments start and stop mandatory, and silently assumes step is positive (by default, like xrange, frange makes step equal to 1).
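A brief usage sketch of frange follows, repeating its definition so the example stands alone. The step 0.25 is chosen because it is exactly representable in binary floating point; with a step such as 0.1, accumulated rounding error could change the number of items produced:

```python
def frange(start, stop, step=1.0):
    while start < stop:
        yield start
        start += step

quarters = list(frange(0.0, 1.0, 0.25))     # [0.0, 0.25, 0.5, 0.75]
defaults = list(frange(1.0, 4.0))           # [1.0, 2.0, 3.0]
```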

Generators are more flexible than functions that return lists. A generator may build an iterator that returns an infinite stream of results that is usable only in loops that terminate by other means (e.g., via a break statement). Further, the generator-built iterator performs lazy evaluation: the iterator computes each successive item only when and if needed, just in time, while the equivalent function does all computations in advance and may require large amounts of memory to hold the results list. Therefore, in Python 2.2 and later, if all you need is the ability to iterate on a computed sequence, it is often best to compute the sequence in a generator, rather than in a function that returns a list. If the caller needs a list that contains all the items produced by a generator G(arguments), the caller can use the following code:

resulting_list = list(G(arguments))
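An unbounded generator can be sketched as follows, with the hypothetical name naturals. Calling list on it would never finish, so the loop that consumes it must terminate by other means, here a break statement:

```python
def naturals():
    # Yields 1, 2, 3, ... without end; each item is computed only on demand.
    n = 1
    while True:
        yield n
        n += 1

first_squares = []
for n in naturals():
    if n > 5:
        break
    first_squares.append(n * n)
# first_squares is [1, 4, 9, 16, 25]
```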

4.10.9 Recursion

Python supports recursion (i.e., a Python function can call itself), but there is a limit to how deep the recursion can be. By default, Python interrupts recursion and raises a RuntimeError exception (covered in Chapter 6) when it detects that the stack of recursive calls has gone over a depth of 1,000. You can change the recursion limit with function setrecursionlimit of module sys, covered in Chapter 8.

However, changing this limit will still not give you unlimited recursion; the absolute maximum limit depends on the platform, particularly on the underlying operating system and C runtime library, but it's typically a few thousand. When recursive calls get too deep, your program will crash. Runaway recursion after a call to setrecursionlimit that exceeds the platform's capabilities is one of the very few ways a Python program can crash (really crash, hard, without the usual safety net of Python's exception mechanisms). Therefore, be wary of trying to fix a program that is getting recursion-depth RuntimeError exceptions by raising the recursion limit too high with setrecursionlimit. Most often, you'd be better advised to look for ways to remove the recursion or, at least, to limit the depth of recursion that your program needs.
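As one example of removing recursion, a recursive computation can often be rewritten as a loop. The sketch below uses two hypothetical factorial functions; the iterative version uses a constant amount of stack no matter how large its argument is, so it is unaffected by the recursion limit:

```python
import sys

def factorial_recursive(n):
    # Uses one stack frame per level: depth grows with n.
    if n <= 1:
        return 1
    return n * factorial_recursive(n - 1)

def factorial_iterative(n):
    # Same result with a loop: stack depth stays constant.
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

current_limit = sys.getrecursionlimit()     # the limit now in effect
small = factorial_recursive(10)             # 3628800
# 5000 is well beyond the default limit of 1,000 recursive calls,
# but the iterative version handles it without trouble.
big = factorial_iterative(5000)
```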