7.2 Module Loading
Module-loading operations rely on
attributes of the built-in sys module (covered in
Chapter 8). The module-loading process described
here is carried out by built-in function __import__.
Your code can call __import__
directly, with the module name string as an argument.
__import__ returns the module object or raises
ImportError if the import
fails.
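As a minimal sketch of a direct call, the following imports one module by name and shows the failure case. The name no_such_module_xyz is hypothetical and is assumed not to exist on your system.

```python
# Import a module by passing its name as a string; the result is the module object.
os_mod = __import__('os')
print(os_mod.__name__)

# A name that cannot be found raises ImportError.
try:
    __import__('no_such_module_xyz')   # hypothetical name, assumed absent
except ImportError as exc:
    failed = True
    print('import failed:', exc)
else:
    failed = False
```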
To import a module named
M, __import__ first
checks dictionary sys.modules, using string
M as the key. When key
M is in the dictionary,
__import__ returns the corresponding value as the requested
module object. Otherwise, __import__ binds
sys.modules[M]
to a new empty module object with a __name__ of
M, then looks for the right way to
initialize (load) the module, as covered in Section 7.2.2 later in this section.
Thanks to this mechanism, the loading operation takes place only the
first time a module is imported in a given run of the program. When a
module is imported again, the module is not reloaded, since
__import__ finds and returns the
module's entry in sys.modules.
Thus, all imports of a module after the first one are extremely fast
because they're just dictionary lookups.
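The caching behavior can be observed directly. This small sketch uses the standard os module purely as an example and checks that repeated imports hand back the very same object that is stored in sys.modules.

```python
import sys

import os                      # the first import loads the module if needed
first = sys.modules['os']      # the entry that __import__ consults

import os                      # a later import is just a dictionary lookup
second = __import__('os')

# Both names refer to the one module object cached in sys.modules.
print(first is os, second is os)
```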
7.2.1 Built-in Modules
When a
module is loaded, __import__ first checks
whether the module is built-in. Built-in modules are listed in tuple
sys.builtin_module_names, but rebinding that tuple
does not affect module loading. A built-in module, like any other
Python extension, is initialized by calling the
module's initialization function. The search for
built-in modules also finds frozen modules and modules in
platform-specific locations (e.g., resources on the Mac, the Registry
in Windows).
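To see which modules are built into a given interpreter, you can inspect the tuple yourself. The sketch below assumes a standard CPython build, where sys itself is a built-in module; the exact contents of the tuple vary by platform and build.

```python
import sys

# sys is compiled into the interpreter in a standard CPython build.
print('sys' in sys.builtin_module_names)

# Show a few of the built-in module names on this interpreter.
some_builtins = sorted(sys.builtin_module_names)[:5]
print(some_builtins)
```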
7.2.2 Searching the Filesystem for a Module
If module
M is not built-in or frozen,
__import__ looks for
M's code as a file on the
filesystem. __import__ looks in the directories
whose names are the items of list sys.path, in
order. sys.path is initialized at program startup,
using environment variable PYTHONPATH (covered in
Chapter 3) if present. The first item in
sys.path is always the directory from which the
main program (script) is loaded. An empty string in
sys.path indicates the current
directory.
Your code can mutate or rebind sys.path, and such
changes affect which directories __import__
searches to load modules. Changing sys.path does
not affect modules that are already loaded (and thus already listed
in sys.modules) when sys.path
is changed.
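The following sketch illustrates both points: mutating sys.path makes a new directory searchable, and a later change to sys.path leaves already loaded modules alone. The module name pathdemo_mod and its contents are hypothetical, created in a throwaway directory just for this example.

```python
import os
import sys
import tempfile

# Create a throwaway directory that holds one tiny module.
path_dir = tempfile.mkdtemp()
with open(os.path.join(path_dir, 'pathdemo_mod.py'), 'w') as f:
    f.write('ANSWER = 42\n')

# Mutating sys.path makes the new directory part of the search.
sys.path.append(path_dir)
import pathdemo_mod
print(pathdemo_mod.ANSWER)

# Removing the directory afterward does not unload the module:
# it is still listed in sys.modules.
sys.path.remove(path_dir)
print('pathdemo_mod' in sys.modules)
```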
If a text file with
extension .pth is found in the
PYTHONHOME directory at startup, its contents are
added to sys.path, one item per line.
.pth files can also contain blank lines and
comment lines starting with the character #, as
Python ignores any such lines. .pth files can
also contain import statements, which Python
executes, but no other kinds of statements.
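The text describes .pth processing at startup only. As a rough sketch, you can trigger the same kind of processing by hand with function addsitedir of standard module site; using that function here is an assumption made for illustration. The directory layout and the file name demo.pth are hypothetical.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: a directory with one .pth file that names
# another directory to add to sys.path.
site_dir = tempfile.mkdtemp()
extra_dir = os.path.join(site_dir, 'extra_libs')
os.mkdir(extra_dir)

with open(os.path.join(site_dir, 'demo.pth'), 'w') as f:
    f.write('# comment lines and blank lines are ignored\n')
    f.write('\n')
    f.write(extra_dir + '\n')    # one sys.path item per line

# Process the .pth files found in site_dir.
site.addsitedir(site_dir)

print(os.path.abspath(extra_dir) in sys.path)
```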
When looking for the file
for module M in each directory along
sys.path, Python considers the following
extensions in the order listed:
.pyd and .dll (Windows) or
.so (most Unix-like platforms), which indicate
Python extension modules. (Some Unix dialects use different
extensions; e.g., .sl is the extension used on
HP-UX.)
.py, which indicates pure Python source modules.
.pyc (or .pyo, if Python is
run with option -O), which indicates
bytecode-compiled Python modules.
Upon finding source file
M.py, Python compiles
it to M.pyc (or
M.pyo) unless the
bytecode file is already present, is newer than
M.py, and was
compiled by the same version of Python. Python saves the bytecode
file to the filesystem in the same directory as
M.py (if permissions
on the directory allow writing) so that future runs will not
needlessly recompile.
Once Python has the bytecode file, either from having constructed it
by compilation or by reading it from the filesystem, Python executes
the module body to initialize the module object. If the module is an
extension, Python calls the module's initialization
function.
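The compile-then-execute sequence for a pure Python module can be imitated by hand. This is only a conceptual sketch, not the interpreter's actual implementation: it builds an empty module object, compiles source text to a code object in memory, and executes that code in the module's namespace. The module name sketch_mod and its source text are hypothetical.

```python
import types

# Hypothetical source text for a tiny module.
source = 'GREETING = "hello"\ndef shout():\n    return GREETING.upper()\n'

# 1. Create a new, empty module object.
mod = types.ModuleType('sketch_mod')

# 2. Compile the source text to a code object (the in-memory form of bytecode).
code = compile(source, 'sketch_mod.py', 'exec')

# 3. Execute the module body in the module's namespace to initialize it.
exec(code, mod.__dict__)

print(mod.shout())
```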
7.2.3 The Main Program
Execution of a Python application normally
starts with a top-level script (also known as the main
program), as explained in Chapter 3.
The main program executes like any other module being loaded except
that Python keeps the bytecode in memory without saving it to disk.
The module name for the main program is always
__main__, both as the __name__ global
variable (module attribute) and as the key in
sys.modules. You should not normally import the
same .py file that is in use as the main
program. If you do, the module is loaded again, and the module body
is executed once more from the top in a separate module object with a
different __name__.
Code in a Python module can test whether the module is being used as
the main program by checking if global variable
__name__ equals '__main__'. The idiom:
if __name__ == '__main__':
is often used to guard some code so that it executes only when the
module is run as the main program. If a module is designed only to be
imported, it should normally execute unit tests when it is run as the
main program, as covered in Chapter 17.
7.2.4 The reload Function
As I
explained earlier, Python loads a module only the first time you
import the module during a program run. When you develop
interactively, you need to make sure that your modules are reloaded
each time you edit them (some development environments provide
automatic reloading).
To reload a module, pass the module object (not
the module name) as the only argument to built-in function
reload.
reload(M)
ensures the reloaded version of M is used
by client code that relies on import
M and accesses attributes with the syntax
M.A. However,
reload(M)
has no effect on other references bound to previous values of
M's attributes (e.g.,
with the from statement). In other words,
already-bound variables remain bound as they were, unaffected by
reload.
reload's inability to rebind such
variables is a further incentive to avoid from.
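The difference between the two access styles shows up in a short experiment. Note one assumption: in current Python versions the function lives in standard module importlib as importlib.reload, rather than being a built-in as described here, so the sketch calls it that way. The module name reload_demo is hypothetical and is written to a throwaway directory.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # avoid stale cached bytecode in this quick demo

# Write a hypothetical module to a throwaway directory.
reload_dir = tempfile.mkdtemp()
mod_path = os.path.join(reload_dir, 'reload_demo.py')
with open(mod_path, 'w') as f:
    f.write('VALUE = 1\n')
sys.path.insert(0, reload_dir)

import reload_demo
from reload_demo import VALUE    # binds a separate name to the current value

# Edit the source, then reload the module object (not its name).
with open(mod_path, 'w') as f:
    f.write('VALUE = 200\n')
importlib.reload(reload_demo)

print(reload_demo.VALUE)   # attribute access sees the new value
print(VALUE)               # the name bound by from is unchanged
```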
7.2.5 Circular Imports
Python lets you specify circular
imports. For example, you can write a module
a.py that contains import
b, while module b.py contains
import a. In practice, you are
typically better off avoiding circular imports, since circular
dependencies are fragile and hard to manage. If you decide to use a
circular import for some reason, you need to understand how circular
imports work in order to avoid errors in your code.
Say that the main script executes import
a. As discussed earlier, this
import statement creates a new empty module object
as sys.modules['a'], and then the body of module
a starts executing. When a
executes import b, this creates
a new empty module object as sys.modules['b'], and
then the body of module b starts executing. The
execution of a's module body is
now suspended until b's module
body finishes.
Now, when b executes import
a, the import statement finds
sys.modules['a'] already defined and therefore
binds global variable a in module
b to the module object for module
a. Since the execution of
a's module body is currently
suspended, module a may be only partly populated
at this time. If the code in b's
module body tries to access some attribute of module
a that is not yet bound, an error results.
If you do insist on keeping a circular import in some case, you must
carefully manage the order in which each module defines its own
globals, imports the other module, and accesses the globals of the
other module. Generally, you can have greater control over the sequence
in which things happen by grouping your statements into functions and
calling those functions in a controlled order, rather than just
relying on sequential execution of top-level statements in module
bodies. However, removing circular dependencies is almost always
easier than ensuring bomb-proof ordering while keeping such circular
dependencies.
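The sequence of events described in this section can be reproduced with two small files. In this sketch, hypothetical modules circ_a and circ_b stand in for a and b. Module circ_b reads an attribute of circ_a too early at the top level, and also defines a function that defers the same access until it is called.

```python
import os
import sys
import tempfile

# Write the two hypothetical modules to a throwaway directory.
circ_dir = tempfile.mkdtemp()
with open(os.path.join(circ_dir, 'circ_a.py'), 'w') as f:
    f.write('import circ_b\n'          # circ_a's body is suspended here
            'X = 1\n')                 # bound only after circ_b finishes
with open(os.path.join(circ_dir, 'circ_b.py'), 'w') as f:
    f.write('import circ_a\n'          # finds the partly populated circ_a
            'try:\n'
            '    early = circ_a.X\n'   # too early: X is not bound yet
            'except AttributeError:\n'
            '    early = None\n'
            'def get_x():\n'
            '    return circ_a.X\n')   # deferred access inside a function

sys.path.insert(0, circ_dir)
import circ_a
import circ_b

print(circ_b.early)      # the top-level access failed and recorded None
print(circ_b.get_x())    # the deferred access succeeds once circ_a is loaded
```

Moving the access into a function is one way to control ordering, but the example also shows how easily a top-level access can go wrong.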
7.2.6 sys.modules Entries
The built-in
__import__ function never binds anything other
than a module object as a value in sys.modules.
However, if __import__ finds an entry that is
already in sys.modules, it will try to use that
value, whatever type of object it may be. The
import and from statements rely
on the __import__ function, so they
too can end up using objects that are not modules. This lets you set
class instances as entries in sys.modules, and
take advantage of features such as their __getattr__
and __setattr__ special methods,
covered in Chapter 5. This advanced technique lets
you import module-like objects whose attributes can in fact be
computed on the fly. Here's a trivial toy-like
example:
class TT:
    def __getattr__(self, name): return 23
import sys
sys.modules[__name__] = TT()
When you import this code as a module, you get a module-like object
that appears to have any attribute name you try to get from it, and
all attribute names correspond to the integer value 23.
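A variation on the same idea can be tried without creating any file, by placing an instance in sys.modules under a made-up name. In this sketch, class Computed and module name lengths_demo are hypothetical. Unlike the toy example, the class refuses names that start with an underscore, an added precaution so that lookups of special attributes fail normally.

```python
import sys

class Computed:
    """Module-like object whose attributes are computed on the fly."""
    def __getattr__(self, name):
        # Leave special names alone so that lookups of attributes
        # such as __path__ fail in the ordinary way.
        if name.startswith('_'):
            raise AttributeError(name)
        return len(name)

# No lengths_demo.py file exists; the entry is created by hand.
sys.modules['lengths_demo'] = Computed()

import lengths_demo               # found in sys.modules, so no file search
from lengths_demo import spam     # the from statement works too

print(lengths_demo.eggs, spam)

del sys.modules['lengths_demo']   # clean up the demo entry
```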
7.2.7 Custom Importers
You can rebind the
__import__ attribute of module __builtin__
to your own custom importer function by wrapping the
__import__ function using the technique shown
earlier in this chapter. Such rebinding influences all
import and from statements that
execute after the rebinding. A custom importer must implement the
same interface as the built-in __import__, and
is often implemented with some help from the functions exposed by
built-in module imp. Custom importer functions are
an advanced and rarely used technique.
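As a rough sketch of such wrapping, the following importer records each requested name and then delegates to the original function. It assumes a current Python version, where the module is named builtins rather than __builtin__ and the function accepts the parameters shown. The names logging_import and imported_names are hypothetical.

```python
import builtins   # named __builtin__ in the Python version the text describes

imported_names = []
original_import = builtins.__import__

def logging_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Hypothetical wrapper: record the requested name, then delegate.
    imported_names.append(name)
    return original_import(name, globals, locals, fromlist, level)

builtins.__import__ = logging_import
try:
    import os.path            # goes through logging_import
    from sys import version   # so does the from statement
finally:
    builtins.__import__ = original_import   # always restore the original

print(imported_names)
```

Restoring the original function in a finally clause keeps the rebinding from leaking into the rest of the program if an import fails.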