10.3 File Objects
As discussed earlier in this
chapter, file is a built-in type in Python. With a
file object, you can read and/or write data to a file as seen by the
underlying operating system. Python reacts to any I/O error related
to a file object by raising an instance of built-in exception class
IOError. Errors that cause this exception include
open failing to open or create a file, calling a
method on a file object to which that method doesn't
apply (e.g., calling write on a read-only file
object or calling seek on a non-seekable file),
and I/O errors diagnosed by a file object's methods.
This section documents file objects, as well as some auxiliary
modules that help you access and deal with their
contents.
10.3.1 Creating a File Object with open
You normally create a Python file object
with the built-in open, which has the following
syntax:
open(filename,mode='r',bufsize=-1)
open opens
the file named by filename, which must be
a string that denotes any path to a file. open
returns a Python file object, which is an instance of the built-in
type file. Calling file is just
like calling open, but file was
first introduced in Python 2.2. If you explicitly pass a
mode string, open can
also create filename if the file does not
already exist (depending on the value of
mode, as we'll discuss in
a moment). In other words, despite its name, open
is not limited to opening existing files, but is also able to create
new ones if needed.
10.3.1.1 File mode
mode is a string
that denotes how the file is to be opened (or created).
mode can have the following values:
- 'r'
-
The file must already exist, and it is opened in read-only
mode.
- 'w'
-
The file is opened in write-only mode. The file is truncated and
overwritten if it already exists, or created if it does not
exist.
- 'a'
-
The file is opened in write-only mode. The file is kept intact if it
already exists, and the data you write is appended to
what's already in the file. The file is created if
it does not exist. Calling
f.seek is innocuous,
but has no effect.
- 'r+'
-
The file must already exist and is opened for both reading and
writing, so all methods of f can be
called.
- 'w+'
-
The file is opened for both reading and writing, so all methods of
f can be called. The file is truncated and
overwritten if it already exists, or created if it does not exist.
- 'a+'
-
The file is opened for both reading and writing, so all methods of
f can be called. The file is kept intact
if it already exists, and the data you write is appended to
what's already in the file. The file is created if
it does not exist. Calling
f.seek has no effect if
the next I/O operation on f writes data,
but works normally if the next I/O operation on
f reads data.
10.3.1.2 Binary and text modes
The mode string may also have any of the
values just explained followed by a b or
t. b denotes binary mode, while
t denotes text mode. When the
mode string has neither
b nor t, the default is text
mode (i.e., 'r' is like 'rt',
'w' is like 'wt', and so
on).
On Unix, there is no difference between binary and text modes. On
other platforms, when a file is open in text mode,
'\n' is returned each time the string that is the
value of os.linesep (the line termination string)
is encountered while reading the file. Conversely, a copy of
os.linesep is written each time you write
'\n' to the file.
This widespread convention, originally developed in the C language,
lets you read and write text files on any platform, without worrying
about the platform's line-separation conventions.
However, except on Unix platforms, you do have to know (and tell
Python, by passing the proper mode
argument to open) whether a file is binary or
text. In this chapter, for simplicity, I use \n to
refer to the line termination string, but remember that the string is
in fact os.linesep in files on the filesystem,
translated to and from \n in memory only for files
opened in text mode.
Python
2.3 will introduce a new concept, known as
universal newlines, letting
you open a text file for reading in mode 'u' when
you don't know how line separators are encoded in
the file. This is useful, for example, when you share files across a
network between machines with different operating systems. Mode
'u' guesses what line separator string to use
based on each file's contents. However, mode
'u' is not available in Python 2.2 and earlier.
10.3.1.3 Buffering
bufsize
is an integer that denotes what buffering you request for the file.
When bufsize is less than
0, the operating system's default
is used. Normally, this default is line buffering for files that
correspond to interactive consoles, and some reasonably sized buffer,
such as 8192 bytes, for other files. When
bufsize equals 0, the
file is unbuffered; the effect is as if the file's
buffer were flushed every time you write anything to the file. When
bufsize equals 1, the
file is line-buffered, which means the file's buffer
is flushed every time you write \n to the file.
When bufsize is greater than
1, the file uses a buffer of about
bufsize bytes, rounded up to some
reasonable amount. On some platforms, you can change the buffering
for files that are already open, but there is no cross-platform way
to do this.
10.3.1.4 Sequential and non-sequential access
A file object
f is inherently sequential (i.e., a stream
of bytes). When you read from a file, you get bytes in the sequential
order in which the bytes are present in the file. When you write to a
file, the bytes you write are put in the file in the sequential order
in which you write them.
To allow non-sequential access, the built-in file object keeps track
of its current position (i.e., the position on the underlying file
where the next read or write operation will start transferring data).
When you open a file, the file's initial current
position is at the start of the file. Any call to
f.write on a file
object f opened with a
mode of 'a' or
'a+' always sets
f's current position to
the end of the file before writing data to
f. Whenever you read or write some number
n of bytes on file object
f,
f's current position
advances by n. You can query the current
position by calling
f.tell, and change the
current position by calling
f.seek, both covered in
the next section.
10.3.2 Attributes and Methods of File Objects
A file
object f supplies the attributes and
methods documented in this section.
Closes the file. You can call no other method on
f after
f.close. Multiple calls
to f.close are allowed
and innocuous.
f.closed is a read-only
attribute that is True if
f.close( ) has been
called, otherwise False.
Requests that f's buffer
be written out to the operating system, ensuring that the file as
seen by the system has exactly the contents that
Python's code has written to
f. Depending on the platform and on the
nature of f's underlying
file, f.flush may or
may not be able to ensure the desired effect.
Returns True if
f's file is an
interactive terminal, otherwise False.
Returns an integer, the file descriptor of
f's file at operating
system level. File descriptors were covered in Section 10.2.8 earlier in this chapter.
f.mode is a read-only
attribute that is the value of the mode
string used in the open call that created
f.
f.name is a read-only
attribute that is the value of the
filename string used in the
open call that created
f.
Reads up to size bytes from
f's file and returns them
as a string. read reads and returns less than
size bytes if the file ends before
size bytes are read. When
size is less than 0,
read reads and returns all bytes up to the end of
the file. read returns an empty string only if the
file's current position is at the end of the file or
if size equals 0.
Reads and returns one line from
f's file, up to the end
of line (\n) included. If
size is greater than or equal to
0, readline reads no more than
about size bytes. In this case, the
returned string may not end with \n.
\n may also be absent if
readline reads up to the end of the file without
finding \n. readline returns an
empty string only if the file's current position is
at the end of the file or if size equals
0.
Reads and returns a list of all lines in
f's file, each a string
ending in \n. If
size>0,
readlines stops and returns the list after
collecting data for a total of about size
bytes, rather than reading all the way to the end of the file.
Sets f's current position
to the signed integer byte offset pos from
a reference point. how indicates the
reference point: when how is
0, the reference is the start of the file; when it
is 1, the reference is the current position; and
when it is 2, the reference is the end of the
file. When f is opened in text mode, the
effects of f.seek may
not be as expected, due to the implied translations between
os.linesep and \n. This
troublesome effect does not occur on Unix platforms, nor when
f is opened in binary mode, nor when
f.seek is called with a
pos that is the result of a previous call
to f.tell and
how is 0. When
f is opened in mode 'a'
or 'a+', all data written to
f is appended to the data that is already
in f, regardless of calls to
f.seek.
f.softspace is a
read-write attribute that is used internally by the
print statement to keep track of its own state. A
file object does not alter nor interpret softspace
in any way: it just lets the attribute be freely read and written,
and print takes care of the rest.
Returns f's current
position, an integer offset in bytes from the start of the file.
Truncates f's file. When
size is present, truncates the file to be
at most size bytes. When
size is absent, uses
f.tell( ) as the
file's new size.
Writes the bytes of string str to the file.
Like:
for line in lst: f.write(line) It does not matter whether the strings in sequence
lst are lines: despite its name, method
writelines just writes the strings to the file,
one after another, without alterations or additions.
Like
xreadlines.xreadlines(f),
as covered in Section 10.4.4 later in
this chapter. Method xreadlines will be deprecated
in Python 2.3.
10.3.3 Iteration on File Objects
A file object
f open for text-mode reading supports
iteration. In other words,
iter(f)
returns an iterator whose items are the file's
lines, so that the loop:
for line in f:
iterates on each line of the file. Interrupting such a loop
prematurely (e.g., with break) leaves the
file's current position with an arbitrary value.
Calling methods that modify
f's state, such as
f.seek, during such a
loop has an undefined effect. On the plus side, such a loop has very
good performance, since these specifications allow the loop to use
internal buffering to minimize I/O. Iteration on file objects is
available only in Python 2.2 and later.
10.3.4 File-Like Objects and Polymorphism
An object x is
file-like when it behaves polymorphically to a
file, meaning that a function (or some other subset of a program) can
use x as if x
were a file. Code that uses such an object (known as client code of
that object) typically receives the object as an argument or obtains
it by calling a factory function that returns the object as the
result. If the only method that a client-code function calls on
x is
x.read( ), without
arguments, all that x needs to supply in
order to be file-like for that function is a method
read that is callable without arguments and
returns a string. Other client-code functions, however, may need
x to implement a broader subset of file
object methods. Thus, file-like objects and polymorphism are not
absolute concepts, but are instead relative to demands placed upon an
object by client code.
Polymorphism is a powerful aspect of object-oriented programming, and
file-like objects are an excellent example of polymorphism. A
client-code module that writes to or reads from files can
automatically be reused for data residing elsewhere, as long as the
module does not break polymorphism by the dubious practice of type
testing. When we discussed the built-ins type and
isinstance in Chapter 8, I
mentioned that type testing is often best avoided, since it blocks
the normal polymorphism that Python otherwise supplies. Sometimes you
may have no choice. For example, the marshal
module, covered in Chapter 11, demands real file
objects. Therefore, if your client code needs to use
marshal, your code must also deal with real file
objects, not just file-like ones. However, such situations are rare.
Most often, supporting polymorphism in your client code takes nothing
more than some care in avoiding type
testing.
You can implement a file-like object by coding your own class, as
covered in Chapter 5, and defining the specific
methods needed by client code, such as read. A
file-like object fl need not implement all
the attributes and methods of a true file object
f. If you can determine which methods
client code calls on fl, you can choose to
implement only that subset. For example, when
fl is only meant to be written,
fl doesn't need methods
read, readline, and
readlines.
When you implement a file-like object fl,
make sure that
fl.softspace can be
read and written if you want fl to be
usable by print. You need not alter nor interpret
softspace in any way. Note that this behavior is
the default when you write
fl's class in Python. You
need to take specific care only when
fl's class overrides
special methods _ _getattr_ _ and _
_setattr_ _ or otherwise controls access to its
instances' attributes (e.g., by defining _
_slots_ _) as covered in Chapter 5. For
example, if your class is a new-style class and defines _
_slots_ _, your class must have a slot named
softspace, assuming you want instances of your
class to be usable with the print statement.
If the main reason you want to use a file-like object instead of a
real file object is to keep the data in memory, you can often make
use of modules StringIO and
cStringIO, covered later in this chapter. These
modules supply file-like objects that hold data in memory while
behaving polymorphically to file objects to a wide extent.
|