4.2 Data Types
The operation of a Python program hinges
on the data it handles. All data values in Python are represented by
objects, and each object, or value, has a type.
An object's type determines what operations the
object supports, or, in other words, what operations you can perform
on the data value. The type also determines the
object's attributes and items (if any) and whether
the object can be altered. An object that can be altered is known as
a mutable object, while one
that cannot be altered is an immutable
object. I cover object attributes and items in
detail later in this chapter.
The built-in
type(obj)
accepts any object as its argument and returns the type object that
represents the type of obj. Another
built-in function,
isinstance(obj,type),
returns True if object
obj is an instance of the type that type object
type represents (or of a subclass of that type); otherwise, it returns
False (built-in names True and
False were introduced in Python 2.2.1; in older
versions, 1 and 0 are used
instead).
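As a minimal sketch of these two built-ins, the following applies them to a few sample values. The names samples, names, is_int, is_str, and same_type are illustrative and do not come from the text.

```python
# Hypothetical sample values; any objects would do.
samples = [42, 3.14, 'hello', (1, 2), [1, 2], {'a': 1}]

# type(obj) returns the type object that represents obj's type;
# its __name__ attribute is the type's name as a string.
names = [type(obj).__name__ for obj in samples]

is_int = isinstance(42, int)        # 42 is an instance of int
is_str = isinstance(42, str)        # ...but not of str
same_type = type([1, 2]) is list    # the built-in name list is the type object
```

The last line relies on list naming the type object itself, which holds in Python 2.2 and later.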
Python has built-in objects for fundamental data types such as
numbers, strings, tuples, lists, and dictionaries, as covered in the
following sections. You can also create user-defined types, known
as classes, as discussed in detail in Chapter 5.
4.2.1 Numbers
The built-in
number objects in Python support integers (plain and long),
floating-point numbers, and complex numbers. All numbers in Python
are immutable objects, meaning that when you perform an operation on
a number object, you always produce a new number object. Operations
on numbers, called arithmetic operations, are covered later in this
chapter.
Integer
literals can be decimal, octal, or hexadecimal. A decimal literal is
represented by a sequence of digits where the first digit is
non-zero. An octal literal is specified with a 0
followed by a sequence of octal digits (0 to
7). To indicate a hexadecimal literal, use
0x followed by a sequence of hexadecimal digits
(0 to 9 and
A to F, in either upper- or
lowercase). For example:
1, 23, 3493 # Decimal integers
01, 027, 06645 # Octal integers
0x1, 0x17, 0xDA5 # Hexadecimal integers
Any kind of integer literal may be followed by the letter
L or l to denote a long
integer. For instance:
1L, 23L, 99999333493L # Long decimal integers
01L, 027L, 01351033136165L # Long octal integers
0x1L, 0x17L, 0x17486CBC75L # Long hexadecimal integers
Use uppercase L here, not lowercase
l, which may look like the digit
1. The difference between a long integer and a
plain integer is that a long integer has no predefined size limit: it
may be as large as memory allows. A plain integer takes up a few
bytes of memory and has minimum and maximum values that are dictated
by machine architecture. sys.maxint is the largest
available plain integer, while -sys.maxint-1 is
the smallest (most negative) one. On typical 32-bit machines,
sys.maxint is 2147483647.
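The equivalence of the three notations can be checked with a short sketch. It uses the built-in int with an explicit base rather than octal or long literals, because literal syntax differs between Python versions; the variable names are illustrative only.

```python
# int() with an explicit base parses a digit string in that base, so these
# checks do not depend on the literal syntax of a particular Python version.
decimal_value = 3493
from_octal = int('6645', 8)     # the octal digits of the literal 06645
from_hex = int('DA5', 16)       # the hexadecimal digits of the literal 0xDA5

all_equal = (decimal_value == from_octal == from_hex)

# Arithmetic that goes past the 32-bit plain-integer maximum stays exact,
# since integers without a size limit take over.
big = 2147483647 + 1
big_squared = big * big
```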
A
floating-point literal is represented by a sequence of decimal digits
that includes a decimal point (.), an exponent
part (an e or E, optionally
followed by + or -, followed by
one or more digits), or both. The leading character of a
floating-point literal cannot be e or
E: it may be any digit or a period
(.) (prior to Python 2.2, a leading
0 had to be immediately followed by a period). For
example:
0., 0.0, .0, 1., 1.0, 1e0, 1.e0, 1.0e0
A Python floating-point value corresponds to a C
double and shares its limits of range and
precision, typically 53 bits of precision on modern platforms.
(Python currently offers no way to find out this range and
precision.)
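A small sketch can make both points concrete. It assumes the platform's C double is an IEEE 754 double with 53 bits of precision, which is typical but not guaranteed; the variable names are illustrative.

```python
# Different spellings of the same floating-point literal denote equal values.
ones = [1., 1.0, 1e0, 1.e0, 1.0e0]
all_one = all(x == 1.0 for x in ones)

# With 53 bits of precision, 2**53 is the point past which not every
# integer is representable: adding 1.0 is lost to rounding.
limit = 2.0 ** 53
lost = (limit + 1.0 == limit)    # assumption: float is an IEEE 754 double
kept = (limit - 1.0 != limit)    # just below the limit, integers stay exact
```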
A complex number is made up of two floating-point values, one each
for the real and imaginary parts. You can access the parts of a
complex object z as read-only attributes
z.real and z.imag. You can
specify an imaginary literal as a floating-point or decimal literal
followed by a j or J:
0j, 0.j, 0.0j, .0j, 1j, 1.j, 1.0j, 1e0j, 1.e0j, 1.0e0j
The j at the end of the literal indicates the
square root of -1, as commonly used in electrical
engineering (some other disciplines use i for this
purpose, but Python has chosen j). There are no
other complex literals; constant complex numbers are denoted by
adding or subtracting a floating-point literal and an imaginary one.
Note that numeric literals do not include a sign: a leading
+ or -, if present, is a
separate operator, as discussed later in this chapter.
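The following sketch shows one way to build and inspect a complex value along the lines just described. The name z matches the text; the other names are illustrative.

```python
z = 3.0 + 4.0j           # a floating-point literal plus an imaginary literal
real_part = z.real       # the real part, a float
imag_part = z.imag       # the imaginary part, also a float
magnitude = abs(z)       # distance from the origin in the complex plane

j_squared = 1j * 1j      # j is the square root of -1, so this equals -1
negated = -1j            # unary minus applied to the literal 1j
```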
4.2.2 Sequences
A
sequence is an ordered container of items,
indexed by non-negative integers. Python provides built-in sequence
types for strings (plain and Unicode), tuples, and lists. Library and
extension modules provide other sequence types, and you can write yet
others yourself (as discussed in Chapter 5).
Sequences can be manipulated in a variety of ways, as discussed later
in this chapter.
4.2.2.1 Strings
A built-in string object
is an ordered collection of characters used to store and represent
text-based information. Strings in Python are
immutable, meaning that when you perform an
operation on a string, you always produce a new string object rather
than mutating the existing string. String objects provide numerous
methods, as discussed in detail in Chapter 9.
A string literal can be quoted or triple-quoted. A quoted string is a
sequence of zero or more characters enclosed in matching quote
characters, single (') or double
("). For example:
'This is a literal string'
"This is another string"
The two different kinds of quotes function identically; having both
allows you to include one kind of quote inside of a string specified
with the other kind without needing to escape them with the backslash
character (\):
'I\'m a Python fanatic' # a quote can be escaped
"I'm a Python fanatic" # this way is more readable
To have a string span multiple lines, you can use a backslash as the
last character of the line to indicate that the next line is a
continuation:
"A not very long string\
that spans two lines" # comment not allowed on previous line
To make the string output on two
lines, you must embed a newline in the string:
"A not very long string\n\
that prints on two lines" # comment not allowed on previous line
Another approach is to use a triple-quoted string, which is enclosed
by matching triplets of quote characters (''' or
"""):
"""An even bigger
string that spans
three lines""" # comments not allowed on previous lines
In a triple-quoted string literal, line breaks in the literal are
preserved as newline characters in the resulting string object.
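To compare the three approaches side by side, here is a minimal sketch with made-up string contents. Note that the continuation lines start in the first column, since any leading whitespace would become part of the string.

```python
one_line = "first part, \
second part"              # the backslash and line end are dropped entirely

two_lines = "first line\n\
second line"              # \n embeds a newline; the trailing backslash continues

triple = """first line
second line"""            # the line break in the literal is kept as a newline
```

Here two_lines and triple should compare equal, while one_line contains no newline at all.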
The only character that cannot be part of a triple-quoted string is
an unescaped backslash, while a quoted string cannot contain an
unescaped backslash, a line end, or the quote character that
encloses it. The backslash character starts an escape sequence, which
lets you introduce any character in either kind of string.
Python's string escape sequences are listed in Table 4-1.
Table 4-1. String escape sequences
Sequence | Meaning | ASCII/ISO code
\<newline> | End of line is ignored | None
\\ | Backslash | 0x5c
\' | Single quote | 0x27
\" | Double quote | 0x22
\a | Bell | 0x07
\b | Backspace | 0x08
\f | Form feed | 0x0c
\n | Newline | 0x0a
\r | Carriage return | 0x0d
\t | Tab | 0x09
\v | Vertical tab | 0x0b
\DDD | Octal value DDD | As given
\xXX | Hexadecimal value XX | As given
\other | Any other character | 0x5c + as given
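A few rows of Table 4-1 can be verified with the built-ins len and ord. This is only a sketch with illustrative names, not an exhaustive check of the table.

```python
newline = '\n'
length = len(newline)        # an escape sequence yields a single character
code_n = ord('\n')           # newline, listed as 0x0a
code_t = ord('\t')           # tab, listed as 0x09
octal_a = '\101'             # octal 101 is 65, the code for 'A'
hex_a = '\x41'               # hexadecimal 41 is also 65
quote = 'I\'m'               # an escaped single quote inside single quotes
backslash = '\\'             # two characters in the source, one in the string
```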
A variant of a string literal is a raw
string. The syntax is the same as for quoted or triple-quoted string
literals, except that an r or R
immediately precedes the leading quote. In raw strings, escape
sequences are not interpreted as in Table 4-1, but
are literally copied into the string, including backslashes and
newline characters. Raw string syntax is handy for strings that
include many backslashes, as in regular expressions (see Chapter 9). A raw string cannot end with an odd number
of backslashes: the last one would be taken as escaping the
terminating quote.
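The difference between raw and ordinary literals is easiest to see by comparing lengths. In this sketch the regular-expression text in pattern is a made-up example, and all names are illustrative.

```python
plain = '\n'                 # one character: a newline
raw = r'\n'                  # two characters: a backslash and the letter n
same = (raw == '\\n')        # the raw form equals the doubled-backslash form

pattern = r'\d+\.\d*'        # hypothetical regular-expression text
escaped = '\\d+\\.\\d*'      # the same string written without raw syntax
```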
Unicode string literals have the same
syntax as other string literals, plus a u or
U immediately before the leading quote character.
Unicode string literals can use \u followed by
four hexadecimal digits to denote Unicode characters, and can also
include the kinds of escape sequences listed in Table 4-1. Unicode literals can also include the escape
sequence
\N{name},
where name is a standard Unicode name as
per the list at http://www.unicode.org/charts/. For example,
\N{Copyright Sign} indicates a
Unicode copyright sign character (©). Raw Unicode string
literals start with ur, not ru.
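As a brief sketch, the two escape forms can be shown to denote the same character. The names are illustrative, and the example avoids the ur prefix, which not every Python version accepts.

```python
by_code = u'\u00a9'                 # \u followed by four hexadecimal digits
by_name = u'\N{COPYRIGHT SIGN}'     # \N with a standard Unicode character name
same_char = (by_code == by_name)
code_point = ord(by_code)           # the numeric code of the character
```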
Multiple string literals of any kind (quoted, triple-quoted, raw,
Unicode) can be adjacent, with optional whitespace in between. The
compiler concatenates such adjacent string literals into a single
string object. If any literal in the concatenation is Unicode, the
whole result is Unicode. Writing a long string literal in this way
lets you present it readably across multiple physical lines, and
gives you an opportunity to insert comments about parts of the
string. For example:
marypop = ('supercalifragilistic' # Open paren -> logical line continues
           'expialidocious')         # Indentation ignored in continuation
The result here is a single word of 34 characters.
4.2.2.2 Tuples
A
tuple is an immutable ordered sequence of items.
The items of a tuple are arbitrary objects and may be of different
types. To specify a tuple, use a series of expressions (the
items of the tuple) separated by commas
(,). You may optionally place a redundant comma
after the last item. You may group tuple items with parentheses, but
the parentheses are needed only where the commas would otherwise have
another meaning (e.g., in function calls) or to denote empty or
nested tuples. A tuple with exactly two items is also often called a
pair. To create a tuple of one item (a singleton), add a comma to the
end of the expression. An empty tuple is denoted by an empty pair of
parentheses. Here are some tuples, each enclosed in parentheses (which are
optional except in the case of the empty tuple):
(100,200,300) # Tuple with three items
(3.14,) # Tuple with one item
( ) # Empty tuple
You can also call the built-in tuple to create a
tuple. For example:
tuple('wow')
This builds a tuple equal to:
('w', 'o', 'w')
tuple( ) without arguments creates and returns an
empty tuple. When x is a sequence,
tuple(x)
returns a tuple whose items are the same as the items in sequence
x.
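The rules about commas, parentheses, and immutability can be sketched as follows. All names are illustrative, and the try statement is just one way to show that item assignment on a tuple fails.

```python
point = 100, 200, 300        # parentheses are optional here
single = 3.14,               # the trailing comma makes this a one-item tuple
not_a_tuple = (3.14)         # without a comma, just a parenthesized float
empty = ()                   # the empty tuple requires parentheses
nested = (1, (2, 3))         # parentheses delimit the inner tuple
letters = tuple('wow')       # one item per character of the string

try:
    point[0] = 0             # tuples are immutable, so this raises TypeError
    mutated = True
except TypeError:
    mutated = False
```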
4.2.2.3 Lists
A
list is a mutable ordered sequence of items. The
items of a list are arbitrary objects and may be of different types.
To specify a list, use a series of expressions (the
items of the list) separated by commas
(,) and within brackets ([ ]).
You may optionally place a redundant comma after the last item. An
empty list is denoted by an empty pair of brackets. Here are some
example lists:
[42,3.14,'hello'] # List with three items
[100] # List with one item
[ ] # Empty list
You can also call the built-in list to create a
list. For example:
list('wow')
This builds a list equal to:
['w', 'o', 'w']
list( ) without arguments creates and returns an
empty list. When x is a sequence,
list(x)
creates and returns a new list whose items are the same as the items
in sequence x. You can also build lists
with list comprehensions, as discussed later in this
chapter.
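In contrast to tuples, lists can be altered in place. The following sketch, with illustrative names, also shows that list(x) builds a new list object rather than returning x itself.

```python
items = [42, 3.14, 'hello']
items[0] = 'changed'         # item assignment alters the list in place
items.append(7)              # the same list object grows by one item

original = [1, 2, 3]
copy = list(original)        # a new list with the same items
copy.append(4)               # changing the copy leaves original alone

letters = list('wow')        # one item per character of the string
```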
4.2.3 Dictionaries
A
mapping is an arbitrary collection of objects
indexed by nearly arbitrary values called keys.
Mappings are mutable and, unlike sequences, are unordered.
Python provides a single built-in mapping type, the dictionary type.
Library and extension modules provide other mapping types, and you
can write others yourself (as discussed in Chapter 5). Keys in a dictionary may be of different
types, but they must be hashable (see function
hash in Section 8.2 in Chapter 8). Values
in a dictionary are arbitrary objects and may be of different types.
An item in a dictionary is a key/value pair. You
can think of a dictionary as an associative array (also known in some
other languages as a hash).
To
specify a dictionary, use a series of pairs of expressions (the pairs
are the items of the dictionary) separated by commas
(,) within braces ({ }). You
may optionally place a redundant comma after the last item. Each item
in a dictionary is written
key:value,
where key is an expression giving the
item's key and value is
an expression giving the item's value. If a key
appears more than once in a dictionary, only one of the items with
that key is kept in the dictionary. In other words, dictionaries do
not allow duplicate keys. An empty dictionary is denoted by an empty
pair of braces. Here are some dictionaries:
{ 'x':42, 'y':3.14, 'z':7 } # Dictionary with three items and string keys
{ 1:2, 3:4 } # Dictionary with two items and integer keys
{ } # Empty dictionary
In Python 2.2 and up, you can call the built-in
dict to create a dictionary. For example:
dict([[1,2],[3,4]])
This builds a dictionary equal to:
{1:2,3:4}
dict( ) without arguments creates and returns an
empty dictionary. When the argument x to
dict is a mapping, dict returns
a new dictionary object with the same keys and values as
x. When x is a
sequence, the items in x must be pairs,
and
dict(x)
returns a dictionary whose items (key/value pairs) are the same as
the items in sequence x. If a key appears
more than once in x, only the last item
with that key is kept in the resulting dictionary.
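The behaviors of dict described above can be sketched with a few small dictionaries. The names and sample data are illustrative; the try statement shows one way to observe that an unhashable object, such as a list, is rejected as a key.

```python
from_pairs = dict([[1, 2], [3, 4]])                 # a sequence of pairs
last_wins = dict([('a', 1), ('b', 2), ('a', 3)])    # key 'a' appears twice

source = {'k': 'v'}
copied = dict(source)        # a new dictionary with the same keys and values
copied['extra'] = 0          # changing the copy leaves source alone

try:
    bad = {[1, 2]: 'value'}  # a list is mutable and therefore not hashable
    list_key_ok = True
except TypeError:
    list_key_ok = False
```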
4.2.4 None
The built-in name
None denotes a null object.
None has no methods or other attributes. You can
use None as a placeholder when you need a
reference but you don't care about what object you
refer to, or when you need to indicate that no object is there.
Functions return None as their result unless they
have specific return statements coded to return other values.
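A short sketch illustrates both uses of None: as the implicit result of a function and as a marker for "no object". The functions no_return, bare_return, and find are hypothetical helpers written for this example.

```python
def no_return():
    pass                     # control falls off the end of the function

def bare_return():
    return                   # a return statement with no expression

def find(items, wanted):
    # Hypothetical helper: the index of wanted in items, or None if absent.
    for i in range(len(items)):
        if items[i] == wanted:
            return i
    return None

r1 = no_return()
r2 = bare_return()
missing = find(['a', 'b'], 'z')
found = find(['a', 'b'], 'b')
```

Testing with the is operator (for example, missing is None) is the usual way to check for None.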
4.2.5 Callables
In Python,
callable types are those whose instances support the function call
operation (see Section 4.4 later in this chapter). Functions are
obviously callable, and Python provides built-in functions (see Chapter 8) and also supports user-defined functions (see
Section 4.10 later in this chapter).
Generators (functions that contain a yield statement), which are new as of Python 2.2, are also callable (see
Section 4.10.8 later in this
chapter).
Types are also callable. Thus, the dict,
list, and tuple built-ins
discussed earlier are in fact types. Prior to Python 2.2, these names
referred to factory functions for creating objects of these types. As
of Python 2.2, however, they refer to the type objects themselves.
Since types are callable, this change does not break existing
programs. See Chapter 8 for a complete list of
built-in types.
As we'll discuss in Chapter 5,
class objects are callable. So are methods, which are functions bound
to class attributes. Finally, class instances whose classes supply
__call__ methods are also callable.
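The kinds of callables listed in this section can be probed with the built-in callable. The class Multiplier below is a hypothetical example, not something defined elsewhere in this book.

```python
class Multiplier:
    # Hypothetical class whose instances support the function call operation.
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return self.factor * value

double = Multiplier(2)       # calling the class creates an instance
result = double(21)          # calling the instance invokes __call__

checks = [
    callable(len),               # built-in function
    callable(list),              # type
    callable(Multiplier),        # class
    callable(double),            # instance whose class supplies __call__
    callable(double.__call__),   # bound method
    callable(42),                # a plain number is not callable
]
```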
4.2.6 Boolean Values
Prior to Python 2.3, there is no explicit
Boolean type in Python. However, every data value in Python can be
evaluated as a truth value: true or false. Any non-zero number or
non-empty string, tuple, list, or dictionary evaluates as true. Zero
(of any numeric type), None, and empty strings,
tuples, lists, and dictionaries evaluate as false. Python also has a
number of built-in functions that return Boolean results.
Built-in names True and False
were introduced in Python 2.2.1 to represent true and false; in older
versions of Python, 1 and 0 are
used instead. Throughout the rest of this book, I will use
True and False to represent
true and false. If you are using a version of Python older than
2.2.1, you'll need to substitute
1 and 0 when using examples
from this book.
Python 2.2.1 also introduced a new built-in function named
bool. When this function is called with any
argument, it considers the argument's value in a
Boolean context and returns False or
True accordingly.
In Python 2.3, bool becomes a type (a subclass of
int) and True and
False are the values of that type. The only
substantial effect of this innovation is that the string
representations of Boolean values become 'True'
and 'False', while in earlier versions they are
'1' and '0'.
The 2.2.1 and 2.3 changes are handy because they let you speak of
functions and expressions as "returning
True or False"
or "returning a Boolean." The
changes also let you write clearer code when you want to return a
truth value (e.g., return True
instead of return 1).
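The truth-value rules and the Python 2.3 behavior of bool can be sketched as follows. The sample values and the helper is_empty are illustrative, and the last two checks assume Python 2.3 or later, where bool is a type.

```python
falsy = [0, 0.0, 0j, None, '', (), [], {}]
truthy = [1, -1, 0.5, 'x', (0,), [0], {0: 0}]

all_false = [bool(v) for v in falsy]     # every result is False
all_true = [bool(v) for v in truthy]     # every result is True

is_int_subclass = issubclass(bool, int)  # assumes Python 2.3 or later
text = str(True)                         # assumes Python 2.3 or later

def is_empty(container):
    # Hypothetical helper that returns a Boolean rather than 1 or 0.
    if container:
        return False
    return True
```

Note that a container holding a single false item, such as [0], is still non-empty and therefore evaluates as true.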