2.3 Terms
Now that we've talked about the kinds of data you can represent in Perl, we'd like to introduce you to the various kinds of terms you can use to pull that data into expressions. We'll use the technical term term when we want to talk in terms of these syntactic units. (Hmm, this could get confusing.) The first terms we'll talk about are variables.
2.3.1 Variables
There are variable types corresponding to each of the three data types we mentioned. Each of these is introduced (grammatically speaking) by what we call a "funny character". Scalar variables are always named with an initial $, even when referring to a scalar that is part of an array or hash.
Entire arrays or array slices (and also slices of hashes) are named with @.
Entire hashes are named by %.
Any of these constructs may serve as an lvalue, that is, they specify a location that you could assign a value to, among other things.[2]
In addition, subroutine calls are named with an initial &, although this is optional when the call is otherwise unambiguous.
Every variable type has its own namespace. You can, without fear of conflict, use the same name for a scalar variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, a label, or your pet llama). This means that $foo and @foo are two different variables.
Since variable names always start with $, @, or %, reserved words cannot conflict with variable names.
Case is significant: FOO, Foo, and foo are all different names.
Sometimes you want to name something indirectly. It is possible to replace an alphanumeric name with an expression that returns a reference to the actual variable (see Chapter 4, References and Nested Data Structures).
2.3.2 Scalar Values
Whether it's named directly or indirectly, or is just a temporary value on a stack, a scalar always contains a single value. This value may be a number,[4] a string,[5] or a reference to another piece of data. (Or there may be no value at all, in which case the scalar is said to be undefined.) While we might speak of a scalar as "containing" a number or a string, scalars are essentially typeless; there's no way to declare a scalar to be of type "number" or "string". Perl converts between the various subtypes as needed, so you can treat a number as a string or a string as a number, and Perl will do the Right Thing.[6]
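As a small illustration of that automatic conversion, the sketch below uses one value first as a number and then as a string. The variable names are made up for the example.

```perl
use strict;
use warnings;

# One scalar, used first as a number and then as a string.
my $count = "42";             # starts life as a string
my $sum   = $count + 8;       # numeric context: "42" is converted to 42
my $label = $sum . " items";  # string context: 50 is converted to "50"

print "$label\n";
```

No declaration says whether $count is a number or a string; the operator applied to it decides which conversion happens.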
While strings and numbers are interchangeable for nearly all intents and purposes, references are a bit different. They're strongly typed, uncastable[ 7 ] pointers with built-in reference-counting and destructor invocation. You can use them to create complex data types, including user-defined objects. But they're still scalars, for all that. See Chapter 4 for more on references.
2.3.2.1 Numeric literals
Numeric literals are specified in any of several customary[8] floating point or integer formats:
12345               # integer
12345.67            # floating point
6.02E23             # scientific notation
0xffff              # hexadecimal
0377                # octal
4_294_967_296       # underline for legibility
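Since Perl uses the comma as a list separator, you cannot use it to delimit the triples in a large number. To improve legibility, Perl does allow you to use an underscore character instead. The underscore only works within literal numbers specified in your program, not for strings functioning as numbers or data read from somewhere else. Similarly, the leading 0x for hexadecimal and 0 for octal work only for literals. The automatic conversion of a string to a number does not recognize these prefixes, so you must do an explicit conversion with the oct function.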
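The difference between numeric literals and strings used as numbers can be sketched as follows. The variable names are illustrative, and the numeric warnings are switched off only so the deliberately sloppy conversions run quietly.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demonstration

my $literal   = 4_294_967_296;        # underscores are allowed in a literal
my $from_text = "4_294_967_296" + 0;  # string conversion stops at the first underscore
my $hex_text  = "0xffff" + 0;         # the 0x prefix is not recognized in a string
my $hex_value = hex("0xffff");        # explicit hexadecimal conversion
my $oct_value = oct("0377");          # explicit octal conversion

print "$literal $from_text $hex_text $hex_value $oct_value\n";
```

The oct function also understands a leading 0x, so it can serve for both bases when the input format is not known in advance.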
2.3.2.2 String literals
String literals are usually delimited by either single or double quotes. They work much like UNIX shell quotes: double-quoted string literals are subject to backslash and variable interpolation; single-quoted strings are not (except for \' and \\, so that you can put single quotes and backslashes into single-quoted strings).
You can also embed newlines directly in your strings; that is, they can begin and end on different lines. This is nice for many reasons, but it also means that if you forget a trailing quote, the error will not be reported until Perl finds another line containing the quote character, which may be much further on in the script. Fortunately, this usually causes an immediate syntax error on the same line, and Perl is then smart enough to warn you that you might have a runaway string.
Note that a single-quoted string must be separated from a preceding word by a space, since a single quote is a valid (though deprecated) character in an identifier; see Chapter 5.
With double-quoted strings, the usual C-style backslash rules apply for inserting characters such as newline, tab, and so on. You may also specify characters in octal and hexadecimal, or as control characters:
Code    Meaning
\n      Newline
\r      Carriage return
\t      Horizontal tab
\f      Form feed
\b      Backspace
\a      Alert (bell)
\e      ESC character
\033    ESC in octal
\x7f    DEL in hexadecimal
\cC     Control-C
In addition, there are escape sequences to modify the case of subsequent characters, as with the substitution operator in the vi editor:
Code    Meaning
\u      Force next character to uppercase
\l      Force next character to lowercase
\U      Force all following characters to uppercase
\L      Force all following characters to lowercase
\Q      Backslash all following non-alphanumeric characters
\E      End \U, \L, or \Q
Besides the backslash escapes listed above, double-quoted strings are subject to variable interpolation of scalar and list values. This means that you can insert the values of certain variables directly into a string literal. It's really just a handy form of string concatenation.
Variable interpolation may only be done for scalar variables, entire arrays (but not hashes), single elements from an array or hash, or slices (multiple subscripts) of an array or hash. In other words, you may only interpolate expressions that begin with $ or @, because those are the two characters (along with backslash) that the string parser looks for.
The following code segment prints out: "The price is $100."
$Price = '$100';                    # not interpolated
print "The price is $Price.\n";     # interpolated
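To make the list of interpolable forms concrete, here is a brief sketch covering a scalar, an entire array, single elements, and a hash slice. The variable names and data are invented for the illustration.

```perl
use strict;
use warnings;

my $name  = 'camel';
my @humps = (1, 2);
my %legs  = (camel => 4, llama => 4);

my $scalar_form  = "A $name";                                # scalar variable
my $array_form   = "Humps: @humps";                          # entire array
my $element_form = "First: $humps[0], legs: $legs{camel}";   # single elements
my $slice_form   = "Slice: @legs{'camel', 'llama'}";         # hash slice
my $literal_form = 'A $name';                                # single quotes: no interpolation

print "$_\n" for $scalar_form, $array_form, $element_form, $slice_form, $literal_form;
```

An entire hash such as %legs is not interpolated; inside double quotes the % is just a percent sign.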
As in some shells, you can put braces around the identifier to distinguish it from following alphanumerics, as in "${verb}able". A single identifier within a hash subscript is likewise treated as a string, so that:
$days{'Feb'}
can be written as:
$days{Feb}
and the quotes will be assumed automatically. But anything more complicated in the subscript will be interpreted as an expression.
Apart from the subscripts of interpolated array and hash variables, there are no multiple levels of interpolation. In particular, contrary to the expectations of shell programmers, backquotes do not interpolate within double quotes, nor do single quotes impede evaluation of variables when used within double quotes.
2.3.2.3 Pick your own quotes
While we usually think of quotes as literal values, in Perl they function more like operators, providing various kinds of interpolating and pattern matching capabilities. Perl provides the customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of them.
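Some of these are simply forms of "syntactic sugar" to let you avoid putting too many backslashes into quoted strings. Any non-alphanumeric, non-whitespace delimiter can be used in place of /. If the delimiter is an opening bracket or parenthesis, the final delimiter is the corresponding closing bracket or parenthesis, and embedded occurrences of the delimiters must match in pairs. For example:
$single = q!I said, "You said, 'She said it.'"!;
$double = qq(Can't we get some "good" $variable?);
$chunk_of_code = q {
    if ($condition) {
        print "Gotcha!";
    }
};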
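The following sketch contrasts the single-quoting and double-quoting forms with delimiters chosen to avoid backslashes. The strings and variable names are made up; q, qq, and qw are the standard Perl quoting operators.

```perl
use strict;
use warnings;

my $who = 'Larry';

my $single = q{It's "quoted" without backslashes, and $who stays literal};
my $double = qq{It's "quoted" too, but $who is interpolated};
my @words  = qw/apple banana cherry/;             # quote words: split on whitespace
my $nested = q(balanced (inner) parens are fine); # bracketing delimiters may nest

print "$single\n$double\n@words\n$nested\n";
```

Choosing a delimiter that does not occur in the text is usually the easiest way to keep a string readable.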
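Finally, for two-string constructs like s/// and tr///, if the first pair of quotes is a bracketing pair, then the second part gets its own starting quote character. In this case, you can have whitespace between the two parts:
tr [a-z] [A-Z];
2.3.2.4 Or leave the quotes out entirely
A word that has no other interpretation in the grammar will be treated as if it were a quoted string. These are known as barewords.[12] For example: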
@days = (Mon,Tue,Wed,Thu,Fri);
print STDOUT hello, ' ', world, "\n";
sets the array @days to the short form of the weekdays and prints hello world followed by a newline on STDOUT. Because barewords are error-prone, some people may wish to outlaw them entirely. If you say:
use strict 'subs';
then any bareword that would not be interpreted as a subroutine call produces a compile-time error instead. The restriction lasts to the end of the enclosing block. An inner block may countermand this by saying:
no strict 'subs';
Note that the bare identifiers in constructs like:
"${verb}able"
$days{Feb}
are not considered barewords, since they're allowed by explicit rule rather than by having "no other interpretation in the grammar".
2.3.2.5 Interpolating array values
Array variables are interpolated into double-quoted strings by joining all the elements of the array with the delimiter specified in the $" variable[13] (which is a space by default). The following are equivalent:
$temp = join($",@ARGV);
print $temp;
print "@ARGV";
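The effect of the $" variable can be sketched like this. The array contents are illustrative, and local is used so that the changed separator does not leak out of the block.

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed);

my $default = "@days";      # joined with the default separator, a space
my $custom;
{
    local $" = ', ';        # change the separator only inside this block
    $custom = "@days";
}
my $restored = "@days";     # the default applies again outside the block

print "$default\n$custom\n$restored\n";
```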
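Within search patterns (which also undergo double-quotish interpolation) there is a bad ambiguity: Is /$foo[bar]/ to be interpreted as /${foo}[bar]/ (where [bar] is a character class for the regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn't otherwise exist, then it's obviously a character class. If @foo exists, Perl takes a good guess about [bar], and is almost always right. If it does guess wrong, or if you're just plain paranoid, you can force the correct interpretation with braces as above.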
2.3.2.6 "Here" documents
A line-oriented form of quoting is based on the shell's here-document syntax.[15] Following a << you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are the value of the item. The terminating string may be either an identifier (a word) or some quoted text. If quoted, the type of quote you use determines the treatment of the text, just as in regular quoting. An unquoted identifier works like double quotes. For example:
print <<EOF;    # same as earlier example
The price is $Price.
EOF

print <<"EOF";  # same as above, with explicit quotes
The price is $Price.
EOF

print <<'EOF';    # single-quoted quote
All things (e.g. a camel's journey through
A needle's eye) are possible, it's true.
But picture how the camel feels, squeezed out
In one long bloody thread, from tail to snout.
                                -- C.S. Lewis
EOF

print << x 10;    # print next line 10 times (a bare << is an error in Perl 5.28 and later)
The camels are coming!  Hurrah!  Hurrah!

print <<"" x 10;  # the preferred way to write that
The camels are coming!  Hurrah!  Hurrah!

print <<`EOC`;    # execute commands
echo hi there
echo lo there
EOC

print <<"dromedary", <<"camelid"; # you can stack them
I said bactrian.
dromedary
She said llama.
camelid
Just don't forget that you have to put a semicolon on the end to finish the statement, as Perl doesn't know you're not going to try to do this:
print <<ABC
179231
ABC
    + 20;   # prints 179251
2.3.2.7 Other literal tokens
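Two special literals are __LINE__ and __FILE__, which represent the current line number and filename at that point in your program. They may only be used as separate tokens; they will not be interpolated into strings. In addition, the token __END__ may be used to indicate the logical end of the script before the actual end of file.
The __DATA__ token functions similarly to the __END__ token; the text that follows it can be read through the DATA filehandle.
2.3.3 Context
Until now we've seen a number of terms that can produce scalar values. Before we can discuss terms further, though, we must come to terms with the notion of context.
2.3.3.1 Scalar and list context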
Every operation[ 16 ] that you invoke in a Perl script is evaluated in a specific context, and how that operation behaves may depend on the requirements of that context. There are two major contexts: scalar and list . For example, assignment to a scalar variable evaluates the right-hand side in a scalar context, while assignment to an array or a hash (or slice of either) evaluates the right-hand side in a list context. Assignment to a list of scalars would also provide a list context to the right-hand side.
You will be miserable until you learn the difference between scalar and list context, because certain operators know which context they are in, and return lists in contexts wanting a list, and scalar values in contexts wanting a scalar. (If this is true of an operation, it will be mentioned in the documentation for that operation.) In computer lingo, the functions are overloaded on the type of their return value. But it's a very simple kind of overloading, based only on the distinction between singular and plural values, and nothing else.
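A minimal sketch of context sensitivity follows. The function context_of is a hypothetical helper written for this illustration; wantarray and reverse are Perl built-ins.

```perl
use strict;
use warnings;

# A hypothetical function that reports the context it was called in.
sub context_of {
    return wantarray ? 'list' : 'scalar';
}

my $scalar_call = context_of();    # assignment to a scalar: scalar context
my ($list_call) = context_of();    # assignment to a list of scalars: list context
my @array_call  = context_of();    # assignment to an array: list context

my @letters  = ('a', 'b', 'c');
my @reversed = reverse @letters;       # list context: reverses the order of the elements
my $joined   = reverse 'ab', 'cd';     # scalar context: reverses the concatenated string

print "$scalar_call $list_call $array_call[0] @reversed $joined\n";
```

The reverse operator is one of the built-ins whose result depends on which of the two contexts it is called in.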
Other operations supply the list contexts to their operands, and you can tell which ones they are because they all have LIST in their syntactic descriptions.
Scalar context can be further classified into string context, numeric context, and don't-care context. Unlike the scalar versus list distinction we just made, operations never know which scalar context they're in. They simply return whatever kind of scalar value they want to, and let Perl translate numbers to strings in string context, and strings to numbers in numeric context. Some scalar contexts don't care whether a string or number is returned, so no conversion will happen. (This happens, for example, when you are assigning the value to another variable. The new variable just takes on the same subtype as the old value.)
2.3.3.2 Boolean context
One special scalar context is called Boolean context. Boolean context is simply any place where an expression is being evaluated to see whether it's true or false. We sometimes write true and false when we mean the technical definition that Perl uses: a scalar value is true if it is not the null string or the number 0 (or its string equivalent, "0").
A Boolean context is a don't-care context in the sense that it never causes any conversions to happen (at least, no conversions beyond what scalar context would impose).
We said that a null string is false, but there are actually two varieties of null scalars: defined and undefined. Boolean context doesn't distinguish between defined and undefined scalars. Undefined null scalars are returned when there is no real value for something, such as when there was an error, or at end of file, or when you refer to an uninitialized variable or element of an array. An undefined null scalar may become defined the first time you use it as if it were defined, but prior to that you can use the defined operator to determine whether the value is defined or not. (The return value of defined is always defined, but not always true.)
2.3.3.3 Void context
Another peculiar kind of scalar context is the void context. This context not only doesn't care what the return value is, it doesn't even want a return value. From the standpoint of how functions work, it's no different from an ordinary scalar context. But if you use the -w command-line switch, the Perl compiler will warn you if you use an expression with no side effects in a place that doesn't want a value, such as in a statement that doesn't return a value. For example, if you use a string as a statement:
"Camel Lot";
you may get a warning like this:
Useless use of a constant in void context in myprog line 123;
2.3.3.4 Interpolative context
We mentioned that double-quoted literal strings do backslash interpretation and variable interpolation, but the interpolative context (often called "double-quote context") applies to more than just double-quoted strings. Some other double-quotish constructs are the generalized backtick operator qx//, the pattern match operator m//, and the substitution operator s///.
The interpolative context only happens inside quotes, or things that work like quotes, so perhaps it's not fair to call it a context in the same sense as scalar and list context. (Then again, maybe it is.)
2.3.4 List Values and Arrays
Now that we've talked about context, we can talk about list values, and how they behave in context. List values are denoted by separating individual values by commas (and enclosing the list in parentheses where precedence requires it):
(LIST)
In a list context, the value of the list literal is all the values of the list in order. In a scalar context, the value of a list literal is the value of the final element, as with the C comma operator, which always throws away the value on the left and returns the value on the right. (In terms of what we discussed earlier, the left side of the comma operator provides a void context.) For example:
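@stuff = ("one", "two", "three");
assigns the entire list value to array @stuff, but:
$stuff = ("one", "two", "three");
assigns only the value three to variable $stuff. A list value is different from an array, however. A real array variable in a scalar context returns only the length of the array, so the following assigns the value 3 to $stuff:
@stuff = ("one", "two", "three");
$stuff = @stuff;      # $stuff gets 3, not "three"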
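Until now we've pretended that lists are just lists of literals. But just as in a string literal, it's possible to interpolate other values into a list. Arrays lose their identity when interpolated, so the list:
(@foo,@bar,&SomeSub)
contains all the elements of @foo, followed by all the elements of @bar, followed by all the elements returned by the subroutine named SomeSub when it's called in a list context.
The null list is represented by (). Interpolating it in a list has no effect. You may place an optional comma at the end of any list value. This makes it easy to come back later and add more elements.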
@numbers = (
    1,
    2,
    3,
);
Another way to specify a literal list is with the qw (quote words) syntax we mentioned earlier. This construct is equivalent to splitting a single-quoted string on whitespace. For example:
@foo = qw(
    apple       banana      carambola
    coconut     guava       kumquat
    mandarin    nectarine   peach
    pear        persimmon   plum
);
(Note that those parentheses are behaving as quote characters, not ordinary parentheses. We could just as easily have picked angle brackets or braces or slashes.)
A list value may also be subscripted like a normal array. You must put the list in parentheses (real ones) to avoid ambiguity. Examples:
# Stat returns list value.
$modification_time = (stat($file))[9];
# SYNTAX ERROR HERE.
$modification_time = stat($file)[9];  # OOPS, FORGOT PARENS
# Find a hex digit.
$hexdigit = ('a','b','c','d','e','f')[$digit-10];
# A "reverse comma operator".
return (pop(@foo),pop(@foo))[0];
Lists may be assigned to if and only if each element of the list is legal to assign to:
($a, $b, $c) = (1, 2, 3);
($map{red}, $map{green}, $map{blue}) = (0x00f, 0x0f0, 0xf00);
List assignment in a scalar context returns the number of elements produced by the expression on the right side of the assignment:
$x = ( ($foo,$bar) = (7,7,7) );       # set $x to 3, not 2
$x = ( ($foo,$bar) = f() );           # set $x to f()'s return count
This is handy when you want to do a list assignment in a Boolean context, since most list functions return a null list when finished, which when assigned produces a 0, which is interpreted as false.
The final list element may be an array or a hash:
($a, $b, @rest) = split;
my ($a, $b, %rest) = @arg_list;
You can actually put an array or hash anywhere in the list you assign to, but the first one in the list will soak up all the values, and anything after it will get an undefined value. This may be useful in a local or my, where you probably want the arrays initialized to be empty anyway.
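The rules for list assignment can be sketched in a few lines. All variable names and values here are invented for the illustration.

```perl
use strict;
use warnings;

# The trailing array soaks up whatever is left over.
my ($first, $second, @rest) = (1, 2, 3, 4, 5);

# Swap two values with a single list assignment.
my ($x, $y) = ('left', 'right');
($x, $y) = ($y, $x);

# List assignment in scalar context yields the number of right-hand elements.
my $count = () = (7, 7, 7);

# An array in the middle takes everything; $after is left undefined.
my ($head, @middle, $after) = (10, 20, 30);

print "rest=@rest x=$x y=$y count=$count middle=@middle\n";
```

The assignment to the empty list on the $count line is a common idiom for counting what an expression returns in list context.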
You may find the number of elements in the array @days by evaluating @days in a scalar context, such as:
@days + 0;      # implicitly force @days into a scalar context
scalar(@days)   # explicitly force @days into a scalar context
Note that this only works for arrays. It does not work for list values in general. A comma-separated list evaluated in a scalar context will return the last value, like the C comma operator.
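A short sketch of finding and changing an array's length follows. The array contents are illustrative; $#days is Perl's notation for the subscript of the last element of @days.

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri);

my $length     = @days;          # scalar assignment supplies a scalar context
my $explicit   = scalar(@days);  # the same thing, stated explicitly
my $last_index = $#days;         # subscript of the last element

$#days = 1;                      # assigning to $#days truncates the array
my $shortened = scalar(@days);

print "$length $explicit $last_index $shortened\n";
```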
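Closely related to the scalar evaluation of @days is $#days. This will return the subscript of the last element of the array, or one less than the length, since there is (ordinarily) a 0th element. Assigning to $#days changes the length of the array. You can truncate an array down to nothing by assigning the null list () to it, so the following two statements are equivalent:
@whatever = ();
$#whatever = -1;
And the following is always true:[20]
scalar(@whatever) == $#whatever + 1;
2.3.5 Hashes (Associative Arrays)
As we indicated previously, a hash is just a funny kind of array in which you look values up using key strings instead of numbers. It defines associations between keys and values, so hashes are often called associative arrays.
There really isn't any such thing as a hash literal in Perl, but if you assign an ordinary list to a hash, each pair of values in the list will be taken to indicate one key/value association: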
%map = ('red',0x00f,'green',0x0f0,'blue',0xf00);
This has the same effect as:
%map = ();            # clear the hash first
$map{red}   = 0x00f;
$map{green} = 0x0f0;
$map{blue}  = 0xf00;
It is often more readable to use the => operator between key/value pairs. The => operator is just a synonym for a comma, but it's more visually distinctive, and it also quotes any bare identifier to its left, which makes it nice for initializing hash variables:
%map = (
    red   => 0x00f,
    green => 0x0f0,
    blue  => 0xf00,
);
or for initializing anonymous hash references to be used as records:
$rec = {
    witch => 'Mable the Merciless',
    cat   => 'Fluffy the Ferocious',
    date  => '10/31/1776',
};
or for using call-by-named-parameter to invoke complicated functions:
$field = $query->radio_group(
    NAME      => 'group_name',
    VALUES    => ['eenie','meenie','minie'],
    DEFAULT   => 'meenie',
    LINEBREAK => 'true',
    LABELS    => \%labels,
);
But we're getting ahead of ourselves. Back to hashes.
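The sketch below builds the color hash and then reads it back in a few common ways. The %by_value hash is an illustrative addition; keys, sort, and reverse are Perl built-ins.

```perl
use strict;
use warnings;

my %map = (
    red   => 0x00f,
    green => 0x0f0,
    blue  => 0xf00,
);

my $green = $map{green};         # look up one value by its key
my @names = sort keys %map;      # keys come back in no particular order, so sort them
my $pairs = scalar(keys %map);   # number of key/value pairs

# In list context a hash flattens to key/value pairs, so reversing that
# list and assigning it to another hash swaps keys and values.
my %by_value = reverse %map;

print "$green @names $pairs $by_value{15}\n";
```

Inverting a hash this way is only safe when the values are unique, since duplicate values would collide as keys.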
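You can use a hash variable (%hash) in a list context, in which case it interpolates all the key/value pairs into the list. But just because the hash was initialized in a particular order doesn't mean that the values come out in that order.
If you evaluate a hash variable in a scalar context, it returns a value that is true if and only if the hash contains any key/value pairs. (If there are any key/value pairs, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much only useful to find out whether Perl's (compiled in) hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in a scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen. Since Perl 5.26, a hash in scalar context returns the number of keys instead.)
2.3.6 Typeglobs and Filehandles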
Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a *, because it represents all types.
Typeglobs (or references thereto) are still used for passing or storing filehandles. If you want to save away a filehandle, do it this way:
$fh = *STDOUT;
or perhaps as a real reference, like this:
$fh = \*STDOUT;
This is also the way to create a local filehandle. For example:
sub newopen {
    my $path = shift;
    local *FH;          # not my!
    open (FH, $path) || return undef;
    return *FH;
}
$fh = newopen('/etc/passwd');
See the open entry in Chapter 3 and the FileHandle module in Chapter 7, for how to generate new filehandles.
But the main use of typeglobs nowadays is to alias one symbol table entry to another symbol table entry. If you say:
*foo = *bar;
it makes everything named "foo" a synonym for every corresponding thing named "bar". You can alias just one of the variables in a typeglob by assigning a reference instead:
*foo = \$bar;
makes $foo an alias for $bar, but doesn't make @foo an alias for @bar, or %foo an alias for %bar.
2.3.7 Input Operators
There are several input operators we'll discuss here because they parse as terms. In fact, sometimes we call them pseudo-literals because they act like quoted strings in many ways. (Output operators like print parse as list operators and are discussed in Chapter 3.)
2.3.7.1 Command input (backtick) operator
First of all, we have the command input operator, also known as the backticks operator, because it looks like this:
$info = `finger $user`;
A string enclosed by backticks (grave accents) first undergoes variable interpolation just like a double-quoted string. The result of that is then interpreted as a command by the shell, and the output of that command becomes the value of the pseudo-literal. (This is modeled after a similar operator in some of the UNIX shells.) In scalar context, a single string consisting of all the output is returned. In list context, a list of values is returned, one for each line of output. (You can set $/ to use a different line terminator.)
The command is executed each time the pseudo-literal is evaluated. The numeric status value of the command is saved in $? (see the section "Special Variables" later in this chapter for the interpretation of $?). Unlike the csh version of this command, no translation is done on the return data - newlines remain newlines. Unlike any of the shells, single quotes do not hide variable names in the command from interpretation.
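Here is a hedged sketch of the backtick operator in both contexts. It assumes a UNIX-like system where the echo and false commands are available to the shell; the variable names are illustrative.

```perl
use strict;
use warnings;

# Assumes a UNIX-like shell that provides the echo and false commands.
my $output = `echo hello`;            # scalar context: all output as one string
my @lines  = `echo one; echo two`;    # list context: one element per line of output

chomp(my $trimmed = $output);         # newlines are kept, so strip them yourself

my $ignored     = `false`;            # run a command that fails
my $exit_status = $? >> 8;            # the exit status is in the high byte of $?

print "$trimmed, ", scalar(@lines), " lines, exit status $exit_status\n";
```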
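To pass a $ through to the shell you need to hide it with a backslash.
The generalized form of backticks is qx// (for "quoted execution"), but the operator works exactly the same way as ordinary backticks. You just get to pick your quote characters.
2.3.7.2 Line input (angle) operator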
The most heavily used input operator is the line input operator, also known as the angle operator. Evaluating a filehandle in angle brackets (<STDIN>, for example) yields the next line from the associated file. Ordinarily you would assign the input value to a variable, but there is one situation where an automatic assignment happens. If and only if the line input operator is the only thing inside the conditional of a while loop, the value is automatically assigned to the special variable $_. The following lines are equivalent:
while (defined($_ = <STDIN>)) { print $_; }   # the long way
while (<STDIN>) { print; }                    # the short way
for (;<STDIN>;) { print; }                    # while loop in disguise
print $_ while defined($_ = <STDIN>);         # long statement modifier
print while <STDIN>;                          # short statement modifier
Remember that this special magic requires a while loop. If you use the input operator anywhere else, you must assign the result explicitly if you want to keep the value:
if (<STDIN>)      { print; }   # WRONG, prints old value of $_
if ($_ = <STDIN>) { print; }   # okay
The filehandles STDIN, STDOUT, and STDERR are predefined and pre-opened. Additional filehandles may be created with the open function.
In the while loops above, we were evaluating the line input operator in a scalar context, so it returned each line separately. However, if you use it in a list context, a list consisting of all the remaining input lines is returned, one line per list element. It's easy to make a large data space this way, so use this feature with care:
$one_line = <MYFILE>;   # Get first line.
@all_lines = <MYFILE>;  # Get the rest of the lines.
There is no while magic associated with the list form of the input operator, because the condition of a while loop is always a scalar context (as is any conditional).
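The scalar and list behaviors of the line input operator can be tried without any file on disk. This sketch assumes Perl 5.8 or later, where open can attach a filehandle to an in-memory string; the data and variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# Open a filehandle on an in-memory string so the example needs no file on disk.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $one_line  = <$fh>;     # scalar context: just the next line
my @all_lines = <$fh>;     # list context: every remaining line
close($fh);

open($fh, '<', \$text) or die "Can't reopen in-memory file: $!";
my $count = 0;
while (<$fh>) {            # while magic: each line is assigned to $_
    chomp;
    $count++;
}
close($fh);

print "first line: $one_line", "remaining: ", scalar(@all_lines), ", total: $count\n";
```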
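Using the null filehandle within the angle operator is special and can be used to emulate the command-line behavior of typical UNIX filter programs such as sed and awk. When you read lines from <>, it magically gives you all the lines from all the files mentioned on the command line. If no files were mentioned, it gives you standard input instead.
Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is null, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop:
while (<>) {
    ...                     # code for each line
}
is equivalent to the following Perl-like pseudocode:
@ARGV = ('-') unless @ARGV;
while ($ARGV = shift) {
    open(ARGV, $ARGV) or warn "Can't open $ARGV: $!\n";
    while (<ARGV>) {
        ...             # code for each line
    }
}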
except that it isn't so cumbersome to say, and will actually work. It really does shift array @ARGV and put the current filename into variable $ARGV. It also uses filehandle ARGV internally - <> is just a synonym for <ARGV>, which is magical.
You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want.
If you want to set @ARGV to your own list of files, go right ahead. If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the front like this:
while ($_ = $ARGV[0], /^-/) {
    shift;
    last if /^--$/;
    if (/^-D(.*)/) { $debug = $1 }
    if (/^-v/)     { $verbose++  }
    ...                 # other switches
}
while (<>) {
    ...                 # code for each line
}
The <> symbol will return false only once. If you call it again after this, it will assume you are processing another @ARGV list, and if you haven't set @ARGV, it will input from STDIN.
If the string inside the angle brackets is a scalar variable (for example, <$foo>), then that variable contains the name of the filehandle to input from, or a reference to the same. For example:
$fh = \*STDIN;
$line = <$fh>;
2.3.7.3 Filename globbing operator
You might wonder what happens to a line input operator if you put something fancier inside the angle brackets. What happens is that it mutates into a different operator. If the string inside the angle brackets is anything other than a filehandle name or a scalar variable (even if there are just extra spaces), it is interpreted as a filename pattern to be "globbed".[22] The pattern is matched against the files in the current directory (or the directory specified as part of the glob pattern), and the filenames so matched are returned by the operator. As with line input, the names are returned one at a time in scalar context, or all at once in list context. In fact, the latter usage is more prevalent. You generally see things like:
my @files = <*.html>;
As with other kinds of pseudo-literals, one level of variable interpolation is done first, but you can't say <$foo> because that's an indirect filehandle as explained earlier. Call the internal function directly as glob($foo) instead.
Whether you use the glob function or the old angle-bracket form, the globbing operator also does while magic like the line input operator, and assigns the result to $_. For example:
while (<*.c>) {
    chmod 0644, $_;
}
is equivalent to:
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
    chop;
    chmod 0644, $_;
}
In fact, it's currently implemented that way, more or less. (Which means it will not work on filenames with spaces in them unless you have csh(1) on your machine. Since Perl 5.6 the built-in glob is implemented with the File::Glob module and no longer starts a shell.) Of course, the shortest way to do the above is:
chmod 0644, <*.c>;
Because globbing invokes a subshell, it's often faster to call readdir yourself and just do your own grep on the filenames. Furthermore, due to its current implementation of using a shell, the glob routine may get "Arg list too long" errors.
A glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before it will start over. In a list context this isn't important, because you automatically get them all anyway. In a scalar context, however, the operator returns the next value each time it is called, or a false value if you've just run out. Again, false is returned only once. So if you're expecting a single value from a glob, it is much better to say:
($file) = <blurch*>;  # list context
than to say:
$file = <blurch*>;    # scalar context
because the former slurps all the matched filenames and resets the operator, while the latter will alternate between returning a filename and returning false.
If you're trying to do variable interpolation, it's definitely better to use the glob operator, because the older notation can cause people to become confused with the indirect filehandle notation. But with things like this, it begins to become apparent that the borderline between terms and operators is a bit mushy:
@files = glob("$dir/*.[ch]");   # call glob as function
@files = glob $some_pattern;    # call glob as operator
We left the parentheses off of the second example to illustrate that glob can be used as a unary operator; that is, a prefix operator that takes a single argument. The glob operator is an example of a named unary operator, which is just one of the kinds of operators we'll talk about in the section "Operators" later in this chapter. But first we're going to talk about pattern matching operations, which also parse like terms but operate like operators.
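As a sketch of the two approaches mentioned above, the following compares glob with a hand-rolled readdir and grep. The scratch directory and filenames are hypothetical and are created with the standard File::Temp module; the example assumes the temporary directory path contains no whitespace, since glob splits its pattern on whitespace.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory with a few hypothetical files to match against.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.c util.h notes.txt)) {
    open(my $out, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close($out);
}

# Using the glob function with an interpolated directory name.
my @globbed = sort glob("$dir/*.[ch]");

# Doing the same by hand with readdir and grep.
opendir(my $dh, $dir) or die "Can't open $dir: $!";
my @by_hand = sort map { "$dir/$_" } grep { /\.[ch]$/ } readdir($dh);
closedir($dh);

print scalar(@globbed), " files matched by glob, ",
      scalar(@by_hand), " by readdir and grep\n";
```

The readdir version takes a regular expression rather than a shell pattern, which is more typing but also more flexible when the selection rule is not expressible as a glob.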