Semantics (Programming Perl)

6.2. Semantics

Before you get too worked up over all that syntax, just remember that the normal way to define a simple subroutine ends up looking like this:

sub razzle {
    print "Ok, you've been razzled.\n";
}

and the normal way to call it is simply:

razzle();

In this case, we ignored inputs (arguments) and outputs (return values). But the Perl model for passing data into and out of a subroutine is really quite simple: all function parameters are passed as one single, flat list of scalars, and multiple return values are likewise returned to the caller as one single, flat list of scalars. As with any LIST, any arrays or hashes passed in these lists will interpolate their values into the flattened list, losing their identities--but there are several ways to get around this, and the automatic list interpolation is frequently quite useful. Both parameter lists and return lists may contain as many or as few scalar elements as you'd like (though you may put constraints on the parameter list by using prototypes). Indeed, Perl is designed around this notion of variadic functions (those taking any number of arguments), unlike C, where they're sort of grudgingly kludged in so that you can call printf(3).
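Here is a small sketch of that flattening. The helper count_args is hypothetical. It only reports how many scalars showed up in its argument array, so it shows that arrays and hashes dissolve into one flat list before the subroutine ever sees them.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars arrived in the flat argument list.
sub count_args { return scalar @_ }

my @days  = ('Mon', 'Tue', 'Wed');
my %color = (red => 1, blue => 2);

my $n1 = count_args(@days);          # 3: the array interpolates its elements
my $n2 = count_args(@days, 'Thu');   # 4: array and scalar merge into one list
my $n3 = count_args(%color);         # 4: each key/value pair contributes two scalars
my $n4 = count_args();               # 0: an empty list is perfectly legal

print "$n1 $n2 $n3 $n4\n";           # prints "3 4 4 0"
```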

Now, if you're going to design a language around the notion of passing varying numbers of arbitrary arguments, you'd better make it easy to process those arbitrary lists of arguments. Any arguments passed to a Perl routine come in as the array @_. If you call a function with two arguments, they are accessible inside the function as the first two elements of that array: $_[0] and $_[1]. Since @_ is just a regular array with an irregular name, you can do anything to it you'd normally do to an array.[2] The array @_ is a local array, but its values are aliases to the actual scalar parameters. (This is known as pass-by-reference semantics.) Thus you can modify the actual parameters if you modify the corresponding element of @_. (This is rarely done, however, since it's so easy to return interesting values in Perl.)

[2] This is an area where Perl is more orthogonal than the typical programming language.
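The aliasing is easy to see when you assign straight to an element of @_. The sub double_first below is a hypothetical illustration. Writing to $_[0] changes whatever scalar the caller passed, and that includes an individual array element.

```perl
use strict;
use warnings;

# Hypothetical example: assigning to $_[0] writes through the alias
# to whatever scalar the caller passed in.
sub double_first { $_[0] *= 2 }

my $total = 21;
double_first($total);      # $total is now 42

my @nums = (5, 6);
double_first($nums[0]);    # array elements are aliased too: @nums is now (10, 6)

print "$total @nums\n";    # prints "42 10 6"
```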

The return value of the subroutine (or of any other block, for that matter) is the value of the last expression evaluated. Or you may use an explicit return statement to specify the return value and exit the subroutine from any point in the subroutine. Either way, as the subroutine is called in a scalar or list context, so also is the final expression of the routine evaluated in that same scalar or list context.
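As a rough illustration of how context flows into the final expression, the hypothetical subs below return an array. In list context you get the elements. In scalar context the same array expression yields its length. An explicit return behaves the same way, and it also lets you leave early.

```perl
use strict;
use warnings;

# No return statement: the last expression evaluated, @items, is the
# return value, and it is evaluated in the caller's context.
sub items {
    my @items = ('apple', 'pear', 'plum');
    @items;
}

my @all   = items();   # list context: ('apple', 'pear', 'plum')
my $count = items();   # scalar context: an array yields its length, 3

# An explicit return exits early; its expression gets the same context.
sub first_or_all {
    my ($want_first) = @_;
    my @items = ('apple', 'pear', 'plum');
    return $items[0] if $want_first;
    @items;
}

my $first = first_or_all(1);   # 'apple'
my @every = first_or_all(0);   # ('apple', 'pear', 'plum')
```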

6.2.1. Tricks with Parameter Lists

Perl did not traditionally have named formal parameters (subroutine signatures arrived as an experimental feature in Perl 5.20 and became stable in Perl 5.36), but in practice all you do is copy the values of @_ to a my list, which serves nicely for a list of formal parameters. (Not coincidentally, copying the values changes the pass-by-reference semantics into pass-by-value, which is how people usually expect parameters to work anyway, even if they don't know the fancy computer science terms for it.) Here's a typical example:

sub maysetenv {
    my ($key, $value) = @_;
    $ENV{$key} = $value unless $ENV{$key};
}
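To see the switch from pass-by-reference to pass-by-value, compare the aliasing behavior of @_ with a sub that copies its arguments first. The name shout_copy is hypothetical. Because it works on a my copy, the caller's variable is left alone.

```perl
use strict;
use warnings;

# Copying @_ into a my list gives the sub private copies (pass-by-value);
# changing the copy leaves the caller's variable alone.
sub shout_copy {
    my ($text) = @_;    # $text is a copy, not an alias
    $text = uc $text;
    return $text;
}

my $word = 'hobbit';
my $loud = shout_copy($word);
print "$word $loud\n";  # prints "hobbit HOBBIT"
```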

But you aren't required to name your parameters, which is the whole point of the @_ array. For example, to calculate a maximum, you can just iterate over @_ directly:

sub max {
    my $max = shift(@_);
    for my $item (@_) {
        $max = $item if $max < $item;
    }
    return $max;
}

$bestday = max($mon,$tue,$wed,$thu,$fri);
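Here is a quick sanity check of max. It passes an array and a scalar together, which flatten into one @_ list, and compares the result with the max function from the core List::Util module. Loading that module with an empty import list, "use List::Util ();", keeps its max from clashing with ours.

```perl
use strict;
use warnings;
use List::Util ();   # core module; import nothing, so no clash with our max

sub max {
    my $max = shift(@_);
    for my $item (@_) {
        $max = $item if $max < $item;
    }
    return $max;
}

my @week    = (3, 9, 4);
my $ours    = max(@week, 7);              # array and scalar flatten into @_
my $library = List::Util::max(@week, 7);  # the core module's equivalent
print "$ours $library\n";                 # prints "9 9"
```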

Or you can fill an entire hash at once:

sub configuration {
    my %options = @_;
    print "Maximum verbosity.\n" if $options{VERBOSE} == 9;
}

configuration(PASSWORD => "xyzzy", VERBOSE => 9, SCORE => 0);
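Hash-style arguments also give you a cheap way to supply defaults. In this sketch, configuration_with_defaults is a hypothetical variant. It lists its defaults before @_. When a list is assigned to a hash, a later duplicate key overwrites an earlier one, so whatever the caller passes wins over the defaults.

```perl
use strict;
use warnings;

# Hypothetical variant of configuration(): defaults come first in the
# list, so any caller-supplied pair with the same key overrides them.
sub configuration_with_defaults {
    my %options = (VERBOSE => 1, SCORE => 100, @_);
    return %options;
}

my %opts = configuration_with_defaults(VERBOSE => 9);
print "$opts{VERBOSE} $opts{SCORE}\n";   # prints "9 100"
```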

Here's an example of not naming your formal arguments so that you can modify your actual arguments:

upcase_in($v1, $v2);  # this changes $v1 and $v2
sub upcase_in {
    for (@_) { tr/a-z/A-Z/ }
}

You aren't allowed to modify constants in this way, of course. If an argument were actually a scalar literal like "hobbit" or read-only scalar variable like $1, and you tried to change it, Perl would raise an exception (presumably fatal, possibly career-threatening). For example, this won't work:

upcase_in("frederick");
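If you want to watch that exception without stopping your program, you can wrap the call in eval BLOCK and look at $@ afterward. The sketch below assumes the usual wording of Perl's message, which mentions a read-only value.

```perl
use strict;
use warnings;

sub upcase_in {
    for (@_) { tr/a-z/A-Z/ }
}

my $v1 = 'bilbo';
upcase_in($v1);                  # fine: $v1 is an ordinary variable
print "$v1\n";                   # prints "BILBO"

# A literal is read-only, so the in-place change raises an exception;
# eval BLOCK traps it so we can inspect the message in $@.
my $ok = eval { upcase_in("frederick"); 1 };
print "caught: $@" unless $ok;   # message mentions a read-only value
```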

It would be much safer if the upcase_in function were written to return a copy of its parameters instead of changing them in place:

($v3, $v4) = upcase($v1, $v2);
sub upcase {
    my @parms = @_;
    for (@parms) { tr/a-z/A-Z/ }
    # Check whether we were called in list context.
    return wantarray ? @parms : $parms[0];
}

Notice how this (unprototyped) function doesn't care whether it was passed real scalars or arrays. Perl will smash everything into one big, long, flat @_ parameter list. This is one of the places where Perl's simple argument-passing style shines. The upcase function will work perfectly well without changing the upcase definition even if we feed it things like this:

@newlist = upcase(@list1, @list2);
@newlist = upcase( split /:/, $var );
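Here is a runnable version of those calls, with sample data made up for illustration. It also calls upcase in scalar context, where the wantarray test makes it return only the first converted element. The caller's original arrays come through unchanged.

```perl
use strict;
use warnings;

sub upcase {
    my @parms = @_;
    for (@parms) { tr/a-z/A-Z/ }
    # Check whether we were called in list context.
    return wantarray ? @parms : $parms[0];
}

my @list1 = ('frodo', 'sam');
my @list2 = ('merry');
my $var   = 'a:b:c';

my @newlist = upcase(@list1, @list2);      # ('FRODO', 'SAM', 'MERRY')
my @fields  = upcase( split /:/, $var );   # ('A', 'B', 'C')
my $one     = upcase(@list1, @list2);      # scalar context: just 'FRODO'

print "@newlist | @fields | $one | @list1\n";   # originals still lowercase
```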

Do not, however, be tempted to do this:

(@a, @b) = upcase(@list1, @list2);   # WRONG

Why not? Because, like the flat incoming parameter list in @_, the return list is also flat. So this stores everything in @a and empties out @b by storing the null list there. See Section 6.3, "Passing References", later in this chapter for alternatives.
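The sketch below shows the problem directly and previews one workaround. The subs two_lists and two_refs are hypothetical. The first returns two lists, and both end up in @a. The second returns two array references, which keep the lists apart until the caller dereferences them.

```perl
use strict;
use warnings;

# Hypothetical helper returning two lists; they come back as one flat list.
sub two_lists { return ((1, 2), (3, 4)) }

my (@a, @b) = two_lists();
print scalar(@a), " ", scalar(@b), "\n";   # prints "4 0": @a swallowed everything

# One workaround: return array references and dereference them on the
# caller's side.
sub two_refs { return ([1, 2], [3, 4]) }

my ($aref, $bref) = two_refs();
my @c = @$aref;    # (1, 2)
my @d = @$bref;    # (3, 4)
```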

6.2.2. Error Indications

If you want your function to return in such a way that the caller will realize there's been an error, the most natural way to do this in Perl is to use a bare return statement without an argument. That way when the function is used in scalar context, the caller gets undef, and when used in list context, the caller gets a null list.
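This sketch uses a hypothetical find_user lookup to show why a bare return is the natural choice. For contrast, it includes a version that says "return undef". In list context that produces a one-element list, and such a list is true when tested in boolean context, which is probably not what the caller wanted.

```perl
use strict;
use warnings;

# Hypothetical lookup that signals failure with a bare return.
sub find_user {
    my ($name) = @_;
    my %users = (frodo => 'Baggins');
    return unless exists $users{$name};   # undef in scalar context, () in list context
    return $users{$name};
}

# For contrast: "return undef" yields a one-element list in list context.
sub find_user_badly { return undef }

my $scalar = find_user('gollum');   # undef
my @list   = find_user('gollum');   # (): an empty, false list
my @oops   = find_user_badly();     # (undef): one element, so @oops tests true
```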

Under extraordinary circumstances, you might choose to raise an exception to indicate an error. Use this measure sparingly, though; otherwise, your whole program will be littered with exception handlers. For example, failing to open a file in a generic file-opening function is hardly an exceptional event. However, ignoring that failure might well be. The wantarray built-in returns undef if your function was called in void context, so you can tell if you're being ignored:

if ($something_went_awry) {
    return if defined wantarray;  # good, not void context.
    die "Pay attention to my error, you danglesocket!!!\n";
}
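Here is that fragment wrapped in a complete, hypothetical function named risky, with a hard-coded error flag. When the caller is looking at the return value it just returns. When it's called in void context, for example as a non-final statement inside eval, it dies instead.

```perl
use strict;
use warnings;

# Hypothetical function: reports failure quietly unless the caller ignores it.
sub risky {
    my $something_went_awry = 1;   # pretend an error happened
    if ($something_went_awry) {
        return if defined wantarray;   # scalar or list context: caller can check
        die "Pay attention to my error, you danglesocket!!!\n";
    }
    return 1;
}

my $result = risky();                # scalar context: quietly gets undef
my $died   = !eval { risky(); 1 };   # void context inside eval: it dies
my $error  = $@;
```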

6.2.3. Scoping Issues

Subroutines may be called recursively because each call gets its own argument array, even when the routine calls itself. If a subroutine is called using the & form, the argument list is optional. If the & is used but the argument list is omitted, something special happens: the @_ array of the calling routine is supplied implicitly. This is an efficiency mechanism that new users may wish to avoid.

&foo(1,2,3);    # pass three arguments
foo(1,2,3);     # the same

foo();          # pass a null list
&foo();         # the same

&foo;           # foo() gets current args, like foo(@_), but faster!
foo;            # like foo() if sub foo predeclared, else bareword "foo"
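The &foo; form does not pass a copy. The called routine sees the caller's very own @_ array. In this sketch, inner and outer are hypothetical names. When inner shifts its @_, the element disappears from outer's @_ too, which is exactly the kind of surprise new users may want to avoid.

```perl
use strict;
use warnings;

# Hypothetical pair of subs: inner() is invoked as &inner; so it shares,
# and can modify, the caller's @_.
sub inner { return shift @_ }   # removes the first element of the shared @_

sub outer {
    my $first = &inner;           # no parens: inner gets outer's @_
    return ($first, scalar @_);   # @_ is now one element shorter
}

my ($got, $left) = outer('a', 'b', 'c');
print "$got $left\n";             # prints "a 2"
```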

Not only does the & form make the argument list optional, but it also disables any prototype checking on the arguments you do provide. This is partly for historical reasons and partly to provide a convenient way to cheat if you know what you're doing. See Section 6.4, "Prototypes", later in this chapter.
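A small illustration of that cheat uses a hypothetical arg_count sub with a ($) prototype, which asks for exactly one argument in scalar context. Called normally, an array argument is squeezed down to its length. Called with &, the prototype is ignored and the array flattens as usual.

```perl
use strict;
use warnings;

# Hypothetical sub with a ($) prototype: one argument, in scalar context.
sub arg_count($) { return scalar @_ }

my @list = (10, 20, 30);
my $checked  = arg_count(@list);    # prototype applies: passes 3 as one arg, returns 1
my $bypassed = &arg_count(@list);   # & skips the prototype: three args, returns 3
print "$checked $bypassed\n";       # prints "1 3"
```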

Variables you access from inside a function that haven't been declared private to that function are not necessarily global variables; they still follow the normal block-scoping rules of Perl. As explained in the section "Names" in Chapter 2, "Bits and Pieces", this means they look first in the surrounding lexical scope (or scopes) for resolution, then in the single package scope. From the viewpoint of a subroutine, then, any my variables from an enclosing lexical scope are still perfectly visible.

For example, the bumpx function below has access to the file-scoped $x lexical variable because the scope where the my was declared--the file itself--hasn't been closed off before the subroutine is defined:

# top of file
my $x = 10;         # declare and initialize variable
sub bumpx { $x++ }  # function can see outer lexical variable

C and C++ programmers would probably think of $x as a "file static" variable. It's private as far as functions in other files are concerned, but global from the perspective of functions declared after the my. C programmers who come to Perl looking for what they would call "static variables" for files or functions find no such keyword in Perl. Perl programmers generally avoid the word "static", because static systems are dead and boring, and because the word is so muddled in historical usage.

Although Perl doesn't include the word "static" in its lexicon, Perl programmers have no problem creating variables that are private to a function and persist across function calls. The classic technique needs no special word at all (Perl 5.10 did later add a state declaration for this purpose). Perl's richer scoping primitives combine with automatic memory management in ways that someone looking for a "static" keyword might never think of trying.

Lexical variables don't get automatically garbage collected just because their scope has exited; they wait to get recycled until they're no longer used, which is much more important. To create private variables that aren't automatically reset across function calls, enclose the whole function in an extra block and put both the my declaration and the function definition within that block. You can even put more than one function there for shared access to an otherwise private variable:

{
    my $counter = 0;
    sub next_counter { return ++$counter }
    sub prev_counter { return --$counter }
}

As always, access to the lexical variable is limited to code within the same lexical scope. The names of the two functions, on the other hand, are globally accessible (within the package), and, since they were defined inside $counter's scope, they can still access that variable even though no one else can.
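To see both halves of that claim, the sketch below calls the two functions and then tries to reach $counter from outside the block. The outside attempt uses a string eval so that the resulting compile error under "use strict" can be caught instead of aborting the whole program.

```perl
use strict;
use warnings;

{
    my $counter = 0;
    sub next_counter { return ++$counter }
    sub prev_counter { return --$counter }
}

my @seen = (next_counter(), next_counter(), prev_counter());   # (1, 2, 1)

# Outside the block, $counter is not in scope, so under strict this
# string fails to compile and eval returns undef.
my $visible = eval q{ my $peek = $counter; 1 };
my $error   = $@;   # complains about an undeclared global symbol
```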

If this function is loaded via require or use, then this is probably just fine. If it's all in the main program, you'll need to make sure any run-time assignment to my is executed early enough, either by putting the whole block before your main program, or alternatively, by placing a BEGIN or INIT block around it to make sure it gets executed before your program starts:

BEGIN {
    my @scale = ('A' .. 'G');
    my $note  = -1;
    sub next_pitch { return $scale[ ($note += 1) %= @scale ] };
}

The BEGIN doesn't affect the subroutine definition, nor does it affect the persistence of any lexicals used by the subroutine. It's just there to ensure the variables get initialized before the subroutine is ever called. For more on declaring private and global variables, see my and our respectively in Chapter 29, "Functions". The BEGIN and INIT constructs are explained in Chapter 18, "Compiling".
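To see why the timing matters, the sketch below calls a copy of next_pitch before its plain bare block has run. The sub itself exists, because it was compiled, but its @scale is still empty, so the modulus by zero dies. The name next_pitch_late is hypothetical. The BEGIN version initializes its lexicals at compile time, so calling it early works and the notes cycle back to A.

```perl
use strict;
use warnings;

# Called before the bare block below has executed: the sub exists, but
# @scale is still empty, so "%= @scale" is a modulus by zero and dies.
my $early_ok  = eval { next_pitch_late(); 1 };
my $early_err = $@;

{
    my @scale = ('A' .. 'G');
    my $note  = -1;
    sub next_pitch_late { return $scale[ ($note += 1) %= @scale ] }
}

# With BEGIN, initialization happens at compile time, so an early call works.
my $first = next_pitch();   # 'A'

BEGIN {
    my @scale = ('A' .. 'G');
    my $note  = -1;
    sub next_pitch { return $scale[ ($note += 1) %= @scale ] }
}

my $rest = join '', map { next_pitch() } 1 .. 7;   # wraps around after 'G'
```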