Built-in Data Types (Programming Perl)

2.3. Built-in Data Types

Before we start talking about various kinds of tokens you can build from characters, we need a few more abstractions. To be specific, we need three data types.

Computer languages vary in how many and what kinds of data types they provide. Unlike some commonly used languages that provide many confusing types for similar kinds of values, Perl provides just a few built-in data types. Consider C, in which you might run into char, short, int, long, long long, bool, wchar_t, size_t, off_t, uid_t, u_longlong_t, pthread_key_t, fp_exception_field_type, and so on. That's just some of the integer types! Then there are floating-point numbers, and pointers, and strings.

All these complicated types correspond to just one type in Perl: the scalar. (Usually Perl's simple data types are all you need, but if not, you're free to define fancy dynamic types using Perl's object-oriented features--see Chapter 12, "Objects".) Perl's three basic data types are: scalars, arrays of scalars, and hashes of scalars (also known as associative arrays). Some people may prefer to call these data structures rather than types. That's okay.

Scalars are the fundamental type from which more complicated structures are built. A scalar stores a single, simple value--typically a string or a number. Elements of this simple type may be combined into either of the two aggregate types. An array is an ordered list of scalars that you access with an integer subscript (or index). All indexing in Perl starts at 0. Unlike many programming languages, however, Perl treats negative subscripts as valid: instead of counting from the beginning, negative subscripts count back from the end of whatever it is you're indexing into. (This applies to various substring and sublist operations as well as to regular subscripting.) A hash, on the other hand, is an unordered set of key/value pairs that you access by using a string (the key) as a subscript to look up the corresponding scalar (the value). Variables are always one of these three types. (Besides variables, Perl has other abstractions that you can think of as data types, such as filehandles, directory handles, formats, subroutines, symbol tables, and symbol table entries.)
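
To make the three types concrete, here is a minimal sketch. The variable names ($name, @days, %long_name, and so on) are invented purely for illustration; only the subscripting behavior comes from the description above.

```perl
use strict;
use warnings;

# A scalar holds one simple value: a string or a number.
my $name  = "camel";
my $humps = 2;

# An array is an ordered list of scalars, indexed from 0.
my @days  = ("Mon", "Tue", "Wed", "Thu", "Fri");
my $first = $days[0];        # first element
my $last  = $days[-1];       # negative subscripts count back from the end
my @tail  = @days[-2, -1];   # the same rule applies to sublists (slices)

# Negative offsets also work for substring operations.
my $suffix = substr($name, -3);   # the last three characters

# A hash is an unordered set of key/value pairs, subscripted by a string key.
my %long_name = (Mon => "Monday", Tue => "Tuesday");
my $full = $long_name{"Tue"};

print "$name $humps $first $last @tail $suffix $full\n";
# prints: camel 2 Mon Fri Thu Fri mel Tuesday
```

Note that each funny character in front of a name tells you what you are getting back: $ for a single scalar (even when it is one element pulled out of an array or a hash), @ for a list of them.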

Abstractions are wonderful, and we'll collect more of them as we go along, but they're also useless in a way. You can't do anything with an abstraction directly. That's why computer languages have syntax. We need to introduce you to the various kinds of syntactic terms you can use to pull your abstract data into expressions. We like to use the technical term term when we want to talk in terms of these syntactic units. (Hmm, this could get terminally confusing. Just remember how your math teacher used to talk about the terms of an equation, and you won't go terribly wrong.)

Just like the terms in a math equation, the purpose of most terms in Perl is to produce values for operators like addition and multiplication to operate on. Unlike in a math equation, however, Perl has to do something with the values it calculates, not just think with a pencil in its hand about whether the two sides of the equation are equal. One of the most common things to do with a value is to store it somewhere:

$x = $y;

That's an example of the assignment operator (not the numeric equality operator, which is spelled == in Perl). The assignment gets the value from $y and puts it into $x. Notice that we aren't using the term $x for its value; we're using it for its location. (The old value of $x gets clobbered by the assignment.) We say that $x is an lvalue, meaning it's the sort of storage location we can use on the left side of an assignment. We say that $y is an rvalue because it's used on the right side.
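
If you'd like to watch the clobbering happen, the following sketch may help. The initial values and the extra variables $m and $n are our own assumptions, chosen only to make the effect visible.

```perl
use strict;
use warnings;

my $x = "old";
my $y = "new";

$x = $y;    # $x is used for its location (lvalue), $y for its value (rvalue)

# The old value of $x is gone, and $y is untouched: the value was copied.
print "x=$x y=$y\n";    # prints: x=new y=new

$y = "changed";         # now $y is the lvalue; $x keeps its own copy
print "x=$x y=$y\n";    # prints: x=new y=changed

# Assignment (=) stores a value; numeric equality (==) only compares.
my ($m, $n) = (3, 4);
print "equal before\n" if $m == $n;   # prints nothing: 3 is not 4
$m = $n;
print "equal after\n"  if $m == $n;   # prints: equal after
```

The same variable plays both roles here: $y is an rvalue in the first assignment and an lvalue in the second.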

There's also a third kind of value, called a temporary value, that you need to understand if you want to know what Perl is really doing with your lvalues and rvalues. If we do some actual math and say:

$x = $y + 1;

Perl takes the rvalue $y and adds the rvalue 1 to it, which produces a temporary value that is eventually assigned to the lvalue $x. It may help you to visualize what is going on if we tell you that Perl stores these temporary values in an internal structure called a stack.[4] The terms of an expression (the ones we're talking about in this chapter) tend to push values onto the stack, while the operators of the expression (which we'll discuss in the next chapter) tend to pop them back off the stack, perhaps leaving another temporary result on the stack for the next operator to work with. The pushes and pops all balance out--by the time the expression is done, the stack is entirely empty (or as empty as it was when we started). More about temporary values later.

[4] A stack works just like one of those spring-loaded plate dispensers you see in a buffet restaurant--you can push plates onto the top of the stack, or you can pop them off again (to use the Comp. Sci. vernacular).
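
As a rough mental model of how $x = $y + 1 gets evaluated, you can act out the pushes and pops by hand. This is only a sketch: an ordinary Perl array named @stack stands in for the interpreter's internal stack, and $left, $right, and the starting value of $y are hypothetical. Perl's real internals are more involved than this.

```perl
use strict;
use warnings;

# Conceptual model only: a plain array plays the role of the internal stack.
my @stack;
my $y = 41;
my $x;

push @stack, $y;    # the term $y pushes its value
push @stack, 1;     # the term 1 pushes its value

# The + operator pops its two operands and pushes a temporary result.
my $right = pop @stack;
my $left  = pop @stack;
push @stack, $left + $right;

# The assignment pops the temporary value and stores it in the lvalue $x.
$x = pop @stack;

print "x=$x, stack depth=", scalar(@stack), "\n";
# prints: x=42, stack depth=0
```

By the end, every push has been matched by a pop, so the stack is as empty as it was when we started.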

Some terms can only be rvalues, such as the 1 above, while others can serve as either lvalues or rvalues. In particular, as the assignments above illustrate, a variable may function as either. And that's what our next section is about.