5. Hashes

Contents:
Introduction
Adding an Element to a Hash
Testing for the Presence of a Key in a Hash
Deleting from a Hash
Traversing a Hash
Printing a Hash
Retrieving from a Hash in Insertion Order
Hashes with Multiple Values Per Key
Inverting a Hash
Sorting a Hash
Merging Hashes
Finding Common or Different Keys in Two Hashes
Hashing References
Presizing a Hash
Finding the Most Common Anything
Representing Relationships Between Data
Program: dutree

Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi.

- Larry Wall

5.0. Introduction

People and parts of computer programs interact in all sorts of ways. Single scalar variables are like hermits, living a solitary existence whose only meaning comes from within the individual. Arrays are like cults, where multitudes marshal themselves under the name of a charismatic leader. In the middle lies the comfortable, intimate ground of the one-to-one relationship that is the hash. (Older documentation for Perl often called hashes associative arrays, but that's a mouthful. Other languages that support similar constructs sometimes use different terms for them; you may hear about hash tables, tables, dictionaries, mappings, or even alists, depending on the language.)

Unfortunately, this isn't a relationship of equals. Hashes are an "of" relationship, like saying "Andy is the boss of Nat," "The blood pressure of our patient is 112/62," and "The name of journal ISSN 1087-903X is The Perl Journal." Hashes only give convenient ways to access values for "Nat's boss" and "1087-903X's name"; you can't directly ask "Whose boss is Andy?" Finding the answer to that question is a recipe in this chapter.
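
To make the asymmetry concrete, here is a minimal sketch. The names and data are invented for illustration, and the reverse lookup shown is just a plain scan over the keys, not necessarily the approach the later recipe takes:

```perl
use strict;
use warnings;

# Hypothetical data: each key is a person, each value is that person's boss.
my %boss = (
    "Nat",   "Andy",
    "Jules", "Andy",
    "Josh",  "Tim",
);

# Forward lookup ("Nat's boss") is a single hash access.
print "Nat's boss is ", $boss{"Nat"}, "\n";

# Reverse lookup ("Whose boss is Andy?") has no direct syntax,
# so this sketch checks every key in turn.
my @reports = grep { $boss{$_} eq "Andy" } keys %boss;
@reports = sort @reports;    # sorted only to make the output repeatable
print "Andy is the boss of: @reports\n";
```

The forward lookup costs the same no matter how large the hash grows; the reverse scan has to visit every entry.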

Fortunately, hashes have their benefits, just like relationships. Hashes are a built-in data type in Perl. Their use reduces many complex algorithms to simple variable accesses. They are also fast and convenient ways to build indices and quick lookup tables.

It's time to put a name to these notions. The relationship embodied in a hash is a good thing to use for its name. For instance, the relationships in the examples above are boss of, blood pressure of, and name of. We'd give them Perl names %boss, %blood_pressure, and %name. Where a lone scalar has $ as its type identifier and an entire array has @, a hash has %.

Only use the % when referring to the hash as a whole, such as %boss. When referring to the value for a key, it's a single scalar value and so a $ is called for, just as when referring to one element of an array you also use a $. This means that "the boss of Nat" would be written as $boss{"Nat"}.
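
As a small illustration of the sigil rule, the following sketch uses made-up data (the @staff array is hypothetical and exists only for comparison):

```perl
use strict;
use warnings;

my %boss = ("Nat", "Andy", "Jules", "Andy");   # % names the whole hash
my $nats_boss = $boss{"Nat"};                  # $ because one value is a scalar

my @staff = ("Nat", "Jules");                  # @ names the whole array
my $first = $staff[0];                         # $ for one array element, too

my $how_many = keys %boss;                     # the whole hash again, so %
print "$first reports to $nats_boss; $how_many people are listed\n";
```

Note that braces select a hash element and square brackets select an array element, so $boss{"Nat"} and $staff[0] refer to two unrelated variables even though both start with $.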

A regular array uses whole numbers for indices, but the indices of a hash are always strings. Its values may be any arbitrary scalar values, including references. Using references as values, you can create hashes that hold not merely strings or numbers, but also arrays, other hashes, or objects. (Or rather, references to arrays, hashes, or objects.)
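
The sketch below illustrates both points with hypothetical variables (%langs and %seen are not from the text): a value can be a reference to an array, and a key that looks numeric is still stored as a string.

```perl
use strict;
use warnings;

# A hash value can be a reference, here to an anonymous array.
my %langs;
$langs{"Nat"} = [ "Perl", "C" ];
push @{ $langs{"Nat"} }, "Awk";          # dereference to reach the array
my $count = scalar @{ $langs{"Nat"} };   # number of elements in that array
my $first = $langs{"Nat"}[0];            # one element of the referenced array

# Keys are strings: the number 1.0 is converted to the string "1".
my %seen;
$seen{1.0} = "stored";
my $same = exists $seen{"1"} ? "yes" : "no";

print "Nat knows $count languages, starting with $first\n";
print "Is 1.0 the same key as \"1\"? $same\n";
```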

A hash can be initialized with a list, where elements of the list are key and value pairs:

%age = ( "Nat",   24,
         "Jules", 25,
         "Josh",  17  );

This is equivalent to:

$age{"Nat"}   = 24;
$age{"Jules"} = 25;
$age{"Josh"}  = 17;

To make it easier to read and write hash initializations, the => operator, sometimes known as a comma arrow, was created. Mostly it behaves as a better-looking comma. For example, you can write a hash initialization this way:

%food_color = (
               "Apple"  => "red",
               "Banana" => "yellow",
               "Lemon"  => "yellow",
               "Carrot" => "orange"
              );

(This particular hash is used in many examples in this chapter.) This initialization is also an example of hash-list equivalence: hashes behave in some ways as though they were lists of key-value pairs. We'll use this in a number of recipes, including the merging and inverting recipes.
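
A brief sketch of what hash-list equivalence allows. The variables @pairs, %by_color, and %merged are illustrative names, and the recipes later in the chapter may do these things differently:

```perl
use strict;
use warnings;

my %food_color = (
    "Apple"  => "red",
    "Banana" => "yellow",
    "Lemon"  => "yellow",
    "Carrot" => "orange",
);

# In list context a hash flattens to key, value, key, value, ...
my @pairs = %food_color;

# Reversing that list swaps keys and values, which inverts the hash.
# Duplicate values collapse: "yellow" ends up mapped to only one fruit.
my %by_color = reverse %food_color;

# Two lists of pairs joined together make a merged hash.
my %merged = (%food_color, "Grape", "purple");

print scalar(@pairs), " list elements\n";
print "The red food is $by_color{'red'}\n";
print scalar(keys %merged), " foods after merging\n";
```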

Unlike a regular comma, the comma arrow has a special property: It quotes any word preceding it, which means you can safely omit the quotes and improve legibility. Single-word hash keys are also automatically quoted, which means you can write $hash{somekey} instead of $hash{"somekey"}. You could rewrite the preceding initialization of %food_color as:

%food_color = (
                Apple  => "red",
                Banana => "yellow",
                Lemon  => "yellow",
                Carrot => "orange"
               );
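
The following sketch checks that the quoted and unquoted spellings reach the same element. The "Blood Orange" key is an invented example of a key that is not a single word and therefore still needs its quotes:

```perl
use strict;
use warnings;

my %food_color = (
    Apple  => "red",       # the comma arrow quotes the word on its left
    Banana => "yellow",
);

my $bare   = $food_color{Apple};     # single-word key, quoted automatically
my $quoted = $food_color{"Apple"};   # explicit quotes, same element

# A key containing a space is not a single word, so quote it yourself.
$food_color{"Blood Orange"} = "red";

print "Both spellings agree\n" if $bare eq $quoted;
```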

One important issue to be aware of regarding hashes is that their elements are stored in an internal order convenient for efficient retrieval. This means that no matter what order you insert your data in, it will come out in an unpredictable order.
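
When a predictable order matters, one common approach is to sort the keys before using them. This is only a sketch of that idea; the @lines array is an illustrative name:

```perl
use strict;
use warnings;

my %food_color = (
    Apple  => "red",
    Banana => "yellow",
    Lemon  => "yellow",
    Carrot => "orange",
);

# keys returns the keys in the hash's internal order, which you should
# not rely on. Sorting them gives the same sequence on every run.
my @lines;
for my $food (sort keys %food_color) {
    push @lines, "$food is $food_color{$food}";
}
print "$_\n" for @lines;
```

Sorting restores alphabetical order, not insertion order; keeping track of the order in which keys were added takes extra bookkeeping.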

See Also