While most of the user information is nicely represented in textual form, other system information is more naturally represented in other forms. For example, the IP address of an interface is internally managed as a four-byte number. While it is frequently decoded into a textual representation consisting of four small integers separated by periods, this encoding and decoding is wasted effort if a human is not interpreting the data in the meantime.
As a result, the network routines in Perl that expect or return an IP address use a four-byte string that contains one character for each sequential byte in memory. While constructing and interpreting such a byte string is fairly straightforward using chr and ord (not presented here), Perl provides a shortcut that is equally applicable to more complicated structures.
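For comparison, here is a minimal sketch of two routes to that four-byte string. The first builds it by hand with chr. The second lets the core Socket module's inet_aton parse the dotted-quad text. The address 140.186.65.25 is only a sample value chosen for illustration.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# By hand: chr turns each small integer into a one-byte character.
my $by_hand = join '', map { chr($_) } 140, 186, 65, 25;

# The Socket module parses dotted-quad text into the same packed form.
my $packed = inet_aton("140.186.65.25");

print length($packed), "\n";                          # 4
print $packed eq $by_hand ? "same\n" : "different\n"; # same

# Going back: ord recovers each byte, and inet_ntoa does the same job.
print join('.', map { ord($_) } split //, $by_hand), "\n"; # 140.186.65.25
print inet_ntoa($packed), "\n";                            # 140.186.65.25
```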
The pack function works a bit like sprintf: it takes a format control string and a list of values and creates a single string from those values. The pack format string, however, is geared towards creating a binary data structure. For example, to take four small integers and pack them as successive unsigned bytes in a composite string, use the following format:
$buf = pack("CCCC", 140, 186, 65, 25);
Here, the pack format string is four C's. Each C represents a separate value taken from the following list, much as a % field does in sprintf. The C format (according to the Perl manpages, the reference card, the HTML files, or even Perl: The Motion Picture) refers to a single byte computed from an unsigned character value (a small integer). The resulting string in $buf is four characters long, each character being one byte from the values 140, 186, 65, and 25, respectively.
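A quick way to check such a result is to inspect the packed string's length and bytes. The sketch below does this with sprintf's %vd format, which prints the ordinal of each character joined with periods. It also uses unpack with the H* format for a hex dump; unpack itself is introduced a little later.

```perl
use strict;
use warnings;

my $buf = pack("CCCC", 140, 186, 65, 25);

# The same bytes built with chr, one character per value.
my $same = chr(140) . chr(186) . chr(65) . chr(25);

print length($buf), "\n";                          # 4
print $buf eq $same ? "equal\n" : "not equal\n";   # equal
printf "%vd\n", $buf;                              # 140.186.65.25
print unpack("H*", $buf), "\n";                    # 8cba4119
```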
Similarly, the format l generates a signed long value. In Perl's pack, l always means a 32-bit (four-byte) number, even where the C compiler's long is larger; the l! variant uses the native size instead. The byte order, however, is machine-dependent. So the statement:
$buf = pack("l",0x41424344);
generates a four-character string that looks like either ABCD or DCBA, depending on whether the machine is big-endian or little-endian. These results occur because we are packing one value into four characters (the length of a long integer), and that value just happens to be composed of the bytes representing the ASCII values of the first four letters of the alphabet. Similarly:
$buf = pack("ll", 0x41424344, 0x45464748);
creates an eight-byte string consisting of either ABCDEFGH or DCBAHGFE, once again depending on whether the machine is big- or little-endian.
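If you need a particular byte order regardless of the machine, pack provides formats for that. This sketch assumes Perl 5.10 or later for the < and > byte-order modifiers. The N and V formats (unsigned 32-bit, big-endian and little-endian respectively) are older and work on any Perl 5.

```perl
use strict;
use warnings;

my $native = pack("l", 0x41424344);    # machine byte order
my $big    = pack("l>", 0x41424344);   # forced big-endian (Perl 5.10+)
my $little = pack("l<", 0x41424344);   # forced little-endian (Perl 5.10+)

print "$big $little\n";                # ABCD DCBA
print "this machine looks ", ($native eq $big ? "big" : "little"), "-endian\n";

# N and V: unsigned 32-bit in network (big-endian) and VAX (little-endian) order.
print pack("NN", 0x41424344, 0x45464748), "\n";   # ABCDEFGH
print pack("VV", 0x41424344, 0x45464748), "\n";   # DCBAHGFE
```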
The complete list of pack formats is given in the reference documentation (the pack entry in the perlfunc manpage). You'll see a few here as examples, but we're not going to list them all.
What if you were given the eight-byte string ABCDEFGH and were told that it was really the memory image (one character is one byte) of two long (four-byte) signed values? How would you interpret it? You'd need to do the inverse of pack, which is called unpack. This function takes a format control string (usually identical to the one you'd give pack) and a data string, and returns the list of values that make up the memory image defined in the data string. For example, let's take that string apart:
($val1,$val2) = unpack("ll","ABCDEFGH");
On a big-endian machine, this statement gives us back 0x41424344 in $val1 and 0x45464748 in $val2. On a little-endian machine, it gives 0x44434241 and 0x48474645 instead. In fact, the values that come back tell us whether we are on a little- or big-endian machine.
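The following sketch turns that observation into a byte-order check. It prints the two longs in hex, decides the order from the first one, and confirms that packing them again with the same format restores the original string.

```perl
use strict;
use warnings;

my ($val1, $val2) = unpack("ll", "ABCDEFGH");

# Print the two longs in hex so the byte order is easy to see.
printf "0x%08X 0x%08X\n", $val1, $val2;

# "ABCD" reads back as 0x41424344 only when the high byte comes first.
my $order = $val1 == 0x41424344 ? "big-endian" : "little-endian";
print "This machine is $order\n";

# Packing with the same format restores the original string.
print pack("ll", $val1, $val2), "\n";   # ABCDEFGH
```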
Whitespace in the format control string is ignored and can be used for readability. A number in the format control string generally repeats the previous specification that many times. For example, CCCC can also be written C4 or C2C2 with no change in meaning. (A few of the specifications use a trailing number as part of the specification, and thus cannot be multiplied in this manner.)
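The sketch below checks that several spellings of "four unsigned chars" produce identical bytes. It also shows the A format as one example where the trailing number means something else. For A, the number is a field width: the string is padded with spaces to that length rather than the format being repeated.

```perl
use strict;
use warnings;

my @bytes = (140, 186, 65, 25);

# Each of these formats describes four unsigned chars.
my %packed = (
    'CCCC'    => pack('CCCC',    @bytes),
    'C4'      => pack('C4',      @bytes),
    'C2 C2'   => pack('C2 C2',   @bytes),
    'C C C C' => pack('C C C C', @bytes),
);
for my $fmt (sort keys %packed) {
    printf "%-8s -> %vd\n", $fmt, $packed{$fmt};   # all print 140.186.65.25
}

# For A, the number is a field width: "hi" is space-padded to five bytes.
my $padded = pack("A5", "hi");
print "[$padded]\n";    # [hi   ]
```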
A format character can also be followed by a *, which repeats the format character enough times to swallow up the rest of the list or the rest of the binary image string (depending on whether you are packing or unpacking). So, another way to pack four unsigned characters into a string is:
$buf = pack("C*", 140, 186, 65, 25);
The four values here are swallowed up by the one format specification. If you had wanted two short integers followed by "as many unsigned chars as possible," you could say something like:
$buf = pack("s2 C*", 3141, 5926, 5, 3, 5, 8, 9, 7, 9, 3, 2);
Here, we take the first two values as shorts (generating four characters, since pack's s format is always 16 bits) and the remaining nine values as unsigned characters (generating nine characters).
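A short sketch of both uses of *: C* swallowing a fixed list, and s2 C* mixing fixed and open-ended parts. It assumes a Perl in which s is exactly 16 bits, as current perlfunc documents, so the second string should be 13 bytes long. Unpacking with the same format gives the list back.

```perl
use strict;
use warnings;

# C* consumes however many values remain.
my $four = pack("C*", 140, 186, 65, 25);
print $four eq pack("CCCC", 140, 186, 65, 25) ? "same\n" : "different\n";  # same

# Two 16-bit shorts, then every remaining value as an unsigned char.
my $buf = pack("s2 C*", 3141, 5926, 5, 3, 5, 8, 9, 7, 9, 3, 2);
print length($buf), "\n";   # 13 (2 shorts * 2 bytes + 9 chars * 1 byte)

# Unpacking with the same format returns the original list.
my @back = unpack("s2 C*", $buf);
print "@back\n";            # 3141 5926 5 3 5 8 9 7 9 3 2
```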
Going in the other direction, unpack with an asterisk specification can generate a list whose length isn't known in advance. For example, unpacking with C* creates one list element (a number) for each character in the string. Therefore, this statement:
@values = unpack("C*", "hello, world!\n");
yields a list of 14 elements, one for each of the characters of the string.
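To see what those 14 elements are, the sketch below prints the count, the first five values (the character codes for "hello"), and the last one (the newline). It then uses pack with the same C* format to turn the numbers back into the original string.

```perl
use strict;
use warnings;

my @values = unpack("C*", "hello, world!\n");

print scalar(@values), "\n";      # 14
print "@values[0..4]\n";          # 104 101 108 108 111  ("hello")
print $values[-1], "\n";          # 10 (the newline)

# pack with the same C* format reverses the process.
my $text = pack("C*", @values);
print $text;                      # hello, world!
```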