15.2 Extracting and Replacing a Substring

Pulling out a piece of a string can be done with careful application of regular expressions, but if the piece is always at a known character position, this method is inefficient. Instead, you should use substr. This function takes three arguments: a string value, a start position (measured as with index), and a length, like so:

$s = substr($string, $start, $length);

The start position works like index: the first character is zero, the second character is one, and so on. The length is the number of characters to grab starting at that point: a length of zero means no characters, one means just the character at the start position, two means two characters, and so on. (substr stops at the end of the string, so if you ask for too many characters, don't worry.) Here is substr in action:

$hello = "hello, world!";
$grab  = substr($hello, 3, 2);   # $grab gets "lo"
$grab  = substr($hello, 7, 100); # 7 to end, or "world!"
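Known character positions come up most often with fixed-width records. The following is a minimal sketch of that use; the record layout (an 8-character ID, a 20-character name, a 6-character price) and the sample values are assumptions made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: columns 0-7 hold an ID,
# columns 8-27 a name, and columns 28-33 a price.
my $record = sprintf("%-8s%-20s%6s", "A1024", "left-handed widget", "19.95");

# Each field sits at a known offset, so substr can pull it out directly.
my $id    = substr($record, 0, 8);
my $name  = substr($record, 8, 20);
my $price = substr($record, 28, 6);

# Trim the padding that fixed-width fields carry.
s/^\s+|\s+$//g for $id, $name, $price;

print "$id|$name|$price\n";
```

No pattern has to be matched against the record here; a regular expression is used only to strip the padding afterward.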

You could even create a "10 to the power of" operator for small integer powers (here, powers from 0 through 10, because the literal has ten zeros), as in:

$big = substr("10000000000",0,$power+1); # 10**$power
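As a quick sanity check of this trick, the sketch below compares it with Perl's ** operator for every power the literal can represent. The array name @powers_of_ten is an assumption introduced for the example:

```perl
use strict;
use warnings;

my @powers_of_ten;
for my $power (0 .. 10) {
    # Take the leading "1" plus $power zeros from the literal.
    my $big = substr("10000000000", 0, $power + 1);
    push @powers_of_ten, $big;
    print "10**$power = $big\n" if $big == 10 ** $power;
}
```

Beyond a power of 10 the literal runs out of zeros, and substr simply returns the whole string, so the trick silently gives the wrong answer for larger powers.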

If the count of characters is zero, an empty string is returned. If the starting position is less than zero, it is counted that many characters from the end of the string. So -1 for a start position and 1 (or more) for the length gives you the last character. Similarly, -2 for a start position starts with the second-to-last character. (A negative length, in Perl 5, leaves that many characters off the end of the string.) The following example illustrates negative start positions:

$stuff = substr("a very long string",-3,3); # last three chars
$stuff = substr("a very long string",-3,1); # the letter "i"
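Negative arguments are handy for working from the right-hand end of a string. The sketch below uses a made-up file name and also exercises a negative length, which in Perl 5 leaves that many characters off the end of the string:

```perl
use strict;
use warnings;

my $file = "report.final.txt";          # hypothetical file name

my $ext    = substr($file, -3, 3);      # last three characters
my $last   = substr($file, -1, 1);      # last character only
my $no_ext = substr($file, 0, -4);      # everything except the last four
my $middle = substr($file, 7, -4);      # from position 7, stopping four short of the end

print "$ext $last $no_ext $middle\n";   # prints "txt t report.final final"
```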

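If the starting position is before the beginning of the string (like a huge negative number bigger than the length of the string), the beginning of the string is used as the start position, provided the requested span still reaches into the string; only the part that lies within the string is returned. If the start position is exactly the length of the string, the empty string is returned. If the requested substring lies entirely outside the string (a huge positive start position, for example), Perl 5 returns undef and issues a "substr outside of string" warning when warnings are enabled. In other words, substr probably does what you expect it to do when reading a value, as long as you expect it to always return something other than a fatal error.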
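Because out-of-range behavior is easy to misremember, it can be worth testing the result with defined rather than assuming a particular value. A minimal sketch, assuming a Perl 5 interpreter; the sample string is arbitrary, and the substr warning category is switched off only to keep the demonstration quiet:

```perl
use strict;
use warnings;

my $word = "camel";

my ($at_end, $past_end);
{
    no warnings 'substr';              # silence "substr outside of string"
    $at_end   = substr($word, 5, 2);   # start equals the string length
    $past_end = substr($word, 50, 2);  # start is far past the end
}
my $partial = substr($word, 3, 100);   # runs off the end; only "el" exists

for my $result ($at_end, $past_end, $partial) {
    print defined $result ? "'$result'\n" : "undef\n";
}
```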

Omitting the length argument provides the same result as including a huge number for that argument: everything from the selected position to the end of the string is grabbed.[1]

[1] Very old Perl versions did not allow the third argument to be omitted, leading to the use of a huge number for that argument by pioneer Perl programmers. You may come across such cases in your Perl archeological expeditions.
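The two spellings can be compared side by side. In this sketch the path is a made-up example, and rindex (the right-to-left counterpart of index) is used to find the last slash:

```perl
use strict;
use warnings;

my $path  = "/usr/local/bin/perl";     # hypothetical path
my $slash = rindex($path, "/");        # position of the last slash

my $modern  = substr($path, $slash + 1);         # length omitted
my $pioneer = substr($path, $slash + 1, 10000);  # old-style huge length

print "$modern\n" if $modern eq $pioneer;        # prints "perl"
```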

If the first argument to substr is a scalar variable (in other words, something that could itself appear on the left side of an assignment operator), then the substr can also appear on the left side of an assignment operator. This case may look strange if you come from a C background, but if you've ever played with some dialects of BASIC, it's quite normal.

What gets changed as the result of such an assignment is the part of the string that would have been returned had the substr been used on the right-hand side of the expression instead. In other words, substr($var,3,2) returns the fourth and fifth characters (starting at 3, for a count of 2), so assigning a value to substr($var,3,2) changes those two characters as shown:

$hw = "hello world!";
substr($hw, 0, 5) = "howdy"; # $hw is now "howdy world!"

The length of the replacement text (what gets assigned into the substr) doesn't have to be the same as the length of the text it is replacing, as it happened to be in this example. The string will automatically grow or shrink as necessary to accommodate the text. Here's an example in which the string gets shorter:

substr($hw, 0, 5) = "hi"; # $hw is now "hi world!"

and here's an example that makes the string longer:

substr($hw, -6, 5) = "nationwide news"; # replaces "world"

The shrinking and growing are fairly efficient, so don't worry about using them arbitrarily, although replacing a string with a string of equal length is a faster solution.
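Two related idioms are sketched below. The first is the four-argument form of substr, which is not covered in the text above but is documented in perlfunc for Perl 5.005 and later: it performs the replacement and returns the text that was replaced. The others use the assignment form with a zero length (a pure insertion) and with an empty replacement (a pure deletion). The variable names are illustrative only:

```perl
use strict;
use warnings;

my $hw = "hello world!";

# Four-argument form: replace 5 characters at position 0 and
# get back the text that was there before.
my $old = substr($hw, 0, 5, "howdy");   # $old is "hello"

# A zero length selects nothing, so the assignment only inserts.
substr($hw, 5, 0) = ",";                # $hw is now "howdy, world!"

# Assigning an empty string deletes the selected characters.
substr($hw, -1, 1) = "";                # $hw is now "howdy, world"

print "$old -> $hw\n";
```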