A.6. Answers to Chapter 7 Exercises
-
Here's one way to do it:
/fred/
Of course, you have to put that into the test program! This is pretty
simple. The more important part of this exercise is trying it out on
the sample strings. It doesn't match Fred,
showing that regular expressions are case-sensitive. (We'll see
how to change that later.) It does match frederick
and Alfred, since both of those strings contain
the four-letter string fred. (Matching whole
words only, so that frederick and
Alfred wouldn't match, is another feature
we'll see later.)
If the test program is working correctly,[388] it should show those two
matches as something like |<fred>erick| and
|Al<fred>|, using the angle brackets to show
where fred was found inside each string.
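As a minimal sketch of what the test program reports, the following applies /fred/ to the strings discussed above, plus plain fred. The helper name show_match and the hard-coded list are our own assumptions for illustration; the real test program reads its strings from input.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the chapter): report a match the way the
# test program does, with angle brackets around the matched part.
sub show_match {
    my ($string) = @_;
    if ($string =~ /fred/) {
        return "|$`<$&>$'|";    # text before, the match itself, text after
    }
    return "No match.";
}

foreach my $string ('fred', 'Fred', 'frederick', 'Alfred') {
    print "$string: ", show_match($string), "\n";
}
```

Only Fred should come back as "No match."; the other three show fred inside the angle brackets.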
-
Here's one way to do it:
/a+b*/
That matches the letter a one or more times
(that's the plus), followed by b zero or
more times (that's the star). Well, that's what the
exercise asked for, but you may have come up with something
different. After all, if you're looking for
any number of b's, you
know you'll always find what you're looking for. So you
could have written /a+/ instead, and matched the
same strings.[389]
For that matter, when you want to match one or more
a's, you know that the match will succeed
when you find even the first one. So, /a/ will
match the same set of strings as the first two. The description
"any string containing at least one a
followed by any number of b's" means
the exact same thing as "any string containing
a." Of the sample strings, this matches all
of them except fred.
There are even more ways to make this pattern than we show here.
Often, in trying to write a pattern, you will need to decide which
one of many possible patterns best suits your needs.
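To check that the three patterns accept the same strings, a short sketch like this one can help. The sample list here is hypothetical (it is not the exercise's own list), and matching is a helper name we made up.

```perl
use strict;
use warnings;

# Hypothetical sample strings; only fred lacks the letter a.
my @samples = ('fred', 'barney', 'aaabbb', 'wilma');

# Hypothetical helper: list the samples that a given pattern matches.
sub matching {
    my ($pattern) = @_;
    return join ' ', grep { /$pattern/ } @samples;
}

foreach my $pattern ('a+b*', 'a+', 'a') {
    print "/$pattern/ matches: ", matching($pattern), "\n";
}
```

Each line of output should name the same three strings, since any string that contains an a satisfies all three patterns.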
-
Here's one way to do it:
/\\*\**/
That's what the text asked for: a backslash (typed twice, since
we mean a real backslash[390]) zero or more times
(that's the first star), followed by an asterisk (backslashed,
since star is a metacharacter) zero or more times (that's the
last star). Whew!
And what about the sample strings? Did it match any of them? You bet:
it matches all of them! It's because the backslashes and
asterisks aren't required in the pattern; that is, this pattern
can match the empty string. Here's a rule you can rely upon:
when a pattern may freely match the empty
string, it'll always match, since the
empty string can be found in any string. In fact, it'll always
match in the first place that you look.
So, this pattern matches all four characters in
\\**, as you'd expect. It matches the empty
string at the beginning of fred, which you may not
have expected. In the string barney \\\***, it
matches the empty string at the beginning. You might wish it would
hunt down the backslashes and stars at the end of that string, but it
doesn't bother. It looks at the beginning, sees zero
backslashes followed by zero asterisks, declares the match a success,
and goes home to watch television. And in *wilma\,
it matches just the star at the beginning; as you see, this pattern
never gets away from the beginning of the string, since it always
matches at the first opportunity.
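Here is a rough sketch that shows where the pattern matches in each of the four sample strings. The show_match helper is our own invention, modeled on the test program's output, and the here-document is just one convenient way to write strings that are full of backslashes.

```perl
use strict;
use warnings;

# The four sample strings, in a single-quoted here-document so that
# every backslash is taken literally.
my @samples = split /\n/, <<'END';
\\**
fred
barney \\\***
*wilma\
END

# Hypothetical helper: bracket the matched part, as the test program does.
sub show_match {
    my ($string) = @_;
    return "No match." unless $string =~ /\\*\**/;
    return "|$`<$&>$'|";
}

print show_match($_), "\n" foreach @samples;
```

An empty pair of angle brackets at the start of a line, as in |<>fred|, marks a successful match of the empty string at the beginning.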
Now, if someone asked you for a pattern to match any number of
backslashes followed by any number of asterisks, you'd be
technically correct to give them this one. But chances are,
that's not what they really wanted. Spoken languages like
English may be ambiguous and not say exactly what they mean, but
regular expressions always mean exactly what they say they mean.
In this case, maybe the person who asked for the pattern forgot to
say that he or she always wants to match at least one character, when
the pattern matches at all. We can do that. If there's at least
one backslash, /\\+\**/ will match. (That's
just like what we had before, but there's a plus in place of
the first star, meaning one or more backslashes.) If there's
not at least one backslash, then in order to match at least one
character, we'll need at least one asterisk, so we want
/\*+/. When you put those two possibilities
together, you get:
/\\+\**|\*+/
Ugly, isn't it? Regular expressions are powerful but not
beautiful. And they've contributed to Perl being maligned as a
"write-only language." To be sure that no one criticizes
your code in that way, though, it's kind to put an explanatory
comment near any pattern that's not obvious. On the other hand,
when you've been using these for a year, you will have a
different definition of "obvious" than you have today.
How does this new pattern work with the sample strings? With
\\**, it matches all four characters, just like
the last one. It won't match fred, which is
probably the right behavior given the problem description. For
barney \\\***, it matches the six characters at
the end, as you hoped. And for *wilma\, it matches
the asterisk at the beginning.
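The same kind of sketch, again with a made-up show_match helper, shows the new pattern at work on the four sample strings. The comment next to the pattern is the sort of explanation suggested above.

```perl
use strict;
use warnings;

# The four sample strings, with backslashes taken literally.
my @samples = split /\n/, <<'END';
\\**
fred
barney \\\***
*wilma\
END

# Hypothetical helper: bracket the matched part, as the test program does.
sub show_match {
    my ($string) = @_;
    # Either one or more backslashes followed by any number of asterisks,
    # or else one or more asterisks; never the empty string.
    return "No match." unless $string =~ /\\+\**|\*+/;
    return "|$`<$&>$'|";
}

print show_match($_), "\n" foreach @samples;
```

This time fred gives "No match.", and the barney line shows its last six characters inside the angle brackets.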
-
Here's one way to do it:
while (<>) {
    if (/wilma/) {
        print;
    }
}
This is a grep-like program. For each line of text
(contained in $_), we check to see whether the
pattern matches. If it matches, we print it. This program uses
print's default: if you don't tell
it to print something else, it prints $_. So we
have written a program that uses $_ all the way
through, but never mentions it anywhere. Perl folks love to use the
defaults and save time typing, so you'll see a lot of programs
that do this.
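To see what the defaults are hiding, here is a sketch of the same loop with every use of $_ written out. The helper name print_wilma_lines, the filehandle arguments, and the in-memory sample text are all assumptions made for the sake of a self-contained example; opening a filehandle on a string needs a reasonably modern Perl (5.8 or later).

```perl
use strict;
use warnings;

# Hypothetical helper: the answer's loop with the hidden $_ spelled out,
# reading from and printing to whatever filehandles it is given.
sub print_wilma_lines {
    my ($in, $out) = @_;
    local $_;                        # leave the caller's $_ alone
    while (defined($_ = <$in>)) {    # a bare <$in> in while assigns to $_
        if ($_ =~ /wilma/) {         # a bare /wilma/ tests $_
            print {$out} $_;         # a bare print prints $_
        }
    }
}

# Made-up sample text, read through an in-memory filehandle.
my $sample = "fred flintstone\nwilma flintstone\nbarney and wilma\n";
open my $in, '<', \$sample or die "Cannot read sample text: $!";
print_wilma_lines($in, \*STDOUT);
```

It does the same job as the short version, but with quite a bit more typing.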
And if, for extra credit, you wanted to match a capitalized
Wilma as well, /wilma|Wilma/
would do the job. Or, more simply, you could have written
/(w|W)ilma/. People who have used other regular
expression implementations and already know about character classes,
which we'll discuss in the next chapter, could make that last
one even shorter (and more efficient).[391]
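A quick way to convince yourself that the two extra-credit patterns accept the same lines is a sketch like this, using sample lines we made up:

```perl
use strict;
use warnings;

# Hypothetical sample lines.
my @lines = ('wilma flintstone', 'Wilma Flintstone', 'WILMA', 'fred');

# Keep the lines that each pattern matches.
my @either  = grep { /wilma|Wilma/ } @lines;
my @grouped = grep { /(w|W)ilma/ } @lines;

print "alternation: ", join(', ', @either), "\n";
print "grouped:     ", join(', ', @grouped), "\n";
```

Both lists should hold the first two lines only; neither pattern matches the all-capitals WILMA, since only the first letter is allowed to vary.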
-
Here's one way to do it:
while (<>) {
    if (/wilma/) {
        if (/fred/) {
            print;
        }
    }
}
This tests /fred/ only after we find that
/wilma/ matches, but fred could
appear before or after wilma in the line; each
test is independent of the other.
If you wanted to avoid the extra nested if test,
you might have written something like this:[392]
while (<>) {
    if (/wilma.*fred|fred.*wilma/) {
        print;
    }
}
This works because we'll either have wilma
before fred, or fred before
wilma. If we had written just
/wilma.*fred/, that wouldn't have matched a
line like fred and wilma flintstone, even though
that line mentions both of them.
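The following sketch compares the three approaches side by side. The sample lines and the sub names are hypothetical; each sub returns 1 for a match and 0 otherwise.

```perl
use strict;
use warnings;

# Hypothetical sample lines for comparing the approaches.
my @lines = (
    'fred and wilma flintstone',
    'wilma and fred',
    'wilma and betty',
    'barney and fred',
);

# The nested tests from the first answer.
sub nested_tests {
    my ($line) = @_;
    if ($line =~ /wilma/) {
        if ($line =~ /fred/) {
            return 1;
        }
    }
    return 0;
}

# The single pattern that allows either order.
sub either_order {
    my ($line) = @_;
    return $line =~ /wilma.*fred|fred.*wilma/ ? 1 : 0;
}

# The pattern that insists on wilma coming first.
sub wilma_first {
    my ($line) = @_;
    return $line =~ /wilma.*fred/ ? 1 : 0;
}

foreach my $line (@lines) {
    my $flags = join '', nested_tests($line), either_order($line), wilma_first($line);
    print "$flags $line\n";
}
```

The first two columns should agree on every line, while the third column shows 0 for fred and wilma flintstone.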
We made this an extra-credit exercise because many folks have a
mental block here. We showed you an "or" operation (with
the vertical bar, "|"), but we never
showed you an "and" operation. That's because there
isn't one in regular expressions.[393] If you want to know whether one pattern and another are
both successful, just test both of them.
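Testing both patterns extends naturally to any number of them. Here is a minimal sketch, assuming a helper we have named matches_all (it is not a Perl built-in) that is handed the string and a list of patterns built with qr//:

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the string matches every pattern given.
sub matches_all {
    my ($string, @patterns) = @_;
    foreach my $pattern (@patterns) {
        return 0 unless $string =~ $pattern;    # one failure settles it
    }
    return 1;
}

my $line = 'fred and wilma flintstone';
if (matches_all($line, qr/fred/, qr/wilma/)) {
    print "Both patterns match: $line\n";
}
```

The order of the patterns doesn't matter, since each one is tested against the whole string on its own.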
Copyright © 2002 O'Reilly & Associates. All rights reserved.