8.3. Anchors
By default, if a pattern doesn't match at the start of the string, it can "float" on down the string, trying to match somewhere else. But there are a number of anchors that may be used to hold the pattern at a particular point in a string. The caret[178] anchor (^) marks the beginning of the string, while the dollar sign ($) marks the end.[179] So the pattern /^fred/ will match fred only at the start of the string; it wouldn't match manfred mann. And /rock$/ will match rock only at the end of the string; it wouldn't match knute rockne.
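
As a minimal sketch of the two anchors, the following Perl snippet tries the patterns /^fred/ and /rock$/ against a few strings. The helper name report and the sample strings "fred flintstone" and "hard rock" are illustrative assumptions; only the patterns and the two nonmatching strings come from the text.

```perl
use strict;
use warnings;

# Hypothetical helper: says whether a string matches a compiled pattern.
sub report {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? "match" : "no match";
}

# /^fred/ is held at the start of the string.
print "fred flintstone: ", report("fred flintstone", qr/^fred/), "\n";
print "manfred mann: ",    report("manfred mann",    qr/^fred/), "\n";

# /rock$/ is held at the end of the string.
print "hard rock: ",    report("hard rock",    qr/rock$/), "\n";
print "knute rockne: ", report("knute rockne", qr/rock$/), "\n";
```

Without the anchors, both fred and rock would be free to float and would be found inside manfred mann and knute rockne.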
Sometimes, you'll want to use both of these anchors, to ensure that the pattern matches an entire string. A common example is /^\s*$/, which matches a blank line. But this "blank" line may include some whitespace characters, like tabs and spaces, which are invisible to you and me. Any line that matches that pattern looks just like any other one on paper, so this pattern treats all blank lines as equivalent. Without the anchors, it would match nonblank lines as well.
8.3.1. Word Anchors
Anchors aren't just at the ends of the string. The word-boundary anchor, \b, matches at either end of a word.[180] So we can use /\bfred\b/ to match the word fred but not frederick or alfred or manfred mann. This is similar to the feature often called something like "match whole words only" in a word processor's search command.
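
The whole-word behavior can be sketched with grep, which keeps only the strings in which the pattern matches. The list name @candidates and the string "say hi to fred!" are assumptions added for illustration; the other strings are the ones named above.

```perl
use strict;
use warnings;

# Sample strings; "say hi to fred!" is an added, hypothetical example.
my @candidates = ("fred", "frederick", "alfred", "manfred mann", "say hi to fred!");

# Keep only the strings in which fred appears as a whole word.
my @whole_word = grep { /\bfred\b/ } @candidates;

print "$_\n" for @whole_word;
```

Punctuation and spaces are not \w characters, so the fred in "say hi to fred!" still has a word boundary on each side.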
Alas, these aren't words as you and I are likely to think of them; they're those \w-type words made up of ordinary letters, digits, and underscores. The \b anchor matches at the start or end of a group of \w characters. In Figure 8-1, there's a grey underline under each "word," and the arrows show the corresponding places where \b could match. There is always an even number of word boundaries in a given string, since there's an end-of-word for every start-of-word.
The "words" in the figure are sequences of letters, digits, and underscores; that is, a word in this sense is what's matched by /\w+/. There are five words in the figure's sentence: That, s, a, word, and boundary.[181] Notice that the quote marks around word don't change the word boundaries; these words are made of \w characters.
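
The word and boundary counts can be checked with a short sketch. The sample sentence is an assumption, reconstructed from the five words listed above, since the figure itself is not reproduced here; the variable names are likewise illustrative.

```perl
use strict;
use warnings;

# Assumed sample sentence, reconstructed from the five words listed in the text.
my $sentence = q{That's a "word" boundary!};

# Each "word" is a maximal run of \w characters.
my @words = $sentence =~ /\w+/g;
print join(", ", @words), "\n";

# Record every offset where \b matches; \b is zero-width, so pos()
# reports the position of the boundary itself.
my @boundaries;
while ($sentence =~ /\b/g) {
    push @boundaries, pos($sentence);
}
print scalar(@boundaries), " boundaries at offsets: @boundaries\n";
```

Each word contributes one start and one end, so the number of boundaries comes out as twice the number of words.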
Each arrow points to the beginning or the end of one of the grey underlines, since the word-boundary anchor \b matches only at the beginning or the end of a group of word characters.
Figure 8-1. Word-boundary matches with \b
The word-boundary anchor is useful to ensure that we don't accidentally find cat in delicatessen, dog in boondoggle, or fish in selfishness. Sometimes you'll want just one word-boundary anchor, as when using /\bhunt/ to match words like hunt or hunting or hunter, but not shunt, or when using /stone\b/ to match words like sandstone or flintstone but not capstones.
The nonword-boundary anchor is \B; it matches at any point where \b would not match. So the pattern /\bsearch\B/ will match searches, searching, and searched, but not search or researching.
Copyright © 2002 O'Reilly & Associates. All rights reserved.