6.3. Matching Words

Problem

You want to pick out words from a string.

Solution

Think long and hard about what you want a word to be and what separates one word from the next, then write a regular expression that embodies your decisions. For example:

    /\S+/               # as many non-whitespace bytes as possible
    /[A-Za-z'-]+/       # as many letters, apostrophes, and hyphens

Discussion
Because words vary between applications, languages, and input streams, Perl does not have built-in definitions of words. You must make them from character classes and quantifiers yourself, as we did previously. The second pattern is an attempt to recognize
Most approaches will have limitations because of the vagaries of written human languages. For instance, although the second pattern successfully identifies

    /\b([A-Za-z]+)\b/            # usually best
    /\s([A-Za-z]+)\s/            # fails at ends or w/ punctuation
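To see how these patterns differ in practice, the following sketch runs each of them over one string. The sample sentence and the variable names are made up for illustration and do not come from the recipe itself; the sketch assumes plain ASCII text.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to exercise apostrophes,
# hyphens, and trailing punctuation.
my $text = "Bob's mother-in-law said hello, didn't she?";

# In list context, m//g with one capture group returns every capture.

# Runs of non-whitespace: punctuation stays attached to the word.
my @chunks  = $text =~ /(\S+)/g;

# Letters, apostrophes, and hyphens: punctuation is dropped.
my @words   = $text =~ /([A-Za-z'-]+)/g;

# Letter runs between word boundaries: apostrophes and hyphens split words.
my @bounded = $text =~ /\b([A-Za-z]+)\b/g;

# Letter runs with whitespace required on both sides: misses words at the
# ends of the string and any word touching punctuation.
my @spaced  = $text =~ /\s([A-Za-z]+)\s/g;

print join("|", @chunks),  "\n";  # Bob's|mother-in-law|said|hello,|didn't|she?
print join("|", @words),   "\n";  # Bob's|mother-in-law|said|hello|didn't|she
print join("|", @bounded), "\n";  # Bob|s|mother|in|law|said|hello|didn|t|she
print join("|", @spaced),  "\n";  # said
```

The `\s`-delimited pattern finds only one word here because it insists on whitespace on both sides of the word, and each match consumes that whitespace, whereas `\b` is a zero-width assertion that consumes nothing.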
Although Perl provides
See Also
The treatment of

Copyright © 2001 O'Reilly & Associates. All rights reserved.