

Unix Power Tools

32.8. Regular Expressions: Specifying a Range of Characters with [...]

If you want to match specific characters, you can use square brackets, [ ], to list the exact characters you are searching for. The pattern that matches any line of text consisting of exactly one digit is ^[0123456789]$. This is longer than it has to be. You can use a hyphen between two characters to specify a range: ^[0-9]$. You can intermix explicit characters with character ranges. This pattern will match a single character that is a letter, digit, or underscore: [A-Za-z0-9_]. Character sets can be combined by placing them next to one another. If you wanted to search for a word that:

  • started with an uppercase T,

  • was the first word on a line,

  • had a lowercase letter as its second letter,

  • was three letters long (followed by a space character), and

  • had a lowercase vowel as its third letter,

the regular expression would be the following, which ends with a single space character after the closing bracket:

^T[a-z][aeiou] 
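The patterns above can be tried out with a short sketch. Python's re module is used here only as a convenient test bed; it is not one of the tools this chapter discusses, and the sample lines and variable names are invented for illustration.

```python
import re

# A line that consists of exactly one digit.
one_digit = re.compile(r"^[0-9]$")
# A single character that is a letter, digit, or underscore.
word_char = re.compile(r"[A-Za-z0-9_]")
# First word on the line: T, a lowercase letter, a lowercase vowel, then a space.
t_word = re.compile(r"^T[a-z][aeiou] ")

# Hypothetical sample input, one string per line of text.
lines = ["7", "42", "The quick fox", "Two birds", "Try again", "the end"]

digit_lines = [s for s in lines if one_digit.search(s)]
t_lines = [s for s in lines if t_word.search(s)]

print(digit_lines)  # ['7']
print(t_lines)      # ['The quick fox', 'Two birds']
print(bool(word_char.search("_")), bool(word_char.search("-")))  # True False
```

"Try again" is rejected because y is not in [aeiou], and "the end" is rejected because the bracketed sets are case-sensitive.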

To be specific: a range is a contiguous series of characters, from low to high, in the ASCII character set.[101] For example, [z-a] is not a range because it's backwards. The range [A-z] matches both uppercase and lowercase letters, but it also matches the six characters that fall between uppercase and lowercase letters in the ASCII chart: [, \, ], ^, _, and `.
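A quick way to see the six extra characters is to compare [A-z] against [A-Za-z] over the whole 7-bit ASCII set. This is a minimal sketch in Python, chosen as an assumption for illustration; other tools may handle a backwards range differently from what is shown here.

```python
import re

# Every ASCII character that [A-z] matches but [A-Za-z] does not.
extras = [
    chr(c)
    for c in range(128)
    if re.fullmatch(r"[A-z]", chr(c)) and not re.fullmatch(r"[A-Za-z]", chr(c))
]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# In Python, a backwards range is rejected when the pattern is compiled.
try:
    re.compile(r"[z-a]")
    backwards_ok = True
except re.error:
    backwards_ok = False
print(backwards_ok)  # False
```

The backslash prints as '\\' only because Python escapes it when displaying a string; it is a single character.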

[101]Some languages, notably Java and Perl, do support Unicode regular expressions, but as Unicode generally subsumes the ASCII 7-bit character set, regular expressions written for ASCII will work as well.

-- BB




Copyright © 2003 O'Reilly & Associates. All rights reserved.