6.2.3. Discussion
Apart from Unicode properties or POSIX character classes, Perl can't
directly express "something alphabetic" independent of locale, so we
have to be more clever. The \w regular expression notation matches
one alphabetic, numeric, or underscore character (hereafter known as
an "alphanumunder" for short), so \W matches one character that is
not one of those. The negated character class [^\W\d_] therefore
specifies a character that is not a non-alphanumunder, not a digit,
and not an underscore. Excluding \W leaves only the alphanumunders;
excluding \d and _ from those leaves nothing but alphabetics, which
is what we were looking for.
Here's how you'd use this in a program:
use locale;
use POSIX 'locale_h';

# the following locale string might be different on your system
unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {
    die "couldn't set locale to French Canadian\n";
}

while (<DATA>) {
    chomp;
    if (/^[^\W\d_]+$/) {
        print "$_: alphabetic\n";
    } else {
        print "$_: line noise\n";
    }
}
__END__
silly
façade
coöperate
niño
Renée
Molière
hæmoglobin
naïve
tschüß
random!stuff#here
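The same character class can also be tried without any locale setup. What follows is only a minimal sketch, assuming Perl 5.14 or later so that the /u modifier is available; the helper name is_alphabetic and the sample words are illustrative and not part of the recipe.

```perl
use strict;
use warnings;

# is_alphabetic() is a hypothetical helper name, not part of the recipe.
# The /u modifier (Perl 5.14 or later) requests Unicode rules for \W and \d,
# even when the string holds only code points below 256.
sub is_alphabetic {
    my ($str) = @_;
    return $str =~ /^[^\W\d_]+$/u ? 1 : 0;
}

# Non-ASCII letters are written as \x{...} escapes so the example does not
# depend on the encoding of the source file. Each pair is [label, string].
my @samples = (
    [ "silly",                 "silly"        ],
    [ "facade with c-cedilla", "fa\x{E7}ade"  ],
    [ "naive with diaeresis",  "na\x{EF}ve"   ],
    [ "abc123",                "abc123"       ],
    [ "snake_case",            "snake_case"   ],
);

for my $sample (@samples) {
    my ($label, $word) = @$sample;
    printf "%s: %s\n", $label, is_alphabetic($word) ? "alphabetic" : "line noise";
}
```

The first three samples should be reported as alphabetic and the last two as line noise, because a digit or an underscore is excluded by the class just as punctuation is.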
POSIX character classes help a little here; available ones are alpha,
alnum, ascii, blank, cntrl, digit, graph, lower, print, punct, space,
upper, word, and xdigit. These are valid only within a
square-bracketed character class specification:
$phone =~ /\b[:digit:]{3}[[:space:][:punct:]]?[:digit:]{4}\b/; # WRONG
$phone =~ /\b[[:digit:]]{3}[[:space:][:punct:]]?[[:digit:]]{4}\b/; # RIGHT
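As a rough illustration of the correct form, the pattern can be wrapped in a small predicate and tried on a few strings. The function name looks_like_local_number and the sample inputs are assumptions made for this sketch only.

```perl
use strict;
use warnings;

# looks_like_local_number() is a hypothetical helper, not part of the recipe.
# Each POSIX class sits inside its own enclosing square brackets; the middle
# bracket combines two classes into a single optional separator character.
sub looks_like_local_number {
    my ($phone) = @_;
    return $phone =~ /\b[[:digit:]]{3}[[:space:][:punct:]]?[[:digit:]]{4}\b/ ? 1 : 0;
}

for my $candidate ("555-1212", "555 1212", "5551212", "55-1212", "555--1212") {
    printf "%-10s %s\n", $candidate,
        looks_like_local_number($candidate) ? "matches" : "no match";
}
```

The first three candidates match, since the separator is optional and may be either whitespace or punctuation. The last two do not: one has too few leading digits, and the other has two separators where at most one is allowed.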
It would be easier to use properties instead, because they don't have
to occur only within other square brackets:
$phone =~ /\b\p{Number}{3}[\p{Space}\p{Punctuation}]?\p{Number}{4}\b/;
$phone =~ /\b\pN{3}[\s\pP]?\pN{4}\b/; # abbreviated form; \s, because \pS means Symbol
Match any one character with Unicode property prop using \p{prop}; to
match any character lacking that property, use \P{prop} or
[^\p{prop}]. The relevant property when looking for alphabetics is
Alphabetic, which can be abbreviated as Alpha. The closely related
general category Letter, or just L, is slightly narrower: Alphabetic
also covers a few non-letters, such as letter-like numerals. Other
relevant properties include UppercaseLetter, LowercaseLetter, and
TitlecaseLetter; their short forms are Lu, Ll, and Lt, respectively.
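To see the short forms in action, here is a minimal sketch that sorts a single character into one of the letter categories. The function name letter_kind and the sample code points are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# letter_kind() is a hypothetical helper, not part of the recipe.
# It expects a one-character string and reports which letter category
# matches first, testing the specific categories before the general \p{L}.
sub letter_kind {
    my ($ch) = @_;
    return "uppercase"    if $ch =~ /\A\p{Lu}\z/;
    return "lowercase"    if $ch =~ /\A\p{Ll}\z/;
    return "titlecase"    if $ch =~ /\A\p{Lt}\z/;
    return "other letter" if $ch =~ /\A\p{L}\z/;
    return "not a letter";    # same test as /\A\P{L}\z/ for one character
}

# Code points are given as escapes so the source file stays pure ASCII.
my @samples = (
    [ "LATIN CAPITAL LETTER A",              "A"         ],
    [ "LATIN SMALL LETTER E WITH ACUTE",     "\x{E9}"    ],
    [ "LATIN CAPITAL D WITH SMALL Z, CARON", "\x{01C5}"  ],
    [ "HEBREW LETTER ALEF",                  "\x{05D0}"  ],
    [ "DIGIT SEVEN",                         "7"         ],
);

for my $sample (@samples) {
    my ($name, $ch) = @$sample;
    printf "%-38s %s\n", $name, letter_kind($ch);
}
```

The order of the tests matters only for readability here: Lu, Ll, and Lt are disjoint, and the final \p{L} test catches letters that have no case, such as the Hebrew alef.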