home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


6.2. Matching Letters

Problem

You want to see whether a value only consists of alphabetic characters.

Solution

The obvious character class for matching regular letters isn't good enough in the general case:

if ($var =~ /^[A-Za-z]+$/) {
    # it is purely alphabetic
}

That's because it doesn't respect the user's locale settings. If you need to match letters with diacritics as well, use locale and match against a negated character class:

use locale;
if ($var =~ /^[^\W\d_]+$/) {
    print "var is purely alphabetic\n";
}

Discussion

Perl can't directly express "something alphabetic" independent of locale, so we have to be more clever. The \w regular expression notation matches one alphabetic, numeric, or underscore character. Therefore, \W is not one of those. The negated character class [^\W\d_] specifies a byte that must not be an alphanumunder, a digit, or an underscore. That leaves us with nothing but alphabetics, which is what we were looking for.

Here's how you'd use this in a program:

use locale;
use POSIX 'locale_h';

# the following locale string might be different on your system
unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {
    die "couldn't set locale to French Canadian\n";
}

while (<DATA>) {
    chomp;
    if (/^[^\W\d_]+$/) {
        print "$_: alphabetic\n";
    } else {
        print "$_: line noise\n";
    }
}

__END__
silly
façade
coöperate
niño
Renée
Molière
hæmoglobin
naïve
tschüß
random!stuff#here





See Also

The treatment of locales in Perl in perllocale (1); your system's locale (3) manpage; we discuss locales in greater depth in Recipe 6.12 ; the "Perl and the POSIX Locale" section of Chapter 7 of Mastering Regular Expressions