Checking Whether a String Is a Valid Number (Perl Cookbook, 2nd Edition)

2.1.3. Discussion

This problem gets to the heart of what we mean by a number. Even things that sound simple, like integer, make you think hard about what you will accept; for example, "Is a leading + for positive numbers optional, mandatory, or forbidden?" The many ways that floating-point numbers can be represented could overheat your brain.

Decide what you will and will not accept. Then, construct a regular expression to match those things alone. Here are some precooked solutions (the Cookbook's equivalent of just-add-water meals) for most common cases:

warn "has nondigits"        if     /\D/;
warn "not a natural number" unless /^\d+$/;             # rejects -3
warn "not an integer"       unless /^-?\d+$/;           # rejects +3
warn "not an integer"       unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2
warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
       unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
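To see how two of the precooked patterns above actually behave, the short harness below wraps them in subs and runs a handful of edge cases. The names is_integer and is_c_float are invented here for illustration; the patterns themselves are copied unchanged from the list above.

```perl
use strict;
use warnings;

# Signed-integer pattern from the list above (accepts +3 and -3).
sub is_integer {
    my ($s) = @_;
    return $s =~ /^[+-]?\d+$/ ? 1 : 0;
}

# "C float" pattern from the list above. The lookahead demands a digit,
# or a dot followed by a digit, so "." and "e5" are rejected.
sub is_c_float {
    my ($s) = @_;
    return $s =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/ ? 1 : 0;
}

for my $s ("42", "+3", "-3", ".2", "2.", "1.5e-3", "e5", ".", "1e") {
    printf "%-8s integer=%d c_float=%d\n", $s, is_integer($s), is_c_float($s);
}
```

Note that `$` also matches just before a final newline, so an unchomped "42\n" passes these patterns. If your input may still carry its newline, anchor with `\z` instead of `$`.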

These lines do not catch the IEEE notations of "Infinity" and "NaN", but unless you're worried that IEEE committee members will stop by your workplace and beat you over the head with copies of the relevant standards documents, you can probably forget about these strange forms.
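If you do need to accept these forms, one option is to test for them separately before falling back to the C-float pattern. This is a sketch under an assumption: it treats "inf", "infinity", and "nan" (any case, optional sign) as the allowed spellings, which you should adjust to whatever your data actually contains. The sub name is_float_or_special is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: accept an optionally signed inf, infinity, or nan
# (case-insensitive; the exact spellings allowed are an assumption), and
# otherwise fall back to the C-float pattern.
sub is_float_or_special {
    my ($s) = @_;
    return 1 if $s =~ /^[+-]?(?:inf(?:inity)?|nan)$/i;
    return $s =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/ ? 1 : 0;
}

for my $s ("Infinity", "-inf", "NaN", "3.5", "infinite") {
    printf "%-9s %d\n", $s, is_float_or_special($s);
}
```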

If your number has leading or trailing whitespace, those patterns won't work. Either add the appropriate logic directly, or call the trim function from Recipe 1.19.
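Both options can be sketched as follows. The trim sub here is a minimal stand-in written for this example, not the code from Recipe 1.19; the "direct" variant instead lets the pattern itself absorb surrounding whitespace with `\s*`.

```perl
use strict;
use warnings;

# Minimal stand-in for a trim helper (not the Recipe 1.19 code):
# strips leading and trailing whitespace, including a trailing newline.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

my $input = "  -17\n";

# The leading spaces stop the plain pattern from matching at ^.
my $raw_ok     = $input =~ /^[+-]?\d+$/       ? 1 : 0;

# Option 1: trim first, then apply the unchanged pattern.
my $trimmed_ok = trim($input) =~ /^[+-]?\d+$/ ? 1 : 0;

# Option 2: build the whitespace handling into the pattern.
my $direct_ok  = $input =~ /^\s*[+-]?\d+\s*$/ ? 1 : 0;

print "raw=$raw_ok trimmed=$trimmed_ok direct=$direct_ok\n";
```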

The CPAN module Regexp::Common provides a wealth of canned patterns that test whether a string looks like a number. Besides saving you from having to figure out the patterns on your own, it also makes your code more legible. By default, this module exports a hash called %RE that you index into according to the kind of regular expression you're looking for. Be careful to anchor the pattern when you need it to match the whole string; otherwise, it matches anywhere within the string. For example:

use Regexp::Common;
$string = "Gandalf departed from the Havens in 3021 TA.";
print "Is an integer\n"           if $string =~ / ^   $RE{num}{int}  $ /x;
print "Contains the integer $1\n" if $string =~ /   ( $RE{num}{int} )  /x;
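The effect of anchoring does not depend on the module, so it can be shown without installing it. In this sketch a hand-written qr// object stands in for $RE{num}{int}; that substitution is an assumption for illustration, and the module's real pattern is more elaborate.

```perl
use strict;
use warnings;

# Hand-written stand-in for $RE{num}{int}: optional sign, then digits.
my $int = qr/[+-]?\d+/;

my $string = "Gandalf departed from the Havens in 3021 TA.";

# Anchored: succeeds only if the entire string is an integer.
my $is_int = $string =~ /^$int$/ ? 1 : 0;

# Unanchored: captures the first integer found anywhere in the string.
my ($found) = $string =~ /($int)/;

print "is_int=$is_int found=$found\n";
```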

The following examples are other patterns that the module can use to match numbers:

$RE{num}{int}{-sep=>',?'}              # match 1234567 or 1,234,567
$RE{num}{int}{-sep=>'.'}{-group=>4}    # match 1.2345.6789
$RE{num}{int}{-base => 8}              # match 014 but not 99
$RE{num}{int}{-sep=>','}{-group=>3}    # match 1,234,594
$RE{num}{int}{-sep=>',?'}{-group=>3}   # match 1,234 or 1234
$RE{num}{real}                         # match 123.456 or -0.123456
$RE{num}{roman}                        # match xvii or MCMXCVIII
$RE{num}{square}                       # match 9 or 256 or 12321
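To get a feel for what a grouped-integer pattern like $RE{num}{int}{-sep=>','}{-group=>3} has to enforce, here is a rough hand-written approximation: an optional sign, one to three leading digits, then any number of comma-plus-three-digit groups. This only illustrates the idea; it is not the pattern the module actually generates.

```perl
use strict;
use warnings;

# Rough approximation (not the module's generated regex): 1-3 leading
# digits, then zero or more groups of exactly three digits, each
# preceded by a comma.
my $grouped = qr/[+-]?\d{1,3}(?:,\d{3})*/;

for my $s ("1,234,594", "1234", "12,34", "1,2345") {
    my $ok = $s =~ /^$grouped$/ ? 1 : 0;
    print "$s => $ok\n";
}
```

Because the comma is mandatory here, "1234" is rejected; making the separator optional (`,?`) is what lets the last pattern in the list accept both 1,234 and 1234.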

Some of these patterns, such as square, were not available in early module versions. General documentation for the module can be found in the Regexp::Common manpage, but more detailed documentation for just the numeric patterns is in the Regexp::Common::number manpage.

Some techniques for identifying numbers don't involve regular expressions. Instead, these techniques use functions from system libraries or Perl to determine whether a string contains an acceptable number. Of course, these functions limit you to the definition of "number" offered by your libraries and Perl.

If you're on a POSIX system, Perl supports the POSIX::strtod function. Its semantics are cumbersome, so the following is a getnum wrapper function for more convenient access. This function takes a string and returns either the number it found or undef for input that isn't a C float. The is_numeric function is a frontend to getnum for when you just want to ask, "Is this a float?"

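sub getnum {
    use POSIX qw(strtod);
    my $str = shift;
    $str =~ s/^\s+//;           # remove leading whitespace
    $str =~ s/\s+$//;           # remove trailing whitespace
    $! = 0;                     # clear errno so an out-of-range value shows up
    my($num, $unparsed) = strtod($str);   # $unparsed: count of leftover chars
    if (($str eq '') || ($unparsed != 0) || $!) {
        return;                 # empty, trailing junk, or range error
    } else {
        return $num;
    }
}

# &getnum with no argument list passes is_numeric's own @_ along.
# scalar() forces an actual call; a bare "defined &getnum" would only
# test whether the subroutine exists.
sub is_numeric { defined scalar &getnum }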
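The checks in getnum make more sense once you see what strtod returns on its own. In list context, POSIX::strtod gives back the parsed number and the count of characters it could not parse. The sample strings below are chosen to show why getnum rejects leftovers and tests for the empty string separately (strtod reports nothing left over for it). The output assumes a locale whose decimal point is ".".

```perl
use strict;
use warnings;
use POSIX qw(strtod);

# Raw strtod results: (number, number_of_unparsed_characters).
for my $s ("3.14", "3.14abc", "abc", "") {
    my ($num, $unparsed) = strtod($s);
    print "'$s' => num=$num unparsed=$unparsed\n";
}
```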

The Scalar::Util module, newly standard as of Perl v5.8.1, exports a function called looks_like_number( ) that uses Perl's own internal function of the same name (see perlapi(1)). It returns true for any base-10 number that is acceptable to Perl itself, such as 0, 0.8, 14.98, and 6.02e23, but not for 0b1010, 0x392, or numbers with underscores in them. A string such as 077 does pass, but only as the decimal number 77, never as octal. This means that you must check for alternate bases and decode them yourself if you want to permit users to enter such numbers, as in Example 2-1.
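A quick way to see which strings pass is to run looks_like_number over a few samples. Binary, hex, and underscore forms fail, while "077" passes and numifies to decimal 77 rather than octal 63.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Base-10 strings pass; other bases and underscores do not.
for my $s ("0", "0.8", "14.98", "6.02e23", "077", "0b1010", "0x392", "1_000") {
    printf "%-8s %s\n", $s, looks_like_number($s) ? "yes" : "no";
}

# "077" is read as decimal, not octal.
print "077 + 0 = ", "077" + 0, "\n";
```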

Example 2-1. Decode numbers

    #!/usr/bin/perl -w
    use strict;
    use Scalar::Util qw(looks_like_number);
    print "$0: hit ^D (your eof character) to exit\n";
    for (;;) {
        my ($on, $n);      # original string and its numeric value
        print "Pick a number, any number: ";
        $on = $n = <STDIN>;
        last if !defined $n;
        chomp($on,$n);
        $n =~ s/_//g;                          # allow 186_282.398_280_685
        $n = oct($n) if $n =~ /^0[0-7xXbB]/;   # allow 0xFF, 037, 0b1010 (but leave 0.5 alone)
        if (looks_like_number($n)) {
            printf "Decimal double of %s is %g\n", $on, 2*$n;
        } else {
            print "That doesn't look like a number to Perl.\n";
        }
    }
    print "\nBye.\n";
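Because Example 2-1 reads from STDIN, its decoding logic is awkward to exercise directly. The sketch below pulls that logic into a sub, decode_number (a hypothetical name, not part of the recipe), so it can be called on fixed strings. It hands a string to oct() only when a 0 is followed by an octal digit or an x/b prefix, so a string such as 0.5 stays decimal instead of being truncated to 0 by oct().

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: decode one input string the way the example's
# loop does, returning the numeric value or undef.
sub decode_number {
    my ($s) = @_;
    $s =~ s/_//g;                           # 186_282 -> 186282
    $s = oct($s) if $s =~ /^0[0-7xXbB]/;    # 0xFF, 037, 0b1010
    return looks_like_number($s) ? $s + 0 : undef;
}

for my $s ("0xFF", "037", "0b1010", "186_282.398", "0.5", "12abc") {
    my $v = decode_number($s);
    printf "%-12s => %s\n", $s, defined $v ? $v : "not a number";
}
```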
