1.16. Soundex Matching


You have two English surnames and want to know whether they sound somewhat similar, regardless of spelling. This would let you offer users a "fuzzy search" of names in a telephone book to catch "Smith" and "Smythe" and others within the set, such as "Smite" and "Smote."


Use the standard Text::Soundex module:

 use Text::Soundex;

 $CODE  = soundex($STRING);
 @CODES = soundex(@LIST);


The soundex algorithm hashes words (particularly English surnames) into a small space using a simple model that approximates an English speaker's pronunciation of the words. Roughly speaking, each word is reduced to a four-character string. The first character is an uppercase letter; the remaining three are digits. By comparing the soundex values of two strings, we can guess whether they sound similar.
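To make the reduction concrete, here is a minimal pure-Perl sketch of the classic letter-to-digit scheme. The function name simple_soundex is hypothetical, and this is an illustration written under the assumption that H, W, Y, and the vowels carry no code of their own; it is not the code inside Text::Soundex, which may handle edge cases differently.

```perl
use strict;
use warnings;

# simple_soundex is a hypothetical name used for illustration only.
sub simple_soundex {
    my ($word) = @_;
    my $s = uc $word;
    $s =~ tr/A-Z//cd;                  # keep ASCII letters only
    return undef if $s eq '';          # nothing left to encode

    my $first = substr($s, 0, 1);      # the code keeps the first letter as is

    # Map every letter to a digit; vowels and H, W, Y become placeholder 0.
    $s =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;

    my $lead = substr($s, 0, 1);
    $s =~ s/^$lead+//;                 # drop the first letter's digit and immediate repeats
    $s =~ tr/0-9//s;                   # squeeze runs of the same digit
    $s =~ tr/0//d;                     # discard the placeholder zeros

    # Pad with zeros and truncate to one letter plus three digits.
    return substr($first . $s . '000', 0, 4);
}

printf "%-7s %s\n", $_, simple_soundex($_)
    for qw(Smith Smythe Smite Smote Robert Rupert);
```

With this sketch, Smith, Smythe, Smite, and Smote all reduce to S530, while Robert and Rupert both reduce to R163, which is why comparing codes catches spelling variants.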

The following program prompts for a name and looks for similar-sounding names in the password file. The same approach works on any database of names, so you could key the database on the soundex values if you wanted to. Such a key wouldn't be unique, of course.

use Text::Soundex;
use User::pwent;

print "Lookup user: ";
$user = <STDIN>;
exit unless defined $user;      # end of input, nothing to look up
chomp $user;
$name_code = soundex($user);

while ($uent = getpwent()) {
    # take the first and last words of the full-name (GECOS) field
    ($firstname, $lastname) = $uent->gecos =~ /(\w+)[^,]*\b(\w+)/;

    if ($name_code eq soundex($uent->name) ||
        $name_code eq soundex($lastname)   ||
        $name_code eq soundex($firstname)  )
    {
        printf "%s: %s %s\n", $uent->name, $firstname, $lastname;
    }
}
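The remark about keying a database on soundex values can be sketched with a hash of lists, since several names may share one code. The name list and the %by_code hash below are hypothetical stand-ins for a real database, and the sketch assumes Text::Soundex is installed (recent Perl releases may need it from CPAN).

```perl
use strict;
use warnings;
use Text::Soundex;

# Hypothetical sample data standing in for a real name database.
my @names = qw(Smith Smythe Smite Smote Jones Johns);

# Bucket the names by soundex code. Each code maps to a list of names
# because the key is not unique.
my %by_code;
for my $name (@names) {
    my $code = soundex($name);
    next unless defined $code;         # skip strings that yield no code
    push @{ $by_code{$code} }, $name;
}

# Look up every stored name that shares the query's code.
my $query   = "Smyth";
my @matches = @{ $by_code{ soundex($query) } || [] };
print "$query sounds like: @matches\n";
```

A lookup then costs one soundex call and one hash access rather than a scan of every record, at the price of having to filter the candidates further if the match is too loose.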

See Also

The documentation for the standard Text::Soundex and User::pwent modules (also in Chapter 7 of Programming Perl); your system's passwd(5) manpage; Volume 3, Chapter 6 of The Art of Computer Programming.


Copyright © 2002 O'Reilly & Associates. All rights reserved.