1.16. Soundex Matching

Problem

You have two English surnames and want to know whether they sound somewhat similar, regardless of spelling. This would let you offer users a "fuzzy search" of names in a telephone book to catch "Smith" and "Smythe" and others within the set, such as "Smite" and "Smote."

Solution

Use the standard Text::Soundex module:

    use Text::Soundex;

    $CODE  = soundex($STRING);
    @CODES = soundex(@LIST);

Discussion

The soundex algorithm hashes words (particularly English surnames) into a small space using a simple model that approximates an English speaker's pronunciation of the words. Roughly speaking, each word is reduced to a four-character string. The first character is an uppercase letter; the remaining three are digits. By comparing the soundex values of two strings, we can guess whether they sound similar.

The following program prompts for a name and looks for similar-sounding names in the password file. The same approach works on any database of names, so you could key the database on the soundex values if you wanted to. Such a key wouldn't be unique, of course.

    use strict;
    use warnings;
    use Text::Soundex;
    use User::pwent;       # makes getpwent() return an object

    print "Lookup user: ";
    my $user = <STDIN>;
    exit unless defined $user;          # EOF: nothing to look up
    chomp $user;
    my $name_code = soundex($user);
    exit unless defined $name_code;     # input contained no letters

    while (my $uent = getpwent()) {
        # Take the first and last words of the GECOS (real name) field.
        my ($firstname, $lastname) =
            ($uent->gecos // '') =~ /(\w+)[^,]*\b(\w+)/;

        # Match on the login name or on either part of the real name.
        my @codes = grep { defined }
                    map  { soundex($_) }
                    grep { defined } $uent->name, $firstname, $lastname;
        if (grep { $_ eq $name_code } @codes) {
            printf "%s: %s %s\n", $uent->name,
                   $firstname // '', $lastname // '';
        }
    }

See Also

The documentation for the standard Text::Soundex and User::pwent modules (also in Chapter 7 of Programming Perl); your system's passwd(5) manpage; Volume 3, Chapter 6 of The Art of Computer Programming
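To show how a word is reduced to one letter and three digits, here is a minimal pure-Perl sketch of the classic soundex coding rules. The function name my_soundex is hypothetical; this is not Text::Soundex's own code, and module versions may handle edge cases differently. The sketch treats H and W like vowels, so a name such as Ashcraft, where an H separates two letters with the same code, comes out as A226 here, while variants that merge same-coded letters across H or W (the rule used by the U.S. National Archives) give A261.

```perl
use strict;
use warnings;

# my_soundex is a hypothetical helper, not part of Text::Soundex.
# It follows the classic rules and treats H and W like vowels.
sub my_soundex {
    my ($word) = @_;
    my $s = uc($word // '');
    $s =~ tr/A-Z//cd;                        # keep letters only
    return undef if $s eq '';                # nothing to code

    my $first = substr($s, 0, 1);            # keep the first letter itself

    # Replace every letter with its digit class:
    #   AEHIOUWY -> 0 (dropped later), BFPV -> 1, CGJKQSXZ -> 2,
    #   DT -> 3, L -> 4, MN -> 5, R -> 6
    $s =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;

    my $first_code = substr($s, 0, 1);
    $s =~ s/^\Q$first_code\E+//;             # first letter is already kept
    $s =~ tr/0-9//s;                         # collapse runs of the same digit
    $s =~ tr/0//d;                           # then drop the vowel markers

    return substr($first . $s . '000', 0, 4);   # letter plus three digits
}

for my $name (qw(Smith Smythe Smite Smote Jones Ashcraft)) {
    printf "%-8s %s\n", $name, my_soundex($name);
}
# prints S530 for Smith, Smythe, Smite and Smote, J520 for Jones,
# and A226 for Ashcraft
```

Because runs of identical digits are collapsed before the vowel markers are removed, adjacent same-coded consonants count once, but a vowel between them keeps both: Tymczak gives T522.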