7.2 Library Modules

As mentioned earlier, the following library modules are arranged in alphabetical order for easy reference.

AnyDBM_File--Provide Framework for Multiple DBMs

    use AnyDBM_File;

This module is a "pure virtual base class"--it has nothing of its own. It's just there to inherit from the various DBM packages. By default it inherits from NDBM_File for compatibility with earlier versions of Perl. If it doesn't find NDBM_File, it looks for DB_File, GDBM_File, SDBM_File (which is always there--it comes with Perl), and finally ODBM_File.

Perl's dbmopen function (which now exists only for backward compatibility) actually just calls tie to bind a hash to AnyDBM_File. The effect is to bind the hash to one of the specific DBM classes that AnyDBM_File inherits from.

You can override the defaults and determine which class dbmopen will tie to. Do this by redefining @ISA:

    @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File);

Note, however, that an explicit use takes priority over the ordering of @ISA, so that:

    use GDBM_File;

will cause the next dbmopen to tie your hash to GDBM_File.

You can tie hash variables directly to the desired class yourself, without using dbmopen or AnyDBM_File. For example, by using multiple DBM implementations, you can copy a database from one format to another:

    use Fcntl;         # for O_* values
    use NDBM_File;
    use DB_File;
    # NDBM_File's tie needs a mode argument too, even for an existing file
    tie %oldhash, "NDBM_File", $old_filename, O_RDWR, 0644;
    tie %newhash, "DB_File", $new_filename, O_RDWR|O_CREAT|O_EXCL, 0644;
    while (($key,$val) = each %oldhash) {
        $newhash{$key} = $val;
    }

DBM comparisons

Here's a table of the features that the different DBMish packages offer:
See also

Relevant library modules include: DB_File, GDBM_File, NDBM_File, ODBM_File, and SDBM_File. Related manpages: dbm(3), ndbm(3). Tied variables are discussed extensively in Chapter 5, Packages, Modules, and Object Classes, and the dbmopen entry in Chapter 3, Functions, may also be helpful. You can pick up the unbundled modules from the src/misc/ directory on your nearest CPAN site. Here are the most popular ones, but note that their version numbers may have changed by the time you read this:

AutoLoader--Load Functions Only on Demand
    package GoodStuff;
    use Exporter;
    use AutoLoader;
    @ISA = qw(Exporter AutoLoader);

The AutoLoader module provides a standard mechanism for delayed loading of functions stored in separate files on disk. Each file has the same name as the function (plus a .al suffix), and comes from a directory named after the package (under the auto/ directory). For example, the function named GoodStuff::whatever() would be loaded from the file auto/GoodStuff/whatever.al.

A module using the AutoLoader should have the special marker __END__ prior to the actual subroutine declarations. All code before this marker is loaded and compiled when the module is used. At the marker, Perl stops parsing the file.

When a subroutine not yet in memory is called, the AUTOLOAD function attempts to locate it in a directory relative to the location of the module file itself. As an example, assume POSIX.pm is located in /usr/local/lib/perl5/POSIX.pm. The AutoLoader will look for the corresponding subroutines for this package in /usr/local/lib/perl5/auto/POSIX/*.al.

Lexicals declared with my in the main block of a package using the AutoLoader will not be visible to autoloaded functions, because the given lexical scope ends at the __END__ marker. A module using such variables as file-scoped globals will not work properly under the AutoLoader. Package globals must be used instead. When running under use strict, the use vars pragma may be employed in such situations as an alternative to explicitly qualifying all globals with the package name. Package variables predeclared with this pragma will be accessible to any autoloaded routines, but of course they will also be visible outside the module file.

The AutoLoader is a counterpart to the SelfLoader module. Both delay the loading of subroutines, but the SelfLoader accomplishes this by storing the subroutines right there in the module file rather than in separate files elsewhere. While this avoids the use of a hierarchy of disk files and the associated I/O for each routine loaded, the SelfLoader suffers a disadvantage in the one-time parsing of the lines after __DATA__, after which routines are cached. The SelfLoader can also handle multiple packages in a file.

AutoLoader, on the other hand, only reads code as it is requested, and in many cases should be faster. But it requires a mechanism like AutoSplit to be used to create the individual files.

On systems with restrictions on file name length, the file corresponding to a subroutine may have a shorter name than the routine itself. This can lead to conflicting filenames. The AutoSplit module will warn of these potential conflicts when used to split a module.

See the discussion of autoloading in Chapter 5, Packages, Modules, and Object Classes. Also see the AutoSplit module, a utility that automatically splits a module into a collection of files for autoloading.

AutoSplit--Split a Module for Autoloading
    # from a program
    use AutoSplit;
    autosplit_lib_modules(@ARGV);

    # or from the command line
    perl -MAutoSplit -e 'autosplit(FILE, DIR, KEEP, CHECK, MODTIME)' ...

    # another interface
    perl -MAutoSplit -e 'autosplit_lib_modules(@ARGV)' ...

This function splits up your program or module into files that the AutoLoader module can handle. It is mainly used to build autoloading Perl library modules, especially complex ones like POSIX. It is used by both the standard Perl libraries and by the MakeMaker module to automatically configure libraries for autoloading.

The autosplit() interface splits the specified FILE into a hierarchy rooted at the directory DIR. It creates directories as needed to reflect class hierarchy. It then creates the file autosplit.ix, which acts as both a forward declaration for all package routines and also as a timestamp for when the hierarchy was last updated.

The remaining three arguments to autosplit() govern other options to the autosplitter. If the third argument, KEEP, is false, then any pre-existing .al files in the autoload directory are removed if they are no longer part of the module (obsoleted functions). The fourth argument, CHECK, instructs autosplit() to check the module currently being split to ensure that it really does include a use specification for the AutoLoader module, and skips the module if AutoLoader is not detected. Lastly, the MODTIME argument specifies that autosplit() is to check the modification time of the module against that of the autosplit.ix file, and only split the module if it is newer.

Here's a typical use of AutoSplit by the MakeMaker utility via the command line:

    perl -MAutoSplit -e 'autosplit($ARGV[0], $ARGV[1], 0, 1, 1)'

MakeMaker defines this as a make macro, and it is invoked with file and directory arguments. The autosplit() function splits the named file into the given directory and deletes obsolete .al files, after checking first that the module does use the AutoLoader and ensuring that the module isn't already split in its current form.

The autosplit_lib_modules() form is used in the building of Perl. It takes as input a list of files (modules) that are assumed to reside in a directory lib/ relative to the current directory. Each file is sent to the autosplitter one at a time, to be split into the directory lib/auto/.

In both usages of the autosplitter, only subroutines defined following the Perl special marker __END__ are split out into separate files. Routines placed prior to this marker are not autosplit, but are forced to load when the module is first required.

Currently, AutoSplit cannot handle multiple package specifications within one file.

AutoSplit will inform the user if it is necessary to create the top-level directory specified in the invocation. It's better if the script or installation process that invokes AutoSplit has created the full directory path ahead of time. This warning may indicate that the module is being split into an incorrect path.

AutoSplit will also warn the user of subroutines whose names cause potential naming conflicts on machines with severely limited (eight characters or fewer) filename length. Since the subroutine name is used as the filename, these warnings can aid in portability to such systems.

Warnings are issued and the file skipped if AutoSplit cannot locate either the __END__ marker or a specification of the form package Name;. AutoSplit will also complain if it can't create directories or files.

Benchmark--Check and Compare Running Times of Code
    use Benchmark;

    # timeit(): run $count iterations of the given Perl code, and time it
    $t = timeit($count, 'CODE');  # $t is now a Benchmark object

    # timestr(): convert Benchmark times to printable strings
    print "$count loops of 'CODE' took:", timestr($t), "\n";

    # timediff(): calculate the difference between two times
    $t = timediff($t1, $t2);

    # timethis(): run "code" $count times with timeit(); also, print out a
    # header saying "timethis $count: "
    $t = timethis($count, "CODE");

    # timethese(): run timethis() on multiple chunks of code
    @t = timethese($count, {
        'Name1' => '...CODE1...',
        'Name2' => '...CODE2...',
    });

    # new method: return the current time
    $t0 = new Benchmark;
    # ... your CODE here ...
    $t1 = new Benchmark;
    $td = timediff($t1, $t0);
    print "the code took: ", timestr($td), "\n";

    # debug method: enable or disable debugging
    Benchmark->debug(1);
    $t = timeit(10, ' 5 ** $Global ');
    Benchmark->debug(0);

The Benchmark module encapsulates a number of routines to help you figure out how long it takes to execute some code a given number of times within a loop.

For the timeit() routine, $count is the number of times to run the loop. CODE is a string containing the code to run. timeit() runs a null loop with $count iterations, and then runs the same loop with your code inserted. It reports the difference between the times of execution.

For timethese(), a loop of $count iterations is run on each code chunk separately, and the results are reported separately. The code to run is given as a hash with keys that are names and values that are code. timethese() is handy for quick tests to determine which way of doing something is faster. For example:
    $ perl -MBenchmark -Minteger
    timethese(1000000, { add => '$i += 2', inc => '$i++; $i++' });
    __END__
    Benchmark: timing 1000000 iterations of add, inc...
           add:  4 secs ( 4.52 usr  0.00 sys =  4.52 cpu)
           inc:  6 secs ( 5.32 usr  0.00 sys =  5.32 cpu)

The following routines are exported into your namespace if you use the Benchmark module:
    timeit()
    timethis()
    timethese()
    timediff()
    timestr()

The following routines will be exported into your namespace if you specifically ask that they be imported:

    clearcache()       # clear just the cache element indexed by $key
    clearallcache()    # clear the entire cache
    disablecache()     # do not use the cache
    enablecache()      # resume caching

Notes

Code is executed in the caller's package.

The null loop times are cached, the key being the number of iterations. You can control caching with calls like these:

    clearcache($key);
    clearallcache();
    disablecache();
    enablecache();

Benchmark inherits only from the Exporter class.

The elapsed time is measured using time(2) and the granularity is therefore only one second. Times are given in seconds for the whole loop (not divided by the number of iterations). Short tests may produce negative figures because Perl can appear to take longer to execute the empty loop than a short test.

The user and system CPU time is measured to millisecond accuracy using times(3). In general, you should pay more attention to the CPU time than to elapsed time, especially if other processes are running on the system. Also, elapsed times of five seconds or more are needed for reasonable accuracy.

Because you pass in a string to be evaled instead of a closure to be executed, lexical variables declared with my outside of the eval are not visible.

Carp--Generate Error Messages
    use Carp;
    carp "Be careful!";          # warn of errors (from perspective of caller)
    croak "We're outta here!";   # die of errors (from perspective of caller)
    confess "Bye!";              # die of errors with stack backtrace

carp() and croak() behave like warn and die, respectively, except that they report the error as occurring not at the line of code where they are invoked, but at a line in one of the calling routines. Suppose, for example, that you have a routine goo() containing an invocation of carp(). In that case--and assuming that the current stack shows no callers from a package other than the current one--carp() will report the error as occurring where goo() was called. If, on the other hand, callers from different packages are found on the stack, then the error is reported as occurring in the package immediately preceding the package in which the carp() invocation occurs. The intent is to let library modules act a little more like built-in functions, which always report errors where you call them from.

confess() is like die except that it prints out a stack backtrace. The error is reported at the line where confess() is invoked, not at a line in one of the calling routines.

Config--Access Perl Configuration Information
    use Config;
    if ($Config{cc} =~ /gcc/) {
        print "built by gcc\n";
    }

    use Config qw(myconfig config_sh config_vars);
    print myconfig();
    print config_sh();
    config_vars(qw(osname archname));

The Config module contains all the information that the Configure script had to figure out at Perl build time (over 450 values).[1]

Shell variables from the config.sh file (written by Configure) are stored in a readonly hash, %Config, indexed by their names. Values set to the string "undef" in config.sh are returned as undefined values. The Perl exists function should be used to check whether a named variable exists.
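The difference between a variable that is missing and one that is present but undefined can be sketched as follows. This is only an illustration: the helper name describe_config() and the key no_such_variable are made up for the example, and the other keys are assumed to be typical of most builds.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: report a %Config entry, distinguishing
# "no such key" from "key present but undefined".
sub describe_config {
    my ($name) = @_;
    return "$name: not a known configuration variable"
        unless exists $Config{$name};
    return "$name: known, but undefined on this build"
        unless defined $Config{$name};
    return "$name: '$Config{$name}'";
}

# no_such_variable is deliberately bogus, to exercise the exists test.
print describe_config($_), "\n"
    for qw(osname archname d_fork no_such_variable);
```

Testing with exists first matters because a plain truth test on $Config{$name} cannot tell an unknown name from one that Configure recorded as "undef".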
Here's a more sophisticated example using %Config:

    use Config;

    defined $Config{sig_name} or die "No sigs?";
    $i = 0;    # signal numbers start at zero
    foreach $name (split(' ', $Config{sig_name})) {
        $signo{$name} = $i;
        $signame[$i] = $name;
        $i++;
    }

    print "signal #17 = $signame[17]\n";
    if ($signo{ALRM}) {
        print "SIGALRM is $signo{ALRM}\n";
    }

Because configuration information is not stored within the Perl executable itself, it is possible (but unlikely) that the information might not relate to the actual Perl binary that is being used to access it. The Config module checks the Perl version number when loaded to try to prevent gross mismatches, but can't detect subsequent rebuilds of the same version.

Cwd--Get Pathname of Current Working Directory
    use Cwd;
    $dir = cwd();       # get current working directory safest way
    $dir = getcwd();    # like getcwd(3) or getwd(3)
    $dir = fastcwd();   # faster and more dangerous

    use Cwd 'chdir';    # override chdir; keep PWD up to date
    chdir "/tmp";
    print $ENV{PWD};    # prints "/tmp"

cwd() gets the current working directory using the most natural and safest form for the current architecture. For most systems it is identical to `pwd` (but without the trailing line terminator).

getcwd() does the same thing by re-implementing getcwd(3) or getwd(3) in Perl.

fastcwd() looks the same as getcwd(), but runs faster. It's also more dangerous because you might chdir out of a directory that you can't chdir back into.

It is recommended that one of these functions be used in all code to ensure portability because the pwd program probably only exists on UNIX systems.

If you consistently override your chdir built-in function in all packages of your program, then your PWD environment variable will automatically be kept up to date. Otherwise, you shouldn't rely on it. (Which means you probably shouldn't rely on it.)

DB_File--Access to Berkeley DB
    use DB_File;

    # brackets in following code indicate optional arguments
    [$X =] tie %hash,  "DB_File", $filename [, $flags, $mode, $DB_HASH];
    [$X =] tie %hash,  "DB_File", $filename, $flags, $mode, $DB_BTREE;
    [$X =] tie @array, "DB_File", $filename, $flags, $mode, $DB_RECNO;

    $status = $X->del($key [, $flags]);
    $status = $X->put($key, $value [, $flags]);
    $status = $X->get($key, $value [, $flags]);
    $status = $X->seq($key, $value [, $flags]);
    $status = $X->sync([$flags]);
    $status = $X->fd;

    untie %hash;
    untie @array;

DB_File is the most flexible of the DBM-style tie modules. It allows Perl programs to make use of the facilities provided by Berkeley DB (not included). If you intend to use this module you should really have a copy of the Berkeley DB manual page at hand. The interface defined here mirrors the Berkeley DB interface closely.

Berkeley DB is a C library that provides a consistent interface to a number of database formats. DB_File provides an interface to all three of the database (file) types currently supported by Berkeley DB. The file types are:

How does DB_File interface to Berkeley DB?

DB_File gives access to Berkeley DB files using Perl's tie function. This allows DB_File to access Berkeley DB files using either a hash (for DB_HASH and DB_BTREE file types) or an ordinary array (for the DB_RECNO file type).

In addition to the tie interface, it is also possible to use most of the functions provided in the Berkeley DB API.

Differences from Berkeley DB

Berkeley DB uses the function dbopen(3) to open or create a database. Below is the C prototype for dbopen(3).

    DB *
    dbopen (const char *file, int flags, int mode,
            DBTYPE type, const void *openinfo)

The type parameter is an enumeration selecting one of the three interface methods, DB_HASH, DB_BTREE or DB_RECNO. Depending on which of these is actually chosen, the final parameter, openinfo, points to a data structure that allows tailoring of the specific interface method.

This interface is handled slightly differently in DB_File. Here is an equivalent call using DB_File.

    tie %array, "DB_File", $filename, $flags, $mode, $DB_HASH;

The filename, flags, and mode parameters are the direct equivalent of their dbopen(3) counterparts. The final parameter $DB_HASH performs the function of both the type and openinfo parameters in dbopen(3).

In the example above $DB_HASH is actually a reference to a hash object. DB_File has three of these predefined references. Apart from $DB_HASH, there are also $DB_BTREE and $DB_RECNO.

The keys allowed in each of these predefined references are limited to the names used in the equivalent C structure. So, for example, the $DB_HASH reference will only allow keys called bsize, cachesize, ffactor, hash, lorder, and nelem.

To change one of these elements, just assign to it like this:

    $DB_HASH->{cachesize} = 10_000;

Array offsets

In order to make RECNO more compatible with Perl, the array offset for all RECNO arrays begins at 0 rather than 1 as in Berkeley DB.

In-memory databases

Berkeley DB allows the creation of in-memory databases by using NULL (that is, a (char *)0 in C) in place of the filename. DB_File uses undef instead of NULL to provide this functionality.
    use strict;
    use Fcntl;
    use DB_File;
    my ($k, $v, %hash);
    tie(%hash, 'DB_File', undef, O_RDWR|O_CREAT, 0, $DB_BTREE)
        or die "can't tie DB_File: $!";
    foreach $k (keys %ENV) {
        $hash{$k} = $ENV{$k};
    }
    # this will now come out in sorted lexical order
    # without the overhead of sorting the keys
    while (($k,$v) = each %hash) {
        print "$k=$v\n";
    }

Using the Berkeley DB interface directly

In addition to accessing Berkeley DB using a tied hash or array, you can also make direct use of most functions defined in the Berkeley DB documentation.

To do this you need to remember the return value from tie, or use the tied function to get at it yourself later on.
    $db = tie %hash, "DB_File", "filename";

Once you have done that, you can access the Berkeley DB API functions directly.

    $db->put($key, $value, R_NOOVERWRITE);  # invoke the DB "put" function

All the functions defined in the dbopen(3) manpage are available except for close() and dbopen() itself. The DB_File interface to these functions mirrors the way Berkeley DB works. In particular, note that all these functions return only a status value. Whenever a Berkeley DB function returns data via one of its parameters, the DB_File equivalent does exactly the same thing.

All the constants defined in the dbopen manpage are also available.

Below is a list of the functions available. (The comments only tell you the differences from the C version.)

Examples

Here are a few examples. First, using $DB_HASH:
    use DB_File;
    use Fcntl;

    tie %h, "DB_File", "hashed", O_RDWR|O_CREAT, 0644, $DB_HASH;

    # Add a key/value pair to the file
    $h{apple} = "orange";

    # Check for value of a key
    print "No, we have some bananas.\n" if $h{banana};

    # Delete
    delete $h{"apple"};

    untie %h;

Here is an example using $DB_BTREE. Just to make life more interesting, the default comparison function is not used. Instead, a Perl subroutine, Compare(), does a case-insensitive comparison.
    use DB_File;
    use Fcntl;

    sub Compare {
        my ($key1, $key2) = @_;
        "\L$key1" cmp "\L$key2";
    }

    # pass the comparison routine as a code reference
    $DB_BTREE->{compare} = \&Compare;

    tie %h, 'DB_File', "tree", O_RDWR|O_CREAT, 0644, $DB_BTREE;

    # Add a key/value pair to the file
    $h{Wall}  = 'Larry';
    $h{Smith} = 'John';
    $h{mouse} = 'mickey';
    $h{duck}  = 'donald';

    # Delete
    delete $h{duck};

    # Cycle through the keys printing them in order.
    # Note it is not necessary to sort the keys as
    # the btree will have kept them in order automatically.
    while ($key = each %h) { print "$key\n" }

    untie %h;

The preceding code yields this output:
    mouse
    Smith
    Wall

Next, an example using $DB_RECNO. You may access a regular text file as an array of lines. But the first line of the text file is the zeroth element of the array, and so on. This provides a clean way to seek to a particular line in a text file.

    my(@line, $number);
    $number = 10;

    use Fcntl;
    use DB_File;
    tie(@line, "DB_File", "/tmp/text", O_RDWR|O_CREAT, 0644, $DB_RECNO)
        or die "can't tie file: $!";

    $line[$number - 1] = "this is a new line $number";

Here's an example of updating a file in place:

    use Fcntl;
    use DB_File;

    tie(@file, 'DB_File', "/tmp/sample", O_RDWR, 0644, $DB_RECNO)
        or die "can't update /tmp/sample: $!";

    print "line #3 was ", $file[2], "\n";
    $file[2] = `date`;
    untie @file;

Note that the tied array interface is incomplete, causing some operations on the resulting array to fail in strange ways. See the discussion of tied arrays in Chapter 5, Packages, Modules, and Object Classes. Some object methods are provided to avoid this. Here's an example of reading a file backward:

    use DB_File;
    use Fcntl;

    $H = tie(@h, "DB_File", $file, O_RDWR, 0640, $DB_RECNO)
        or die "Cannot open file $file: $!\n";

    # print the records in reverse order
    for ($i = $H->length - 1; $i >= 0; --$i) {
        print "$i: $h[$i]\n";
    }

    untie @h;

Locking databases

Concurrent access of a read-write database by several parties requires that each use some kind of locking. Here's an example that uses the fd() method to get the file descriptor, and then a careful open to give something Perl will flock for you. Run this repeatedly in the background to watch the locks granted in proper order. You have to call the sync() method to ensure that the writes make it to disk between accesses, or else the library would normally hold some in its own cache.
    use Fcntl;
    use DB_File;

    use strict;

    sub LOCK_SH { 1 }
    sub LOCK_EX { 2 }
    sub LOCK_NB { 4 }
    sub LOCK_UN { 8 }

    my($oldval, $fd, $db_obj, %db_hash, $value, $key);

    $key   = shift || 'default';
    $value = shift || 'magic';

    $value .= " $$";

    $db_obj = tie(%db_hash, 'DB_File', '/tmp/foo.db', O_CREAT|O_RDWR, 0644)
        or die "dbcreat /tmp/foo.db $!";
    $fd = $db_obj->fd;
    print "$$: db fd is $fd\n";
    open(DB_FH, "+<&=$fd") or die "fdopen $!";

    unless (flock (DB_FH, LOCK_SH | LOCK_NB)) {
        print "$$: CONTENTION; can't read during write update! Waiting for read lock ($!) ....";
        unless (flock (DB_FH, LOCK_SH)) { die "flock: $!" }
    }
    print "$$: Read lock granted\n";

    $oldval = $db_hash{$key};
    print "$$: Old value was $oldval\n";
    flock(DB_FH, LOCK_UN);

    unless (flock (DB_FH, LOCK_EX | LOCK_NB)) {
        print "$$: CONTENTION; must have exclusive lock! Waiting for write lock ($!) ....";
        unless (flock (DB_FH, LOCK_EX)) { die "flock: $!" }
    }

    print "$$: Write lock granted\n";
    $db_hash{$key} = $value;
    sleep 10;

    $db_obj->sync();                   # to flush
    flock(DB_FH, LOCK_UN);
    untie %db_hash;
    undef $db_obj;                     # removing the last reference to the DB
                                       # closes it. Closing DB_FH is implicit.
    print "$$: Updated db to $key=$value\n";

See also

Related manpages: dbopen(3), hash(3), recno(3), btree(3).

Berkeley DB is available from these locations:
Devel::SelfStubber--Generate Stubs for a SelfLoading Module

    use Devel::SelfStubber;
    $modulename = "Mystuff::Grok";     # no .pm suffix or slashes
    $lib_dir = "";                     # defaults to current directory
    Devel::SelfStubber->stub($modulename, $lib_dir);   # stubs only

    # to generate the whole module with stubs inserted correctly
    use Devel::SelfStubber;
    $Devel::SelfStubber::JUST_STUBS = 0;
    Devel::SelfStubber->stub($modulename, $lib_dir);

Devel::SelfStubber supports inherited, autoloaded methods by printing the stubs you need to put in your module before the __DATA__ token. A subroutine stub looks like this:

    sub moo;

The stub ensures that if a method is called, it will get loaded. This is best explained using the following example:

Assume four classes, A, B, C, and D. A is the root class, B is a subclass of A, C is a subclass of B, and D is another subclass of A.

        A
       / \
      B   D
     /
    C

If D calls an autoloaded method moo() which is defined in class A, then the method is loaded into class A, and executed. If C then calls method moo(), and that method was reimplemented in class B, but set to be autoloaded, then the lookup mechanism never gets to the AUTOLOAD mechanism in B because it first finds the moo() method already loaded in A, and so erroneously uses that. If the method moo() had been stubbed in B, then the lookup mechanism would have found the stub, and correctly loaded and used the subroutine from B.

So, to get autoloading to work right with classes and subclasses, you need to make sure the stubs are loaded.

The SelfLoader can load stubs automatically at module initialization with:

    SelfLoader->load_stubs();

But you may wish to avoid having the stub-loading overhead associated with your initialization.[2] In this case, you can put the subroutine stubs before the __DATA__ token. This can be done manually, by inserting the output of the first call to the stub() method above. But the module also allows automatic insertion of the stubs. By default the stub() method just prints the stubs, but you can set the global $Devel::SelfStubber::JUST_STUBS to 0 and it will print out the entire module with the stubs positioned correctly, as in the second call to stub().

At the very least, this module is useful for seeing what the SelfLoader thinks are stubs; in order to ensure that future versions of the SelfStubber remain in step with the SelfLoader, the SelfStubber actually uses the SelfLoader to determine which stubs are needed.

diagnostics--Force Verbose Warning Diagnostics
    # As a pragma:
    use diagnostics;
    use diagnostics -verbose;

    enable  diagnostics;
    disable diagnostics;

    # As a program:
    $ perl program 2>diag.out
    $ splain [-v] [-p] diag.out

The diagnostics module extends the terse diagnostics normally emitted by both the Perl compiler and the Perl interpreter, augmenting them with the more explicative and endearing descriptions found in Chapter 9, Diagnostic Messages. It affects the compilation phase of your program rather than merely the execution phase.

To use in your program as a pragma, merely say:

    use diagnostics;

at the start (or near the start) of your program. (Note that this enables Perl's -w flag.) Your whole compilation will then be subject to the enhanced diagnostics. These are still issued to STDERR.

Due to the interaction between run-time and compile-time issues, and because it's probably not a very good idea anyway, you may not use:

    no diagnostics

to turn diagnostics off at compile time. However, you can turn diagnostics on or off at run-time by invoking diagnostics::enable() and diagnostics::disable(), respectively.

The -verbose argument first prints out the perldiag(1) manpage introduction before any other diagnostics. The $diagnostics::PRETTY variable, if set in a BEGIN block, results in nicer escape sequences for pagers:

    BEGIN { $diagnostics::PRETTY = 1 }

The standalone program

While apparently a whole other program, splain is actually nothing more than a link to the (executable) diagnostics.pm module. It acts upon the standard error output of a Perl program, which you may have treasured up in a file, or piped directly to splain.

The -v flag has the same effect as:

    use diagnostics -verbose

The -p flag sets $diagnostics::PRETTY to true. Since you're post-processing with splain, there's no sense in being able to enable() or disable() diagnostics.

Output from splain (unlike the pragma) is directed to STDOUT.

Examples

The following file is certain to trigger a few errors at both run-time and compile-time:

    use diagnostics;
    print NOWHERE "nothing\n";
    print STDERR "\n\tThis message should be unadorned.\n";
    warn "\tThis is a user warning";
    print "\nDIAGNOSTIC TESTER: Please enter a <CR> here: ";
    my $a, $b = scalar <STDIN>;
    print "\n";
    print $x/$y;

If you prefer to run your program first and look at its problems afterward, do this while talking to a Bourne-like shell:

    perl -w test.pl 2>test.out
    ./splain < test.out

If you don't want to modify your source code, but still want on-the-fly warnings, do this:

    perl -w -Mdiagnostics test.pl

If you want to control warnings on the fly, do something like this. (Make sure the use comes first, or you won't be able to get at the enable() or disable() methods.)

    use diagnostics;   # checks entire compilation phase
    print "\ntime for 1st bogus diags: SQUAWKINGS\n";
    print BOGUS1 'nada';
    print "done with 1st bogus\n";

    disable diagnostics;   # only turns off run-time warnings
    print "\ntime for 2nd bogus: (squelched)\n";
    print BOGUS2 'nada';
    print "done with 2nd bogus\n";

    enable diagnostics;    # turns back on run-time warnings
    print "\ntime for 3rd bogus: SQUAWKINGS\n";
    print BOGUS3 'nada';
    print "done with 3rd bogus\n";

    disable diagnostics;
    print "\ntime for 4th bogus: (squelched)\n";
    print BOGUS4 'nada';
    print "done with 4th bogus\n";

DirHandle--Supply Object Methods for Directory Handles
    use DirHandle;

    my $d = new DirHandle ".";   # open the current directory
    if (defined $d) {
        while (defined($_ = $d->read)) { something($_); }
        $d->rewind;
        while (defined($_ = $d->read)) { something_else($_); }
    }

DirHandle provides an alternative interface to Perl's opendir, closedir, readdir, and rewinddir functions.

The only objective benefit to using DirHandle is that it avoids name-space pollution by creating anonymous globs to hold directory handles. Well, and it also closes the DirHandle automatically when the last reference goes out of scope. But since most people only keep a directory handle open long enough to slurp in all the filenames, this is of dubious value. But hey, it's object-oriented.

DynaLoader--Automatic Dynamic Loading of Perl Modules
    package YourModule;
    require DynaLoader;
    @ISA = qw(... DynaLoader ...);
    bootstrap YourModule;

This module defines the standard Perl interface to the dynamic linking mechanisms available on many platforms. A common theme throughout the module system is that using a module should be easy, even if the module itself (or the installation of the module) is more complicated as a result. This applies particularly to the DynaLoader. To use it in your own module, all you need are the incantations listed above in the synopsis. This will work whether YourModule is statically or dynamically linked into Perl. (This is a Configure option for each module.) The bootstrap() method will either call YourModule's bootstrap routine directly if YourModule is statically linked into Perl, or if not, YourModule will inherit the bootstrap() method from DynaLoader, which will do everything necessary to load in your module, and then call YourModule's bootstrap() method for you, as if it were there all the time and you called it yourself. Piece of cake, of the have-it-and-eat-it-too variety.

The rest of this description talks about the DynaLoader from the viewpoint of someone who wants to extend the DynaLoader module to a new architecture. The Configure process selects which kind of dynamic loading to use by choosing to link in one of several C implementations, which must be linked into perl statically. (This is unlike other C extensions, which provide a single implementation, which may be linked in either statically or dynamically.)

The DynaLoader is designed to be a very simple, high-level interface that is sufficiently general to cover the requirements of SunOS, HP-UX, NeXT, Linux, VMS, Win-32, and other platforms. By itself, though, DynaLoader is practically useless for accessing non-Perl libraries because it provides almost no Perl-to-C "glue". There is, for example, no mechanism for calling a C library function or supplying its arguments in any sort of portable form. This job is delegated to the other extension modules that you may load in by using DynaLoader.

Internal interface summary
Variables:

    @dl_library_path
    @dl_resolve_using
    @dl_require_symbols
    $dl_debug

Subroutines:

    bootstrap($modulename);
    @filepaths = dl_findfile(@names);
    $filepath = dl_expandspec($spec);
    $libref = dl_load_file($filename);
    $symref = dl_find_symbol($libref, $symbol);
    @symbols = dl_undef_symbols();
    dl_install_xsub($name, $symref [, $filename]);
    $message = dl_error;

The bootstrap() and dl_findfile() routines are standard across all platforms, and so are defined in DynaLoader.pm. The rest of the functions are supplied by the particular .xs file that supplies the implementation for the platform. (You can examine the existing implementations in the ext/DynaLoader/*.xs files in the Perl source directory. You should also read DynaLoader.pm, of course.) These implementations may also tweak the default values of the variables listed below.
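As a rough illustration of the platform-independent part of this interface, the following sketch prints the library search path and asks dl_findfile() to locate a library. The library name -lm is only an assumption chosen for the example; whether anything is found, and where, depends entirely on the platform.

```perl
use strict;
use warnings;
use DynaLoader ();

# Directories that DynaLoader searches by default (platform-dependent).
print "search path: @DynaLoader::dl_library_path\n";

# "-lm" is just an illustrative library name; it may not exist everywhere.
my @found = DynaLoader::dl_findfile('-lm');

if (@found) { print "found: @found\n" }
else        { print "no match for -lm on this system\n" }
```

Nothing is loaded here; the sketch only resolves names to file paths, which is the step that precedes dl_load_file().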
English--Use English or awk Names for Punctuation Variables
    use English;
    ...
    if ($ERRNO =~ /denied/) { ... }

This module provides aliases for the built-in "punctuation" variables. Variables with side effects that get triggered merely by accessing them (like $0) will still have the same effects under the aliases.

For those variables that have an awk (1) version, both long and short English alternatives are provided. For example, the $/ variable can be referred to either as $RS or as $INPUT_RECORD_SEPARATOR if you are using the English module.

Here is the list of variables along with their English alternatives:
Env--Import Environment Variables
    use Env;                     # import all possible variables
    use Env qw(PATH HOME TERM);  # import only specified variables

Perl maintains environment variables in a pseudo-associative array named %ENV. Since this access method is sometimes inconvenient, the Env module allows environment variables to be treated as simple variables.

The Env::import() routine ties environment variables to global Perl variables with the same names. By default it ties suitable, existing environment variables (that is, variables yielded by keys %ENV). An environment variable is considered suitable if its name begins with an alphabetic character, and if it consists of nothing but alphanumeric characters plus underscore.

If you supply arguments when invoking use Env, they are taken to be a list of environment variables to tie. It's OK if the variables don't yet exist.

After an environment variable is tied, you can use it like a normal variable. You may access its value:
    @path = split(/:/, $PATH);

or modify it any way you like:

    $PATH .= ":.";

To remove a tied environment variable from the environment, make it the undefined value:

    undef $PATH;

Note that the corresponding operation performed directly against %ENV is not undef, but delete:

    delete $ENV{PATH};

Exporter--Default Import Method for Modules
    # in module YourModule.pm:
    package YourModule;
    use Exporter ();
    @ISA = qw(Exporter);
    @EXPORT = qw(...);              # Symbols to export by default.
    @EXPORT_OK = qw(...);           # Symbols to export on request.
    %EXPORT_TAGS = (tag => [...]);  # Define names for sets of symbols.

    # in other files that wish to use YourModule:
    use YourModule;                 # Import default symbols into my package.
    use YourModule qw(...);         # Import listed symbols into my package.
    use YourModule ();              # Do not import any symbols!

Any module may define a class method called import(). Perl automatically calls a module's import() method when processing the use statement for the module. The module itself doesn't have to define the import() method, though. The Exporter module implements a default import() method that many modules choose to inherit instead. The Exporter module supplies the customary import semantics, and any other import() methods will tend to deviate from the normal import semantics in various (hopefully documented) ways. Now we'll talk about the normal import semantics.

Specialized import lists

Ignoring the class name, which is always the first argument to a class method, the arguments that are passed into the import() method are known as an import list. Usually the import list is nothing more than a list of subroutine or variable names, but occasionally you may want to get fancy. If the first entry in an import list begins with !, :, or /, the list is treated as a series of specifications that either add to or delete from the list of names to import. They are processed left to right. Specifications are in the form:
A leading ! indicates that matching names should be deleted from the list of names to import. If the first specification is a deletion, it is treated as though preceded by :DEFAULT. If you just want to import extra names in addition to the default set, you will still need to include :DEFAULT explicitly. For example, suppose that YourModule.pm says:
    @EXPORT = qw(A1 A2 A3 A4 A5);
    @EXPORT_OK = qw(B1 B2 B3 B4 B5);
    %EXPORT_TAGS = (
        T1 => [qw(A1 A2 B1 B2)],
        T2 => [qw(A1 A2 B3 B4)]
    );

Individual names in %EXPORT_TAGS must also appear in @EXPORT or @EXPORT_OK. Note that you cannot use the tags directly within either @EXPORT or @EXPORT_OK (though you could preprocess tags into either of those arrays, and in fact, the export_tags() and export_ok_tags() functions below do precisely that).

An application using YourModule can then say something like this:

    use YourModule qw(:DEFAULT :T2 !B3 A3);

The :DEFAULT adds in A1, A2, A3, A4, and A5. The :T2 adds in only B3 and B4, since A1 and A2 were already added. The !B3 then deletes B3, and the A3 does nothing because A3 was already included. Other examples include:

    use Socket qw(!/^[AP]F_/ !SOMAXCONN !SOL_SOCKET);
    use POSIX qw(:errno_h :termios_h !TCSADRAIN !/^EXIT/);

Remember that most patterns (using //) will need to be anchored with a leading ^, for example, /^EXIT/ rather than /EXIT/.

You can say:
    BEGIN { $Exporter::Verbose=1 }

in order to see how the specifications are being processed and what is actually being imported into modules.

Module version checking

The Exporter module will convert an attempt to import a number from a module into a call to $module_name->require_version($value). This can be used to validate that the version of the module being used is greater than or equal to the required version.

The Exporter module also supplies a default require_version() method, which checks the value of $VERSION in the exporting module.

Since the default require_version() method treats the $VERSION number as a simple numeric value, it will regard version 1.10 as lower than 1.9. For this reason it is strongly recommended that the module developer use numbers with at least two decimal places; for example, 1.09.

Prior to release 5.004 or so of Perl, this only worked with modules that use the Exporter module; in particular, this means that you can't check the version of a class module that doesn't require the Exporter module.

Managing unknown symbols

In some situations you may want to prevent certain symbols from being exported. Typically this applies to extensions with functions or constants that may not exist on some systems.

The names of any symbols that cannot be exported should be listed in the @EXPORT_FAIL array.

If a module attempts to import any of these symbols, the Exporter will give the module an opportunity to handle the situation before generating an error. The Exporter will call an export_fail() method with a list of the failed symbols:

    @failed_symbols = $module_name->export_fail(@failed_symbols);

If the export_fail() method returns an empty list, then no error is recorded and all requested symbols are exported. If the returned list is not empty, then an error is generated for each symbol and the export fails. The Exporter provides a default export_fail() method that simply returns the list unchanged.

Uses for the export_fail() method include giving better error messages for some symbols and performing lazy architectural checks. Put more symbols into @EXPORT_FAIL by default and then take them out if someone actually tries to use them and an expensive check shows that they are usable on that platform.

Tag handling utility functions

Since the symbols listed within %EXPORT_TAGS must also appear in either @EXPORT or @EXPORT_OK, two utility functions are provided that allow you to easily add tagged sets of symbols to @EXPORT or @EXPORT_OK:
    %EXPORT_TAGS = (Bactrian => [qw(aa bb cc)], Dromedary => [qw(aa cc dd)]);

    Exporter::export_tags('Bactrian');       # add aa, bb and cc to @EXPORT
    Exporter::export_ok_tags('Dromedary');   # add aa, cc and dd to @EXPORT_OK

Any names that are not tags are added to @EXPORT or @EXPORT_OK unchanged, but will trigger a warning (with -w) to avoid misspelt tag names being silently added to @EXPORT or @EXPORT_OK. Future versions may regard this as a fatal error.

ExtUtils::Install--Install Files from Here to There
    use ExtUtils::Install;
    install($hashref, $verbose, $nonono);
    uninstall($packlistfile, $verbose, $nonono);

install() and uninstall() are specific to the way ExtUtils::MakeMaker handles the platform-dependent installation and deinstallation of Perl extensions. They are not designed as general-purpose tools. If you're reading this chapter straight through (brave soul), you probably want to take a glance at the MakeMaker entry first. (Or just skip over everything in the ExtUtils package until you start writing an Ext.)

install() takes three arguments: a reference to a hash, a verbose switch, and a don't-really-do-it switch. The hash reference contains a mapping of directories; each key/value pair is a combination of directories to be copied. The key is a directory to copy from, and the value is a directory to copy to. The whole tree below the "from" directory will be copied, preserving timestamps and permissions.

There are two keys with a special meaning in the hash: "read" and "write". After the copying is done, install will write the list of target files to the file named by $hashref->{write}. If there is another file named by $hashref->{read}, the contents of this file will be merged into the written file. The read and the written file may be identical, but on the Andrew File System (AFS) it is fairly likely that people are installing to a different directory than the one where the files later appear.

uninstall() takes as first argument a file containing filenames to be unlinked. The second argument is a verbose switch, the third is a no-don't-really-do-it-now switch (useful to know what will happen without actually doing it).

ExtUtils::Liblist--Determine Libraries to Use and How to Use Them
    require ExtUtils::Liblist;
    ExtUtils::Liblist::ext($potential_libs, $Verbose);

This utility takes a list of libraries in the form -llib1 -llib2 -llib3 and returns lines suitable for inclusion in a Perl extension Makefile on the current platform. Extra library paths may be included with the form -L/another/path. This will affect the searches for all subsequent libraries.

ExtUtils::Liblist::ext() returns a list of four scalar values, which MakeMaker will eventually use in constructing a Makefile, among other things. The values are:
Portability

This module deals with a lot of system dependencies and has quite a few architecture-specific ifs in the code.

ExtUtils::MakeMaker--Create a Makefile for a Perl Extension
    use ExtUtils::MakeMaker;
    WriteMakefile( ATTRIBUTE => VALUE, ... );

    # which internally is really more like...
    %att = (ATTRIBUTE => VALUE, ...);
    MM->new(\%att)->flush;

When you build an extension to Perl, you need to have an appropriate Makefile in the extension's source directory. And while you could conceivably write one by hand, this would be rather tedious. So you'd like a program to write it for you.
Originally, this was done using a shell script (actually, one for each extension) called Makefile.SH, much like the one that writes the Makefile for Perl itself. But somewhere along the line, it occurred to the perl5-porters that, by the time you want to compile your extensions, there's already a bare-bones version of the Perl executable called miniperl, if not a fully installed perl. And for some strange reason, Perl programmers prefer programming in Perl to programming in shell. So they wrote MakeMaker, just so that you can write Makefile.PL instead of Makefile.SH. MakeMaker isn't a program; it's a module (or it wouldn't be in this chapter). The module provides the routines you need; you just need to use the module, and then call the routines. As with any programming job, there are many degrees of freedom; but your typical Makefile.PL is pretty simple. For example, here's ext/POSIX/Makefile.PL from the Perl distribution's POSIX extension (which is by no means a trivial extension):
    use ExtUtils::MakeMaker;
    WriteMakefile(
        NAME         => 'POSIX',
        LIBS         => ["-lm -lposix -lcposix"],
        MAN3PODS     => ' ',               # Pods will be built by installman.
        XSPROTOARG   => '-noprototypes',   # XXX remove later?
        VERSION_FROM => 'POSIX.pm',
    );

Several things are apparent from this example, but the most important is that the WriteMakefile() function uses named parameters. This means that you can pass many potential parameters, but you're only required to pass the ones you want to be different from the default values. (And when we say "many", we mean "many"--there are about 75 of them. See the Attributes section later.)

As the synopsis above indicates, the WriteMakefile() function actually constructs an object. This object has attributes that are set from various sources, including the parameters you pass to the function. It's this object that actually writes your Makefile, meshing together the demands of your extension with the demands of the architecture on which the extension is being installed. Like many craftily crafted objects, this MakeMaker object delegates as much of its work as possible to various other subroutines and methods. Many of these may be overridden in your Makefile.PL if you need to do some fine tuning. (Generally you don't.)

But let's not lose track of the goal, which is to write a Makefile that will know how to do anything to your extension that needs doing. Now as you can imagine, the Makefile that MakeMaker writes is quite, er, full-featured. It's easy to get lost in all the details. If you look at the POSIX Makefile generated by the bit of code above, you will find a file containing about 122 macros and 77 targets. You will want to go off into a corner and curl up into a little ball, saying, "Never mind, I didn't really want to know."

Well, the fact of the matter is, you really don't want to know, nor do you have to. Most of these items take care of themselves--that's what MakeMaker is there for, after all.
We'll lay out the various attributes and targets for you, but you can just pick and choose, like in a cafeteria. We'll talk about the make targets first, because they're the actions you eventually want to perform, and then work backward to the macros and attributes that feed the targets. But before we do that, you need to know just a few more architectural features of MakeMaker to make sense of some of the things we'll say. The targets at the end of your Makefile depend on the macro definitions that are interpolated into them. Those macro definitions in turn come from any of several places. Depending on how you count, there are about five sources of information for these attributes. Ordered by increasing precedence and (more or less) decreasing permanence, they are:
The first four of these turn into attributes of the object we mentioned, and are eventually written out as macro definitions in your Makefile. In most cases, the names of the values are consistent from beginning to end. (Except that the Config database keeps the names in lowercase, as they come from Perl's config.sh file. The names are translated to uppercase when they become attributes of the object.) In any case, we'll tend to use the term attributes to mean both attributes and the Makefile macros derived from them.

The Makefile.PL and the hints may also provide overriding methods for the object, if merely changing an attribute isn't good enough.

The hints files are expected to be named like their counterparts in PERL_SRC/hints, but with a .pl filename extension (for example, next_3_2.pl), because the file consists of Perl code to be evaluated. Apart from that, the rules governing which hints file is chosen are the same as in Configure. The hints file is evaled within a routine that is a method of our MakeMaker object, so if you want to override or create an attribute, you would say something like:
    $self->{LIBS} = ['-ldbm -lucb -lc'];

By and large, if your Makefile isn't doing what you want, you just trace back the name of the misbehaving attribute to its source, and either change it there or override it downstream.

Extensions may be built using the contents of either the Perl source directory tree or the installed Perl library. The recommended way is to build extensions after you have run make install on Perl itself. You can then build your extension in any directory on your hard disk that is not below the Perl source tree. The support for extensions below the ext/ directory of the Perl distribution is only good for the standard extensions that come with Perl.

If an extension is being built below the ext/ directory of the Perl source, then MakeMaker will set PERL_SRC automatically (usually to ../..). If PERL_SRC is defined and the extension is recognized as a standard extension, then other variables default to the following:

    PERL_INC     = PERL_SRC
    PERL_LIB     = PERL_SRC/lib
    PERL_ARCHLIB = PERL_SRC/lib
    INST_LIB     = PERL_LIB
    INST_ARCHLIB = PERL_ARCHLIB

If an extension is being built away from the Perl source, then MakeMaker will leave PERL_SRC undefined and default to using the installed copy of the Perl library. The other variables default to the following:

    PERL_INC     = $archlibexp/CORE
    PERL_LIB     = $privlibexp
    PERL_ARCHLIB = $archlibexp
    INST_LIB     = ./blib/lib
    INST_ARCHLIB = ./blib/arch

If Perl has not yet been installed, then PERL_SRC can be defined as an override on the command line.

Targets

Far and away the most commonly used make targets are those used by the installer to install the extension. So we aim to make the normal installation very easy:

    perl Makefile.PL    # generate the Makefile
    make                # compile the extension
    make test           # test the extension
    make install        # install the extension

This assumes that the installer has dynamic linking available. If not, a couple of additional commands are also necessary:

    make perl           # link a new perl statically with this extension
    make inst_perl      # install that new perl appropriately

Other interesting targets in the generated Makefile are:
    make config     # check whether the Makefile is up-to-date
    make clean      # delete local temp files (Makefile gets renamed)
    make realclean  # delete derived files (including ./blib)
    make ci         # check in all files in the MANIFEST file
    make dist       # see the "Distribution Support" section below

Now we'll talk about some of these commands, and how each of them is related to MakeMaker. So we'll not only be talking about things that happen when you invoke the make target, but also about what MakeMaker has to do to generate that make target. So brace yourself for some temporal whiplash.

Running MakeMaker

This command is the one most closely related to MakeMaker because it's the one in which you actually run MakeMaker. No temporal whiplash here. As we mentioned earlier, some of the default attribute values may be overridden by adding arguments of the form KEY=VALUE. For example:

    perl Makefile.PL PREFIX=/tmp/myperl5

To get a more detailed view of what MakeMaker is doing, say:

    perl Makefile.PL verbose

Making whatever is needed

A make command without arguments performs any compilation needed and puts any generated files into staging directories that are named by the attributes INST_LIB, INST_ARCHLIB, INST_EXE, INST_MAN1DIR, and INST_MAN3DIR. These directories default to something below ./blib if you are not building below the Perl source directory. If you are building below the Perl source, INST_LIB and INST_ARCHLIB default to ../../lib, and INST_EXE is not defined.

Running tests

The goal of this command is to run any regression tests supplied with the extension, so MakeMaker checks for the existence of a file named test.pl in the current directory and, if it exists, adds commands to the test target of the Makefile that will execute the script with the proper set of Perl -I options (since the files haven't been installed into their final location yet).

MakeMaker also checks for any files matching glob("t/*.t"). It will add commands to the test target that execute all matching files via the Test::Harness module with the -I switches set correctly. If you pass TEST_VERBOSE=1, the test target will run the tests verbosely.

Installing files

Once the installer has tested the extension, the various generated files need to get put into their final resting places. The install target copies the files found below each of the INST_* directories to their INSTALL* counterparts.
The INSTALL* attributes in turn default to their %Config counterparts, $Config{installprivlib}, $Config{installarchlib}, and so on. If you don't set INSTALLARCHLIB or INSTALLSITEARCH, MakeMaker will assume you want them to be subdirectories of INSTALLPRIVLIB and INSTALLSITELIB, respectively. The exact relationship is determined by Configure. But you can usually just go with the defaults for all these attributes. The PREFIX attribute can be used to redirect all the INSTALL* attributes in one go. Here's the quickest way to install a module in a nonstandard place:
    perl Makefile.PL PREFIX=~

The value you specify for PREFIX replaces one or more leading pathname components in all INSTALL* attributes. The prefix to be replaced is determined by the value of $Config{prefix}, which typically has a value like /usr. (Note that the tilde expansion above is done by MakeMaker, not by perl or make.)

If the user has superuser privileges and is not working under the Andrew File System (AFS) or relatives, then the defaults for INSTALLPRIVLIB, INSTALLARCHLIB, INSTALLBIN, and so on should be appropriate.

By default, make install writes some documentation of what has been done into the file given by $(INSTALLARCHLIB)/perllocal.pod. This feature can be bypassed by calling make pure_install.

If you are using AFS, you must specify the installation directories, since these most probably have changed since Perl itself was installed. Do this by issuing these commands:

    perl Makefile.PL INSTALLSITELIB=/afs/here/today \
        INSTALLBIN=/afs/there/now INSTALLMAN3DIR=/afs/for/manpages
    make

Be careful to repeat this procedure every time you recompile an extension, unless you are sure the AFS installation directories are still valid.

Static linking of a new Perl binary

The steps above are sufficient on a system supporting dynamic loading. On systems that do not support dynamic loading, however, the extension has to be linked together statically with everything else you might want in your perl executable. MakeMaker supports the linking process by creating appropriate targets in the Makefile. If you say:

    make perl

it will produce a new perl binary in the current directory with all extensions linked in that can be found in INST_ARCHLIB, SITELIBEXP, and PERL_ARCHLIB. To do that, MakeMaker writes a new Makefile; on UNIX it is called Makefile.aperl, but the name may be system-dependent. When you want to force the creation of a new perl, we recommend that you delete this Makefile.aperl so the directories are searched for linkable libraries again.

The binary can be installed in the directory where Perl normally resides on your machine with:

    make inst_perl

To produce a Perl binary with a different filename than perl, either say:

    perl Makefile.PL MAP_TARGET=myperl
    make myperl
    make inst_perl

or say:

    perl Makefile.PL
    make myperl MAP_TARGET=myperl
    make inst_perl MAP_TARGET=myperl

In either case, you will be asked to confirm the invocation of the inst_perl target, since this invocation is likely to overwrite your existing Perl binary in INSTALLBIN.

By default make inst_perl documents what has been done in the file given by $(INSTALLARCHLIB)/perllocal.pod. This behavior can be bypassed by calling make pure_inst_perl.

Sometimes you might want to build a statically linked Perl even though your system supports dynamic loading. In this case you may explicitly set the linktype:
    perl Makefile.PL LINKTYPE=static

Attributes you can set

The following attributes can be specified as arguments to WriteMakefile() or as NAME=VALUE pairs on the command line. We give examples below in the form they would appear in your Makefile.PL, that is, as though passed as a named parameter to WriteMakefile() (including the comma that comes after it).
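To make the relationship between the two ways of setting an attribute concrete, here is a minimal sketch of how NAME=VALUE strings could be overlaid on the named parameters from a Makefile.PL. It is not MakeMaker's own code: the merge_attributes() helper is hypothetical, and the sketch assumes that a command-line value wins over the value given in Makefile.PL.

```perl
use strict;
use warnings;

# Hypothetical helper: overlay NAME=VALUE strings (as they might appear
# on the command line) onto the attributes given in a Makefile.PL.
sub merge_attributes {
    my ($from_makefile_pl, @command_line) = @_;
    my %att = %$from_makefile_pl;
    for my $arg (@command_line) {
        # Split on the first "=" only, so that values may contain "=".
        my ($name, $value) = split /=/, $arg, 2;
        next unless defined $value;    # skip bare words such as "verbose"
        $att{$name} = $value;          # assumption: the command line wins
    }
    return \%att;
}

my $att = merge_attributes(
    { NAME => 'POSIX', LINKTYPE => 'dynamic' },
    'LINKTYPE=static', 'PREFIX=/tmp/myperl5',
);
print "$_ => $att->{$_}\n" for sort keys %$att;
```

Here LINKTYPE ends up as static and PREFIX is added, while NAME keeps the value from the hash that stands in for the Makefile.PL parameters.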
Additional lowercase attributes

There are additional lowercase attributes that you can use to pass parameters to the methods that spit out particular portions of the Makefile. These attributes are not normally required.

Useful Makefile macros

Here are some useful macros that you probably shouldn't redefine because they're derivative.

Overriding MakeMaker methods

If you cannot achieve the desired Makefile behavior by specifying attributes, you may define private subroutines in the Makefile.PL. Each subroutine returns the text it wishes to have written to the Makefile. To override a section of the Makefile you can use one of two styles. You can just return a new value:
    sub MY::c_o { "new literal text" }

or you can edit the default by saying something like:

    sub MY::c_o {
        my $self = shift;
        local *c_o;
        $_ = $self->MM::c_o;
        s/old text/new text/;
        $_;
    }

Both methods above are available for backward compatibility with older Makefile.PLs.

If you still need a different solution, try to develop another subroutine that better fits your needs and then submit the diffs to either perl5-porters@nicoh.com or comp.lang.perl.modules as appropriate.

Distribution support

For authors of extensions, MakeMaker provides several Makefile targets. Most of the support comes from the ExtUtils::Manifest module, where additional documentation can be found. Note that a MANIFEST file is basically just a list of filenames to be shipped with the kit to build the extension.
Customization of the distribution targets can be done by specifying a hash reference to the dist attribute of the WriteMakefile() call. The following parameters are recognized:
An example:
    WriteMakefile( 'dist' => { COMPRESS=>"gzip", SUFFIX=>"gz" })

ExtUtils::Manifest--Utilities to Write and Check a MANIFEST File

    require ExtUtils::Manifest;
    ExtUtils::Manifest::mkmanifest();
    ExtUtils::Manifest::manicheck();
    ExtUtils::Manifest::filecheck();
    ExtUtils::Manifest::fullcheck();
    ExtUtils::Manifest::skipcheck();
    ExtUtils::Manifest::manifind();
    ExtUtils::Manifest::maniread($file);
    ExtUtils::Manifest::manicopy($read, $target, $how);

These routines automate the maintenance and use of a MANIFEST file. A MANIFEST file is essentially just a list of filenames, one per line, with an optional comment on each line, separated by whitespace (usually one or more tabs). The idea is simply that you can extract the filenames by saying:

    awk '{print $1}' MANIFEST

mkmanifest() writes the names of all files in and below the current directory to a file named in the global variable $ExtUtils::Manifest::MANIFEST (which defaults to MANIFEST) in the current directory. As the counterpart to the awk command above, it works much like:
    find . -type f -print > MANIFEST

except that it also checks the existing MANIFEST file (if any) and copies over any comments that are found there. Also, all filenames that match any regular expression in a file MANIFEST.SKIP (if such a file exists) are ignored.

manicheck() checks whether all files listed in a MANIFEST file in the current directory really do exist.

filecheck() finds files below the current directory that are not mentioned in the MANIFEST file. An optional MANIFEST.SKIP file will be consulted, and any filename matching a regular expression in such a file will not be reported as missing in the MANIFEST file.

fullcheck() does both a manicheck() and a filecheck().

skipcheck() lists all files that are skipped due to your MANIFEST.SKIP file.

manifind() returns a hash reference. The keys of the hash are the files found below the current directory. The values are null strings, representing all the MANIFEST comments that aren't there.

maniread($file) reads a named MANIFEST file (defaults to MANIFEST in the current directory) and returns a hash reference, the keys of which are the filenames, and the values of which are the comments that are there. Er, which may be null if the comments aren't there. . . .

manicopy($read, $target, $how) copies the files that are the keys in the hash %$read to the named target directory. The hash reference $read is typically returned by the maniread() function. manicopy() is useful for producing a directory tree identical to the intended distribution tree. The third parameter $how can be used to specify a different method of "copying". Valid values are "cp", which actually copies the files, "ln", which creates hard links, and "best", which mostly links the files but copies any symbolic link to make a tree without any symbolic link. "best" is the default, though it may not be the best default.

Ignoring files

The MANIFEST.SKIP file may contain regular expressions of files that should be ignored by mkmanifest() and filecheck(). The regular expressions should appear one on each line. A typical example:
    \bRCS\b
    ^MANIFEST\.
    (?i)^makefile$
    ~$
    \.html$
    \.old$
    ^blib/
    ^MakeMaker-\d

Exportability

mkmanifest(), manicheck(), filecheck(), fullcheck(), maniread(), and manicopy() are exportable.

Global variables

$ExtUtils::Manifest::MANIFEST defaults to MANIFEST. Changing it results in both a different MANIFEST and a different MANIFEST.SKIP file. This is useful if you want to maintain different distributions for different audiences (say a user version and a developer version including RCS).

$ExtUtils::Manifest::Quiet defaults to 0. You can set it to a true value to get all the functions to shut up already.

Diagnostics

All diagnostic output is sent to STDERR.
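The MANIFEST format and the MANIFEST.SKIP filtering described earlier in this entry can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions, not the module's implementation: the sample filenames (Foo.pm and friends), the %comment_for hash, and the is_skipped() helper are all hypothetical, and the data is held in memory rather than read from real files.

```perl
use strict;
use warnings;

# Sample MANIFEST lines: a filename, then optional whitespace and a comment.
my @manifest_lines = (
    "Makefile.PL\tbuilds the Makefile",
    "lib/Foo.pm",
    "t/basic.t\tregression test",
);

# The first whitespace-delimited field is the filename, like awk's $1;
# whatever follows is the comment (a null string if there is none).
my %comment_for;
for my $line (@manifest_lines) {
    my ($file, $comment) = split ' ', $line, 2;
    $comment_for{$file} = defined $comment ? $comment : '';
}

# Patterns as they might appear in MANIFEST.SKIP, one per line.
my @skip = map { qr/$_/ } ('\bRCS\b', '~$', '\.old$', '^blib/');

# Hypothetical helper: true if any skip pattern matches the filename.
sub is_skipped {
    my ($file) = @_;
    return scalar grep { $file =~ $_ } @skip;
}

my @candidates = ('lib/Foo.pm', 'lib/Foo.pm~', 'RCS/Foo.pm,v', 'blib/lib/Foo.pm');
my @kept = grep { !is_skipped($_) } @candidates;
print "kept: @kept\n";
```

Of the four candidate names, only lib/Foo.pm survives: the editor backup, the RCS file, and the file under blib/ each match one of the skip patterns.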
See also

The ExtUtils::MakeMaker library module generates a Makefile with handy targets for most of this functionality.

ExtUtils::Miniperl--Write the C Code for perlmain.c
    use ExtUtils::Miniperl;
    writemain(@directories);

writemain() takes an argument list of directories containing archive libraries that are needed by Perl modules and that should be linked into a new Perl binary. It correspondingly writes to STDOUT a file intended to be compiled as perlmain.c that contains all the bootstrap code to make the modules associated with the libraries available from within Perl.

The typical usage is from within a Makefile generated by ExtUtils::MakeMaker. So under normal circumstances you won't have to deal with this module directly.
This entire module is automatically generated from a script called minimod.PL when Perl itself is built. So if you want to patch it, please patch minimod.PL in the Perl distribution instead.
ExtUtils::Mkbootstrap--Make a Bootstrap File for Use by DynaLoader
    use ExtUtils::Mkbootstrap;
    mkbootstrap();

mkbootstrap() typically gets called from an extension's Makefile. It writes a *.bs file that is needed by some architectures to do dynamic loading. It is otherwise unremarkable, and MakeMaker usually handles the details. If you need to know more about it, you've probably already read the module.

ExtUtils::Mksymlists--Write Linker Option Files for Dynamic Extension

    use ExtUtils::Mksymlists;
    Mksymlists( NAME     => $name,
                DL_FUNCS => { $pkg1 => [$func1, $func2], $pkg2 => [$func3] },
                DL_VARS  => [$var1, $var2, $var3]);

ExtUtils::Mksymlists() produces files used by the linker under some OSes during the creation of shared libraries for dynamic extensions. It is normally called from a MakeMaker-generated Makefile when the extension is built. The linker option file is generated by calling the function Mksymlists(), which is exported by default from ExtUtils::Mksymlists. It takes one argument, a list of key/value pairs, in which the following keys are recognized:
When calling Mksymlists(), one should always specify the NAME attribute. In most cases, this is all that's necessary. In the case of unusual extensions, however, the other attributes can be used to provide additional information to the linker.

ExtUtils::MM_OS2--Methods to Override UNIX Behavior in ExtUtils::MakeMaker
    use ExtUtils::MM_OS2; # Done internally by ExtUtils::MakeMaker if needed

See ExtUtils::MM_Unix for documentation of the methods provided there. This package overrides the implementation of the methods, not the interface.

ExtUtils::MM_Unix--Methods Used by ExtUtils::MakeMaker
require ExtUtils::MM_Unix; The methods provided by this package (and by the other MM_* packages) are designed to be used in conjunction with ExtUtils::MakeMaker. You will never require this module yourself. You would only define methods in this or a similar module if you're working on improving the porting capabilities of MakeMaker. Nevertheless, this is a laudable goal, so we'll talk about it here. When MakeMaker writes a Makefile, it creates one or more objects that inherit their methods from package MM. MM itself doesn't provide any methods, but it inherits from the ExtUtils::MM_Unix class. However, for certain platforms, it also inherits from an OS-specific module such as MM_VMS, and it does this before it inherits from the MM_Unix module in the @ISA list. The inheritance tree of MM therefore lets the OS-specific package override any of the methods listed here. In a sense, the MM_Unix package is slightly misnamed, since it provides fundamental methods on non-UNIX systems too, to the extent that the system is like UNIX. MM methodsWe've avoided listing deprecated methods here, as well as any private methods you're unlikely to want to override.
Methods to produce chunks of text for the MakefileWhen MakeMaker thinks it has all its ducks in a row, it calls a special sequence of methods to produce the Makefile for a given MakeMaker object. The list of methods it calls is specified in the array @ExtUtils::MakeMaker::MM_Sections, one method per section. Since these routines are all called the same way, we won't document each of them separately, except to list them. By far the most accurate and up-to-date documentation for what each method does is actually the Makefile that MakeMaker produces. Each section of the file is labeled with the name of the method that produces it, so once you see how you want to change the Makefile, it's a trivial matter to work back from the proposed change and find the method responsible for it. You've plowed through a lot of ugly things to get here, but since you've read this far, we'll reward you by pointing out something incredibly beautiful in MakeMaker. The arguments (if any) that are passed to each method are simply the pseudo-attributes of the same name that you already saw documented under "Additional Lowercase Attributes" in the section on ExtUtils::MakeMaker. You'll recall that those pseudo-attributes were specified as anonymous hashes, which Just Happen to have exactly the same syntax inside as named parameters. Fancy that. So the arguments just come right into your method as ordinary named parameters. Assign the arguments to a hash, and off you go. And it's completely forward and backward compatible. Even if you override a method that didn't have arguments before, there's no problem. Since it's all driven off the method name, just name your new pseudo-attribute after your method, and your method will get its arguments. The return values are also easy to understand: each method simply returns the string it wants to put into its section of the Makefile. Two special methods are post_initialize() and postamble(), each of which returns an empty string by default. 
You can define them in your Makefile.PL to insert customized text near the beginning or end of the Makefile. Here are the methods. They're called in this order (reading down the columns):
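As a rough sketch of the argument passing described above, a Makefile.PL might define postamble() in the MY package, which is where MakeMaker looks for per-distribution overrides. The attribute key EXTRA_TARGET, the rule text, and the direct call at the end are invented for illustration. In a real Makefile.PL the method would be called for you by WriteMakefile().

```perl
use strict;
use warnings;

# Hypothetical override, as it might appear in a Makefile.PL.
# MakeMaker would call it as $self->postamble(%args), where %args comes
# from the "postamble" pseudo-attribute given to WriteMakefile().
sub MY::postamble {
    my ($self, %args) = @_;
    my $target = $args{EXTRA_TARGET} || 'hello';   # EXTRA_TARGET is an invented key
    # The returned string becomes the postamble section of the Makefile.
    return "${target}:\n\t\@echo built by postamble\n";
}

# In a real Makefile.PL the call would be triggered by something like:
#   WriteMakefile(NAME => 'My::Module', postamble => { EXTRA_TARGET => 'greet' });
# Here the method is invoked directly, only to show the text it returns.
my $text = MY->postamble(EXTRA_TARGET => 'greet');
print $text;
```

Because the method receives ordinary named parameters, adding a new key later does not disturb callers that never pass it.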
See also

ExtUtils::MakeMaker library module.

ExtUtils::MM_VMS--Methods to Override UNIX Behavior in ExtUtils::MakeMaker
    use ExtUtils::MM_VMS; # Done internally by ExtUtils::MakeMaker if needed

See ExtUtils::MM_Unix for documentation of the methods provided there. This package overrides the implementation of the methods, not the interface.

Fcntl--Load the C fcntl.h Defines
    use Fcntl;

    $nonblock_flag = O_NDELAY();
    $create_flag = O_CREAT();
    $read_write_flag = O_RDWR();

This module is just a translation of the C fcntl.h file. Unlike the old mechanism, which required a translated fcntl.ph file, Fcntl uses the h2xs program (see the Perl source distribution) and your native C compiler. This means that it has a much better chance of getting the numbers right. Note that only #define symbols get translated; you must still correctly pack up your own arguments to pass to locking functions and so on. The following routines are exported by default, and each routine returns the value of the #define that has the same name as the routine:
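As a brief illustration of how these constants are typically used, here is a minimal sketch that passes them to Perl's built-in sysopen. The scratch directory from File::Temp and the filename example.dat are assumptions made only so that the example can run anywhere without touching real files.

```perl
use strict;
use warnings;
use Fcntl;                      # exports O_RDWR, O_CREAT, O_EXCL, ... by default
use File::Temp qw(tempdir);     # assumption: used only to obtain a scratch directory

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/example.dat";  # hypothetical filename

# Create the file, failing if it already exists.
sysopen(my $fh, $path, O_RDWR | O_CREAT | O_EXCL, 0644)
    or die "cannot create $path: $!";
print {$fh} "hello\n";
close($fh) or die "cannot close $path: $!";

# A second exclusive create must fail, because the file now exists.
my $second_ok = sysopen(my $again, $path, O_WRONLY | O_CREAT | O_EXCL, 0644);
print "second exclusive create ", ($second_ok ? "succeeded" : "failed"), "\n";
```

The flags are combined with bitwise OR, exactly as they would be in C.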
File::Basename--Parse File Specifications
    use File::Basename;

    ($name, $path, $suffix) = fileparse($fullname, @suffixlist);
    fileparse_set_fstype($os_string); # $os_string specifies OS type
    $basename = basename($fullname, @suffixlist);
    $dirname = dirname($fullname);

    ($name, $path, $suffix) = fileparse("lib/File/Basename.pm", '\.pm');
    fileparse_set_fstype("VMS");
    $basename = basename("lib/File/Basename.pm", ".pm");
    $dirname = dirname("lib/File/Basename.pm");

These routines allow you to parse file specifications into useful pieces using the syntax of different operating systems.
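To make the pieces concrete, here is a small sketch of what the three routines return for one path. It assumes the default UNIX-style file syntax (no call to fileparse_set_fstype()), and the variable names are only illustrative.

```perl
use strict;
use warnings;
use File::Basename;

my $fullname = "lib/File/Basename.pm";

# fileparse() splits the path into name, directory part, and matched suffix.
# The suffix list holds patterns, hence the escaped dot.
my ($name, $path, $suffix) = fileparse($fullname, '\.pm');

# basename() and dirname() return just one piece each.
my $basename = basename($fullname, ".pm");
my $dirname  = dirname($fullname);

print "name=$name path=$path suffix=$suffix\n";
print "basename=$basename dirname=$dirname\n";
```

Note that the directory part returned by fileparse() keeps its trailing separator, while dirname() drops it.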
File::CheckTree--Run Many Tests on a Collection of Files
    use File::CheckTree;

    $warnings += validate( q{
        /vmunix     -e || die
        /boot       -e || die
        /bin        cd
            csh     -ex
            csh     !-ug
            sh      -ex
            sh      !-ug
        /usr        -d || warn "What happened to $file?\n"
    });

The validate() routine takes a single multi-line string, each line of which contains a filename plus a file test to try on it. (The file test may be given as "cd", causing subsequent relative filenames to be interpreted relative to that directory.) After the file test you may put "|| die" to make it a fatal error if the file test fails. The default is:

    || warn

You can reverse the sense of the test by prepending "!". If you specify "cd" and then list some relative filenames, you may want to indent them slightly for readability. If you supply your own die or warn message, you can use $file to interpolate the filename. File tests may be grouped: -rwx tests for all of -r, -w, and -x. Only the first failed test of the group will produce a warning. validate() returns the number of warnings issued, presuming it didn't die.

File::Copy--Copy Files or Filehandles
    use File::Copy;

    copy("src-file", "dst-file");
    copy("Copy.pm", \*STDOUT);

    use POSIX;
    use FileHandle;      # provides FileHandle->new
    use File::Copy 'cp';

    $fh = FileHandle->new("/dev/null", "r");
    cp($fh, "dst-file");

The File::Copy module provides one function, copy(), that takes two parameters: a file to copy from and a file to copy to. Either argument may be a string, a FileHandle reference, or a FileHandle glob. If the first argument is a filehandle of some sort, it will be read from; if it is a filename, it will be opened for reading. Likewise, the second argument will be written to (and created if need be). An optional third parameter is a hint that requests the buffer size to be used for copying. This is the number of bytes from the first file that will be held in memory at any given time, before being written to the second file. The default buffer size depends upon the file and the operating system, but will generally be the whole file (up to 2MB), or 1KB for filehandles that do not reference files (for example, sockets). When running under VMS, this routine performs an RMS copy of the file, in order to preserve file attributes, indexed file structure, and so on. The buffer size parameter is ignored. You may use the syntax:

    use File::Copy "cp";

to get at the cp() alias for the copy() function. The syntax is exactly the same. copy() returns 1 on success, 0 on failure; $! will be set if an error was encountered.

File::Find--Traverse a File Tree
use File::Find; find(\&wanted, 'dir1', 'dir2'...); sub wanted { ... } use File::Find; finddepth(\&wanted, 'dir1', 'dir2'...); # traverse depth-first sub wanted { ... } find() is similar to the UNIX find (1) command in that it traverses the specified directories, performing whatever tests or other actions you request. However, these actions are given in the subroutine, wanted(), which you must define (but see find2perl below). For example, to print out the names of all executable files, you could define wanted() this way:
    sub wanted { print "$File::Find::name\n" if -x; }

$File::Find::dir contains the current directory name, and $_ the current filename within that directory. $File::Find::name contains "$File::Find::dir/$_". You are chdired to $File::Find::dir when wanted() is called. You can set $File::Find::prune to true in wanted() in order to prune the tree; that is, find() will not descend into any directory for which $File::Find::prune is set. This library is primarily for use with the find2perl (1) command, which is supplied with the standard Perl distribution and converts a find (1) invocation to an appropriate wanted() subroutine. The command:
find2perl / -name .nfs\* -mtime +7 \ -exec rm -f {} \; -o -fstype nfs -prune produces something like:
sub wanted { /^\.nfs.*$/ && (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_)) && int(-M _) > 7 && unlink($_) || ($nlink || (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))) && $dev < 0 && ($File::Find::prune = 1); } Set the variable $File::Find::dont_use_nlink if you're using the AFS. finddepth() is just like find(), except that it does a depth-first search. Here's another interesting wanted() function. It will find all symbolic links that don't resolve:
sub wanted { -l and not -e and print "bogus link: $File::Find::name\n"; } File::Path--Create or Remove a Series of Directories
    use File::Path;

    mkpath(['/foo/bar/baz', 'blurfl/quux'], 1, 0711);
    rmtree(['/foo/bar/baz', 'blurfl/quux'], 1, 1);

The mkpath() function provides a convenient way to create directories, even if your mkdir (2) won't create more than one level of directory at a time. mkpath() takes three arguments:
It returns a list of all directories created, including intermediate directories, which are assumed to be delimited by the UNIX path separator, /. Similarly, the rmtree() function provides a convenient way to delete a subtree from the directory structure, much like the UNIX rm -r command. rmtree() takes three arguments:
rmtree() returns the number of files successfully deleted. Symbolic links are treated as ordinary files.

FileCache--Keep More Files Open Than the System Permits
use FileCache; cacheout $path; # open the file whose path name is $path print $path "stuff\n"; # print stuff to file given by $path The cacheout() subroutine makes sure that the file whose name is $path is created and accessible through the filehandle also named $path. It permits you to write to more files than your system allows to be open at once, performing the necessary opens and closes in the background. By preceding each file access with:
cacheout $path; you can be sure that the named file will be open and ready to do business. However, you do not need to invoke cacheout() between successive accesses to the same file. cacheout() does not create directories for you. If you use it to open an existing file that FileCache is seeing for the first time, the file will be truncated to zero length with no questions asked. (However, in its opening and closing of files in the background, cacheout() keeps track of which files it has opened before and does not overwrite them, but appends to them instead.) cacheout() checks the value of NOFILE in sys/param.h to determine the number of open files allowed. This value is incorrect on some systems, in which case you should set $FileCache::maxopen to be four less than the correct value for NOFILE. FileHandle--Supply Object Methods for Filehandles
    use FileHandle;

    $fh = new FileHandle;
    if ($fh->open("< file")) {
        print <$fh>;
        $fh->close;
    }

    $fh = new FileHandle "> file";
    if (defined $fh) {
        print $fh "bar\n";
        $fh->close;
    }

    $fh = new FileHandle "file", "r";
    if (defined $fh) {
        print <$fh>;
        undef $fh;   # automatically closes the file
    }

    $fh = new FileHandle "file", O_WRONLY|O_APPEND;
    if (defined $fh) {
        print $fh "stuff\n";
        undef $fh;   # automatically closes the file
    }

    $pos = $fh->getpos;
    $fh->setpos($pos);

    $fh->setvbuf($buffer_var, _IOLBF, 1024);

    ($readfh, $writefh) = FileHandle::pipe;

    autoflush STDOUT 1;
The following supported FileHandle methods are just front ends for the corresponding built-in Perl functions:
The following supported FileHandle methods correspond to Perl special variables:
Furthermore, for doing normal I/O you might need these methods:
Bugs

Due to backward compatibility, all filehandles resemble objects of class FileHandle, or actually classes derived from that class. But they aren't. Which means you can't derive your own class from FileHandle and inherit those methods. While it may look as though the filehandle methods corresponding to the built-in variables are unique to a particular filehandle, currently some of them are not, including the following:

    input_line_number()

GDBM_File--Tied Access to GDBM Library
    use GDBM_File;

    tie %hash, "GDBM_File", $filename, &GDBM_WRCREAT, 0644;
    # read/writes of %hash are now read/writes of $filename
    untie %hash;

GDBM_File is a module that allows Perl programs to make use of the facilities provided by the GNU gdbm library. If you intend to use this module, you should have a copy of the gdbm (3) manpage at hand. Most of the libgdbm.a functions are available as methods of the GDBM_File interface.

Availability

gdbm is available from any GNU archive. The master site is prep.ai.mit.edu, but you are strongly urged to use one of the many mirrors. You can obtain a list of mirror sites by issuing the command, finger fsf@prep.ai.mit.edu. A copy is also stored on CPAN:

See also

DB_File library module.

Getopt::Long--Extended Processing of Command-Line Options
    use Getopt::Long;
    $result = GetOptions(option-descriptions);

The Getopt::Long module implements an extended function called GetOptions(). This function retrieves and processes the command-line options with which your Perl program was invoked, based on the description of valid options that you provide. GetOptions() adheres to the POSIX syntax for command-line options, with GNU extensions. In general, this means that options have long names instead of single letters, and are introduced with a double hyphen --. (A single hyphen can also be used, but implies restrictions on functionality. See later in the chapter.) There is no bundling of command-line options, as was the case with the more traditional single-letter approach. For example, the UNIX ps (1) command can be given the command-line argument:

    -vax

which means the combination of -v, -a and -x. With the Getopt::Long syntax, -vax would be a single option. Command-line options can be used to set values. These values can be specified in one of two ways:

    --size 24
    --size=24

GetOptions() is called with a list of option descriptions, each of which consists of two elements: the option specifier and the option linkage. The option specifier defines the name of the option and, optionally, the value it can take. The option linkage is usually a reference to a variable that will be set when the option is used. For example, the following call to GetOptions():

    &GetOptions("size=i" => \$offset);

will accept a command-line option "size" that must have an integer value. With a command line of --size 24 this will cause the variable $offset to be assigned the value 24. Alternatively, the first argument to GetOptions may be a reference to a hash describing the linkage for the options. The following call is equivalent to the example above:
%optctl = (size => \$offset); &GetOptions(\%optctl, "size=i"); Linkage may be specified using either of the above methods, or both. The linkage specified in the argument list takes precedence over the linkage specified in the hash. The command-line options are implicitly taken from array @ARGV. Upon completion of GetOptions(), @ARGV will contain only the command-line arguments that were not options. (But see below for a way to process non-option arguments.) Each option specifier handed to GetOptions() designates the name of an option, possibly followed by an argument specifier. Values for argument specifiers are:
A lone hyphen - is considered an option; the corresponding option name is the empty string. A lone double hyphen -- terminates the processing of options and arguments. Any options following the double hyphen will remain in @ARGV when GetOptions() returns. If an argument specifier concludes with @ (as in =s@), then the option is treated as an array. That is, multiple invocations of the same option, each with a particular value, will result in the list of values being assigned to the option variable, which is an array. See the following section for an example.

Linkage specification

The linkage specifier is optional. If no linkage is explicitly specified but a hash reference is passed, GetOptions() will place the value in the hash. For example:
%optctl = (); &GetOptions (\%optctl, "size=i"); will perform the equivalent of the assignment:
$optctl{"size"} = 24; For array options, a reference to an anonymous array is generated. For example:
%optctl = (); &GetOptions (\%optctl, "sizes=i@"); with command-line arguments:
-sizes 24 -sizes 48 will perform the equivalent of the assignment:
    $optctl{"sizes"} = [24, 48];

If no linkage is explicitly specified and no hash reference is passed, GetOptions() will put the value in a global variable named after the option, prefixed by opt_. To yield a usable Perl variable, characters that are not part of the syntax for variables are translated to underscores. For example, --fpp-struct-return will set the variable $opt_fpp_struct_return. (Note that this variable resides in the namespace of the calling program, not necessarily main.) For example:
&GetOptions ("size=i", "sizes=i@"); with command line:
-size 10 -sizes 24 -sizes 48 will perform the equivalent of the assignments:
$opt_size = 10; @opt_sizes = (24, 48); A lone hyphen (-) is considered an option; the corresponding identifier is $opt_ . The linkage specifier can be a reference to a scalar, a reference to an array, or a reference to a subroutine:
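The three kinds of linkage can be sketched together as follows. This is only an illustration: the option names, the simulated command line assigned to @ARGV, and the $verbosity counter are all invented for the example.

```perl
use strict;
use warnings;
use Getopt::Long;

# Simulated command line; a real program would leave @ARGV alone.
local @ARGV = qw(--size 24 --name foo --name bar --verbose extra.txt);

my $size;            # filled in through a scalar reference
my @names;           # collects repeated values through an array reference
my $verbosity = 0;   # updated by the subroutine linkage

my $ok = GetOptions(
    "size=i"  => \$size,
    "name=s"  => \@names,
    # A subroutine linkage is called with the option name and its value.
    "verbose" => sub { my ($opt, $value) = @_; $verbosity += $value },
);

print "size=$size names=@names verbosity=$verbosity rest=@ARGV\n" if $ok;
```

Anything that is not an option, such as extra.txt here, is left in @ARGV for the program to handle itself.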
Aliases and abbreviations

The option specifier may actually include a "|"-separated list of option names:
    foo|bar|blech=s

In this example, foo is the true name of the option. If no linkage is specified, options -foo, -bar and -blech all will set $opt_foo. Options may be invoked as unique abbreviations, depending on configuration variable $Getopt::Long::autoabbrev.

Non-option callback routine

A special option specifier <> can be used to designate a subroutine to handle non-option arguments. For example:
    &GetOptions(..."<>", \&mysub...);

In this case GetOptions() will immediately call &mysub for every non-option it encounters in the options list. This subroutine gets the name of the non-option passed. This feature requires $Getopt::Long::order to have the value of the predefined and exported variable, $PERMUTE. See also the examples.

Option starters

On the command line, options can start with - (traditional), -- (POSIX), and + (GNU, now being phased out). The latter is not allowed if the environment variable POSIXLY_CORRECT has been defined. Options that start with -- may have an argument appended, following an equals sign (=). For example: --foo=bar.

Return value

A return status of 0 (false) indicates that the function detected one or more errors.

Configuration variables

The following variables can be set to change the default behavior of GetOptions():
Examples

If the option specifier is one:i (which takes an optional integer argument), then the following situations are handled:
-one -two # $opt_one = "", -two is next option -one -2 # $opt_one = -2 Also, assume specifiers foo=s and bar:s:
-bar -xxx # $opt_bar = "", -xxx is next option -foo -bar # $opt_foo = '-bar' -foo -- # $opt_foo = '--' In GNU or POSIX format, option names and values can be combined:
+foo=blech # $opt_foo = 'blech' --bar= # $opt_bar = "" --bar=-- # $opt_bar = '--' Example using variable references:
$ret = &GetOptions ('foo=s', \$foo, 'bar=i', 'ar=s', \@ar); With command-line options -foo blech -bar 24 -ar xx -ar yy this will result in:
    $foo = 'blech'
    $opt_bar = 24
    @ar = ('xx', 'yy')

Example of using the <> option specifier:
@ARGV = qw(-foo 1 bar -foo 2 blech); &GetOptions("foo=i", \$myfoo, "<>", \&mysub); Results:
    &mysub("bar") will be called (with $myfoo being 1)
    &mysub("blech") will be called (with $myfoo being 2)

Compare this with:
@ARGV = qw(-foo 1 bar -foo 2 blech); &GetOptions("foo=i", \$myfoo); This will leave the non-options in @ARGV:
$myfoo becomes 2 @ARGV becomes qw(bar blech) If you're using the use strict pragma, which requires you to employ only lexical variables or else globals that are fully declared, you will have to use the double-colon package delimiter or else the use vars pragma. For example:
    use strict;
    use vars qw($opt_rows $opt_cols);
    use Getopt::Long;

Getopt::Std--Process Single-Character Options with Option Clustering
use Getopt::Std; getopt('oDI'); # -o, -D & -I take arg. Sets opt_* as a side effect. getopts('oif:'); # -o & -i are boolean flags, -f takes an argument. # Sets opt_* as a side effect. The getopt() and getopts() functions give your program simple mechanisms for processing single-character options. These options can be clustered (for example, -bdLc might be interpreted as four single-character options), and you can specify individual options that require an accompanying argument. When you invoke getopt() or getopts(), you pass along information about the kinds of options your program expects. These functions then analyze @ARGV, extract information about the options, and return this information to your program in a set of variables. The processing of @ARGV stops when an argument without a leading "-" is encountered, if that argument is not associated with a preceding option. Otherwise, @ARGV is processed to its end and left empty. For each option in your program's invocation, both getopt() and getopts() define a variable $opt_x where x is the option name. If the option takes an argument, then the argument is read and assigned to $opt_x as its value; otherwise, a value of 1 is assigned to the variable. Invoke getopt() with one argument, which should contain all options that require a following argument. For example:
getopt('dV'); If your program is then invoked as:
myscr -bfd January -V 10.4 then these variables will be set in the program:
$opt_b = 1; $opt_f = 1; $opt_d = "January"; $opt_V = 10.4; Space between an option and its following argument is unnecessary. The previous command line could have been given this way:
myscr -bfdJanuary -V10.4 In general, your program can be invoked with options given in any order. All options not "declared" in the invocation of getopt() are assumed to be without accompanying argument. Where getopt() allows any single-character option, getopts() allows only those options you declare explicitly. For example, this invocation:
getopts('a:bc:'); legitimizes only the options -a, -b, and -c. The colon following the a and c means that these two options require an accompanying argument; b is not allowed to have an argument. Accordingly, here are some ways to invoke the program:
myscr -abc # WRONG unless bc is really the argument to -a myscr -a -bc # WRONG, with same qualification myscr -a foo -bc bar # $opt_a = "foo"; $opt_b = 1; $opt_c = "bar" myscr -bafoo -cbar # same as previous getopts() returns false if it encounters errors during option processing. However, it continues to process arguments and assign values as best it can to $opt_x variables. You should always check for errors before assuming that the variables hold meaningful values. getopt() does not return a meaningful value. Remember that both getopt() and getopts() halt argument processing upon reading an argument (without leading "-") where none was called for. This is not considered an error. So a user might invoke your program with invalid arguments, without your being notified of the fact. However, you can always check to see whether @ARGV has been completely emptied or not--that is, whether all arguments have been processed. If you're using the use strict pragma, which requires you to employ only lexical variables or else globals that are fully declared, you will have to use the double-colon package delimiter or else the use vars pragma. For example:
    use strict;
    use vars qw($opt_o $opt_i $opt_D);
    use Getopt::Std;

I18N::Collate--Compare 8-bit Scalar Data According to the Current Locale
use I18N::Collate; setlocale(LC_COLLATE, $locale); # uses POSIX::setlocale $s1 = new I18N::Collate "scalar_data_1"; $s2 = new I18N::Collate "scalar_data_2"; This module provides you with objects that can be collated (ordered) according to your national character set, provided that Perl's POSIX module and the POSIX setlocale (3) and strxfrm (3) functions are available on your system. $locale in the setlocale() invocation shown above must be an argument acceptable to setlocale (3) on your system. See the setlocale (3) manpage for further information. Available locales depend upon your operating system. Here is an example of collation within the standard `C' locale:
    use I18N::Collate;
    setlocale(LC_COLLATE, 'C');
    $s1 = new I18N::Collate "Hello";
    $s2 = new I18N::Collate "Goodbye";
    # following line prints "Goodbye comes before Hello"
    print "$$s2 comes before $$s1" if $s2 le $s1;

The objects returned by the new() method are references. You can get at their values by dereferencing them--for example, $$s1 and $$s2. However, Perl's built-in comparison operators are overloaded by I18N::Collate, so that they operate on the objects returned by new() without the necessity of dereference. The print line above dereferences $s1 and $s2 to access their values directly, but does not dereference the variables passed to the le operator. The comparison operators you can use in this way are the following:
< <= > >= == != <=> lt le gt ge eq ne cmp I18N::Collate uses POSIX::setlocale() and POSIX::strxfrm() to perform the collation. Unlike strxfrm(), however, I18N::Collate handles embedded NULL characters gracefully. To determine which locales are available with your operating system, check whether the command:
    locale -a

lists them. You can also check the locale (5) or nlsinfo manpages, or look at the filenames within one of these directories (or their subdirectories): /usr/lib/nls, /usr/share/lib/locale, or /etc/locale. Not all locales your vendor supports are necessarily installed. Please consult your operating system's documentation and possibly your local system administrator.

integer--Do Arithmetic in Integer Instead of Double
use integer; $x = 10/3; # $x is now 3, not 3.33333333333333333 This module tells the compiler to use integer operations from here to the end of the enclosing block. On many machines, this doesn't matter a great deal for most computations, but on those without floating point hardware, it can make a big difference. This pragma does not automatically cast everything to an integer; it only forces integer operations on arithmetic. For example:
    use integer;
    print sin(3);     # 0.141120008059867
    print sin(3) + 4; # 4

You can turn off the integer pragma within an inner block by using the no integer directive.

IPC::Open2--Open a Process for Both Reading and Writing
use IPC::Open2; # with named filehandles $pid = open2(\*RDR, \*WTR, $cmd_with_args); $pid = open2(\*RDR, \*WTR, $cmd, "arg1", "arg2", ...);
# with object-oriented handles use FileHandle; my($rdr, $wtr) = (FileHandle->new, FileHandle->new); $pid = open2($rdr, $wtr, $cmd_with_args); The open2() function forks a child process to execute the specified command. The first two arguments represent filehandles, one way or another. They can be FileHandle objects, or they can be references to typeglobs, which can either be explicitly named as above, or generated by the Symbol package, as in the example below. Whichever you choose, they represent handles through which your program can read from the command's standard output and write to the command's standard input, respectively. open2() differs from Perl's built-in open function in that it allows your program to communicate in both directions with the child process. open2() returns the process ID of the child process. On failure it reports a fatal error. Here's a simple use of open2() by which you can give the program user interactive access to the bc (1) command. (bc is an arbitrary-precision arithmetic package.) In this case we use the Symbol module to produce "anonymous" symbols:
use IPC::Open2; use Symbol; $WTR = gensym(); # get a reference to a typeglob $RDR = gensym(); # and another one $pid = open2($RDR, $WTR, 'bc'); while (<STDIN>) { # read commands from user print $WTR $_; # write a command to bc(1) $line = <$RDR>; # read the output of bc(1) print STDOUT "$line"; # send the output to the user } open2() establishes unbuffered output for $WTR. However, it cannot control buffering of output from the designated command. Therefore, be sure to heed the following warning.
It is extremely easy for your program to hang while waiting to read the next line of output from the command. In the example just shown, bc is known to read and write one line at a time, so it is safe. But utilities like sort (1) that read their entire input stream before offering any output will cause a deadlock when used in the manner we have illustrated. You might do something like this instead:
$pid = open2($RDR, $WTR, 'sort'); while (<STDIN>) { print $WTR $_; } close($WTR); # finish sending all output to sort(1) while (<$RDR>) { # now read the output of sort(1) print STDOUT "$_"; } More generally, you may have to use select to determine which file descriptors are ready to read, and then sysread for the actual reading.
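Here is a minimal sketch of the select-then-sysread approach, using the standard IO::Select module as a wrapper around select. It assumes a UNIX-like system where sort (1) is on the path, and the three input words are invented. With a single handle the benefit is small; the same loop becomes useful once several handles are registered with the selector.

```perl
use strict;
use warnings;
use IPC::Open2;
use IO::Select;

# First handle reads the command's output, second writes to its input.
my $pid = open2(my $rdr, my $wtr, 'sort');

print {$wtr} "pear\napple\nfig\n";
close($wtr);                         # sort(1) needs EOF before it emits anything

my $sel    = IO::Select->new($rdr);
my $output = '';
while ($sel->count) {
    # can_read() blocks until at least one registered handle is readable.
    for my $fh ($sel->can_read) {
        my $n = sysread($fh, my $buf, 4096);
        die "sysread failed: $!" unless defined $n;
        if ($n == 0) {               # end of file: stop watching this handle
            $sel->remove($fh);
            next;
        }
        $output .= $buf;
    }
}
waitpid($pid, 0);                    # reap the child process
print $output;
```

sysread is used instead of the <$rdr> operator because mixing buffered reads with select can leave data sitting unseen in Perl's own buffer.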
See also

The IPC::Open3 module shows an alternative that handles STDERR as well.

IPC::Open3--Open a Process for Reading, Writing, and Error Handling
use IPC::Open3; $pid = open3($WTR, $RDR, $ERR, $cmd_with_args); $pid = open3($WTR, $RDR, $ERR, $cmd, "arg1", "arg2", ...); IPC::Open3 works like IPC::Open2, with the following differences:
Warnings given for IPC::Open2 regarding possible program hangs apply to IPC::Open3 as well.

lib--Manipulate @INC at Compile-Time
    use lib LIST;
    no lib LIST;

This module simplifies the manipulation of Perl's special @INC variable at compile-time. It is used to add extra directories to Perl's search path so that later use or require statements will find modules not located along Perl's default search path.

Adding directories

Directories itemized in LIST are added to the start of the Perl search path. Saying:
use lib LIST; is almost the same as saying:
BEGIN { unshift(@INC, LIST ) } The difference is that, for each directory in LIST (called $dir here), the lib module also checks to see whether a directory called $dir/$archname/auto exists, where $archname is derived from Perl's configuration information:
    use Config;
    $archname = $Config{'archname'};

If so, the $dir/$archname directory is assumed to be an architecture-specific directory and is added to @INC in front of $dir. If LIST includes both $dir and $dir/$archname, then $dir/$archname will be added to @INC twice (assuming $dir/$archname/auto exists).

Deleting directories

You should normally only add directories to @INC. If you need to delete directories from @INC, take care to delete only those you yourself added. Otherwise, be certain that the directories you delete are not needed by other modules directly or indirectly invoked by your script. Other modules may have added directories they need for correct operation. By default the statement:
no lib LIST deletes the first instance of each named directory from @INC. To delete multiple instances of the same name from @INC you can specify the name multiple times. To delete all instances of all the specified names from @INC you can specify :ALL as the first parameter of LIST. For example:
no lib qw(:ALL .);
For each directory in LIST (called $dir here) the lib module also checks to see whether a directory called $dir/$archname/auto exists. If so, the $dir/$archname directory is assumed to be a corresponding architecture-specific directory and is also deleted from @INC. If LIST includes both $dir and $dir/$archname then $dir/$archname will be deleted from @INC twice (assuming $dir/$archname/auto exists).
Restoring the original directory list
When the lib module is first loaded, it records the current value of @INC in an array @lib::ORIG_INC. To restore @INC to that value you can say:
@INC = @lib::ORIG_INC;
See also
The AddINC module (not in the standard Perl library, but available from CPAN) deals with paths relative to the source file.
Math::BigFloat--Arbitrary-Length, Floating-Point Math Package
use Math::BigFloat; $f = Math::BigFloat->new($string); # NSTR is a number string; SCALE is an integer value. # In all following cases $f remains unchanged. # All methods except fcmp() return a number string. $f->fadd(NSTR); # return sum of NSTR and $f $f->fsub(NSTR); # return $f minus NSTR $f->fmul(NSTR); # return $f multiplied by NSTR $f->fdiv(NSTR[,SCALE]); # return $f divided by NSTR to SCALE places $f->fneg(); # return negative of $f $f->fabs(); # return absolute value of $f $f->fcmp(NSTR); # compare $f to NSTR; see below for return value $f->fround(SCALE); # return rounded value of $f to SCALE digits $f->ffround(SCALE); # return rounded value of $f at SCALEth place $f->fnorm(); # return normalization of $f $f->fsqrt([SCALE]); # return sqrt of $f to SCALE places This module allows you to use floating-point numbers of arbitrary length. For example:
$float = new Math::BigFloat "2.123123123123123123123123123123123"; Number strings (NSTRs) have the form, /[+-]\d*\.?\d*E[+-]\d+/. Embedded white space is ignored, so that the number strings used in the following two lines are identical:
$f = Math::BigFloat->new("-20.0 0732"); $g = $f->fmul("-20.00732"); The return value NaN indicates either that an input parameter was "Not a Number", or else that you tried to divide by zero or take the square root of a negative number. The fcmp() method returns -1, 0, or 1 depending on whether $f is less than, equal to, or greater than the number string given as an argument. If the number string is undefined or null, the undefined value is returned. If SCALE is unspecified, division is computed to the number of digits given by:
max($div_scale, length(dividend)+length(divisor)) A similar default scale value is computed for square roots. When you use this module, Perl's basic math operations are overloaded with routines from Math::BigFloat. Therefore, you don't have to employ the methods shown above to multiply, divide, and so on. You can rely instead on the usual operators. Given this code:
$f = Math::BigFloat->new("20.00732"); $g = Math::BigFloat->new("1.7"); the following six lines all yield the corresponding values for $h:
$h = 20.00732 * 1.7; # 34.012444 (ordinary math--$h is not an object)
$h = $f * $g; # "34.012444" ($h is now a BigFloat object)
$h = $f * 1.7; # "34.012444" ($h is now a BigFloat object)
$h = 20.00732 * $g; # "34.012444" ($h is now a BigFloat object)
$h = $f->fmul($g); # "+34012444E-6" ($h is just a number string)
$h = $f->fmul(1.7); # "+34012444E-6" ($h is just a number string)
Math::BigInt--Arbitrary-Length Integer Math Package
use Math::BigInt; $i = Math::BigInt->new($string); # BINT is a big integer string; in all following cases $i remains unchanged. # All methods except bcmp() return a big integer string, or strings. $i->bneg; # return negative of $i $i->babs # return absolute value of $i $i->bcmp(BINT) # compare $i to BINT; see below for return value $i->badd(BINT) # return sum of BINT and $i $i->bsub(BINT) # return $i minus BINT $i->bmul(BINT) # return $i multiplied by BINT $i->bdiv(BINT) # return $i divided by BINT; see below for return value $i->bmod(BINT) # return $i modulus BINT $i->bgcd(BINT) # return greatest common divisor of $i and BINT $i->bnorm # return normalization of $i This module allows you to use integers of arbitrary length. Integer strings (BINTs) have the form /^\s*[+-]?[\d\s]+$/. Embedded whitespace is ignored. Output values are always in the canonical form: /^[+-]\d+$/ . For example:
'+0' # canonical zero value
' -123 123 123' # canonical value: '-123123123'
'1 23 456 7890' # canonical value: '+1234567890'
The return value NaN results when an input argument is not a number, or when a divide by zero is attempted. The bcmp() method returns -1, 0, or 1 depending on whether $i is less than, equal to, or greater than the number string given as an argument. If the number string is undefined or null, the undefined value is returned. In a list context the bdiv() method returns a two-element array containing the quotient of the division and the remainder; in a scalar context only the quotient is returned. When you use this module, Perl's basic math operations are overloaded with routines from Math::BigInt. Therefore, you don't have to employ the methods shown above to multiply, divide, and so on. You can rely instead on the usual operators. Given this code:
$a = Math::BigInt->new("42 000 000 000 000"); $b = Math::BigInt->new("-111111"); the following five lines yield these string values for $c:
$c = 42000000000000 - -111111; # 42000000111111; ordinary math--$c is a double $c = $a - $b; # "+42000000111111"; $c is now a BigInt object $c = $a - -111111; # "+42000000111111"; $c is now a BigInt object $c = $a->bsub($b); # "+42000000111111"; $c is just a string $c = $a->bsub(-111111); # "+42000000111111"; $c is just a string Math::Complex--Complex Numbers Package
use Math::Complex; $cnum = new Math::Complex; When you use this module, complex numbers declared as:
$cnum = Math::Complex->new(1, 1); can be manipulated with overloaded math operators. The operators:
+ - * / neg ~ abs cos sin exp sqrt are supported, and return references to new objects. Also,
"" (stringify) is available to convert complex numbers to strings. In addition, the methods:
Re Im arg are available. Given a complex number, $cnum:
$cnum = Math::Complex->new($x, $y);
then $cnum->Re() returns $x, $cnum->Im() returns $y, and $cnum->arg() returns atan2($y, $x). sqrt(), which should return two roots, returns only one.
NDBM_File--Tied Access to NDBM Files
use Fcntl;
use NDBM_File;
tie(%hash, 'NDBM_File', 'Op.dbmx', O_RDWR|O_CREAT, 0644);
# reads/writes of %hash are now reads/writes of the file Op.dbmx.pag
untie %hash;
See Perl's built-in tie function. Also see under DB_File in this chapter for a description of a closely related module.
Net::Ping--Check Whether a Host Is Online
use Net::Ping; $hostname = 'elvis'; # host to check $timeout = 10; # how long to wait for a response print "elvis is alive\n" if pingecho($hostname, $timeout); pingecho() uses a TCP echo (not an ICMP one) to determine whether a remote host is reachable. This is usually adequate to tell whether a remote host is available to rsh (1), ftp (1), or telnet (1). The parameters for pingecho() are:
pingecho() uses alarm to implement the timeout, so don't set another alarm while you are using it.
ODBM_File--Tied Access to ODBM Files
use Fcntl;
use ODBM_File;
tie(%hash, 'ODBM_File', 'Op.dbmx', O_RDWR|O_CREAT, 0644);
# reads/writes of %hash are now reads/writes of the file Op.dbmx
untie %hash;
See Perl's built-in tie function. Also see under DB_File in this chapter for a description of a closely related module.
overload--Overload Perl's Mathematical Operations
# In the SomeThing module: package SomeThing; use overload '+' => \&myadd, '-' => \&mysub; # In your other code: use SomeThing; $a = SomeThing->new(57); $b=5+$a; if (overload::Overloaded $b) {...} # is $b subject to overloading? $strval = overload::StrVal $b; Caveat Scriptor: This interface is the subject of ongoing research. Feel free to play with it, but don't be too surprised if the interface changes subtly (or not so subtly) as it is developed further. If you rely on it for a mission-critical application, please be sure to write some good regression tests. (Or perhaps in this case we should call them "progression" tests.) This module allows you to substitute class methods or your own subroutines for standard Perl operators. For example, the code:
package Number; use overload "+" => \&add, "*=" => "muas"; declares function add() for addition, and method muas() in the Number class (or one of its base classes) for the assignment form *= of multiplication. Arguments to use overload come in key/value pairs. Legal values are values permitted inside a &{ ... } call, so the name of a subroutine, a reference to a subroutine, or an anonymous subroutine will all work. Legal keys are listed below. The subroutine add() will be called to execute $a+$b if $a is a reference to an object blessed into the package Number, or if $a is not an object from a package with overloaded addition, but $b is a reference to a Number. It can also be called in other situations, like $a+=7, or $a++. See the section on "Autogeneration". Calling conventions for binary operationsThe functions specified with the use overload directive are typically called with three arguments. (See the "No Method" section later in this chapter for the four-argument case.) If the corresponding operation is binary, then the first two arguments are the two arguments of the operation. However, due to general object-calling conventions, the first argument should always be an object in the package, so in the situation of 7+$a, the order of the arguments gets interchanged before the method is called. It probably does not matter when implementing the addition method, but whether the arguments are reversed is vital to the subtraction method. The method can query this information by examining the third argument, which can take three different values:
Calling conventions for unary operations
Unary operations are considered binary operations with the second argument being undef. Thus the function that overloads {"++"} is called with arguments ($a, undef, '') when $a++ is executed.
Overloadable operations
The following operations can be specified with use overload:
Three keys are recognized by Perl that are not covered by the above descriptions: "nomethod", "fallback", and "=".
No method
"nomethod" should be followed by a reference to a function of four parameters. If defined, it is called when the overloading mechanism cannot find a method for some operation. The first three arguments of this function coincide with the arguments for the corresponding method if it were found; the fourth argument is the symbol corresponding to the missing method. If several methods are tried, the last one is used. For example, 1-$a can be equivalent to:
&nomethodMethod($a, 1, 1, "-")
if the pair "nomethod" => "nomethodMethod" was specified in the use overload directive. If some operation cannot be resolved and there is no function assigned to "nomethod", then an exception will be raised via die unless "fallback" was specified as a key in a use overload directive.
Fallback
The "fallback" key governs what to do if a method for a particular operation is not found. Three different cases are possible depending on the value of "fallback":
Copy constructor
The value for "=" is a reference to a function with three arguments; that is, it looks like the other values in use overload. However, it does not overload the Perl assignment operator. This would rub Camel hair the wrong way. This operation is called when a mutator is applied to a reference that shares its object with some other reference, such as:
$a=$b; $a++;
In order to change $a but not $b, a copy of $$a is made, and $a is assigned a reference to this new object. This operation is done during execution of the $a++, and not during the assignment (so before the increment $$a coincides with $$b). This is only done if ++ is expressed via a method for "++" or "+=". Note that if this operation is expressed via "+" (a nonmutator):
$a=$b; $a=$a+1; then $a does not reference a new copy of $$a, since $$a does not appear as an lvalue when the above code is executed. If the copy constructor is required during the execution of some mutator, but a method for "=" was not specified, it can be autogenerated as a string copy if the object is a plain scalar. As an example, the actually executed code for:
$a=$b;
# Something else which does not modify $a or $b...
++$a;
may be:
$a=$b;
# Something else which does not modify $a or $b...
$a = $a->clone(undef, "");
$a->incr(undef, "");
This assumes $b is subject to overloading, "++" was overloaded with \&incr, and "=" was overloaded with \&clone.
Autogeneration
If a method for an operation is not found, and the value for "fallback" is true or undefined, Perl tries to autogenerate a substitute method for the missing operation based on the defined operations. Autogenerated method substitutions are possible for the following operations:
One restriction for the comparison operation is that even if, for example, cmp returns a blessed reference, the autogenerated lt function will produce only a standard logical value based on the numerical value of the result of cmp. In particular, a working numeric conversion is needed in this case (possibly expressed in terms of other conversions). Similarly, .= and x= operators lose their overloaded properties if the string conversion substitution is applied. When you chop an object that is subject to overloaded operations, the object is promoted to a string and its overloading properties are lost. The same can happen with other operations as well.
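The calling conventions and autogeneration rules described above can be seen in a small sketch. The class layout here (a hash with a val field, and the method names add and subtract) is hypothetical and chosen only for illustration. Only "+", "-", and string conversion are overloaded, and "fallback" is left undefined, so += and ++ are autogenerated from the addition method.

```perl
package Number;
use strict;
use warnings;
use overload
    '+'  => \&add,
    '-'  => \&subtract,
    '""' => sub { $_[0]{val} };       # string conversion

sub new {
    my ($class, $val) = @_;
    return bless { val => $val }, $class;
}

sub add {
    my ($x, $y) = @_;                 # argument order is irrelevant for addition
    $y = $y->{val} if ref $y;
    return Number->new($x->{val} + $y);
}

sub subtract {
    my ($x, $y, $swapped) = @_;       # third argument is true for 7 - $obj
    $y = $y->{val} if ref $y;
    my $diff = $x->{val} - $y;
    return Number->new($swapped ? -$diff : $diff);
}

package main;

my $n  = Number->new(10);
my $d1 = $n - 7;                      # subtract($n, 7, false)
my $d2 = 7 - $n;                      # subtract($n, 7, true): arguments swapped
my $m  = $n;                          # $m and $n share one object for now
$m += 5;                              # autogenerated from "+"; $n is untouched
$m++;                                 # also autogenerated from "+"

print "$d1 $d2 $m $n\n";              # prints "3 -3 16 10"
```

Because the increments here are expressed through "+" (a nonmutator), each one builds a new object and no copy constructor is required.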
Run-time overloading
Since all use directives are executed at compile-time, the only way to change overloading during run-time is:
eval 'use overload "+" => \&addmethod'; You can also say:
eval 'no overload "+", "--", "<="';
although the use of these constructs during run-time is questionable.
Public functions
The overload module provides the following public functions:
Diagnostics
When Perl is run with the -Do switch or its equivalent, overloading induces diagnostic messages.
Bugs
Because it is used for overloading, the per-package associative array %OVERLOAD now has a special meaning in Perl. Overloading is not yet inherited via the @ISA tree, though individual methods may be.
POSIX--Perl Interface to IEEE Std 1003.1
use POSIX; # import all symbols use POSIX qw(setsid); # import one symbol use POSIX qw(:errno_h :fcntl_h); # import sets of symbols printf "EINTR is %d\n", EINTR; $sess_id = POSIX::setsid(); $fd = POSIX::open($path, O_CREAT|O_EXCL|O_WRONLY, 0644); # note: $fd is a filedescriptor, *NOT* a filehandle The POSIX module permits you to access all (or nearly all) the standard POSIX 1003.1 identifiers. Many of these identifiers have been given Perl-ish interfaces. This description gives a condensed list of the features available in the POSIX module. Consult your operating system's manpages for general information on most features. Consult the appropriate Perl built-in function whenever a POSIX routine is noted as being identical to the function. The "Classes" section later in this chapter describes some classes for signal objects, TTY objects, and other miscellaneous objects. The "Functions" section later in this chapter describes POSIX functions from the 1003.1 specification. The remaining sections list various constants and macros in an organization that roughly follows IEEE Std 1003.1b-1993.
A few functions are not implemented because they are C-specific.[4] If you attempt to call one of these functions, it will print a message telling you that it isn't implemented, and will suggest using the Perl equivalent, should one exist. For example, trying to access the setjmp() call will elicit the message: "setjmp() is C-specific: use eval {} instead".
Furthermore, some vendors will claim 1003.1 compliance without passing the POSIX Compliance Test Suites (PCTS). For example, one vendor may not define EDEADLK, or may incorrectly define the semantics of the errno values set by open (2). Perl does not attempt to verify POSIX compliance. That means you can currently say "use POSIX" successfully, and then later in your program find that your vendor has been lax and there's no usable ICANON macro after all. This could be construed to be a bug. Whose bug, we won't venture to guess.
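As a minimal sketch of the descriptor-based calls shown in the synopsis, the following creates a file with POSIX::open, writes to it through the raw file descriptor, and then checks the errno value set by a second exclusive create. The temporary directory (from the standard File::Temp module) and the file name demo.txt are assumptions made for the example, and it presumes a system that defines EEXIST.

```perl
use strict;
use warnings;
use POSIX qw(:fcntl_h :errno_h);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $path = "$dir/demo.txt";

# POSIX::open returns a file descriptor (a small integer), not a filehandle.
my $fd = POSIX::open($path, O_CREAT|O_EXCL|O_WRONLY, 0644);
defined $fd or die "POSIX::open failed: $!";
POSIX::write($fd, "hello\n", 6);      # write 6 bytes through the descriptor
POSIX::close($fd);

# A second exclusive create must fail, leaving EEXIST in $!.
my $again = POSIX::open($path, O_CREAT|O_EXCL|O_WRONLY, 0644);
my $errno = $! + 0;                   # numeric value of errno

# Read the file back with an ordinary Perl filehandle.
open(my $in, '<', $path) or die "cannot reopen $path: $!";
my $line = <$in>;
close($in);

print "wrote: $line";
print "second open failed with EEXIST\n" if !defined $again && $errno == EEXIST;
```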
Classes
POSIX::SigAction
POSIX::SigSet
POSIX::Termios
While these constants are associated with the Termios class, note that they are actually symbols in the POSIX package. Here's an example of a complete program for getting unbuffered, single-character input on a POSIX system:
#!/usr/bin/perl -w
use strict;
$| = 1;
for (1..4) {
    my $got;
    print "gimme: ";
    $got = getone();
    print "--> $got\n";
}
exit;

BEGIN {
    use POSIX qw(:termios_h);
    my ($term, $oterm, $echo, $noecho, $fd_stdin);
    $fd_stdin = fileno(STDIN);
    $term = POSIX::Termios->new();
    $term->getattr($fd_stdin);
    $oterm = $term->getlflag();
    $echo = ECHO | ECHOK | ICANON;
    $noecho = $oterm & ~$echo;
    sub cbreak {
        $term->setlflag($noecho);
        $term->setcc(VTIME, 1);
        $term->setattr($fd_stdin, TCSANOW);
    }
    sub cooked {
        $term->setlflag($oterm);
        $term->setcc(VTIME, 0);
        $term->setattr($fd_stdin, TCSANOW);
    }
    sub getone {
        my $key = "";
        cbreak();
        sysread(STDIN, $key, 1);
        cooked();
        return $key;
    }
}
END { cooked() }
Functions
Pathname constants
POSIX constants
System configuration
Error constants
File control constants
Floating-point constants
Limit constants
Locale constants
Math constants
HUGE_VAL
Signal constants
Stat constants
Stat macros
Stdlib constants
Stdio constants
Time constants
Unistd constants
Wait constants
Wait macros
Pod::Text--Convert POD Data to Formatted ASCII Text
use Pod::Text; pod2text("perlfunc.pod", *filehandle); # send formatted output to file $text = pod2text("perlfunc.pod"); # assign formatted output to $text Pod::Text converts documentation in the POD format (such as can be found throughout the Perl distribution) into formatted ASCII text. Termcap is optionally supported for boldface/underline, and can be enabled with:
$Pod::Text::termcap=1
If termcap is not enabled, backspaces are used to simulate bold and underlined text. The pod2text() subroutine can take one or two arguments. The first is the name of a file to read the POD from, or "<&STDIN" to read from STDIN. The second argument, if provided, is a filehandle glob where output should be sent. (Use *STDOUT to write to STDOUT.) A separate pod2text program is included as part of the standard Perl distribution. Primarily a wrapper for Pod::Text, it can be invoked this way:
pod2text < input.pod
Safe--Create Safe Namespaces for Evaluating Perl Code
use Safe; $cpt = new Safe; # create a new safe compartment The Safe extension module allows the creation of compartments in which untrusted Perl code can be evaluated. Each compartment provides a new namespace and has an associated operator mask. The root of the namespace (that is, main::) is changed to a different package, and code evaluated in the compartment cannot refer to variables outside this namespace, even with run-time glob lookups and other tricks. Code that is compiled outside the compartment can choose to place variables into (or share variables with) the compartment's namespace, and only that data will be visible to code evaluated in the compartment. By default, the only variables shared with compartments are the underscore variables $_ and @_ (and, technically, the much less frequently used %_, the _ filehandle and so on). This is because otherwise Perl operators that default to $_ would not work and neither would the assignment of arguments to @_ on subroutine entry. Each compartment has an associated operator mask with which you can exclude particular Perl operators from the compartment. (The mask syntax is explained below.) Recall that Perl code is compiled into an internal format before execution. Evaluating Perl code (for example, via eval STRING or do FILE) causes the code to be compiled into an internal format and then, provided there was no error in the compilation, executed. Code evaluated in a compartment is compiled subject to the compartment's operator mask. Attempting to evaluate compartmentalized code that contains a masked operator will cause the compilation to fail with an error. The code will not be executed. By default, the operator mask for a newly created compartment masks out all operations that give access to the system in some sense. This includes masking off operators such as system, open, chown, and shmget, but operators such as print, sysread, and <FILEHANDLE> are not masked off. 
These file operators are allowed since, in order for the code in the compartment to have access to a filehandle, the code outside the compartment must have explicitly placed the filehandle variable inside the compartment. Since it is only at the compilation stage that the operator mask applies, controlled access to potentially unsafe operations can be achieved by having a handle to a wrapper subroutine (written outside the compartment) placed into the compartment. For example:
$cpt = new Safe;
sub wrapper {
    # vet arguments and perform potentially unsafe operations
}
$cpt->share('&wrapper'); # see share method below
An operator mask exists at user-level as a string of bytes of length MAXO, each of which is either 0x00 or 0x01. Here, MAXO is the number of operators in the current version of Perl. The subroutine MAXO (available for export by package Safe) returns the number of operators in the currently running Perl executable. The presence of a 0x01 byte at offset n of the string indicates that operator number n should be masked (that is, disallowed). The Safe extension makes available routines for converting from operator names to operator numbers (and vice versa) and for converting from a list of operator names to the corresponding mask (and vice versa).
Methods in class Safe
To create a new compartment, use:
$cpt = new Safe NAMESPACE, MASK; where NAMESPACE is the root namespace to use for the compartment (defaults to Safe::Root000000000, auto-incremented for each new compartment). MASK is the operator mask to use. Both arguments are optional. The following methods can then be used on the compartment object returned by the above constructor. The object argument is implicit in each case.
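The wrapper technique described earlier can be sketched end to end as follows. This is an illustration under stated assumptions rather than a recommended security setup: it relies on the default operator mask and on the reval and share methods of the Safe class, and the wrapper body (which just doubles a vetted number) is a stand-in for a real, carefully checked operation.

```perl
use strict;
use warnings;
use Safe;

# Compiled outside the compartment, so the operator mask does not apply here.
sub wrapper {
    my ($n) = @_;
    die "argument out of range\n" unless defined $n && $n >= 0 && $n < 1000;
    return 2 * $n;                    # stand-in for a vetted unsafe operation
}

my $cpt = Safe->new;                  # default namespace and default mask
$cpt->share('&wrapper');              # make wrapper() visible inside

# Code in the compartment may call the shared wrapper...
my $ok = $cpt->reval('wrapper(21) + 0');

# ...but a masked operator makes compilation fail; nothing is executed.
my $bad = $cpt->reval('system("echo unsafe")');
my $err = $@;

print "wrapper result: $ok\n";
print "blocked: $err" if !defined $bad;
```

The error text left in $@ names the trapped operator, which is handy when deciding whether a mask is too strict for the code you intend to run.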
Subroutines in package Safe
The Safe package contains subroutines for manipulating operator names and operator masks. All are available for export by the package. The canonical list of operator names is contained in the array op_name defined and initialized in file opcode.h of the Perl source distribution.
SDBM_File--Tied Access to SDBM Files
use Fcntl;
use SDBM_File;
tie(%hash, 'SDBM_File', 'Op.dbmx', O_RDWR|O_CREAT, 0644);
# reads/writes of %hash are now reads/writes of the file Op.dbmx.pag
untie %hash;
See Perl's built-in tie function. Also see the DB_File module in this chapter for a description of a closely related module.
Search::Dict--Search for Key in Dictionary File
use Search::Dict; look *FILEHANDLE, $key, $dict, $fold; The look() routine sets the file position in FILEHANDLE to be the first line greater than or equal (stringwise) to $key. It returns the new file position, or -1 if an error occurs. If $dict is true, the search is in dictionary order (ignoring everything but word characters and whitespace). If $fold is true, then case is ignored. The file must be sorted into the appropriate order, using the -d and -f flags of UNIX sort (1), or the equivalent command on non-UNIX machines. Unpredictable results will otherwise ensue. SelectSaver--Save and Restore Selected Filehandle
use SelectSaver;
select $fh_old;
{
    my $saver = new SelectSaver($fh_new); # selects $fh_new
}
# block ends; object pointed to by "my" $saver is destroyed
# previous handle, $fh_old is now selected

# alternative invocation, without filehandle argument
my $saver = new SelectSaver; # selected filehandle remains unchanged
A SelectSaver object contains a reference to the filehandle that was selected when the object was created. If its new() method is given a filehandle as an argument, then that filehandle is selected; otherwise, the selected filehandle remains unchanged. When a SelectSaver object is destroyed, the filehandle that was selected immediately prior to the object's creation is re-selected.
SelfLoader--Load Functions Only on Demand
package GoodStuff;
use SelfLoader;
[initializing code]
__DATA__
sub {...};
This module is used for delayed loading of Perl functions that (unlike AutoLoader functions) are packaged within your script file. This gives the appearance of faster loading. In the example above, SelfLoader tells its user (GoodStuff) that functions in the GoodStuff package are to be autoloaded from after the __DATA__ token. The __DATA__ token tells Perl that the code for compilation is finished. Everything after the __DATA__ token is available for reading via the filehandle GoodStuff::DATA, where GoodStuff is the name of the current package when the __DATA__ token is reached. This token works just the same as __END__ does in package main, except that data after __END__ is retrievable only in package main, whereas data after __DATA__ is retrievable in whatever the current package is. Note that it is possible to have __DATA__ tokens in the same package in multiple files, and that the last __DATA__ token in a given package that is encountered by the compiler is the one accessible by the filehandle. That is, whenever the __DATA__ token is parsed, any DATA filehandle previously open in the current package (opened in a different file, presumably) is closed so that the new one can be opened. (This also applies to __END__ and the main::DATA filehandle: main::DATA is reopened whenever __END__ is encountered, so any former association is lost.)
SelfLoader autoloading
The SelfLoader will read from the GoodStuff::DATA filehandle to get definitions for functions placed after __DATA__, and then eval the requested subroutine the first time it's called. The costs are the one-time parsing of the data after __DATA__, and a load delay for the first call of any autoloaded function. The benefits are a faster compilation phase, with no need to load functions that are never used. You can use __END__ after __DATA__.
The SelfLoader will stop reading from DATA if it encounters the __END__ token, just as you might expect. If the __END__ token is present, and is followed by the token DATA, then the SelfLoader leaves the GoodStuff::DATA filehandle open on the line after that token. The SelfLoader exports the AUTOLOAD subroutine to the package using the SelfLoader, and this triggers the automatic loading of an undefined subroutine out of its DATA portion the first time that subroutine is called. There is no advantage to placing after the __DATA__ token any subroutine that will always be called.
Autoloading and file-scoped lexicals
A my $pack_lexical statement makes the variable $pack_lexical visible only up to the __DATA__ token. That means that subroutines declared elsewhere cannot see lexical variables. Specifically, autoloaded functions cannot see such lexicals (this applies to both the SelfLoader and the AutoLoader). The use vars pragma (see later in this chapter) provides a way to declare package-level globals that will be visible to autoloaded routines.
SelfLoader and AutoLoader
The SelfLoader can replace the AutoLoader--just change use AutoLoader to use SelfLoader[5] and the __END__ token to __DATA__.
There is no need to inherit from the SelfLoader. The SelfLoader works similarly to the AutoLoader, but picks up the subroutine definitions from after the __DATA__ token instead of from the lib/auto/ directory. SelfLoader needs less maintenance at the time the module is installed, since there's no need to run AutoSplit. And it can run faster at load time because it doesn't need to keep opening and closing files to load subroutines. On the other hand, it can run slower because it needs to parse the code after the __DATA__ token. Details of the AutoLoader and another view of these distinctions can be found in that module's documentation.
How to read DATA from your Perl program
(This section is only relevant if you want to use the GoodStuff::DATA filehandle together with the SelfLoader.) The SelfLoader reads from wherever the current position of the GoodStuff::DATA filehandle is, until EOF or the __END__ token. This means that if you want to use that filehandle (and only if you want to), you should either
You could even conceivably do both. Classes and inherited methodsThis section is only relevant if your module is a class, and has methods that could be inherited. A subroutine stub (or forward declaration) looks like:
sub stub;
That is, it is a subroutine declaration without the body of the subroutine. For modules that aren't classes, there is no real need for stubs as far as autoloading is concerned. For modules that are classes, and need to handle inherited methods, stubs are needed to ensure that the method inheritance mechanism works properly. You can load the stubs into the module at require time by adding the statement SelfLoader->load_stubs(); to the module. The alternative is to put the stubs in before the __DATA__ token before releasing the module, and for this purpose the Devel::SelfStubber module is available. However, this does require the extra step of ensuring that the stubs are in the module. If you do this, we strongly recommend that you do it before releasing the module and not at install time.
Multiple packages and fully qualified subroutine names
Subroutines in multiple packages within the same file are supported--but you should note that this requires exporting SelfLoader::AUTOLOAD to every package which requires it. This is done automatically by the SelfLoader when it first loads the subs into the cache, but you should really specify it in the initialization before the __DATA__ token by putting a use SelfLoader statement in each package. Fully qualified subroutine names are also supported. For example:
__DATA__
sub foo::bar {23}
package baz;
sub dob {32}
will all be loaded correctly by the SelfLoader, and the SelfLoader will ensure that the packages "foo" and "baz" correctly have the SelfLoader::AUTOLOAD method when the data after __DATA__ is first parsed. See the discussion of autoloading in Chapter 5, Packages, Modules, and Object Classes. Also see the AutoLoader module, a utility that handles modules that have been split into a collection of files for autoloading.
Shell--Run Shell Commands Transparently Within Perl
use Shell qw(date cp ps); # list shell commands you want to use
$date = date(); # put the output of the date(1) command into $date
cp("-p", "/etc/passwd", "/tmp/passwd"); # copy password file to a tmp file
print ps("-ww"); # print the results of a "ps -ww" command
This module allows you to invoke UNIX utilities accessible from the shell command line as if they were Perl subroutines. Arguments (including switches) are passed to the utilities as strings. The Shell module essentially duplicates the built-in backtick functionality of Perl. The module was written so that its implementation could serve as a demonstration of autoloading. It also shows how function calls can be mapped to subprocesses.
sigtrap--Enable Stack Backtrace on Unexpected Signals
use sigtrap; # initialize default signal handlers
use sigtrap LIST; # LIST example: qw(BUS SEGV PIPE SYS ABRT TRAP)
The sigtrap pragma initializes a signal handler for the signals specified in LIST, or (if no list is given) for a set of default signals. The signal handler prints a stack dump of the program and then issues a (non-trapped) ABRT signal. In the absence of LIST, the signal handler is set up to deal with the ABRT, BUS, EMT, FPE, ILL, PIPE, QUIT, SEGV, SYS, TERM, and TRAP signals.
Socket--Load the C socket.h Defines and Structure Manipulators
    use Socket;

    $proto = getprotobyname('udp');
    socket(Socket_Handle, PF_INET, SOCK_DGRAM, $proto);
    $iaddr = gethostbyname('hishost.com');
    $port = getservbyname('time', 'udp');
    $sin = sockaddr_in($port, $iaddr);
    send(Socket_Handle, 0, 0, $sin);

    $proto = getprotobyname('tcp');
    socket(Socket_Handle, PF_INET, SOCK_STREAM, $proto);
    $port = getservbyname('smtp', 'tcp');
    $sin = sockaddr_in($port, inet_aton("127.1"));
    $sin = sockaddr_in(7, inet_aton("localhost"));
    $sin = sockaddr_in(7, INADDR_LOOPBACK);
    connect(Socket_Handle, $sin);

    ($port, $iaddr) = sockaddr_in(getpeername(Socket_Handle));
    $peer_host = gethostbyaddr($iaddr, AF_INET);
    $peer_addr = inet_ntoa($iaddr);

    socket(Socket_Handle, PF_UNIX, SOCK_STREAM, 0);
    unlink('/tmp/usock');
    $sun = sockaddr_un('/tmp/usock');
    bind(Socket_Handle, $sun);

This module is just a translation of the C socket.h file. Unlike the old mechanism of requiring a translated socket.ph file, this uses the h2xs program (see the Perl source distribution) and your native C compiler. This means that it is far more likely to get the numbers right. This includes all of the commonly used preprocessor-defined constants like AF_INET, SOCK_STREAM, and so on. In addition, some structure manipulation functions are available:
strict--Restrict Unsafe Constructs
    use strict;         # apply all possible restrictions

    use strict 'vars';  # restrict unsafe use of variables for rest of block
    use strict 'refs';  # restrict unsafe use of references for rest of block
    use strict 'subs';  # restrict unsafe use of barewords for rest of block

    no strict 'vars';   # relax restrictions on variables for rest of block
    no strict 'refs';   # relax restrictions on references for rest of block
    no strict 'subs';   # relax restrictions on barewords for rest of block

If no import list is given to use strict, all possible restrictions upon unsafe Perl constructs are imposed. (This is the safest mode to operate in, but is sometimes too strict for casual programming.) Currently, there are three possible things to be strict about: refs, vars, and subs. In all cases the restrictions apply only until the end of the immediately enclosing block.
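The block scoping described above can be seen in a short sketch. The subroutine hello and the variables $name, $result, and $ok are made up for illustration; the example relaxes 'refs' inside one block and then shows that the restriction is back in force outside it.

```perl
use strict;

sub hello { return "hello, world" }   # hypothetical sub, called by name below

my $name = "hello";
my $result;
{
    no strict 'refs';        # relax the restriction inside this block only
    $result = &$name();      # symbolic reference to main::hello is allowed
}

# Outside the block 'refs' is strict again, so the same call dies at
# run time; eval traps the error so that we can report it.
my $ok = eval { &$name(); 1 };

print "inside block:  $result\n";
print "outside block: ", ($ok ? "allowed" : "refused"), "\n";
```

Keeping the no strict statement inside the smallest block that needs it leaves the rest of the file fully checked.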
The no strict 'vars' statement negates any preceding use strict 'vars' for the remainder of the innermost enclosing block. Likewise, no strict 'refs' negates any preceding invocation of use strict 'refs', and no strict 'subs' negates use strict 'subs'. The arguments to use strict are sometimes given as barewords--that is, without surrounding quotes. Be aware, however, that the following sequence will not work:
    use strict;      # or just: use strict subs;
    ...
    no strict subs;  # WRONG! Should be: no strict 'subs';
    ...

The problem here is that giving subs as a bareword is no longer allowed after the use strict statement. :-)

subs--Predeclare Subroutine Names
use subs qw(sub1 sub2 sub3); sub1 $arg1, $arg2; This predeclares the subroutines whose names are in the list, allowing you to use them without parentheses even before they're defined. It has the additional benefit of allowing you to override built-in functions, since you may only override built-ins via an import, and this pragma does a pseudo-import. See also the vars module. Symbol--Generate Anonymous Globs; Qualify Variable Names
use Symbol; $sym = gensym; open($sym, "filename"); $_ = <$sym>; ungensym $sym; # no effect print qualify("x"); # "main::x" print qualify("x", "FOO"); # "FOO::x" print qualify("BAR::x"); # "BAR::x" print qualify("BAR::x", "FOO"); # "BAR::x" print qualify("STDOUT", "FOO"); # "main::STDOUT" (global) print qualify(\*x); # \*x--for example: GLOB(0x99530) print qualify(\*x, "FOO"); # \*x--for example: GLOB(0x99530) gensym() creates an anonymous glob and returns a reference to it. Such a glob reference can be used as a filehandle or directory handle. For backward compatibility with older implementations that didn't support anonymous globs, ungensym() is also provided. But it doesn't do anything. qualify() turns unqualified symbol names into qualified variable names (for example, myvar becomes MyPackage::myvar). If it is given a second parameter, qualify() uses it as the default package; otherwise, it uses the package of its caller. Regardless, global variable names (for example, STDOUT, %ENV, %SIG) are always qualified with main::. Qualification applies only to symbol names (strings). References are left unchanged under the assumption that they are glob references, which are qualified by their nature. Sys::Hostname--Try Every Conceivable Way to Get Hostname
use Sys::Hostname; $host = hostname(); Attempts several methods of getting the system hostname and then caches the result. It tries syscall(SYS_gethostname), `hostname`, `uname -n`, and the file /com/host. If all that fails, it croak()s. All nulls, returns, and newlines are removed from the result. Sys::Syslog--Perl Interface to UNIX syslog(3) Calls
    use Sys::Syslog;

    openlog $ident, $logopt, $facility;
    syslog $priority, $format, @args;
    $oldmask = setlogmask $mask_priority;
    closelog;

Sys::Syslog is an interface to the UNIX syslog (3) library routine. Call syslog() with a string priority and a list of printf args just like syslog (3). Sys::Syslog needs syslog.ph, which must be created with h2ph by your system administrator. Sys::Syslog provides these functions:
Examples
openlog($program, 'cons, pid', 'user'); syslog('info', 'this is another test'); syslog('mail|warning', 'this is a better test: %d', time); closelog(); syslog('debug', 'this is the last test'); openlog("$program $$", 'ndelay', 'user'); syslog('notice', 'fooprogram: this is really done'); $! = 55; syslog('info', 'problem was %m'); # %m == $! in syslog (3) Term::Cap--Terminal Capabilities Interface
require Term::Cap; $terminal = Tgetent Term::Cap { TERM => undef, OSPEED => $ospeed }; $terminal->Trequire(qw/ce ku kd/); $terminal->Tgoto('cm', $col, $row, $FH); $terminal->Tputs('dl', $count, $FH); These are low-level functions to extract and use capabilities from a terminal capability (termcap) database. For general information about the use of this database, see the termcap (5) manpage. The "new" function of Term::Cap is Tgetent(), which extracts the termcap entry for the specified terminal type and returns a reference to a terminal object. If the value associated with the TERM key in the Tgetent() argument list is false or undefined, then it defaults to the environment variable TERM. Tgetent() looks in the environment for a TERMCAP variable. If it finds one, and if the value does not begin with a slash and looks like a termcap entry in which the terminal type name is the same as the environment string TERM, then the TERMCAP string is used directly as the termcap entry and there is no search for an entry in a termcap file somewhere. Otherwise, Tgetent() looks in a sequence of files for the termcap entry. The sequence consists of the filename in TERMCAP, if any, followed by either the files listed in the TERMPATH environment variable, if any, or otherwise the files $HOME/.termcap, /etc/termcap, and /usr/share/misc/termcap, in that order. (Filenames in TERMPATH may be separated by either a colon or a space.) Whenever multiple files are searched and a tc field occurs in the requested entry, the entry named in the tc field must be found in the same file or one of the succeeding files. If there is a tc field in the TERMCAP environment variable string, Tgetent() continues searching as indicated above. OSPEED is the terminal output bit rate (often mistakenly called the baud rate). OSPEED can be specified as either a POSIX termios/SYSV termio speed (where 9600 equals 9600) or an old BSD-style speed (where 13 equals 9600). 
See the next section, "Getting Terminal Output Speed", for code illustrating how to obtain the output speed. Tgetent() returns a reference to a blessed object ($terminal in the examples above). The actual termcap entry is available as $terminal->{TERMCAP}. Failure to find an appropriate termcap entry results in a call to Carp::croak(). Once you have invoked Tgetent(), you can manage a terminal by sending control strings to it with Tgoto() and Tputs(). You can also test for the existence of particular terminal capabilities with Trequire(). Trequire() checks to see whether the named capabilities have been specified in the terminal's termcap entry. For example, this line:
$terminal->Trequire(qw/ce ku kd/); checks whether the ce (clear to end of line), ku (keypad up-arrow), and kd (keypad down-arrow) capabilities have been defined. Any undefined capabilities will result in a listing of those capabilities and a call to Carp::croak(). Tgoto() produces a control string to move the cursor relative to the screen. For example, to move the cursor to the fifth line and forty-fifth column on the screen, you can say:
    $row = 5; $col = 45;
    $terminal->Tgoto('cm', $col, $row, STDOUT);

The first argument in this call must always be cm, followed by the column and then the row. If a file handle is given as the final argument, then Tgoto() sends the appropriate control string to that handle. With or without a handle, the routine returns the control string, so you could achieve the same effect this way:

    $str = $terminal->Tgoto('cm', $col, $row);
    print STDOUT $str;

Tgoto() performs the necessary % interpolation on the control strings. (See the termcap (5) manpage for details.) The Tputs() routine allows you to exercise other terminal capabilities. For example, the following code deletes one line at the cursor's present position, and then turns on the bold text attribute:
$count = 1; $terminal->Tputs('dl', $count, $FILEHANDLE); # delete one line $terminal->Tputs('md', $count, $FILEHANDLE); # turn on bold attribute Again, Tputs() returns the terminal control string, and the file handle can be omitted. The $count for such calls should normally be 1, unless padding is required. (Padding involves the output of "no-op" characters in order to effect a delay required by the terminal device. It is most commonly required for hardcopy devices.) A count greater than 1 is taken to specify the amount of padding. See the termcap (5) manpage for more about padding. Tputs() does not perform % interpolation. This means that the following will not work:
$terminal->Tputs('DC', 1, $FILEHANDLE); # delete one character (WRONG!) If the terminal control string requires numeric parameters, then you must do the interpolation yourself:
    $str = $terminal->Tputs('DC', 1);
    $str =~ s/%d/7/;
    print STDOUT $str;   # delete seven characters

The output strings for Tputs() are cached for counts of 1. Tgoto() does not cache. $terminal->{_xx} is the raw termcap data and $terminal->{xx} is the cached version (where xx is the two-character terminal capability code).

Getting terminal output speed

You can use the POSIX module to get your terminal's output speed for use in the Tgetent() call:
require POSIX; my $termios = new POSIX::Termios; $termios->getattr; my $ospeed = $termios->getospeed; The method using ioctl (2) works like this:
require 'ioctl.pl'; ioctl(TTY, $TIOCGETP, $sgtty); ($ispeed, $ospeed) = unpack('cc', $sgtty); Term::Complete--Word Completion Module
use Term::Complete; $input = Complete('prompt_string', \@completion_list); $input = Complete('prompt_string', @completion_list); The Complete() routine sends the indicated prompt string to the currently selected filehandle, reads the user's response, and places the response in $input. What the user types is read one character at a time, and certain characters result in special processing as follows:
The user is not prevented from providing input that differs from all strings in the completion list, or from adding to input that has been completed from the list. The final input (determined when the user presses the return key) is the string returned by Complete(). The TTY driver is put into raw mode using the system command stty raw -echo and restored using stty -raw echo. When Complete() is called multiple times, it offers the user's immediately previous response as the default response to each prompt. Test::Harness--Run Perl Standard Test Scripts with Statistics
    use Test::Harness;

    runtests(@tests);

This module is used by MakeMaker. If you're building a Perl extension and if you have test scripts with filenames matching t/*.t in the extension's subdirectory, then you can run those tests by executing the shell command, make test. runtests(@tests) runs all test scripts named as arguments and checks standard output for the expected "ok n" strings. (Standard Perl test scripts print "ok n" for each single test, where n is an integer incremented by one each time around.) After all tests have been performed, runtests() prints some performance statistics that are computed by the Benchmark module. runtests() is exported by Test::Harness by default.

The test script output

The first line output by a standard test script should be 1..m with m being the number of tests that the test script attempts to run. Any output from the test script to standard error is not examined by runtests(); it is passed straight through, and thus will be seen by the user. Lines written to standard output that look like Perl comments (matching /^\s*\#/) are discarded. Lines matching /^(not\s+)?ok\b/ are interpreted as feedback for runtests(). The global variable $Test::Harness::verbose is exportable and can be used to let runtests() display the standard output of the script without otherwise altering its behavior. A script may omit the test numbers after ok. In this case Test::Harness maintains its own counter. So the following script output:
1..6 not ok ok not ok ok ok will generate:
FAILED tests 1, 3, 6 Failed 3/6 tests, 50.00% okay Diagnostics
Notes

Test::Harness uses $^X to determine which Perl binary to run the tests with. Test scripts running via the shebang (#!) line may not be portable because $^X is not consistent for shebang scripts across platforms. This is no problem when Test::Harness is run with an absolute path to the Perl binary or when $^X can be found in the path.

Text::Abbrev--Create an Abbreviation Table from a List
use Text::Abbrev; %hash = (); abbrev(*hash, LIST); The abbrev() routine takes each string in LIST and constructs all unambiguous abbreviations (truncations) of the string with respect to the other strings in LIST. Each such truncation (including the null truncation consisting of the entire string) is used as a key in %hash for which the associated value is the non-truncated string. So, if good is the only string in LIST beginning with g, the following key/value pairs will be created:
g => good, go => good, goo => good, good => good If, on the other hand, the string go is also in the list, then good yields these key/value pairs:
goo => good, good => good and go yields only:
go => go Text::ParseWords--Parse Text into a List of Tokens
use Text::ParseWords; @words = quotewords($delim, $keep, @lines); quotewords() accepts a delimiter (which can be a regular expression) and a list of lines, and then breaks those lines up into a list of delimiter-separated words. It ignores delimiters that appear inside single or double quotes. The $keep argument is a Boolean flag. If it is false, then quotes are removed from the list of words returned by quotewords(); otherwise, quotes are retained. The value of $keep also affects the interpretation of backslashes. If $keep is true, then backslashes are fully preserved in the returned list of words. Otherwise, a single backslash disappears and a double backslash is returned as a single backslash. (Be aware, however, that, regardless of the value of $keep, a single backslash occurring within quotes causes a Perl syntax error--presumably a bug.) Text::Soundex--The Soundex Algorithm Described by Knuth
use Text::Soundex; $code = soundex $string; # get soundex code for a string @codes = soundex @list; # get list of codes for list of strings # set value to be returned for strings without soundex code $soundex_nocode = 'Z000'; This module implements the soundex algorithm as described by Donald Knuth in Volume 3 of The Art of Computer Programming. The algorithm is intended to hash words (in particular surnames) into a small space using a simple model that approximates the sound of the word when spoken by an English speaker. Each word is reduced to a four-character string, the first character being an uppercase letter and the remaining three being digits. If there is no soundex code representation for a string, then the value of $soundex_nocode is returned. This variable is initially set to the undefined value, but many people seem to prefer an unlikely value like Z000. (How unlikely this is depends on the data set being dealt with.) Any value can be assigned to $soundex_nocode. In a scalar context soundex() returns the soundex code of its first argument, and in an array context a list is returned in which each element is the soundex code for the corresponding argument passed to soundex(). For example:
@codes = soundex qw(Mike Stok); leaves @codes containing ('M200', 'S320'). Here are Knuth's examples of various names and the soundex codes they map to:
So we have:
$code = soundex 'Knuth'; # $code contains 'K530' @list = soundex qw(Lloyd Gauss); # @list contains 'L300', 'G200' As the soundex algorithm was originally used a long time ago in the United States, it considers only the English alphabet and pronunciation. As it is mapping a large space (arbitrary-length strings) onto a small space (single letter plus three digits), no inference can be made about the similarity of two strings that end up with the same soundex code. For example, both Hilbert and Heilbronn end up with a soundex code of H416. Text::Tabs--Expand and Unexpand Tabs
use Text::Tabs; $tabstop = 8; # set tab spacing to 8 (default) print expand("Hello\tworld"); # convert tabs to spaces in output print unexpand("Hello, world"); # convert spaces to tabs in output $tabstop = 4; # set tab spacing to 4 print join("\n", expand(split(/\n/, "Hello\tworld, \nit's a nice day.\n"))); This module expands tabs into spaces and "unexpands" spaces into tabs, in the manner of the UNIX expand (1) and unexpand (1) programs. All tabs and spaces--not only leading ones--are subject to being expanded and unexpanded. Both expand() and unexpand() take as argument an array of strings, which are returned with tabs or spaces transformed. Newlines may not be included in the strings, and should be used to split strings into separate elements before they are passed to expand() and unexpand(). expand(), unexpand(), and $tabstop are imported into your program when you use this module. Text::Wrap--Wrap Text into a Paragraph
use Text::Wrap; $Text::Wrap::columns = 20; # default is 76 $pre1 = "\t"; # prepend this to first line of paragraph $pre2 = ""; # prepend this to subsequent lines print wrap($pre1, $pre2, "Hello, world, it's a nice day, isn't it?"); This module is a simple paragraph formatter that wraps text into a paragraph and indents each line. The single exported function, wrap(), takes three arguments: a string to prepend to the first output line; a string to prepend to each subsequent output line; and the text to be wrapped. $columns is exported on request. Tie::Hash, Tie::StdHash--Base Class Definitions for Tied Hashes
package NewHash; require Tie::Hash; @ISA = (Tie::Hash); sub DELETE { ... } # Provides additional method sub CLEAR { ... } # Overrides inherited method package NewStdHash; require Tie::Hash; @ISA = (Tie::StdHash); sub DELETE { ... } package main; tie %new_hash, "NewHash"; tie %new_std_hash, "NewStdHash"; This module provides some skeletal methods for hash-tying classes. (See Chapter 5, Packages, Modules, and Object Classes for a list of the functions required in order to tie a hash to a package.) The basic Tie::Hash package provides a new() method, as well as methods TIEHASH(), EXISTS() and CLEAR(). The Tie::StdHash package provides most methods required for hashes. It inherits from Tie::Hash, and causes tied hashes to behave exactly like standard hashes, allowing for selective overloading of methods. The new() method is provided as grandfathering in case a class forgets to include a TIEHASH() method. For developers wishing to write their own tied hashes, the required methods are briefly defined below. (Chapter 5, Packages, Modules, and Object Classes not only documents these methods, but also has sample code.)
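As a rough sketch of the selective overloading described above, the class below inherits everything from Tie::StdHash except STORE(). The class name UpperCaseHash and the key used are hypothetical. The example also assumes that a Tie::StdHash object is a blessed hash reference holding the data, which is how the stock implementation behaves.

```perl
use strict;

package UpperCaseHash;               # hypothetical class name
require Tie::Hash;                   # Tie::StdHash lives in Tie/Hash.pm
our @ISA = ('Tie::StdHash');

# Override STORE only; TIEHASH, FETCH, EXISTS, DELETE, and the rest
# are inherited unchanged from Tie::StdHash.
sub STORE {
    my ($self, $key, $value) = @_;
    $self->{$key} = uc $value;       # assumes the object is a plain hash ref
}

package main;

tie my %h, 'UpperCaseHash';
$h{greeting} = "hello";
print "$h{greeting}\n";              # prints "HELLO"
```

Because only one method is replaced, the tied hash still behaves like an ordinary hash for iteration, deletion, and existence tests.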
Chapter 5, Packages, Modules, and Object Classes includes a method called DESTROY() as a "necessary" method for tied hashes. However, it is not actually required, and neither Tie::Hash nor Tie::StdHash defines a default for this method.

See also

The library modules relating to various DBM-related implementations (DB_File, GDBM_File, NDBM_File, ODBM_File, and SDBM_File) show examples of general tied hashes, as does the Config module. While these modules do not utilize Tie::Hash, they serve as good working examples.

Tie::Scalar, Tie::StdScalar--Base Class Definitions for Tied Scalars
package NewScalar; require Tie::Scalar; @ISA = (Tie::Scalar); sub FETCH { ... } # Provides additional method sub TIESCALAR { ... } # Overrides inherited method package NewStdScalar; require Tie::Scalar; @ISA = (Tie::StdScalar); sub FETCH { ... } package main; tie $new_scalar, "NewScalar"; tie $new_std_scalar, "NewStdScalar"; This module provides some skeletal methods for scalar-tying classes. (See Chapter 5, Packages, Modules, and Object Classes for a list of the functions required in tying a scalar to a package.) The basic Tie::Scalar package provides a new() method, as well as methods TIESCALAR(), FETCH() and STORE(). The Tie::StdScalar package provides all methods specified in Chapter 5, Packages, Modules, and Object Classes. It inherits from Tie::Scalar and causes scalars tied to it to behave exactly like the built-in scalars, allowing for selective overloading of methods. The new() method is provided as a means of grandfathering for classes that forget to provide their own TIESCALAR() method. For developers wishing to write their own tied-scalar classes, methods are summarized below. (Chapter 5, Packages, Modules, and Object Classes not only documents these, but also has sample code.)
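The same idea applies to scalars. The following sketch, with the hypothetical class name CountedScalar, overrides only FETCH() to count how often the value is read, and hands the real work back to the FETCH() inherited from Tie::StdScalar.

```perl
use strict;

package CountedScalar;               # hypothetical class name
require Tie::Scalar;                 # Tie::StdScalar lives in Tie/Scalar.pm
our @ISA = ('Tie::StdScalar');

my $reads = 0;                       # file-scoped counter, for illustration

# Override FETCH to count reads, then delegate to the inherited method.
sub FETCH {
    my $self = shift;
    $reads++;
    return $self->SUPER::FETCH();
}

sub reads { return $reads }

package main;

tie my $x, 'CountedScalar';
$x = 42;                             # handled by the inherited STORE
my $copy = $x;                       # goes through our FETCH
my $sum  = $x + 1;                   # and so does this
print "value: $copy, sum: $sum, reads so far: ", CountedScalar::reads(), "\n";
```

Delegating through SUPER:: keeps the subclass independent of how Tie::StdScalar stores the value internally.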
See also

Chapter 5, Packages, Modules, and Object Classes has a good example using tied scalars to associate process IDs with priority.

Tie::SubstrHash--Fixed-table-size, Fixed-key-length Hashing
require Tie::SubstrHash; tie %myhash, "Tie::SubstrHash", $key_len, $value_len, $table_size; The Tie::SubstrHash package provides a hash table-like interface to an array of determinate size, with constant key size and record size. Upon tying a new hash to this package, the developer must specify the size of the keys that will be used, the size of the value fields that the keys will index, and the size of the overall table (in terms of the number of key/value pairs, not hard memory). These values will not change for the duration of the tied hash. The newly allocated hash table may now have data stored and retrieved. Efforts to store more than $table_size elements will result in a fatal error, as will efforts to store a value not exactly $value_len characters in length, or to reference through a key not exactly $key_len characters in length. While these constraints may seem excessive, the result is a hash table using much less internal memory than an equivalent freely allocated hash table. Because the current implementation uses the table and key sizes for the hashing algorithm, there is no means by which to dynamically change the value of any of the initialization parameters. Time::Local--Efficiently Compute Time from Local and GMT Time
use Time::Local; $time = timelocal($sec, $min, $hours, $mday, $mon, $year); $time = timegm($sec, $min, $hours, $mday, $mon, $year); These routines take a series of arguments specifying a local (timelocal()) or Greenwich (timegm()) time, and return the number of seconds elapsed between January 1, 1970, and the specified time. The arguments are defined like the corresponding arguments returned by Perl's gmtime and localtime functions. The routines are very efficient and yet are always guaranteed to agree with the gmtime and localtime functions. That is, if you pass the value returned by time to localtime, and if you then pass the values returned by localtime to timelocal(), the returned value from timelocal() will be the same as the value originally returned from time. Both routines return -1 if the integer limit is hit. On most machines this applies to dates after January 1, 2038. vars--Predeclare Global Variable Names
    use vars qw($frob @mung %seen);

This module predeclares all variables whose names are in the list, allowing you to use them under use strict, and disabling any typo warnings. Packages such as the AutoLoader and SelfLoader that delay loading of subroutines within packages can create problems with file-scoped lexicals defined using my. This is because they move the subroutines outside the scope of the lexical variables. While the use vars pragma cannot duplicate the effect of file-scoped lexicals (total transparency outside of the file), it can act as an acceptable substitute by pre-declaring global symbols, ensuring their availability to the routines whose loading was delayed. See also the subs module.
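A minimal sketch of the pragma at work under use strict follows. The variable names come from the synopsis above; the values and the helper variable $unique are made up for illustration.

```perl
use strict;
use vars qw($frob @mung %seen);      # predeclare three package globals

# Without the use vars line, each of these unqualified names would be
# a compile-time error under use strict.
$frob = "widget";
@mung = (1, 2, 2, 3);
foreach my $item (@mung) {
    $seen{$item}++;
}

my $unique = scalar keys %seen;
print "$frob list has $unique unique values out of ", scalar(@mung), "\n";

# These are ordinary package variables, so the fully qualified name
# refers to the same storage.
print "qualified name sees: $main::frob\n";
```

Since the variables are real package globals rather than lexicals, a subroutine loaded later by AutoLoader or SelfLoader into the same package can still reach them.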