

Perl Cookbook

6.21. Program: urlify

This program wraps HTML links around URLs in files. It doesn't handle every possible URL, but it catches the most common ones, and it tries to avoid including end-of-sentence punctuation in the marked-up URL.

It is a typical Perl filter, so it can be fed input from a pipe:

% gunzip -c ~/mail/archive.gz | urlify > archive.urlified

or by supplying files on the command line:

% urlify ~/mail/*.inbox > ~/allmail.urlified

The program is shown in Example 6-10.

Example 6-10. urlify

  #!/usr/bin/perl
  # urlify - wrap HTML links around URL-like constructs
  $protos = '(http|telnet|gopher|file|wais|ftp)';
  $ltrs   = '\w';
  $gunk   = ';/#~:.?+=&%@!\-';
  $punc   = '.:?\-';
  $any    = "${ltrs}${gunk}${punc}";
  while (<>) {
      s{
        \b                    # start at word boundary
        (                     # begin $1  {
         $protos   :          # need protocol name and a colon
         [$any] +?            # followed by one or more
                              #  of any valid character, but
                              #  be conservative and take only
                              #  what you need to....
        )                     # end   $1  }
        (?=                   # look-ahead non-consumptive assertion
         [$punc]*             # either 0 or more punctuation
         [^$any]              #   followed by a non-url char
         |                    # or else
         $                    #   then end of the string
        )
      }{<A HREF="$1">$1</A>}igox;
      print;
  }
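
To see how the minimal match and the look-ahead cooperate, the substitution can be wrapped in a function and tried on a single line. The following is a minimal sketch rather than part of the original program: the helper name urlify_line and the sample sentence are assumptions made for illustration, while the character classes and flags are copied from Example 6-10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Character classes copied from Example 6-10.
my $protos = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs   = '\w';
my $gunk   = ';/#~:.?+=&%@!\-';
my $punc   = '.:?\-';
my $any    = "${ltrs}${gunk}${punc}";

# urlify_line is a hypothetical helper (not in the original program):
# it applies the same substitution to one string and returns the result.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b                    # start at word boundary
        (                     # begin $1
         $protos   :          # protocol name and a colon
         [$any] +?            # as few URL characters as possible
        )                     # end $1
        (?=                   # look-ahead, consumes nothing
         [$punc]*             # optional trailing punctuation
         [^$any]              #   followed by a non-URL character
         |                    # or else
         $                    #   the end of the string
        )
    }{<A HREF="$1">$1</A>}igox;
    return $line;
}

# Sample input is an assumption chosen to end a sentence with a URL.
print urlify_line("See http://www.perl.com/CPAN/. Thanks!\n");
# prints: See <A HREF="http://www.perl.com/CPAN/">http://www.perl.com/CPAN/</A>. Thanks!
```

Because [$any]+? takes as few characters as it can, the match stops at the first position where the look-ahead succeeds. Here that is just before the period that ends the sentence, so the period stays outside the link.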



Copyright © 2003 O'Reilly & Associates. All rights reserved.