Perl Cookbook

1.13. Controlling Case

1.13.3. Discussion

The functions and string escapes look different, but both do the same thing. You can set the case of either just the first character or the whole string. You can even do both at once to force uppercase (actually titlecase, as explained later in this section) on initial characters and lowercase on the rest.

$beast   = "dromedary";
# capitalize various parts of $beast
$capit   = ucfirst($beast);         # Dromedary
$capit   = "\u\L$beast";            # (same)
$capall  = uc($beast);              # DROMEDARY
$capall  = "\U$beast";              # (same)
$caprest = lcfirst(uc($beast));     # dROMEDARY
$caprest = "\l\U$beast";            # (same)
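The claim that each function and its escape are interchangeable is easy to check mechanically. The following is a minimal sketch; the `%pairs` table and the sample words are illustrative choices, not part of the recipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry pairs a function call with the string escape said to be equivalent.
my %pairs = (
    'ucfirst'     => [ sub { ucfirst($_[0]) },     sub { "\u$_[0]" }   ],
    'uc'          => [ sub { uc($_[0]) },          sub { "\U$_[0]" }   ],
    'lc'          => [ sub { lc($_[0]) },          sub { "\L$_[0]" }   ],
    'lcfirst(uc)' => [ sub { lcfirst(uc($_[0])) }, sub { "\l\U$_[0]" } ],
    'ucfirst(lc)' => [ sub { ucfirst(lc($_[0])) }, sub { "\u\L$_[0]" } ],
);

# Run every pair over a few sample words and count any disagreements.
my $mismatches = 0;
for my $word (qw(dromedary CAMEL lLaMa)) {
    for my $name (sort keys %pairs) {
        my ($func, $escape) = @{ $pairs{$name} };
        my ($f, $e) = ($func->($word), $escape->($word));
        $mismatches++ if $f ne $e;
        printf "%-12s %-10s -> %-10s %s\n",
            $name, $word, $f, ($f eq $e ? 'same' : "differs: $e");
    }
}
print "$mismatches mismatches\n";
```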

These capitalization-changing escapes are commonly used to make a string's case consistent:

# titlecase each word's first character, lowercase the rest
$text = "thIS is a loNG liNE";
$text =~ s/(\w+)/\u\L$1/g;
print $text;                     # This Is A Long Line
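Because `\w` does not match an apostrophe, the substitution above treats contractions and possessives as two words, so "don't" comes out as "Don'T". One possible adjustment, sketched here with a hypothetical helper named `titlecase_words`, is to widen the character class so apostrophes stay inside a word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: titlecase each run of word characters and apostrophes.
sub titlecase_words {
    my ($text) = @_;
    $text =~ s/([\w']+)/\u\L$1/g;   # \L lowercases the run, then \u titlecases its first char
    return $text;
}

my $plain = "don't STOP the camel's parade";
(my $naive = $plain) =~ s/(\w+)/\u\L$1/g;   # the recipe's original pattern

print "$naive\n";                     # Don'T Stop The Camel'S Parade
print titlecase_words($plain), "\n";  # Don't Stop The Camel's Parade
```

The trade-off is that a word beginning with an apostrophe, such as "'tis", is left lowercase, because `\u` is applied to the apostrophe rather than to the first letter.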

You can also use these for case-insensitive comparison:

if (uc($a) eq uc($b)) { # or "\U$a" eq "\U$b"
    print "a and b are the same\n";
}
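Comparing with `uc` or `lc` is fine for plain ASCII, but some letters have more than one lowercase or uppercase form. Perl 5.16 and later provide `fc`, which applies Unicode case folding and which perlfunc recommends for case-insensitive comparison. This sketch assumes Perl 5.16 or newer, and the helper name `same_ignoring_case` is hypothetical. It uses Greek sigma, which has two lowercase forms (σ and word-final ς) that both fold to σ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'fc';    # fc() requires Perl 5.16 or later

# Hypothetical helper: compare two strings using Unicode case folding.
sub same_ignoring_case {
    my ($x, $y) = @_;
    return fc($x) eq fc($y);
}

# Written with \x{...} escapes to keep the source ASCII:
# U+03A3 capital sigma, U+0391 capital alpha,
# U+03C3 small sigma, U+03B1 small alpha, U+03C2 small final sigma.
my $upper = "\x{3A3}\x{391}\x{3A3}";
my $lower = "\x{3C3}\x{3B1}\x{3C2}";

print same_ignoring_case($upper, $lower) ? "same\n" : "different\n";
print same_ignoring_case("CAMEL", "camel") ? "same\n" : "different\n";
```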

The randcap program, shown in Example 1-2, randomly titlecases 20 percent of the letters of its input. This lets you converse with 14-year-old WaREz d00Dz.

Example 1-2. randcap

  #!/usr/bin/perl -p
  # randcap: filter to randomly capitalize 20% of the letters
  # call to srand() is unnecessary as of v5.4
  BEGIN { srand(time() ^ ($$ + ($$<<15))) }
  sub randcase { rand(100) < 20 ? "\u$_[0]" : "\l$_[0]" }
  s/(\w)/randcase($1)/ge;

  % randcap < genesis | head -9
  boOk 01 genesis
  001:001 in the BEginning goD created the heaven and tHe earTh.

  001:002 and the earth wAS without ForM, aND void; AnD darkneSS was
          upon The Face of the dEEp. and the spIrit of GOd movEd upOn
          tHe face of the Waters.
  001:003 and god Said, let there be ligHt: and therE wAs LigHt.
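The filter works line by line under -p. To get the same effect inside a larger program, or to make the rate adjustable, the logic can be moved into a subroutine. This is a sketch, not part of Example 1-2: the name `randcap_string` and its percentage argument are hypothetical, and the fixed `srand` seed is there only so repeated runs give the same output on a given perl build.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: titlecase each word character with probability $percent/100
# and lowercase it otherwise (the same rule as randcase in Example 1-2).
sub randcap_string {
    my ($text, $percent) = @_;
    $text =~ s/(\w)/rand(100) < $percent ? "\u$1" : "\l$1"/ge;
    return $text;
}

srand(42);    # fixed seed: reproducible output for a given perl build
my $line  = "in the beginning god created the heaven and the earth.";
my $mixed = randcap_string($line, 20);
print "$mixed\n";

# Only case changes, so lowercasing the result recovers the all-lowercase input.
print lc($mixed) eq $line ? "letters preserved\n" : "letters changed!\n";
```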

In languages whose writing systems distinguish between uppercase and titlecase, the ucfirst() function (and \u, its string escape alias) converts to titlecase. For example, Hungarian uses the digraph "dz". In uppercase it's written as "DZ", in titlecase as "Dz", and in lowercase as "dz". Unicode consequently defines three different characters for these three situations:

Code point  Glyph   Unicode name
U+01F1      Ǳ       LATIN CAPITAL LETTER DZ
U+01F2      ǲ       LATIN CAPITAL LETTER D WITH SMALL LETTER Z
U+01F3      ǳ       LATIN SMALL LETTER DZ
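A short sketch makes the distinction visible by printing the code point each case function returns for the dz digraph. Code points are printed rather than glyphs to avoid depending on the terminal's output encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dz = "\x{1F3}";    # LATIN SMALL LETTER DZ

# uc gives the uppercase form, while ucfirst and \u give the titlecase form.
for my $pair (
    [ 'uc'       => uc($dz)         ],
    [ 'ucfirst'  => ucfirst($dz)    ],
    [ '\\u'      => "\u$dz"         ],
    [ 'lc(DZ)'   => lc("\x{1F1}")   ],
) {
    my ($name, $result) = @$pair;
    printf "%-8s U+%04X\n", $name, ord($result);
}
```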

It is tempting but ill-advised to convert case with tr[a-z][A-Z] or the like. This is a mistake because it misses every character with a diacritical mark (such as a diaeresis, cedilla, or accent), and such marks are used in dozens of languages, including English. That said, correctly handling case mappings on data with diacritical marks can be far trickier than it seems. There is no simple answer, although if everything is in Unicode, it's not all that bad, because Perl's case-mapping functions work perfectly well on Unicode data. See the section on The Universal Character Code in the Introduction to this chapter for more information.
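To see the difference, compare tr with uc on a word containing an accented letter. This sketch assumes Perl 5.12 or later for the unicode_strings feature, which makes uc apply Unicode rules to characters in the 128 to 255 range even when the string is not internally flagged as UTF-8. It prints code points to sidestep output encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';   # Perl 5.12+: Unicode rules for code points 128-255 too

my $word = "caf\x{E9}";          # "cafe" with e-acute, written as an escape to keep the source ASCII

(my $by_tr = $word) =~ tr/a-z/A-Z/;   # only touches the 26 ASCII letters
my $by_uc  = uc($word);               # uses Unicode case mappings

printf "tr: %s\n", join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $by_tr));
printf "uc: %s\n", join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $by_uc));
```

The tr line leaves the final character as U+00E9, while uc maps it to U+00C9 (capital E with acute).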

1.13.4. See Also

The uc, lc, ucfirst, and lcfirst functions in perlfunc(1) and Chapter 29 of Programming Perl; \L, \U, \l, and \u string escapes in the "Quote and Quote-like Operators" section of perlop(1) and Chapter 5 of Programming Perl




Copyright © 2003 O'Reilly & Associates. All rights reserved.