CGI Programming with Perl

8.3. Encryption

Encryption can be an effective tool when developing secure solutions. There are two scenarios where it is especially useful for web applications. The first is to protect sensitive data so that it cannot be intercepted and viewed by others. A secure HTTPS connection using SSL (or TLS) provides this protection. The second scenario involves validation, such as ensuring that the user has not tampered with the values of hidden fields in a form. This is handled by generating hashes, or digests, that can be used like checksums to verify that the data matches what is expected.

You could use a hash algorithm, such as MD5 or SHA-1, to secure Example 8-3. You would do this by generating a digest for both the data on the page -- the product name and price -- and a secret phrase stored on the server:

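# Secret known only to the server; mixing it into the digest
# prevents users from recalculating a digest for altered data.
use constant SECRET_PHRASE => "ThIs phrAsE ShOUld bE DiFFiCUlT 2 gueSS.";
my $digest = generate_digest( $name, $price, SECRET_PHRASE );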

You could then insert the value of the digest into your form as an additional hidden field, as shown in Example 8-5.

Example 8-5. sb3000.html

<html>
  <head>
    <title>Super Blaster 3000</title>
  </head>
  
  <body bgcolor="#FFFFFF">
    <h2>Super Blaster 3000</h2>
    <hr>
    
    <form action="https://localhost/cgi/buy.cgi" method="GET">
      <input type="hidden" name="price" value="30.00">
      <input type="hidden" name="name" value="Super Blaster 3000">
      <input type="hidden" name="digest"
        value="a38b37b5c80a79d2efb31ad78e9b8361">
      .
      .

When the CGI script receives the input, it recalculates a digest from the product's name and price along with the secret phrase. If it matches the digest that was supplied from the form, then the user has not modified the data.
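The check described above might be sketched as follows. The source does not show the body of generate_digest, so the version here is an assumption: it joins the fields with a "|" delimiter and hashes the result with md5_hex from Digest::MD5 (covered in the next section). The verify_fields helper is likewise hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

use constant SECRET_PHRASE => "ThIs phrAsE ShOUld bE DiFFiCUlT 2 gueSS.";

# Hypothetical implementation; the text does not define generate_digest.
# Joining with "|" keeps ("ab", "c") distinct from ("a", "bc"), assuming
# the fields themselves never contain "|".
sub generate_digest {
    my @fields = @_;
    return md5_hex( join "|", @fields );
}

# Returns true if the digest submitted with the form matches the one
# recalculated on the server from the submitted name and price.
sub verify_fields {
    my( $name, $price, $submitted ) = @_;
    return 0 unless defined $submitted;
    return generate_digest( $name, $price, SECRET_PHRASE ) eq $submitted;
}

# The digest the server would have written into the hidden field
my $good = generate_digest( "Super Blaster 3000", "30.00", SECRET_PHRASE );

# Untouched form data passes; an altered price does not
print verify_fields( "Super Blaster 3000", "30.00", $good ) ? "ok\n" : "tampered\n";
print verify_fields( "Super Blaster 3000", "0.01",  $good ) ? "ok\n" : "tampered\n";
```

The first line printed is "ok" and the second is "tampered", because changing the price changes the recalculated digest.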

The value of your secret phrase must not be easy to guess, and it should be protected on your server. Like passwords and other sensitive data, you may wish to place your secret phrase in a file outside of your CGI directory and document root and have your CGI scripts read this value when it is needed. This way, if a misconfiguration in your web server allows users to view the source of your CGI scripts, then your secret phrase would not be compromised.
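One way to keep the phrase out of the script is to read it from a file at run time. This is only a sketch: the file path and the read_secret_phrase function are hypothetical, and it assumes the phrase is stored on the first line of the file.

```perl
use strict;
use warnings;

# Hypothetical location, outside the CGI directory and document root
my $SECRET_FILE = "/usr/local/apache/conf/secret_phrase.txt";

# Reads the first line of the given file and returns it without
# the trailing newline. Dies if the file is missing or empty.
sub read_secret_phrase {
    my $path = shift;
    open my $fh, "<", $path or die "Cannot open $path: $!";
    my $phrase = <$fh>;
    close $fh;
    die "Secret phrase file $path is empty" unless defined $phrase;
    chomp $phrase;
    return $phrase;
}

# Usage:
# my $secret = read_secret_phrase( $SECRET_FILE );
```

The file should be readable by the user your web server runs as, but not served by the web server itself.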

In this example, the simplest solution may be to look up the prices on the server and not pass them through hidden fields, but there are certainly circumstances when you must expose data like this, and digests are an effective way to verify your data.

Now let's look at how to actually generate digests. We will look at two algorithms: MD5 and SHA-1.

8.3.1. MD5

MD5 is a 128-bit, one-way hash algorithm. It produces a short message digest for your data that is extremely unlikely to be produced for other data. At the same time, it is not feasible to derive the original data from a digest. The Digest::MD5 module allows you to create MD5 digests in Perl.[15]

[15]You may also see references to the MD5.pm module; MD5.pm is deprecated and is now only a wrapper to the Digest::MD5 module.

The digest that Digest::MD5 generates for you is available in three different formats: as raw binary data, converted to hexadecimal, and converted to Base64 format. The latter two formats produce longer strings than the 16-byte binary form, but they can be safely inserted within HTML, email, etc. The hexadecimal digest is 32 characters; the Base64 digest is 22 characters. Base64 encoding uses the characters A-Z, a-z, 0-9, +, and /, with = reserved for padding; the Base64 digests returned by Digest::MD5 are not padded.

You can use the Digest::MD5 module this way to generate a hexadecimal digest:

use Digest::MD5 qw( md5_hex );
my $hex_digest = md5_hex( @data );

You can use the Digest::MD5 module this way to generate a Base64 digest:

use Digest::MD5 qw( md5_base64 );
my $base64_digest = md5_base64( @data );
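To compare the three formats side by side, the following sketch prints the length of each digest for the same input. It also illustrates a detail of these functions worth keeping in mind: they concatenate their arguments before hashing, so how a list is split does not affect the result. The sample data is arbitrary.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5 md5_hex md5_base64 );

my @data = ( "Super Blaster 3000", "30.00" );

# Same digest, three encodings
printf "binary: %d bytes\n",      length( md5( @data ) );         # 16
printf "hex:    %d characters\n", length( md5_hex( @data ) );     # 32
printf "base64: %d characters\n", length( md5_base64( @data ) );  # 22

# Arguments are concatenated before hashing, so these two calls
# hash the same string "abc" and produce the same digest.
print "same digest\n" if md5_hex( "ab", "c" ) eq md5_hex( "a", "bc" );
```

If field boundaries matter, as with a name followed by a price, one option is to join the fields with a delimiter that cannot appear in the data before hashing.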

It is still possible for someone who has a digest and who knows possible original values to generate digests for each of the possible values to compare against the target digest. Thus, if you wish to generate digests that cannot be guessed, you should supply data that varies enough to not be predictable.
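The following sketch shows why predictable data is a problem. It assumes a digest was generated from a price alone, with no secret phrase, and that the price lies between 0.00 and 99.99; the guess_price function is hypothetical. Trying all 10,000 candidates takes a fraction of a second.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# A digest made from the price alone, with nothing secret mixed in
my $target = md5_hex( "30.00" );

# Tries every price from 0.00 to 99.99 and returns the one whose
# digest matches, or undef if none does.
sub guess_price {
    my $digest = shift;
    foreach my $cents ( 0 .. 9999 ) {
        my $candidate = sprintf "%d.%02d", int( $cents / 100 ), $cents % 100;
        return $candidate if md5_hex( $candidate ) eq $digest;
    }
    return;
}

my $found = guess_price( $target );
print defined $found ? "Recovered: $found\n" : "Not found\n";
```

This prints "Recovered: 30.00". Including a secret phrase that is hard to guess in the hashed data is what makes this kind of search impractical.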

The MD5 algorithm has received criticism within the last few years because researchers discovered internal weaknesses that may make it easier to find different sets of data that produce the same digest. As of this writing, no one has done this, because it is still quite challenging, but the challenge looks smaller than previously assumed, and it may happen in the near future. This does not mean that it is any easier for someone to generate the original data from a digest, only that it may eventually be possible to calculate other data that collides with the digest. The SHA-1 algorithm does not currently have this problem.




Copyright © 2001 O'Reilly & Associates. All rights reserved.