16.7 Tips on Generating Random Numbers
Random numbers play an important role in
modern computer security. Many programs that use encryption need a
good source of random numbers for producing session keys. For
example, the PGP program uses random numbers for generating a random
key that is used to encrypt the contents of electronic mail messages;
the random key is then itself encrypted using the
recipient's public key.
Random numbers have other uses in computer security as well. A
variety of authentication protocols require that the computer create
a random number, encrypt it, and send it to the user. The user must
then decrypt the number, perform a mathematical operation on it,
re-encrypt the number, and send it back to the computer.
A great deal is known about random
numbers. Here are some general rules of thumb:
If a number is random, then each bit of that
number's binary representation should have an equal
probability of being a 0 or a 1.
If a number is random, then after each 0 bit in that
number's binary representation there should be an
equal probability that the following bit is a 0 or a 1. Likewise,
after each 1 there should be an equal probability that the following
bit is a 0 or a 1.
When examining a large set of random values, each with a large number
of bits, then roughly half of the bits should be 0s, and half of the
bits should be 1s.
For security-related purposes, a further requirement for random
numbers is unpredictability:
It should not be possible to predict the output of the random number
generator given previous outputs or other knowledge about the
computer generating the random numbers.
It should not be possible to determine the internal state of the
random number generator.
It should not be possible to replicate the initial state of the
random number generator, or to reseed the generator with the same
initial value.
One of the best ways of generating a stream of random numbers is to
make use of a random process, such as radioactive decay.
Unfortunately, most Unix computers are not equipped with Geiger
counters. Thus, they need to use something else. Often, they use
pseudorandom functions as random number generators.
A pseudorandom
function is a function that yields a series of outputs
that appears to be unpredictable. In practice, these functions
maintain a large internal state from which the output is calculated.
Each time a new number is generated, the internal state is changed.
The function's initial state is referred to as its
seed .
If you need a series of random numbers that is repeatable, you need a
pseudorandom generator that takes a seed and keeps an internal state.
If you need a nonreproducible series of random numbers, you should
avoid pseudorandom generators. Thus, successfully picking random
numbers in the Unix environment depends on two things: picking the
right random number generator, and then picking a different seed each
time the program is run.
16.7.1 Unix Pseudorandom Functions
The standard Unix C library provides two random number generators:
rand( ) and random( ). A
third random number generator, drand48( ), is
available on some versions of Unix. Although you
won't want to use any of these routines to produce
cryptographic random numbers, we'll briefly explain
each. Then, if you need to use one of them for something else,
you'll know something about its strengths and
shortcomings.
16.7.1.1 rand( )
The original Unix random number generator, rand(
), is not a very good random number generator. It uses a
32-bit seed and maintains a 32bit internal state. The output of the
function is also 32 bits in length, making it a simple matter to
determine the function's internal state by examining
the output. As a result, rand( ) is not very
random. Furthermore, the low-order bits of some implementations are
not random at all, but flip back and forth between 0 and 1 according
to a regular pattern. The rand( ) random number
generator is seeded with the function srand( ).
On some versions of Unix, a third function is provided,
rand_r( ), for multithreaded applications. (The
function rand( ) itself is not safe for
multithreading, as it maintains internal state.)
Do not use rand( ), even for
"trivial" programs.
16.7.1.2 random( )
The random( ) function is a more sophisticated
random number generator that uses nonlinear feedback and an internal
table that is 124 bytes (992 bits) long. The function returns random
values that are 32 bits in length. All of the bits generated by
random( ) are usable.
The random( ) function is adequate for
simulations and games, but should not be used for security-related
applications such as picking cryptographic keys or simulating
one-time pads.
16.7.1.3 drand48( ), lrand48( ), and mrand48( )
The drand48( ) function is one of many functions
that make up the System V random number generator. According to the
Solaris documentation, the algorithm uses "the
well-known linear congruential algorithm and 48-bit integer
arithmetic." The function drand48(
) returns a double-precision number that is greater than
or equal to 0.0 and less than 1.0, while the lrand48(
) and mrand48( ) functions return
random numbers within a specified integer range. As with
random( ), these functions provide excellent
random numbers for simulations and games, but should not be used for
security-related applications such as picking cryptographic keys or
simulating one-time pads; linear congruential algorithms are too easy
to break.
16.7.2 Picking a Random Seed
Using a good
random number generator is easy. Picking a random seed, on the other
hand, can be quite difficult. Conceptually, picking a random number
should be easy: pick something that is always different.
But in practice, picking a random number—especially one that
will be used as the basis of a cryptographic key—is quite
difficult. The practice is difficult because many things that change
all the time actually change in predictable ways.
A stunning example of a poorly chosen seed for a random number
generator was revealed on the front page of the New York
Times in September 1995. The
problem was in Netscape Navigator, a popular program
for browsing the World Wide Web. Instead of using truly random
information for seeding the random number generator,
Netscape's programmers used a combination of the
current time of day, the process ID (PID) of the running Netscape
program, and the parent process ID (PPID). Researchers at the
University of California at Berkeley discovered that they could,
through a process of trial and error, discover the numbers that any
copy of Netscape was using and crack the encrypted messages with
relative ease.
Another example of a badly chosen seed generation routine was used in
Kerberos Version 4. This routine was
based on the time of day XORed with other information. The XOR
effectively masked out the other information and resulted in a seed
of only 20 bits of unpredictable value. This reduced the key space
from more than 72 quadrillion possible keys to slightly more than 1
million, thus allowing keys to be guessed in a matter of seconds.
When this weakness was discovered at Purdue's COAST
Laboratory, conversations with personnel at MIT
revealed that they had known for years that this problem existed, but
the patch had somehow never been released.
In the book Network Security, Private Communication in a
Public World, Kaufman et al. identify three typical
mistakes when picking random-number seeds:
- Seeding a random number generator from a limited space
-
If you seed your random number generator with an 8-bit number, your
generator has only one of 256 possible initial seeds. You will have
only 256 possible sequences of random numbers coming from the
function (even if your generator has 128 bytes of internal state).
- Using a hash value of only the current time as a random seed
-
This practice was the problem with the Netscape security bug. The
problem was that even though the Unix operating system API appears to
return the current time to the nearest microsecond, most operating
systems have a resolution considerably coarser—usually within
one 1/60th of a second or less. As Kaufman et al. point out, if a
clock has only 1/60th of a second granularity, and the intruder knows to the
nearest hour when the current time was sampled, then there are only
60 x 60 x 60 = 216,000 possible values for the
supposedly random seed.
- Divulging the seed value itself
-
In one
case reported by Charlie Kaufman et al., and originally discovered by
Jeff Schiller of MIT, a program used the time of day to choose a
per-message encryption key. The problem in this case was that the
application included the time that the message was generated in its
unencrypted header of the message.
How do you pick a good random send? Here are some ideas:
Use a genuine source of randomness, such as a radioactive source,
static on the FM dial, thermal noise, or something similar (see the
sidebar When Good Calls Fail). Measuring the timing of hard
disk drives can be another source of randomness, provided that you
can access the hardware at a sufficiently low level.
Ask the user to type a set of text, and sample the time between
keystrokes. If you get the same amount of time between two
keystrokes, throw out the second value; the user is probably holding
down a key, and the key is repeating. (This technique is used by PGP
as a source of randomness for its random number generator.)
Monitor the user. Each time the user presses a key, take the time
between the current keypress and the last keypress, add it to the
current random number seed, and hash the result with a cryptographic
hash function. You can also use mouse movements to add still more
randomness.
Monitor the computer. Use readily available, constantly changing
information, such as the number of virtual memory pages that have
been paged in, the status of the network, and so forth. This is how
/dev/random works.
Some versions of Unix have integrated kernel random number sources
available through the device file abstractions of
/dev/random and
/dev/urandom . When present, these devices combine
a cryptographically secure random number generator using
non-deterministic sources of bits with seed information from many
random sources, such as network interrupts, user input, and other
external events.
/dev/random generally returns random bytes until
it exhausts the available noise in its entropy pool, and then blocks
until more entropy has been gathered. It is thus suitable for
cryptographic applications and one-time pads, but the time required
to generate a given number of random bytes may not be predictable.
For security applications, this is generally the best source of
random numbers available in the operating system without attaching
special hardware.
/dev/urandom returns as many bytes as requested;
when it has exhausted the available noise, the bytes it returns are
only pseudorandom. It never blocks, and thus has predictable time
requirements, but may not be suitable for cryptographic use.
Systems that don't have
/dev/random can try the Entropy Gathering Daemon
(http://egd.sourceforge.net), a
user-space daemon that performs a similar function.
|
RFC 1705,
by Donald Eastlake, Steve Crocker, and Jeffrey Schiller, makes many
observations about picking seeds for random number generators. Among
them are the following:
Avoid relying on the system clock. Many system clocks are
surprisingly non-random. Many clocks that claim to provide accuracy
actually don't, or they don't
provide good accuracy all the time.
Don't use Ethernet addresses or hardware
serial numbers. Such numbers are usually "heavily
structured" and have "heavily
structured subfields." As a result, attackers could
easily try all of the possible combinations, or guess the value based
on the date of manufacture.
Beware of using information such as the time of the arrival of
network packets. Such external sources of randomness could be
manipulated by an adversary.
Don't use random selections from a large database
(such as a CD-ROM) as a source of randomness. The reason, according
to RFC 1750, is that your adversary may have access to the same
database. The database may also contain unnoticed structure.
Consider using analog input devices already present on your system.
For example, RFC 1750 suggests using the
/dev/audio device present
on some Unix workstations as a source of random numbers. The stream
is further compressed to remove systematic skew. For example: $ cat /dev/audio | compress - >random-bit-stream RFC 1750 further advises that the microphone not be connected to the
audio input jack for fear that the /dev/audio
device will pick up random electrical noise. This rule may not be
true on all hardware platforms. You should check your hardware with
the microphone turned on and with no microphone connected to see
which way gives a "better" source
of random numbers. If you decide to use it without a microphone
connected, you should then label the jack so that somebody does not
accidentally plug a microphone into it.
16.7.3 A Good Random Seed Generator
As
we've mentioned, one way of generating a random seed
is to use a source message digest algorithm such as MD5 or SHA-1. As
input, give it as much data as you can based on temporary state. This
data might include the output of ps -efl, the
environment variables for the current process, its PID and PPID, the
current time and date, the output of the random number generator
given your seed, the seed itself, the state of network connections,
and perhaps a directory listing of the current directory. The output
of the function will be a string of bits that an attacker cannot
likely duplicate, but that is likely to meet all the other conditions
of randomness you might desire.
The Perl program in Example 16-1 is an example of such a program. It uses
several aspects of system state, network status, virtual memory
statistics, and process state as input to MD5. These numbers change
very quickly on most computers and cannot be anticipated, even by
programs running as superuser on the same computer. The entropy
(randomness) of these values is spread throughout the result by the
hashing function of MD5, resulting in an output that should be
sufficiently random for most uses.
Example 16-1. Generating a random seed string
#!/usr/bin/perl -w
#
# randbits -- Gene Spafford <spaf@purdue.edu>
# Generate a random seed string based on state of system.
#
# Inspired by a program from Bennett Todd, derived
# from original by Larry Wall
#
# Uses state of various kernel structures as random "seed"
# Mashes them together and uses MD5 to spread around
#
# Usage: randbits [-n] [-h | -H ] [keylen]
# In which
# -n means to emit no trailing linefeed
# -h means to give output in hex (default)
# -H means hex output, but use uppercase letters
# -s means to give output in base 64
# keylen is the number of bytes to the random key (default is 8)
# If you run this on a different kind of system, you should adjust the
# setting in the "noise" string to system-specific strings. Do it as another
# case in the "if...else" and email me the modification so I can keep a
# merged copy. (Hint: check in your manual for any programs with "stat" in
# the name or description.)
#
# You will need to install the Digest::MD5 module from CPAN if it is not already present.
use Digest::MD5;
use Getopt::Std;
# Augment the path to something that should contain all needed commands.
$ENV{'PATH'} .= "/bin:/usr/bin:/usr/etc:/usr/ucb:/etc:";
# We start with the observation that most machines have either a BSD-ish
# core command set, or a System V-ish command set. We'll build from those.
$BSD = "ps -agxlww ; netstat -s ; vmstat -s ;";
$SYSV = "ps -eflj ; netstat -s ; nfsstat -nr ;";
if ( -e "/sdmach" ) {
$_ = "NeXT";
} elsif ( -x "/usr/bin/uname" || -x "/bin/uname") {
$_ = `uname -sr`;
} elsif ( -x "/etc/version" ) {
$_ = `/etc/version`;
} else {
die "How do I tell what OS this is?";
}
/^AIX / && ( $noise = $BSD . 'pstat -fs')||
/^CYGWIN/ && ( $noise = "ps -alW ; netstat -a ; netstat -s")||
/^Darwin/ && ( $noise = "ps -agxlww ; netstat -s ; pstat -fsvt")||
/^FreeBSD/ && ( $noise = $BSD . 'vmstat -i')||
/^HP-UX 7/ && ( $noise = $SYSV)||
/^HP-UX A.09/ && ( $noise = $SYSV . "vmstat -s")||
/^IRIX(64)? [56]/ && ( $noise = $SYSV)||
/^Linux/ && ( $noise = "ps -agxlww ; netstat -i ; netstat -s; vmstat")||
/^NeXT/ && ( $noise = 'ps agxlww; netstat -s; vm_stat')||
/^OSF1/ && ( $noise = $SYSV . 'vmstat -i')||
/^SunOS 4/ && ( $noise = $BSD . 'pstat -afipSsT;vmstat -i')||
/^SunOS 5/ && ( $noise = $SYSV . 'vmstat -i;vmstat -s; nfsstat')||
/^ULTRIX 4/ && ( $noise = $BSD . 'vmstat -s')||
die "No 'noise' commands defined for this OS. Edit and retry!";
#### End of things you may need to modify
($prog = $0) =~ s|.*/||;
$usage = "usage: $prog [-n] [-h | -H | -s] [keylength]\n";
getopt('nhHs', \%opts) || die $usage;
defined($keylen = shift) || ($keylen = 8);
die $usage if ($keylen =~ /\D/);
die $usage if (($opts{s} and $opts{h} || $opts{H}) or ($opts{h} && $opts{H}));
die "Maximum keylength is 16 bytes (32 hex digits)\n" if ($keylen > 16);
# Run the noise command and include whatever other state we
# can conveniently (portably) find.
$hash = Digest::MD5->new;
$noise .= ";ls -lai . /tmp";
-d "/dev" and $noise .= " /dev";
open(NOISE, "$noise |") || die("Couldn't run noise commands: $!");
$hash->add(<NOISE>);
close(NOISE);
$hash->add(times(), $$, getppid(), time, join('+', %ENV));
# Format the output and finish.
$buf = $opts{s} ? $hash->b64digest : $hash->hexdigest;
($buf =~ y/a-f/A-F/) if $opts{H};
print substr($buf, 0, 2*$keylen);
print "\n" unless $opts{n};
Note that the techniques used in this script are similar to the
approaches used by some Unix systems to implement
/dev/random; they are also similar to the
techniques used by EGD. As these functions are not present on all
systems, we have decided to include this script here. (It is also
educational to see how such a script is written.)
This script is also an excellent method for generating Xauthority
keys (see Section 12.3.22.2) if you need them. Simply
execute it with an argument of 14 (you need 28 hex characters of key)
and use the result as your key.
|