Private key cryptography is most often used for protecting information stored on a computer's hard disk, or for encrypting information carried by a communications link between two different machines. Public key cryptography is most often used for creating digital signatures on data, such as electronic mail, to certify the data's origin and integrity.
This analysis gives rise to a third kind of system:
The following list summarizes the private key systems in common use today.
Table 6.1 lists all of the private and public key algorithms we've discussed.
The following sections provide some technical information about a few of the algorithms mentioned above. If you are only interested in using encryption, you can skip ahead to the section called "Encryption Programs Available for UNIX" later in this chapter.
ROT13 is a simple substitution cipher that is traditionally used for distributing potentially objectionable material on the Usenet, a worldwide bulletin board system. It is a variation on the Caesar Cipher - an encryption method used by Caesar's troops thousands of years ago. In the ROT13 cipher, each letter of the alphabet is replaced with a letter that is 13 letters further along in the alphabet (with A following Z). Letters encrypt as follows:
ROT13 used to be the most widely used encryption system in the UNIX world. However, it is not secure at all. Many news and mail-reading programs automatically decrypt ROT13-encoded text with a single keystroke. Some people are known to be able to read ROT13 text without any machine assistance whatsoever.
For example, here is a ROT13 message:
Jung tbrf nebhaq, pbzrf nebhaq.
And here is how the message decrypts:
What goes around, comes around.
If you are not blessed with the ability to read ROT13 files without computer assistance, you can use the following command to either encrypt or decrypt files with the ROT13 algorithm:
% tr "[a-m][n-z][A-M][N-Z]" "[n-z][a-m][N-Z][A-M]" < filename
Needless to say, do not use ROT13 as a means of protecting your files! The only real use for this "encryption" method is the one to which it is put on the Usenet: to keep someone who does not want to be exposed to material (such as the answer to a riddle, a movie spoiler in a review, or an offensive joke) from reading it inadvertently.
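For readers who prefer a script to the tr command, the same substitution can be sketched in Python. (The rot13 helper below is hypothetical, written for illustration; it is not part of any standard UNIX distribution.)

```python
# A minimal ROT13 sketch: rotate each letter 13 places, leaving
# punctuation and spaces untouched.  Applying it twice restores the text.
import string

_ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)

def rot13(text):
    return text.translate(_ROT13)

print(rot13("Jung tbrf nebhaq, pbzrf nebhaq."))
# prints: What goes around, comes around.
```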
One of the most widely used encryption systems today is the Data Encryption Standard ( DES ), developed in the 1970s and patented by researchers at IBM . The DES was an outgrowth of another IBM cipher known as Lucifer. IBM made the DES available for public use, and the federal government issued Federal Information Processing Standard Publication ( FIPS PUB ) Number 46 in 1977 describing the system. Since that time, the DES has been periodically reviewed and reaffirmed, most recently on December 30, 1993, as FIPS PUB 46-2, which remains in effect until 1998. It has also been adopted as an American National Standard (X3.92-1981/R1987).
The DES performs a series of bit permutation, substitution, and recombination operations on blocks containing 64 bits of data, using 56 bits of key (eight 7-bit characters). The 64 bits of input are permuted initially, and are then fed through a function built from static tables of substitutions (called S-boxes) and permutations, combined with 48 bits selected from the key. This process is iterated 16 times (rounds), each round using a different 48-bit subkey derived from the key. The algorithm then performs a final permutation, and 64 bits of output are produced. The algorithm is structured in such a way that changing any bit in the input has a major effect on almost all of the output bits. Indeed, the output of the DES function appears so unrelated to its input that the function is sometimes used as a random number generator.
Although there is no standard UNIX program that performs encryption using the DES , some vendors' versions of UNIX include a program called des which performs DES encryption. (This command may not be present in international versions of the operating system, as described in the next section.)
The DES was mandated as the encryption method to be used by all federal agencies in protecting sensitive but not classified information. The DES is heavily used in many financial and communication exchanges. Many vendors make DES chips that can encode or decode information fast enough to be used in data-encrypting modems or network interfaces. Note that the DES is not (and has never been) certified as an encryption method that can be used with U.S. Department of Defense classified material.
Export control rules restrict the export of hardware or software implementations of the DES , even though the algorithm has been widely published and implemented many times outside the United States. If you have the international version of UNIX , you may find that your system lacks a des command. If you find yourself in this position, don't worry; good implementations of the DES can be obtained via anonymous FTP from almost any archive service, including the Usenet comp.sources archives.
For more information about export of cryptography, see "Encryption and U.S. Law," later in this chapter.
FIPS PUB 81 explains how the DES algorithm can be used in four modes:
Each mode has particular advantages in some circumstances, such as when transmitting text over a noisy channel, or when it is necessary to decrypt only a portion of a file. The following provides a brief discussion of these four methods; consult FIPS PUB 81 or a good textbook on cryptography for details.
All of these modes require that byte and block boundaries remain synchronized between the sender and recipient. If information is inserted into or removed from the encrypted data stream, all of the data following the point of modification is likely to be rendered unintelligible.
Ever since DES was first proposed as a national standard, some people have been suspicious of the algorithm. DES was based on a proprietary encryption algorithm developed by IBM called Lucifer, which IBM had submitted to the National Bureau of Standards ( NBS ) for consideration as a national cryptographic standard. But whereas Lucifer had a key that was 112 bits long, the DES key was shortened to 56 bits at the request of the National Security Agency. The NSA also requested that certain changes be made in the algorithm's S-boxes. Many people suspected that NSA had intentionally weakened the Lucifer algorithm, so the final standard adopted by NBS would not pose a threat to the NSA 's ongoing intelligence collection activities. But nobody had any proof.
Today the DES is more than 20 years old, and the algorithm is definitely showing its age. Recently Michael Wiener, a researcher at Bell-Northern Research, published a paper detailing how to build a machine capable of decrypting messages encrypted with the DES by conducting an exhaustive key search. Such a machine could be built for a few million dollars, and could break any DES -encrypted message in about a day. We can reasonably assume that such machines have been built by both governments and private industry.
In June 1994, IBM published a paper describing the design criteria of the DES . The paper claims that the choices of the DES key size, S-boxes, and number of rounds were a direct result of the conflicting goals of making the DES simple enough to fit onto a single chip with 1972 chip-making technology, and the desire to make it resistant to differential cryptanalysis.
These two papers, coupled with many previously published analyses, appear to have finally settled a long-running controversy as to whether or not NSA had intentionally built in weaknesses to the DES . The NSA didn't build a back door into DES that would have allowed it to forcibly decrypt any DES -encrypted transmission: it didn't need to. Messages encrypted with DES can be forcibly decrypted simply by trying every possible key, given the appropriate hardware.
You can improve the security of DES by performing multiple encryptions, known as superencryption . The two most common ways of doing this are with double encryption ( Double DES ) and with triple encryption ( Triple DES ).
While Double DES appears to add significant security, researchers have found points of attack, and experts therefore recommend Triple DES for applications where single DES is not adequate.
In Double DES , each 64-bit block of data is encrypted twice with the DES algorithm, first with one key, then with another, as follows:
Plaintext → DES(key1) → DES(key2) → Ciphertext
Double DES is not significantly more secure than single DES . In 1981, Ralph Merkle and Martin Hellman published an article in which they outlined a so-called "meet-in-the-middle attack."
The meet-in-the-middle attack is a known plaintext attack, which requires that an attacker have both a known piece of plaintext and a block of that same text that has been encrypted. (These pieces are surprisingly easy to get.) The attack requires storing 2^56 intermediate results when trying to crack a message that has been encrypted with Double DES (a total of 2^59 bytes), but it reduces the number of different keys you need to check from 2^112 to 2^57. "This is still considerably more memory storage than one could comfortably comprehend, but it's enough to convince the most paranoid of cryptographers that double encryption is not worth anything," writes Bruce Schneier in his landmark volume, Applied Cryptography.
In other words, because a message encrypted with DES can be forcibly decrypted by an attacker performing an exhaustive key search today, an attacker might also be able to forcibly decrypt a message encrypted with Double DES using a meet-in-the-middle attack at some point in the future.
The dangers of the Merkle-Hellman meet-in-the-middle attack can be circumvented by performing three block encryption operations. This method is called Triple DES .
In practice, the most common way to perform Triple DES is the encrypt-decrypt-encrypt ( EDE ) sequence: encrypt the block with key1, decrypt the result with key2, and then encrypt that result with key3.
The advantage of this technique is that it can be backward compatible with single DES , simply by setting all three keys to be the same value.
To decrypt, reverse the steps: decrypt with key3, encrypt with key2, and then decrypt with key1.
For many applications, you can use the same key for both key1 and key3 without creating a significant vulnerability.
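The EDE sequence can be sketched with a toy one-block cipher standing in for DES (the cipher below is invented; only the key arrangement is the point):

```python
# Sketch of Triple DES's encrypt-decrypt-encrypt (EDE) sequence.
def toy_encrypt(block, key):
    return (block + key) & 0xFFFF        # stand-in for DES encryption

def toy_decrypt(block, key):
    return (block - key) & 0xFFFF        # stand-in for DES decryption

def triple_encrypt(block, key1, key2, key3):
    # Encrypt with key1, decrypt with key2, encrypt with key3.
    return toy_encrypt(toy_decrypt(toy_encrypt(block, key1), key2), key3)

def triple_decrypt(block, key1, key2, key3):
    # Reverse: decrypt with key3, encrypt with key2, decrypt with key1.
    return toy_decrypt(toy_encrypt(toy_decrypt(block, key3), key2), key1)

# Backward compatibility: with all three keys equal, EDE collapses to a
# single encryption, as noted in the text.
assert triple_encrypt(0x1234, 7, 7, 7) == toy_encrypt(0x1234, 7)
assert triple_decrypt(triple_encrypt(0x1234, 3, 5, 9), 3, 5, 9) == 0x1234
```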
Triple DES appears to be roughly as secure as single DES would be if it had a 112-bit key. How secure is this really? Suppose you had an integrated circuit which could perform one million Triple DES encryptions per second, and you built a massive computer containing one million of these chips to forcibly try all Triple DES keys. This computer, capable of testing 10^12 encryptions per second, would require:
2^112 = 5.19 x 10^33 encryption operations
5.19 x 10^33 encryption operations / 10^12 operations/sec = 5.19 x 10^21 sec = 1.65 x 10^14 years
This is more than 16,453 times the currently estimated age of the universe (approximately 10^10 years).
Thus, barring new discoveries of fundamental flaws or weaknesses in the DES algorithm, or new breakthroughs in the field of cryptanalysis, Triple DES may well be the most secure private key encryption algorithm that humanity will ever need (although niche opportunities may exist for faster algorithms).
RSA is the most widely known algorithm for performing public key cryptography. The algorithm is named after its inventors, Ronald Rivest, Adi Shamir, and Leonard Adleman, who made their discovery in the spring of 1977.
Unlike DES , which uses a single key, RSA uses two cryptographic keys: a public key and a secret key. The public key is used to encrypt a message and the secret key is used to decrypt it. (The system can also be run in reverse, using the secret key to encrypt data that can be decrypted with the public key.)
The RSA algorithm is covered by U.S. Patent 4,405,829 ("Cryptographic Communications System and Method"), which was filed for on December 14, 1977; issued on September 20, 1983; and expires on September 20, 2000. Because a description of the algorithm was published before the patent application was filed, RSA can be used without royalty everywhere in the world except the United States (international patent laws have different coverage of prior disclosure and patent applicability). Not surprisingly, RSA is significantly more popular in Europe and Japan than in the United States, although its popularity in the U.S. is increasing.
The strength of RSA is based on the difficulty of factoring a very large number. The following brief treatment does not fully explain the mathematical subtleties of the algorithm. If you are interested in more detail, you can consult the original paper or a text such as those listed in Appendix D .
RSA is based on well-known, number-theoretic properties of modular arithmetic and integers. One property makes use of the Euler Totient function, φ(n). The Totient function of a number is defined as the count of integers less than that number that are relatively prime to that number. (Two numbers are relatively prime if they have no common factors; for example, 9 and 8 are relatively prime.) The Totient function for a prime number is one less than the prime number itself: every positive integer less than the number is relatively prime to it.
The property used by RSA was discovered by Euler and is this: any integer i relatively prime to n, raised to the power of φ(n) and taken mod n, is equal to 1. That is:

i^φ(n) mod n = 1
Suppose e and d are integers that are inverses modulo φ(n), that is:

e x d mod φ(n) = 1
A related property used in RSA also follows from Euler's theorem: if M is any number relatively prime to n, then:

M^(e x d) mod n = M
Cryptographically speaking, if M is part of a message, we have a simple means for encoding it with one function:
M^e mod n = s
and decoding it with another function:
s^d mod n = M
So how do we get appropriate values for n , e , and d ? First, two large prime numbers p and q , of approximately the same size, are chosen, using some appropriate method. These numbers should be large - on the order of several hundred digits - and they should be kept secret.
Next, the Euler Totient function φ(pq) is calculated. In the case of n being the product of two primes, φ(n) = φ(pq) = (p - 1)(q - 1).
Next, we pick a value e that is relatively prime to φ(n). A good choice would be to pick something in the interval max(p + 1, q + 1) < e < φ(n). Then we calculate a corresponding d such that e x d mod φ(n) = 1. That is, we find the modular inverse of e mod φ(n). If d should happen to be too small (i.e., less than about log2(n)), we pick another e and d.
Now we have our keys. To encrypt a message m, we split m into fixed-size blocks, each representing an integer M less than n. Then we find the value M^e mod n = s for each portion of the message. This calculation can be done quickly with hardware, or with software using special algorithms. These values are concatenated to form the encrypted message. To decrypt the message, it is split into the blocks, and each block is decrypted as s^d mod n = M.
For this example, assume we pick two prime numbers p and q :
p = 251 q = 269
The number n is therefore:
n = 251 * 269 = 67519
The Euler Totient function for this value is:
φ(n) = (251 - 1) x (269 - 1) = 67000
Let's arbitrarily pick e as 50253. d is then:
d = e^-1 mod 67000 = 27917
50253 x 27917 = 1402913001 = 20939 x 67000 + 1 ≡ 1 (mod 67000)
Using n = 67519 allows us to encode any message M that is between 0 and 67518. We can therefore use this system to encode a text message two characters at a time. (Two characters have 16 bits, or 65536 possibilities.)
Using e as our key, let's encode the message "RSA works!" The sequence of ASCII characters encoding "RSA works!" is shown in the following table.
As you can see, the encoded values do not resemble the original message.
To decrypt, we raise each of these numbers to the power of d and take the remainder mod n. After translating to ASCII , we get back the original message.
When RSA is used for practical applications, it is used with numbers that are hundreds of digits long. Because doing math with hundred-digit-long strings is time consuming, modern public key applications are designed to minimize the number of RSA calculations that need to be performed. Instead of using RSA to encrypt the entire message, RSA is used to encrypt a session key, which itself is used to encrypt the message using a high-speed, private key algorithm such as DES or IDEA .
The numbers n and either e or d can be disclosed without seriously compromising the strength of an RSA cipher. For an attacker to be able to break the encryption, he or she would have to find φ(n), which, to the best of anyone's knowledge, requires factoring n.
Factoring large numbers is very difficult - no known method exists to do it efficiently. The time required to factor a number can be several hundred years or several billion years with the fastest computers, depending on how large the number n is. If n is large enough, it is, for all intents and purposes, unfactorable. The RSA encryption system is therefore quite strong, provided that appropriate values of n , e , and d are chosen, and that they are kept secret.
To see how difficult factoring a large number is, let's do a little rough calculation of how long factoring a 200 decimal-digit number would take; this number is more than 70 digits longer than the largest number ever factored, as of the time this book went to press.
All 200-digit values can be represented in at most 665 binary bits.
In general, to factor a 665-bit number using one of the fastest known factoring algorithms would require approximately 1.2 x 10^23 operations.
Let's assume you have a machine that will do 10 billion (10^10) operations per second. (Somewhat faster than today's fastest parallel computers.) To perform 1.2 x 10^23 operations would require 1.2 x 10^13 seconds, or 380,267 years' worth of computer time. If you feel uneasy about having your number factored in 380,267 years, simply double the size of your prime number: a 400-digit number would require a mere 8.6 x 10^15 years to factor. This is probably long enough; according to Stephen Hawking's A Brief History of Time , the universe itself is only about 2 x 10^10 years old.
To give you another perspective on the size of these numbers, assume that you (somehow) could precalculate the factors of all 200 decimal digit numbers. Simply to store the unfactored numbers themselves would require approximately (9 x 10^200) x 665 bits of storage (not including any overhead or indexing). Assume that you can store these on special media that hold 100GB (100 x 1024^4, or approximately 1.1 x 10^14, bytes) of storage. You would need about 6.12 x 10^189 of these disks.
Now assume that each of those disks weighs only one millionth of a gram (1 pound is 453.59 grams). The weight of all your storage would come to over 6.75 x 10^177 tons of disk. The planet Earth weighs only 6.588 x 10^21 tons. The Chandrasekhar limit, the amount of mass at which a star will collapse into a black hole, is about 1.5 times the mass of our Sun, or approximately 3.29 x 10^27 tons. Thus, your storage, left to itself, would collapse into a black hole from which your factoring could not escape! We are not sure how much mass is in our local galaxy, but we suspect it might be less than the amount you'd need for this project.
Again, it looks fairly certain that without a major breakthrough in number theory, the RSA mechanism (and similar methods) are almost assuredly safe from brute-force attacks, provided that you are careful in selecting appropriately large prime numbers to create your key.
One type of encryption system is truly unbreakable: the so-called one-time pad mechanism . A one-time pad is illustrated in Figure 6.4 .
The one-time pad often makes use of the mathematical function known as exclusive OR ( XOR ). If you XOR a number with any value V, you get a second, encrypted number. XOR the encrypted number with the value V a second time, and you get your starting number back. That is:
message = M
ciphertext = M XOR V
plaintext = ciphertext XOR V = (M XOR V) XOR V = M
A system based on one-time pads is mathematically unbreakable (provided that the key itself is truly random) because you can't do a key search: if you try every possible key, you will get every possible message of the same length back. How do you tell which one is correct?
Unfortunately, there is a catch: to use this system, you need to have a stream of values - a key, if you will - that is at least as long as the message you wish to encrypt. Each character of your message is XOR 'ed, bit by bit, with each successive character of the stream.
One-time pads have two important vulnerabilities:
Most one-time pads are generated with machines based on nuclear radioactive decay, a process that is believed to be unpredictable, if not truly random. Almost every "random number generator" you will find on any standard computer system will not generate truly random numbers: the sequence will eventually repeat.
To see how this encryption mechanism works in practice, imagine the following one-time pad:
23 43 11 45 23 43 98 43 52 86 43 87 43 92 34
With this sequence, the message:
UNIX is secure.
might encrypt as follows:
Which is really a printed representation of the following string of values:
69 98 85 55 66 17 11 71 51 72 34 89 57 12
To use a one-time pad system, you need two copies of the pad: one to encrypt the message, and the other to decrypt the message. Each copy must be destroyed after use; after all, if an attacker should ever obtain a copy of the pad, then any messages sent with it would be compromised.
One of the main uses of one-time pads has been for sensitive diplomatic communications that are subject to heavy monitoring and intensive code breaking attempts - such as the communication lines between the U.S. Embassy in Moscow and the State Department in Washington, D.C. However, the general use of one-time pads is limited because generating the sequence of random bits is difficult, and distributing it securely to both the sender and the recipient of the message is expensive.
Because of the problems associated with one-time pads, other kinds of algorithms are normally employed for computer encryption. These tend to be compact, mathematically based functions. The mathematical functions are frequently used to generate a pseudorandom sequence of numbers that are XOR 'ed with the original message - which is similar to the way the one-time pad works. (For example, Jim Bidzos, president of RSA Data Security, likes to call its RC4 stream cipher a "one-time pad generator," although the cipher is clearly not such a thing.) The difference between the two techniques is that, with mathematical functions, you can always - in principle, at least - translate any message encrypted with these methods back into the original message without knowing the key.
The RC4 algorithm mentioned in the previous section is an example of a so-called proprietary encryption algorithm: an encryption algorithm developed by an individual or company which is not made publicly available.
There are many proprietary algorithms in use today. Usually, the algorithms are private key algorithms that are used in place of algorithms such as DES or IDEA . Although some proprietary systems are relatively secure, the vast majority are not. To make matters worse, you can rarely tell which are safe and which are not - especially if the company selling the encryption program refuses to publish the details of the algorithm.
A standard tenet in data encryption is that the security of the system should depend completely on the security of the encryption key. When choosing an encryption system, rely on formal mathematical proofs of security, rather than on secret proprietary systems. If the vendor of an encryption algorithm or technology will not disclose the algorithm or explain how it has been analyzed for strength, you are probably better off avoiding it.
The RC4 algorithm is no exception to this tenet. In 1994, an unknown person or persons published source code that claimed to be RC4 on the Internet. By early 1996, a number of groups around the world had started to find minor flaws in the algorithm, most of them having to do with weak keys. Although RC4 appears to be secure for most applications at this time, the clock may be ticking.