Private key cryptography is most often used for protecting information stored on a computer's hard disk, or for encrypting information carried by a communications link between two different machines. Public key cryptography is most often used for creating digital signatures on data, such as electronic mail, to certify the data's origin and integrity.
This analysis gives rise to a third kind of system:
The following list summarizes the private key systems in common use today.
Table 6.1 lists all of the private and public key algorithms we've discussed.
The following sections provide some technical information about a few of the algorithms mentioned above. If you are only interested in using encryption, you can skip ahead to the section called "Encryption Programs Available for UNIX" later in this chapter.
ROT13 is a simple substitution cipher that is traditionally used for distributing potentially objectionable material on the Usenet, a worldwide bulletin board system. It is a variation on the Caesar Cipher - an encryption method used by Caesar's troops thousands of years ago. In the ROT13 cipher, each letter of the alphabet is replaced with a letter that is 13 letters further along in the alphabet (with A following Z). Letters encrypt as follows:
ROT13 used to be the most widely used encryption system in the UNIX world. However, it is not secure at all. Many news and mail-reading programs automatically decrypt ROT13-encoded text with a single keystroke. Some people are known to be able to read ROT13 text without any machine assistance whatsoever.
For example, here is a ROT13 message:
Jung tbrf nebhaq, pbzrf nebhaq.
And here is how the message decrypts:
What goes around, comes around.
If you are not blessed with the ability to read ROT13 files without computer assistance, you can use the following command to either encrypt or decrypt files with the ROT13 algorithm:
% tr "[a-m][n-z][A-M][N-Z]" "[n-z][a-m][N-Z][A-M]" < filename
Needless to say, do not use ROT13 as a means of protecting your files! The only real use for this "encryption" method is the one to which it is put on the Usenet: to keep someone who does not want to be exposed to material (such as the answer to a riddle, a movie spoiler in a review, or an offensive joke) from reading it inadvertently.
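For readers who prefer a script to the tr command, the same substitution can be sketched in Python. (The rot13 helper below is hypothetical, written for illustration; it is not part of any standard UNIX distribution.)

```python
# A minimal ROT13 sketch: rotate each letter 13 places, leaving
# punctuation and spaces untouched.  Applying it twice restores the text.
import string

_ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)

def rot13(text):
    return text.translate(_ROT13)

print(rot13("Jung tbrf nebhaq, pbzrf nebhaq."))
# prints: What goes around, comes around.
```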
One of the most widely used encryption systems today is the Data Encryption Standard ( DES ), developed in the 1970s and patented by researchers at IBM . The DES was an outgrowth of another IBM cipher known as Lucifer. IBM made the DES available for public use, and the federal government issued Federal Information Processing Standard Publication ( FIPS PUB ) Number 46 in 1977 describing the system. Since that time, the DES has been periodically reviewed and reaffirmed, most recently on December 30, 1993, as FIPS PUB 46-2, which remains in effect until 1998. It has also been adopted as an American National Standard (X3.92-1981/R1987).
The DES performs a series of bit permutation, substitution, and recombination operations on blocks containing 64 bits of data, using 56 bits of key (eight 7-bit characters). The 64 bits of input are permuted initially, and are then fed through a function built from static tables of substitutions (called S-boxes) and permutations, combined with 48 bits selected from the key. This process is iterated 16 times (rounds), each round using a different 48-bit subkey derived from the key. The algorithm then performs a final permutation, and 64 bits of output are produced. The algorithm is structured in such a way that changing any bit in the input has a major effect on almost all of the output bits. Indeed, the output of the DES function appears so unrelated to its input that the function is sometimes used as a random number generator.
Although there is no standard UNIX program that performs encryption using the DES , some vendors' versions of UNIX include a program called des which performs DES encryption. (This command may not be present in international versions of the operating system, as described in the next section.)
The DES was mandated as the encryption method to be used by all federal agencies in protecting sensitive but not classified information. The DES is heavily used in many financial and communication exchanges. Many vendors make DES chips that can encode or decode information fast enough to be used in data-encrypting modems or network interfaces. Note that the DES is not (and has never been) certified as an encryption method that can be used with U.S. Department of Defense classified material.
Export control rules restrict the export of hardware or software implementations of the DES , even though the algorithm has been widely published and implemented many times outside the United States. If you have the international version of UNIX , you may find that your system lacks a des command. If you find yourself in this position, don't worry; good implementations of the DES can be obtained via anonymous FTP from almost any archive service, including the Usenet comp.sources archives.
For more information about export of cryptography, see "Encryption and U.S. Law," later in this chapter.
FIPS PUB 81 explains how the DES algorithm can be used in four modes:
Each mode has particular advantages in some circumstances, such as when transmitting text over a noisy channel, or when it is necessary to decrypt only a portion of a file. The following provides a brief discussion of these four methods; consult FIPS PUB 81 or a good textbook on cryptography for details.
All of these modes require that byte and block boundaries remain synchronized between the sender and recipient. If information is inserted into or removed from the encrypted data stream, all of the data following the point of modification is likely to be rendered unintelligible.
Ever since DES was first proposed as a national standard, some people have been suspicious of the algorithm. DES was based on a proprietary encryption algorithm developed by IBM called Lucifer, which IBM had submitted to the National Bureau of Standards ( NBS ) for consideration as a national cryptographic standard. But whereas Lucifer had a key that was 112 bits long, the DES key was shortened to 56 bits at the request of the National Security Agency. The NSA also requested that certain changes be made in the algorithm's S-boxes. Many people suspected that NSA had intentionally weakened the Lucifer algorithm, so the final standard adopted by NBS would not pose a threat to the NSA 's ongoing intelligence collection activities. But nobody had any proof.
Today the DES is more than 20 years old, and the algorithm is definitely showing its age. Recently Michael Wiener, a researcher at Bell-Northern Research, published a paper detailing how to build a machine capable of decrypting messages encrypted with the DES by conducting an exhaustive key search. Such a machine could be built for a few million dollars, and could break any DES -encrypted message in about a day. We can reasonably assume that such machines have been built by both governments and private industry.
In June 1994, IBM published a paper describing the design criteria of the DES . The paper claims that the choices of the DES key size, S-boxes, and number of rounds were a direct result of the conflicting goals of making the DES simple enough to fit onto a single chip with 1972 chip-making technology, and the desire to make it resistant to differential cryptanalysis.
These two papers, coupled with many previously published analyses, appear to have finally settled a long-running controversy as to whether or not NSA had intentionally built in weaknesses to the DES . The NSA didn't build a back door into DES that would have allowed it to forcibly decrypt any DES -encrypted transmission: it didn't need to. Messages encrypted with DES can be forcibly decrypted simply by trying every possible key, given the appropriate hardware.
You can improve the security of DES by performing multiple encryptions, known as superencryption . The two most common ways of doing this are with double encryption ( Double DES ) and with triple encryption ( Triple DES ).
While Double DES appears to add significant security, researchers have found points of attack, and experts therefore recommend Triple DES for applications where single DES is not adequate.
In Double DES , each 64-bit block of data is encrypted twice with the DES algorithm, first with one key, then with another, as follows:
Plaintext → DES(key1) → DES(key2) → Ciphertext
Double DES is not significantly more secure than single DES . In 1981, Ralph Merkle and Martin Hellman published an article in which they outlined a so-called "meet-in-the-middle attack."
The meet-in-the-middle attack is a known plaintext attack, which requires that an attacker have both a known piece of plaintext and a block of that same text that has been encrypted. (These pieces are surprisingly easy to get.) The attack requires storing 2^56 intermediate results when trying to crack a message that has been encrypted with Double DES (a total of 2^59 bytes), but it reduces the number of different keys you need to check from 2^112 to 2^57. "This is still considerably more memory storage than one could comfortably comprehend, but it's enough to convince the most paranoid of cryptographers that double encryption is not worth anything," writes Bruce Schneier in his landmark volume, Applied Cryptography.
In other words, because a message encrypted with DES can be forcibly decrypted by an attacker performing an exhaustive key search today, an attacker might also be able to forcibly decrypt a message encrypted with Double DES using a meet-in-the-middle attack at some point in the future.
The dangers of the Merkle-Hellman meet-in-the-middle attack can be circumvented by performing three block encryption operations. This method is called Triple DES .
In practice, the most common way to perform Triple DES is the encrypt-decrypt-encrypt ( EDE ) sequence: encrypt the block with key1, decrypt the result with key2, and then encrypt that result with key3.
The advantage of this technique is that it can be backward compatible with single DES , simply by setting all three keys to be the same value.
To decrypt, reverse the steps: decrypt with key3, encrypt with key2, and then decrypt with key1.
For many applications, you can use the same key for both key1 and key3 without creating a significant vulnerability.
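The EDE sequence can be sketched with a toy one-block cipher standing in for DES (the cipher below is invented; only the key arrangement is the point):

```python
# Sketch of Triple DES's encrypt-decrypt-encrypt (EDE) sequence.
def toy_encrypt(block, key):
    return (block + key) & 0xFFFF        # stand-in for DES encryption

def toy_decrypt(block, key):
    return (block - key) & 0xFFFF        # stand-in for DES decryption

def triple_encrypt(block, key1, key2, key3):
    # Encrypt with key1, decrypt with key2, encrypt with key3.
    return toy_encrypt(toy_decrypt(toy_encrypt(block, key1), key2), key3)

def triple_decrypt(block, key1, key2, key3):
    # Reverse: decrypt with key3, encrypt with key2, decrypt with key1.
    return toy_decrypt(toy_encrypt(toy_decrypt(block, key3), key2), key1)

# Backward compatibility: with all three keys equal, EDE collapses to a
# single encryption, as noted in the text.
assert triple_encrypt(0x1234, 7, 7, 7) == toy_encrypt(0x1234, 7)
assert triple_decrypt(triple_encrypt(0x1234, 3, 5, 9), 3, 5, 9) == 0x1234
```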
Triple DES appears to be roughly as secure as single DES would be if it had a 112-bit key. How secure is this really? Suppose you had an integrated circuit which could perform one million Triple DES encryptions per second, and you built a massive computer containing one million of these chips to forcibly try all Triple DES keys. This computer, capable of testing 10^12 encryptions per second, would require:
2^112 = 5.19 x 10^33 encryption operations
5.19 x 10^33 encryption operations / 10^12 operations/sec = 5.19 x 10^21 sec = 1.65 x 10^14 years
This is more than 16,453 times the currently estimated age of the universe (approximately 10^10 years).
Thus, barring new discoveries of fundamental flaws or weaknesses in the DES algorithm, or new breakthroughs in the field of cryptanalysis, Triple DES may well be the most secure private key encryption algorithm that humanity will ever need (although niche opportunities may exist for faster algorithms).
RSA is the most widely known algorithm for performing public key cryptography. The algorithm is named after its inventors, Ronald Rivest, Adi Shamir, and Leonard Adleman, who made their discovery in the spring of 1977.
Unlike DES , which uses a single key, RSA uses two cryptographic keys: a public key and a secret key. The public key is used to encrypt a message and the secret key is used to decrypt it. (The system can also be run in reverse, using the secret key to encrypt data that can be decrypted with the public key.)
The RSA algorithm is covered by U.S. Patent 4,405,829 ("Cryptographic Communications System and Method"), which was filed for on December 14, 1977; issued on September 20, 1983; and expires on September 20, 2000. Because a description of the algorithm was published before the patent application was filed, RSA can be used without royalty everywhere in the world except the United States (international patent laws have different coverage of prior disclosure and patent applicability). Not surprisingly, RSA is significantly more popular in Europe and Japan than in the United States, although its popularity in the U.S. is increasing.
The strength of RSA is based on the difficulty of factoring a very large number. The following brief treatment does not fully explain the mathematical subtleties of the algorithm. If you are interested in more detail, you can consult the original paper or a text such as those listed in Appendix D .
RSA is based on well-known, number-theoretic properties of modular arithmetic and integers. One property makes use of the Euler Totient function, φ(n). The Totient function of a number is defined as the count of integers less than that number that are relatively prime to that number. (Two numbers are relatively prime if they have no common factors; for example, 9 and 8 are relatively prime.) The Totient function for a prime number is one less than the prime number itself: every positive integer less than the number is relatively prime to it.
The property used by RSA was discovered by Euler and is this: any integer i relatively prime to n, raised to the power of φ(n) and taken mod n, is equal to 1. That is:

i^φ(n) mod n = 1
Suppose e and d are integers that are inverses modulo φ(n), that is:

e x d mod φ(n) = 1
A related property used in RSA also follows from Euler's theorem: if M is any number relatively prime to n, then:

M^(e x d) mod n = M
Cryptographically speaking, if M is part of a message, we have a simple means for encoding it with one function:
M^e mod n = s
and decoding it with another function:
s^d mod n = M
So how do we get appropriate values for n , e , and d ? First, two large prime numbers p and q , of approximately the same size, are chosen, using some appropriate method. These numbers should be large - on the order of several hundred digits - and they should be kept secret.
Next, the Euler Totient function φ(pq) is calculated. In the case of n being the product of two primes, φ(n) = φ(pq) = (p - 1)(q - 1).
Next, we pick a value e that is relatively prime to φ(n). A good choice would be to pick something in the interval max(p + 1, q + 1) < e < φ(n). Then we calculate a corresponding d such that e x d mod φ(n) = 1. That is, we find the modular inverse of e mod φ(n). If d should happen to be too small (i.e., less than about log2(n)), we pick another e and d.
Now we have our keys. To encrypt a message m, we split m into fixed-size blocks, each representing an integer M less than n. Then we find the value M^e mod n = s for each portion of the message. This calculation can be done quickly with hardware, or with software using special algorithms. These values are concatenated to form the encrypted message. To decrypt the message, it is split into the blocks, and each block is decrypted as s^d mod n = M.
For this example, assume we pick two prime numbers p and q :
p = 251 q = 269
The number n is therefore:
n = 251 * 269 = 67519
The Euler Totient function for this value is:
φ(n) = (251 - 1) x (269 - 1) = 67000
Let's arbitrarily pick e as 50253. d is then:
d = e^-1 mod 67000 = 27917
50253 x 27917 = 1402913001 = 20939 x 67000 + 1 ≡ 1 (mod 67000)
Using n = 67519 allows us to encode any message M that is between 0 and 67518. We can therefore use this system to encode a text message two characters at a time. (Two characters have 16 bits, or 65536 possibilities.)
Using e as our key, let's encode the message "RSA works!" The sequence of ASCII characters encoding "RSA works!" is shown in the following table.
As you can see, the encoded values do not resemble the original message.
To decrypt, we raise each of these numbers to the power of d and take the remainder mod n. After translating to ASCII , we get back the original message.
When RSA is used for practical applications, it is used with numbers that are hundreds of digits long. Because doing math with hundred-digit-long strings is time consuming, modern public key applications are designed to minimize the number of RSA calculations that need to be performed. Instead of using RSA to encrypt the entire message, RSA is used to encrypt a session key, which itself is used to encrypt the message using a high-speed, private key algorithm such as DES or IDEA .
The numbers n and either e or d can be disclosed without seriously compromising the strength of an RSA cipher. For an attacker to be able to break the encryption, he or she would have to find φ(n), which, to the best of anyone's knowledge, requires factoring n.
Factoring large numbers is very difficult - no known method exists to do it efficiently. The time required to factor a number can be several hundred years or several billion years with the fastest computers, depending on how large the number n is. If n is large enough, it is, for all intents and purposes, unfactorable. The RSA encryption system is therefore quite strong, provided that appropriate values of n , e , and d are chosen, and that they are kept secret.
To see how difficult factoring a large number is, let's do a little rough calculation of how long factoring a 200 decimal-digit number would take; this number is more than 70 digits longer than the largest number ever factored, as of the time this book went to press.
All 200-digit values can be represented in at most 665 binary bits.
In general, to factor a 665-bit number using one of the fastest known factoring algorithms would require approximately 1.2 x 10^23 operations.
Let's assume you have a machine that will do 10 billion (10^10) operations per second. (Somewhat faster than today's fastest parallel computers.) To perform 1.2 x 10^23 operations would require 1.2 x 10^13 seconds, or 380,267 years' worth of computer time. If you feel uneasy about having your number factored in 380,267 years, simply double the size of your prime number: a 400-digit number would require a mere 8.6 x 10^15 years to factor. This is probably long enough; according to Stephen Hawking's A Brief History of Time , the universe itself is only about 2 x 10^10 years old.
To give you another perspective on the size of these numbers, assume that you (somehow) could precalculate the factors of all 200 decimal digit numbers. Simply to store the unfactored numbers themselves would require approximately (9 x 10^200) x 665 bits of storage (not including any overhead or indexing). Assume that you can store these on special media that hold 100GB (100 x 1024^4, or approximately 1.1 x 10^14, bytes) of storage. You would need about 6.12 x 10^189 of these disks.
Now assume that each of those disks weighs only one millionth of a gram (1 pound is 453.59 grams). The weight of all your storage would come to over 6.75 x 10^177 tons of disk. The planet Earth weighs only 6.588 x 10^21 tons. The Chandrasekhar limit, the amount of mass at which a star will collapse into a black hole, is about 1.5 times the mass of our Sun, or approximately 3.29 x 10^27 tons. Thus, your storage, left to itself, would collapse into a black hole from which your factoring could not escape! We are not sure how much mass is in our local galaxy, but we suspect it might be less than the amount you'd need for this project.
Again, it looks fairly certain that without a major breakthrough in number theory, the RSA mechanism (and similar methods) are almost assuredly safe from brute-force attacks, provided that you are careful in selecting appropriately large prime numbers to create your key.
One type of encryption system is truly unbreakable: the so-called one-time pad mechanism . A one-time pad is illustrated in Figure 6.4 .
The one-time pad often makes use of the mathematical function known as exclusive OR ( XOR ). If you XOR a number with any value V, you get a second, encrypted number. XOR the encrypted number with the value V a second time, and you get your starting number back. That is:
message = M
ciphertext = M XOR V
plaintext = ciphertext XOR V = (M XOR V) XOR V = M
A system based on one-time pads is mathematically unbreakable (provided that the key itself is truly random) because you can't do a key search: if you try every possible key, you will get every possible message of the same length back. How do you tell which one is correct?
Unfortunately, there is a catch: to use this system, you need to have a stream of values - a key, if you will - that is at least as long as the message you wish to encrypt. Each character of your message is XOR 'ed, bit by bit, with each successive character of the stream.
One-time pads have two important vulnerabilities:
Most one-time pads are generated with machines based on nuclear radioactive decay, a process that is believed to be unpredictable, if not truly random. Almost every "random number generator" you will find on any standard computer system will not generate truly random numbers: the sequence will eventually repeat.
To see how this encryption mechanism works in practice, imagine the following one-time pad:
23 43 11 45 23 43 98 43 52 86 43 87 43 92 34
With this sequence, the message:
UNIX is secure.
might encrypt as follows:
Which is really a printed representation of the following string of values:
69 98 85 55 66 17 11 71 51 72 34 89 57 12
To use a one-time pad system, you need two copies of the pad: one to encrypt the message, and the other to decrypt the message. Each copy must be destroyed after use; after all, if an attacker should ever obtain a copy of the pad, then any messages sent with it would be compromised.
One of the main uses of one-time pads has been for sensitive diplomatic communications that are subject to heavy monitoring and intensive code breaking attempts - such as the communication lines between the U.S. Embassy in Moscow and the State Department in Washington, D.C. However, the general use of one-time pads is limited because generating the sequence of random bits is difficult, and distributing it securely to both the sender and the recipient of the message is expensive.
Because of the problems associated with one-time pads, other kinds of algorithms are normally employed for computer encryption. These tend to be compact, mathematically based functions. The mathematical functions are frequently used to generate a pseudorandom sequence of numbers that are XOR 'ed with the original message - which is similar to the way the one-time pad works. (For example, Jim Bidzos, president of RSA Data Security, likes to call its RC4 stream cipher a "one-time pad generator," although the cipher is clearly not such a thing.) The difference between the two techniques is that, with mathematical functions, you can always - in principle, at least - translate any message encrypted with these methods back into the original message without knowing the key.
The RC4 algorithm mentioned in the previous section is an example of a so-called proprietary encryption algorithm: an encryption algorithm developed by an individual or company which is not made publicly available.
There are many proprietary algorithms in use today. Usually, the algorithms are private key algorithms that are used in place of algorithms such as DES or IDEA . Although some proprietary systems are relatively secure, the vast majority are not. To make matters worse, you can rarely tell which are safe and which are not - especially if the company selling the encryption program refuses to publish the details of the algorithm.
A standard tenet in data encryption is that the security of the system should depend completely on the security of the encryption key. When choosing an encryption system, rely on formal mathematical proofs of security, rather than on secret proprietary systems. If the vendor of an encryption algorithm or technology will not disclose the algorithm or explain how it has been analyzed for strength, you are probably better off avoiding it.
The RC4 algorithm is no exception to this tenet. In 1994, an unknown person or persons published source code that claimed to be RC4 on the Internet. By early 1996, a number of groups around the world had started to find minor flaws in the algorithm, most of them having to do with weak keys. Although RC4 appears to be secure for most applications at this time, the clock may be ticking.