Chapter 7. Introduction to Cryptography

So far, we've examined the basic level of Java's security paradigm--essentially, those features that make up the Java sandbox. We're now going to shift gears somewhat and begin our examination of the cryptographic features in the Java security package itself. The Java security package is a set of classes that were added to Java 1.1 (and expanded in 1.2[1]); these classes provide additional layers of security beyond the layers we've examined so far. Although these classes do play a role in the Java sandbox--they are the basis by which Java classes may be signed, and expanding the sandbox based on signed classes is a key goal of Java security--they may play other roles in secure applications.

[1] 1.2 is now Java 2.

A digital signature, for example, can authenticate a Java class so that the security policy can allow that class greater latitude in the operations it can perform, but a digital signature is a useful thing in its own right. An HR department may want to use a digital signature to verify requests to change payroll data, an online subscription service might require a digital signature to process a change order, and so on. Thus, while we'll examine the classes of the Java security package from the perspective of what we'll be able to do with a signed class, the techniques we'll show will have broader applicability.

You don't need a deep understanding of cryptographic theory to use the classes of the security package. This chapter explains the basic concepts behind the operations involved, which should be sufficient to understand how to use the APIs. On the other hand, one feature of the security package is that third-party vendors may provide their own implementations of its algorithms. We'll explain how to go about providing such implementations, but we assume that readers who are interested in writing one already understand the mechanics of cryptography. Hence, we won't give any cryptographically valid examples in those sections.

If you already have an understanding of the basics of digital signatures, encryption, and the need for authentication, you can skip this chapter, which provides mainly background information.

7.1. The Need for Authentication

We are primarily concerned with one goal of the security package: the ability to authenticate classes that have been loaded from the network. The components of the Java API that provide authentication may have other uses in other contexts (including within your own Java applications), but their primary goal is to allow a Java application (and a Java-enabled browser) to load a class from the network and be assured of two things:

The identity of the site from which the class was loaded can be verified (author authentication).
The class was not modified in transit over the network (data authentication).

As we've seen, Java applications typically assume that all classes loaded over the network are untrusted classes, and these untrusted classes are generally given permissions consistent with that assumption. Classes that meet the above two criteria, however, need not necessarily be so constrained. If you walk into your local software store and buy a shrink-wrapped piece of software, you're generally confident that the software will not contain viruses or anything else that's harmful. This is part of the implied contract between a commercial software producer and a commercial software buyer. If you download code from that same software producer's web site, you're probably just as confident that the code you're downloading is not harmful; perhaps it should be given the same access rights as the software you obtained from that company through a more traditional channel.

There's a small irony here, because many computer viruses are spread through commercial software. That's one reason why the fact that a class has been authenticated does not necessarily mean it should be able to access anything on your machine that it wants to. It's also a reason why the fine-grained nature of the access controller is important: if you buy classes from acme.com, but only give them access to certain things on your machine, you are still somewhat protected if by mistake acme.com includes a virus in their software.

Even if all commercial software were virus free, however, there is a problem with assuming that code downloaded from a commercial site is safe to run on your machine. The problem with that assumption--and the reason that Java by default does not allow that assumption to be made--has to do with the way in which the code you execute makes its way through the Internet. If you load some code from www.xyz.com onto your machine, that code will pass through many machines that are responsible for routing the code between your site and XYZ's site. Typically, we like to think that the data that passes between our desktop and www.xyz.com enters some large network cloud; it's called a cloud because it contains a lot of details, and the details aren't usually important to us. In this case, however, the details are important. We're very interested to know that the data between our desktop and xyz.com passes through, for example, our Internet service provider, two other sites on the Internet backbone, and XYZ's Internet service provider. Such a transmission is shown in Figure 7-1. The two types of authentication that we mentioned above provide the necessary assurance that the data passing through all these sites is not compromised.

Figure 7-1. How data travels through a network

7.1.1. Author Authentication

First we must prove that the author of the data is who we expect it to be. When you send data that is destined for www.xyz.com, that data is forwarded to site2, which is supposed to forward it to site1, which should simply forward it to XYZ's Internet service provider. You trust site1 to forward the data to XYZ's Internet service provider unchanged; however, nothing forces site1 to fulfill its part of this contract. A hacker at site1 could arrange for all the data destined for www.xyz.com to be sent to the hacker's own machine, and the hacker could send back data through site2 that looked as if it originated from www.xyz.com. The hacker is now successfully impersonating the www.xyz.com site. Hence, although the URL in your browser says www.xyz.com, you've been fooled: you're actually receiving whatever data the impersonator of XYZ Corporation wants to send to you.

There are a number of ways to achieve this masquerade, the best known of which is DNS (or IP) spoofing. When you want to surf to www.xyz.com, your desktop asks your DNS server (which is typically run by your Internet service provider) for the IP address of www.xyz.com, and you then send off the request to whatever address you receive. If your Internet service provider knows the IP address of www.xyz.com, it tells your desktop what the correct address is; otherwise, it has to ask another DNS server (e.g., site1) for the correct IP address. If a hacker has control of a machine anywhere along the chain of DNS servers, it is relatively simple for that hacker to send out his own address in response to a DNS request for www.xyz.com.
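The lookup chain just described can be modeled with a toy resolver. This is only a sketch of the idea, not of the real DNS protocol: the Resolver class, its methods, and both IP addresses (taken from ranges reserved for documentation) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class DnsChainSketch {

    /** Hypothetical resolver: answers from its own table, otherwise asks the next server. */
    static class Resolver {
        private final Map<String, String> table = new HashMap<>();
        private final Resolver upstream;

        Resolver(Resolver upstream) {
            this.upstream = upstream;
        }

        Resolver with(String host, String address) {
            table.put(host, address);
            return this;
        }

        String lookup(String host) {
            String known = table.get(host);
            if (known != null) {
                return known;
            }
            if (upstream == null) {
                throw new IllegalArgumentException("unknown host: " + host);
            }
            // The caller has no way to check the answer that comes back.
            return upstream.lookup(host);
        }
    }

    public static void main(String[] args) {
        // An honest site1 knows the (made-up) address of www.xyz.com.
        Resolver honestSite1 = new Resolver(null).with("www.xyz.com", "192.0.2.10");
        Resolver isp = new Resolver(honestSite1);
        System.out.println("honest chain:  " + isp.lookup("www.xyz.com"));

        // A hacker in control of site1 answers with his own (made-up) address.
        Resolver hackedSite1 = new Resolver(null).with("www.xyz.com", "203.0.113.66");
        Resolver ispBehindHacker = new Resolver(hackedSite1);
        System.out.println("spoofed chain: " + ispBehindHacker.lookup("www.xyz.com"));
    }
}
```

In both cases the desktop sees only an address; nothing in the answer itself reveals which server in the chain supplied it.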

Now say that you surf to www.xyz.com and request a Java class (or set of classes) to run a spellchecker for your Java-based word processor. The request you send to www.xyz.com will be misaddressed by your machine--your machine will erroneously send the request to the hacker's machine, since that's the IP address your machine has associated with www.xyz.com. Now the hacker is able to send you back a Java class. If that Java class is suddenly trusted (because, after all, it allegedly came from a commercial site), it has access that you wouldn't necessarily approve: perhaps while it's spellchecking your document, it is also searching your hard disk to find the data file of your financial planning software so that it can read that file and send its contents back over the network to the hacker's machine.

Yes, we've made this sound easier than it is--the hacker would have to have intimate knowledge of the xyz.com site to send you back the classes you requested, and those classes would have to have the expected interface in order for any of their code to be executed. But such situations are not difficult to set up either; if the hackers stole the original class files from www.xyz.com--which is usually extremely easy--all they need to do is set themselves up at the right place in the DNS chain.

In the strict Java security model we explored earlier, this sort of situation is possible, but it isn't dangerous. Because the classes loaded from the network are never trusted at all, the class that was substituted by the hackers is not able to damage anything on your machine. At worst, the substituted class does not behave as you expect and may in fact do something quite annoying--like play loud music on your machine instead of spellchecking your document. But the class is not able to do anything dangerous, simply because all classes from the network are untrusted.

In order to trust a class that is loaded from the network, then, we must have some way to verify that the class actually came from the site it said it came from. This authentication comes from a digital signature that comes with the class data--an electronic verification that the class did indeed come from www.xyz.com.
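As a rough preview of what such a verification looks like in code, the following sketch signs some bytes with one key pair and checks the signature with the matching public key. It assumes a modern Java runtime and the "SHA256withRSA" algorithm name; the class name, the helper methods, and the strings standing in for class files are all hypothetical, and real class signing involves certificates and key management that are not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class SignedClassSketch {

    /** Generates a fresh RSA key pair (2048 bits is an assumption, not a recommendation). */
    public static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    /** Produces a signature over the data using the author's private key. */
    public static byte[] sign(PrivateKey key, byte[] data) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(data);
        return signer.sign();
    }

    /** Checks the signature against the data using the author's public key. */
    public static boolean verify(PublicKey key, byte[] data, byte[] signature)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(data);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair xyzKeys = newKeyPair();      // stands in for XYZ's key pair
        KeyPair hackerKeys = newKeyPair();   // stands in for the impersonator's key pair

        byte[] classBytes = "stand-in for the spellchecker class".getBytes(StandardCharsets.UTF_8);
        byte[] genuine = sign(xyzKeys.getPrivate(), classBytes);
        byte[] forged = sign(hackerKeys.getPrivate(), classBytes);

        // Both checks use the public key we believe belongs to XYZ.
        System.out.println("signed by XYZ:    " + verify(xyzKeys.getPublic(), classBytes, genuine));
        System.out.println("signed by hacker: " + verify(xyzKeys.getPublic(), classBytes, forged));
    }
}
```

The check is only as good as our confidence that the public key really belongs to XYZ, which is why the distribution of public keys matters so much.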

7.1.2. Data Authentication

The second problem introduced by the fact that our transmissions to www.xyz.com must pass through several hosts is the possibility of snooping. In this scenario, assume that site2 on the network is under control of a hacker. When you send data to www.xyz.com, the data passes through the machine on site2, where the hacker can modify it; when data is sent back to you, it travels the same path, which means that the hacker on site2 can again modify the data.

This lack of privacy in data transmission is one reason you might want data over the network to be encrypted--certainly if the spellchecking software you're using from www.xyz.com is something you must pay for, you don't want to send your unencrypted credit card number through the network so that site2 can read it. However, for authentication purposes, encrypting the data is not strictly necessary. All that is necessary is some sort of assurance that the data that has passed through the network has not been modified in transit. This can be achieved by various cryptographic algorithms even though the data itself is not encrypted. The simpler path is to use such a cryptographic algorithm (known as a message digest algorithm, whose output is called a digital fingerprint) instead of encrypting the data.
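A minimal sketch of such a fingerprint, using the MessageDigest class of the security package, might look like the following. The choice of "SHA-256", the class and method names, and the strings standing in for class data are assumptions made for illustration. Note that the comparison is only meaningful if the expected fingerprint itself reaches the receiver in a trustworthy way.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FingerprintSketch {

    /** Computes the digital fingerprint (message digest) of the data. */
    public static byte[] fingerprint(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    /** Returns true if the received data still matches the expected fingerprint. */
    public static boolean unmodified(byte[] received, byte[] expectedFingerprint)
            throws NoSuchAlgorithmException {
        return MessageDigest.isEqual(fingerprint(received), expectedFingerprint);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] sent = "stand-in for class data".getBytes(StandardCharsets.UTF_8);
        byte[] expected = fingerprint(sent);

        // Simulate a hacker at site2 changing the data in transit.
        byte[] tampered = "stand-in for CLASS data".getBytes(StandardCharsets.UTF_8);

        System.out.println("intact data matches:   " + unmodified(sent, expected));
        System.out.println("tampered data matches: " + unmodified(tampered, expected));
    }
}
```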

Encryption Versus Data Authentication

When you send data through a public network, you can use a digital fingerprint of that data to ensure that the data was not modified while it was in transit over the network. This fingerprint is sufficient to prevent a snooper from substituting new data (e.g., a new Java class file) for the original data in your transmission.

However, this authentication does not prevent a snooper from reading the data in your transmission; authenticated data is not encrypted data. If you are worried about someone stealing your data, the security provided by data authentication is insufficient. Data authentication guards against the modification of data, but not against the reading of it.

Java provides only authentication, and not encryption, because of the export laws that various countries apply to encryption technology. When we discuss the Java Cryptography Extension in Chapter 13, "Encryption", we'll expand upon these restrictions.

Without some cryptographic mechanism in place, the hacker at site2 has the option of modifying the classes that are sent from www.xyz.com. When the classes are read by the machine at site2, the hacker could modify them in memory before they are sent back onto the network to be read by site1 (and ultimately to be read by your machine). Hence, the classes that are sent need to have a digital fingerprint associated with them. As it turns out, the digital fingerprint is required to sign the class as well.

7.1.3. Java's Role in Authentication

When Java was first released and touted as being "secure," it surprised many people to discover that the types of attacks we've just discussed were still possible. As we've said, security means many things to many people, but a reasonable argument could be made that the scenarios we've just outlined should not be possible in a secure environment.

The reasons Java did not solve these problems in its first release are varied, but they essentially boil down to one practical reason and one philosophical reason.

The practical reason is that all the solutions we're about to explore depend to a high degree on technologies that are just beginning to become viable. As a practical matter, authentication relies on everyone having public keys available--and as we'll discuss in Chapter 11, "Key Management", that's not necessarily the case. Without a robust mechanism to share public keys, Java had two options:

Provide no security at all, and allow applets full use of the resources of the user's computer. By now, we know all the possible problems with that route.
Provide the very strict security that was implemented in 1.0-based versions of Java, with a view toward ways of enhancing that model as technologies evolved. While not the best of all possible worlds, this compromise allowed Java to be adopted much sooner than it would otherwise have been.

On a philosophical level, however, there's another argument: Java shouldn't solve these problems because they are not confined to Java itself. Even if Java classes were always authenticated, that would not prevent the types of attacks we've outlined here from affecting non-Java-related transmissions. If you surf to www.xyz.com and that site is subject to DNS spoofing, you'll be served whatever pages the spoofer wants to substitute. If you engage in a standard non-Java, forms-based transmission with www.xyz.com, a snooper along the way can steal and modify the data you're sending over the standard HTTP transmission mechanism.

In other words, the attacks we've just outlined are inherent in the design of a public network, and they affect all traffic equally--email traffic, web traffic, ftp traffic, Java traffic, and so on. Solving these problems at the Java level alone is inefficient, as it means that the same problem must still be solved for all the other traffic on the public network. In a perfect world, the problem would be solved at the network level, once and for all, so that every protocol and every type of traffic is protected.

There are a number of popular technologies that solve this problem in a more general case. If all the traffic between your site and www.xyz.com occurs over SSL using an https-based URL, then your browser and the www.xyz.com web server will take care of the details of authentication of all web-based traffic, including the Java-related traffic. That solves the problem at the level of the web browser, but that still is not a complete solution. If the applet needs to open a connection back to www.xyz.com, it must use SSL for this communication as well. And we still have other, non-web-related traffic that is not authenticated.
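For code that opens its own connection back to the server, one way to insist on SSL is to refuse any URL that does not yield an HTTPS connection object. The sketch below assumes the standard HttpsURLConnection class; the class name, the openSecure method, and the /spellcheck path are hypothetical. Calling openConnection only creates the connection object; no network traffic occurs until the connection is actually used.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

public class SecureCallbackSketch {

    /** Returns a connection object for the address, rejecting anything that is not https. */
    public static HttpsURLConnection openSecure(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        if (!(connection instanceof HttpsURLConnection)) {
            throw new IOException("refusing non-SSL connection to " + address);
        }
        return (HttpsURLConnection) connection;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path on the example site; nothing is sent over the network here.
        HttpsURLConnection connection = openSecure("https://www.xyz.com/spellcheck");
        System.out.println("would connect securely to " + connection.getURL());
    }
}
```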

It would be better still to solve this problem at the network level itself. There are many products from various vendors that allow you to authenticate (and encrypt) all data between your site and a remote site on the network. Using such a product is really the ideal from a design point of view; in that way, all data is protected, no matter what the source of the traffic is. Either of these solutions makes authentication and fingerprinting of Java classes redundant (and they may offer the benefit that the data is actually encrypted when it passes through the network).

Unfortunately, these solutions lead us back to practical considerations: if it's hard for Java environments to share digital keys and to manage cryptographic technology, it's harder still to depend on the network software to manage this process. So while it might be ideal for this problem to be solved for the network as a whole, it's impractical to expect such a solution. Hence, the Java security package offers a reasonable compromise: it allows you to deploy and use trusted (i.e., authenticated) classes, but their use is not mandated, in case you prefer to employ a broader solution to this problem.