Enforcement of the Java Language Rules (Java Security)

2.2. Enforcement of the Java Language Rules

The rules we outlined above are fine in theory, but they must be enforced somehow. We've always been taught that writing past the end of an array in C code is a bad thing, but I somehow still manage to do it accidentally all the time. There are also those who willfully attempt to overwrite the ends of arrays in an attempt to breach the security of a system. Without mechanisms to enforce these memory rules, they become mere guidelines and provide no security at all.

This necessary enforcement happens at three different times in the development and deployment of a Java program: at compile time, at link time (that is, when a class is loaded into the virtual machine), and at runtime. Not all rules can be checked at each of these points, but certain checks are necessary at each point in order to ensure the memory security that we're after. As we'll see, enforcement of these rules (which is really the construction of this part of the Java sandbox) varies depending on the origin of the class in question.

2.2.1. Compiler Enforcement

The Java compiler is the first component tasked with enforcing Java's language rules. In particular, the compiler is responsible for enforcing all of the rules we outlined above except for the last two: the compiler cannot enforce array bounds checking, nor can it catch every illegal object cast.

The compiler does catch certain illegal object casts--namely, casts between types that are known to be unrelated, as in the following code:

Vector v = new Vector();
String s = (String) v;

But the validity of a cast from type X to type Y, where Y is a subclass of X, cannot be known at compile time, so the compiler must let such a construct pass.
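
To make the distinction concrete, here is a minimal sketch of a downcast that the compiler must accept. The class and method names (DowncastDemo, describe, blindCastFails) are ours and purely illustrative. The cast from Object to String compiles no matter what the argument is; whether it succeeds is only known when the code runs.

```java
import java.util.Vector;

class DowncastDemo {
    // Guarded downcast: test the actual type before casting.
    static String describe(Object o) {
        if (o instanceof String) {
            String s = (String) o;
            return "String of length " + s.length();
        }
        return "not a String";
    }

    // Unguarded downcast: legal at compile time, checked only when executed.
    static boolean blindCastFails(Object o) {
        try {
            String s = (String) o;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
        System.out.println(describe(new Vector<Object>()));
        System.out.println(blindCastFails(new Vector<Object>()));
    }
}
```

Contrast this with the Vector-to-String cast shown earlier: there both static types are known and unrelated, so the compiler can reject the code outright.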

2.2.2. The Bytecode Verifier

Okay, the compiler has produced a Java program for us, and we're about to run the Java bytecode of that program. But if the program came from an unknown source, how do we know that the bytecodes we've received are actually legal?

Bytecode Verification of Other Languages

Throughout this section, we're discussing the bytecode verifier as if it were tied to the Java language. This is somewhat imprecise: the bytecode verifier is actually independent of the original source language of the program. If we had a C++ compiler that generated Java bytecodes from C++ source, the bytecode verifier would still be able to verify (or not) the bytecodes.

However, the verification of the bytecodes would still depend upon the semantics of the Java language, and not the semantics of C++; just because the bytecodes in question originated from C++ code is no reason that they should suddenly be allowed to cast an arbitrary memory location into an object.

For this reason, I prefer to think of the bytecodes in terms of the Java language itself. There are tools to produce Java bytecodes from other languages (like Scheme), but in general, producing Java bytecodes from another language severely limits the constructs that can be written in that other language.

This brings us to the need for the bytecode verifier--the second link in the chain of responsibility of enforcing the rules of the Java language. Normally when the need for the bytecode verifier is discussed, it's in terms of an evil compiler--that is, a compiler that someone has written in such a way that the code produced by the compiler is not legal Java code. The theory is that code from such a compiler could be constructed in order to create and exploit a security hole by ignoring a rule in the Java language. Such an attack might seem to be difficult to achieve, in that it would require some detailed knowledge of the Java compiler.

It turns out that the evil compiler issue is a red herring--it doesn't really matter whether such an attack is likely or not, because it's trivial to create non-conforming Java code with any standard Java compiler. Assume that we have these classes:

public class CreditCard {
	public String acctNo = "0001 0002 0003 0004";
}

public class Test {
	public static void main(String args[]) {
		CreditCard cc = new CreditCard();
		System.out.println("Your account number is " + cc.acctNo);
	}
}

If we run this code, we'll create a CreditCard object and print out its account number. Now say that we realize the account number should really have been private, so we go back and change the definition of acctNo to be private and recompile only the CreditCard class. We then have two class files, and the Test class file contains Java code that illegally accesses the private instance variable acctNo of the CreditCard class.
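
The scenario can be reproduced in a single program. The following is a sketch under some assumptions: it needs a JDK (not just a runtime) so that the javax.tools compiler is available, the StaleClassDemo class is our own invention, and Test is reduced to a static read() method so the result can be returned rather than printed. It compiles both classes, recompiles only CreditCard with a private field, and then runs the stale Test class in a fresh class loader.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class StaleClassDemo {
    // Writes one source file into dir and compiles only that file.
    static void compile(Path dir, String className, String source) throws IOException {
        Path file = dir.resolve(className + ".java");
        Files.writeString(file, source);
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        if (javac == null) throw new IllegalStateException("a JDK is required");
        int status = javac.run(null, null, null,
                "-d", dir.toString(), "-cp", dir.toString(), file.toString());
        if (status != 0) throw new IllegalStateException("could not compile " + className);
    }

    // Returns the account number, or the class name of the error raised by the stale access.
    static String run() {
        try {
            Path dir = Files.createTempDirectory("stale-class-demo");
            compile(dir, "CreditCard",
                    "public class CreditCard { public String acctNo = \"0001 0002 0003 0004\"; }");
            compile(dir, "Test",
                    "public class Test { public static String read() { return new CreditCard().acctNo; } }");
            // Make the field private and recompile only CreditCard; Test.class is now stale.
            compile(dir, "CreditCard",
                    "public class CreditCard { private String acctNo = \"0001 0002 0003 0004\"; }");
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() }, null)) {
                return (String) loader.loadClass("Test").getMethod("read").invoke(null);
            } catch (InvocationTargetException e) {
                return e.getCause().getClass().getName();
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On HotSpot-based JDKs the stale access is rejected when the field reference is resolved, so run() reports java.lang.IllegalAccessError instead of the account number.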

The above example shows an innocent mistake, but a malicious programmer could use just this technique to produce illegal Java bytecodes. In order to modify the contents of a string, for example, all we need to do is:

1. Copy the java.lang.String source file into our CLASSPATH.
2. In the copy of the file, modify the definition of value--the private array that holds the actual characters of the string--to be public.
3. Compile this modified class, and replace the String.class file in the JDK.
4. Compile some new code against this modified version of the String class. The new code could include something like this:

```
// Compiles only against the modified String class, in which value is public.
public class CorruptString {
	public static void modifyString(String src, String dst) {
		// Overwrite src in place, stopping at the end of the shorter string.
		for (int i = 0; i < src.length(); i++) {
			if (i == dst.length())
				return;
			src.value[i] = dst.value[i];
		}
	}
}
```

   Now any time you want to modify a string in place, simply call this modifyString() method with the string you want to corrupt (src) and the new string you want it to have (dst).
5. Remove the modified version of the String class.

Now the CorruptString class can be referenced by a Java program, which can use it to attempt to corrupt any string that it has a reference to. Even though the program will run with the original version of the String class, the CorruptString class will be able to access the private value array within the String class--unless the bytecode verifier rejects the CorruptString class.

2.2.2.1. Inside the bytecode verifier

The bytecode verifier is an internal part of the Java virtual machine and has no interface: programmers cannot access it and users cannot interact with it. The verifier automatically examines most bytecodes as they are built into class objects by the class loader of the virtual machine (see Figure 2-1). We'll give just a brief overview of how the bytecode verifier actually works.

Figure 2-1. The bytecode verifier

The verifier is often referred to as a mini-theorem prover (a term first used in several documents from Sun). This sounds somewhat more impressive than it is; it's not a generic, all-purpose theorem prover by any means. Instead, it's a piece of code that can prove one (and only one) thing--that a given series of (Java) bytecodes represents a legal set of (Java) instructions.

Specifically, the bytecode verifier can prove the following:

The class file has the correct format. The full definition of the class file format may be found in the Java virtual machine specification; the bytecode verifier is responsible for making sure that the class file has the right length, the correct magic numbers in the correct places, and so on.
Final classes are not subclassed, and final methods are not overridden.
Every class (except for java.lang.Object) has a single superclass.
There is no illegal data conversion of primitive data types (e.g., int to Object).
No illegal data conversion of objects occurs. Because the casting of a superclass to its subclass may be a valid operation (depending on the actual type of the object being cast), the verifier cannot ensure that such casting is not attempted--it can only ensure that before each such attempt is made, the legality of the cast is tested.
There are no operand stack overflows or underflows.

In Java, there are two stacks for each thread. One stack holds a series of method frames, where each method frame holds the local variables and other storage for a particular method invocation. This stack is known as the data stack and is what we normally think of as the stack within a traditional program. The bytecode verifier cannot prevent overflow of this stack--an infinitely recursive method call will cause this stack to overflow. However, each method invocation requires a second stack (which itself is allocated on the data stack) that is referred to as the operand stack; the operand stack holds the values that the Java bytecodes operate on. This secondary stack is the stack that the bytecode verifier can ensure will not overflow or underflow.
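
The operand stack check can be sketched with a toy model. This is not how any real verifier is written: it handles only straight-line code, ignores operand types and branches, and the OperandStackCheck and Insn names are ours. Each instruction is reduced to the number of operands it pops and pushes, and the check walks the code while tracking the stack depth against the declared maximum.

```java
import java.util.List;

class OperandStackCheck {
    // A toy instruction described only by how many operands it pops and pushes.
    static final class Insn {
        final String name;
        final int pops;
        final int pushes;

        Insn(String name, int pops, int pushes) {
            this.name = name;
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    // Returns null if the code is safe, otherwise a description of the first problem found.
    static String check(List<Insn> code, int maxStack) {
        int depth = 0;
        for (Insn insn : code) {
            if (depth < insn.pops) {
                return insn.name + ": operand stack underflow";
            }
            depth = depth - insn.pops + insn.pushes;
            if (depth > maxStack) {
                return insn.name + ": operand stack overflow";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Roughly the bytecode for "int x = 1 + 2;".
        List<Insn> add = List.of(new Insn("iconst_1", 0, 1), new Insn("iconst_2", 0, 1),
                new Insn("iadd", 2, 1), new Insn("istore_1", 1, 0));
        System.out.println(check(add, 2)); // fits in a stack of two slots
        System.out.println(check(add, 1)); // declared maximum is too small
    }
}
```

Because the depth at every instruction can be computed without running the code, this is a property that can be proved once, before execution, rather than tested on every push and pop.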

Hence, when the bytecode verifier has completed its task, we know that the code in question follows many of the constraints of the Java language--including most of the rules that the compiler was also responsible for ensuring. The remaining rules are verified during the actual running of the program.
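
The simplest of the format checks listed above is the magic number: every class file begins with the four bytes 0xCAFEBABE. As a small sketch (the ClassFileMagic class is ours, and a real verifier checks far more than this), we can test a header for that value and, where the class loader exposes class files as resources, apply the test to the demo class itself.

```java
import java.io.IOException;
import java.io.InputStream;

class ClassFileMagic {
    // True when the first four bytes are the class file magic number 0xCAFEBABE.
    static boolean hasMagic(byte[] header) {
        if (header.length < 4) {
            return false;
        }
        int magic = ((header[0] & 0xFF) << 24) | ((header[1] & 0xFF) << 16)
                | ((header[2] & 0xFF) << 8) | (header[3] & 0xFF);
        return magic == 0xCAFEBABE;
    }

    public static void main(String[] args) throws IOException {
        // Read the header of this class's own class file, if it is available as a resource.
        try (InputStream in = ClassFileMagic.class.getResourceAsStream("ClassFileMagic.class")) {
            if (in == null) {
                System.out.println("class file not available as a resource");
            } else {
                System.out.println(hasMagic(in.readNBytes(4)));
            }
        }
    }
}
```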

2.2.2.2. Delayed bytecode verification

When we began this section, we said that the bytecode verifier is responsible for examining all the bytecodes of the class--we explicitly did not say that the verifier is responsible for verifying all the bytecodes. This is because the bytecode verifier may delay some of the checks it is responsible for, as long as those checks are performed before the code is actually executed. In typical verifier implementations, the bytecode verifier does not immediately test to see if all field and method accesses are legal according to the access modifiers associated with that field or method.

This is driven by a desire to be efficient--our Test class may reference the acctNo field of our CreditCard class, but it may do so only if a particular branch in the code is taken. In the following code, there's no need to verify that the access to acctNo is legal unless an IllegalArgumentException has been generated:

CreditCard cc = getCreditCard();
try {
	Wallet.makePurchase(cc);
} catch (IllegalArgumentException iae) {
	System.out.println("Can't process for account " + cc.acctNo);
}

Hence, the bytecode verifier delays all tests for field and method access until the code is actually executed. The process by which this happens is implementation dependent; one technique that is often used is to ensure during verification that all accesses test the validity of the field access. If the access is valid, the standard bytecodes are then replaced during execution with a special bytecode indicating that the test has been performed and access to the field in question no longer needs to be tested. On the other hand, if the validity test fails, the virtual machine throws an IllegalAccessError.

This gives us the best of both worlds--verification of the access is performed during the actual running of the program (after traditional bytecode verification has occurred), but the verification is still only performed once (unlike the runtime verification we'll examine later).
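
The check-once technique can be modeled in a few lines. This is only a toy, assuming a single field-access site represented as an object: the QuickenedAccess and FieldAccessSite names are hypothetical, and the boolean accessAllowed stands in for the real test against the field's access modifiers. The first execution performs the check; after it passes, the site marks itself as already checked, which plays the role of the replacement bytecode.

```java
class QuickenedAccess {
    // Models one field-access instruction inside a method body.
    static final class FieldAccessSite {
        private final boolean accessAllowed; // stands in for the real access-modifier test
        private final String fieldValue;
        private boolean quickened = false;   // true once the check has passed
        int checksPerformed = 0;

        FieldAccessSite(boolean accessAllowed, String fieldValue) {
            this.accessAllowed = accessAllowed;
            this.fieldValue = fieldValue;
        }

        String execute() {
            if (!quickened) {
                checksPerformed++;
                if (!accessAllowed) {
                    throw new IllegalAccessError("illegal field access");
                }
                quickened = true; // "rewrite" the instruction to its unchecked form
            }
            return fieldValue;
        }
    }

    public static void main(String[] args) {
        FieldAccessSite site = new FieldAccessSite(true, "0001 0002 0003 0004");
        for (int i = 0; i < 3; i++) {
            site.execute();
        }
        System.out.println("checks performed: " + site.checksPerformed);
    }
}
```

A site that is never executed is never checked at all, which is exactly the saving that motivates delaying the test.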

2.2.2.3. Controlling bytecode verification

Bytecode verification seems like a great thing: not only can it help to prevent malicious attacks from violating rules of the Java language, it can also help detect simple programmer errors--such as when we changed the access modifier of acctNo in our CreditCard class, but forgot to recompile our Test class.

Nonetheless, bytecode verification is not used on all classes. Like many security-related features of Java, bytecode verification only applies to certain classes. In Java 1.1 and earlier, classes that are loaded from the CLASSPATH are deemed to be trusted and are not subject to bytecode verification, whereas classes that are loaded from another location (e.g., a file- or HTTP-based URL) are not deemed to be trusted and must be verified. In Java 1.2,[3] this policy has changed and all classes except those in the core Java API are verified. This difference really reflects the class loader that is used to load the class, as we'll see in the next chapter.

[3] 1.2 is now Java 2.

In typical usage, this is a workable policy. Browsers always ensure that the code imported to run an applet is verified, and Java applications are typically not verified. Of course, this may or may not be the perfect solution:

If a remote site can talk an end user into installing a local class into the browser's CLASSPATH, the local class will not be verified and may violate the rules we've discussed here. In 1.2, this is much harder, since the class must be added to the JAR file containing the core API classes.
You may implicitly rely upon the verifier to help you keep files in sync so that when one is changed, other files are verified against it.

As a user, you (theoretically) have limited control over the verifier--though such control depends on the browser you are using. If you are running a Java application, you can run java with the -verify option, which will verify all classes. Similarly, if you are using a browser written in Java--including the appletviewer--you can arrange for the java command to run with the -noverify option, which turns verification off for all classes. Occasionally, a browser not written in Java will allow the user to disable bytecode verification as well--e.g., Internet Explorer 3.0 for the Mac had this capability, although it was present only because the bytecode verifier could not run in certain limited memory configurations.

However, although these options to the virtual machine are well-documented, they are not implemented on all platforms. One way to ensure that application code is run through the bytecode verifier is to use the final version of the JavaRunner program (once we add a class loader to it in the next chapter) or the Launcher in Java 1.2.

2.2.3. Runtime Enforcement

Like the compiler, the bytecode verifier cannot completely guarantee that the bytecodes follow all of the rules we outlined earlier in this chapter: it can only ensure that the first four of them are followed. The virtual machine must still take responsibility for ultimately determining that the Java bytecodes provide the security we expect them to.

The remaining security protections of the Java language must be enforced at runtime by the virtual machine.

Array bounds checking

In theory, the bytecode verifier can detect certain array bounds violations, but in general, this check must take place at runtime. Consider the following code:

void initArray(int a[], int nItems) {
	for (int i = 0; i < nItems; i++) {
		a[i] = 0;
	}
}

Since nItems and a are parameters, the bytecode verifier has no way of determining whether this code is legal. Hence, array bounds checking is always done at runtime. An access outside the bounds of the array results in an ArrayIndexOutOfBoundsException.
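
A short sketch shows the runtime check at work. The BoundsDemo class and its overruns() helper are ours; initArray() is the method from the example above, made static so it can be called directly.

```java
class BoundsDemo {
    static void initArray(int[] a, int nItems) {
        for (int i = 0; i < nItems; i++) {
            a[i] = 0;
        }
    }

    // True if initializing nItems elements runs past the end of an array of the given length.
    static boolean overruns(int length, int nItems) {
        try {
            initArray(new int[length], nItems);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overruns(4, 4));
        System.out.println(overruns(4, 5));
    }
}
```

The same bytecode is legal in the first call and illegal in the second; only the values passed at runtime differ.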

Object casting

The verifier can and will detect the illegality of certain types of casts, specifically, casts between unrelated classes. The virtual machine must monitor when a superclass is cast into a subclass and test that cast's validity; an attempt to execute an illegal cast results in a ClassCastException. This holds for casts involving interfaces as well, since objects that are defined as an interface type (rather than a class type) are considered by the verifier to be of type Object.
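
The interface case is worth a small illustration, since here even the compiler cannot help. In this sketch (the InterfaceCastDemo class is ours), a Vector is cast to Runnable. The compiler accepts the cast because Vector is not final, so some subclass of Vector could implement Runnable; the virtual machine rejects it when the cast is executed.

```java
import java.util.Vector;

class InterfaceCastDemo {
    // The cast below compiles even though Vector itself does not implement Runnable.
    static String castVector() {
        Vector<String> v = new Vector<>();
        try {
            Runnable r = (Runnable) v;
            return "cast succeeded";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // The same cast succeeds when the object really does implement the interface.
    static String castObject(Object candidate) {
        try {
            Runnable r = (Runnable) candidate;
            r.run();
            return "cast succeeded";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(castVector());
        System.out.println(castObject((Runnable) () -> { }));
    }
}
```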