Chapter 2. Java Language Security

The first components of the Java sandbox that we will examine are those built into the Java language itself. These components primarily protect memory resources on the user's machine, although they have some benefit to the Java API as well. Their main concern is guaranteeing the integrity of the memory of the machine that is hosting a program: in a nutshell, the security features within the Java language are meant to ensure that a program cannot discern or modify sensitive information that may reside in the memory of a user's machine. In terms of applets, these protections also mean that applets will be unable to determine information about each other; each applet is given, in essence, its own memory space in which to operate.

In this chapter, we'll look at the features of the Java language that provide this type of security. We'll also look at how these features are enforced, including a look at Java's bytecode verifier. With a few exceptions, the material in this chapter is largely background; because the features we are going to discuss are immutable within the Java language, there are fewer programming considerations than we'll find in later chapters. However, the information we'll present here is crucial to understanding the entire Java security story; it is very helpful in ensuring that your Java environment is secure and in assessing the security risks that Java deployment might pose. The security of the Java environment is dependent on the security of each of its pieces, and the Java language forms the first fundamental piece of that security.

As we discuss the language features in this chapter, keep in mind that we're only dealing with the Java language itself--as is the common thread of this book, none of the security features we're going to discuss apply when the language in question is not Java. If you use Java's native interface to run arbitrary C code, that C code will be able to do pretty much anything it wants to do, even when it violates the precepts we're outlining in this chapter.

2.1. Java Language Security Constructs

In this chapter, we're going to be concerned primarily with how Java operates on things that are in memory on a particular machine. Within a Java program, every entity--that is, every object reference and every primitive data element--has an access level associated with it. To review, this access level may be:

private: The entity can only be accessed by code that is contained within the class that defines the entity.
Default (or package): The entity can be accessed by code that is contained within the class that defines the entity, or by a class that is contained in the same package as the class that defines the entity.
protected: The entity can only be accessed by code that is contained within the class that defines the entity, by classes within the same package as the defining class, or by a subclass of the defining class.
public: The entity can be accessed by code in any class.

The notion of assigning data entities an access level is certainly not exclusive to Java; it's a hallmark of many object-oriented languages. Since the Java language borrows heavily from C++, it's not surprising that it would borrow the basic notion of these access levels from C++ as well (although there are slight differences between the meanings of these access modifiers in Java and in C++).

As a result of this borrowing, the use of these access modifiers is generally thought of in terms of the advantage such modifiers bring to program design: one of the hallmarks of object-oriented design is that it permits data hiding and data encapsulation. This encapsulation ensures that objects may only be operated upon through the interface the object provides to the world, instead of being operated upon by directly manipulating the object's data elements. These and other design-related advantages are indeed important in developing large, robust, object-oriented systems. But in Java, these advantages are only part of the story.

In a language like C++, if I create a CreditCard object that encapsulates my mother's maiden name and my account number, I would probably decide that those entities should be private to the object and provide the appropriate methods to operate on those entities. But nothing in C++ prevents me from cheating and accessing those entities through a variety of back-door operations. The C++ compiler is likely to complain if I write code that attempts to access a private variable of another class, but the C++ runtime isn't going to care if I convert a pointer to that class into an arbitrary memory pointer and start scanning through memory until I find a location that contains a string with 16 digits--a possible account number. In C++ systems, no one typically worried about such occurrences because all parts of the system were presumed to originate from the same place: it's my program, and if I want to work around my data model to get access to that data, so be it.[1]

[1]In a large project with multiple programmers, there's a strong argument that such an attitude on the part of an individual programmer is not to be dismissed so lightly, but we'll let that pass.

Things change with Java. I might be surfing to play some cool game applet on www.EvilSite.org, and then I might go shopping at www.Acme.com. When my Java wallet applet runs, I'd hate for the applet that is still running from www.EvilSite.org to be able to access the private CreditCard object that's contained in my Java wallet--and while it's necessary for www.Acme.com to know that I have a valid CreditCard object, I don't necessarily feel comfortable telling them my mother's maiden name. Because I'm now in the midst of a dynamic system with active programs from multiple sites, I need to make sure that the data entities are accessed by only those objects that are supposed to have access to them. It's obvious that I want protection from EvilSite.org, whom I don't want to know about the CreditCard object contained in my Java wallet. But I also want to be protected from Acme.com, a site I feel relatively comfortable about, but who should not be granted access to all the data elements of an object that it must use.

This is only one example of why the Java platform must provide memory integrity--that is, it must ensure that entities in memory are accessed only when they are allowed to be, and that these entities cannot be somehow corrupted. To that end, Java always enforces the following rules:

Access levels are strictly adhered to.

In Java, you are not allowed to treat a private entity as anything but private: the intentions of the programmer must always be respected. Object serialization involves an exception to this rule; we'll give more details about that a little bit later.
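To see that this rule is checked while the program runs and not only by the compiler, consider the following minimal sketch. The WalletCard class and its account number are hypothetical stand-ins for the CreditCard example, and the sketch assumes the two classes are compiled as separate top-level classes. It attempts a plain reflective read of a private field from another class, without requesting any special access.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for the chapter's CreditCard class.
class WalletCard {
    private String acctNo = "4111111111111111";
}

class PrivateAccessDemo {
    // Returns true if a plain reflective read of the private field is refused.
    static boolean privateReadIsRefused() {
        try {
            Field f = WalletCard.class.getDeclaredField("acctNo");
            // A direct "card.acctNo" here would not even compile;
            // the reflective read below is checked at run time instead.
            f.get(new WalletCard());
            return false;
        } catch (IllegalAccessException e) {
            return true;   // the runtime honored the private modifier
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("private read refused: " + privateReadIsRefused());
    }
}
```

The field can be located by name, but reading its value from outside the defining class fails with an IllegalAccessException.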

Programs cannot access arbitrary memory locations.

This is easy to ensure, as Java does not have the notion of a pointer. For example, casting between an int and an Object is strictly illegal in Java.

Entities that are declared as final must not be changed.

Final variables in Java are considered constants; they are immutable once they are initialized. Consider the havoc that could ensue if the final modifier were not respected:

A public final variable could be changed, drastically altering the behavior of a program. If a rogue applet swapped the values of the variables EAST and WEST in the GridBagConstraints class, for example, any new applets would be laid out incorrectly (and probably incomprehensibly). That's a rather benign example of what could potentially be a dramatic security flaw.
A subclass could override a final method, altering the behavior of a class. One of the features of the Java API is that threads are not allowed to raise their priority above a certain maximum priority (typically, the priority of the thread group to which the thread belongs). This feature is enforced by the setPriority() method of the Thread class, which is a final method; allowing that method to be overridden would defeat the security mechanisms.

This feature is used for virtually all of Java's security checks: performing an operation requires calling a final method in a Java class; only that final method can trap into the operating system to execute the operation. That final method is responsible for making sure the operation does not proceed if it would violate the security policy in place.
A subclass could be created from a final class, with similar results. In Java, strings are considered constants--their value may not be changed once the string has been created. If the String class could be subclassed, this rule could not be enforced.
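The two API examples above can be checked with a short sketch. The class name FinalDemo, the group name, and the cap of 3 are our own choices for illustration. The sketch confirms through reflection that String and Thread.setPriority() are declared final, and then asks for the maximum priority on a thread whose group has a lower cap.

```java
import java.lang.reflect.Modifier;

class FinalDemo {
    // True if the String class itself is declared final.
    static boolean stringClassIsFinal() {
        return Modifier.isFinal(String.class.getModifiers());
    }

    // True if Thread.setPriority(int) is declared final.
    static boolean setPriorityIsFinal() {
        try {
            return Modifier.isFinal(
                    Thread.class.getMethod("setPriority", int.class).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Requests MAX_PRIORITY in a group capped at 3 and reports what was granted.
    static int priorityGrantedInCappedGroup() {
        ThreadGroup capped = new ThreadGroup("capped");  // hypothetical group name
        capped.setMaxPriority(3);
        Thread t = new Thread(capped, () -> { });        // never started; only inspected
        t.setPriority(Thread.MAX_PRIORITY);
        return t.getPriority();
    }

    public static void main(String[] args) {
        System.out.println("String is final: " + stringClassIsFinal());
        System.out.println("setPriority is final: " + setPriorityIsFinal());
        System.out.println("priority granted: " + priorityGrantedInCappedGroup());
    }
}
```

Because setPriority() cannot be overridden, a subclass of Thread has no way to replace the code that clamps the request to the group's maximum.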

Variables may not be used before they are initialized.

If a program were able to read the value of an uninitialized variable, the effect would be the same as if it were able to read random memory locations. A Java class wishing to exploit this defect might then declare a huge uninitialized section of variables in an attempt to snoop the memory contents of the user's machine. To prevent this type of attack, all local variables in Java must be initialized before they are used, and all instance variables in Java are automatically initialized to a default value.
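As a small illustration of the default values, the sketch below creates an object and an array without assigning anything and then reads them back. The class and field names are hypothetical.

```java
class DefaultValuesDemo {
    // Instance variables with no explicit initializer.
    int count;
    boolean licensed;
    String acctNo;

    // Reads the untouched fields and a fresh array element.
    static String describeDefaults() {
        DefaultValuesDemo d = new DefaultValuesDemo();
        int[] fresh = new int[4];   // array elements are zeroed as well
        // A local variable is different: "int x; return x;" is rejected
        // by the compiler because x is not definitely assigned.
        return d.count + " " + d.licensed + " " + d.acctNo + " " + fresh[3];
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());   // prints "0 false null 0"
    }
}
```

Whatever occupied that memory before the object was created is never visible to the program.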

Array bounds must be checked on all array accesses.

Like the access modifiers that started this discussion, bounds checking is generally thought of in terms other than security: the prime benefit of bounds checking is that it leads to fewer bugs and more robust programs. But it has security benefits as well: if an array of integers happens to reside in memory next to a string (which, in memory, is an array of characters), writing past the end of the array of integers would change the value of the string. The effect of this is generally a bug, but it could be exploited as a security hole as well: if the string held the destination account number for an electronic funds transfer, we could change the destination account number by willfully writing past the end of the array of integers.[2]

[2]This type of attack is not as far-fetched as it might seem; an early version of Netscape Navigator suffered from just this type of security hole. When long URLs were typed into the Goto field, the Netscape C code that read the string overran the bounds of the array where the characters were to be stored and clobbered a key location in memory, which allowed a security breach.
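The funds-transfer scenario can be sketched as follows. The array, the account string, and the class name are hypothetical. The write one element past the end of the array is refused with an exception, and the neighboring string is left untouched.

```java
class BoundsDemo {
    // Returns true if the out-of-bounds write is refused and the string is intact.
    static boolean writePastEndIsRejected() {
        int[] amounts = new int[4];
        String destination = "1234-5678";   // hypothetical destination account
        try {
            amounts[amounts.length] = 42;   // one element past the end
            return false;                   // never reached
        } catch (ArrayIndexOutOfBoundsException e) {
            return destination.equals("1234-5678");
        }
    }

    public static void main(String[] args) {
        System.out.println("write rejected: " + writePastEndIsRejected());
    }
}
```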

Objects cannot be arbitrarily cast into other objects.

Given the class fragment:

Class Definition

public class CreditCard {
	private String acctNo;
}

and the rogue class:

Class Definition

public class CreditCardSnoop {
	public String acctNo;
}

then the following code cannot be allowed to execute:

Class Definition

CreditCard cc = Wallet.getCreditCard();
CreditCardSnoop snoop = (CreditCardSnoop) cc;
System.out.println("Ha!  Your account number is " + snoop.acctNo);

Hence, Java does not allow arbitrary casting between objects; an object can only be cast to one of its superclasses or its subclasses (if, in the latter case, the object actually is an instance of that subclass). Note that the Java virtual machine is much stricter about this rule than the Java compiler is. In the example above, the compiler would complain about an illegal cast. We could satisfy the compiler by changing the code as follows:

Class Definition

Object cc = Wallet.getCreditCard();
CreditCardSnoop snoop = (CreditCardSnoop) cc;

Only the virtual machine will know whether the returned object actually is of type CreditCard. In this case, then, the virtual machine is responsible for thwarting the attack by throwing a ClassCastException when the cast is performed, before the snoop variable is ever assigned.
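Here is a self-contained version of that scenario, offered as a sketch. The classes are declared without the public modifier so they fit in one file, and the getCreditCard() method stands in for the Wallet class, which is not shown in this chapter. The cast compiles because the reference has type Object, but it fails when it is executed.

```java
class CreditCard {
    private String acctNo = "4111111111111111";   // hypothetical value
}

class CreditCardSnoop {
    public String acctNo;
}

class CastDemo {
    // Hypothetical stand-in for Wallet.getCreditCard().
    static Object getCreditCard() {
        return new CreditCard();
    }

    // Returns true if the virtual machine rejects the cast at run time.
    static boolean snoopCastIsRejected() {
        Object cc = getCreditCard();
        try {
            CreditCardSnoop snoop = (CreditCardSnoop) cc;   // accepted by the compiler
            return snoop == null;                           // never reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("cast rejected: " + snoopCastIsRejected());
    }
}
```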

These are the techniques by which the Java language ensures that memory locations are read and written only when such access should normally be allowed. This restriction protects the user's machine from the outside: if I download an applet onto my machine, I don't want that applet accessing the private variables of my CreditCard class. However, if that applet has a private variable within it, nothing prevents me (depending on my operating system) from using a program outside of the browser to scan the memory on my system and figure out somehow what value that particular variable has. Similarly, nothing prevents me from having another program outside the browser change the value of a particular variable that is held in memory on my machine.

If you're an applet developer and are worried about this type of problem, you're pretty much on your own to come up with a solution to it. This might be particularly troublesome if you had, say, a variable somewhere in your applet that held a Boolean value indicating whether or not the user was licensed for a particular operation; a very clever user can go outside the browser and manipulate the machine's memory so that the integrity of your licensing scheme is violated. This problem is not new to Java, but it's not solved by Java either.

2.1.1. Object Serialization and Memory Integrity

There is one general exception to the rules about public, private, and protected access in Java. Object serialization is a feature of Java that allows an object to be written as a series of bytes; when those bytes are read someplace else, a new object is created that has the same state as the original object. Object serialization has two main purposes: it's used extensively in the RMI API to allow clients and servers to exchange objects, and it's used whenever you need to save a particular object to disk and want to recreate the object at some later point in time.

The murky issue here is just what constitutes an object's state. In the case of our CreditCard object, the account number is pretty basic to creating that object, but it's a variable that needs to be private for the reasons we've been discussing. In order for object serialization to work, it must have access to those private variables so it can correctly save and restore the object's state. That's why the object serialization API can access and save all private variables of an object (as well as its default, protected, and public variables). Similarly, the object serialization API is able to store those values back into the private data members when the object is actually reconstituted.

Depending on your perspective, this is a good thing or a bad thing. From a security perspective, it can be a bad thing: if the CreditCard object is saved to disk, something else can come along and read all that information from the disk file. Worse yet, the file could be edited in such a way that the object will be recreated in a completely different state than it originally had, with potentially damaging results.
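Both concerns can be demonstrated with a short sketch. The PlainCard class and the account numbers are hypothetical, and a byte array stands in for the disk file. The private account number appears in the serialized bytes as readable text, and editing those bytes produces an object in a different state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class PlainCard implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String acctNo;

    PlainCard(String acctNo) { this.acctNo = acctNo; }
    String getAcctNo() { return acctNo; }
}

class SerializationLeakDemo {
    // Serializes the object into a byte array (a stand-in for a file on disk).
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(o);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recreates the card from the bytes and reports its account number.
    static String readAcctNo(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return ((PlainCard) in.readObject()).getAcctNo();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the given text is readable in the raw bytes.
    static boolean streamContains(byte[] stream, String text) {
        return new String(stream, StandardCharsets.ISO_8859_1).contains(text);
    }

    // Simulates editing the file: swaps one run of characters for another of equal length.
    static byte[] edit(byte[] stream, String from, String to) {
        String raw = new String(stream, StandardCharsets.ISO_8859_1);
        return raw.replace(from, to).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new PlainCard("4111111111111111"));
        System.out.println("number visible: " + streamContains(stream, "4111111111111111"));
        byte[] edited = edit(stream, "4111111111111111", "9999999999999999");
        System.out.println("after edit: " + readAcctNo(edited));
    }
}
```

Nothing in the default format detects the edit; the recreated object simply carries the new value in its private field.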

In theory, this is the same problem we just discussed about influences outside the browser being able to read and write the private data of objects that are held in memory (which may help to explain why object serialization works this way by default). In practice, however, it's much easier to change the data in a binary file than to figure out how to access and change the value of an object in memory. Hence, object serialization has two additional mechanisms associated with it that make it more secure.

The first of these is that object serialization can only occur on objects that implement the java.io.Serializable interface (or its subinterface, the java.io.Externalizable interface). The Serializable interface requires no methods, so it can be thought of simply as a flag to the virtual machine that says: "Hey, virtual machine--I've thought about the security aspects of this class, and it's okay if you serialize it by writing out all its data." By default, an object is not serializable, lest its internal private state be violated.

The second of these mechanisms is that object serialization respects the transient keyword associated with a variable: if our account number in the CreditCard class were declared as private transient, then object serialization would not be allowed to read or write that particular variable. This lets us design classes that can be stored and reconstituted without showing their private data to the world.
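The effect of the transient keyword can be seen in a minimal sketch. The TransientCard class, its holder field, and the sample values are hypothetical. After a round trip through serialization, the non-transient field is restored while the transient one comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

class TransientCard implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String holder;
    private transient String acctNo;   // skipped by default serialization

    TransientCard(String holder, String acctNo) {
        this.holder = holder;
        this.acctNo = acctNo;
    }

    String getHolder() { return holder; }
    String getAcctNo() { return acctNo; }
}

class TransientDemo {
    static byte[] serialize(TransientCard card) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(card);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static TransientCard deserialize(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return (TransientCard) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the given text is readable in the raw bytes.
    static boolean streamContains(byte[] stream, String text) {
        return new String(stream, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new TransientCard("A. Customer", "4111111111111111"));
        TransientCard copy = deserialize(stream);
        System.out.println(copy.getHolder() + " / " + copy.getAcctNo());
    }
}
```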

Of course, a CreditCard object without an account number is worthless; what we really need is something that can save and reconstitute the transient data in such a way that the data can't be compromised. This can be achieved by having our class implement the writeObject() and readObject() methods. The writeObject() method is responsible for writing out all data in the class; it typically uses the defaultWriteObject() method to write out all non-transient data, and then it writes the transient data out in any format it desires. Similarly, the readObject() method uses the defaultReadObject() method to read the data and then must restore the corresponding transient data. It's your decision how to save and reconstitute the transient data so that its integrity is preserved, but this will mean that you'll want to use one of the encryption APIs we'll discuss in Chapter 13, "Encryption".
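One way this might look is sketched below, under several loudly stated assumptions. The SealedCard class is hypothetical; AES in GCM mode is our choice of cipher for the illustration, not something this chapter prescribes; and the key is generated inside the class only so that the sketch is self-contained. A real application would have to obtain the key from protected storage, or the object could not be read back in another virtual machine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class SealedCard implements Serializable {
    private static final long serialVersionUID = 1L;
    // Assumption: demo-only key, valid for this JVM run; static, so never serialized.
    private static final SecretKey KEY = newKey();

    private final String holder;
    private transient String acctNo;

    SealedCard(String holder, String acctNo) {
        this.holder = holder;
        this.acctNo = acctNo;
    }

    String getHolder() { return holder; }
    String getAcctNo() { return acctNo; }

    private static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Cipher cipher(int mode, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, KEY, new GCMParameterSpec(128, iv));
        return c;
    }

    // Writes the non-transient data by default, then the account number encrypted.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);   // fresh nonce for every write
        try {
            byte[] sealed = cipher(Cipher.ENCRYPT_MODE, iv)
                    .doFinal(acctNo.getBytes(StandardCharsets.UTF_8));
            out.writeObject(iv);
            out.writeObject(sealed);
        } catch (GeneralSecurityException e) {
            throw new IOException(e);
        }
    }

    // Reads the default data, then decrypts and restores the transient field.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        byte[] iv = (byte[]) in.readObject();
        byte[] sealed = (byte[]) in.readObject();
        try {
            acctNo = new String(cipher(Cipher.DECRYPT_MODE, iv).doFinal(sealed),
                    StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new InvalidObjectException("account number could not be verified");
        }
    }
}

class SealedCardDemo {
    static byte[] serialize(SealedCard card) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(card);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static SealedCard deserialize(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            return (SealedCard) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean streamContains(byte[] stream, String text) {
        return new String(stream, StandardCharsets.ISO_8859_1).contains(text);
    }
}
```

Calling SealedCardDemo.deserialize(SealedCardDemo.serialize(card)) within the same run returns a card with its account number restored, while the bytes in between no longer show the number as readable text. GCM also authenticates the encrypted data, so altered ciphertext is rejected during decryption instead of yielding a different account number.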

Storing and reconstituting the transient data can also be achieved by implementing the Externalizable interface and implementing the writeExternal() and the readExternal() methods of that interface. The difference in this case is that these two methods are now responsible for saving and reconstituting the entire state of the object--no data can be stored or reconstituted by any default methods.
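The difference can be seen in the following sketch, where the ExternalCard class and its fields are hypothetical. The account number is not declared transient, yet it never reaches the stream, because writeExternal() chooses not to write it and no default mechanism writes it on the class's behalf.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalCard implements Externalizable {
    private String holder;
    private String acctNo;   // not transient, but only written if writeExternal() does so

    // Externalizable classes need a public no-argument constructor.
    public ExternalCard() { }

    ExternalCard(String holder, String acctNo) {
        this.holder = holder;
        this.acctNo = acctNo;
    }

    String getHolder() { return holder; }
    String getAcctNo() { return acctNo; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(holder);   // the only state this sketch chooses to save
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        holder = in.readUTF();  // must mirror writeExternal() exactly
    }
}

class ExternalDemo {
    // Serializes the card to a byte array and reads it straight back.
    static ExternalCard roundTrip(ExternalCard card) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(card);
            out.close();
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
            return (ExternalCard) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExternalCard copy = roundTrip(new ExternalCard("A. Customer", "4111111111111111"));
        System.out.println(copy.getHolder() + " / " + copy.getAcctNo());
    }
}
```

A class that does want to keep the account number would write it inside writeExternal() in whatever protected form it chooses and restore it in readExternal().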

Using either of these techniques, you have the ability to protect any sensitive data contained in your objects, even if you choose to share those objects over the network or save those objects to some sort of persistent storage.
