[Chapter 1] Getting Started with Java

When it was introduced in late 1995, Java took the Internet by storm. Java 1.1, released in early 1997, nearly doubles the speed of the Java interpreter and includes many important new features. With the addition of APIs to support database access, remote objects, an object component model, internationalization, printing, encryption, digital signatures, and many other technologies, Java is now poised to take the rest of the programming world by storm.

Despite all the hype surrounding Java and the new features of Java 1.1, it's important to remember that at its core, Java is just a programming language, like many others, and its APIs are just class libraries, like those of other languages. What is interesting about Java, and thus the source of much of the hype, is that it has a number of important features that make it ideally suited for programming in the heavily networked, heterogenous world of the late 1990s. The rest of this chapter describes those interesting features of Java and demonstrates some simple Java code. Chapter 4, What's New in Java 1.1 explores the new features that have been added to version 1.1 of the Java API.

1.1 Why Is Java Interesting?

In one of their early papers about the language, Sun described Java as follows:

Java: A simple, object-oriented, distributed, interpreted, robust, secure, architecture neutral, portable, high-performance, multithreaded, and dynamic language.

Sun acknowledges that this is quite a string of buzzwords, but the fact is that, for the most part, they aptly describe the language. In order to understand why Java is so interesting, let's take a look at the language features behind the buzzwords.

Object-Oriented

Java is an object-oriented programming language. As a programmer, this means that you focus on the data in your application and methods that manipulate that data, rather than thinking strictly in terms of procedures. If you're accustomed to procedure-based programming in C, you may find that you need to change how you design your programs when you use Java. Once you see how powerful this new paradigm is, however, you'll quickly adjust to it.

In an object-oriented system, a class is a collection of data and methods that operate on that data. Taken together, the data and methods describe the state and behavior of an object. Classes are arranged in a hierarchy, so that a subclass can inherit behavior from its superclass. A class hierarchy always has a root class; this is a class with very general behavior.

Java comes with an extensive set of classes, arranged in packages, that you can use in your programs. For example, Java provides classes that create graphical user interface components (the java.awt package), classes that handle input and output (the java.io package), and classes that support networking functionality (the java.net package). The Object class (in the java.lang package) serves as the root of the Java class hierarchy.

Unlike C++, Java was designed to be object-oriented from the ground up. Most things in Java are objects; the primitive numeric, character, and boolean types are the only exceptions. Strings are represented by objects in Java, as are other important language constructs like threads. A class is the basic unit of compilation and of execution in Java; all Java programs are classes.

While Java is designed to look like C++, you'll find that Java removes many of the complexities of that language. If you are a C++ programmer, you'll want to study the object-oriented constructs in Java carefully. Although the syntax is often similar to C++, the behavior is not nearly so analogous. For a complete description of the object-oriented features of Java, see Chapter 3, Classes and Objects in Java.

Interpreted

Java is an an interpreted language: the Java compiler generates byte-codes for the Java Virtual Machine (JVM), rather than native machine code. To actually run a Java program, you use the Java interpreter to execute the compiled byte-codes. Because Java byte-codes are platform-independent, Java programs can run on any platform that the JVM (the interpreter and run-time system) has been ported to.

In an interpreted environment, the standard "link" phase of program development pretty much vanishes. If Java has a link phase at all, it is only the process of loading new classes into the environment, which is an incremental, lightweight process that occurs at run-time. This is in contrast with the slower and more cumbersome compile-link-run cycle of languages like C and C++.

Architecture Neutral and Portable

Because Java programs are compiled to an architecture neutral byte-code format, a Java application can run on any system, as long as that system implements the Java Virtual Machine. This is a particularly important for applications distributed over the Internet or other heterogenous networks. But the architecture neutral approach is useful beyond the scope of network-based applications. As an application developer in today's software market, you probably want to develop versions of your application that can run on PCs, Macs, and UNIX workstations. With multiple flavors of UNIX, Windows 95, and Windows NT on the PC, and the new PowerPC Macintosh, it is becoming increasingly difficult to produce software for all of the possible platforms. If you write your application in Java, however, it can run on all platforms.

The fact that Java is interpreted and defines a standard, architecture neutral, byte-code format is one big part of being portable. But Java goes even further, by making sure that there are no "implementation-dependent" aspects of the language specification. For example, Java explicitly specifies the size of each of the primitive data types, as well as its arithmetic behavior. This differs from C, for example, in which an int type can be 16, 32, or 64 bits long depending on the platform.

While it is technically possible to write non-portable programs in Java, it is relatively easy to avoid the few platform-dependencies that are exposed by the Java API and write truly portable or "pure" Java programs. Sun's new "100% Pure Java" program helps developers ensure (and certify) that their code is portable. Programmers need only to make simple efforts to avoid non-portable pitfalls in order to live up to Sun's trademarked motto "Write Once, Run Anywhere."

Dynamic and Distributed

Java is a dynamic language. Any Java class can be loaded into a running Java interpreter at any time. These dynamically loaded classes can then be dynamically instantiated. Native code libraries can also be dynamically loaded. Classes in Java are represented by the Class class; you can dynamically obtain information about a class at run-time. This is especially true in Java 1.1, with the addition of the Reflection API, which is introduced in Chapter 12, Reflection.

Java is also called a distributed language. This means, simply, that it provides a lot of high-level support for networking. For example, the URL class and OArelated classes in the java.net package make it almost as easy to read a remote file or resource as it is to read a local file. Similarly, in Java 1.1, the Remote Method Invocation (RMI) API allows a Java program to invoke methods of remote Java objects, as if they were local objects. (Java also provides traditional lower-level networking support, including datagrams and stream-based connections through sockets.)

The distributed nature of Java really shines when combined with its dynamic class loading capabilities. Together, these features make it possible for a Java interpreter to download and run code from across the Internet. (As we'll see below, Java implements strong security measures to be sure that this can be done safely.) This is what happens when a Web browser downloads and runs a Java applet, for example. Scenarios can be more complicated than this, however. Imagine a multi-media word processor written in Java. When this program is asked to display some type of data that it has never encountered before, it might dynamically download a class from the network that can parse the data, and then dynamically download another class (probably a Java "bean") that can display the data within a compound document. A program like this uses distributed resources on the network to dynamically grow and adapt to the needs of its user.

Simple

Java is a simple language. The Java designers were trying to create a language that a programmer could learn quickly, so the number of language constructs has been kept relatively small. Another design goal was to make the language look familiar to a majority of programmers, for ease of migration. If you are a C or C++ programmer, you'll find that Java uses many of the same language constructs as C and C++.

In order to keep the language both small and familiar, the Java designers removed a number of features available in C and C++. These features are mostly ones that led to poor programming practices or were rarely used. For example, Java does not support the goto statement; instead, it provides labelled break and continue statements and exception handling. Java does not use header files and it eliminates the C preprocessor. Because Java is object-oriented, C constructs like struct and union have been removed. Java also eliminates the operator overloading and multiple inheritance features of C++.

Perhaps the most important simplification, however, is that Java does not use pointers. Pointers are one of the most bug-prone aspects of C and C++ programming. Since Java does not have structures, and arrays and strings are objects, there's no need for pointers. Java automatically handles the referencing and dereferencing of objects for you. Java also implements automatic garbage collection, so you don't have to worry about memory management issues. All of this frees you from having to worry about dangling pointers, invalid pointer references, and memory leaks, so you can spend your time developing the functionality of your programs.

If it sounds like Java has gutted C and C++, leaving only a shell of a programming language, hold off on that judgment for a bit. As we'll see in Chapter 2, How Java Differs from C, Java is actually a full-featured and very elegant language.

Robust

Java has been designed for writing highly reliable or robust software. Java certainly doesn't eliminate the need for software quality assurance; it's still quite possible to write buggy software in Java. However, Java does eliminate certain types of programming errors, which makes it considerably easier to write reliable software.

Java is a strongly typed language, which allows for extensive compile-time checking for potential type-mismatch problems. Java is more strongly typed than C++, which inherits a number of compile-time laxities from C, especially in the area of function declarations. Java requires explicit method declarations; it does not support C-style implicit declarations. These stringent requirements ensure that the compiler can catch method invocation errors, which leads to more reliable programs.

One of the things that makes Java simple is its lack of pointers and pointer arithmetic. This feature also increases the robustness of Java programs by abolishing an entire class of pointer-related bugs. Similarly, all accesses to arrays and strings are checked at run-time to ensure that they are in bounds, eliminating the possibility of overwriting memory and corrupting data. Casts of objects from one type to another are also checked at run-time to ensure that they are legal. Finally, and very importantly, Java's automatic garbage collection prevents memory leaks and other pernicious bugs related to memory allocation and deallocation.

Exception handling is another feature in Java that makes for more robust programs. An exception is a signal that some sort of exceptional condition, such as a "file not found" error, has occurred. Using the try/catch/finally statement, you can group all of your error handling code in one place, which greatly simplifies the task of error handling and recovery.

Secure

One of the most highly touted aspects of Java is that it's a secure language. This is especially important because of the distributed nature of Java. Without an assurance of security, you certainly wouldn't want to download code from a random site on the Internet and let it run on your computer. Yet this is exactly what people do with Java applets every day. Java was designed with security in mind, and provides several layers of security controls that protect against malicious code, and allow users to comfortably run untrusted programs such as applets.

At the lowest level, security goes hand-in-hand with robustness. As we've already seen, Java programs cannot forge pointers to memory, or overflow arrays, or read memory outside of the bounds of an array or string. These features are one of Java's main defenses against malicious code. By totally disallowing any direct access to memory, an entire huge, messy class of security attacks is ruled out.

The second line of defense against malicious code is the byte-code verification process that the Java interpreter performs on any untrusted code it loads. These verification steps ensure that the code is well-formed--that it doesn't overflow or underflow the stack or contain illegal byte-codes, for example. If the byte-code verification step was skipped, inadvertently corrupted or maliciously crafted byte-codes might be able to take advantage of implementation weaknesses in a Java interpreter.

Another layer of security protection is commonly referred to as the "sandbox model": untrusted code is placed in a "sandbox," where it can play safely, without doing any damage to the "real world," or full Java environment. When an applet, or other untrusted code, is running in the sandbox, there are a number of restrictions on what it can do. The most obvious of these restrictions is that it has no access whatsoever to the local file system. There are a number of other restrictions in the sandbox as well. These restrictions are enforced by a SecurityManager class. The model works because all of the core Java classes that perform sensitive operations, such as filesystem access, first ask permission of the currently installed SecurityManager. If the call is being made, directly or indirectly, by untrusted code, the security manager throws an exception, and the operation is not permitted. See Chapter 6, Applets for a complete list of the restrictions placed on applets running in the sandbox.

Finally, in Java 1.1, there is another possible solution to the problem of security. By attaching a digital signature to Java code, the origin of that code can be established in a cryptographically secure and unforgeable way. If you have specified that you trust a person or organization, then code that bears the digital signature of that trusted entity is trusted, even when loaded over the network, and may be run without the restrictions of the sandbox model.

Of course, security isn't a black-and-white thing. Just as a program can never be guaranteed to be 100% bug-free, no language or environment can be guaranteed 100% secure. With that said, however, Java does seem to offer a practical level of security for most applications. It anticipates and defends against most of the techniques that have historically been used to trick software into misbehaving, and it has been intensely scrutinized by security experts and hackers alike. Some security holes were found in early versions of Java, but these flaws were fixed almost as soon as they were found, and it seems reasonable to expect that any future holes will be fixed just as quickly.

High-Performance

Java is an interpreted language, so it is never going to be as fast as a compiled language like C. Java 1.0 was said to be about 20 times slower than C. Java 1.1 is nearly twice as fast as Java 1.0, however, so it might be reasonable to say that compiled C code runs ten times as fast as interpreted Java byte-codes. But before you throw up your arms in disgust, be aware that this speed is more than adequate to run interactive, GUI and network-based applications, where the application is often idle, waiting for the user to do something, or waiting for data from the network. Furthermore, the speed-critical sections of the Java run-time environment, that do things like string concatenation and comparison, are implemented with efficient native code.

As a further performance boost, many Java interpreters now include "just in time" compilers that can translate Java byte-codes into machine code for a particular CPU at run-time. The Java byte-code format was designed with these "just in time" compilers in mind, so the process of generating machine code is fairly efficient and it produces reasonably good code. In fact, Sun claims that the performance of byte-codes converted to machine code is nearly as good as native C or C++. If you are willing to sacrifice code portability to gain speed, you can also write portions of your program in C or C++ and use Java native methods to interface with this native code.

When you are considering performance, it's important to remember where Java falls in the spectrum of available programming languages. At one end of the spectrum, there are high-level, fully-interpreted scripting languages such as Tcl and the UNIX shells. These languages are great for prototyping and they are highly portable, but they are also very slow. At the other end of the spectrum, you have low-level compiled languages like C and C++. These languages offer high performance, but they suffer in terms of reliability and portability. Java falls in the middle of the spectrum. The performance of Java's interpreted byte-codes is much better than the high-level scripting languages (even Perl), but it still offers the simplicity and portability of those languages.

Multithreaded

In a GUI-based network application such as a Web browser, it's easy to imagine multiple things going on at the same time. A user could be listening to an audio clip while she is scrolling a page, and in the background the browser is downloading an image. Java is a multithreaded language; it provides support for multiple threads of execution (sometimes called lightweight processes) that can handle different tasks. An important benefit of multithreading is that it improves the interactive performance of graphical applications for the user.

If you have tried working with threads in C or C++, you know that it can be quite difficult. Java makes programming with threads much easier, by providing built-in language support for threads. The java.lang package provides a Thread class that supports methods to start and stop threads and set thread priorities, among other things. The Java language syntax also supports threads directly with the synchronized keyword. This keyword makes it extremely easy to mark sections of code or entire methods that should only be run by a single thread at a time.

While threads are "wizard-level" stuff in C and C++, their use is commonplace in Java. Because Java makes threads so easy to use, the Java class libraries require their use in a number of places. For example, any applet that performs animation does so with a thread. Similarly, Java does not support asynchronous, non-blocking I/O with notification through signals or interrupts--you must instead create a thread that blocks on every I/O channel you are interested in.