Chapter 1. Introduction

For the past decade, "distributed computing" has been one of the biggest buzz phrases in the computer industry. At this point in the information age, we know how to build networks; we use thousands of engineering workstations and personal computers to do our work, instead of huge behemoths in glass-walled rooms. Surely we ought to be able to use our networks of smaller computers to work together on larger tasks. And we do--an act as simple as reading a web page requires the cooperation of two computers (a client and a server) plus other computers that make sure the data gets from one location to the other. However, simple browsing (i.e., a largely one-way data exchange) isn't what we usually mean when we talk about distributed computing. We usually mean something where there's more interaction between the systems involved.

You can think about distributed computing in terms of breaking down an application into individual computing agents that can be distributed on a network of computers, yet still work together to do cooperative tasks. The motivations for distributing an application this way are many. Here are a few of the more common ones:

Computing things in parallel by breaking a problem into smaller pieces enables you to solve larger problems without resorting to larger computers. Instead, you can use smaller, cheaper, easier-to-find computers.
Large data sets are typically difficult to relocate, or easier to control and administer located where they are, so users have to rely on remote data servers to provide needed information.
Redundant processing agents on multiple networked computers can be used by systems that need fault tolerance. If a machine or agent process goes down, the job can still carry on.

There are many other motivations, and plenty of subtle variations on the ones listed here.

Assorted tools and standards for assembling distributed computing applications have been developed over the years. These started as low-level data transmission APIs and protocols, such as RPC and DCE, and have recently begun to evolve into object-based distribution schemes, such as CORBA, RMI, and OpenDoc. These programming tools essentially provide a protocol for transmitting structured data (and, in some cases, actual runnable code) over a network connection. Java offers a language and an environment that encompass various levels of distributed computing development, from low-level network communication to distributed objects and agents, while also having built-in support for secure applications, multiple threads of control, and integration with other Internet-based protocols and services.

This chapter gives an introduction to distributed application development, and how Java can be used as a tool towards this end. In the following chapters, we'll start by reviewing some essential background material on network programming, threads, and security. Then we'll move into a series of chapters that explore different distributed problems in detail. Where appropriate, we'll use RMI, CORBA, or a homegrown protocol to implement examples. If you are developing distributed applications, you need to be familiar with all possible solutions and where they're appropriate; so where we choose a particular tool, we'll try to discuss how things would be better or worse if you chose a different set of tools in building something similar.

1.1. Anatomy of a Distributed Application

A distributed application is built upon several layers. At the lowest level, a network connects a group of host computers together so that they can talk to each other. Network protocols like TCP/IP let the computers send data to each other over the network by providing the ability to package and address data for delivery to another machine. Higher-level services can be defined on top of the network protocol, such as directory services and security protocols. Finally, the distributed application itself runs on top of these layers, using the mid-level services and network protocols as well as the computer operating systems to perform coordinated tasks across the network.

At the application level, a distributed application can be broken down into the following parts:

Processes: A typical computer operating system on a computer host can run several processes at once. A process is created by describing a sequence of steps in a programming language, compiling the program into an executable form, and running the executable in the operating system. While it's running, a process has access to the resources of the computer (such as CPU time and I/O devices) through the operating system. A process can be completely devoted to a particular application, or several applications can use a single process to perform tasks.
Threads: Every process has at least one thread of control. Some operating systems support the creation of multiple threads of control within a single process. Each thread in a process can run independently from the other threads, although there is usually some synchronization between them. One thread might monitor input from a socket connection, for example, while another might listen for user events (keystrokes, mouse movements, etc.) and provide feedback to the user through output devices (monitor, speakers, etc.). At some point, input from the input stream may require feedback from the user. At this point, the two threads will need to coordinate the transfer of input data to the user's attention.
Objects: Programs written in object-oriented languages are made up of cooperating objects. One simple definition of an object is a group of related data, with methods available for querying or altering the data (getName(), set-Name()), or for taking some action based on the data (sendName(Out-putStreamo)). A process can be made up of one or more objects, and these objects can be accessed by one or more threads within the process. And with the introduction of distributed object technology like RMI and CORBA, an object can also be logically spread across multiple processes, on multiple computers.
Agents: For the sake of this book, we will use the term "agent" as a general way to refer to significant functional elements of a distributed application.[1] While a process, a thread, and an object are pretty well-defined entities, an agent (at least the definition we'll use for the sake of this book) is a higher-level system component, defined around a particular function, or utility, or role in the overall system. A remote banking application, for example, might be broken down into a customer agent, a transaction agent and an information brokerage agent. Agents can be distributed across multiple processes, and can be made up of multiple objects and threads in these processes. Our customer agent might be made up of an object in a process running on a client desktop that's listening for data and updating the local display, along with an object in a process running on the bank server, issuing queries and sending the data back to the client. There are two objects running in distinct processes on separate machines, but together we can consider them to make up one customer agent, with client-side elements and server-side elements.

[1]The term "agent" is overused in the technology community. In the more formal sense of the word, an agent is a computing entity that is a bit more intelligent and autonomous than an object. An agent is supposed to be capable of having goals that it needs to accomplish, such as retrieving information of a certain type from a large database or remote data sources. Some agents can monitor their progress towards achieving their goals at a higher level than just successful execution of methods, like an object. The definition of agent that we're using here is a lot less formal than this, and a bit more general.

So a distributed application can be thought of as a coordinated group of agents working to accomplish some goal. Each of these agents can be distributed across multiple processes on remote hosts, and can consist of multiple objects or threads of control. Agents can also belong to more than one application at once. You may be developing an automated teller machine application, for example, which consists of an account database server, with customer request agents distributed across the network submitting requests. The account server agent and the customer request agents are agents within the ATM application, but they might also serve agents residing at the financial institution's headquarters, as part of an administrative application.

Chapter 1. Introduction

Contents:

1.1. Anatomy of a Distributed Application