What's So Tough About Distributing Objects? (Java Distributed Computing)

3.2. What's So Tough About Distributing Objects?

OK, so we think distributing objects is a good idea, but why do distributed object systems like CORBA and, to a lesser degree, Java RMI, seem so big and complicated? In Chapter 2, "Networking in Java", we saw how the core Java API, especially the java.net and java.io packages, gives us easy access to the network and key network protocols. They also let us layer application-specific operations on top of the network pretty easily. It seems like all that we'd need to do is extend these packages to allow objects to invoke each other's methods over the network, and we'd have a basic distributed object system. To get a feeling for the complexity of distributed object systems, let's look at what it would take to put together one of our own using just the core Java API, without using the RMI package or the object input/output streams in the java.io package.

3.2.1. Creating Remote Objects

The essential requirements in a distributed object system are the ability to create or invoke objects on a remote host or process, and interact with them as if they were objects within our own process. It seems logical that we would need some kind of message protocol for sending requests to remote agents to create new objects, to invoke methods on these objects, and to delete the objects when we're done with them. As we saw in Chapter 2, "Networking in Java", the networking support in the Java API makes it very easy to implement a message protocol. But what kinds of things does a message protocol have to do if it's supporting a distributed object system?

To create a remote object, we need to reference a class, provide constructor arguments for the class, and receive a reference to the created object in return. This object reference will be used to invoke methods on the object, and eventually to ask the remote agent to destroy the object when we are done with it. So the data we will need to send over the network include class references, object references, method references, and method arguments.

The first item is easy--we already saw in Chapter 2, "Networking in Java", how the ClassLoader can be used to send class definitions over the network. If we want to create a new remote object from a given class, we can send the class definition to the remote host, and tell it to build one using a default constructor. Object references require some thought, though. These are not the same as local Java object references. We need to have an object reference that we can package up and send over the network, i.e., one that's serializable. This object reference, once we receive it, will still need to refer back to the original object on the remote host, so that when we call methods on it the method invocations are forwarded to the "source" object on the remote host. One simple way to implement remote object references is to build an object lookup table into the remote agent. When a client requests a new object, the remote agent builds the requested object, puts the object into the table, and sends the table index of the object to the client. If we use sockets and streams to send requests and object references back and forth between remote agents, a client might request a remote object with something like this:

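import java.io.*;
import java.net.Socket;

Class myClass = Class.forName("Myclass");
Socket objConn = new Socket("object.server.net", 1234);
OutputStreamWriter out =
    new OutputStreamWriter(objConn.getOutputStream());
DataInputStream in = new DataInputStream(objConn.getInputStream());

// Terminate the command with a newline and flush it, so that a
// line-oriented reader on the server sees the complete request
out.write("new " + myClass.getName() + "\n");
out.flush();
int objRef = in.readInt();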

The integer objRef returned by the remote server can be used to reference the new remote object. On the other end of the socket, the agent receiving the request for the remote object may handle the request like this:

Hashtable objTable = new Hashtable();
ServerSocket server = ...;
Socket conn;
// Accept the connection from the client
if ((conn = server.accept()) != null) {
    DataOutputStream out =
        new DataOutputStream(conn.getOutputStream());
    BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
    String cmd = in.readLine();
    // Parse the command type from the command string
    if (parseCmd(cmd).equals("new")) {
        // The client wants a new object created,
        // so parse the class name from the command string
        String classname = parseClass(cmd);
        // Create the Class object and make an instance
        Class reqClass = Class.forName(classname);
        Object obj = reqClass.newInstance();
        // Register the object and return the integer
        // identifier/reference to the client
        Integer objID = nextID();
        objTable.put(objID, obj);
        out.writeInt(objID.intValue());
    }
}

The object server reads the class name sent by the requestor, looks up the class using the static Class.forName() method, and creates a new instance of the class by calling the newInstance() method on the Class object. Once the object has been created, the server generates a unique identifier for the object and sends it back to the requestor. Note that we've already limited our remote object scheme by forcing the use of default constructors, i.e., those with no arguments. The remote host creates the requested object by calling newInstance() on its class, which is equivalent to creating the object by calling the class constructor with no arguments. Since we don't (yet) have a way to specify methods on classes over the network, or a way to send arguments to these methods, we have to live with this limitation for now.
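The object table at the heart of this scheme can be isolated into a small class. The following is only a sketch under stated assumptions: the ObjectTable class and its create(), lookup(), and release() methods are hypothetical names that do not appear in the server code above, and the sketch uses generics and Constructor.newInstance(), which are available in later Java releases than the one the rest of this section assumes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a server-side table that maps integer
// identifiers to the live objects created on behalf of clients.
class ObjectTable {
    private final Map<Integer, Object> objects = new HashMap<>();
    private int nextId = 0;

    // Instantiate the named class with its no-argument constructor,
    // register the instance, and return its table index.
    synchronized int create(String className) {
        try {
            Object obj = Class.forName(className)
                              .getDeclaredConstructor()
                              .newInstance();
            int id = nextId++;
            objects.put(id, obj);
            return id;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot create an instance of " + className, e);
        }
    }

    // Resolve an identifier sent by a client; null if it is unknown.
    synchronized Object lookup(int id) {
        return objects.get(id);
    }

    // Drop the object when the client is done with it.
    synchronized boolean release(int id) {
        return objects.remove(id) != null;
    }
}
```

A server loop could call create() when it parses a "new" command and write the returned integer back to the client, then call lookup() with that integer on later requests.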

3.2.2. Remote Method Calls

Now that the requestor has a reference to an object on the remote host, it needs a way to invoke methods on the object. Since Java, as of JDK 1.1, allows us to query a class or object for its declared methods and data members, the local agent can get a direct reference to the method that it wants to invoke on the remote object, in the form of a Method object:

Class reqClass = Class.forName("Myclass");
Method reqMethod = reqClass.getDeclaredMethod("getName", null);

In this example, the local agent has retrieved a reference to a getName() method, with no arguments, on the class Myclass. It could now use this method reference to call the method on a local Myclass instance:

Myclass obj = new Myclass();
reqMethod.invoke(obj, null);

This may seem like a roundabout way to accomplish the same thing as calling obj.getName() on the Myclass object, and it is. But in order to call a method on our remote object, we need to send a reference to the method over the network to the remote host. One way to do this is to assign identifiers to all of the methods on the class, just like we did for remote objects. Since both the object requestor and the object server can get a list of the class's methods by calling the getDeclaredMethods() method on the class, we could simply use the index of the method in the returned list as its identifier. (This assumes that both sides see the methods in the same order. The Java API does not guarantee any particular order for the returned array, so a robust scheme would sort the list or identify methods by name and signature instead.) Then the object requestor can call a method on a remote object by simply sending the remote host the object's identifier, and the identifier for the method to call. Assuming that our local agent has the same object reference from the earlier example, the remote method call would look something like this:

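Method reqMethod = reqClass.getDeclaredMethod("getName", null);
Method[] methodList = reqClass.getDeclaredMethods();
int methodIdx = -1;
for (int i = 0; i < methodList.length; i++) {
    // Separate reflection lookups may return distinct Method objects
    // for the same method, so compare with equals() rather than ==
    if (reqMethod.equals(methodList[i])) {
        methodIdx = i;
        break;
    }
}
// Send the call as a newline-terminated command over the same
// OutputStreamWriter used to request the object
String cmd = "call " + methodIdx + " on " + objRef;
out.write(cmd + "\n");
out.flush();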

This approach to handling remote method invocation is a general one; it will work for any class that we want to distribute. So far so good. But what about the arguments to the remote methods? And what about the return value from the remote method? Our example used a getName() method with no arguments, but if the method does take arguments, we'll need to send these to the remote host as well. We can also assume that a method called "getName" will probably return some kind of String value, which we'll need to get back from the remote host. This same problem exists in the creation of the remote object. With a reference scheme like the one we use for methods, applied to the class's constructors, we could specify which constructor to use when the remote host creates our object, but we would still need a way to send the constructor arguments to the remote host.
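To make the receiving end of a "call" command concrete, here is a minimal sketch of how an agent might turn a method index back into a method invocation. Everything in it is an assumption made for illustration: the MethodIndex class, its sortedMethods(), indexOf(), and invoke() methods, and the Greeter class that stands in for Myclass are all hypothetical. The sketch sorts the declared methods by their string form so that requestor and server compute the same index, and it passes arguments as local Java objects, so it sidesteps the question of how those arguments cross the network.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical class standing in for "Myclass" from the text.
class Greeter {
    public String getName() { return "greeter"; }
    public String greet(String who) { return "hello, " + who; }
}

// Hypothetical helper shared by the requestor and the server.
class MethodIndex {
    // Return the declared methods in a deterministic order, since
    // getDeclaredMethods() itself promises no particular order.
    static Method[] sortedMethods(Class<?> cls) {
        Method[] methods = cls.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::toString));
        return methods;
    }

    // Requestor side: find the index of a method by name and
    // parameter types, or -1 if the class declares no such method.
    static int indexOf(Class<?> cls, String name, Class<?>... paramTypes) {
        Method[] methods = sortedMethods(cls);
        for (int i = 0; i < methods.length; i++) {
            if (methods[i].getName().equals(name)
                    && Arrays.equals(methods[i].getParameterTypes(), paramTypes)) {
                return i;
            }
        }
        return -1;
    }

    // Server side: invoke the method with the given index on a target
    // object that was looked up from the object table.
    static Object invoke(Object target, int methodIdx, Object... args) {
        try {
            Method m = sortedMethods(target.getClass())[methodIdx];
            return m.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Remote call failed", e);
        }
    }
}
```

The requestor would compute indexOf(Greeter.class, "getName") and send the result in its "call" command; the server would parse the two integers, look up the target object, and pass both to invoke().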

By now this exercise is beginning to look a lot more serious than we might have expected. In distributed object systems, the task of packaging up method arguments for delivery to the remote object, and the task of gathering up method return values for the client, are referred to as data marshaling. One approach we can take to data marshaling is to turn every object argument in a remote method call into a remote object just like we did previously, by generating an object reference and sending that to the remote agent as the method argument. If the method returns an object value, then the remote host can generate a new object reference and send that back to the local host. So now the remote host and the local host are acting as both object servers and object requestors. We started out with the remote host creating objects for the local host to invoke methods on, but now the local host is "serving" objects for method arguments, and the remote host is serving a bunch of new objects for method return values. And if the remote host needs to call any methods on objects that are arguments to other methods, or if the local host needs to call methods on object return values, then we'll need to send method references back and forth for these remote method calls as well.

To further complicate matters, we also have to worry about situations where you don't want a remote object reference sent as the method argument. In some cases, you may want to send objects by copy rather than by reference. In other words, you may just need to send the value of the object from one host to another, and not want changes to an object propagated back to the original source object. How do we serialize and transmit an object's value to a remote agent? One way is to tell the other agent to create a new object of the right type, as we did to create our original remote object, and then indicate the new object as the method argument or return value.
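The distinction between arguments sent by value and arguments sent by reference can be made explicit in the wire format. The sketch below is one possible encoding, not the scheme of any real distributed object system: the SimpleMarshaler and RemoteRef classes and the single-character type tags are assumptions made for illustration. Integers and strings travel by value, while a RemoteRef carries only the integer identifier of an object that stays in its owner's object table.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical by-reference argument: just an object table index.
final class RemoteRef {
    final int id;
    RemoteRef(int id) { this.id = id; }
}

// Hypothetical marshaler: each argument is written as a one-byte
// type tag followed by its encoded value.
class SimpleMarshaler {
    static byte[] marshal(Object... args) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(args.length);
            for (Object arg : args) {
                if (arg instanceof Integer) {          // by value
                    out.writeByte('I');
                    out.writeInt((Integer) arg);
                } else if (arg instanceof String) {    // by value
                    out.writeByte('S');
                    out.writeUTF((String) arg);
                } else if (arg instanceof RemoteRef) { // by reference
                    out.writeByte('R');
                    out.writeInt(((RemoteRef) arg).id);
                } else {
                    throw new IllegalArgumentException(
                        "Cannot marshal " + arg);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object[] unmarshal(byte[] data) {
        try {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(data));
            Object[] args = new Object[in.readInt()];
            for (int i = 0; i < args.length; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 'I': args[i] = in.readInt(); break;
                    case 'S': args[i] = in.readUTF(); break;
                    case 'R': args[i] = new RemoteRef(in.readInt()); break;
                    default:
                        throw new IllegalStateException("Unknown tag " + tag);
                }
            }
            return args;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Even this small encoding shows where the effort goes: every additional argument type needs its own tag and its own read and write logic, and arbitrary objects passed by value would need a recursive encoding of their fields.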

3.2.3. Other Issues

Our hypothetical remote object scheme, using object tables, integer object references based on table location, and integer method references based on the method's index/position in the class definition, is a bit ad hoc and not very elegant. It will work, but probably not very well. For one thing, it is not very scalable in terms of development complexity and runtime complexity. Each agent on the network is maintaining its own object table and its own set of object identifiers. Each remote method call could potentially generate more entries in the object tables on both ends of the call, for method arguments and for method return values. And since there's no guarantee that two agents won't use the same identifier for two different objects, each agent using remote objects will need to keep its own table of remote object identifiers and the agents they came from. So now each agent has to maintain two object reference tables: one for objects that it is serving to other agents, and another for objects that it is using remotely. A more elegant way to handle this would be to create a naming service for objects, where an agent serving an object could register its objects with the naming service and generate a unique name/address for the object. The naming service would be responsible for mapping named objects to where they actually live. Users of the object could then find the object with one name, rather than a combination of an object ID and the object's host.
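As a rough illustration of the naming service idea, the following sketch keeps the bindings in a single in-memory map. The NamingService and ObjectLocation classes and their methods are hypothetical, and a real service would itself have to be reachable over the network; the sketch shows only the mapping from a unique name to the host and object identifier where the object actually lives.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical record of where a named object lives: the serving
// agent's host plus the object's index in that agent's table.
final class ObjectLocation {
    final String host;
    final int objectId;

    ObjectLocation(String host, int objectId) {
        this.host = host;
        this.objectId = objectId;
    }
}

// Hypothetical in-memory naming service.
class NamingService {
    private final Map<String, ObjectLocation> bindings =
        new ConcurrentHashMap<>();

    // Register an object under a name. Returns false, leaving the
    // existing binding in place, if the name is already taken.
    boolean bind(String name, String host, int objectId) {
        return bindings.putIfAbsent(
            name, new ObjectLocation(host, objectId)) == null;
    }

    // Find where a named object lives; null if the name is unbound.
    ObjectLocation lookup(String name) {
        return bindings.get(name);
    }

    // Remove a binding when the object is destroyed.
    void unbind(String name) {
        bindings.remove(name);
    }
}
```

With such a service, a client needs only the name to find an object, and two agents that both hand out the identifier 0 no longer collide, because each name maps to a distinct host and identifier pair.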

Another issue with this remote object scheme is the distribution of workload across the distributed system. In returning an object by value as the result of a method call, for example, the object server instructs the client to create the returned object value. The creation of this object could be a significant effort, depending on the type of object. Under normal, non-distributed conditions the creation of the return value is considered a part of the overhead of calling the method. You would hope that when you invoke this method remotely, all of the overhead, including the creation of the return value, would be off-loaded to the remote host. Instead, we're pushing some of the work to the remote host and keeping some locally. The same issue comes up when an agent invokes a remote method and passes method arguments by value instead of by reference. The calling agent tells the serving agent to create the method argument values on its side, which increases the net overhead on the server side for the remote method call.

Hopefully this extended thought experiment has highlighted some of the serious issues that arise when trying to distribute objects over the network. In the next section, we'll look at the features that a distributed object system needs to have in order to address these issues.