Improved Performance with Session Beans (Enterprise JavaBeans)

9.3. Improved Performance with Session Beans

In addition to defining the interactions among entity beans and other resources (workflow), session beans have another substantial benefit: they improve performance. The performance gains from using session beans are related to the concept of granularity. Granularity describes the scope of a business component, or how much business territory the component covers. As you learned previously, very fine-grained dependent business objects are usually modeled as pass-by-value objects. At a small granularity, you are dealing with entity beans like Ship or Cabin. These have a scope limited to a single concept and can only impact the data associated with that concept. Session beans represent large, coarse-grained components with a scope that covers several business concepts--all the business concepts or processes that the bean needs in order to accomplish a task. In distributed business computing, you rely on fine-grained components like entity beans to ensure simple, uniform, reusable, and safe access to data. Coarse-grained business components like session beans capture the interactions of entities or business processes that span multiple entities so that they can be reused; in doing so, they also improve performance on both the client and the server. As a rule of thumb, client applications should do most of their work with coarse-grained components like session beans, and with limited direct interaction with entity beans.

To understand how session beans improve performance, we have to address the most common problems cited with distributed component systems: network traffic, latency, and resource consumption.

9.3.1. Network Traffic and Latency

One of the biggest problems of distributed component systems is that they generate a lot of network traffic. This is especially true of component systems that rely solely on entity-type business components, such as EJB's EntityBean component. Every method call on a remote reference begins a remote method invocation loop, which sends information from the stub to the server and back to the stub. The loop requires data to be streamed to and from the client, consuming bandwidth. If we built a reservation system for Titan Cruise Lines, we would probably use several entity beans like Ship, Cabin, Cruise, and Customer. As we navigate through these fine-grained beans, requesting information, updating their states, and creating new beans, we generate network traffic. One client probably doesn't generate very much traffic, but multiply that by thousands of clients and we start to develop problems. Eventually, thousands of clients will produce so much network traffic that the system as a whole will suffer.

Another aspect of network communications is latency. Latency is the delay between the time we execute a command and the time it completes. With enterprise beans there is always a bit of latency due to the time it takes to communicate requests via the network. Each method invocation requires a RMI loop that takes time to travel from the client to the server and back to the client. A client that uses many beans will suffer from a time delay with each method invocation. Collectively, the latency delays can result in very slow clients that take several seconds to respond to each user action.

Accessing coarse-grained session beans from the client instead of fine-grained entity beans can substantially reduce problems with network bandwidth and latency. In Chapter 6, "Entity Beans", we developed the bookPassage() method on the TravelAgent bean. The bookPassage() method encapsulates the interactions of entity beans that would otherwise have resided on the client. For the network cost of one method invocation on the client (bookPassage()), several tasks are performed on the EJB server. Using session beans to encapsulate several tasks reduces the number of remote method invocations needed to accomplish a task, which reduces the amount of network traffic and latency encountered while performing these tasks.

9.3.2. Resource Consumption

Make decisions about whether to access data directly or through entity beans with care. Listing behavior that is specific to a workflow should be provided by direct data access from a session bean. Methods like listAvailableCabins() in the TravelAgent bean use direct data access because it is less expensive than creating a find method in the Cabin bean that returns a list of Cabin beans. Every bean that the system has to deal with requires resources; by avoiding the use of components where their benefit is questionable, we can improve the performance of the whole system. A CTM is like a powerful truck, and each business component it manages is like a small weight. A truck is much better at hauling around a bunch of weights than an lightweight vehicle like a bicycle, but piling too many weights on the truck will make it just as ineffective as the bicycle. If neither vehicle can move, which one is better?

9.3.3. Striking a Balance

We don't want to abandon the use of entity business components, because they provide several advantages over traditional two-tier computing. They allow us to encapsulate the business logic and data of a business concept so that it can be used consistently and reused safely across applications. In short, entity business components are better for accessing business state because they simplify data access.

At the same time, we don't want to overuse entity beans on the client. Instead, we want the client to interact with coarse-grained session beans that encapsulate the interactions of small-grained entity beans. There are situations where the client application should interact with entity beans directly. If a client application needs to edit a specific entity--change the address of a customer, for example--exposing the client to the entity bean is more practical than using a session bean. If, however, a task needs to be performed that involves the interactions of more than one entity bean--transferring money from account to another, for example--then a session bean should be used.

When a client application needs to perform a very specific operation on an entity, like an update, it makes sense to make the entity available to client directly. If the client is performing a task that spans business concepts or otherwise involves more then one entity, that task should be modeled in a session bean as a workflow. A good design will emphasize the use of coarse-grained session beans as workflow and limit the number of activities that require direct client access to entity beans.

9.3.4. Listing Behavior

Chapter 7, "Session Beans" spends some time discussing the TravelAgent bean's listAvailableCabins() method as an example of a method that returns a list of tabular data. This section provides several different strategies for implementing listing behavior in your beans.

Tabular data is data that is arranged into rows and columns. Tabular data is often used to let application users select or inspect data in the system. Enterprise JavaBeans lets you use find methods to list entity beans, but this mechanism is not a silver bullet. In many circumstances, find methods that return remote references are a heavyweight solution to a lightweight problem. For example, Chapter 9, "Design Strategies" shows the schedule for a cruise.

Table 9-1. Hypothetical Cruise Schedule

Cruise ID	Port-of-Call	Arrive	Depart
233	San Juan	June 4, 1999	June 5, 1999
233	Aruba	June 7, 1999	June 8, 1999
233	Cartagena	June 9, 1999	June 10, 1999
233	San Blas Islands	June 11, 1999	June 12, 1999

It would be possible to create a Port-Of-Call entity object that represents every destination, and then obtain a list of destinations using a find method, but this would be overkill. Recognizing that the data is not shared and only useful in this one circumstance, we would rather present the data as a simple tabular listing.

In this case, we will present the data to the bean client as an array of String objects, with the values separated by a character delimiter. Here is the method signature used to obtain the data:.

public interface Schedule implements javax.ejb.EJBObject {   
    public String [] getSchedule(int ID) throws RemoteException;   
}

And here is the structure of the String values returned by the getSchedule() method:

233; San Juan; June 4, 1999; June 5, 1999  
233; Aruba; June 7, 1999; June 8, 1999  
233; Cartegena; June 9, 1999; June 10, 1999  
233; San Blas Islands; June 11, 1999; June 12, 1999

The data could also be returned as a multidimensional array of strings, in which each column represents one field. This would certainly make it easier to reference each data item, but would also complicate navigation.

One disadvantage to using the simple array strategy is that Java is limited to single type arrays. In other words, all the elements in the array must be of the same type. We use an array of Strings here because it has the most flexibility for representing other data types. We could also have used an array of Objects or even a Vector. The problem with using an Object array or a Vector is that there is no typing information at runtime or development time.

9.3.4.1. Implementing lists as arrays of structures

Instead of returning a simple array, a method that implements some sort of listing behavior can also return an array of structures. For example, to return the cruise ship schedule data illustrated in , you could return an array of schedule structures. The structures are simple Java objects with no behavior (i.e., no methods) that are passed in an array. The definition of the structure and the bean interface that would be used are:

// Definition of the bean that uses the Structure
public interface Schedule implements javax.ejb.EJBObject {   
  public CruiseScheduleItem [] getSchedule(int ID) throws RemoteException;   
} 

// Definition of the Structure
public class CruiseScheduleItem {   
    public int cruiseID; 
    public String portName;   
    public java.util.Date arrival;   
    public java.util.Date departure;   
}

Using structures allows the data elements to be of different types. In addition, the structures are self describing: it is easy to determine the structure of the data in the tabular set based on its class definition.

9.3.4.2. Implementing lists as ResultSets

A more sophisticated and flexible way to implement a list is to provide a pass-by-value implementation of the java.sql.ResultSet interface. Although it is defined in the JDBC package (java.sql) the ResultSet interface is semantically independent of relational databases; it can be used to represent any set of tabular data. Since the ResultSet interface is familiar to most enterprise Java developers, it is an excellent construct for use in listing behavior. Using the ResultSet strategy, the signature of the getSchedule() method would be:

public interface Schedule implements javax.ejb.EJBObject {   
    public ResultSet getSchedule(int cruiseID) throws RemoteException;   
}

In some cases, the tabular data displayed at the client may be generated using standard SQL through a JDBC driver. If the circumstances permit, you may choose to perform the query in a session bean and return the result set directly to the client through a listing method. However, there are many cases in which you don't want to return a ResultSet that comes directly from JDBC drivers. A ResultSet from a JDBC 1.x driver is normally connected directly to the database, which increases network overhead and exposes your data source to the client. In these cases, you can implement your own ResultSet object that uses arrays or vectors to cache the data. JDBC 2.0 provides a cached javax.sql.RowSet that looks like a ResultSet, but is passed by value and provides features like reverse scrolling. You can use the RowSet, but don't expose behavior that allows the result set to be updated. Data updates should only be performed by bean methods.

In some cases, the tabular data comes from several data sources or nonrelational databases. In these cases, you can query the data using the appropriate mechanisms within the listing bean, and then reformat the data into your ResultSet implementation. Regardless of the source of data, you still want to present it as tabular data using a custom implementation of the ResultSet interface.

Using a ResultSet has a number of advantages and disadvantages. First, the advantages:

Consistent interface for developers: The ResultSet interface provides a consistent interface that developers are familiar with and that is consistent across different listing behaviors. Developers don't need to learn several different constructs for working with tabular data; they use the same ResultSet interface for all listing methods.
Consistent interface for automation: The ResultSet interface provides a consistent interface that allows software algorithms to operate on data independent of its content. A builder can be created that constructs an HTML or GUI table based on any set of results that implements the ResultSet.
Metadata operations: The ResultSet interface defines several metadata methods that provide developers with runtime information describing the result set they are working with.
Flexibility: The ResultSet interface is independent of the data content, which allows tabular sets to change their schema independent of the interfaces. A change in schema does not require a change to the method signatures of the listing operations.

And now, the disadvantages of using a ResultSet:

Complexity: The ResultSet interface strategy is much more complex than returning a simple array or an array of structures. It normally requires you to develop a custom implementation of the ResultSet interface. If properly designed, the custom implementation can be reused across all your listing methods, but it's still a significant development effort.
Hidden structure at development time: Although the ResultSet can describe itself through metadata at runtime, it cannot describe itself at development time. Unlike a simple array or an array of structures, the ResultSet interface provides no clues at development time about the structure of the underlying data. At runtime, metadata is available, but at development time, good documentation is required to express the structure of the data explicitly.