In Legion, system level services such as object creation and LOID-OA binding are supported by a cooperating set of core objects. Before examining the interfaces and designs of the individual system-level Legion objects involved in these services, it would be useful to have a high-level understanding of the roles these objects play and their interrelationships.
To this end, this section describes how Legion implements a simplified RPC-style interaction between two Legion objects, Caller and Callee. The description introduces the basic functionality that the supporting Legion core objects must support, but does not describe them in detail, nor does it discuss what alternative policies and implementations are allowed by the architecture and object model. These discussions are deferred until section 13.0.
Suppose that Legion object Caller wishes to invoke member function func() on another Legion object, Callee (see Figure 27). The typical chain of events consists of several steps: determining Callee's LOID, obtaining a binding for this LOID, and creating/activating Callee.
The programmer, in writing the source code for object Caller, should be sure that func() is invoked on object Callee. For example, an MPL programmer would simply include the following line of code in the program
where return_value is the same type of variable that func() returns (in this case an integer). The compiler generates code to organize the integer argument, constructing a simple macro data-flow program graph [See Viles et al] representing the function call, then translating the graph into a Legion message, and retrieving the return value and placing it in the return_value variable.
The compiler must associate the context name Callee with the appropriate Legion object in order to send a message to Callee, which means that the compiler must know Callee's LOID. One possible approach is to statically translate the programmer-specified name for Callee (in this case Callee) into a context name that the compiler can use to identify the object. This context name can then be resolved--either within the compiler or within the object being created (with potentially different results in each case)--to determine Callee's LOID. The runtime library provides routines for using Legion context objects to resolve context names to LOIDs.
In this example we assume that Callee already exists, but an alternative scenario might require that Caller create a new object (in which case the object creation mechanism would return the new object's LOID to Caller). In any event, we can assume that the caller can learn Callee's LOID.
If Caller has communicated with Callee prior to the current method invocation, Caller may already have a binding for Callee stored in its local binding cache (maintained within Caller's address space) (Figure 28a). Binding caches contain bindings collected during repeated execution of the binding mechanism and takes advantage of the fact that once Caller invokes a method on Callee, Caller is likely to invoke other methods on Callee in the near future. Once Caller has cached Callee's binding, it can use the OA in that binding to locate Callee for future method invocations, rather than taking the time to run method invocations on other objects seeking the binding over and over again. If Callee becomes inert (i.e., is deactivated) or migrates to a new OA after its binding is cached in Caller, then the cached binding is stale and Caller must obtain the up-to-date OA. Detecting stale bindings and obtaining current OAs is discussed in section 12.4.
If Caller does not have a cached binding for Callee, Caller can contact its binding agent, whose job is to return bindings for its clients (Figure 28b). Many objects can be clients of the same binding agent, allowing shared caching of the results of time-consuming binding requests. If the binding agent does not have a cached binding for Callee, it can contact Callee's class, CalleeClass, to obtain the desired binding (Figure 28d). The binding agent must learn CalleeClass's LOID in order to communicate with it: learning an object's class LOID is called the class-of mechanism, and is described in section 12.3. Then the binding agent can contact CalleeClass and request Callee's binding.
However, the binding agent must know CalleeClass's OA, which means that it must get CalleeClass's binding and perhaps execute the class-of mechanism to find CalleeClass's class, CalleeMetaclass. This may continue through a chain of metaclasses, but the binding and class-of mechanisms are recursive, and because the class-of hierarchy is rooted at LegionClass, the mechanism is guaranteed to terminate.
In order to find an object's binding, a binding agent or another object may need to contact the desired object's class; first it needs to be able to determine that class's identity. The class-of mechanism maps an object's LOID to its class's LOID. As with bindings, objects and binding agents maintain class-of caches. If Callee is not itself a class object, objects that wish to contact CalleeClass can use the fact that CalleeClass's LOID contains the same class identifier as Callee's LOID, and contains an empty instance number (as mentioned in section 9.1). Thus, the binding agent can search through its binding cache for a LOID with these characteristics, and can assume that any such LOID is that of CalleeClass. As with bindings, binding agents provide a shared caching mechanism for class-of results.
If the desired class-of result is not cached locally or in the binding agent, the class-of caller (here Caller's binding agent) must consult the comprehensive and logically-global Legion class map. The class map is maintained by LegionClass, which is located at a well-known and unchanging object address. In practice, LegionClass will be distributed over multiple cooperating processes, and the class map will be highly replicated. It is worth noting that the class map is a "write once, read many" database; the Legion object model does not allow the class of an object to change. Therefore, replicating the class map need not incur the overhead of maintaining cache coherence.
If Callee is a valid object (i.e. Callee maps to the LOID of an object that was created and has not yet been destroyed), Caller will be able to obtain a binding. However, as noted earlier, bindings (whether they come from binding caches, binding agents or class objects) can become stale: the Callee's binding might contain an OA at which Callee no longer resides. When this happens, Caller determines that the binding is stale (typically by noticing repeated failed attempts to communicate with Callee at its old address) and invokes the re-binding mechanism.
The re-binding mechanism mirrors the regular binding mechanism, but it uses the stale OA to ensure that the same binding is not returned. Caller begins by checking its binding cache for Callee's LOID: if the only binding in the cache is the one that contains the stale OA, that binding is removed from the cache, and the binding agent is consulted. The stale OA is passed as a parameter to the binding agent. The binding agent may attempt to verify that the binding is stale or immediately defer to CalleeClass. In any event, assuming that Callee has simply moved, CalleeClass will be consulted and will again serve as the ultimate authority for locating its instances.
In section 12.2 and 12.4, we based our discussion of the binding process on the fundamental assumption that classes could always return a valid OA for their instances. However inert objects do not have an OA: they are stored at an OPA. If Callee is inert when Caller invokes func(), cached bindings for Callee will be stale. The binding process will require a call to CalleeClass to obtain a new binding for Callee. CalleeClass recognizes that Callee is inert and employs the object activation mechanism to move Callee into the active state. Only by activating Callee and learning its new OA can CalleeClass obtain a valid binding to return to either Caller or a binding agent operating on Caller's behalf. Figure 29 depicts the object activation process.
Before Callee can be activated, CalleeClass must determine where it should be placed, i.e. what host and vault should be used. CalleeClass has complete freedom in selecting an appropriate host for its instances. A very conservative and simple class object might place all of its instances on its own host but more typically a class object will employ more elaborate and flexible placement policies, often using external scheduling agents. This allows a simple, generic class object implementation to be combined with any number of separately defined scheduling policies. It also supports the dynamic replacement of a class object's scheduling policy.
When an external scheduling agent is used (as in Figure 29) it may implement any specialized placement policy appropriate for the class, such as, for example, specialized policies appropriate for a 2D finite difference class used in an ocean model or for a class designed for objects that execute only on machines local to an organization and use only those resources not currently supporting interactive sessions. The agent will typically interact with other information providers (objects that gather and dynamically update information about which hosts are available, their type and attributes, their current load, and so on). Note, though, that the placement process is guided by a set of restrictions determined by the class, such as a list of acceptable object implementations. For more details on the scheduling model see Karpovich. For more information on application specific scheduling agents see Berman et al.
The placement process is performed by the class object itself or by an external agent, and returns the LOID of the chosen host object. The class must then ensure that the instance will be able to access its OPR when it runs on that host. That is, it must be sure that the instance will have access to the vault object that will manage its OPR. Not all storage devices are accessible from all hosts, so before a class activates an instance on a given host, it must verify that the current vault object containing the instance's OPR is compatible with that host. Host and vault objects' attributes (see "Object attributes") indicate their compatibility. If a class wishes to execute an instance on a host that is incompatible with the instance's vault (i.e., the vault currently containing the instance's OPR), the OPR must move to a compatible vault before the object can be started. Once this is settled, the class object invokes the host object's startObject() method. Parameters to this method specify the LOID and OPA of the object to be started and the LOID of the implementation object (see "Implementation Objects" and "mplementation Objects and Caches") to be used.
Note, however, that the startObject() method may not succeed. The host object is free to refuse the activation request for policy or security reasons (e.g., only privileged users can use that particular resource or a host might decide that its load is too high to accept new object activations). In some cases a host may have simply crashed and will, at least temporarily, be unable to service the request. If the startObject() invocation fails, the class object must make another placement selection, possibly re-invoking the external scheduling agent. If the startObject() invocation fails, the class object must make another placement selection, possibly reinvoking the external scheduling agent.
Assuming that the host decides to accept the instantiation request, it must now start a process to represent the object. This means that it must obtain appropriate executable code: the Legion objects that contain the executable code for other objects are called implementation objects. Typically, an implementation object contains a binary executable file, although the model explicitly allows shell scripts and interpreted code, such as Java bytecode or Perl. Each class maintains a list of the implementation objects that are suitable for its instances. Several different implementation objects might be maintained by a class to support the use of multiple platforms--a class might have implementation objects for different architectures, for different operating systems, with different memory requirements, etc. When startObject() is invoked on a host the class passes the appropriate implementation object's LOID so that the host can retrieve appropriate executable code.
To service a startObject() request, the host object must find or make a local copy of the executable code contained within the specified implementation object. A simple host object could retrieve, via the member functions of implementation objects, the executable code on every startObject() invocation. However, retrieving executable code can be expensive, in both communication time and local storage space. Thus, groups of host objects typically share an external implementation cache, a Legion object that downloads executable code on behalf of a set of host objects and caches copies of the executables to save storage space and communication time. To use an implementation cache, the host object sends the cache object the desired implementation object's LOID. The cache object responds with the name of a local file that contains the cached executable code--the host need not be aware of whether the cache retrieved the executable in response to this request, or used an existing local copy.
Once the implementation is locally available, the host object can execute it. How an executable is used depends on its type and the host object's characteristics. For example, if the implementation consists of native executable code the host runs the executable as a normal process; if the host is a normal Unix workstation it uses the fork() and exec() system calls. If the implementation is Java bytecode the host executes it within a Java Virtual Machine. In yet another case, if the host represents a workstation farm that is managed by a queueing system such as Condor [See Litzkow et al] or LoadLeveler [See IBM], the host starts the object through the batch system's particular interface.
Once the host activates the object, the host object passes the newly started object its LOID and new OPA. The host object determines the activated object's local OA, and returns it to the calling class object. The class marks the instance as active, records the instance's OA, and can once again return accurate bindings for the instance.
The binding and activation processes described in this section can be time consuming. In practice, aggressive caching of bindings and object executables helps bypass much of the mechanism and its cost. The benefits of this design include flexibility and the convenient transparent binary migration, one-step system-wide binary replacement for objects, object-local policy autonomy, licensing and proxies, user-definable scheduling policies, user-definable persistent storage, and more.