Resource allocation is a negotiation process between resources and objects. Legion encompasses a variety of resources, including hosts, networks, vaults, and I/O devices such as printers and sensors. Users will require blocks of time on some or all of these resources. They must also be able to control placement of their objects' active and persistent states, or Legion will be unsuitable for certain types of applications. Administrators, on the other hand, must regulate usage of and control access to their resources, as well as enforce local usage policies, security policies, etc. Different applications may require certain types and levels of parallelism, storage and memory space, I/O support, communication, and communication topology. A single, generic resource scheduling arrangement is simply not feasible under these conditions: to give equal power to resource providers and consumers, both parties must be able to arrive at an arrangement.
Application- and object-specific resource allocation is crucial to achieving both user and administrator goals. Legion supports the development and use of allocation algorithms tailored to objects or to applications so that the resource objects suit the resource provider's interests. Each provider can create his or her own flavors of resource objects. Class objects enforce policies for their instances, and both parties can export interfaces to facilitate negotiations. This placement philosophy allows each allocation decision to be tailored to the application, so as to better match actual application run-time behavior and application-specific knowledge.
In the allocation of resources for a specific task there are three steps: decision, enactment, and monitoring. In the decision stage, the task's characteristics, requirements, and run-time behavior, the resource's properties and policies, and users' preferences must all be considered. Legion provides an information infrastructure and a resource negotiation framework for this stage. The allocation decision includes target hosts and storage space with assorted host objects and vaults. At the enactment stage, the allocation coordinator sends an activation request, including the desired host-vault mapping for the object, to the class object which will carry out the request. The class object checks that the placement is acceptable and then coordinates with the host object and vault to create and start the new object. The monitoring stage ensures that the new object is operating correctly and that it is using the correct allocation of resources. An object-mandatory interface includes functions to establish and remove triggers that will monitor object status.
There are three special objects involved in Legion resource management: the collection, the scheduler, and the enactor. The collection collects resource information, constantly monitoring the system's host and vault objects to determine which resources are in use and which are available for what kind of tasks. The scheduler determines possible resource use schedules for specific tasks and makes policy decisions about placing those tasks. The enactor negotiates with resources to carry out those schedules and acquires reservation tokens from successful negotiations.
Suppose, for example, that a user wants ClassFoo to start instance Foo on another host. The user sends a call (Figure 20, step 1) to the basic Legion scheduler. The scheduler then consults the collection to determine what resources are appropriate and available for Foo (step 2) and builds a sample schedule or series of schedules (step 3). It then sends a sample schedule to the enactor (step 4). The enactor contacts each resource on the schedule and requests an allocation of time (step 5). Once it has contacted each resource and reserved the necessary time it confirms the schedule with the scheduler (step 6), and then contacts ClassFoo and tells it to begin Foo on the appropriate resources (step 7). ClassFoo contacts the resource(s) and sends the order to create Foo (step 8).
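The eight steps above can be sketched as a toy simulation. This is only an illustration of the message flow, not Legion code: the Python classes, the `query()`, `build_schedules()`, and `enact()` method names, the token format, and the host Alpha are all assumptions made for this example, and each schedule is reduced to a single host.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    arch: str
    available: bool = True

    def make_reservation(self):
        # Step 5: grant a token, or refuse by returning None.
        return f"token-{self.name}" if self.available else None


class Collection:
    def __init__(self, hosts):
        self.hosts = hosts

    def query(self, arch):
        # Step 2: report hosts that match and are currently available.
        return [h for h in self.hosts if h.arch == arch and h.available]


class Scheduler:
    def __init__(self, collection):
        self.collection = collection

    def build_schedules(self, arch):
        # Step 3: one single-host schedule per candidate, in collection order.
        return [[h] for h in self.collection.query(arch)]


class Enactor:
    def enact(self, schedules):
        # Steps 4-6: try schedules in order until every host grants a token.
        for schedule in schedules:
            tokens = [h.make_reservation() for h in schedule]
            if all(tokens):
                return schedule, tokens
        return None, []


class ClassFoo:
    def create_instance(self, host, token):
        # Steps 7-8: the class object asks the reserved host to start Foo.
        return f"Foo running on {host.name} ({token})"


hosts = [Host("Alpha", "x86"), Host("Beta", "sparc")]
scheduler = Scheduler(Collection(hosts))  # step 1: the user calls the scheduler
schedule, tokens = Enactor().enact(scheduler.build_schedules("sparc"))
result = ClassFoo().create_instance(schedule[0], tokens[0])
print(result)
```

The sketch keeps the three roles separate on purpose: the collection only reports, the scheduler only proposes, and the enactor only negotiates, which mirrors the division of labor described above.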
Before the scheduler approaches the collection (step 2 in Figure 20) it must know Foo's specific needs: computing time, any dependency graphs, special requirements, etc. A class-specific scheduler may already know this information explicitly; otherwise the scheduler can obtain descriptive information from the class via the attributes interface.
With this information, it asks the collection for a list of the correct and available resources (Figure 21, step 1). Individual resources may limit outside usage to certain types of users, objects, hosts, etc., so the collection has information regarding which resources will accept instances of ClassFoo, on its particular host, started by a particular user. If the scheduler requests a SPARC, for example, the collection searches for a list of those SPARCs that are accessible, available, and have the proper amount of free space.
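As a rough illustration of the kind of filtering described above, the sketch below selects hosts that are accessible, available, and large enough. The record fields (`free_mb`, `in_use`, `allowed_users`, `allowed_classes`), the `query_hosts()` function, and all host and user names other than Beta are hypothetical and chosen only for this example; a real collection is queried through its own interface.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    name: str
    arch: str
    free_mb: int
    in_use: bool = False
    allowed_users: set = field(default_factory=set)    # empty set = no restriction
    allowed_classes: set = field(default_factory=set)  # empty set = no restriction

    def accepts(self, user, class_name):
        # A resource may limit usage to certain users or classes.
        user_ok = not self.allowed_users or user in self.allowed_users
        class_ok = not self.allowed_classes or class_name in self.allowed_classes
        return user_ok and class_ok


def query_hosts(records, arch, min_free_mb, user, class_name):
    """Return names of hosts that are accessible, available, and large enough."""
    return [
        r.name
        for r in records
        if r.arch == arch
        and not r.in_use
        and r.free_mb >= min_free_mb
        and r.accepts(user, class_name)
    ]


records = [
    HostRecord("Alpha", "sparc", 512, allowed_users={"alice"}),
    HostRecord("Beta", "sparc", 256),
    HostRecord("Gamma", "sparc", 64),
    HostRecord("Delta", "x86", 1024),
    HostRecord("Epsilon", "sparc", 512, in_use=True),
]
matches = query_hosts(records, "sparc", 128, user="bob", class_name="ClassFoo")
print(matches)
```

Here Alpha is excluded because it restricts users, Gamma because it lacks space, Delta because of its architecture, and Epsilon because it is in use.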
The collection returns a list of matching resources to the scheduler (Figure 21, step 2). The scheduler builds a set of possible schedules (Figure 21, step 3), prioritizes them, and sends an ordered list of schedules off to the enactor. This can vary, of course, depending on the individual scheduler. A different scheduler might send only one or several possible schedules.
When the enactor is ready to request reservations of time and space on the resources in a chosen schedule, it methodically goes down its list and approaches each resource individually. If any resource refuses the enactor's request, the enactor either moves on to the next schedule or, if all the schedules on its list have failed, informs the scheduler that it needs a new schedule. In the scenario in Figure 20, host object Beta and vault object Beta are the only resources required, so the enactor approaches each one with the make_reservation() function (this function asks for an instant reservation). It requests reservations for the host and the vault separately. Host object Beta then decides whether or not to accept the reservation. Note that Beta is free to refuse the reservation at any point during the entire scheduling procedure if previous reservations or internal considerations require a cancellation, regardless of its usual policies towards the user and the user's host. If it refuses, the enactor tells the scheduler that the schedule is not possible and requests another schedule or, if the scheduler has sent a list of schedules, moves down to the next schedule. If this happens after other host objects on the failed schedule have been contacted and have agreed to make reservations, the enactor contacts them again to release its reservations. The prototype enactor allows a schedule to be expressed as a difference set from the previous schedule. Figure 22 shows the scheduler data structure.
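The reserve-then-release behavior described above can be sketched as follows. Only `make_reservation()` is named in the text; the `Resource` class, the `cancel_reservation()` name, the `try_schedules()` helper, and the Alpha resources are assumptions made for this illustration.

```python
class Resource:
    def __init__(self, name, willing=True):
        self.name = name
        self.willing = willing
        self.reserved = False

    def make_reservation(self):
        # Return a token on success, None if the resource refuses.
        if not self.willing:
            return None
        self.reserved = True
        return f"token-{self.name}"

    def cancel_reservation(self, token):
        # Hypothetical name for the call that releases a reservation.
        self.reserved = False


def try_schedules(schedules):
    """Try each schedule in order; return (index, tokens) or (None, {})."""
    for index, schedule in enumerate(schedules):
        granted = {}
        for resource in schedule:  # host and vault are asked separately
            token = resource.make_reservation()
            if token is None:
                # Release reservations already granted on this failed schedule.
                for held in schedule:
                    if held.name in granted:
                        held.cancel_reservation(granted[held.name])
                break
            granted[resource.name] = token
        else:
            return index, granted  # every resource on this schedule agreed
    return None, {}  # all schedules failed: ask the scheduler for a new one


host_alpha = Resource("hostAlpha")
vault_alpha = Resource("vaultAlpha", willing=False)
host_beta = Resource("hostBeta")
vault_beta = Resource("vaultBeta")

schedules = [[host_alpha, vault_alpha], [host_beta, vault_beta]]
index, tokens = try_schedules(schedules)
print(index, tokens)
```

In this run the first schedule fails at the vault, so the reservation already held on hostAlpha is released before the enactor moves to the second schedule.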
In our example, however, host object Beta allows a reservation and sends the enactor a reservation token. The enactor then contacts vault object Beta. Once all resources on a schedule have been contacted and have sent in reservation tokens, the enactor notifies the scheduler that the task can be completed (Figure 23, step 1).
The scheduler then tells the enactor to contact ClassFoo (Figure 23, step 2). The enactor sends a create_instance(reserved host object name, reservation token) call to ClassFoo (Figure 23, step 3), which in turn sends a start_object(reservation token) call to host object Beta, along with the name of the vault that Beta should use (Figure 23, step 4). Beta then creates instance Foo (Figure 23, step 5).
Both ClassFoo and the resource have veto power throughout the reservation process: Legion's resource management system handles denial or failure at any stage. For example, if a resource object refuses to grant a reservation the enactor asks the scheduler for another schedule and tries again. A reserved resource can also reject a start_object() call. The two parties remain autonomous during this entire procedure and are not obligated to honor each other's commitments.
Depending on the individual resource, a reservation may have a time-out period after which the reservation is released. The reservation guarantees that a block of time and space will remain open for a specific period, beginning with the creation of the token. Note that the token may expire by the time ClassFoo is ready to use it.
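A minimal sketch of a token with a time-out period follows, assuming the period is counted in seconds from the moment the token is created. The `ReservationToken` class, its fields, and the return strings are hypothetical; only the `start_object()` name comes from the text, and its behavior here is simplified.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ReservationToken:
    resource: str
    timeout_s: float
    # The time-out period begins when the token is created.
    created_at: float = field(default_factory=time.monotonic)

    def is_valid(self, now=None):
        now = time.monotonic() if now is None else now
        return (now - self.created_at) < self.timeout_s


def start_object(token, now=None):
    # A resource may reject a start request that arrives with a stale token.
    if not token.is_valid(now):
        return "rejected: reservation expired"
    return f"started on {token.resource}"


# Explicit timestamps are passed so the example is deterministic.
token = ReservationToken("hostBeta", timeout_s=30.0, created_at=100.0)
early = start_object(token, now=110.0)
late = start_object(token, now=140.0)
print(early)
print(late)
```

The second call shows the case noted above: the token was valid when issued but had expired by the time the class object tried to use it.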
A system may have multiple schedulers, for different processes, problems, levels of granularity, etc. For information on writing your own scheduler, or using multiple schedulers, please contact the Legion group.
The scheduler can be called directly from a user-level program or it can be called from a Class Object. Writers of schedulers should support the following interface. For complete information, consult $LEGION/src/ServiceObjects/Schedulers. Scheduler writers are encouraged to derive new schedulers from the existing schedulers, and to overload the placement generation methods.
The primary interface to the enactor is through the enact_schedule() call. There are additional functions for performing lower-level operations (such as making and canceling individual reservations, or activating or creating individual objects), but, from the point of view of a typical scheduler, enact_schedule() is the most important method.
As soon as a schedule is successful (i.e. reservations are obtained for all placements in the schedule), the schedule is returned to the caller with a success code. If all schedules fail, the return value includes error codes indicating which placements caused each schedule to fail.
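The shape of such a return value might look like the sketch below. This is not the actual enact_schedule() signature or the structures in LegionSchedule.h: the `EnactResult` type, its fields, the `can_reserve` callback, and the placement names are assumptions, and releasing partial reservations is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class EnactResult:
    success: bool
    schedule_index: int = -1  # index of the schedule that succeeded, if any
    failures: dict = field(default_factory=dict)  # schedule index -> refused placements


def enact_schedule_sketch(schedules, can_reserve):
    """Return the first fully reservable schedule, or per-schedule failures."""
    failures = {}
    for index, placements in enumerate(schedules):
        refused = [p for p in placements if not can_reserve(p)]
        if not refused:
            return EnactResult(True, index)  # stop at the first success
        failures[index] = refused
    return EnactResult(False, failures=failures)


schedules = [["hostAlpha", "vaultAlpha"], ["hostBeta", "vaultBeta"]]
busy = {"vaultAlpha"}

ok = enact_schedule_sketch(schedules, lambda p: p not in busy)
failed = enact_schedule_sketch(schedules, lambda p: False)
print(ok)
print(failed)
```

Returning the refused placements per schedule gives a scheduler enough information to avoid the same resources when it builds a replacement schedule.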
For complete information on the data structures passed, scheduler writers are encouraged to examine the header file $LEGION/include/legion/LegionSchedule.h and to look at the source code for the example schedulers in $LEGION/src/ServiceObjects/Schedulers.
The collection supports five methods useful for resource management, although only one of them is used by schedulers. The QueryCollection() method call passes in a string containing a logical expression describing the systems of interest.
The Legion collection object uses the MESSIAHS Interface Language (MIL). Collection queries can be constructed using the interface below.
In MIL, variables are of the form $VARNAME (e.g. $system_arch). The official variable names will be those exported by resource objects in their attributes. To get some idea of the current set of attributes, consult the documentation on resource objects (for the most up-to-date information, invoke the retrieve_all_attributes() method on a resource object and examine the results).
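Because variable names must match the attributes that resource objects export, it can help to check a query string before sending it. The sketch below only extracts `$VARNAME` tokens and compares them against a set of attribute names. The expression syntax in the sample query is illustrative and is not claimed to be valid MIL, and every attribute name other than `$system_arch` is hypothetical.

```python
import re

# Matches variables of the form $VARNAME and captures the name without the "$".
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def referenced_variables(query):
    """Return the set of variable names used in a query string."""
    return set(VARIABLE.findall(query))


def unknown_variables(query, exported_attributes):
    """Return, sorted, the variables that no resource attribute provides."""
    return sorted(referenced_variables(query) - set(exported_attributes))


# Attribute names as they might be returned by retrieve_all_attributes().
exported = {"system_arch", "system_os", "free_memory"}

good_query = '$system_arch == "sparc" and $free_memory > 64'
typo_query = '$sytem_arch == "sparc"'

print(unknown_variables(good_query, exported))
print(unknown_variables(typo_query, exported))
```

A check like this catches misspelled variable names early, before a query silently matches nothing.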