13.0 Batch queue host objects

The standard Legion host object creates objects using the process creation interface of the underlying operating system. However, some systems require using a queue management system to take full advantage of local resources. For example, some parallel computers contain a small number of "interactive" nodes, which can be accessed through normal means, and a large number of "compute" nodes, which can only be reached by submitting jobs to a local queue management system.

To make use of hosts that are managed by local queuing systems, Legion provides a modified host object implementation called the BatchQueueHost. BatchQueueHost objects submit jobs to the local queuing system, instead of using the standard process creation interface of the underlying operating system.

13.1 Starting a batch queue host object

To start a BatchQueueHost object, use legion_starthost with the -B flag to indicate the desired host object implementation. It would look something like this:

$ legion_starthost -B BatchQueueHost -N /hosts/SP2 \
  SP2.university.edu

Please see page 22 for more on starting new hosts.

13.2 Setting the local queue

A BatchQueueHost can be used with a variety of queue systems; LoadLeveler, Codine, PBS, and NQS are the currently supported queue types. You can specify which type of local queue a given BatchQueueHost object uses by editing the host object's host_queue_type attribute. For example, if you want your new BatchQueueHost object to use the local LoadLeveler queue, you would run legion_update_attributes and set host_queue_type to "LoadLeveler":

$ legion_update_attributes /hosts/SP2 \
  -a "host_queue_type('LoadLeveler')"

Currently, each BatchQueueHost can use only one queue type at a time. If multiple local queuing systems are available, they cannot all be used by the same BatchQueueHost: a separate BatchQueueHost must be started to represent each queue. Typically, though, individual machines are managed by a single queue.
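As a rough sketch of the one-host-per-queue arrangement, the following shell function prints (but does not run) the commands that would start a separate BatchQueueHost for each queue type on a machine. The function name plan_batch_hosts and the context naming scheme (/hosts/<short name>-<queue type>) are assumptions made for illustration and are not part of Legion; the legion_starthost and legion_update_attributes invocations follow the examples in sections 13.1 and 13.2.

```shell
#!/bin/sh
# Hypothetical helper, not part of Legion: print the commands needed to
# start one BatchQueueHost per queue type on a single machine.
# Usage: plan_batch_hosts MACHINE QUEUE_TYPE [QUEUE_TYPE ...]
plan_batch_hosts() {
    machine=$1
    shift
    for queue_type in "$@"; do
        # Assumed naming scheme: short machine name plus queue type.
        ctx="/hosts/${machine%%.*}-$queue_type"
        printf '%s\n' "legion_starthost -B BatchQueueHost -N $ctx $machine"
        printf '%s\n' "legion_update_attributes $ctx -a \"host_queue_type('$queue_type')\""
    done
}

# Example: a machine managed by both LoadLeveler and PBS.
plan_batch_hosts SP2.university.edu LoadLeveler PBS
```

Review the printed commands before running them by hand; the context names are only a suggestion.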

13.3 Before running objects on the new host

By default every Legion class contains a desired_host_property attribute specifying that it be run on an interactive host. You can use the legion_list_attributes command to check this particular attribute:

$ legion_list_attributes -c /class/my_class \
  desired_host_property

/class/my_class:
  (desired_host_property)
Total attributes retrieved 1
  desired_host_property('interactive')

This signals the scheduler that the class's instances should not run on BatchQueueHosts. This is based on the conservative assumption that any class can run on interactive hosts, but not all classes can run on batch hosts.

To allow instances of your class to run on BatchQueueHosts, you can just remove this attribute:

$ legion_update_attributes /class/my_class -d \
  "desired_host_property('interactive')"

13.4 Troubleshooting

If you are having trouble creating objects on a BatchQueueHost, there are several possible causes. First, be sure that you've removed the problem class's interactive desired_host_property (section 13.3). If you still have trouble, you may have a misconfigured host object. Check the following points to be sure that your host object is set up correctly.

  • The right "queue type" attribute should be set on the host. You can use the legion_list_attributes command to check this.
    $ legion_list_attributes -c /hosts/my_host host_queue_type
  • If the output shows the wrong queue type or no queue type, run legion_update_attributes to set host_queue_type correctly. For instance, if the host uses a LoadLeveler queue you could run the following:
    $ legion_update_attributes -c /hosts/my_host -a \
     "host_queue_type('LoadLeveler')"
  • This "queue type" attribute points the host object to the location of the local Legion queue management scripts in the $LEGION_HPC/bin/QueueManagementScripts directory. The above command tells the host to look in $LEGION_HPC/bin/QueueManagementScripts/LoadLeveler. If the queue type attribute were set to Codine instead, the host would look for the queue management scripts in $LEGION_HPC/bin/QueueManagementScripts/Codine.
  • A directory named for the queue type must exist in the host's $LEGION_HPC/bin/QueueManagementScripts directory. It should contain the following queue management scripts.
    legion_proxy_queue_load
    legion_queue_cancel
    legion_queue_load
    legion_queue_status
    legion_queue_submit
  • These scripts should have execute permissions set for the user-id that will be running the BatchQueueHost. If all of this is set up correctly, the host should be calling the local scripts. If objects are still not being created correctly there may be a problem in the scripts.
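
The directory and permission checks in the list above can be sketched as a small shell function. The function name check_queue_scripts and its optional second argument are assumptions made for illustration and are not part of Legion; the directory layout and script names are taken from the list. Because the execute test applies to the current user, run it as the user-id that will be running the BatchQueueHost.

```shell
#!/bin/sh
# Hypothetical helper, not part of Legion: verify that the queue management
# scripts for a queue type exist and are executable by the current user.
# Usage: check_queue_scripts QUEUE_TYPE [SCRIPTS_ROOT]
check_queue_scripts() {
    queue_type=$1
    # Default root is the location named in the text; the second argument
    # overrides it (useful for checking a copy or for testing).
    root=${2:-"$LEGION_HPC/bin/QueueManagementScripts"}
    dir="$root/$queue_type"
    status=0

    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi

    for script in legion_proxy_queue_load legion_queue_cancel \
                  legion_queue_load legion_queue_status legion_queue_submit
    do
        if [ ! -f "$dir/$script" ]; then
            echo "missing script: $dir/$script"
            status=1
        elif [ ! -x "$dir/$script" ]; then
            echo "not executable: $dir/$script"
            status=1
        fi
    done
    return $status
}
```

For example, check_queue_scripts LoadLeveler prints one line per problem found and returns a nonzero status if anything is missing or not executable; it prints nothing when the directory looks correct.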

You can get a better idea of whether or not the local scripts are being called and what they're doing by looking in the log file maintained by the scripts (look in $LEGION_OPR/Legion-BatchLog). You'll find this log on the host where the BatchQueueHost object is running. If the logs indicate that the scripts are never called there may be a scheduling problem.
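
A minimal sketch for inspecting the log, assuming that $LEGION_OPR/Legion-BatchLog is a single plain-text file (the text does not say whether it is a file or a directory). The function name show_batch_log and its arguments are hypothetical and not part of Legion.

```shell
#!/bin/sh
# Hypothetical helper, not part of Legion: print the most recent entries
# of the batch log, or report that the log is missing or empty.
# Usage: show_batch_log [LOG_FILE] [LINE_COUNT]
show_batch_log() {
    # Default path is the one named in the text; assumed to be a plain file.
    log=${1:-"$LEGION_OPR/Legion-BatchLog"}
    lines=${2:-50}
    if [ -s "$log" ]; then
        tail -n "$lines" "$log"
    else
        # An empty or missing log suggests the scripts were never called.
        echo "no entries in $log" >&2
        return 1
    fi
}
```

Run show_batch_log on the host where the BatchQueueHost object is running, with LEGION_OPR set in the environment.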

There is also a six-minute delay after you add a new host to the system, during which the host will not be selected for scheduling, so you may need to wait a few minutes before you can test a new batch queue host.
