8.0 Executing programs remotely

Legion's remote execution facility lets you run programs on remote hosts from the Legion command line, taking advantage of Legion's distributed resources. Your workload can be spread across several processors, improving job turnaround time and performance. You can execute multiple instances of a single program or execute several different programs in parallel.

8.1 Introduction

A remote program is an executable process that can be started with a command-line utility. Note that a remote program is not part of the Legion system, but is an independent program that may or may not be linked to Legion. Remote programs might be shell scripts, compilers, existing executables, etc. The term "remote" here means that the program may be executed on a host other than the local host.

8.2 Linked and independent programs

Legion permits two kinds of programs to be run remotely: Legion-linked and independent. A Legion-linked program is linked to the Legion libraries. An independent program is not linked to the Legion libraries. For example, a shell script or a program written and compiled on a non-Legion system is an independent program.

Before either a linked or independent remote program can be run from Legion it must be registered. The legion_register_runnable command registers Legion-linked programs. The legion_register_program command registers independent programs. Both of these commands require parameters containing a context path name, the program's binary path name, and the binary's architecture. You can register a program multiple times to run on different architectures.

A single command, legion_run, runs registered programs.

8.3 Registering independent programs

Independent remote programs are registered with the legion_register_program command. The command takes a program class, the program's binary path name, and the binary's architecture (e.g., linux, sgi, solaris, etc.). Usage is:


legion_register_program
   <program class>
   <Unix executable path>
   <legion arch> [-debug] [-help]

The <program class> parameter is a context path name for the class that will manage Legion objects related to a particular program. The path name can be whatever you find most convenient. You might want to use the name of the program in the <program class> parameter, so that it will be easier to remember (e.g., if you are registering a program called "MyProgram" you might use MyProgram as the <program class> parameter). The context path name can be either new or a previously created path name (if you are registering multiple versions of the same program). It will refer to a (new or previously created) Legion object that permits an independent remote program to execute.

If multiple programs are registered with the same program class and architecture, the most recently registered version will be used when the program is run on that particular architecture. Be sure to reregister the program if you recompile it, since Legion copies your executable into context space only when you register it.

Once the program has been registered, it can be executed with the legion_run command.

An example of this command might look like this:

 
$ legion_register_program myProgram /bin/myProgram linux
Program class "myProgram" does not exist.
Creating class "myProgram".
Registering implementation for class "myProgram"
$ 

This output shows Legion creating the program class myProgram, as the command requested. If this class had been previously created, the output would look like this:

 
$ legion_register_program myProgram /bin/myProgram linux
Registering implementation for class "myProgram"
$ 
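Because a program class can be registered once per architecture, the same class name can be reused for each platform you have a binary for. The following is a minimal sketch, assuming hypothetical per-architecture build directories (build/linux and build/solaris); only the legion_register_program syntax and the architecture names come from this section. By default the commands are printed rather than executed, so that you can review them first.

```shell
#!/bin/sh
# Dry-run switch: by default each command is only printed.
# Set RUN to an empty string (e.g. "RUN= sh register_all.sh") to execute.
RUN=${RUN-echo}

# Register one program class for several architectures.
# The build/<arch>/myProgram layout is an assumption for illustration.
register_all() {
    for arch in linux solaris; do
        $RUN legion_register_program myProgram "build/$arch/myProgram" "$arch"
    done
}

register_all
```

Each pass through the loop adds an implementation for one architecture to the same program class.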

8.4 Registering Legion-linked programs

The legion_register_runnable command registers programs that are linked to the Legion libraries and export a runnable object interface. The command creates an interface between the Legion system and the program. Like legion_register_program, this command requires the remote program's program class, executable binary path, and the binary's architecture as parameters. Its usage is:

legion_register_runnable
   <program class> <executable path>
   <legion arch> [-debug] [-help]

The <program class> parameter is a context path name for the Legion objects that will handle a particular program. You can choose a context path that suits your organizational scheme. If you reregister a program and use the same program class name, Legion will use the most recent registration information when running the program. Once the program has been registered, you can run it with the legion_run command. You must reregister the program if you recompile it (Legion copies your executable to context space when you register it, so the most recently registered copy will be run). For example:

$ legion_register_runnable myProgram /bin/myProgram linux
Program class "myProgram" does not exist.
Creating class "myProgram".
Registering implementation for class "myProgram"
$ 

This output shows Legion creating the program class myProgram, as the command requested. If this class had been previously created, Legion would simply register the new information:

$ legion_register_runnable myProgram /bin/myProgram linux
Registering implementation for class "myProgram"
$ 

8.5 Running a program remotely

Use legion_run to start a single instance of a program on a remote host. If you are running a serial program with many input files and/or multiple executions you may prefer to use the legion_run_multi command (see page 91 in the Reference Manual). The program must be registered with either legion_register_program (for independent programs not linked to the Legion libraries) or legion_register_runnable (for Legion-linked programs) before you run it.

There are a number of optional parameters associated with legion_run. To try to keep things simple, we will not discuss all of them here: please see page 87 in the Reference Manual for the complete syntax and explanation of all options.

The legion_probe_run command is a useful aid when running remote programs. It lets you pass input files to the remote host after the program has started running, pick up output files, check the job's status, and clean up the remote host after the job has finished.

8.5.1 Choosing a remote host

Legion will choose a host with an appropriate architecture if you do not specify one with the -h flag. The host must be part of your system (i.e., have a host object and be listed in the /hosts context: if necessary, see "Adding a new host" in the System Administrator Manual).

8.5.2 Command-line arguments

If your program requires command-line arguments, you must include them as the final parameters to legion_run. They cannot be put in an option file.
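As a sketch of the resulting command-line order: legion_run flags come first, then the program class, then the program's own arguments. The class name myProgram and the arguments alpha and 42 are hypothetical. The command is printed rather than executed by default.

```shell
#!/bin/sh
# Dry-run switch: prints the command; run with "RUN=" to execute it.
RUN=${RUN-echo}

# "alpha" and "42" are hypothetical arguments that belong to myProgram,
# not to legion_run, so they go last on the line.
run_with_args() {
    $RUN legion_run -IN input1 myProgram alpha 42
}

run_with_args
```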

8.5.3 Getting input files to the remote host

Your program may expect to find certain files in its local directory at run time, so when you execute it on a remote host you must pass copies of any expected input files to that host. If you know the name of the remote host, you can pass the files by hand, but otherwise you'll find it much easier to tell Legion the names of the files that need to be copied. You can do this before or after the program starts.

8.5.3.1 Before the program starts

You can use legion_run's -in and -IN flags to move files onto the remote host before the program starts (to move files after it starts, use legion_probe_run). The -in flag gets files from context space and the -IN flag gets them from the local host.

  • From context space

    The -in flag tells Legion to copy the contents of a context file object into a new file in the remote program's current working directory before the program executes. The new file will have the same name as the context file object. E.g., if you put

    -in /home/me/myContext/inputFoo 

    on the command line, Legion will copy the contents of /home/me/myContext/inputFoo to a new Unix file on the remote host called "inputFoo" (the file will be in the job's current working directory).

  • From Unix space

    The -IN flag tells Legion to copy the contents of a local file into a new file in the remote program's current working directory before the program executes. The new file will have the same name as the local file.
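When a job needs many local input files, typing one -IN flag per file becomes tedious. The following is a minimal sketch of a shell helper (the name run_with_inputs is hypothetical, not a Legion tool) that expands a list of local file names into repeated -IN flags. It prints the resulting legion_run command rather than executing it by default.

```shell
#!/bin/sh
# Dry-run switch: prints the command; run with "RUN=" to execute it.
RUN=${RUN-echo}

# Usage: run_with_inputs <program class> [local input file ...]
run_with_inputs() {
    class=$1
    shift
    n=$#
    # Rotate through the file list, replacing each file name
    # with an "-IN <file>" pair at the end of the argument list.
    while [ "$n" -gt 0 ]; do
        set -- "$@" -IN "$1"
        shift
        n=$((n - 1))
    done
    $RUN legion_run "$@" "$class"
}

run_with_inputs myProgram input1 input2
```

The same pattern would work for -in with context path names.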

8.5.3.2 After the program has started

Use legion_probe_run to move input files to the remote host after the program has started running (see section 8.5.8). If you wish to use this method, please note that you must have started the program with a probe file (see section 8.5.6). You can use the -in or -IN flags to pass input files from context space or local file space to the remote job's current working directory. These flags are identical to legion_run's -in and -IN flags.

8.5.4 Getting output files from the remote host

Once the program has finished, you need to retrieve the output files. If you know the name of the remote host and you are running in nonblocking mode you can get the files by hand (see section 8.5.7). Otherwise, you will need to tell Legion the names of the output files that you wish to retrieve and whether you wish to put them on your local host or in context space. You can do this before or after the program starts.

8.5.4.1 Before the program starts

You can use legion_run's -out and -OUT flags to specify which files you want copied from the remote host. The -out flag copies files into context space and the -OUT flag copies them onto your local host.

  • Into context space

    The -out flag tells Legion to copy the contents of a specified output file from the program's current working directory (on the remote host) into a new Legion file object in your context space once the program has terminated. The new file object will have the same name as the specified output file. E.g., if you tell Legion to copy outputFoo:

    -out outputFoo 

    Legion will look for a file called outputFoo in the remote directory and copy its contents into a new file object in your context space called outputFoo.

  • Onto your local host

    The -OUT flag tells Legion to copy the remote output file into a new file on your local host. The new file will have the same name as the remote output file.

If the program has terminated because of a crash, Legion may not be able to copy all of the results.

8.5.4.2 After the program has started

You can use the -out and -OUT flags with legion_probe_run to specify which files you want copied from the remote host after the program has started running. If you wish to use this method, please note that you must have started the program with a probe file (see section 8.5.6). The -out flag copies files into context space and the -OUT flag copies them onto your local host. These flags are identical to legion_run's -out and -OUT flags.

8.5.5 Option file

If you find that you are trying to keep track of too many options when you start legion_run, you may wish to use an option file. This is a text file that holds a list of all of your flags and optional settings for that run. The file can be delimited with tabs, spaces, or new lines and can contain any of the legion_run flags except for the program class name and any command-line arguments: those must appear on the command line, along with the -f flag and the option file name. Please note that you can only use legion_run flags in this file.
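A minimal sketch of this arrangement follows. The option file name doris.opts, the program class doris, and the file names are illustrative, and placing -f before the program class is an assumption based on the flag order used elsewhere in this chapter. The legion_run command is printed rather than executed by default.

```shell
#!/bin/sh
# Dry-run switch: prints the command; run with "RUN=" to execute it.
RUN=${RUN-echo}

# Write the legion_run flags to an option file, one per line
# (tabs or spaces would also work as delimiters).
cat > doris.opts <<'EOF'
-IN input1
-IN input2
-OUT output1
-OUT output2
-p dorisProbe
EOF

# The program class (and any program arguments) stay on the command line.
run_from_options() {
    $RUN legion_run -f doris.opts doris
}

run_from_options
```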

8.5.6 Creating and using a probe file

A probe file is a Unix file on your local machine. The legion_probe_run command uses it to contact a remote job. To create one, simply use the -p flag when you start legion_run.

If you run in blocking mode, legion_run will automatically remove a job's probe file when the job is finished. If you run in nonblocking mode, legion_probe_run's -kill option will delete it as part of its cleaning-up operation. Otherwise, the file will remain in your local file space. The probe file is good for only one remote job, so if you reuse the name Legion will simply write over the previous file if it still exists.

8.5.7 Blocking vs. nonblocking

You can run the legion_run command in blocking or nonblocking mode.

The default mode is blocking. This means that legion_run will continue to run on your command line until the remote job has finished. All output files will be collected from the remote host, the remote directory that was used to run the job will be cleaned up, and the command will exit.

Blocking

The basic steps in running a blocking remote job are shown below in Figure 5.

Figure 5: A blocking remote run.

In this example, a user starts a remote job on his local host in blocking mode. The following events occur:

  1. The legion_run tool sends a copy of local input files (marked with the -IN flag) to the remote host.
  2. The legion_run tool asks the remote host to start the job. It then blocks and waits for the job to finish.
  3. The remote job gets a copy of context input files (marked with the -in flag).
  4. The job starts running.
  5. The remote job finishes and creates its output files. Any files that were marked with the -out flag are copied into context space.
  6. The remote job tells legion_run that it is finished. legion_run copies any files marked with the -OUT flag into the user's local file space and cleans up the remote job's working directory.

You can use ^C to kill the job prematurely.

Nonblocking

If you start legion_run with the -nonblock flag, on the other hand, the command will start the program on the remote host, wait ten seconds, verify that the program can start, and then exit. When the remote host has finished executing the program, the job will copy any files named with the -out flag into context space but ignore any files named with the -OUT flag. The job's working directory will remain on the remote host. The basic steps are shown below in Figure 6.

Figure 6: A nonblocking remote run.

In this example, a user starts a remote job on his local host using the -nonblock flag. The following events occur:

  1. The legion_run tool sends a copy of local input files (marked with the -IN flag) to the remote host.
  2. The legion_run tool asks the remote host to start the job. It waits ten seconds, verifies that the job can start, and exits.
  3. The remote job gets a copy of context input files (marked with the -in flag).
  4. The job starts running.
  5. The remote job finishes and creates its output files. Any files that were marked with the -out flag are copied into context space.

Since this is nonblocking mode, the remote job's working directory does not get cleaned up. The remote host will hold on to the job's working directory for six hours. During this period, the user can remove any remaining output files by hand. If he started a probe file, he can copy output files to his local host with legion_probe_run's -OUT flag (see footnote 1). If the user does not clean up the remote host in that period, the entire directory will be tarred and moved into the user's context scratch space (see section 8.5.9).

8.5.8 About legion_probe_run

The legion_probe_run command checks a remote job that was started with legion_run. You must know the name of the job's probe file (see above). You can use this command to pass input files, pick up output files, find what host is running your program, check the job's status, see what files are in the job's current working directory, kill the job, and clean up the remote host when the job is finished. An example is shown in Figure 7, below.

Figure 7: A nonblocking remote run with legion_probe_run.

In this example, a user starts a job using -nonblock to run it in nonblocking mode and -p to create a probe file. The following events occur:

  1. The legion_run tool sends a copy of local input files (marked with the -IN flag) to the remote host.
  2. The legion_run tool asks the remote host to start the job.
  3. It also creates and stores a probe file on the local host. It then waits ten seconds, verifies that the job can start, and exits.
  4. The remote job gets a copy of context input files (marked with the -in flag).
  5. The job starts running.
  6. The remote job finishes and creates its output files. Any files that were marked with the -out flag are copied into context space.
  7. The user runs legion_probe_run with the -statjob flag to check on the job. The tool looks for the probe file in its local execution environment.
  8. The legion_probe_run tool contacts the remote job and sees that it is finished.
  9. The user runs legion_probe_run with -OUT flag, telling Legion which files he wants copied onto his local host. (In this scenario it makes no difference whether or not the user ran legion_run with -OUT flags: if legion_run is set to nonblocking it will not stick around to move files back to the local host.)

The user can then run legion_probe_run with the -kill flag. This destroys the remote job's working directory and the probe file. Please note that if you use this flag before the job is finished you will terminate the job and lose all data.
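The sequence above can be sketched as four commands, wrapped here in shell functions so that each step can be run separately. The function names are hypothetical, and the program class doris, the probe file dorisProbe, and the file names are illustrative; the flags are the ones described in this chapter. Commands are printed rather than executed by default.

```shell
#!/bin/sh
# Dry-run switch: prints each command; run with "RUN=" to execute.
RUN=${RUN-echo}

# 1. Start the job in nonblocking mode and create a probe file.
start_job() { $RUN legion_run -nonblock -IN input1 -p dorisProbe doris; }

# 2. Check on the job through its probe file.
check_job() { $RUN legion_probe_run -statjob -p dorisProbe; }

# 3. Once the job has finished, copy an output file to the local host.
fetch_output() { $RUN legion_probe_run -OUT output1 -p dorisProbe; }

# 4. Destroy the remote working directory and the probe file.
#    Running this before the job finishes terminates it and loses its data.
cleanup_job() { $RUN legion_probe_run -kill -p dorisProbe; }

start_job
check_job
```

Steps 3 and 4 (fetch_output and cleanup_job) are left for you to call by hand once the status check shows that the job has finished.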

Please see page 83 in the Reference Manual for more information about this command.

8.5.9 Context scratch space

Legion uses context space as backup storage (scratch space) for any remote jobs that finish and are not cleaned up by legion_run or legion_probe_run. If the job finishes and is not picked up or checked for six hours, the remote host will tar, compress, and move the job's working directory into your context scratch space.

The default scratch space is /tmp but you can set it to anywhere in your context space (see footnote 2). There are several ways to do this.

  1. Set the LEGION_SCRATCH_CONTEXT variable from the command line:
    $ LEGION_SCRATCH_CONTEXT=/home/myContext/scratch/ 
  2. Use -setscratch or -D when you run legion_run.
  3. Use -setscratch when you run legion_probe_run.

If you set it multiple times, the most recent setting will be used.
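In a Bourne-style shell, a plain assignment creates a shell variable that child processes do not inherit. The sketch below therefore also exports the variable; this assumes that the Legion tools read LEGION_SCRATCH_CONTEXT from their environment, which this section does not state explicitly.

```shell
#!/bin/sh
# Assumption: legion_run and legion_probe_run look up
# LEGION_SCRATCH_CONTEXT in their environment, so the variable is
# exported to make it visible to commands started from this shell.
LEGION_SCRATCH_CONTEXT=/home/myContext/scratch/
export LEGION_SCRATCH_CONTEXT
```

In csh-style shells the equivalent is setenv LEGION_SCRATCH_CONTEXT /home/myContext/scratch/.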

8.5.10 Retrieving files from context scratch space

If your remote job's working directory was sent to context scratch space, you'll need to copy it to your local host then uncompress and untar it in order to use it. First, run legion_cp:

$ legion_cp -localdest <tar file> <local_file_name.tar.Z> 

Be sure to give the local file name a "tar.Z" suffix. You then need to run uncompress and tar (see footnote 3):

$ uncompress <local_file_name.tar.Z>
$ tar xvf <local_file_name.tar> 

8.6 Example

Suppose that we have a program called "Doris" that we wish to run on a remote linux host. The first step is to determine whether it is linked to the Legion libraries (Legion-linked) or not (independent). In this case, we'll suppose that Doris is not linked to the Legion libraries.

  1. The first step is to register the program with Legion. Since it's independent, we'll use legion_register_program. We'll call the program class "doris," to keep things simple.
    $ legion_register_program doris /bin/Doris linux
    Program class "doris" does not exist.
    Creating class "doris".
    legion_update_attributes: Added 1 attributes(s) to object
    legion_update_attributes: Added 1 attributes(s) to object
    Registering implementation for class "doris"
    $ 
  2. If we haven't already set a tty object for our shell, we'll do so now.
  3. We can now run the program. We have two options: to run it in blocking mode (the default setting) or in nonblocking mode. In this case, Doris only takes a few minutes to run, so there's no reason not to use the default setting. If it took a long time to run and we wanted to free up the command line, we might choose to run in nonblocking mode. We will, though, make a probe file called "dorisProbe" so that we can check on Doris's progress.

    We'll further suppose that Doris expects two input files, input1 and input2, and produces two output files, output1 and output2, so we'll pass their names along with legion_run.

    $ legion_run  -IN input1 -IN input2 -OUT output1 \
       -OUT output2 -p dorisProbe doris  
  4. We didn't ask for a specific host, but we can get the remote host's name and the remote job's working directory with legion_probe_run.
    $ legion_probe_run -hostname -pwd -p dorisProbe
    gander.cs.virginia.edu
    /myDir/OPR/BootstrapVaultOPR/ReservedOPR-1.80
    $ 

    Note that you can use legion_run's -h and -d flags to specify a remote host and directory when you start the program.

  5. Once the program finishes, legion_run copies the -OUT files to the local host, deletes the probe file and cleans up the remote working directory. If we had run this in nonblocking mode we would have to pick up the -OUT files and then clean up, either with legion_probe_run or by hand.

8.7 Converting a C/C++ program

Any C or C++ program can be made into a Legion runnable object using the following steps:

  1. The program should export a C-linkable legion_main function in place of its main function.
  2. The program should be linked to the Legion libraries. Add the following to your link line:

1. If you run legion_probe_run on a terminated job, the clock will restart and give you another six hours.

2. This may be necessary, since if security is enabled you may not have permission to work in the /tmp context or your system administrator may have removed the context altogether.

3. These are common Unix tools that should be available on all Unix platforms. If you do not have access to these tools, please contact us at legion-help@virginia.edu.

Directory of Legion 1.7 Manuals

legion@Virginia.edu
http://legion.virginia.edu/