Legion's remote execution lets you run programs on remote hosts from the Legion command line, taking advantage of Legion's distributed resources. Your workload can be spread across several processors, improving job turnaround time and performance. You can execute multiple instances of a single program or execute several different programs in parallel.
A remote program is an executable process that can be started with a command-line utility. Note that a remote program is not part of the Legion system: it is a separate program that may or may not be linked to the Legion libraries. Remote programs might be shell scripts, compilers, existing executables, etc. The term "remote" here means that the program need not be executed on the local host.
Legion permits two kinds of programs to be run remotely: Legion-linked and independent. A Legion-linked program is linked to the Legion libraries. An independent program is not. For example, a shell script or a program written and compiled on a non-Legion system is an independent program.
Before either a Legion-linked or an independent remote program can be run from Legion, it must be registered. The legion_register_runnable command registers Legion-linked programs. The legion_register_program command registers independent programs. Both commands take a context path name, the program's binary path name, and the binary's architecture as parameters. You can register a program multiple times to run on different architectures.
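The choice between the two registration commands can be sketched as a small helper. This is only an illustration: the helper name, the example paths, and in particular the parameter order (program class, binary path, architecture) are assumptions, so check the Reference Manual for the exact syntax before relying on it.

```python
def registration_command(program_class, binary_path, architecture, legion_linked):
    """Build the argument list for registering one binary of a program."""
    # Legion-linked and independent programs use different commands.
    if legion_linked:
        tool = "legion_register_runnable"
    else:
        tool = "legion_register_program"
    # ASSUMPTION: parameter order is program class, binary path, architecture.
    return [tool, program_class, binary_path, architecture]


# Hypothetical binaries: one independent program built for two architectures.
builds = {
    "linux": "/home/me/bin/linux/myprogram",
    "solaris": "/home/me/bin/solaris/myprogram",
}
for arch, path in builds.items():
    print(" ".join(registration_command("MyProgram", path, arch, legion_linked=False)))
```

Registering the same program class once per architecture, as in the loop, is what lets Legion pick a matching binary for whichever host runs the job.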
Independent remote programs are registered with the legion_register_program command. Registration records the program's architecture (e.g., linux, sgi, or solaris), its binary path name, and a program class. Usage is:
The <program class> parameter is a context path name for the class that will manage Legion objects related to a particular program. The path name can be whatever you find most convenient. You might want to use the name of the program in the <program class> parameter, so that it will be easier to remember (e.g., if you are registering a program called "MyProgram", you might use MyProgram as the <program class> parameter). The context path name can be either new or previously created (the latter if you are registering multiple versions of the same program). It refers to a Legion object that permits an independent remote program to execute.
If multiple programs are registered with the same program class and architecture, the most recently registered version will be used when the program is run on that particular architecture. Be sure to reregister the program if you recompile it, since Legion copies your executable into context space only when you register it.
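The reregistration rule is easy to overlook, so here is a toy model of it. The class below is purely hypothetical and is not a Legion API; it only mimics the two behaviors described above: registration takes a copy of the binary, and the latest registration for an architecture replaces the earlier one.

```python
class ProgramClassModel:
    """Toy stand-in for a program class; not a Legion API."""

    def __init__(self):
        # architecture -> copy of the binary taken at registration time
        self._copies = {}

    def register(self, architecture, binary_contents):
        # Registration snapshots the binary. A later registration for the
        # same architecture replaces the earlier snapshot.
        self._copies[architecture] = bytes(binary_contents)

    def binary_for(self, architecture):
        return self._copies[architecture]


local_binary = bytearray(b"build 1")
program = ProgramClassModel()
program.register("linux", local_binary)

local_binary[:] = b"build 2"             # recompile locally, no reregistration
stale = program.binary_for("linux")      # still the copy taken at registration

program.register("linux", local_binary)  # reregister after recompiling
fresh = program.binary_for("linux")
print(stale, fresh)
```

In this model, as in the text, a recompiled binary is invisible to remote runs until it is registered again.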
The legion_register_runnable command registers programs that are linked to the Legion libraries, and exports a runnable object interface. The command creates an interface between the Legion system and the program. Like legion_register_program, this command requires the remote program's program class, executable binary path, and the binary's architecture as parameters. Its usage is:
The <program class> parameter is a context path name for the Legion objects that will handle a particular program. You can choose a context path that suits your organizational scheme. If you reregister a program and use the same program class name, Legion will use the most recent registration information when running the program. Once the program has been registered, you can run it with the legion_run command. You must reregister the program if you recompile it (Legion copies your executable to context space when you register it, so the most recently registered copy will be run).
Use legion_run to start a single instance of a program on a remote host. If you are running a serial program with many input files and/or multiple executions you may prefer to use the legion_run_multi command (see page 91 in the Reference Manual). The program must be registered with either legion_register_program (for independent programs not linked to the Legion libraries) or legion_register_runnable (for Legion-linked programs) before you run it.
There are a number of optional parameters associated with legion_run. To try to keep things simple, we will not discuss all of them here: please see page 87 in the Reference Manual for the complete syntax and explanation of all options.
The legion_probe_run command is a useful aid when running remote programs. It lets you pass input files to the remote host after the program has started running, pick up output files, check the job's status, and clean up the remote host after the job has finished.
Legion will choose a host with an appropriate architecture if you do not specify one with the -h flag. The host must be part of your system (i.e., have a host object and be listed in the /hosts context: if necessary, see "Adding a new host" in the System Administrator Manual).
Your program may expect to find certain files in its local directory at run time, so when you execute it on a remote host you must pass copies of any expected input files to that host. If you know the name of the remote host, you can pass the files by hand, but otherwise you'll find it much easier to tell Legion the names of the files that need to be copied. You can do this before or after the program starts.
You can move files onto the remote host before the program starts with legion_run's -in and -IN flags, or after it starts with legion_probe_run. The -in flag gets files from context space and the -IN flag gets them from the local host.
The -IN flag tells Legion to copy the contents of a local file into a new file in the remote program's current working directory before the program executes. The new file will have the same name as the local file.
Use legion_probe_run to move input files to the remote host after the program has started running (see section 8.5.8). If you wish to use this method, please note that you must have started the program with a probe file (see section 8.5.6). You can use the -in or -IN flags to pass input files from context space or local file space to the remote job's current working directory. These flags are identical to legion_run's -in and -IN flags.
Once the program has finished, you need to retrieve the output files. If you know the name of the remote host and you are running in nonblocking mode you can get the files by hand (see section 8.5.7). Otherwise, you will need to tell Legion the names of the output files that you wish to retrieve and whether you wish to put them on your local host or in context space. You can do this before or after the program starts.
You can use legion_run's -out and -OUT flags to specify which files you want copied from the remote host. The -out flag copies files into context space and the -OUT flag copies them onto your local host.
You can use the -out and -OUT flags with legion_probe_run to specify which files you want copied from the remote host after the program has started running. If you wish to use this method, please note that you must have started the program with a probe file (see below). The -out flag copies files into context space and the -OUT flag copies them onto your local host. These flags are identical to legion_run's -out and -OUT flags.
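The four staging flags follow one pattern: lowercase means context space, uppercase means the local host. A minimal sketch of that mapping is shown here. The helper names and file names are hypothetical, and the sketch assumes that each flag takes a single file name, that a flag is repeated for each additional file, and that flags precede the program class; the Reference Manual has the authoritative syntax.

```python
# (direction, where the file lives on your side) -> legion_run flag
STAGING_FLAGS = {
    ("input", "context"): "-in",
    ("input", "local"): "-IN",
    ("output", "context"): "-out",
    ("output", "local"): "-OUT",
}


def staging_args(files):
    """Turn (direction, location, name) triples into flag/value pairs."""
    args = []
    for direction, location, name in files:
        # ASSUMPTION: one file name per flag, flag repeated for each file.
        args += [STAGING_FLAGS[(direction, location)], name]
    return args


def run_command(program_class, files, program_args=()):
    # ASSUMPTION: flags come before the program class, arguments after it.
    return ["legion_run", *staging_args(files), program_class, *program_args]


files = [
    ("input", "local", "params.txt"),
    ("input", "context", "/home/me/shared.dat"),
    ("output", "local", "result.txt"),
]
print(" ".join(run_command("MyProgram", files)))
```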
If you find that you are trying to keep track of too many options when you start legion_run, you may wish to use an option file. This is a text file that holds a list of all of your flags and optional settings for that run. The file can be delimited with tabs, spaces, or new lines and can contain any of the legion_run flags except for the program class name and any command-line arguments: those must appear on the command line, along with the -f flag and the option file name. Please note that you can only use legion_run flags in this file.
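As a rough sketch of the option-file workflow, the helpers below write the flags to a file and keep only -f, the program class, and the program's arguments on the command line. All names here are hypothetical, and `read_option_tokens` only illustrates the "tabs, spaces, or new lines" delimiting rule; it is not how Legion itself parses the file. Note that splitting on whitespace means file names containing spaces would not survive.

```python
import os
import tempfile


def write_option_file(path, tokens):
    """Write legion_run flags and their values, one token per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(tokens) + "\n")


def read_option_tokens(path):
    """Split an option file on tabs, spaces, and newlines."""
    with open(path) as handle:
        return handle.read().split()


def run_with_option_file(option_file, program_class, program_args=()):
    # The program class and its arguments stay on the command line with -f.
    return ["legion_run", "-f", option_file, program_class, *program_args]


tokens = ["-IN", "params.txt", "-OUT", "result.txt"]
option_path = os.path.join(tempfile.mkdtemp(), "myprogram.opts")
write_option_file(option_path, tokens)
print(read_option_tokens(option_path))
print(" ".join(run_with_option_file(option_path, "MyProgram", ["42"])))
```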
If you run in blocking mode, legion_run will automatically remove a job's probe file when the job is finished. If you run in nonblocking mode, legion_probe_run's -kill option will delete it as part of its cleanup. Otherwise, the file will remain in your local file space. The probe file is good for only one remote job, so if you reuse the name, Legion will simply overwrite the previous file if it still exists.
The default mode is blocking. This means that legion_run will continue to run on your command line until the remote job has finished. All output files will be collected from the remote host, the remote directory that was used to run the job will be cleaned up, and the command will exit.
If you start legion_run with the -nonblock flag, on the other hand, the command will start the program on the remote host, wait ten seconds, verify that the program can start, and then exit. When the remote host has finished executing the program, the job will copy any files named with the -out flag into context space but ignore any files named with the -OUT flag. The job's working directory will remain on the remote host. The basic steps are shown below in Figure 6.
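The difference between the two modes comes down to what is handled for you when the job ends. The function below is a hypothetical summary of the behavior described above, not Legion code: it takes the files you named with -out and -OUT and reports which of them are delivered automatically and whether the remote working directory is cleaned up.

```python
def automatic_results(nonblocking, out_files, OUT_files):
    """Summarize what happens on its own when the remote job finishes."""
    if nonblocking:
        return {
            "to_context": list(out_files),   # -out files are still copied
            "to_local": [],                  # -OUT files are ignored
            "remote_dir_cleaned": False,     # working directory stays behind
        }
    return {
        "to_context": list(out_files),
        "to_local": list(OUT_files),
        "remote_dir_cleaned": True,
    }


blocking = automatic_results(False, ["/home/me/log"], ["result.txt"])
nonblocking = automatic_results(True, ["/home/me/log"], ["result.txt"])
print(blocking)
print(nonblocking)
```

In nonblocking mode, anything named with -OUT therefore has to be fetched afterwards, by hand or with legion_probe_run.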
Since this is nonblocking mode, the remote job's working directory does not get cleaned up. The remote host will hold on to the job's working directory for six hours. During this period, you can remove any remaining output files by hand. If you started the job with a probe file, you can copy output files to your local host with legion_probe_run's -OUT flag [1]. If you do not clean up the remote host in that period, the entire directory will be tarred and moved into your context scratch space (see section 8.5.9).
The legion_probe_run command checks a remote job that was started with legion_run. You must know the name of the job's probe file (see above). You can use this command to pass input files, pick up output files, find what host is running your program, check the job's status, see what files are in the job's current working directory, kill the job, and clean up the remote host when the job is finished. An example is shown in Figure 7, below.
The user can then run legion_probe_run with the -kill flag. This destroys the remote job's working directory and the probe file. Please note that if you use this flag before the job is finished you will terminate the job and lose all data.
Please see page 83 in the Reference Manual for more information about this command.
Legion uses context space as backup storage (scratch space) for any remote jobs that finish and are not cleaned up by legion_run or legion_probe_run. If the job finishes and is not picked up or checked for six hours, the remote host will tar, compress, and move the job's working directory into your context scratch space.
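To make the six-hour window concrete, the sketch below computes when a finished job's working directory would be archived. It is a hypothetical model based only on the text and on footnote 1 (running legion_probe_run on a terminated job restarts the clock); the function name and the exact boundary handling are assumptions.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=6)


def archive_deadline(finished_at, probe_times=()):
    """Return when the working directory would be tarred and moved."""
    last_activity = finished_at
    for probed_at in sorted(probe_times):
        # Only a probe made after the job finished, and before the current
        # deadline has passed, restarts the six-hour clock.
        if last_activity <= probed_at < last_activity + RETENTION:
            last_activity = probed_at
    return last_activity + RETENTION


finished = datetime(2000, 1, 1, 12, 0)
no_probe = archive_deadline(finished)
one_probe = archive_deadline(finished, [datetime(2000, 1, 1, 17, 0)])
late_probe = archive_deadline(finished, [datetime(2000, 1, 1, 19, 0)])
print(no_probe, one_probe, late_probe)
```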
The default scratch space is /tmp, but you can set it to anywhere in your context space [2]. There are several ways to do this.
Be sure to give the local file name a "tar.Z" suffix. You then need to run uncompress and tar [3]:
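A minimal sketch of that two-step unpacking, assuming the standard Unix uncompress and tar utilities are on your path and that the archive has already been copied to your local host. The function names and the example file name are hypothetical.

```python
import subprocess


def unpack_commands(archive):
    """Return the two commands that unpack a .tar.Z archive."""
    if not archive.endswith(".tar.Z"):
        raise ValueError("expected a file name ending in .tar.Z")
    tar_file = archive[:-2]  # uncompress replaces name.tar.Z with name.tar
    return [["uncompress", archive], ["tar", "xf", tar_file]]


def unpack(archive):
    """Run uncompress and then tar, stopping if either step fails."""
    for command in unpack_commands(archive):
        subprocess.run(command, check=True)


# Show the commands for a hypothetical archive without running them.
for command in unpack_commands("myjob.tar.Z"):
    print(" ".join(command))
```

Calling `unpack("myjob.tar.Z")` would run both steps in the current directory.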
Suppose that we have a program called "Doris" that we wish to run on a remote linux host. The first step is to determine whether it is linked to the Legion libraries (Legion-linked) or not (independent). In this case, we'll suppose that Doris is not linked to the Legion libraries.
1. If you run legion_probe_run on a terminated job, the clock will restart and give you another six hours.
2. This may be necessary, since if security is enabled you may not have permission to work in the /tmp context or your system administrator may have removed the context altogether.
3. These are common Unix tools that should be available on all Unix platforms. If you do not have access to these tools, please contact us at email@example.com