Legion 1.4
Basic User Manual


10.0 MPI

The current Legion implementation supports a core MPI interface, which includes message passing, data marshaling, and heterogeneous conversion. Legion supports legacy MPI codes and provides an enhanced MPI environment using Legion features such as security and placement services.

A link-in replacement MPI library uses the primitive services provided by Legion to support the MPI interface. MPI processes map directly to Legion objects.

10.1 Task classes

MPI implementations generally require that MPI executables reside in a given place on disk. Legion's implementation objects serve a similar role. We have provided a tool, legion_mpi_register, to register executables of different architectures for use with Legion MPI.

10.2 Using Legion MPI

10.2.1 Installation

The Legion MPI library can be found in $LEGION/src/ServiceObjects/MPI.

10.2.2 Compilation

MPI code may be compiled as before. Link it against libmpi and the basic Legion libraries. The final linkage must be performed with a C++ compiler or by passing C++-appropriate flags to ld.

A sample Legion MPI makefile is shown in Figure 15.

Figure 15: Sample Legion MPI make file
C       =       gcc
CC      =       g++
F77     =       g77
OPT     =       -g
INC_DIR =       /home/appnet/Legion/include/MPI
FFLAGS  =       -I$(INC_DIR) -ffixed-line-length-none 

mpitest: mpitest.o
        legion_link -Fortran -mpi mpitest.o -o mpitest

mpitest.o: mpitest.f
        $(F77) -c mpitest.f -I$(INC_DIR) $(FFLAGS)

mpitest_c: mpitest_c.o
        legion_link -mpi mpitest_c.o -o mpitest_c

mpitest_c.o: mpitest_c.c
        $(C) $(CFLAGS) -c mpitest_c.c -I$(INC_DIR)

10.2.3 Register Compiled Tasks

Run legion_mpi_register. Usage of this tool is:

legion_mpi_register <class name> <binary path> <platform type>

The example below registers the binary /myMPIprograms/vdelay under the class name vdelay for the Linux architecture.

$ legion_mpi_register vdelay /myMPIprograms/vdelay linux

This command creates MPI-specific contexts (/mpi, /mpi/programs, and /mpi/instances), creates a class object and an implementation for the program, and registers the class name in Legion context space.

The legion_mpi_register command may be executed several times, once per platform, if you have compiled your program on several architectures.
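For example, if the same program had also been compiled for Solaris, you could register the second binary under the same class name (the binary path below is hypothetical, and assumes solaris is an available platform type in your system):

$ legion_mpi_register vdelay /myMPIprograms/solaris/vdelay solaris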

10.2.4 Running the MPI Application

MPI programs are started using the program legion_mpi_run:

legion_mpi_run  {-f <options file> [<flags>]} | 
	{-n <number of processes> [<flags>] <command> [<arguments>]}

Parameters used for this command are:

-f <options file>         Allows users to run multiple MPI binaries with a common MPI_COMM_WORLD
-n <number of processes>  Specify the number of processes to run

Supported <flags> are:

-h <host context path>    Specify the set of hosts on which the program will run (default is the system's default placement)
-0 <host context name>    Run the first process (i.e., process zero) on this node
-p <PID context>          Specify a context name for PIDs (default is /mpi/instances/program_name)
-S                        Print statistics at exit
-v                        Verbose option (up to four of these can be specified for increased detail)
-d <Unix path name>       Specify that all children change to the specified directory before they begin to run

Note that if the -f flag is used, the -n flag, program name, and any arguments will be ignored. Any <flags> used with -f will be treated as defaults and applied to all processes executed by this command, unless otherwise specified in the options file. The legion_mpi_run utility will expect one binary per line in the options file, each line including any necessary arguments as they would appear on the command line. Each line can also contain any of the legion_mpi_run flags except the -f flag.
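As an illustrative sketch, an options file built from the description above might contain one line per binary, each with its own flags and arguments (the class names, flag values, and argument below are examples only):

-n 2 /mpi/programs/mpitest_c
-n 1 -h /mpi-hosts /mpi/programs/vdelay 10

Running legion_mpi_run -f with such a file would start both binaries with a common MPI_COMM_WORLD.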

If the -f flag is not used, the MPI application named in the legion_mpi_run command will then be started with the given flags.

For example:

$ legion_mpi_run -n 2 /mpi/programs/vdelay

Note that the <command> argument here is the Legion context space name for the class created by legion_mpi_register in the previous section. In this example the default context path /mpi/programs is used with the class name vdelay.

You can examine the running objects of your application using

$ legion_ls /mpi/instances/program_name

This context will have an entry for each object in this MPI application. If you run multiple versions of the application simultaneously, use the -p flag with legion_mpi_run to specify an alternate context in which to put the list of objects.
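For example, a second run of the same program could keep its objects in a separate context (the context name here is arbitrary):

$ legion_mpi_run -n 2 -p /mpi/instances/vdelay_run2 /mpi/programs/vdelay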

To view all instances of the program's class, use legion_list_instances:

$ legion_list_instances -c /mpi/programs/program_name

To destroy the program class's instances, use the legion_destroy_instances command. Note, though, that this command will search out and destroy all of the class's instances.

Note that MPI programs cannot save their state or be deactivated; they simply die. This may have an impact on fault tolerance, although we will address this in a future release.

10.3 Example

There is a sample MPI program included in a new Legion system, written in C (mpitest_c.c) and Fortran (mpitest.f). Go to the directory containing the program's binary files:

$ cd $LEGION/src/ServiceObjects/MPI/examples

You must then register whichever version you wish to use with the legion_mpi_register command. The sample output below uses mpitest_c and shows Legion creating three new contexts (mpi, programs, and instances, the last two being sub-contexts of mpi), creating and tagging a task class, and registering the implementation of the mpitest_c program.

$ legion_mpi_register mpitest_c $LEGION/bin/$LEGION_ARCH/mpitest_c linux
"/mpi" does not exist - creating it
"/mpi/programs" does not exist - creating it
"/mpi/instances" does not exist - creating it
Task class "/mpi/programs/mpitest_c" does not exist - creating it
"/mpi/instances/mpitest_c" does not exist - creating it
Set the class tag to: mpitest_c
Registering implementation of mpitest_c
$

In order to view the output of the program, you will need to create and set a tty object. The example below creates, sets, and watches a tty object. (Please see About Legion tty objects for more information on tty objects).

$ legion_tty /context_path/mytty

Having done all of this, you can now run the program with the legion_mpi_run command. You must specify how many processes the program should run with the -n flag.

If you do not specify which hosts the program should run its processes on, Legion will randomly choose the hosts from the host objects that are in your system. Here it runs three processes on three different hosts.

$ legion_mpi_run -n 3 /mpi/programs/mpitest_c
Hello from node 0 of 3 hostname Host2
Hello from node 1 of 3 hostname Host3
Node 0 has a secret, and it's 42
Node 1 thinks the secret is 42
Hello from node 2 of 3 hostname BootstrapHost
Node 2 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
Node 2 exits barrier and exits.
$

Another time it might run the three processes on two hosts.

$ legion_mpi_run -n 3 /mpi/programs/mpitest_c
Hello from node 1 of 3 hostname BootstrapHost
Hello from node 0 of 3 hostname Host3
Node 0 has a secret, and it's 42
Node 1 thinks the secret is 42
Hello from node 2 of 3 hostname BootstrapHost
Node 2 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
Node 2 exits barrier and exits.
$

10.4 Accessing files in programs using MPI

Normally when you run MPI, you run on a single machine and all of your input files are local. When you run an MPI program in Legion, however, you need to be able to transparently access remote files for input and output. Legion has three subroutines that allow your program to read and write files in Legion context space: LIOF_LEGION_TO_TEMPFILE, LIOF_CREATE_TEMPFILE, and LIOF_TEMPFILE_TO_LEGION.

The first, LIOF_LEGION_TO_TEMPFILE, lets you copy a file from Legion context space into a local file. For example,

call LIOF_LEGION_TO_TEMPFILE ('input', INPUT, ierr)
open (10, file = INPUT, status = 'old')

will copy a file with the Legion filename input into a local file and store the name of that local file in the variable INPUT. The second line opens the local copy of the file.

The other two subroutines can be used to create and write to a local file. For example:

call LIOF_CREATE_TEMPFILE (OUTPUT, IERR)
open (11, file = OUTPUT, status = 'new')
call LIOF_TEMPFILE_TO_LEGION (OUTPUT, 'output', IERR)

The first line creates a local file and stores the file's name in the variable OUTPUT. The second line opens the local copy of the file. The third line copies the local file OUTPUT into a Legion file with the name output.
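Putting these calls together, a minimal Fortran sketch that reads a value from the Legion file input and writes a result to the Legion file output might look like the following (the declarations, unit numbers, and string length are assumptions, and the IERR return codes are not checked):

      CHARACTER*256 INPUT, OUTPUT
      INTEGER N, IERR

c     copy the Legion file 'input' to a local temporary file and read it
      call LIOF_LEGION_TO_TEMPFILE ('input', INPUT, IERR)
      open (10, file = INPUT, status = 'old')
      read (10, *) N
      close (10)

c     create a local temporary file, write the result, then copy it
c     back into context space as the Legion file 'output'
      call LIOF_CREATE_TEMPFILE (OUTPUT, IERR)
      open (11, file = OUTPUT, status = 'new')
      write (11, *) 2 * N
      close (11)
      call LIOF_TEMPFILE_TO_LEGION (OUTPUT, 'output', IERR)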

While this approach allows you to run MPI programs and transparently read and write files remotely, it does have one limitation: it does not support heterogeneous conversion of data. If you run this program on several machines that have different formats for an integer, such as Intel PCs (little-endian) and IBM RS/6000s (big-endian), the result of using unformatted I/O will be surprising. If you want to use such a heterogeneous system, you will have to either use formatted I/O (all files are text) or use the "typed binary" I/O calls instead of Fortran READ and WRITE statements. These "typed binary" I/O calls are discussed in Buffered I/O library, low impact interface in the Legion Developer Manual.

10.5 Scheduling MPI processes

As noted above, Legion may not choose as many host objects as the number of processes specified with the -n parameter. If you specify three processes, Legion will randomly choose to run them on one, two, or three hosts.

If you wish to specify which hosts are used to run your processes, use the -h flag. In the example below, we will run mpitest_c on four hosts, which we will assume are already part of the system (see Host and vault objects for information about adding hosts to a Legion system) and which, in this example, are named in a single context called mpi-hosts.

First, you must create the mpi-hosts context with the legion_context_create command:

$ legion_context_create mpi-hosts
Creating context "mpi-hosts" in parent ".".
New context LOID = "1.01.05.44b53908.000001fc0940fbf7

We can now assign new context names to the four hosts that we wish to use for this program, using the legion_ln command. These new names map to the host objects' LOIDs, and provide a convenient way to specify different hosts for different MPI programs or to execute the same program on different groups of hosts.

$ legion_ln /hosts/Host1 /mpi-hosts/mpitest_Host1
$ legion_ln /hosts/Host2 /mpi-hosts/mpitest_Host2
$ legion_ln /hosts/Host3 /mpi-hosts/mpitest_Host3
$ legion_ln /hosts/Host4 /mpi-hosts/mpitest_Host4

The program can now be run with legion_mpi_run, using the -h flag to specify that the four processes should be run on the hosts in mpi-hosts. Note that the nodes are placed on the hosts in order.

$ legion_mpi_run -h /mpi-hosts -n 4 /mpi/programs/mpitest_c
Hello from node 0 of 4 hostname Host1.xxx.xxx
Hello from node 1 of 4 hostname Host2.xxx.xxx
Node 0 has a secret, and it's 42
Hello from node 2 of 4 hostname Host3.xxx.xxx
Node 1 thinks the secret is 42
Node 2 thinks the secret is 42
Hello from node 3 of 4 hostname Host4.xxx.xxx
Node 3 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
Node 2 exits barrier and exits.
Node 3 exits barrier and exits.
$

Note also that the -n parameter is still used, even though there are only four host objects named in the mpi-hosts context. The number of processes and the number of host objects do not have to match: if you wanted to run only two processes on the hosts named in mpi-hosts you could use -n 2. The program would then run on the first two hosts listed in the mpi-hosts context.1

$ legion_mpi_run -h /mpi-hosts -n 2 /mpi/programs/mpitest
Hello from node 0 of 2 hostname Host1.xxx.xxx
Node 0 has a secret, and it's 42
Hello from node 1 of 2 hostname Host2.xxx.xxx
Node 1 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
$

10.6 Debugging support

A utility program, legion_mpi_debug, allows the user to examine the state of MPI objects and print out their message queues. This is a handy way to debug deadlock.

legion_mpi_debug [-q] {-c <instance context path>}

The -q flag will list the contents of the queues. For example:

$ legion_mpi_debug -q -c /mpi/instances/vdelay
Process 0 state: spinning in MPI_Recv source 1 tag 0 comm?
Process 0 queue:

(queue empty)

Process 1 state: spinning in MPI_Recv source 0 tag 0 comm?
Process 1 queue:

MPI_message source 0 tag 0 len 8
$

Do not be alarmed that process 1 hasn't picked up the matching message in its queue: it will be picked up after the command has finished running.

There are a few limitations in legion_mpi_debug: if an MPI object doesn't respond, legion_mpi_debug will hang and will not go on to query additional objects. Also, an MPI object can only respond when it enters the MPI library; if it is in an endless computational loop, it will never reply.

Output from a Legion MPI object to standard output or Fortran unit 6 is automatically directed to a tty object (see About Legion tty objects). Note, though, that this output will be mixed with other output in your tty object, so you may wish to segregate MPI output. Legion MPI supports a special set of library routines for printing on the Legion tty (created with the legion_tty command); these cause an MPI object's output to be printed in segregated blocks.

LMPI_Console_Output (string) Buffers a string to be output later.

LMPI_Console_Output_Flush () Flushes all buffered strings to the tty.

LMPI_Console_Output_And_Flush (string) Buffers and flushes in one call.

The Fortran version of these routines adds a carriage return at the end of each string, and omits trailing spaces. The C version does not.
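As a brief Fortran sketch of these routines (the strings are arbitrary and rank handling is omitted; the calling convention is assumed to match the LIOF examples above), the first two lines are buffered and appear on the tty as a single segregated block when the flush call is made:

call LMPI_Console_Output ('node 0: starting computation')
call LMPI_Console_Output ('node 0: computation complete')
call LMPI_Console_Output_Flush ()
call LMPI_Console_Output_And_Flush ('node 0: done')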

10.7 Functions supported

All MPI functions are currently supported in Legion MPI. Note, however, that data packed or unpacked using the MPI_Pack and MPI_Unpack functions will not be properly converted in heterogeneous networks.


1. If you use the -h flag in this way, be sure to double-check the order in which the host objects are listed in your context.

