Legion 1.2
Basic User Manual

10.0 MPI

The current Legion implementation supports a core MPI interface, which includes message passing, data marshaling, and heterogeneous data conversion. Legion supports legacy MPI codes and provides an enhanced MPI environment using Legion features such as security and placement services.

A link-in replacement MPI library uses the primitive services provided by Legion to support the MPI interface. MPI processes map directly to Legion objects.

10.1 Task classes

MPI implementations generally require that MPI executables reside in a given place on disk. Legion's implementation objects serve a similar role. We have provided a tool, legion_mpi_register, which registers executables of different architectures for use with Legion MPI.

10.2 Using Legion MPI

10.2.1 Installation

The Legion MPI library can be found in $LEGION/src/ServiceObjects/MPI.

10.2.2 Compilation

MPI code may be compiled as before. Link it against libmpi and the basic Legion libraries. The final linkage must be performed using a C++ compiler or by passing C++-appropriate flags to ld.

A sample Legion MPI makefile is shown below.

C       =       gcc
CC      =       g++
F77     =       g77
OPT     =       -g
LIB_DIR =       /home/appnet/Legion/lib/linux/g++
INC_DIR =       /home/appnet/Legion/include/MPI
INC_DIR2=       /home/appnet/Legion/include
CFLAGS  =       -I$(INC_DIR) -D$(LEGION_ARCH) -DGNU -I$(INC_DIR2)
FFLAGS  =       -I$(INC_DIR) -D$(LEGION_ARCH) -DGNU -ffixed-line-length-none 
LDFLAGS =       -L$(LIB_DIR) -lLegion -lm $(LFLAGS)
LIBS    =       -lmpif -lmpi -lLegionClientStubs -lBasicFiles
mpitest: mpitest.f
        $(F77) $(FFLAGS) mpitest.f -o mpitest $(LDFLAGS) $(LIBS)
mpitest_c: mpitest_c.c
        $(C) $(CFLAGS) mpitest_c.c -o mpitest_c $(LDFLAGS) $(LIBS)
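
The targets above can then be built in the usual way. For example:

$ make mpitest
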
10.2.3 Register Compiled Tasks

Run legion_mpi_register. Usage of this tool is:

legion_mpi_register <class name> <binary path> <platform type>

The example below registers the class vdelay with binary path xxx/myMPIprograms/vdelay for the Linux architecture.

$ legion_mpi_register vdelay xxx/myMPIprograms/vdelay linux

This creates MPI-specific contexts (/mpi, /mpi/programs, and /mpi/instances), a class object for this program, an implementation for this program, and registers the name in Legion context space.

The legion_mpi_register command may be executed several times if you have compiled your program on several architectures.
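
For example (the second platform name and the binary paths here are illustrative):

$ legion_mpi_register vdelay xxx/myMPIprograms/linux/vdelay linux
$ legion_mpi_register vdelay xxx/myMPIprograms/sgi/vdelay sgi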

10.2.4 Running the MPI Application

MPI programs are started using the program legion_mpi_run:

legion_mpi_run  {-f <options file> [<flags>]} | 
	{-n <number of processes> [<flags>] <command> [<arguments>]}

Parameters used for this command are:

-f <options file>		Allows users to run multiple MPI binaries with a common MPI_COMM_WORLD
-n <number of processes>	Specifies the number of processes to run

Supported <flags> are:

-h <host context path>	Specifies the set of hosts on which the program will run (default is the system's default placement)
-0 <host context name>	Runs the first process (i.e., process zero) on this host
-p <PID context>	Specifies a context name for PIDs (default is /mpi/instances/program_name)
-s			Prints statistics at exit
-v			Verbose option (up to four of these can be specified for increased detail)
-d <Unix path name>	Specifies that all children change to the specified directory before they begin to run

Note that if the -f flag is used, the -n flag, program name, and any arguments will be ignored. Any <flags> used with -f are treated as defaults and applied to all processes started by this command, unless overridden in the options file. The legion_mpi_run utility expects one binary per line in the options file, each line including any necessary arguments as they would appear on the command line. Each line can also contain any of the legion_mpi_run flags except the -f flag.
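
For example, a hypothetical options file (the file name, programs, and flags here are illustrative) might contain:

-n 2 /mpi/programs/vdelay
-n 1 -h /mpi-hosts /mpi/programs/mpitest_c

and be run with:

$ legion_mpi_run -f myjob.opt -v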

If the -f flag is not used, the MPI application named in the legion_mpi_run command will then be started with the given flags.

For example:

$ legion_mpi_run -n 2 /mpi/programs/vdelay

Note that the <command> argument here is the Legion context space name for the class created by legion_mpi_register in the previous section. In this example the default path /mpi/programs is used with the class name vdelay.

You can examine the running objects of your application using:

$ legion_context_list /mpi/instances/program_name

This context will have an entry for each object in this MPI application. If you run multiple versions of the application simultaneously, use the -p flag with legion_mpi_run to specify an alternate context in which to put the list of objects.
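
For instance (the alternate context name here is illustrative):

$ legion_mpi_run -n 2 -p /mpi/instances/vdelay_run2 /mpi/programs/vdelay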

To view all instances of the program's class, use legion_list_instances:

$ legion_list_instances -c /mpi/programs/program_name

To destroy all of the program class's instances, use the legion_destroy_instances command. Note, though, that this command will search out and destroy all of the class's instances.

Note that MPI programs cannot save their state or be deactivated; they simply die. This may have an impact on fault tolerance, although we will address it in a future release.

10.3 Example

There are several sample MPI programs in your Legion system. First, go to the directory containing the example programs:

$ cd $LEGION/src/examples

To compile the example code, enter:

$ make $LEGION/bin/$LEGION_ARCH/mpitest_c

You must then register the program with the legion_mpi_register command. The sample output below shows Legion creating three new contexts (mpi, programs, and instances, the last two being sub-contexts of mpi), creating and tagging a task class, and registering the implementation of the mpitest_c program.

$ legion_mpi_register mpitest_c $LEGION/bin/$LEGION_ARCH/mpitest_c linux
"/mpi" does not exist - creating it
"/mpi/programs" does not exist - creating it
"/mpi/instances" does not exist - creating it
Task class "/mpi/programs/mpitest_c" does not exist - creating it
"/mpi/instances/mpitest_c" does not exist - creating it
Set the class tag to: mpitest_c
Registering implementation of mpitest_c
$

In order to view the output of the program, you will need to create and set a tty object. The example below creates a tty object, sets it as your current tty, and then runs the tty watch utility in the background. (For more information on using these commands, please see the Legion Reference Manual.)

$ legion_create_object -c class/ttyObjectClass mytty
1.01.67000000.01000000.000001fc0a6876...
$ legion_set_tty mytty
$ legion_tty_watch &

Note that this sequence adds the context name mytty to your current context. Users might find it more convenient to create a separate context to hold tty objects, to help organize their context space.

Having done all of this, you can now run the program with the legion_mpi_run command. You must specify the number of processes with the -n flag.

If you do not specify which hosts the program should run its processes on, Legion will randomly choose the hosts from the host objects that are in your system. Here it runs three processes on three different hosts.

$ legion_mpi_run -n 3 /mpi/programs/mpitest_c
Hello from node 0 of 3 hostname Host2
Hello from node 1 of 3 hostname Host3
Node 0 has a secret, and it's 42
Node 1 thinks the secret is 42
Hello from node 2 of 3 hostname BootstrapHost
Node 2 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
Node 2 exits barrier and exits.
$

Another time it might run the three processes on two hosts.

$ legion_mpi_run -n 3 /mpi/programs/mpitest_c
Hello from node 1 of 3 hostname BootstrapHost
Hello from node 0 of 3 hostname Host3
Node 0 has a secret, and it's 42
Node 1 thinks the secret is 42
Hello from node 2 of 3 hostname BootstrapHost
Node 2 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
Node 2 exits barrier and exits.
$
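
For reference, the listing below is a minimal sketch of what a program such as mpitest_c might contain, inferred from the sample output above. It is not the actual source shipped in $LEGION/src/examples; the gethostname call and the exact message strings are assumptions.

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, secret;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    gethostname(host, sizeof(host));
    printf("Hello from node %d of %d hostname %s\n", rank, size, host);

    /* Node 0 broadcasts its "secret"; every node reports what it received. */
    if (rank == 0) {
        secret = 42;
        printf("Node %d has a secret, and it's %d\n", rank, secret);
    }
    MPI_Bcast(&secret, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Node %d thinks the secret is %d\n", rank, secret);

    MPI_Barrier(MPI_COMM_WORLD);
    printf("Node %d exits barrier and exits.\n", rank);

    MPI_Finalize();
    return 0;
}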

10.4 Scheduling MPI processes

As noted above, Legion may not choose as many host objects as the number of processes specified in the -n parameter. If you specify three processes, Legion will randomly choose to run them on one, two, or three hosts.

If you wish to specify which hosts are used to run your processes, use the -h flag. In the example below, we will run mpitest_c on four hosts, which we will assume are already part of the system (see Host and vault objects for information about adding hosts to a Legion system) and which, in this example, are named in a single context called mpi-hosts.

First, you must create the mpi-hosts context with the legion_context_create command:

$ legion_context_create mpi-hosts
Creating context "mpi-hosts" in parent ".".
New context LOID = "1.01.05.44b53908.000001fc0940fbf7"

We can now assign new context names to the four hosts that we wish to use for this program, using the legion_ln command. These new names map to the host objects' LOIDs, and provide a convenient way to specify different hosts for different MPI programs or to execute the same program on different groups of hosts.

$ legion_ln /hosts/Host1 /mpi-hosts/mpitest_Host1
$ legion_ln /hosts/Host2 /mpi-hosts/mpitest_Host2
$ legion_ln /hosts/Host3 /mpi-hosts/mpitest_Host3
$ legion_ln /hosts/Host4 /mpi-hosts/mpitest_Host4

The program can now be run with legion_mpi_run, using the -h flag to specify that the four processes should be run on the hosts in mpi-hosts. Note that the nodes are placed on the hosts in order.

$ legion_mpi_run -h /mpi-hosts -n 4 /mpi/programs/mpitest_c
Hello from node 0 of 4 hostname Host1.xxx.xxx
Hello from node 1 of 4 hostname Host2.xxx.xxx
Node 0 has a secret, and it's 42
Hello from node 2 of 4 hostname Host3.xxx.xxx
Node 1 thinks the secret is 42
Node 2 thinks the secret is 42
Hello from node 3 of 4 hostname Host4.xxx.xxx
Node 3 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
Node 2 exits barrier and exits.
Node 3 exits barrier and exits.
$

Note also that the -n parameter is still used, even though there are only four host objects named in the mpi-hosts context. The number of processes and the number of host objects do not have to match: if a user wanted to run only two processes on the hosts named in mpi-hosts, he or she could use -n 2. The program would then run on the first two hosts listed in the mpi-hosts context. [1]

$ legion_mpi_run -h /mpi-hosts -n 2 /mpi/programs/mpitest
Hello from node 0 of 2 hostname Host1.xxx.xxx
Node 0 has a secret, and it's 42
Hello from node 1 of 2 hostname Host2.xxx.xxx
Node 1 thinks the secret is 42
Node 0 thinks the secret is 42
Node 0 exits barrier and exits.
Node 1 exits barrier and exits.
$

10.5 Debugging support

Output from a Legion MPI object to stdout or Fortran unit 6 is automatically directed to a tty object. Note, though, that this output will be mixed with other output in your tty object, and you may wish to segregate MPI output. Legion MPI therefore supports a special set of library routines for printing on the Legion tty (set up with the legion_set_tty and legion_tty_watch commands, as in the example above), which cause an MPI object's output to be printed in segregated blocks.

LMPI_Console_Output (string) Buffers a string to be output later.
LMPI_Console_Output_Flush () Flushes all buffered strings to the tty.
LMPI_Console_Output_And_Flush (string) Buffers and flushes in one call.

The Fortran version of these routines adds a carriage return at the end of each string, and omits trailing spaces. The C version does not.
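
As a brief illustration, the C fragment below shows how these routines might be used to keep one node's output together. The prototypes shown are assumptions, since the header that declares them is not reproduced here.

#include <stdio.h>

/* Assumed prototypes; the actual Legion MPI header should declare these. */
extern void LMPI_Console_Output(char *string);
extern void LMPI_Console_Output_Flush(void);
extern void LMPI_Console_Output_And_Flush(char *string);

void report_step(int rank, int step)
{
    char line[128];

    /* The C routines do not append a newline, so supply one explicitly. */
    sprintf(line, "node %d finished step %d\n", rank, step);
    LMPI_Console_Output(line);          /* buffered, not yet printed */

    if (step % 10 == 0)
        LMPI_Console_Output_Flush();    /* print the buffered lines as one block */
}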

A utility program, legion_mpi_debug, allows the user to examine the state of MPI objects and print out their message queues. This is a handy way to debug deadlock.

legion_mpi_debug [-q] {-c <instance context path>}

The -q flag will list the contents of the queues. For example:

$ legion_mpi_debug -q -c /mpi/instances/vdelay
Process 0 state: spinning in MPI_Recv source 1 tag 0 comm?
Process 0 queue:

(queue empty)

Process 1 state: spinning in MPI_Recv source 0 tag 0 comm?
Process 1 queue:

MPI_message source 0 tag 0 len 8
$

Do not be alarmed that process 1 hasn't picked up the matching message in its queue: it will be picked up after the command has finished running.

There are a few limitations in legion_mpi_debug: if an MPI object doesn't respond, legion_mpi_debug will hang and won't go on to query additional objects. An MPI object can only respond when it enters the MPI library; if it is in an endless computational loop, it will never reply.

10.6 Functions supported

All MPI functions are currently supported in Legion MPI.

Please note, however, that data packed or unpacked using the MPI_Pack and MPI_Unpack functions will not be properly converted in heterogeneous networks.
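
The C fragment below sketches the distinction (a hedged example, not taken from the Legion sources): a typed send is converted between architectures as needed, while a packed buffer travels as raw bytes.

#include <mpi.h>

/* Send an integer to rank 'dest' two ways. Only the typed send is safe on
   a heterogeneous network; the packed bytes are forwarded unconverted. */
void send_both_ways(int value, int dest, int tag)
{
    char buf[64];
    int  pos = 0;

    /* Safe: MPI knows the datatype and converts representations as needed. */
    MPI_Send(&value, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);

    /* Unsafe across architectures: the buffer is packed in the sender's
       native representation and is not converted in transit. */
    MPI_Pack(&value, 1, MPI_INT, buf, sizeof(buf), &pos, MPI_COMM_WORLD);
    MPI_Send(buf, pos, MPI_PACKED, dest, tag + 1, MPI_COMM_WORLD);
}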


[1] If you use the -h flag in this way, be sure to double-check the order in which the host objects are listed in your context.

