Adapting and Running Applications
Legion aims to provide an easy-to-use environment in which users have access to all the resources of the worldwide metacomputing environment for their applications. The four application examples described here show how the advanced features of Legion, such as flexible security and transparent file access, can be used to extend today's applications onto larger sets of resources.
Transparent Remote Execution
Legion allows programs to execute transparently and securely on remote hosts, taking advantage of Legion's distributed resources. Users can quickly and easily adapt their programs to run in Legion; in many cases no modification is necessary at all. Legion also includes a remote "make" tool, which compiles binaries for other machine architectures without requiring the user to log in to another machine.
Parameter Space Studies
Not all applications are best suited to simple remote execution, so we have extended this capability to attack the class of problems known as parameter space searches. For example, the NASA NAS effort demands thousands of combinations of CFD computations, using a variety of wing designs, wing angles, and air speeds. A single computation requires running five programs in sequence, each reading the output files generated by the previous program. No individual program consumes significant CPU time, but the total CPU time consumed by all runs adds up to tens of thousands of hours.
We have written a simple tool to facilitate running parameter space studies under Legion. Users can specify input and output files, the maximum number of jobs to run on a particular host, and the total number of jobs to run concurrently across all hosts. Once the initial jobs have started on each host, subsequent jobs are sent to the hosts that have finished previous runs. This dynamic scheduling balances the load across hosts and shortens overall turnaround time.
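The dynamic scheduling strategy can be sketched as a simple simulation. This is not the actual Legion tool, just a minimal illustration of the policy it describes: seed each host up to its per-host job limit, then hand each host a new job whenever one of its jobs finishes, so faster hosts naturally absorb more of the sweep.

```python
import heapq
from collections import deque

def run_sweep(job_times, hosts, max_per_host):
    """Simulate dynamic scheduling of a parameter space sweep.

    job_times    -- per-job run times, one entry per parameter point
    hosts        -- list of host names
    max_per_host -- cap on concurrent jobs per host

    Returns (makespan, assignment) where assignment[i] is the host
    that ran job i.
    """
    pending = deque(range(len(job_times)))
    events = []                          # min-heap of (finish_time, host)
    slots = {h: max_per_host for h in hosts}
    assignment = {}
    now = 0.0

    def launch(host, t):
        job = pending.popleft()
        assignment[job] = host
        slots[host] -= 1
        heapq.heappush(events, (t + job_times[job], host))

    # Seed every host up to its per-host limit.
    for h in hosts:
        while slots[h] > 0 and pending:
            launch(h, 0.0)

    # As each job finishes, its host immediately receives the next job.
    while events:
        now, h = heapq.heappop(events)
        slots[h] += 1
        if pending:
            launch(h, now)

    return now, assignment
```

With four jobs of lengths 3, 1, 1, 1 on two hosts (one job per host at a time), the host whose jobs finish quickly picks up the remaining work, and the sweep completes when the long job does.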
Wide-Area Parallel Applications
Some existing parallel applications run more efficiently on more CPUs than are available on a single machine. For example, a DSMC (Direct Simulation Monte-Carlo) code in use at the University of Virginia to study vapor deposition onto surfaces has a low ratio of communication to computation, so it can run on a large number of widely separated machines. This approach allows the solution of much larger problems, but can cause difficulties with some conventional parallel tools. Most vendor-supplied versions of MPI, a popular communications library, cannot run a single job across multiple supercomputers. Successfully running this type of problem requires overcoming that limitation as well as transparent access to files and appropriate scheduling support. Legion provides these capabilities today.
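Why the communication-to-computation ratio is the deciding factor can be seen with a back-of-envelope estimate (our own illustration, not from the original project; the numbers below are made up for the example). Each step does some local computation and then exchanges a few messages; over a wide-area network the per-message latency dominates, so only codes with long compute phases stay efficient.

```python
def parallel_efficiency(compute_s, msgs_per_step, bytes_per_msg,
                        latency_s, bandwidth_bps):
    """Crude per-step efficiency estimate for a wide-area parallel code.

    A step performs compute_s seconds of local work, then exchanges
    msgs_per_step messages of bytes_per_msg bytes with remote peers.
    Efficiency is the fraction of time spent computing.
    """
    comm_s = msgs_per_step * (latency_s + bytes_per_msg / bandwidth_bps)
    return compute_s / (compute_s + comm_s)
```

With 50 ms wide-area latency and 1 MB/s of bandwidth, a coarse-grained step (10 s of compute, four 100 KB boundary exchanges) keeps efficiency above 90%, while a fine-grained step (10 ms of compute, same exchanges) drops below 2%; the former is the DSMC-like regime that makes wide-area runs worthwhile.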
Another example of a wide-area parallel application that can benefit from Legion's collection of metacomputing resources is neural network modeling. Some types of models use relatively low amounts of communications and are limited only by the amount of RAM or disk bandwidth of the machines that they run on. We have parallelized such a model using the MPI interface of Legion, and hope soon to set a world record for the largest neural network run.
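For a RAM-limited model of this kind, the node count is set by storage, not interconnect speed. A tiny sketch of the sizing arithmetic (our own illustration, assuming a fully connected network whose weight matrix must fit in aggregate memory):

```python
def nodes_needed(n_neurons, bytes_per_weight, ram_per_node_bytes):
    """Nodes required to hold the weight matrix of a fully connected
    network: storage grows as n_neurons**2, so aggregate RAM, not
    communication, bounds the largest trainable model."""
    total_bytes = n_neurons * n_neurons * bytes_per_weight
    return -(-total_bytes // ram_per_node_bytes)   # ceiling division
```

For example, 100,000 neurons at 4 bytes per weight need 40 GB of weight storage; with 2 GB of usable RAM per node, that is 20 nodes, however fast or slow the network between them.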
The most difficult category of applications that Legion supports is coupled applications, in which several initially distinct programs are glued together. One example we are working with under the NPACI umbrella is a coupled system attempting to predict the effect of the El Niño weather pattern on California. Global climate models do not have a sufficiently fine resolution to accurately predict precipitation over the California mountains, so we are coupling the global model with a regional model that more accurately predicts area weather. However, the two models were written by different scientific groups and in different languages. The global model is parallelized but the regional model is not. Legion can couple these models in an easy-to-use fashion that can be extended to include other models, such as an estuary model of San Diego Bay.
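The coupling pattern itself is simple to sketch. The callables below (`global_step`, `regional_step`, `downscale`) are hypothetical stand-ins for the two models and the grid interpolation between them, not the actual climate codes: at each step the global model advances, its coarse output is downscaled to the regional grid, and the regional model advances under those boundary conditions.

```python
def couple(global_step, regional_step, downscale, n_steps,
           state_g, state_r):
    """One-way coupling loop between a coarse global model and a
    fine regional model. All three callables are placeholders for
    whatever programs (in whatever languages) are being glued
    together; the loop only depends on their inputs and outputs."""
    for _ in range(n_steps):
        state_g = global_step(state_g)          # advance global model
        boundary = downscale(state_g)           # coarse -> regional grid
        state_r = regional_step(state_r, boundary)  # advance regional model
    return state_g, state_r
```

Because the loop depends only on the data passed between steps, a third model (such as an estuary model) can be chained on by downscaling the regional output in the same way.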
This work partially supported by DOE grant DE-FG02-96ER25290, Logicon (for the DoD HPCMOD/PET program) DAHC 94-96-C-0008, DOE D459000-16-3C, DARPA (GA) SC H607305A, NSF-NGS EIA-9974968, NSF-NPACI ASC-96-10920, and a grant from NASA-IPG.