The next several years will see the widespread introduction and use of gigabit wide-area and local-area networks. These networks have the potential to transform how people compute and, more importantly, how they interact and collaborate with one another. The increased bandwidth enables the construction of wide-area virtual computers, or metasystems, that provide a wide-area environment of interconnected workstations, personal computers, graphics rendering engines, supercomputers, and non-traditional devices (televisions, toasters, etc.). Realizing this environment requires new conceptual models to provide a solid, integrated foundation for applications that can unleash these diverse resources' potential. This kind of foundation must fulfill several conditions, such as rendering the underlying physical infrastructure transparent to users and most programmers; supporting access-, location-, and fault-transparency; allowing interoperation between its components; supporting construction of larger, integrated components; guaranteeing system security; and scaling to millions of autonomous hosts.
Legion will tap the enormous computational capabilities of resources connected by local-area networks, enterprise-wide networks, and the National Information Infrastructure. While users sit at a single workstation, Legion will transparently schedule application components on processors, manage data transfer and coercion, and provide communication and synchronization. System boundaries, data location, and faults will be invisible; processor cycles, communication channels, and data will be shared; and a workstation across the continent will be as accessible as the workstation down the hall.
Legion is not a centralized system controlled by any single person or organization. Each Legion system is an independent entity that can connect to other systems as desired, and individual systems control access to their own resources. Individual users, once logged on to a local system, join a wide-ranging network of other Legion systems and, depending on security and access protocols, can use the resources of any other part of that network.
The Legion project started in the fall of 1993. Our goal is to provide a flexible, efficient, and scalable system based on solid principles. When complete, Legion will provide a single, coherent virtual machine with universal object and name spaces, application-adjustable fault tolerance, improved response time, and greater throughput.
Realizing this vision is not trivial, but we have identified several design objectives essential to its success: site autonomy, support for multiple languages and interoperability, high performance via parallelism, and a single persistent name space.
Legion is not a monolithic system. As the system grows it must integrate resources owned and controlled by a wide array of organizations. There is simply no way that thousands of organizations and millions of users will subject themselves to the dictates of a "big brother" centralized control mechanism or subject their resources and data to external management. Organizations, quite properly, will insist on having control over their own resources. They must be free to specify how much of a resource can be used, when it can be used, and who may or may not use the resource.
Legion must support applications written in a variety of languages. Heterogeneous source-language application components must be integrated in much the same manner that heterogeneous architectures are integrated. Legion must also support legacy codes and work with standards such as CORBA and DCE.
High performance via parallelism does not mean that all applications will be parallel -- Legion will necessarily best support relatively coarse-grain applications. Nor should this objective be misinterpreted to mean that we think a single application will ever use all of the computers in the country: most parallel applications will use only a small subset of the total resource pool at any time.
One of the most significant obstacles to wide-area parallel processing is the lack of a single name space for data and resource access. The existing multitude of disjoint name spaces makes writing applications that span sites extremely difficult. Any Legion object should be able to transparently access (subject to security constraints) any other Legion object without regard to location or replication.
Object-oriented programming provides a coherent solution to these problems. In Legion, all components of interest to the system are objects, and all objects are instances of defined classes. Thus users, data, applications, even class definitions are objects. An object-oriented foundation (including encapsulation and inheritance) offers many benefits, such as software reuse, fault containment, and reduced complexity. The need for such a paradigm is particularly pressing in a system as large and complex as Legion.
Objects written in an object-oriented language, or in a language such as HPF (High Performance Fortran), will encapsulate their implementation, data structures, and parallelism, and will interact with other objects via well-defined interfaces. They may also have associated inherited timing, fault, persistence, priority, and protection characteristics. Naturally, these characteristics may be overloaded to provide different functionality on a class-by-class basis. Similarly, a class may have multiple implementations with the same interface.
Complementing the use of an object-oriented paradigm is one of Legion's driving philosophical themes: we cannot design a system that will satisfy every user's needs. Legion is designed to allow both users and class implementors flexibility in their applications' semantics. Whenever possible, users choose both the kind and the level of functionality and make their own trade-offs between function and cost.
For instance, users have widely varying standards of security. Some users are more concerned with privacy, some are content with maintaining their data's integrity, and others want both privacy and integrity. Banks and hospitals, for example, must provide high levels of both. But there is a difference between the kind of security functionality and the degree of security. Some users are content with password authentication, while others want more stringent user identification, such as signature analysis or fingerprint identification. A user might be willing to pay a higher cost (in terms of CPU, bandwidth, or time) for a more powerful cryptographic key, gaining stronger security without changing the basic nature of the security provided. An application that requires low overhead cannot afford such a policy and should not be forced to use it; it might prefer a light-weight policy that merely verifies communication integrity, or perhaps one with no security at all. The point is that users can decide what trade-offs to make, whether by implementing their own policies or using existing policies via inheritance, instead of coping with an inevitably unsatisfactory fixed security mechanism.
Next, consider the problem of maintaining consistent semantics in a distributed file system. To achieve good performance, it is often desirable to copy all or part of a file. If updates to the file are permitted, the different copies may begin to diverge. There are many ways to circumvent this problem: not replicating writable files, using a cache-invalidate protocol, using lazy updates to a master copy, etc. Some applications don't even require all copies to be the same, while others require a strict "reads deliver the last value written" semantics, and still others consider a file to be read-only. In the last case, consistency protocols are a waste of time. On the other hand, some applications may need different semantics for files in different regions of the application. Independent of file semantics, some users will want automatic backup and frequent archiving, while others may not. The user -- not the system -- should make such decisions.
This philosophy extends into the system architecture. The Legion object model specifies the composition and functionality of Legion's core objects (see page 144) -- their functionality, but not their implementation. The core therefore consists of extensible, replaceable components. The Legion project provides implementations of the core objects, but users are not obligated to use them; instead, they can select or construct objects that implement mechanisms and policies meeting their specific requirements.
The object model provides a natural way to achieve this kind of flexibility. Files, for example, are not part of Legion itself. Anyone can define a new class whose general semantics would be recognized as a file but whose specifics match a particular set of semantics. The current Legion software system provides an initial collection of file classes that reflect the most common needs.
We released the first prototype Legion implementation, the Campus-Wide Virtual Computer (CWVC), in the summer of 1995. This first implementation was based on Mentat, an earlier object-oriented parallel processing system. Mentat was originally designed for homogeneous, dedicated environments, but was extended to operate in an environment with heterogeneous hosts, disjoint file systems, local resource autonomy, and host failure. We could have continued to stretch Mentat, but a system can be manipulated only so far before it shows signs of stress; it was better to design a new system with a clean, coherent architecture than to maintain a patchwork of solutions to unrelated problems.
The CWVC was a direct extension of Mentat to a larger scale, and served as a prototype for the nationwide Legion system. Even though the CWVC was much smaller and its components much closer together, it presented many of the same challenges: the University's computational resources were operated by different departments and had no shared name space, few shared resources, heterogeneous processors, irregular interconnection networks, bandwidth and latency differences spanning orders of magnitude, and on-site applications that required protection. Furthermore, each department operated as an island of services, using individual NFS-mount structures and trusting only the machines on its own island.
The CWVC consisted of over one hundred workstations and an IBM SP-2 in six buildings, using two completely disjoint underlying file systems. We developed a suite of tools to address common problems and, in collaboration with domain scientists at the University of Virginia and elsewhere, a set of applications that exploited the environment.
We ran the CWVC in the local environment and demonstrated it on wide-area systems. During the Supercomputing '95 conference in San Diego it ran on the I-Way, an experimental network connecting the NSF supercomputer centers, several DOE and NASA labs, and a number of other sites. Many of the connections ran at DS-3 (45 Mbit/sec) and OC-3 (155 Mbit/sec) rates.
The CWVC was installed at three sites, using seven hosts of three different architectures. At NCSA (Urbana) we used four SGI Power Challenges and the Convex Exemplar, and at CTC (Cornell) and ANL (Argonne) we used IBM SP-2s.
Once the IP routing tables had been properly configured, moving the CWVC to the wide-area environment was relatively simple. We copied the CWVC to the platforms, adjusted the tables to use IP names that routed through the high-speed network, and tested the system. As expected, files in different locations could be transparently accessed, executables could be moved transparently from one location to another as needed, the scheduler worked, and the system automatically reconfigured on host failure. Utilities and tools such as the debugger also migrated easily. The real bonus, though, was that user applications required no changes to run in the new environment.
For our demonstration, we exercised our utilities and ran complib, one of our applications, on the I-Way. This application compares two DNA or protein sequence databases using one of several selectable algorithms. The first database was located at ANL, while the second was located at NCSA. The application transparently accessed the databases using the Legion file system while the underlying system schedulers placed application computation objects throughout the three-site system. All communication, placement, synchronization, and code and data migration were transparently handled by Legion.
Since then we have repeated the demonstration several times and have built npacinet, a testbed for Legion users. Please see <http://legion.virginia.edu/npacinet.html> if you are interested in getting an npacinet account.
The vision of a seamless metacomputer such as Legion is not new -- worldwide computers have been the territory of science fiction and distributed systems research for decades. To our knowledge, no other project has the same broad scope and ambitious goals that Legion has. However, a large body of relevant research in distributed systems, parallel computing, fault-tolerance, workstation farm management, and pioneering wide-area parallel processing projects provides a strong foundation for building a metacomputer.
Related efforts such as OSF/DCE (Open Software Foundation/Distributed Computing Environment) are rapidly becoming industry standards (<http://www.opengroup.org/dce/>). Legion and DCE share many of the same objectives and draw on the same heterogeneous distributed computing literature for inspiration. Consequently, both projects use many of the same techniques (e.g., an object-based architecture and model, Interface Definition Languages [IDLs] to describe object behavior, and wrappers to support legacy code). However, Legion and DCE differ in fundamental ways. DCE does not target high-performance computing: its underlying computational model is based on blocking remote procedure calls between objects, and it emphasizes client-server distributed computing rather than parallel computing. Legion is based on a parallel computing model, and one of its primary objectives is high performance via parallel computing. Legion also specifies very little about implementation, since users and resource owners are permitted and even encouraged to provide their own implementations of "system" services. Our core model is completely extensible, and provides choice at every opportunity, from security to scheduling.
Similarly, CORBA (Common Object Request Broker Architecture) defines an object-oriented model for accessing distributed objects (<http://www.corba.org/>). CORBA includes an IDL and a specification for the functionality of run-time systems that enable access to objects (object request brokers, or ORBs). But, like DCE, CORBA is based on a client-server model rather than a parallel computing model, and it places less emphasis on issues such as object persistence, placement, and migration.
The Globus project at Argonne National Laboratory and the University of Southern California has similar goals, as well as a number of similar design features (<http://www.globus.org/>). Both systems also support a range of programming interfaces, including popular packages such as MPI. They differ significantly, however, in their basic architectural techniques and design principles.
Globus adds value to existing high-performance computing systems by rendering them interoperable and by extending their implementations to operate effectively within a wide-area distributed environment. This sum-of-services approach has a number of advantages, such as code reuse and allowing users to retain familiar tools and work environments. However, as the number of provided services grows, the lack of a common programming interface and model becomes an increasingly significant burden for end users. Legion's common-object programming model allows users and tool builders to employ the many services needed to use a metacomputing environment effectively: schedulers, I/O services, application components, etc. Essentially, the Legion position is that the short-term advantages of patching existing parallel and distributed computing services together do not outweigh the long-term necessity of constructing a metacomputing software system from an extensible design made up of orthogonal building blocks.
The Globe project, developed at Vrije Universiteit in the Netherlands, also shares many goals and attributes with Legion (<http://www.cs.vu.nl/~steen/globe/>). Globe and Legion are both middleware metasystems that run on top of existing host operating systems and networks; both support flexible implementation; both have a single uniform object model and architecture; and both use class objects to abstract implementation details. But where a Globe object is passive and assumed to be physically distributed over potentially many resources in the system, a Legion object is active and is expected to reside within a single address space.
These conflicting views lead to different styles of interobject communication: Globe loads part of the object into the caller's address space, while Legion sends a message of a specified format from the caller to the callee. Legion also has a set of core object types designed to provide abstractions for a wide variety of implementations; to our knowledge Globe offers no equivalent set of tools, nor does it address key issues such as security and site autonomy.
In its intended application for distributed collaboration and information systems, Legion might be compared to the World Wide Web. In particular, the object-oriented, secure, platform-independent, remote execution model afforded by the Java language has added more Legion-like capabilities to the Web. The most significant differences between Java and Legion lie in Java's lack of a remote method invocation facility, its lack of support for distributed-memory parallelism, and its interpreted nature, which even in the presence of "just-in-time" compilation leads to significantly lower performance than can be achieved with compilation. Furthermore, the security and object placement models provided by Java are rigid, and are a poor fit for many applications.
Other projects share many of the same objectives, but not the scope of Legion. Nexus provides communication and resource management facilities for parallel language compilers. Castle is a set of related projects that aims to support scientific applications, parallel languages and libraries, and low-level communication issues. The NOW project provides a somewhat more unified strategy for managing networks of workstations, but is intended to scale only to hundreds of machines instead of millions.
Although we are committed to the object-oriented paradigm, we recognize that Legion applications will be written in a variety of languages in order to support existing legacy code, permit organizations to use familiar languages (C, Fortran), and support parallel processing languages. We will provide support for multiple languages, and for interoperability between user objects written in these languages, in three ways: by generating object "wrappers" for code written in languages such as Fortran, Ada, and C; by exporting the Legion run-time system interface and retargeting existing compilers; and by combining these two methods.
Legion has both PVM and MPI interfaces. Both interfaces include functions for data marshaling, message passing, and task creation and control. Legion's PVM and MPI implementations provide an environment that includes Legion security and placement services. For information about the PVM and MPI libraries, please see sections 9.0 and 10.0 in the Basic User Manual.
Legacy codes and programs written in other languages can be incorporated into Legion applications by encapsulating them in an object wrapper, as shown in Figure 1, above. Object wrappers can be generated by hand, using the Mentat Programming Language (please see the "Mentat Programming Language Reference Manual," available from the Legion Group), or generated with an IDL and an interface compiler. In the latter case, the programmer provides a "class definition" of the object, listing the functions, the type and number of any parameters, the object's state, etc., and a compiler generates the interface. This is a common technique, used in the OMG ORB. Additional IDLs can be readily supported, because class objects can export multiple interfaces, or views, of a class.