Legion [5, 6] is a distributed computing platform
that combines very large collections of independently administered
machines into single, coherent environments. Like traditional
operating systems, it builds on a diverse set of lower-level
resources and provides convenient user abstractions, services, and policy
enforcement mechanisms. The difference is that Legion's
lower-level resources can include thousands of heterogeneous
processors, storage systems, databases, legacy codes, and user objects,
all distributed over wide-area networks spanning multiple administrative
domains. Legion provides the means to pull these scattered components
together into a single, object-based metacomputer that
accommodates high degrees of flexibility and site autonomy.
Security is an essential part of Legion's design. In a metacomputing
environment, the security problem can be divided into two main concerns: first, protecting the metacomputer's high-level resources, services, and
users from each other and from corrupted underlying resources, and,
second, preserving the security policies of the underlying resources
that form the foundation of the metacomputer and minimizing their
vulnerability to attacks from the metacomputer level.
The first category includes determining who can configure a metacomputer-wide
scheduling service; the solution
requires metacomputer-specific definitions of identity, authorization,
and access control. The second category includes enforcing a policy
that permits only those metacomputer users with local accounts to run
jobs on a given host, which might require mapping
between local identities and metacomputer identities.
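As an illustration of the second category, the following is a minimal sketch of a host-side policy that admits only those metacomputer users who map to a local account. The class name `LocalAccountPolicy`, the identity strings, and the mapping table are all hypothetical and chosen for illustration; they are not Legion interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocalAccountPolicy:
    """Hypothetical host policy: only mapped metacomputer users may run jobs."""

    # Maps a metacomputer-level identity to a local account name.
    # The identity format used here is an assumption, not Legion's naming scheme.
    identity_map: Dict[str, str] = field(default_factory=dict)

    def local_account_for(self, meta_identity: str) -> Optional[str]:
        # Return the local account for this identity, or None if unmapped.
        return self.identity_map.get(meta_identity)

    def may_run_job(self, meta_identity: str) -> bool:
        # The host admits the job only if a local account exists for the user.
        return self.local_account_for(meta_identity) is not None


policy = LocalAccountPolicy({"legion:/users/alice": "alice"})
print(policy.may_run_job("legion:/users/alice"))    # True
print(policy.may_run_job("legion:/users/mallory"))  # False
```

In this sketch the mapping table belongs to the host, so the decision to admit a job stays with the local administrator rather than with the metacomputer.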
To satisfy users and administrators, a full security solution must
address and reconcile both of these concerns. Users must be
confident that the data and computations they create within the
metacomputer are adequately protected. Similarly, administrators
need assurance that adding their resources to a metacomputer, making those
resources more accessible and valuable to users, will not
make their systems more vulnerable.
Attempting to incorporate security as an add-on late in the
implementation process has proved problematic in a number of
first-generation metacomputing systems (such as PVM, MPI, and Mentat),
so the Legion group has addressed security issues since
the earliest design phases. Our metacomputing security model has three interrelated design goals:
- The framework must adapt to many
different security policies and must allow multiple policies to coexist.
- Organizations and users within a metacomputing
environment should be able to independently select and enforce their desired
security policies.
- The metacomputer's architectural framework must
enable a rich set of security policy features.
These goals are motivated by our view that a fundamental
metacomputer characteristic is the ability to scale over and across
multiple trust domains. A Legion "system" is really a federation of
meta- and lower-level resources from multiple domains, each with its own
separately evaluated and enforced security policies. As such, there is
neither a central kernel or trusted code base to monitor and control all
user-resource interactions nor a "superuser" to control all of the
resources in a Legion system.
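The idea that multiple policies coexist and are enforced independently, without a central monitor, can be sketched as follows. This is only an illustration under stated assumptions: the `Resource` class, the `Policy` signature, and the two example policies are hypothetical names introduced here and do not describe Legion's actual mechanisms.

```python
from typing import Callable, Dict, Set

# A policy is any callable deciding whether a caller may invoke a method.
# (Hypothetical signature, chosen for illustration.)
Policy = Callable[[str, str], bool]


class Resource:
    """Hypothetical resource that evaluates and enforces its own policy."""

    def __init__(self, name: str, policy: Policy) -> None:
        self.name = name
        self._policy = policy

    def invoke(self, caller: str, method: str) -> str:
        # The check happens inside the resource; no central monitor is consulted.
        if not self._policy(caller, method):
            raise PermissionError(f"{caller} may not call {method} on {self.name}")
        return f"{self.name}.{method} ok"


def open_policy(caller: str, method: str) -> bool:
    # A permissive policy: every caller may invoke every method.
    return True


def make_acl_policy(acl: Dict[str, Set[str]]) -> Policy:
    # Build a policy from a per-method set of allowed callers.
    def check(caller: str, method: str) -> bool:
        return caller in acl.get(method, set())

    return check


# Two resources with different, independently chosen policies.
scratch = Resource("scratch", open_policy)
scheduler = Resource("scheduler", make_acl_policy({"configure": {"admin"}}))
```

Here `scratch.invoke("bob", "read")` succeeds, while `scheduler.invoke("bob", "configure")` raises `PermissionError`. Each resource owner selects a policy without reference to any other.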
Besides these goals, our architecture
must allow the implementation of a number of different security
features if it is to satisfy a broad range of security needs. These include:
- Components in the metacomputer should be able
to insulate themselves from security breaches in other parts of the
system. This feature is particularly important in large-scale
systems, where we must assume that at least some of the
underlying hosts have been compromised.
- Access control: Resources typically require access control
mechanisms that embody authentication and authorization policies.
- The ability to assert and confirm identity is an
essential feature for access control, nonrepudiation, and other basic
security services.
- Detection and recovery: A metacomputing environment should support
mechanisms for detecting intrusion and misuse of resources and for
recovering after a security breach.
- Communication privacy and integrity: Communication over the
networks that bind the metacomputer together may need to be encrypted
or protected if the networks cannot themselves be trusted.
- Existing security standards such as Kerberos, ssh,
and DCE may need to be integrated into the metacomputing environment to
satisfy local administrative policy and to handle legacy software.
The remainder of this discussion looks at how Legion's security model supports a metacomputing architecture based on our design goals. It addresses both parts of the metacomputing security problem and presents examples of mechanisms, designed or implemented within the architecture, that enable a number of useful security policies.