University of Tennessee (J. Dongarra)
University of California, San Diego (H. Casanova)
Stage 2: today
Stage 3: 3rd quarter 1999
Stage 4: 3rd quarter 2000
Stage 5: 3rd quarter 2001

NetSolve allows users to access computational resources distributed across a heterogeneous network. The NetSolve system consists of computational servers that solve problems and communication servers that forward requests to them; a server may join or leave the system at any time. NetSolve provides load balancing and fault tolerance, both important in a heterogeneous network. Given a problem, a NetSolve communication server attempts to decide which available computational server is the most suitable for that problem. This load-balancing decision is based on a predictive performance model that takes into account network latency and bandwidth, the size of the input data and result, algorithm complexity, and the raw performance and estimated current workload of each machine. Fault tolerance is implemented by having a communication server return to the client a ranked list of candidate computational servers for a given problem, so that the client can migrate the problem to a different computational server if it detects a failure.

NetSolve can handle any computational problem whose inputs and outputs can be expressed in terms of matrices, vectors, and scalars. New problems can be added by using a pseudo-compiler that takes as input a configuration file describing the problem, including which library routine(s) to call to solve it, and produces as output the actual C code of a computational process.

The NetSolve system has both interactive and programming interfaces. Using one of the programming interfaces, a user can write a NetSolve application that exploits coarse-grained parallelism.
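The load-balancing and fault-tolerance ideas above can be sketched in a few lines. This is an illustrative model only, not the real NetSolve implementation: the exact cost formula, the `Server` fields, and all function names here are assumptions chosen to mirror the factors the text lists (latency, bandwidth, data size, algorithm cost, raw speed, and current load).

```python
# Hypothetical sketch of NetSolve-style load balancing and fault tolerance.
# The cost formula and all names are illustrative assumptions, not the
# actual NetSolve code.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    latency_s: float       # round-trip network latency to the server
    bandwidth_mb_s: float  # client<->server bandwidth, MB/s
    mflops: float          # raw machine performance
    load: float            # estimated current workload (0.0 = idle)

def predict_time(s: Server, data_mb: float, mflop_cost: float) -> float:
    """Predicted wall-clock time: ship the input data and result,
    then compute; effective speed is scaled down by current load."""
    transfer = s.latency_s + data_mb / s.bandwidth_mb_s
    compute = mflop_cost / (s.mflops / (1.0 + s.load))
    return transfer + compute

def rank_servers(servers, data_mb, mflop_cost):
    """The communication server's role: return candidates best-first."""
    return sorted(servers, key=lambda s: predict_time(s, data_mb, mflop_cost))

def submit(servers, data_mb, mflop_cost, is_up):
    """Client-side fault tolerance: walk the ranked list, migrating the
    problem to the next server whenever a failure is detected."""
    for s in rank_servers(servers, data_mb, mflop_cost):
        if is_up(s):
            return s.name
    raise RuntimeError("no computational server available")

servers = [
    Server("far-fast",  latency_s=0.05,  bandwidth_mb_s=1.0,  mflops=500, load=0.0),
    Server("near-slow", latency_s=0.001, bandwidth_mb_s=50.0, mflops=100, load=0.0),
    Server("near-fast", latency_s=0.001, bandwidth_mb_s=50.0, mflops=800, load=1.0),
]
# "near-fast" ranks first; if the client detects it has failed, the
# problem migrates to the next entry in the ranked list.
print(submit(servers, data_mb=100.0, mflop_cost=2000.0,
             is_up=lambda s: s.name != "near-fast"))  # prints "near-slow"
```

Note how the ranked list does double duty: it is the load-balancing decision and, because the whole list is returned to the client, it is also the failover plan.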

Clients are built on underlying low-level libraries and connect at run time to a manager process. Although inter-client messages travel via the manager by default, direct point-to-point links can be established when necessary. Manager processes are connected to each other via multiple I/O channels. Interprocess communication occurs over TCP/IP sockets, pipes, shared memory, or pseudo-terminals, depending on what the two communicating components can support. To avoid becoming a bottleneck, the manager process is multithreaded and generally uses shared memory to communicate with its clients.
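The transport selection described above amounts to a small negotiation: each pair of components picks the fastest mechanism both sides support, with TCP/IP sockets as the universal fallback. The sketch below is a hypothetical illustration of that rule; the preference order and function names are assumptions, not taken from the actual system.

```python
# Illustrative transport negotiation, not the real implementation.
# Assumed preference order: fastest local mechanism first, with a
# TCP/IP socket as the fallback that every component supports.
PREFERENCE = ["shared_memory", "pipe", "pseudo_terminal", "tcp_socket"]

def choose_transport(a_supports, b_supports):
    """Return the most preferred transport common to both components."""
    common = set(a_supports) & set(b_supports)
    for transport in PREFERENCE:
        if transport in common:
            return transport
    return "tcp_socket"  # assumed universal fallback

# A manager and a client on the same host can share memory; a remote
# pair falls back to a socket.
print(choose_transport(["shared_memory", "pipe", "tcp_socket"],
                       ["shared_memory", "tcp_socket"]))   # prints "shared_memory"
print(choose_transport(["tcp_socket"],
                       ["pipe", "tcp_socket"]))            # prints "tcp_socket"
```

The same idea explains why the manager prefers shared memory for its own clients: it is the cheapest channel available between co-located processes, which matters for a process trying not to become a bottleneck.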

For additional information, see

[NPACI Roadmap]
[AppLeS] [ATLAS] [HyperSQL] [KeLP] [LAPACK]
[Legion] [Metadata browser] [NetSolve] [Planguages]
[QueryDesigner] [QML] [ScaLAPACK]

Last modified: Thu Jan 21 14:31:55 1999