The Process Introspection Project


Apparatus Aspicet Intus ("the apparatus looks within")


Introduction

The Process Introspection project is a design and implementation effort whose main goal is to construct a general purpose, flexible, and efficient checkpoint/restart mechanism appropriate for use in high performance heterogeneous distributed systems. The primary constraint on this mechanism is that it must be platform independent: checkpoints produced on one architecture or operating system platform must be restartable on a different architecture or operating system platform. The Process Introspection mechanism is based on a design pattern for constructing interoperable checkpointable modules. Application of the design pattern is automated by two levels of software tools: a library of support routines that facilitate the use of the design pattern, and a source code translator that automatically applies the pattern to platform independent modules. A prototype implementation of the library has been constructed and used to demonstrate that the design pattern can be applied effectively to construct platform independent checkpointable programs that operate efficiently.
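
One way to see what the platform independence constraint demands: a checkpoint cannot simply be a copy of raw memory, because byte order and the sizes of basic types differ between architectures. The sketch below is purely illustrative and is not taken from the project's library; the helper names `ckpt_put_i32` and `ckpt_get_i32` are hypothetical. It stores a 32-bit integer in a fixed-width, most-significant-byte-first format, so that a value saved on a little-endian machine decodes to the same value on a big-endian one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of the project's library: encode a 32-bit
 * signed integer in a fixed, big-endian "logical" format so that the bytes
 * in the checkpoint do not depend on the host's native byte order. */

/* Write v into buf[0..3], most significant byte first. Returns bytes written. */
size_t ckpt_put_i32(unsigned char *buf, int32_t v)
{
    uint32_t u = (uint32_t)v;            /* well-defined conversion to unsigned */
    buf[0] = (unsigned char)(u >> 24);
    buf[1] = (unsigned char)(u >> 16);
    buf[2] = (unsigned char)(u >> 8);
    buf[3] = (unsigned char)u;
    return 4;
}

/* Read a value written by ckpt_put_i32, regardless of host byte order. */
int32_t ckpt_get_i32(const unsigned char *buf)
{
    uint32_t u = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    /* Map back to signed without relying on implementation-defined
     * unsigned-to-signed conversion for values above INT32_MAX. */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}
```

For example, `ckpt_put_i32(buf, 258)` writes the bytes 00 00 01 02 on every host, and `ckpt_get_i32(buf)` reads back 258. Floating-point values, pointers, and structures need their own logical encodings, which this sketch does not attempt.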

Process Introspection is the ability of a process to examine and describe its own internal state in a logical, platform independent format. In some sense, every process that employs a custom-programmed checkpoint/restart implementation uses the concept of Process Introspection. The Process Introspection project extends this technique of hand coding checkpoint/restart functionality for individual processes into an integrated approach: the development of checkpointable program modules is completely automated when possible and, when a hand-coded checkpoint facility for a module is still appropriate, that development is rendered significantly less complex through the use of library tools and a general design pattern. The system design consists of the following components:

Initial Implementation Efforts

Prototype implementations of the PIL, a simple CCC module, and the APrIL compiler have been constructed and used as the basis for feasibility demonstrations and initial performance and cost analysis. A set of sample applications was transformed using the APrIL source-to-source compiler, linked with the PIL and the basic CCC, and executed to examine typical run-time overheads, to gain initial insight into the impact on back-end optimizations, to determine checkpoint request wait times (i.e., the time between a checkpoint request and its service), and to measure basic checkpoint and restart costs. These initial tests also demonstrate the fundamental feasibility of the process introspection technique, as each of the example programs was verified as checkpointable/restartable across the following platforms:

The interface selected for the simple CCC overloads the "control-C" interrupt of a process so that it checkpoints and exits the running program instead of simply terminating it. Later, when the program is run again, the CCC notes the presence of a checkpoint and uses it to perform a restart instead of allowing the process to start up normally. Initial performance results obtained using the prototype compiler and library indicate that the system can achieve very low checkpoint request wait times (0.01 to 1.0 milliseconds on average), and that it introduces little or no run-time overhead into the normal operation of transformed programs. The overhead due to the code inserted by the compiler, and its impact on back-end optimizations, has generally been found to be low (0%-15%), but it is application dependent and tunable via certain trade-offs (for example, a slightly higher checkpoint request wait time can be traded for less optimizer interference).
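
The control-C interface described above can be approximated in portable C as follows. This is a minimal sketch, not the prototype CCC: the function names (`ccc_install`, `ccc_should_restart`, `ccc_run`), the toy summation loop, and the two-number text checkpoint format are all hypothetical. The SIGINT handler only sets a flag; the loop checks that flag once per iteration, and a checkpoint file left behind by an earlier run selects a restart instead of a normal start.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical sketch of the simple CCC interface (names and file format are
 * assumptions): control-C requests a checkpoint instead of terminating, and
 * an existing checkpoint file selects a restart over a normal start. */

static volatile sig_atomic_t checkpoint_and_exit = 0;

/* Replaces SIGINT's default action (terminate) with a checkpoint request;
 * the program acts on it at its next consistency point. */
static void on_sigint(int sig)
{
    (void)sig;
    checkpoint_and_exit = 1;
}

void ccc_install(void)
{
    signal(SIGINT, on_sigint);
}

/* Returns 1 if a checkpoint file exists (restart), 0 otherwise. */
int ccc_should_restart(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Toy program: sums 0..n-1 with a consistency point each iteration.
 * Returns 1 if it checkpointed (the caller should then exit), 0 if it
 * finished, or -1 if the checkpoint could not be written. */
int ccc_run(const char *path, long n, long *sum)
{
    long i = 0;
    *sum = 0;
    if (ccc_should_restart(path)) {            /* restart instead of fresh start */
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            if (fscanf(f, "%ld %ld", &i, sum) != 2) {
                i = 0;                         /* unreadable: start over */
                *sum = 0;
            }
            fclose(f);
        }
        remove(path);                          /* checkpoint consumed */
    }
    for (; i < n; i++) {
        if (checkpoint_and_exit) {             /* consistency point */
            FILE *f = fopen(path, "w");
            if (f == NULL)
                return -1;
            fprintf(f, "%ld %ld\n", i, *sum);  /* loop state as plain text */
            fclose(f);
            checkpoint_and_exit = 0;
            return 1;
        }
        *sum += i;
    }
    return 0;
}
```

Setting only a flag in the handler, and doing the actual saving at a point the program chooses, is the usual way to keep signal handlers safe in C; it also reflects the split between a checkpoint request and its later service that the wait-time measurements refer to.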

A very simple example of using process introspection to automatically implement the checkpoint mechanism for a matrix multiply program is available, together with listings of the original code and of the APrIL-transformed version of that code. A key feature to note about the transformed code is that it contains "consistency points" at which the process polls the "PIL_CheckpointStatus" variable, checking the "PIL_StatusCheckpointNow" bit to determine whether a checkpoint should be performed. If a checkpoint is requested (checkpoint requests interrupt the process and set the "PIL_CheckpointStatus" variable), the code uses the normal C "return" mechanism to traverse the stack, saving all local variables and actual parameters. An equally important feature of the transformed code is the addition of a prologue to each function that checks the "PIL_CheckpointStatus" variable for the "PIL_StatusRestoreNow" bit. If this bit is set, the process uses the normal C function call mechanism to rebuild the stack (each function restoring its actual parameter and local variable values as it re-creates its stack frame), and the C "goto" mechanism to jump to the correct code location in each stack frame.
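
The save-and-restore scheme described above can be illustrated with a hand-written C function that imitates what the transformation inserts. This is a sketch under stated assumptions, not actual APrIL output: the status word and bit names come from the text, but their type and bit values are assumed, and `struct frame`, `saved`, `sum_to`, `simulate_request_at`, and `checkpoint_and_restart` are hypothetical. Each frame records its parameter, its local variable, and where it should resume; frames are pushed as `return` unwinds the stack (innermost first) and popped as the prologues re-create it (outermost first).

```c
#include <signal.h>

/* Status word and bit names follow the text; their type and bit values are
 * assumptions made for this sketch. A real checkpoint request would arrive
 * asynchronously, e.g. from a signal handler, hence volatile sig_atomic_t. */
#define PIL_StatusCheckpointNow 0x1
#define PIL_StatusRestoreNow    0x2
static volatile sig_atomic_t PIL_CheckpointStatus = 0;

/* Hypothetical saved-frame record: parameter, local, and resume location. */
enum resume_at { AT_CONSISTENCY_POINT, AT_CALL_SITE };
struct frame { long n; long acc; enum resume_at where; };

#define MAX_FRAMES 64                  /* no overflow check: keep depth small */
static struct frame saved[MAX_FRAMES];
static int nsaved = 0;

/* Test hook standing in for an asynchronous interrupt: request a checkpoint
 * when the recursion reaches this value of n (-1 disables it). */
static long simulate_request_at = -1;

/* Hand-written approximation of a transformed function computing 0+1+...+n. */
long sum_to(long n)
{
    long acc = 0, sub = 0;

    /* Prologue: when restoring, reload this frame and jump to where it was. */
    if (PIL_CheckpointStatus & PIL_StatusRestoreNow) {
        struct frame f = saved[--nsaved];      /* outermost frame comes first */
        n = f.n;
        acc = f.acc;
        if (f.where == AT_CALL_SITE)
            goto call_site;                    /* re-enter callee to restore it */
        PIL_CheckpointStatus &= ~PIL_StatusRestoreNow;  /* innermost: done */
        goto consistency_point;
    }

    if (n == simulate_request_at)
        PIL_CheckpointStatus |= PIL_StatusCheckpointNow;
    acc = n;

consistency_point:
    /* Poll for a checkpoint request; if set, save this frame and return. */
    if (PIL_CheckpointStatus & PIL_StatusCheckpointNow) {
        saved[nsaved++] = (struct frame){ n, acc, AT_CONSISTENCY_POINT };
        return 0;                              /* value ignored while unwinding */
    }
    if (n == 0)
        return acc;

call_site:
    sub = sum_to(n - 1);
    /* After a call: if the callee is unwinding, save this frame too. */
    if (PIL_CheckpointStatus & PIL_StatusCheckpointNow) {
        saved[nsaved++] = (struct frame){ n, acc, AT_CALL_SITE };
        return 0;
    }
    return acc + sub;
}

/* Run sum_to(n), checkpointing at depth k, then restart from the saved
 * frames. A real system would write saved[] to disk and exit instead. */
long checkpoint_and_restart(long n, long k)
{
    long r;
    simulate_request_at = k;
    r = sum_to(n);
    simulate_request_at = -1;
    if (PIL_CheckpointStatus & PIL_StatusCheckpointNow) {
        PIL_CheckpointStatus = PIL_StatusRestoreNow;
        r = sum_to(n);
    }
    return r;
}
```

Calling `checkpoint_and_restart(10, 4)` saves seven frames (n = 10 down to 4) while unwinding, rebuilds them through ordinary calls and `goto`, and returns 55, the same result as an uninterrupted `sum_to(10)`.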

Current and Future Directions

Ongoing work on the Process Introspection project centers on the following areas:

More Information

Links to Related Projects


ferrari@virginia.edu
Last modified Mon Sep 9 17:33:15 EDT 1996