The Mentat Programming Language (MPL) is an extension of C++ designed to simplify, via parallelism encapsulation, the task of writing parallel applications. Parallelism encapsulation takes two forms, intra-object encapsulation and inter-object encapsulation. In intra-object encapsulation, callers of a Mentat object member function are unaware of whether the implementation of the member function is sequential or parallel--i.e., whether its program graph is a single node or a parallel graph. In inter-object encapsulation, programmers of code fragments (e.g., a Mentat object member function) need not concern themselves with the parallel execution opportunities between the different Mentat object member functions that they invoke.
MPL is an object-oriented programming language that masks the difficulty of the parallel environment from the programmer. The granule of computation is the Mentat class instance, which consists of contained objects (local and member variables), their procedures, and a thread of control. Programmers are responsible for identifying those object classes that are of sufficient computational complexity to allow efficient parallel execution. Instances of Mentat classes are used just like ordinary C++ classes, freeing the programmer to concentrate on the algorithm, not on managing the environment. The data and control dependencies between Mentat class instances involved in invocation, communication, and synchronization are automatically detected and managed by the compiler and run-time system without further programmer intervention. By splitting the responsibility between the compiler and the programmer we exploit the strengths of each, and avoid their weaknesses. Our underlying assumption is that the programmer can make better granularity and partitioning decisions, while the compiler can correctly manage synchronization. This simplifies the task of writing parallel programs, making the power of parallel and distributed systems more accessible.
This allows the programmer to specify which C++ classes are of sufficient computational complexity to warrant parallel execution. This is accomplished using the mentat keyword in the class definition. Instances of Mentat classes are called Mentat objects: the programmer uses instances of Mentat classes much as he would any other C++ class instance. All of the communication and synchronization is managed by the compiler. The compiler generates code to construct and execute data dependency graphs whose nodes are Mentat object member function invocations and whose arcs are the data dependencies found in the program. This generates inter-object parallelism encapsulation (Figure 2a) in a manner largely transparent to the programmer. Of course, any one of the nodes in a generated program graph may itself be transparently implemented in a similar manner by a subgraph: this is intra-object parallelism encapsulation (Figure 2b), in which the caller only sees the member function invocation.
On a sequential machine there are three steps required for these statements: 1) multiply the matrices B and C and store the result in X, 2) multiply matrices D and E, and 3) multiply X by the result of step 2. If we assume that each multiplication takes one time unit, three time units are required to complete the computation.
In Mentat, the compiler and run-time system detect that the first two multiplications, B*C and D*E, are not data dependent on one another and can be safely executed in parallel. The two matrix multiplications will be executed in parallel in a single step, and the result will be automatically forwarded to the final multiplication. That result will be forwarded to the caller, and associated with A.
The difference between the programmer's sequential model and the parallel execution of the two multiplies afforded by Mentat is an example of inter-object parallelism encapsulation. In the absence of other parallelism or overhead, the speedup for this example is a modest 3/2 = 1.5.
However, that is not the end of the story. Additional intra-object parallelism may be realized within the matrix multiplication. Suppose the matrix multiplications are themselves executed in parallel (with the parallelism detected in a manner similar to the above). Further, suppose that each multiplication is executed in eight pieces (as in Figure 2b). Assuming zero overhead, the total execution time is 0.125 + 0.125 = 0.25 time units, resulting in a speedup of 3/0.25 = 12. As matrix multiplication is implemented using more pieces, even larger speedups result.
The Mentat philosophy on parallel computing is guided by two observations. First, the programmer understands the problem domain of the application better than the compiler and can therefore make better data and computation partitioning decisions than the compiler. The truth of this is evidenced by the fact that most successful parallel applications have been hand-coded with low-level primitives. In these applications the programmer has decomposed and distributed both the data and the computation.
On the other hand, the management of tens to thousands of asynchronous tasks, where timing-dependent errors are easy to make, is beyond the capacity of most programmers unless a tremendous amount of effort is expended. The truth of this is evidenced by the fact that writing parallel applications is almost universally acknowledged to be far more difficult than writing sequential applications. Compilers, however, are very good at ensuring that events happen in the right order, and can more readily and correctly manage communication and synchronization, particularly in highly asynchronous, non-SPMD, environments.
A key feature of Mentat is the transparent encapsulation of parallelism within and between Mentat object member function invocations. Suppose, for example, that matrix_ops is an instance of a matrix_operator Mentat class with a member function mpy, which multiplies two matrices and returns a third matrix. When the user invokes mpy in an operation, the choice of sequential or parallel implementation is not important: the user just wants a correct answer. Intra-object parallelism encapsulation makes this sequential/parallel implementation decision transparent. Similarly, opportunities for parallelism between Mentat object member functions are made transparent via inter-object encapsulation. The compiler must ensure that data dependencies between invocations are honored.
In C++ objects are defined by their class and each class has an interface section in which member variables and member functions are defined. MPL does not require that all class objects be Mentat objects. In particular, objects that do not have a sufficiently high computation-to-communication ratio, i.e., whose object operations are not sufficiently computationally complex, should not be Mentat objects. Exactly what is complex enough depends on the architecture involved. In general, a member function should be a minimum of several hundred executed instructions long. At smaller grain sizes the communication and run-time overhead takes longer than the member function itself, resulting in a slow-down rather than a speed-up.
Mentat uses an object model that distinguishes between two "types" of objects, contained objects and independent objects (Figure 3). Contained objects are objects contained in another object's address space. Instances of C++ classes, integers, structures, and so on, are contained objects. Independent objects possess a distinct address space, a system-wide unique name, and a thread of control. Communication between independent objects is accomplished via member function invocation. Independent objects are analogous to UNIX processes. Mentat objects are independent objects. The programmer defines a Mentat class by using the keyword mentat in the class definition. The programmer may further specify whether the class is stateful, sequential, or stateless. For a description of these semantics, including MPL support for fault-tolerance and dynamic load balancing, please see the MPL manual.
In this example the loop is unrolled at run-time and up to N instances of the integer_ops class may execute in parallel. Note that parallel execution of the B.mpy() operation is achieved simply by using the member function. All of the A.add() operations are executed on the same object instance which is created and initialized with the integer_accumulator A(0); declaration.
In C++, when the scope in which X is declared is entered, a new integer is created on the stack. In MPL, because MC is a stateful Mentat class, foo's data and code representation are automatically created (possibly on a different machine) and its name is bound to the locally created variable foo. When the scope is left, foo is destroyed along with the data and code to which it was bound. Though Mentat objects differ in that they are distributed, their basic usage is the same as that of C++ objects.
There are important differences between Mentat objects and standard C++ objects, however. It is often desirable in Mentat to create objects that outlive their scope, such as a Mentat object that acts as a server. In this situation an application may want to create the server object and leave it running so that other applications can bind to it and then use it. This is analogous to a C++ program wanting to use an object created in a different program. While C++ has no mechanism to handle this, MPL does.
For further information about MPL, please see "The Mentat Programming Language Reference Manual for Legion," available on the Legion web site (<http://legion.virginia.edu/documentation.html>).