| CSDL Overview | RTLs | Lambda-RTL | Basic RTL Operator Library | SPARC Description | Pentium excerpts |
Bare RTLs are both spartan and verbose. Expressions do not include if-then-else, so conditionals must be represented by using guards on effects. There is no expression meaning ``undefined;'' assignments of undefined values must be specified using a kill effect. These restrictions, and the requirement that operations be fully disambiguated, make RTLs a form that is good for manipulation by tools but not so good for writing specifications.
lambda-RTL is a metalanguage that enables specification writers to attach RTL trees to SLED-like constructors without having to write everything explicitly. lambda-RTL is a higher-order, strongly typed, polymorphic, pure functional language based largely on Standard ML [cite milner:definition]. A variation on the Hindley-Milner type system makes it possible to write flexible, type-safe functions without having to write types explicitly.
We have drawn on our experience with SLED [cite ramsey:specifying] to identify mechanisms and properties that are desirable in any CSDL language, including lambda-RTL. These include:[*]
Address,
the location designated by
that effective address should be the default meaning.
lambda-RTL descriptions are easier to write than bare RTLs in three ways: grouping and higher-order functions can help eliminate repetition, the type system largely eliminates the need to write sizes explicitly, and lambda-RTL relaxes several of the restrictions on the form of RTLs. In particular,
Most programmers are used to writing x := x + 1 and having the x on the left denote a location while the x on the right denotes the value stored in that location. Typical compilers identify ``lvalue contexts'' and ``rvalue'' contexts and automatically insert fetches in rvalue contexts. We do the same in lambda-RTL, but instead of using syntax to identify the contexts, we use types.
We use types and not syntax because lambda-RTL has no special syntax for writing RTL assignments. Instead of an assignment syntax, lambda-RTL provides a built-in store function that accepts a location and a value and produces an effect. Any user-defined function might result in a call to the built-in store, so we have to recognize the right and left contexts by their types. Thus, if a location is used where a value is expected, we insert a fetch.
lambda-RTL does almost everything with functions, not syntax, so the writer of a specification can normally redefine the meaning of a notation by defining a new function with the same name. This strategy doesn't work with implicit fetches because there is no explicit notation associated with a fetch. We want users to be able to control the meanings of these fetches, however, because many machines have resources that are almost, but not quite, sequences of mutable cells. For example, SPARC registers can be viewed as a collection of 32 mutable cells, except that register 0 is not mutable and always contains 0. We would like users to be able to define special meanings for ``fetch from register 0'' and ``store into register 0'' so the rest of the specification can pretend that the registers are an ordinary storage space. We do so by permitting users to attach fetch and store methods to each storage space. Methods not given explicitly default to the standard fetch and store operators. Sample fetch and store methods for the SPARC are shown on page [->]. If we wanted, we could use fetch and store methods to describe the true implementation of SPARC registers, in which ``registers'' 8 through 31 denote locations accessed indirectly through the register-window pointer (CWP).
Many machine instructions manipulate fragments of a word stored in a mutable cell. For example, some machines represent condition codes as individual bits within a program status word. Many machines have instructions that, for example, assign to the least-significant 8 bits of a 32-bit register. To make it easy to specify such instructions, lambda-RTL creates the illusion that a sub-range or ``slice'' of a cell can be a location in its own right, one that is a suitable argument for a fetch or store operation. This illusion helps keep machine descriptions readable; for example, an effect that sets the SPARC overflow bit simply assigns to it, hiding the fact that it is buried in a program status word that has to be fetched, modified, and stored.
lambda-RTL uses a special syntax for slices, because we hope eventually to
overload the notation to refer either to locations or to values.
Until we get this overloading figured out, however, uses have to be
distinguished by a ty suffix that must be either loc or
bits.
Examples of the syntax include
| x@ty[k] | Bit k of x. By default, bit 0 is the least significant bit. A future version of lambda-RTL may make it possible to change the numbering. |
| x@ty[k_1..k_2] | Bits k_1 through k_2 of x, inclusive. |
| x@ty[k bits at e] | A k-bit slice of x, with the least significant bit at e. |
Given a slice specification sigma, SLICE_LOC sigma maps locations to locations and SLICE_BITS sigma maps values to values. The illusion that slices can be applied to locations is implemented by rewriting, according to the following rules:
where l is a location and n is a value. After the rewriting, all slices operate on bit vectors, and all fetches and stores operate on true locations. Invocation of user-defined fetch and store methods takes place after the rewriting of slices. This ordering makes it possible to use fetch and store methods to define cell-like abstractions, while ensuring that the meaning of slicing is always consistent with respect to such abstractions.
FETCH(SLICE_LOC sigma l) becomes SLICE_BITS sigma(FETCH l) SLICE_LOC sigma l <--n becomes l <--bitInsert_sigma(FETCH l, n)
lambda-RTL provides the special syntax
$space[offset]
for references to mutable cells.
The offset can be an arbitrary expression, but
the space must be a literal name, so that lambda-RTL can identify
the storage space and use appropriate fetch and store methods.
To make this cell a location, lambda-RTL applies an aggregation, which is
also associated with the storage space as a method.
The default method is the identity aggregation, which permits
only ``aggregates'' of a single cell.
When little-endian, big-endian, or other aggregations are used, lambda-RTL will infer the size of the aggregate. We have not yet determined the rules to be used for the inference, but we hope at least to support the automatic inference that four 8-bit bytes must be aggregated to form a value that goes into a 32-bit register.
From the point of view of external tools, the output of a lambda-RTL specification is a set of bindings of values to attributes of constructors. The part of lambda-RTL used to specify values is a pure, typed functional language without recursion. Many features of lambda-RTL are inspired by Standard ML.
To describe syntax, we use an EBNF grammar with standard metasymbols for {sequences}, [optional constructs], and (alternative | choices). We use large metasymbols to help distinguish them from literals. Terminal symbols given literally appear in typewriter font. Other terminal symbols and all nonterminals appear in italic font. Excerpts from the grammar begin with the name of a nonterminal followed by the ::== (``produces'') symbol.
A lambda-RTL specification is a sequence of modules.
lambda-RTL uses modules to organize the name space of values.
A value x declared in a module S is visible within S as
just x, but outside S it can only be referred to as S.x.
Modules are written
module ::== module module-name is {import} {declaration} endModules may include declarations of nested modules as well as values.
At top level, modules provide a kind of separate compilation. lambda-RTL requires explicit import directives for references between top-level modules.
import ::== import (module-name | [ {module-name} ]) | from module-id import (name | [ {name} ])
module-id ::== module-name | module-id . module-nameFor example, before module
A can use values defined in module B,
module A must include the directive import B.
This directive makes B visible within A, as well as qualified
names beginning with B.
The from... import directive imports identifiers directly
into module A.
Not only can these identifiers be used without qualification; they
carry the same fixity (infix nature, operator precedence, and
associativity) that they had in module B.
For example, if the module StdOperators defines infix identifiers
+ and - to denote addition and subtraction, then if
module A includes the directive
from StdOperators import [+ -]then these identifiers are also infix in module
A, where it is
legal to write, e.g., the function definition fun inc x is x + 1.
Except as changed by an import directive, the scopes of names are
limited to the modules in which they are defined.
Values and modules share a single name space within lambda-RTL.
Constructors and storage spaces occupy their own individual name
spaces, which are flat.
RTL operators occupy a separate, flat name space,
but within lambda-RTL they are not distinguished from other kinds of
values.
Eventually, the lambda-RTL translator will check that the same name is not
used to declare distinct RTL operators in two different modules.
Within a module, declarations are either top-level or inner declarations. Inner declarations may appear anywhere, but top-level declarations may appear only outside of function and value definitions. Top-level declarations appear only in contexts that guarantee they will be evaluated exactly once.
lambda-RTL's top-level declarations introduce nested modules, locations, storage spaces, RTL operators, or bindings to attributes of constructors.
Storage spaces are introduced by storage; they name the storage space, give its granularity (width of an individual cell) and size (number of cells), and possibly fetch, store, or aggregation methods.
declaration ::== storage character-literal {storage-alias}
--- --- is {storage-property}
storage-alias ::== alias name
storage-property ::== storage-method using expression | [cell-count] cells of cell-width bits | called documentation-string
storage-method ::== fetch | store | aggregate
RTL operators are introduced by:
declaration ::== rtlop (operator-name | [ {operator-name} ]) : typeFigure [->] shows lambda-RTL types. For example,
rtlop bit : bool -> #1 bits rtlop [= <>] : #n bits * #n bits -> boolintroduces three new RTL operators. The
#n bits in the definition of = and <> introduces
parametric polymorphism;
the equality and inequality operators can be applied to values of any
size.
This size is normally implicit in lambda-RTL, but it is made explicit in the
translated RTL form.
In the current implementation of lambda-RTL,
rtlop introduces an arbitrary, abstract value.
Because it is the only abstraction mechanism available, it
is sometimes put to odd uses.
type ::== type {* type} --- Tuple | { member-name : type{, member-name : type} } --- Record | type -> type --- Function | type-constant (bits | loc | cell) --- Unary constructors | bool | rtl --- Nullary constructors | type vector --- Vector
type-constant ::== `#variable-name | `#integer-literallambda-RTL types [*]
Attribute bindings are introduced as follows:
declaration ::== attribute of {constructor is expression}
attribute ::== default attribute of | attribute (attribute-name | default) of
constructor ::== opcode ( [operand-name {, operand-name}] )Each expression must be terminated by a newline; if an expression doesn't fit on one line, it must contain an open parenthesis or brace so that the lambda-RTL compiler knows to continue to the closing delimiter.
For example,
default attribute of
nop() is RTL.SKIP
defines the effect of a no-op.
The operand directive is used to tell lambda-RTL about the types of constructors' operands.
declaration ::== operand (operand-name | [ {operand-name} ]) : typeThese types will eventually be instantiated as C types, so they are required to be monomorphic, i.e., they may not have free type or width variables. The types of simple integer operands are determined by the machine architecture, and in fact, the lambda-RTL translator can extract these types automatically from a SLED specification. [Try, e.g., lrtl -operands sparc.sled.] The types of other operands are up to the author of the lambda-RTL specification. For example the Pentium specification in Chapter [->] represents an effective address as a triple, which contains locations of three different sizes.
operand addr : { b : #8 loc, w : #16 loc, d : #32 loc }
An expression bound to an attribute must denote a legal fragment of
RTL; in the current
implementation, this means a value (type #n bits), a
location (type #n loc), or an RTL (type rtl).
More precisely, an expression must denote an RTL template.
An RTL template becomes a fragment of RTL when suitable fragments of
RTL are substituted for the constructor's operands.
An RTL template is an RTL, except that
Eventually there will be a precise, formal definition of RTL template.
Inner declarations may appear anywhere a top-level declaration may appear, and they may also appear in let-expressions, where they are the only kind of declaration permitted. Their typical usage is to bind local values in function bodies. Inner declarations include value bindings, function bindings, and fixity declarations. Value, location, and function bindings are written as
declaration ::== val value-spec is expression | locations value-spec is expression | fun value-spec {argument-pattern}^+ is expression
value-spec ::== result-pattern | [ {name} ]A location binding is just like a value binding, except the value is forced to be a location. The square brackets in value-spec represent lambda-RTL's grouping mechanism, in which the expression actually denotes a sequence of values, and each value in the sequence is bound of the corresponding name on the left.
As examples, consider
val do_nothing is RTL.SKIP fun bool n is n <> 0 val [eq ne] is [(=) (<>)] fun triple x y z is (x, y, z) val triple is \x.\y.\z.(x, y, z)As in ML, a function definition in lambda-RTL may include more than one argument-pattern, so the two definitions of
triple are
equivalent.
lambda-RTL supports user-defined infix operators with arbitrarily many levels of precedence. Precedence levels range from 999, which represents the highest possible precedence (tightest binding) for an infix operator, down to the most negative integer on the machine (loosest binding). The syntax is
declaration ::== (infixl | infixr | infixn) precedence value-spec | nonfix value-specInfix operators must be declared to be left-associative, right-associative, or non-associative with other operators of the same precedence. For example, we can make equality testing infix and non-associative using
infixn 5 [= <>]
A single instance of an infix operator can be made ``nonfix'' (treated
as an ordinary value) by enclosing it in parentheses.
The nonfix declaration makes this behavior permanent.
lambda-RTL values include
tuples, records, vectors, and functions
(lambda-abstractions), in addition to the
cells, locations, and values of the
underlying RTLs.
lambda-RTL provides expressions to create call of these kinds of values,
as well as conditional expressions, let bindings, selection,
slicing, function application, and type casting.
As in ML, function application is written simply by writing one
expression next to another, e.g., f x or g (x, y).
Function application has higher precedence than
any infix operator, but lower precedence than dot selection and
slicing.
Figure [->] shows the syntax, precedence, and associativity
of lambda-RTL expressions.
Most of lambda-RTL's expressions are standard and require no explanation, but type casts are unusual. Type casts are particularly important when dealing with references to cells in storage spaces (bytes in memory or registers in a register file). Like some type casts in C, but unlike type casts in ML, a lambda-RTL type cast may involve computation. For example, casting a location to a value forces the insertion of a fetch operator (the fetch method of the underlying storage space). Casting a cell to a location forces the insertion of an aggregation. Function, record, and tuple values may also be cast, but users of lambda-RTL will rarely need to do so. [For the adventurous, the usual rules for covariant and contravariant subtyping apply.]
expression ::== ( expression {, expression} ) --- Tuple | { member-name : type{, member-name : type} } --- Record | [. expression {, expression} .] --- Vector | let {declaration} in expression end --- Local declarations | if expression then expression else expression fi --- Conditional | `
argument-pattern . expression --- Function | expression.member-name --- Selection | $ space-identifier [ expression ] --- Location | expression @ [slice-specification] --- Slice | expression expression --- Function application | expression : type --- Type conversion | identifier --- Named value | literal-constant --- Constant
lambda-RTL expressions [*]
Expression Precedence Fixity or Associativity Dot selection Highest Postfix Slice Highest Postfix Grouping Next highest Function application Above all infix Left associative infixx operatorsAs declared As declared \-abstractionLowest Right associative
Patterns create bindings of identifiers to values.
Syntactically, they are subsets of expressions, but every occurence of
an identifier in a pattern is a binding instance.
They are used to bind names to the arguments of functions or the
values in val declarations.
pattern ::== ( pattern {, pattern} ) | { member-name : typeThe
--- --- {, member-name : type} } | identifier | identifier as pattern
as syntax may be used to assign names both to a whole pattern
and to its constituents, e.g, as in
val result as {sum, carry} is add(r1, r2, c)
A group may appear in place of an expression anywhere in lambda-RTL. Whereas an expression denotes a single value, a group denotes a sequence of values. All operations except binding distribute over the group, so if the expression in a declaration contains a group, then that expression denotes a sequence of values, no matter how deeply nested the group is. The most common operations to distribute over a group are function application and tuple formation.
The usual syntax for a group is a list of expressions in
square brackets.
The expressions need not be of the same type.
The space separating elements of a group has precedence higher than
function application (but lower than dot selection).
Therefore, the group [f x] is a group containing two values,
as is [f (x)]. To use function application or infix
operators within a group, one must include the whole application in
parentheses, e.g., [(f x)] or [(f(x))].
In addition to the usual syntax,
lambda-RTL provides two kinds of special syntax to create groups of integers.
All use square brackets:
expression ::==group
group ::== [ {expression} ] | [ expression .. expression ] | [ expression to expressionIn the latter two cases, the constituent expressions must be compile-time constants.
--- --- [by expression | columns expression] ]
Groups are evaluated left to right, last-in, first-out, so for example the expression
([a b], [1 2])denotes the same sequence of four expressions as
[(a, 1) (a, 2) (b, 1) (b, 2)]The evaluation rule and the design of groups in general are based on the generator mechanism of the Icon programming language [cite griswold:evaluation].
The primary role of groups is to make it easy to specify multiple machine instructions in a single attribute binding. This is done by using value-spec (either a single name or a sequence of names in square brackets) to form the opcodes of constructors. Multiple value-specs may be used to combine, for example, a group of operations with a group of suffixes. The SPARC and Pentium chapters have examples.
opcode ::== value-spec{^value-spec}
In some cases, the default left-to-right, first-in, last-out rule for enumerating groups may be inconvenient or confusing. lambda-RTL therefore provides some syntactic sugar.
expression ::== {foreach ident in group}^+ group expression endBefore enumerating groups, the lambda-RTL translator rewrites expressions of the form foreach x_1 in e_1 ... foreach x_n in e_n group e' end into function applications of the form
(\x_1....\x_n.e') (e_1)...(e_n).
Because of this rewriting, using groups within the body of the
foreach is likely to produce unexpected results.
Like identifiers in ML, lambda-RTL identifiers may be alphanumeric or symbolic. An alphanumeric identifier begins with a letter and continues with letters, digits, underscores, and primes. A symbolic identifier is a sequence of the symbols
!&+/\:<=>?@~|*`%;-^#.
Such an identifier may begin with any
symbol except #.
Identifiers consisting solely of two or more dashes introduce
comments; the comment includes the dashes and continues to the end of
the line.
For example, -- and --- introduce comments, but --> and
--*-- do not.
A comment is syntactically equivalent to a newline.
An integer literal is a sequence of decimal digits; lambda-RTL has no
floating-point literals.
lambda-RTL also supports hexadecimal literals beginning with 0x,
octal literals beginning with 0, and binary literals beginning
with 0b.
Literals (and identifiers) preceded by #
represent widths in lambda-RTL's type system.
RTL operators and functions that are polymorphic in width may be
specialized by applying them to a width literal in a width
application.
For example, four 8-bit bytes can be aggregrated into a 32-bit word by
RTL.AGGB #8 #32and the
sx sign-extension operator can be specialized to extend a
13-bit value to 32 bits by
sx #13 #32It is rarely necessary to supply these constants explicitly; lambda-RTL almost always infers them from context.
Most of the RTL-specific content of lambda-RTL is in the initial basis,
i.e., the collection of predefined functions and values.
Figure [->] shows lambda-RTL's initial basis.
Of course, access to values within the RTL or Vector modules
requires that the modules be imported.
| true | Truth |
| false | Falsehood |
| RTL.STORE | Takes location and value, produces effect. |
| RTL.FETCH | Fetches value from location. |
| RTL.AGGB | Big-endian aggregation. |
| RTL.AGGL | Little-endian aggregation. |
| RTL.SKIP | The empty RTL; an effect that does nothing. |
| RTL.GUARD | Takes a Boolean expression and an effect and produces an effect. |
| RTL.NOT | Boolean negation. |
| RTL.PAR | Takes two effects (RTLs) and composes them so they take place simultaneously (list append on list of effects). |
| RTL.SEQ | Takes two effects (RTLs) and composes them by forward substitution (sequential composition of effects). |
| sub | Vector subscript, a left-associative infix operator with precedence 5. |
| Vector.spanning | A curried function of two arguments. Vector.spanning x y produces the vector [.x, x+1, ..., y.]. |
| Vector.foldr | A higher-order function used to visit every element of a vector.
Like Vector.foldr in Standard ML '97. |
RTL.FETCH and RTL.STORE use the user's fetch and store
methods.
To bypass fetch and store methods, and when defining fetch and store
methods, use RTL.TRUE_FETCH and RTL.TRUE_STORE.
lambda-RTL's initial basis [*]
lambda-RTL provides expressive power with few restrictions. Writers of specifications use the functions in lambda-RTL's initial basis to create RTLs that correspond to the form prescribed in Chapter [->]. Writers can also introduce new RTL operators. This freedom gives us ample scope for experimenting with different styles of description---as well as ample rope with which to hang ourselves. Our initial experiments have led to a ``standard library'' of useful RTL operators, as well as some useful lambda-RTL functions. Chapter [->] presents this library.
lambda-RTL provides no looping or recursion constructs.
Loops whose sizes are known in advance can be simulated by
using Vector.foldr and Vector.spanning.
Because this style will be familiar only to those well versed in
functional programming, we expect eventually to provide some syntactic
sugar for it.
is, not =.
if expressions must be terminated by fi.
from...
import directive replaces ML's open.
| CSDL Overview | RTLs | Lambda-RTL | Basic RTL Operator Library | SPARC Description | Pentium excerpts |