CSDL OverviewRTLsLambda-RTLBasic RTL Operator LibrarySPARC DescriptionPentium excerpts

Using lambda-RTL to specify register transfer lists

[*]

Bare RTLs are both spartan and verbose. Expressions do not include if-then-else, so conditionals must be represented by using guards on effects. There is no expression meaning ``undefined;'' assignments of undefined values must be specified using a kill effect. These restrictions, and the requirement that operations be fully disambiguated, make RTLs a form that is good for manipulation by tools but not so good for writing specifications.

lambda-RTL is a metalanguage that enables specification writers to attach RTL trees to SLED-like constructors without having to write everything explicitly. lambda-RTL is a higher-order, strongly typed, polymorphic, pure functional language based largely on Standard ML [cite milner:definition]. A variation on the Hindley-Milner type system makes it possible to write flexible, type-safe functions without having to write types explicitly.

Design considerations

We have drawn on our experience with SLED [cite ramsey:specifying] to identify mechanisms and properties that are desirable in any CSDL language, including lambda-RTL. These include:[*]

We expect to be able to analyze lambda-RTL specifications for internal consistency and ``plausibility.'' For example, it should be possible to identify cases in which a register number is mistakenly used as an immediate value instead of as an offset into the storage space modeling the register file.

Restrictions eased in lambda-RTL

lambda-RTL descriptions are easier to write than bare RTLs in three ways: grouping and higher-order functions can help eliminate repetition, the type system largely eliminates the need to write sizes explicitly, and lambda-RTL relaxes several of the restrictions on the form of RTLs. In particular,

Implicit fetches

Most programmers are used to writing x := x + 1 and having the x on the left denote a location while the x on the right denotes the value stored in that location. Typical compilers identify ``lvalue contexts'' and ``rvalue'' contexts and automatically insert fetches in rvalue contexts. We do the same in lambda-RTL, but instead of using syntax to identify the contexts, we use types.

We use types and not syntax because lambda-RTL has no special syntax for writing RTL assignments. Instead of an assignment syntax, lambda-RTL provides a built-in store function that accepts a location and a value and produces an effect. Any user-defined function might result in a call to the built-in store, so we have to recognize the right and left contexts by their types. Thus, if a location is used where a value is expected, we insert a fetch.

lambda-RTL does almost everything with functions, not syntax, so the writer of a specification can normally redefine the meaning of a notation by defining a new function with the same name. This strategy doesn't work with implicit fetches because there is no explicit notation associated with a fetch. We want users to be able to control the meanings of these fetches, however, because many machines have resources that are almost, but not quite, sequences of mutable cells. For example, SPARC registers can be viewed as a collection of 32 mutable cells, except that register 0 is not mutable and always contains 0. We would like users to be able to define special meanings for ``fetch from register 0'' and ``store into register 0'' so the rest of the specification can pretend that the registers are an ordinary storage space. We do so by permitting users to attach fetch and store methods to each storage space. Methods not given explicitly default to the standard fetch and store operators. Sample fetch and store methods for the SPARC are shown on page [->]. If we wanted, we could use fetch and store methods to describe the true implementation of SPARC registers, in which ``registers'' 8 through 31 denote locations accessed indirectly through the register-window pointer (CWP).

Slices

Many machine instructions manipulate fragments of a word stored in a mutable cell. For example, some machines represent condition codes as individual bits within a program status word. Many machines have instructions that, for example, assign to the least-significant 8 bits of a 32-bit register. To make it easy to specify such instructions, lambda-RTL creates the illusion that a sub-range or ``slice'' of a cell can be a location in its own right, one that is a suitable argument for a fetch or store operation. This illusion helps keep machine descriptions readable; for example, an effect that sets the SPARC overflow bit simply assigns to it, hiding the fact that it is buried in a program status word that has to be fetched, modified, and stored.

lambda-RTL uses a special syntax for slices, because we hope eventually to overload the notation to refer either to locations or to values. Until we get this overloading figured out, however, uses have to be distinguished by a ty suffix that must be either loc or bits. Examples of the syntax include
x@ty[k]Bit k of x. By default, bit 0 is the least significant bit. A future version of lambda-RTL may make it possible to change the numbering.
x@ty[k_1..k_2]Bits k_1 through k_2 of x, inclusive.
x@ty[k bits at e]A k-bit slice of x, with the least significant bit at e.
k's denote integer constants, e's denote expressions, and x's may denote values or locations. In all cases the size of the slice is known statically, so its type can be computed automatically. We use the Greek letter sigma to stand for any of these slice specifications.

Given a slice specification sigma, SLICE_LOC sigma maps locations to locations and SLICE_BITS sigma maps values to values. The illusion that slices can be applied to locations is implemented by rewriting, according to the following rules:

FETCH(SLICE_LOC sigma l)becomesSLICE_BITS sigma(FETCH l)
SLICE_LOC sigma l <--nbecomesl <--bitInsert_sigma(FETCH l, n)
where l is a location and n is a value. After the rewriting, all slices operate on bit vectors, and all fetches and stores operate on true locations. Invocation of user-defined fetch and store methods takes place after the rewriting of slices. This ordering makes it possible to use fetch and store methods to define cell-like abstractions, while ensuring that the meaning of slicing is always consistent with respect to such abstractions.

Implicit aggregation

lambda-RTL provides the special syntax $space[offset] for references to mutable cells. The offset can be an arbitrary expression, but the space must be a literal name, so that lambda-RTL can identify the storage space and use appropriate fetch and store methods. To make this cell a location, lambda-RTL applies an aggregation, which is also associated with the storage space as a method. The default method is the identity aggregation, which permits only ``aggregates'' of a single cell.

When little-endian, big-endian, or other aggregations are used, lambda-RTL will infer the size of the aggregate. We have not yet determined the rules to be used for the inference, but we hope at least to support the automatic inference that four 8-bit bytes must be aggregated to form a value that goes into a 32-bit register.

Overview of lambda-RTL

From the point of view of external tools, the output of a lambda-RTL specification is a set of bindings of values to attributes of constructors. The part of lambda-RTL used to specify values is a pure, typed functional language without recursion. Many features of lambda-RTL are inspired by Standard ML.

To describe syntax, we use an EBNF grammar with standard metasymbols for {sequences}, [optional constructs], and (alternative | choices). We use large metasymbols to help distinguish them from literals. Terminal symbols given literally appear in typewriter font. Other terminal symbols and all nonterminals appear in italic font. Excerpts from the grammar begin with the name of a nonterminal followed by the ::== (``produces'') symbol.

A lambda-RTL specification is a sequence of modules. lambda-RTL uses modules to organize the name space of values. A value x declared in a module S is visible within S as just x, but outside S it can only be referred to as S.x. Modules are written

module ::== module module-name is {import} {declaration} end
Modules may include declarations of nested modules as well as values.

At top level, modules provide a kind of separate compilation. lambda-RTL requires explicit import directives for references between top-level modules.

import ::== import (module-name | [ {module-name} ]) | from module-id import (name | [ {name} ])
module-id ::== module-name | module-id . module-name
For example, before module A can use values defined in module B, module A must include the directive import B. This directive makes B visible within A, as well as qualified names beginning with B. The from... import directive imports identifiers directly into module A. Not only can these identifiers be used without qualification; they carry the same fixity (infix nature, operator precedence, and associativity) that they had in module B. For example, if the module StdOperators defines infix identifiers + and - to denote addition and subtraction, then if module A includes the directive
  from StdOperators import [+ -]
then these identifiers are also infix in module A, where it is legal to write, e.g., the function definition fun inc x is x + 1.

Except as changed by an import directive, the scopes of names are limited to the modules in which they are defined. Values and modules share a single name space within lambda-RTL. Constructors and storage spaces occupy their own individual name spaces, which are flat. RTL operators occupy a separate, flat name space, but within lambda-RTL they are not distinguished from other kinds of values. Eventually, the lambda-RTL translator will check that the same name is not used to declare distinct RTL operators in two different modules.

Within a module, declarations are either top-level or inner declarations. Inner declarations may appear anywhere, but top-level declarations may appear only outside of function and value definitions. Top-level declarations appear only in contexts that guarantee they will be evaluated exactly once.

Top-level declarations

lambda-RTL's top-level declarations introduce nested modules, locations, storage spaces, RTL operators, or bindings to attributes of constructors.

Storage spaces are introduced by storage; they name the storage space, give its granularity (width of an individual cell) and size (number of cells), and possibly fetch, store, or aggregation methods.

declaration ::== storage character-literal {storage-alias}
--- --- is {storage-property}
storage-alias ::== alias name
storage-property ::== storage-method using expression | [cell-count] cells of cell-width bits | called documentation-string
storage-method ::== fetch | store | aggregate

RTL operators are introduced by:

declaration ::== rtlop (operator-name | [ {operator-name} ]) : type
Figure [->] shows lambda-RTL types. For example,
 
  rtlop bit : bool -> #1 bits
  rtlop [= <>] : #n bits * #n bits -> bool
introduces three new RTL operators. The #n bits in the definition of = and <> introduces parametric polymorphism; the equality and inequality operators can be applied to values of any size. This size is normally implicit in lambda-RTL, but it is made explicit in the translated RTL form.

In the current implementation of lambda-RTL, rtlop introduces an arbitrary, abstract value. Because it is the only abstraction mechanism available, it is sometimes put to odd uses.


type ::== type {* type} --- Tuple | { member-name : type{, member-name : type} } --- Record | type -> type --- Function | type-constant (bits | loc | cell) --- Unary constructors | bool | rtl --- Nullary constructors | type vector --- Vector
type-constant ::== `#variable-name | `#integer-literal
lambda-RTL types [*]

Attribute bindings are introduced as follows:

declaration ::== attribute of {constructor is expression}
attribute ::== default attribute of | attribute (attribute-name | default) of
constructor ::== opcode ( [operand-name {, operand-name}] )
Each expression must be terminated by a newline; if an expression doesn't fit on one line, it must contain an open parenthesis or brace so that the lambda-RTL compiler knows to continue to the closing delimiter.

For example,

  default attribute of
    nop() is RTL.SKIP
defines the effect of a no-op.

The operand directive is used to tell lambda-RTL about the types of constructors' operands.

declaration ::== operand (operand-name | [ {operand-name} ]) : type
These types will eventually be instantiated as C types, so they are required to be monomorphic, i.e., they may not have free type or width variables. The types of simple integer operands are determined by the machine architecture, and in fact, the lambda-RTL translator can extract these types automatically from a SLED specification. [Try, e.g., lrtl -operands sparc.sled.] The types of other operands are up to the author of the lambda-RTL specification. For example the Pentium specification in Chapter [->] represents an effective address as a triple, which contains locations of three different sizes.
  operand addr : { b : #8 loc, w : #16 loc, d : #32 loc }

An expression bound to an attribute must denote a legal fragment of RTL; in the current implementation, this means a value (type #n bits), a location (type #n loc), or an RTL (type rtl). More precisely, an expression must denote an RTL template. An RTL template becomes a fragment of RTL when suitable fragments of RTL are substituted for the constructor's operands. An RTL template is an RTL, except that

A tuple or record containing RTL templates is also considered an RTL template. The most notable consequence of these rules is that no expression whose normal form contains a lambda-abstraction is an RTL template; lambda-abstractions must be applied to arguments.

Eventually there will be a precise, formal definition of RTL template.

Inner declarations

Inner declarations may appear anywhere a top-level declaration may appear, and they may also appear in let-expressions, where they are the only kind of declaration permitted. Their typical usage is to bind local values in function bodies. Inner declarations include value bindings, function bindings, and fixity declarations. Value, location, and function bindings are written as

declaration ::== val value-spec is expression | locations value-spec is expression | fun value-spec {argument-pattern}^+ is expression
value-spec ::== result-pattern | [ {name} ]
A location binding is just like a value binding, except the value is forced to be a location. The square brackets in value-spec represent lambda-RTL's grouping mechanism, in which the expression actually denotes a sequence of values, and each value in the sequence is bound of the corresponding name on the left.

As examples, consider

  val do_nothing is RTL.SKIP
  fun bool n is n <> 0
  val [eq ne] is [(=) (<>)]
  fun triple x y z is (x, y, z)
  val triple is \x.\y.\z.(x, y, z)
As in ML, a function definition in lambda-RTL may include more than one argument-pattern, so the two definitions of triple are equivalent.

lambda-RTL supports user-defined infix operators with arbitrarily many levels of precedence. Precedence levels range from 999, which represents the highest possible precedence (tightest binding) for an infix operator, down to the most negative integer on the machine (loosest binding). The syntax is

declaration ::== (infixl | infixr | infixn) precedence value-spec | nonfix value-spec
Infix operators must be declared to be left-associative, right-associative, or non-associative with other operators of the same precedence. For example, we can make equality testing infix and non-associative using
  infixn 5 [= <>]

A single instance of an infix operator can be made ``nonfix'' (treated as an ordinary value) by enclosing it in parentheses. The nonfix declaration makes this behavior permanent.

Expressions

lambda-RTL values include tuples, records, vectors, and functions (lambda-abstractions), in addition to the cells, locations, and values of the underlying RTLs. lambda-RTL provides expressions to create call of these kinds of values, as well as conditional expressions, let bindings, selection, slicing, function application, and type casting. As in ML, function application is written simply by writing one expression next to another, e.g., f x or g (x, y). Function application has higher precedence than any infix operator, but lower precedence than dot selection and slicing. Figure [->] shows the syntax, precedence, and associativity of lambda-RTL expressions.

Most of lambda-RTL's expressions are standard and require no explanation, but type casts are unusual. Type casts are particularly important when dealing with references to cells in storage spaces (bytes in memory or registers in a register file). Like some type casts in C, but unlike type casts in ML, a lambda-RTL type cast may involve computation. For example, casting a location to a value forces the insertion of a fetch operator (the fetch method of the underlying storage space). Casting a cell to a location forces the insertion of an aggregation. Function, record, and tuple values may also be cast, but users of lambda-RTL will rarely need to do so. [For the adventurous, the usual rules for covariant and contravariant subtyping apply.]


expression ::== ( expression {, expression} ) --- Tuple | { member-name : type{, member-name : type} } --- Record | [. expression {, expression} .] --- Vector | let {declaration} in expression end --- Local declarations | if expression then expression else expression fi --- Conditional | `
argument-pattern . expression --- Function | expression.member-name --- Selection | $ space-identifier [ expression ] --- Location | expression @ [slice-specification] --- Slice | expression expression --- Function application | expression : type --- Type conversion | identifier --- Named value | literal-constant --- Constant
ExpressionPrecedenceFixity or Associativity
Dot selectionHighestPostfix
SliceHighestPostfix
GroupingNext highest
Function applicationAbove all infixLeft associative
infixx operatorsAs declaredAs declared
\-abstractionLowestRight associative
lambda-RTL expressions [*]

Patterns

Patterns create bindings of identifiers to values. Syntactically, they are subsets of expressions, but every occurence of an identifier in a pattern is a binding instance. They are used to bind names to the arguments of functions or the values in val declarations.

pattern ::== ( pattern {, pattern} ) | { member-name : type
--- --- {, member-name : type} } | identifier | identifier as pattern
The as syntax may be used to assign names both to a whole pattern and to its constituents, e.g, as in
val result as {sum, carry} is add(r1, r2, c)

Grouping

A group may appear in place of an expression anywhere in lambda-RTL. Whereas an expression denotes a single value, a group denotes a sequence of values. All operations except binding distribute over the group, so if the expression in a declaration contains a group, then that expression denotes a sequence of values, no matter how deeply nested the group is. The most common operations to distribute over a group are function application and tuple formation.

The usual syntax for a group is a list of expressions in square brackets. The expressions need not be of the same type. The space separating elements of a group has precedence higher than function application (but lower than dot selection). Therefore, the group [f x] is a group containing two values, as is [f (x)]. To use function application or infix operators within a group, one must include the whole application in parentheses, e.g., [(f x)] or [(f(x))]. In addition to the usual syntax, lambda-RTL provides two kinds of special syntax to create groups of integers. All use square brackets:

expression ::==group
group ::== [ {expression} ] | [ expression .. expression ] | [ expression to expression
--- --- [by expression | columns expression] ]
In the latter two cases, the constituent expressions must be compile-time constants.

Groups are evaluated left to right, last-in, first-out, so for example the expression

  ([a b], [1 2])
denotes the same sequence of four expressions as
  [(a, 1) (a, 2) (b, 1) (b, 2)]
The evaluation rule and the design of groups in general are based on the generator mechanism of the Icon programming language [cite griswold:evaluation].

The primary role of groups is to make it easy to specify multiple machine instructions in a single attribute binding. This is done by using value-spec (either a single name or a sequence of names in square brackets) to form the opcodes of constructors. Multiple value-specs may be used to combine, for example, a group of operations with a group of suffixes. The SPARC and Pentium chapters have examples.

opcode ::== value-spec{^value-spec}

In some cases, the default left-to-right, first-in, last-out rule for enumerating groups may be inconvenient or confusing. lambda-RTL therefore provides some syntactic sugar.

expression ::== {foreach ident in group}^+ group expression end
Before enumerating groups, the lambda-RTL translator rewrites expressions of the form foreach x_1 in e_1 ... foreach x_n in e_n group e' end into function applications of the form (\x_1....\x_n.e') (e_1)...(e_n). Because of this rewriting, using groups within the body of the foreach is likely to produce unexpected results.

Lexical conventions

Like identifiers in ML, lambda-RTL identifiers may be alphanumeric or symbolic. An alphanumeric identifier begins with a letter and continues with letters, digits, underscores, and primes. A symbolic identifier is a sequence of the symbols

!&+/\:<=>?@~|*`%;-^#.
Such an identifier may begin with any symbol except #.

Identifiers consisting solely of two or more dashes introduce comments; the comment includes the dashes and continues to the end of the line. For example, -- and --- introduce comments, but --> and --*-- do not. A comment is syntactically equivalent to a newline.

An integer literal is a sequence of decimal digits; lambda-RTL has no floating-point literals. lambda-RTL also supports hexadecimal literals beginning with 0x, octal literals beginning with 0, and binary literals beginning with 0b.

Literals (and identifiers) preceded by # represent widths in lambda-RTL's type system. RTL operators and functions that are polymorphic in width may be specialized by applying them to a width literal in a width application. For example, four 8-bit bytes can be aggregrated into a 32-bit word by

  RTL.AGGB #8 #32
and the sx sign-extension operator can be specialized to extend a 13-bit value to 32 bits by
  sx #13 #32
It is rarely necessary to supply these constants explicitly; lambda-RTL almost always infers them from context.

The initial basis

Most of the RTL-specific content of lambda-RTL is in the initial basis, i.e., the collection of predefined functions and values. Figure [->] shows lambda-RTL's initial basis. Of course, access to values within the RTL or Vector modules requires that the modules be imported.


Boolean constants:

RTL operators and other functions used to create RTLs:

Functions on vectors:
trueTruth
falseFalsehood
RTL.STORETakes location and value, produces effect.
RTL.FETCHFetches value from location.
RTL.AGGBBig-endian aggregation.
RTL.AGGLLittle-endian aggregation.
RTL.SKIPThe empty RTL; an effect that does nothing.
RTL.GUARDTakes a Boolean expression and an effect and produces an effect.
RTL.NOTBoolean negation.
RTL.PARTakes two effects (RTLs) and composes them so they take place simultaneously (list append on list of effects).
RTL.SEQTakes two effects (RTLs) and composes them by forward substitution (sequential composition of effects).
subVector subscript, a left-associative infix operator with precedence 5.
Vector.spanningA curried function of two arguments. Vector.spanning x y produces the vector [.x, x+1, ..., y.].
Vector.foldrA higher-order function used to visit every element of a vector. Like Vector.foldr in Standard ML '97.

RTL.FETCH and RTL.STORE use the user's fetch and store methods. To bypass fetch and store methods, and when defining fetch and store methods, use RTL.TRUE_FETCH and RTL.TRUE_STORE.

lambda-RTL's initial basis [*]


Style

lambda-RTL provides expressive power with few restrictions. Writers of specifications use the functions in lambda-RTL's initial basis to create RTLs that correspond to the form prescribed in Chapter [->]. Writers can also introduce new RTL operators. This freedom gives us ample scope for experimenting with different styles of description---as well as ample rope with which to hang ourselves. Our initial experiments have led to a ``standard library'' of useful RTL operators, as well as some useful lambda-RTL functions. Chapter [->] presents this library.

lambda-RTL provides no looping or recursion constructs. Loops whose sizes are known in advance can be simulated by using Vector.foldr and Vector.spanning. Because this style will be familiar only to those well versed in functional programming, we expect eventually to provide some syntactic sugar for it.

Differences between lambda-RTL and Standard ML

Although lambda-RTL is inspired in large part by Standard ML, there are significant differences, which may interest ML programmers. Syntactic differences include: There are also some semantic restrictions:

CSDL OverviewRTLsLambda-RTLBasic RTL Operator LibrarySPARC DescriptionPentium excerpts
Back to Lambda-RTL home page