| CSDL Overview | RTLs | Lambda-RTL | Basic RTL Operator Library | SPARC Description | Pentium excerpts |
Special-purpose and general-purpose computers are designed at a rapid pace, and software tools and technology don't always keep up. The tools we need include not only the compilers, assemblers, linkers, and debuggers familiar to all programmers, but also less familiar tools, like profilers, tracers, test-coverage analyzers, and general code-modification tools. Most of these tools must work with machine instructions.
Machine-dependent detail makes it hard to build machine-level tools. For many years, compilers have used machine descriptions to capture such detail. Machine descriptions isolate target-specific information so that it can easily be examined and changed. Despite their successful use in compilers, machine descriptions are seldom used to build other systems software. Descriptions used in compilers are hard to reuse because they typically combine information about the target machine with information about the compiler. For example, ``machine descriptions'' written using tools like BEG [cite emmelmann:beg] and BURG [cite fraser:burg] are actually descriptions of code generators, and they depend not only on the target machine but also on a particular intermediate language. In extreme cases (e.g., gcc's md files), the description formalism itself depends on the compiler.
We believe we can simplify construction of compilers and other machine-level tools by developing description techniques that separate machine properties from compiler concerns. We expect this capability to be useful in a National Compiler Infrastructure because it will lead to a back-end infrastructure that should be usable not only with many different target machines, but also with different source languages, front ends, and intermediate languages.
Some existing languages for machine description, like VHDL [cite lipsett:vhdl] and Verilog [cite thomas:verilog] do describe only properties of the machine, but they are at too low a level, describing implementations as much as architectures. These description languages require too much detail that is not needed to build systems software.
We are designing a family of Computer Systems Description Languages (CSDL) to support a variety of machine-level tools while remaining independent of any one in particular. We have several goals for CSDL:
To identify core aspects to be used throughout the CSDL family, we examined descriptions used to help retarget a variety of systems-level tools. These tools included an optimizer [cite benitez:portable], a debugger [cite ramsey:retargetable], an instruction scheduler [cite proebsting:detecting], a call-sequence generator [cite bailey:formal], a linker [cite fernandez:retargetable], and an executable editor [cite larus:eel]. Cursory inspection showed no single common aspect of a machine used in descriptions for all of these tools, but a closer look revealed that all the descriptions refer either to the machine's instruction set or to its storage locations, and some to both. For example, the specifications used by the scheduler and linker refer only to the machine's instructions and the properties thereof. The specifications used in the call-sequence generator and in the debugger's stack walker refer only to storage, explaining in detail how values move between registers and memory. The specifications used in the optimizer and the executable editor refer both to instructions and to storage, and in particular, to how instructions change the contents of storage. From these observations, we have chosen to require that languages in the CSDL family refer to instructions, storage, or both, and that they use the models of instructions and storage presented below.
The CSDL model of an instruction set is a list of instructions together with some identifying information about their operands. Although instruction names in assembly languages are typically overloaded, CSDL requires instructions to have unique names, because tools often need uniquely named code for each instruction in an instruction set. For example, an assembler might use a unique C procedure to encode each instruction, or an executable editor might use a unique element of a C union to represent an instance of each instruction.
An individual instruction is viewed as a function or constructor that, when applied to operands, produces something interesting: a binary representation, an assembly-language string, a semantics, etc. Instruction descriptions that use the CSDL core will include the names and types of the operands of each instruction. Types will include integers of various sizes, but it will also be possible to introduce new types to define such machine-dependent concepts as effective addresses. Values of these new types may be created by applying suitable constructors; for example, Chapter [->] shows how a Pentium effective address may be formed by applying any of 9 constructors, each one representing a different addressing mode.
The structure defined by CSDL constructors and their operands can be viewed as a discriminated-union type. It is also equivalent to an ASDL grammar [cite wang:asdl], without recursion, in which the start symbol is ``instruction,'' other nonterminals refer to machine-level concepts like ``effective address'' or ``integer-instruction operand,'' and terminal symbols are defined by integers or addresses. This structure is a simplification of the ``constructor'' from the Specification Language for Encoding and Decoding (SLED) of the New Jersey Machine-Code Toolkit [cite ramsey:specifying]. It is determined by the machine, independent of any tool, so it should be useful in any specification language that deals with individual machine instructions. For example, the nML machine-description language [cite fauth:describing] uses this structure, although nML is otherwise quite different from SLED.
Experience with the New Jersey Machine-Code Toolkit shows that many applications can be built from specifications that discuss only instructions and their properties, with no reference to storage. Such applications include assemblers and disassemblers, as well as code generators that work by pattern matching on the names of instructions.
We specify properties of instructions in a compositional style, so instructions' properties are functions of the properties of their operands. We formalize that style using attribute grammars. For example, assembly-language representations of instructions can be computed as synthesized attributes, where the attributes are strings. Binary representations can also be computed as attributes, where the attributes are SLED ``patterns.'' Attributes readily support machines like the 68000 family, in which the representations of effective addresses depend on context.
The CSDL model of a storage space is a sequence of mutable cells. A storage space is like an array; cells are all the same size, and they are indexed by integers. For example, a typical microprocessor has a memory made up of 8-bit cells (bytes) and a register file made up of 32-bit cells. The number of cells in a storage space may be left unspecified.
The state of a machine can be described as the contents of a collection of storage spaces. We use storage spaces to model main memory, general-purpose registers, special-purpose registers, condition codes, and so on.
Experience with CCL, a Calling Convention Language [cite bailey:formal], shows that applications can be built from specifications that discuss only storage, with no reference to instructions. For example, calling conventions can be described by discussing the placement of parameters in storage cells and the the effects of calls and returns on storage. From a CCL description one can generate procedure prologs and epilogs in a compiler, and one can also generate code to test compilers' implementations of calling conventions [cite bailey:target]. One might be able to derive stack walkers or exception-handling code from similar descriptions. A garbage collector may need to know which locations in storage can contain pointers; this is a property of an application, not of a machine, but it can be described using the CSDL storage core. (We want to describe application properties as well as machine properties, but we want to keep them separate.)
Languages in the CSDL family may refer to individual locations. Ways of writing locations may vary, but each one must resolve to a name of a storage space and an integer offset identifying a cell within that storage space.
In principle, a single register-transfer description could support all of the different uses above. A single RTL could be given different interpretations, depending on its intended use. We believe we can support this mode of specification by prescribing a rigid form for RTLs, while supplying suitable abstraction mechanisms for use within that form. Abstraction mechanisms support our goals of enabling application writers to leave irrelevant facts unspecified. For example, we want to make it easy to write an RTL that means ``register rd is assigned an unspecified function of registers rs and rt.''
We consider SLED, for specifying representations of instructions [cite ramsey:specifying], and CCL, for specifying calling conventions [cite bailey:formal], to be the first languages in the CSDL family. We are developing a new language, lambda-RTL, for specifying instruction semantics. We expect that CSDL will expand to include languages for specifying properties of memory hierarchies and of pipelines. Indeed, the simple functional-resource languages of [cite proebsting:detecting] and [cite bala:efficient] fit nicely into the CSDL framework.
This report focuses on lambda-RTL. The report does not give a specification or definition of lambda-RTL, because lambda-RTL is not sufficiently developed for a specification or definition to be worthwhile. Instead, this report presents some essential properties of lambda-RTL, and it gives examples of machine specifications using the current, imperfect version. Later chapters present
| CSDL Overview | RTLs | Lambda-RTL | Basic RTL Operator Library | SPARC Description | Pentium excerpts |