|
UVa CS |
| Intro | RTL Creation Interface | Assembly Language Interface | VPO Code Generation Interface |
An Interface to Assembly Languages
This document describes a generic, machine-independent interface to assembly languages. The representations of instructions, labels, and relocatable addresses are left abstract. We expect that a particular compiler will select fixed implementations for labels and relocatable addresses, and that the representation of instructions will change with every machine.
In fact, we hope this interface will have at least three implementations for every target machine: emit assembly language, emit object code, and help emit RTLs for vpo. [The interface may also be used within vpo, which will then emit assembly language or object code.] Therefore, this interface does not attempt to be the best possible interface for assembly; rather, it defines a plausible interface that is consistent with existing assemblers.
The design of the interface is similar to that of the New Jersey Machine-Code Toolkit's library. We imagine that implementations supporting binary emission would in fact use the Toolkit. Nonetheless, this interface attempts to be independent of the Toolkit.
The assembly interface is exported as a struct assembler, making
it easy to supply multiple implementations in one compiler.
Every assembler is automatically provided with a symbol table.
Other components may be added to implementations by type extension.
<asm.h>=
<assembly interface types>
typedef struct asm_symtab *AsmSymtab;
typedef struct assembler {
  AsmSymtab symtab; /* symbol table */
  <assembly interface procedures>
} *Assembler;
<common assembly prototypes>
Defines AsmSymtab, Assembler.
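As an illustration of the "record of procedures plus type extension" idea, here is a minimal sketch. It uses a cut-down, hypothetical stand-in for struct assembler (the names demo_assembler, demo_text_asm, and the switches counter are assumptions made for this example, not part of the interface), and it keeps implementation state in a file-level variable because the interface procedures take no assembler argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct assembler. */
typedef struct demo_assembler {
    void *symtab;                           /* placeholder for AsmSymtab */
    void (*section)(const char *);
    const char *(*current_section)(void);
} *DemoAssembler;

/* Type extension: the base record comes first, extra state follows. */
struct demo_text_asm {
    struct demo_assembler base;
    char current[32];                       /* name of the current section */
    int switches;                           /* implementation-private counter */
};

static struct demo_text_asm demo_text;      /* single instance, for the sketch */

static void demo_text_section(const char *name) {
    assert(name != NULL);
    strncpy(demo_text.current, name, sizeof demo_text.current - 1);
    demo_text.current[sizeof demo_text.current - 1] = '\0';
    demo_text.switches++;
}

static const char *demo_text_current_section(void) {
    return demo_text.current;
}

/* Fill in the procedure members and hand back the base view. */
DemoAssembler demo_text_assembler(void) {
    demo_text.base.symtab = NULL;
    demo_text.base.section = demo_text_section;
    demo_text.base.current_section = demo_text_current_section;
    demo_text.current[0] = '\0';
    demo_text.switches = 0;
    return &demo_text.base;
}

/* Recover the extended record from the base pointer. This is valid C
 * because base is the first member of struct demo_text_asm. */
int demo_text_switches(DemoAssembler a) {
    return ((struct demo_text_asm *)a)->switches;
}
```

A client that only knows DemoAssembler calls a->section("text") without caring which implementation is behind it; only the implementation itself casts back to its extended record.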
The three tables below summarize the types and functions exported by this interface.
Like most assemblers, this one operates with a collection of named "sections," one of which is the "current section." A section identifies a contiguous block of memory in the running process image. Typically, the location at which the section is mapped is not known until link time.
Much of the assembler's job is determining the contents of the sections. Each section has a "location counter," which identifies a current location. The location counter is an integer, and the current location is the location at that offset from the beginning of the current section. Many procedures in this interface deposit data (or instructions) at the current location and advance the location counter.
The assembler also uses names to refer to constants or to locations within sections. To support separate compilation, names may include references to locations defined in other compilations; such references are resolved at link time.
| Type | Abbreviation | Meaning |
| AsmLabel | label | A label. |
| AsmRelAddr | relAddr | A relocatable address. |
| AsmScope | scope | A scope for names (imported, exported, local, or common). |
| AsmSymbol | sym | An assembly-language symbol. |
| AsmInstruction | instr | A machine instruction. |
| AsmSymtab | symtab | A symbol table. |
| Assembler | assembler | An assembler. |
| Relocatable addresses | |
| Asm_newaddr (label l, int offset) | Create address L+k, where L is the label l and k is offset. |
| Asm_shiftaddr(relAddr a, int offset) | Create address a+k, where k is offset. |
| Symbol-table support | |
| Asm_symtab (void) | Create new symbol table. |
| Asm_sym_insert (symtab, const char *, scope) | Add a symbol. |
| Asm_sym_lookup (symtab, const char *, scope default_scope) | Look up a symbol; add if not present. |
| Asm_symreloc (assembler, const char *) | Find relocatable address corresponding to name. |
Other functions defined in this interface
| Symbols and names | |
| import(const char *) | Return symbol for imported name. |
| export(const char *) | Return symbol for exported name. |
| local (const char *) | Return symbol for local (private) name. |
| common(const char *name, int size, int align, const char *section) | Return common symbol. |
| lookup(const char *) | Look up symbol by name. |
| offset(const char *, sym, int) | Create a symbol relative to another symbol (deprecated). |
| define_symbol_here (sym) | Bind the symbol to the current location (e.g., define label). |
| define_symbol_const(sym, int) | Bind a symbol to a constant. |
| function(sym) | Start a function definition. |
| Sections and the location counter | |
| section(const char *) | Change sections. |
| current_section(void) | Return the name of the current section. |
| org(unsigned) | Set the location counter to the argument. |
| align(unsigned n) | Round the location counter to an n-byte boundary. |
| addlc(unsigned n) | Add n to the location counter. |
| Emitting values and instructions | |
| emit_zeroes(unsigned n) | Write n zero bytes. |
| emit_instruction(instr i) | Emit an instruction. |
| emit(long value, int width) | Emit value (width bytes wide). |
| emita(AsmRelAddr) | Emit a relocatable address. |
| emitf32(int sign, int exp, unsigned long mantissa) | Emit a 32-bit IEEE float. |
| emitf64(int sign, int exp, unsigned long mhi, unsigned long mlo) | Emit a 64-bit IEEE float. |
| emitf32s(const char *) | Emit a 32-bit IEEE float (from string). |
| emitf64s(const char *) | Emit a 64-bit IEEE float (from string). |
| Miscellaneous | |
| progbeg(struct assembler *, int argc, const char **argv) | Initialize the assembler. |
| progend(void) | Finalize the assembler. |
| comment(const char *) | Insert a comment. |
| asmtext(const char *) | Insert arbitrary text into the assembly language (deprecated). |
Functions accessible indirectly through the assembler structure
Labels and relocatable addresses both resolve to integers at link time. We distinguish them because a label can be bound to a location or to a value, but a relocatable address cannot. Relocatable addresses can nevertheless appear as operands to many machine instructions, so it is appropriate to use them in RTLs.
<assembly interface types>= (<-U) [D->]
typedef struct asm_label *AsmLabel;
typedef struct relAddr_st *AsmRelAddr;
Defines AsmLabel, AsmRelAddr.
In most assemblers, a relocatable address is a label plus a constant offset, written L+k:
<common assembly prototypes>= (<-U) [D->]
AsmRelAddr Asm_newaddr (AsmLabel l, int offset);
AsmRelAddr Asm_shiftaddr(AsmRelAddr a, int offset);
Defines Asm_newaddr, Asm_shiftaddr.
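The interface leaves the representations of labels and relocatable addresses abstract. Purely as a sketch, assuming a relocatable address is stored as a label pointer plus an integer offset, the two constructors might look as follows. The demo_ names and both struct layouts are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical representations; the interface leaves both abstract. */
struct demo_label   { const char *name; };
struct demo_reladdr { struct demo_label *label; int offset; };

typedef struct demo_label   *DemoLabel;
typedef struct demo_reladdr *DemoRelAddr;

/* Build the address L+k from a label and an offset. */
DemoRelAddr demo_newaddr(DemoLabel l, int offset) {
    DemoRelAddr a;
    assert(l != NULL);
    a = malloc(sizeof *a);
    assert(a != NULL);        /* sketch only: abort on allocation failure */
    a->label = l;
    a->offset = offset;
    return a;
}

/* Build a+k as a fresh address; the argument is left unchanged. */
DemoRelAddr demo_shiftaddr(DemoRelAddr a, int offset) {
    assert(a != NULL);
    return demo_newaddr(a->label, a->offset + offset);
}
```

Returning a fresh address from demo_shiftaddr, rather than mutating its argument, is a design choice of this sketch; it lets the same address be shared among several operands.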
Note that labels are not created directly; instead they are part of assembly-language symbols, as detailed below.
In a symbol, the offset (the k part of the
relocatable address) is usually zero, so the label part is directly associated
with the name.
Imported labels are unbound.
Exported and local labels are bound either to locations in relocatable
blocks or to integers.
<assembly interface types>+= (<-U) [<-D->]
typedef enum asm_scope {
  ASM_IMPORTED=1, ASM_EXPORTED, ASM_LOCAL, ASM_COMMON
} AsmScope;
typedef struct asm_symbol {
  AsmScope scope;
  const char *name;   /* name by which known to the assembler */
  AsmRelAddr relAddr; /* usually with offset k == 0 */
  union {
    struct { int size; short align; } common;
  } u;
} *AsmSymbol;
Defines AsmScope, AsmSymbol.
Note that additional information is associated with common symbols.
Symbols must not be created directly by the user, but only through the procedures provided in this interface.
<assembly interface procedures>= (<-U) [D->]
AsmSymbol (*import)(const char *);
AsmSymbol (*export)(const char *);
AsmSymbol (*local) (const char *);
AsmSymbol (*common)(const char *name, int size, int align, const char *section);
Defines common, export, import, local.
It is an unchecked runtime error to register the same name in
different scopes. Multiple calls to import with the same name are
permitted. Whether an implementation can handle multiple
calls to export or local with the same name is unspecified.
A common symbol may be declared in multiple compilation
units, with multiple sizes and alignments.
The linker reserves an area with the largest size and the most
strict alignment, and the symbol is bound to the address of that area.
The area is reserved in the section specified in the common
directive; it is an unchecked (link-time) error to declare a common
symbol in different sections.
Some assemblers or linkers may restrict the sections in which common
symbols may be declared, and some linkers may require that the same
size and alignment be used in all declarations of a common symbol.
Consult the Processor Supplement for information about restrictions.
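The linker's choice for a common symbol declared several times can be sketched as a merge of two declarations. This is an illustration only: the struct and function names are hypothetical, and it assumes alignments are positive powers of two, so that the strictest alignment is simply the largest.

```c
#include <assert.h>

/* Hypothetical record of one declaration of a common symbol. */
struct demo_common { int size; int align; };

/* Merge two declarations the way the text describes the linker's
 * choice: largest size, strictest alignment. Assumes power-of-two
 * alignments, so "strictest" means "largest". */
struct demo_common demo_merge_common(struct demo_common a,
                                     struct demo_common b) {
    struct demo_common m;
    assert(a.size >= 0 && b.size >= 0);
    assert(a.align > 0 && b.align > 0);
    m.size  = a.size  > b.size  ? a.size  : b.size;
    m.align = a.align > b.align ? a.align : b.align;
    return m;
}
```

For example, merging a 12-byte, 4-aligned declaration with an 8-byte, 8-aligned one reserves 12 bytes on an 8-byte boundary.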
Symbols that have been registered can be looked up. It is a checked runtime error to look up an unregistered symbol.
<assembly interface procedures>+= (<-U) [<-D->]
AsmSymbol (*lookup)(const char *);
Defines lookup.
lcc uses an unusual convention for relocatable addresses of the form L+k; it represents them as symbols. So as to touch the lcc back ends as little as possible, we make it possible to create a new symbol that represents an offset from an existing symbol. Such symbols have no labels associated with them. To create p=L+k, we call offset(p, L, k).
<assembly interface procedures>+= (<-U) [<-D->]
AsmSymbol (*offset)(const char *, AsmSymbol, int);
Defines offset.
offset is deprecated and may be removed from a future version of
this interface.
Local symbols can be defined to point at the current location in the current relocatable block, or to be constants.
<assembly interface procedures>+= (<-U) [<-D->]
void (*define_symbol_here )(AsmSymbol);
void (*define_symbol_const)(AsmSymbol, int);
Defines define_symbol_const, define_symbol_here.
It is an unchecked runtime error to define the same symbol twice.
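To make the two kinds of binding concrete, here is a small sketch in which a symbol records either a (section, offset) pair or a constant. Everything here is hypothetical: the demo_symbol layout, the binding tags, and the two global variables standing in for the assembler's current section and location counter. Unlike the real interface, the sketch checks for double definition with an assertion.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DEMO_UNBOUND, DEMO_BOUND_LOCATION, DEMO_BOUND_CONST
} DemoBinding;

/* Hypothetical symbol record for this sketch. */
struct demo_symbol {
    const char *name;
    DemoBinding kind;
    const char *section;  /* valid when kind == DEMO_BOUND_LOCATION */
    long value;           /* offset within section, or the constant */
};

/* Hypothetical assembler state: current section and its lc. */
static const char *demo_cur_section = "text";
static unsigned demo_lc = 0;

/* Bind s to the current location in the current section. */
void demo_define_symbol_here(struct demo_symbol *s) {
    assert(s != NULL && s->kind == DEMO_UNBOUND);
    s->kind = DEMO_BOUND_LOCATION;
    s->section = demo_cur_section;
    s->value = (long)demo_lc;
}

/* Bind s to the constant k. */
void demo_define_symbol_const(struct demo_symbol *s, int k) {
    assert(s != NULL && s->kind == DEMO_UNBOUND);
    s->kind = DEMO_BOUND_CONST;
    s->section = NULL;
    s->value = k;
}
```

Because a symbol bound "here" remembers the location for later use, a client never needs to read the location counter itself.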
<assembly interface procedures>+= (<-U) [<-D->]
void (*function)(AsmSymbol);
Defines function.
Sections are referred to by name.
section switches to a given section, and current_section
returns the name of the current section.
The exact set of valid section names is determined by the target
machine and OS; it is documented in the Processor Supplement.
Most targets are likely to support "text" for code and
"data" for initialized data.
<assembly interface procedures>+= (<-U) [<-D->]
void (*section)(const char *);
const char *(*current_section)(void);
Defines current_section, section.
The location counter, lc, is a nonnegative offset into a section,
measured in bytes.
The location counter is considered part of the section, so section
saves the current location counter and restores the proper one for the
new section.
The location counter of the current section can be manipulated in various ways:
<assembly interface procedures>+= (<-U) [<-D->]
void (*org)(unsigned);     /* set lc to argument */
void (*align)(unsigned n); /* round lc up to an n-byte boundary */
void (*addlc)(unsigned n); /* add n to lc */
Defines addlc, align, org.
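The bookkeeping described above, one location counter per section with section switching between them, can be sketched as follows. All names are hypothetical, the section table is a fixed-size array for brevity, and demo_lc exists only so the sketch can be inspected; the interface itself deliberately offers no way to read the counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SECTIONS 8

/* Hypothetical bookkeeping: each section carries its own lc.
 * The name pointer is stored, not copied. */
struct demo_sect { const char *name; unsigned lc; };

static struct demo_sect demo_sects[DEMO_MAX_SECTIONS];
static int demo_nsects = 0;
static struct demo_sect *demo_cur = NULL;

/* Switch sections, creating the section (lc = 0) on first use.
 * The old section's lc stays stored in its own record. */
void demo_section(const char *name) {
    int i;
    assert(name != NULL);
    for (i = 0; i < demo_nsects; i++)
        if (strcmp(demo_sects[i].name, name) == 0) {
            demo_cur = &demo_sects[i];
            return;
        }
    assert(demo_nsects < DEMO_MAX_SECTIONS);
    demo_sects[demo_nsects].name = name;
    demo_sects[demo_nsects].lc = 0;
    demo_cur = &demo_sects[demo_nsects++];
}

void demo_org(unsigned n)   { assert(demo_cur != NULL); demo_cur->lc = n; }
void demo_addlc(unsigned n) { assert(demo_cur != NULL); demo_cur->lc += n; }

/* Round lc up to the next multiple of n. */
void demo_align(unsigned n) {
    assert(demo_cur != NULL && n > 0);
    demo_cur->lc = (demo_cur->lc + n - 1) / n * n;
}

/* Inspection helper for the sketch only. */
unsigned demo_lc(void) { assert(demo_cur != NULL); return demo_cur->lc; }
```

Starting in "text", demo_addlc(5) followed by demo_align(4) leaves the counter at 8; switching to "data" and back finds it still at 8.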
If advancing the location counter results in unwritten areas in a section, the contents of those areas are undefined. It's also possible to advance the location counter by filling in with zeroes:
<assembly interface procedures>+= (<-U) [<-D->]
void (*emit_zeroes)(unsigned n); /* write n zero bytes */
Defines emit_zeroes.
Arguably it ought to be possible to get the current value of the location counter, but most assemblers don't let you store and reuse such a value, which makes a C interface problematic. Luckily, the ability to create and define labels makes such an interface unnecessary.
emit_instruction emits an instruction at the current location.
The definition of struct asm_instruction is machine-dependent and not
part of this interface.
(In fact, an application like a compiler might use this interface with
more than one representation of instruction, in which case casting
would be required.)
A (machine-dependent) definition of
struct asm_instruction might be generated automatically from a SLED description of
the instruction set.
The implementation of emit_instruction guarantees that it keeps no
reference to *i after emit_instruction(i) returns, so it is
permissible, and recommended, to pass the address of a local variable.
<assembly interface procedures>+= (<-U) [<-D->]
void (*emit_instruction)(AsmInstruction i);
Defines emit_instruction.
<assembly interface types>+= (<-U) [<-D]
typedef struct asm_instruction *AsmInstruction;
Defines AsmInstruction.
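The recommended calling pattern for emit_instruction, building the instruction in a local variable and passing its address, might look like the sketch below. The instruction layout (one opcode byte and one register byte), the opcode value, and the output buffer are all made up for illustration; a real struct asm_instruction is machine-dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical machine-dependent instruction: opcode plus one register. */
struct demo_instruction { unsigned char opcode; unsigned char reg; };
typedef struct demo_instruction *DemoInstruction;

/* Hypothetical output buffer standing in for the current section. */
static unsigned char demo_code[32];
static size_t demo_code_len = 0;

/* Encode *i into the output immediately. No pointer to *i is kept, so
 * the caller's storage may disappear as soon as this returns. */
void demo_emit_instruction(DemoInstruction i) {
    assert(i != NULL);
    assert(demo_code_len + 2 <= sizeof demo_code);
    demo_code[demo_code_len++] = i->opcode;
    demo_code[demo_code_len++] = i->reg;
}

/* Caller side: build the instruction in a local and pass its address. */
void demo_gen_load(unsigned char reg) {
    struct demo_instruction insn;
    insn.opcode = 0x10;       /* made-up opcode for illustration */
    insn.reg = reg;
    demo_emit_instruction(&insn);
}
```

Since the emitter copies what it needs before returning, the caller avoids heap allocation for each instruction.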
<assembly interface procedures>+= (<-U) [<-D->]
void (*emit)(long value, int width);
void (*emita)(AsmRelAddr);
Defines emit, emita.
Note that if it is desired to support cross-assembly to a machine with a larger word size than the host machine, large constants will have to be emitted in pieces. This requirement should not represent an undue burden because large constants will have to be represented using multiple host words anyway.
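Emitting a large constant in pieces could be sketched as two 4-byte emits. The sketch makes two assumptions that the interface does not state: that the target is little-endian, so the low half is emitted first, and that emit writes the low-order bytes of its argument. demo_emit is a hypothetical stand-in that writes into a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output buffer standing in for the current section. */
static unsigned char demo_buf[64];
static size_t demo_len = 0;

/* Stand-in for emit(value, width): writes the low `width` bytes of
 * value, least significant byte first (little-endian target assumed). */
void demo_emit(long value, int width) {
    unsigned long v = (unsigned long)value;
    int i;
    assert(width > 0 && (size_t)width <= sizeof v);
    assert(demo_len + (size_t)width <= sizeof demo_buf);
    for (i = 0; i < width; i++) {
        demo_buf[demo_len++] = (unsigned char)(v & 0xFFu);
        v >>= 8;
    }
}

/* Emit a 64-bit constant held as two 32-bit halves. On a little-endian
 * target the low half goes first. */
void demo_emit64(unsigned long hi, unsigned long lo) {
    demo_emit((long)(lo & 0xFFFFFFFFul), 4);
    demo_emit((long)(hi & 0xFFFFFFFFul), 4);
}
```

On a big-endian target the two calls in demo_emit64 would be swapped, and demo_emit would write the most significant byte first.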
emitf64 takes the mantissa in two parts: mlo for the least
significant 32 bits, and mhi for the remaining most significant bits.
<assembly interface procedures>+= (<-U) [<-D->]
void (*emitf32) (int sign, int exp, unsigned long mantissa);
void (*emitf64) (int sign, int exp, unsigned long mhi, unsigned long mlo);
void (*emitf32s)(const char *);
void (*emitf64s)(const char *);
Defines emitf32, emitf64, emitf32s, emitf64s.
These functions emit IEEE 754 floating-point values of 32 and 64
bits. Compilers wishing to emit infinities or NaNs must use emit
to emit the binary representation.
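A compiler that must emit an infinity or NaN through emit needs the bit pattern. The sketch below packs raw IEEE 754 binary32 fields (1 sign bit, 8 biased-exponent bits, 23 fraction bits) into a 32-bit pattern that could then be passed to emit with a width of 4. The function names are hypothetical, and the sketch says nothing about how emitf32 interprets its own sign, exp, and mantissa arguments, which this document does not specify.

```c
#include <assert.h>

/* Pack raw IEEE 754 binary32 fields into a 32-bit pattern:
 * 1 sign bit, 8 biased-exponent bits, 23 fraction bits. */
unsigned long demo_pack_f32(int sign, unsigned biased_exp,
                            unsigned long fraction) {
    assert(sign == 0 || sign == 1);
    assert(biased_exp <= 0xFFu);
    assert(fraction <= 0x7FFFFFul);
    return ((unsigned long)sign << 31)
         | ((unsigned long)biased_exp << 23)
         | fraction;
}

/* Infinity: exponent all ones, fraction zero. */
unsigned long demo_f32_infinity(int sign) {
    return demo_pack_f32(sign, 0xFFu, 0ul);
}

/* A quiet NaN: exponent all ones, top fraction bit set. */
unsigned long demo_f32_quiet_nan(void) {
    return demo_pack_f32(0, 0xFFu, 0x400000ul);
}
```

The same field-packing approach extends to binary64 (1, 11, and 52 bits), with the pattern split into two pieces if the host's long is only 32 bits wide.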
argc and argv may be used to pass information that is machine-dependent or
implementation-dependent.
[I know of no such use at present, but these kinds of escapes
have proven useful in the past.]
<assembly interface procedures>+= (<-U) [<-D->]
void (*progbeg)(struct assembler *, int argc, const char *argv[]);
void (*progend)(void);
Defines progbeg, progend.
comment inserts a comment.
It is an unchecked runtime error for a comment to contain a newline,
line feed, form feed, etc.
<assembly interface procedures>+= (<-U) [<-D->]
void (*comment)(const char *);
Defines comment.
<assembly interface procedures>+= (<-U) [<-D]
void (*asmtext)(const char *);
Defines asmtext.
Asm_symtab creates a new symbol table.
Asm_sym_insert inserts a symbol, complaining if it already exists
with a different scope.
Asm_sym_lookup seeks a symbol, inserting it with the
default_scope if it doesn't exist.
There's no such thing as an undefined symbol until you reach the
linking stage!
<common assembly prototypes>+= (<-U) [<-D->]
extern AsmSymtab Asm_symtab (void);
extern AsmSymbol Asm_sym_insert (AsmSymtab, const char *, AsmScope);
extern AsmSymbol Asm_sym_lookup (AsmSymtab, const char *, AsmScope default_scope);
Defines Asm_sym_insert, Asm_sym_lookup, Asm_symtab.
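The difference between inserting and looking up can be shown with a toy table. This is a sketch under several assumptions: the table is a fixed-size array searched linearly, names are stored by pointer, and the "complaint" from insertion is modeled as a NULL return, which the interface does not specify. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    DEMO_IMPORTED = 1, DEMO_EXPORTED, DEMO_LOCAL, DEMO_COMMON
} DemoScope;

struct demo_symbol { const char *name; DemoScope scope; };

#define DEMO_MAX_SYMS 64
struct demo_symtab { struct demo_symbol syms[DEMO_MAX_SYMS]; int n; };

/* Linear search; returns NULL when the name is absent. */
static struct demo_symbol *demo_find(struct demo_symtab *t,
                                     const char *name) {
    int i;
    for (i = 0; i < t->n; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return &t->syms[i];
    return NULL;
}

/* Insert name with the given scope. Returns the existing symbol if the
 * scopes agree, or NULL (the "complaint") if they differ. */
struct demo_symbol *demo_sym_insert(struct demo_symtab *t,
                                    const char *name, DemoScope scope) {
    struct demo_symbol *s = demo_find(t, name);
    if (s != NULL)
        return s->scope == scope ? s : NULL;
    assert(t->n < DEMO_MAX_SYMS);
    s = &t->syms[t->n++];
    s->name = name;
    s->scope = scope;
    return s;
}

/* Look up name, inserting it with default_scope if it is absent. */
struct demo_symbol *demo_sym_lookup(struct demo_symtab *t,
                                    const char *name,
                                    DemoScope default_scope) {
    struct demo_symbol *s = demo_find(t, name);
    return s != NULL ? s : demo_sym_insert(t, name, default_scope);
}
```

A lookup never fails in this sketch: an unknown name simply enters the table with the default scope, and only the linker can later find that it was never defined.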
For convenience only, we provide a generic routine for mapping names to relocatable addresses by looking up names in the assembler's symbol table.
<common assembly prototypes>+= (<-U) [<-D]
extern AsmRelAddr Asm_symreloc (Assembler, const char *);
Defines Asm_symreloc.