This chapter illustrates lambda-RTL by specifying the SPARC instruction set. Only a handful of instructions have been omitted, primarily the coprocessor-branch instructions, a few privileged load and store instructions, and the cache-flush instruction.
Sections [->]--[->] describe some instructions and give extensive commentary. Load and store instructions illustrate the basic techniques used to move data of different sizes. Logical instructions and add instructions show simple groups of computational instructions; each group offers a slightly different treatment of condition codes. Specifications of save and restore instructions illustrate one of several possible treatments of register windows. In this case we have not yet achieved our goal of separating hardware behavior from software conventions; our model of register windows describes the abstraction that is presented by the combination of hardware, calling convention, and operating system.
Section [->] presents the rest of the instructions with little commentary.
[*] Storage locations manipulated by SPARC instructions include memory and several kinds of registers. We have little to say about memory other than that the machine is byte-addressed and uses big-endian byte order.
<SPARC basics>= (U->) [D->]
storage
  'm' is cells of 8 bits called "memory"
    aggregate using RTL.AGGB
save and restore instructions manipulate the CWP, as
do traps and returns from traps.
This low-level model would be easy to describe using lambda-RTL, but it is
not the model used by most compiler writers.
Compiler writers seldom need to use register 0 explicitly, because
SPARC assembly languages provide ``synthetic'' instructions that use
register 0 as needed.
Instead of using the detailed semantics of register windows, compiler
writers adhere to the SPARC calling convention, which
(with some help from the operating system) gives the illusion of an
infinite collection of register windows, and which allocates one
register window to
each activation of each procedure.
[``Optimized leaf procedures'' may
use their caller's register window.]
The compiler must reserve space on the stack for use
as backing store, and it must
use save and restore in procedure prologs and epilogs.
lambda-RTL is not biased toward any particular model of register windows,
and in fact it could be used to specify different models which might
be useful in different situations.
Eventually, lambda-RTL will have a modules system that will enable us to
create different models, at different levels of abstraction, of such
things as
SPARC register windows.
For this document, we have chosen a
fairly abstract model that is convenient for compiler writers.
We hide most of the low-level hardware behavior, and we describe
registers (except register 0) as a simple collection of
mutable cells.
Section [->] shows the details of the model that are
needed to specify the effects of the save and restore
instructions.
Dealing with register 0 is fairly simple; there are only two reasonable models. One is the full semantics; the other is a model that specifies only that register 0 is immutable. The simpler model suffices for some compilers, and the resulting RTLs are easier to understand. As in Chapter [->], we use ((simple)) and ((full)) to show the two alternatives.[*]
<SPARC register fetch and store methods ((simple))>=
fetch using \n. $r[n]
store using \(n, v). n <> 0 --> RTL.TRUE_STORE ($r[n], v)
We make register 0 immutable by ensuring that attempts to store into it have no effect. The full semantics also shows that fetches return 0.
<SPARC register fetch and store methods ((full))>=
fetch using \n. if n = 0 then 0 else RTL.TRUE_FETCH ($r[n]) fi
store using \(n, v). n <> 0 --> RTL.TRUE_STORE ($r[n], v)
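To make the guarded store and the special-cased fetch concrete, here is a minimal Python sketch of the full model. It is only an illustration, not lambda-RTL; the class name RegisterFile and its methods are hypothetical names chosen for this example.

```python
class RegisterFile:
    """Toy model of the 32 integer registers; register 0 is hardwired to zero."""

    def __init__(self):
        self._cells = [0] * 32

    def fetch(self, n):
        # Full semantics: a fetch from register 0 always yields 0.
        return 0 if n == 0 else self._cells[n]

    def store(self, n, value):
        # Guarded store: an attempt to write register 0 has no effect.
        if n != 0:
            self._cells[n] = value & 0xFFFFFFFF


regs = RegisterFile()
regs.store(0, 123)   # silently ignored
regs.store(5, 42)
print(regs.fetch(0), regs.fetch(5))
```

In the simple model only the guard in store would be present; fetch would read the cell directly.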
In this straightforward model of registers, we use these special fetch
and store methods, and we also permit
registers to be aggregated into pairs, so we can describe instructions
like ldd.
<SPARC basics>+= (U->) [<-D->]
storage
'r' is 32 cells of 32 bits called "registers"
<SPARC register fetch and store methods>
aggregate using RTL.AGGB
<SPARC basics>+= (U->) [<-D->]
storage
  'i' is 6 cells of 32 bits called "Integer-unit control/status registers"
locations
  [PSR WIM TBR Y PC nPC] is $i[[0..5]]
  [EC EF S PS ET] is PSR@loc[1 bit at [13 12 7 6 5]]
The integer condition codes are also part of the PSR.
We put them in their own nested module, called icc,
so their association with condition codes will be more obvious.
<SPARC basics ((full))>= [D->]
module icc is
  locations [N Z V C] is PSR@loc[1 bit at [23 22 21 20]]
end
val carryBit is icc.C
Defines carryBit (links are to index).
In the simple version of the specification, we treat integer condition codes as an atomic unit.
<SPARC basics ((simple))>=
locations ICC is PSR@loc[20..23]
rtlop carryBit : #4 bits -> #1 bits
val carryBit is carryBit ICC
Defines carryBit (links are to index).
A reference to a cell like $m[a] normally means the cell by
itself---in this case, a single byte in memory located at address a.
But sometimes we want to talk about a 32-bit word at address a, or
even a 16-bit halfword, a doubleword, or a quad-word.
Often, lambda-RTL can figure out what is meant just from the context in
which $m[a] is used---but when it can't,
an explicit cast can be used to make the size explicit, thus: $m[a] : #32 bits.
Because the syntax of the casts can be awkward, we define
functions that do the casting.
<SPARC basics>+= (U->) [<-D->]
fun byte  x is x : #8 bits
fun hword x is x : #16 bits
fun word  x is x : #32 bits
fun dword x is x : #64 bits
fun qword x is x : #128 bits
Defines byte, dword, hword, qword, word (links are to index).
Given these functions, the result of, e.g., dword is guaranteed
to be a 64-bit value, even if dword is applied to an 8-bit byte in
memory.
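The effect of these casts on a big-endian, byte-addressed memory can be pictured with a small Python sketch. It is an informal model of what the aggregation computes, not a definition of RTL.AGGB; the function names aggregate_big_endian and fetch are hypothetical.

```python
def aggregate_big_endian(cells):
    """Combine 8-bit cells into one value, most significant byte first."""
    value = 0
    for cell in cells:
        value = (value << 8) | (cell & 0xFF)
    return value


def fetch(memory, address, width_bits):
    """Read width_bits (8, 16, 32, 64, or 128) starting at a byte address."""
    count = width_bits // 8
    return aggregate_big_endian(memory[address:address + count])


memory = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
print(hex(fetch(memory, 0, 8)))    # like byte  $m[0]
print(hex(fetch(memory, 0, 16)))   # like hword $m[0]
print(hex(fetch(memory, 0, 32)))   # like word  $m[0]
print(hex(fetch(memory, 0, 64)))   # like dword $m[0]
```

The same address denotes a different value depending on the requested width, which is why the width must be recoverable either from context or from an explicit cast.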
These types have been gleaned automatically straight from the SLED specification.
<address modes and operands>= (U->) [D->]
operand [ cd fd fs1 fs2 rd rdi rs1 rs1i rs2] : #5 bits
operand asi : #8 bits
operand simm13 : #13 bits
operand imm22 : #22 bits
<address modes and operands>+= (U->) [<-D->]
operand target : #32 bits
operand annul : #1 bits
The SPARC has a simple structure for effective addresses and
operands.
As far as the hardware is concerned, there are only two addressing
modes, depending on whether the immediate bit is used.
We've chosen to let the default attribute of the addressing modes be
the address, not the location in memory denoted by that address.
The address is the value of register rs1, plus either the value of
register rs2 or the result of
sign-extending a 13-bit immediate value.
<address modes and operands>+= (U->) [<-D->]
operand address : #32 bits
default attribute of
  indexA(rs1, rs2)    is $r[rs1] + $r[rs2]
  dispA (rs1, simm13) is $r[rs1] + sx simm13
The rmode and imode constructors produce the values used in
operands of type reg_or_imm.
<address modes and operands>+= (U->) [<-D]
operand reg_or_imm : #32 bits
default attribute of
  rmode (rs2)    is $r[rs2]
  imode (simm13) is sx simm13
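As a rough illustration of the two addressing modes, the following Python sketch computes effective addresses with 32-bit wraparound and a 13-bit sign extension. The helper names sx, index_a, and disp_a mirror the constructors above but are otherwise hypothetical, and registers are modeled as a plain list.

```python
MASK32 = 0xFFFFFFFF


def sx(value, from_bits, to_bits=32):
    """Sign-extend a from_bits-wide field to to_bits bits."""
    sign = 1 << (from_bits - 1)
    signed = (value & (sign - 1)) - (value & sign)
    return signed & ((1 << to_bits) - 1)


def index_a(r, rs1, rs2):
    """Register-plus-register address, as in indexA."""
    return (r[rs1] + r[rs2]) & MASK32


def disp_a(r, rs1, simm13):
    """Register-plus-displacement address, as in dispA."""
    return (r[rs1] + sx(simm13, 13)) & MASK32


r = [0] * 32
r[1], r[2] = 0x1000, 0x20
print(hex(index_a(r, 1, 2)))        # 0x1020
print(hex(disp_a(r, 1, 0x1FFC)))    # 0x1FFC encodes -4, so 0xffc
```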
The SPARC architecture provides for not one but 256 possible address spaces. By default, load instructions use space 0x0A in user mode and space 0x0B in supervisor mode. The mode is determined by the value of the S bit in the processor state register. Values from other address spaces may be obtained by using the ``load from alternate space'' instructions, but these instructions are privileged. We omit all this complexity from our description, treating the machine as if it were always in user mode. This omission is partly for simplicity, but partly because lambda-RTL does not deal well with numbered collections of storage spaces.
The specifications of the load-integer instructions give us our first
opportunity to factor lambda-RTL descriptions.
On the left-hand side, the phrases [s u] and [b h] are like
for loops, and the carets (^) join parts of constructor
names, so the opcode on the left-hand side expands to the list of constructors
ldsb, ldsh, ldub, and lduh.
Corresponding to [s u] on the right-hand side is the ``expression
group'' [sx zx].
sx and zx were introduced in Section [->] to
stand for sign extension and zero extension, respectively.
Corresponding to [b h] is the expression group [byte hword].
These functions, defined above, produce aggregations of
8 and 16 bits.
Combining the two groups specifies four instructions---signed and
unsigned versions of byte and halfword loads.
<instruction defaults>= (U->) [D->]
default attribute of
  ld^[s u]^[b h] (address, rd) is $r[rd] := [sx zx] ([byte hword] $m[address])
This double factoring works smoothly because the two 2-groups are in the same order on the left- and right-hand sides.
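The pairing of groups can be mimicked in Python to see exactly which instructions the factored line stands for. This is only a sketch of the expansion rule as described here (corresponding groups are indexed together, and the rightmost group varies fastest); the function expand is a hypothetical helper, not part of any lambda-RTL tool.

```python
from itertools import product


def expand(prefix, name_groups, value_groups):
    """Pair each constructor name with the group members chosen at the same indices."""
    rows = []
    index_ranges = [range(len(group)) for group in name_groups]
    for choice in product(*index_ranges):      # rightmost index varies fastest
        name = prefix + "".join(g[i] for g, i in zip(name_groups, choice))
        values = tuple(g[i] for g, i in zip(value_groups, choice))
        rows.append((name, values))
    return rows


loads = expand("ld", [["s", "u"], ["b", "h"]], [["sx", "zx"], ["byte", "hword"]])
for name, (extend, size) in loads:
    print(f"{name}: $r[rd] := {extend} ({size} $m[address])")
```

Each printed line corresponds to one of the four unfactored specifications: ldsb, ldsh, ldub, and lduh.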
We use another group to define load and load-double
instructions, which don't require sign extension or zero extension.
For ldd, lambda-RTL infers automatically that it has to aggregate
two 32-bit registers in order to hold the 64-bit value dword $m[address].
<instruction defaults>+= (U->) [<-D->]
ld^["" d] (address, rd) is $r[rd] := [word dword] $m[address]
The chunks named
<instruction defaults>
define the default attributes of instructions.
We have chosen to use the default attributes to specify the ``main
effect'' of these instructions, but there are circumstances in which
the processor traps instead of executing this main effect.
Traps are not relevant to all specifications, so we have chosen to
give the trap semantics separately, by binding them to an attribute
named trap.
Again, with a proper modules system,
trap semantics could be omitted from some specifications.
All loads except byte loads could trap if the address is
improperly aligned.
The load-double also traps if rd is an odd-numbered register.
<trap semantics>= (U->) [D->]
ld^["" sh uh] (address, rd) is alignTrap(address, [4 2 2])
ldd(address, rd) is alignTrap(address, 8)
  -- second test is incompatible with limited temporaries
  -- | rd@bits[0] <> 0 --> trap(illegal_instruction)
<SPARC utilities>= (U->) [D->]
fun alignTrap (address, k) is address modu k <> 0 --> trap(mem_address_not_aligned)
Defines alignTrap (links are to index).
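The alignment test itself is simple modular arithmetic. A minimal Python sketch, assuming the trap is represented by returning its trap-type code (0x07 for mem_address_not_aligned, as listed later in this chapter) and the absence of a trap by None; align_trap and LOAD_ALIGNMENT are hypothetical names.

```python
MEM_ADDRESS_NOT_ALIGNED = 0x07   # trap type; representation by return value is an assumption

# Alignment in bytes required by each trapping load (byte loads never trap).
LOAD_ALIGNMENT = {"ld": 4, "ldsh": 2, "lduh": 2, "ldd": 8}


def align_trap(address, k):
    """Return the trap code if address is not a multiple of k, else None."""
    return MEM_ADDRESS_NOT_ALIGNED if address % k != 0 else None


print(align_trap(0x1002, LOAD_ALIGNMENT["ld"]))     # misaligned word load
print(align_trap(0x1002, LOAD_ALIGNMENT["ldsh"]))   # aligned halfword load
```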
The interaction of the trap semantics with the main semantics is not specified in lambda-RTL.
The instructions that load floating-point registers or coprocessor registers don't illustrate anything new, so we have omitted them from this example specification.
dword to force
register $r[rd] to be aggregated with register $r[rd+1].
<instruction defaults>+= (U->) [<-D->]
sth (rd, address) is $m[address] := $r[rd]@bits[16 bits at 0]
stb (rd, address) is $m[address] := $r[rd]@bits[ 8 bits at 0]
st^["" d] (rd, address) is $m[address] := [word dword] $r[rd]
The trap semantics for store instructions is the same as for load instructions. A more aggressively factored specification might merge the trap semantics of the two groups.
<trap semantics>+= (U->) [<-D->]
st^["" h] (rd, address) is alignTrap(address, [4 2])
std (rd, address) is alignTrap(address, 8)
  -- second test is incompatible with limited temporaries
  -- | rd@bits[0] <> 0 --> trap(illegal_instruction)
<instruction defaults>+= (U->) [<-D->]
ldstub(address, rd) is $r[rd] := zx $m[address] | $m[address] := 0xff
swap (address, rd) is $r[rd] := $m[address] | $m[address] := $r[rd]
Trap semantics for swap is like that for load and store.
<trap semantics>+= (U->) [<-D->]
swap(address, rd) is alignTrap(address, 4)
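The specifications of ldstub and swap rely on simultaneous composition (the | operator): every right-hand side is evaluated in the old state before any location is updated. A small Python sketch of that reading follows. It simplifies memory to a dictionary holding one value per address, which is an assumption made only for this illustration.

```python
def swap(regs, memory, address, rd):
    """Exchange a register with a memory word; both reads see the old state."""
    new_reg = memory[address]     # read for  $r[rd] := $m[address]
    new_mem = regs[rd]            # read for  $m[address] := $r[rd]
    regs[rd], memory[address] = new_reg, new_mem


def ldstub(regs, byte_memory, address, rd):
    """Load a zero-extended byte, then set the byte to all ones."""
    new_reg = byte_memory[address] & 0xFF
    regs[rd], byte_memory[address] = new_reg, 0xFF


regs = [0] * 32
regs[9] = 0xAAAA
memory = {0x2000: 0x5555}
swap(regs, memory, 0x2000, 9)
print(hex(regs[9]), hex(memory[0x2000]))
```

If the two assignments in swap were sequenced instead, the second would see the already-updated register and the original register value would be lost.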
To describe the effects of save and restore, we introduce a
fictional storage space w to stand for the locations where
registers are saved.
Some of these locations correspond to hardware register windows;
others correspond to reserved locations on the stack.
<register windows>= (U->)
storage
  'w' is cells of 32 bits called "register windows"
  'W' is 1 cell of 32 bits called "register-window pointer"
locations
  winptr is $W[0]
The location called winptr is analogous to the current window
pointer (CWP), but they are not identical.
At any moment during the execution of a program,
cells $w[0..winptr-1] hold the contents of registers that have
been saved with previous save instructions.
Other cells in w space have undefined
contents.
Figure [->] shows the layout of w space.
Figure: Layout of w space used to model register windows.
  winptr ...     Space available for future saves
  winptr-8       Recently saved local registers
  winptr-16      Recently saved in registers
  0 ...          Previously saved registers
The registers have aliases, which are shown in Table 4-1 in the SPARC manual [cite sparc:architecture]. To make it easier to specify the save and restore instructions correctly, we create functions that implement these aliases.
<SPARC registers>= (U->)
module Reg is
  fun in'    n is $r[n+24]
  fun local' n is $r[n+16]
  fun out    n is $r[n+8]
  fun global n is $r[n]
end
Defines global, in', local', out (links are to index).
We have to use the names in' and local' because in and local are
reserved words in lambda-RTL.
The true behavior of a save instruction is to decrement CWP and to
trap if the new value is invalid (according to the Window Invalid Mask
register).
The normal trap handler provides the illusion of infinite register
windows, by saving register windows on the stack and by adjusting the
WIM register.
We model this illusion as movement of out registers to in registers
and movement of in and local registers to the w space.
Global registers are unchanged by save; Figure [<-] helps
clarify what happens to the others.
<SPARC utilities>+= (U->) [<-D->]
fun saveOut   n is Reg.in' n := Reg.out n
fun saveLocal n is $w[winptr+8+zx n] := Reg.local' n
fun saveIn    n is $w[winptr +zx n] := Reg.in' n
Defines saveIn, saveLocal, saveOut (links are to index).
It's necessary to zero-extend n to get from a 5-bit register
number to a larger index into the w space.
We use functions from the built-in Vector structure to create
do8 such that do8 f applies f to the integers
0 to 7 and returns the simultaneous composition of the resulting effects.
<SPARC utilities>+= (U->) [<-D->]
fun do8 f is
  Vector.foldr (\(n, effects). f n | effects) RTL.SKIP (Vector.spanning 0 7)
Defines do8 (links are to index).
If this idiom proves useful, we will provide syntactic sugar for it. Here's one possibility:
<possible syntactic sugar for Vector.foldr (not implemented)>=
fun do8 f is simultaneously for n from 0 to 7 do f n
Defines do8 (links are to index).
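For readers more familiar with folds in other languages, the idea behind do8 can be sketched in Python. Here an effect is modeled as a dictionary from locations to new values, and simultaneous composition as a merge that rejects two writes to the same location; both modeling choices, and the names SKIP, par, and save_out, are assumptions made for this illustration.

```python
from functools import reduce

SKIP = {}   # the empty effect


def par(first, second):
    """Simultaneous composition: merge two effects, rejecting conflicting writes."""
    clash = first.keys() & second.keys()
    if clash:
        raise ValueError(f"conflicting assignments to {sorted(clash)}")
    return {**first, **second}


def do8(f):
    """Right fold of f over 0..7: f 0 | (f 1 | ... (f 7 | SKIP))."""
    return reduce(lambda effects, n: par(f(n), effects), reversed(range(8)), SKIP)


regs = list(range(32))                               # register n holds n
save_out = lambda n: {("r", n + 24): regs[n + 8]}    # in n := out n
effect = do8(save_out)
print(sorted(effect.items()))
```

Because composition is simultaneous, the order in which the eight effects are folded does not change the result; only conflicting writes are a problem.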
<incorrect instruction specifications>= [D->]
save (rs1, reg_or_imm, rd) is
  ( do8 saveOut | do8 saveLocal | do8 saveIn | winptr := winptr + 16
  | $r[rd] := $r[rs1] + reg_or_imm)
but the problem with this specification is that it is ill-formed
whenever rd is an ``in'' register---the
explicit assignment to rd conflicts with the assignment created
by do8 saveOut, and there is no way to say which one takes
priority.
We do not want to write the sequence
<incorrect instruction specifications>+= [<-D]
save (rs1, reg_or_imm, rd) is
  ( do8 saveOut | do8 saveLocal | do8 saveIn | winptr := winptr + 16;
    $r[rd] := $r[rs1] + reg_or_imm)
because the assignment to $r[rd] would use the value of $r[rs1]
from the new register window, not from the old one.
One possible, if unpleasant, way out of this dilemma is to guard the
assignments created by saveOut, so that register rd is not
touched:
<SPARC utilities>+= (U->) [<-D->]
fun saveRegs rd is
let fun saveOut n is rd <> n+24 --> Reg.in' n := Reg.out n
fun saveLocal n is $w[winptr+8+zx n] := Reg.local' n
fun saveIn n is $w[winptr+ zx n] := Reg.in' n
in do8 saveOut | do8 saveLocal | do8 saveIn | winptr := winptr + 16
end
Defines saveRegs (links are to index).
<instruction defaults>+= (U->) [<-D->]
save (rs1, reg_or_imm, rd) is saveRegs rd | $r[rd] := $r[rs1] + reg_or_imm
We use similar tactics to specify restore.
<SPARC utilities>+= (U->) [<-D->]
fun restoreRegs rd is
let fun restoreOut n is rd <> n+8 --> Reg.out n := Reg.in' n
fun restoreLocal n is rd <> n+16 --> Reg.local' n := $w[winptr-8 +zx n]
fun restoreIn n is rd <> n+24 --> Reg.in' n := $w[winptr-16+zx n]
in do8 restoreOut | do8 restoreLocal | do8 restoreIn | winptr := winptr - 16
end
Defines restoreRegs (links are to index).
<instruction defaults>+= (U->) [<-D->]
restore (rs1, reg_or_imm, rd) is restoreRegs rd | $r[rd] := $r[rs1] + reg_or_imm
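The following Python sketch acts out the abstract window model for save and restore, including the guards on rd. It is a toy model under stated assumptions: w space is a dictionary, the operand is already a plain integer, and the class WindowModel and its methods are hypothetical names, not part of the specification.

```python
MASK32 = 0xFFFFFFFF


class WindowModel:
    """Toy version of the abstract register-window model."""

    def __init__(self):
        self.r = [0] * 32   # 0-7 globals, 8-15 outs, 16-23 locals, 24-31 ins
        self.w = {}         # fictional w space, indexed by cell number
        self.winptr = 0

    def _commit(self, writes, rd, value, new_winptr):
        # Every right-hand side was computed from the old state; apply them together.
        for (space, index), v in writes.items():
            if space == "r":
                self.r[index] = v
            else:
                self.w[index] = v
        if rd != 0:                          # register 0 stays immutable
            self.r[rd] = value
        self.winptr = new_winptr

    def save(self, rs1, operand, rd):
        value = (self.r[rs1] + operand) & MASK32       # reads rs1 in the OLD window
        writes = {}
        for n in range(8):
            if rd != n + 24:                           # guard: rd gets the explicit store
                writes[("r", n + 24)] = self.r[n + 8]            # out -> in
            writes[("w", self.winptr + 8 + n)] = self.r[n + 16]  # save locals
            writes[("w", self.winptr + n)] = self.r[n + 24]      # save ins
        self._commit(writes, rd, value, self.winptr + 16)

    def restore(self, rs1, operand, rd):
        value = (self.r[rs1] + operand) & MASK32
        writes = {}
        for n in range(8):
            if rd != n + 8:
                writes[("r", n + 8)] = self.r[n + 24]            # in -> out
            if rd != n + 16:
                writes[("r", n + 16)] = self.w[self.winptr - 8 + n]
            if rd != n + 24:
                writes[("r", n + 24)] = self.w[self.winptr - 16 + n]
        self._commit(writes, rd, value, self.winptr - 16)


m = WindowModel()
for n in range(8):
    m.r[8 + n], m.r[16 + n], m.r[24 + n] = 100 + n, 200 + n, 300 + n
m.save(14, -96, 14)      # in the spirit of: save %sp, -96, %sp
print(m.r[24:32], m.r[14], m.winptr)
m.restore(0, 0, 0)
print(m.r[8:16], m.r[16:24], m.r[24:32], m.winptr)
```

After the save, the caller's out registers appear as the callee's in registers and the new stack pointer is computed from the old one; the restore brings back exactly the register contents that were present before the save.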
The SPARC doesn't have a bitwise-complement instruction; instead, each
logical instruction has a variant that complements its second operand.
Using existing RTL operators,
we define functions to represent these ``logical-complement'' operations.
We use com for bitwise complement; not works only on booleans.
<SPARC utilities>+= (U->) [<-D->]
fun [andn orn xnor] (a, b) is [and or xor](a, com b)
Defines andn, orn, xnor (links are to index).
The logical instructions include variants that set condition codes and
variants that leave condition codes unchanged.
Because this is a common pattern among SPARC instructions, we define
leave_cc and set_cc to help specify these two kinds of
effects.
Instructions that do set condition codes typically set the N and
Z bits according to the value of an integer result.
There is no single typical treatment of the V (overflow) and C (carry) bits, so
we require that values for these bits be passed in.
<SPARC utilities ((full))>= [D->]
fun set_cc(result, overflow, carry) is
    icc.N := bit (result < 0)
  | icc.Z := bit (result = 0)
  | icc.V := overflow
  | icc.C := carry
fun leave_cc _ is RTL.SKIP
Defines leave_cc, set_cc (links are to index).
<SPARC utilities ((simple))>= [D->]
rtlop encapsulate_cc : #32 bits * #1 bits * #1 bits -> #4 bits
fun set_cc arg is ICC := encapsulate_cc arg
fun leave_cc _ is RTL.SKIP
Defines encapsulate_cc, leave_cc, set_cc (links are to index).
The logical instructions clear overflow and carry.
<SPARC utilities>+= (U->) [<-D->]
fun logical_cc (result) is set_cc(result, 0, 0)
Defines logical_cc (links are to index).
The logical instructions are among the SPARC instructions that have an assembly-language syntax of the form
<x> rs1, reg_or_imm, rd
where <x> is a binary operator. We define the function
binary to get the effect of this common form.
<SPARC utilities>+= (U->) [<-D->]
fun binary (operator, rs1, r_o_i, rd) is $r[rd] := operator($r[rs1], r_o_i)
Defines binary (links are to index).
This function by itself isn't sufficient for variants that set the
condition codes.
binary_with_cc combines the main effect with whatever effects are
produced by special_cc, a code-setting function that is passed in.
If this function is leave_cc, the condition codes won't
be changed.
<SPARC utilities>+= (U->) [<-D->]
fun binary_with_cc (operator, rs1, r_o_i, rd, special_cc) is
  let val result is operator($r[rs1], r_o_i)
  in  $r[rd] := result | special_cc result
  end
Defines binary_with_cc (links are to index).
We use factoring and these utility functions to specify all the logical instructions at once.
<instruction defaults>+= (U->) [<-D->]
[and or xor andn orn xnor]^[cc ""] (rs1, reg_or_imm, rd) is
  binary_with_cc([and or xor andn orn xnor], rs1, reg_or_imm, rd, [logical_cc leave_cc])
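To see what the twelve logical instructions compute, here is a small Python sketch of the operators and of the condition codes produced by the cc variants. It models values as unsigned 32-bit integers, so the N bit is taken from bit 31 rather than from a signed comparison; the names com, OPS, and logical are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def com(b):
    """Bitwise complement within 32 bits."""
    return ~b & MASK32


OPS = {
    "and":  lambda a, b: a & b,
    "or":   lambda a, b: a | b,
    "xor":  lambda a, b: a ^ b,
    "andn": lambda a, b: a & com(b),
    "orn":  lambda a, b: a | com(b),
    "xnor": lambda a, b: a ^ com(b),
}


def logical(name, a, b, set_codes):
    """Return (result, icc); icc is None when the codes are left unchanged."""
    result = OPS[name](a, b)
    icc = None
    if set_codes:
        # Logical instructions clear overflow and carry.
        icc = {"N": result >> 31, "Z": int(result == 0), "V": 0, "C": 0}
    return result, icc


print(logical("andn", 0xF0, 0xF0, True))
print(logical("or", 1, 2, False))
```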
<SPARC utilities ((simple))>+= [<-D->]
rtlop add_overflows : #n bits * #n bits * #1 bits -> bool
Defines add_overflows (links are to index).
<SPARC utilities ((full))>+= [<-D->]
fun add_overflows (x, y, c) is
let val {result, carry} is add(x, y, c)
in x@bits[31] = y@bits[31] andalso x@bits[31] <> result@bits[31]
end
Defines add_overflows (links are to index).
Integer addition takes three arguments and returns two results, so
we can't use binary_with_cc.
The sum goes in rd, plus
there may be effects on the condition codes.
<SPARC utilities>+= (U->) [<-D->]
fun add_instruction (rs1, operand2, carry_in, rd, {set_codes}) is
let val {result, carry is carry_out} is add($r[rs1], operand2, carry_in)
in $r[rd] := result
| set_codes -->
set_cc(result, bit(add_overflows ($r[rs1], operand2, carry_in)), carry_out)
end
Defines add_instruction (links are to index).
The naming of the add instructions makes factoring a bit tricky.
A 4-group on the left-hand side corresponds to two 2-groups on the
right-hand side.
Groups are evaluated in left-to-right LIFO order, so the rightmost
group varies most rapidly; for example, the addcc
constructor corresponds to the use of 0 for carry_in and the
use of true for set_codes.
<instruction defaults>+= (U->) [<-D->]
[add addcc addx addxcc] (rs1, reg_or_imm, rd) is
add_instruction(rs1, reg_or_imm, [0 carryBit], rd, {set_codes is [false true]})
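A Python sketch may help check the arithmetic behind the four add instructions: a 32-bit add with carry-in, the carry-out, and the signed-overflow test used by add_overflows. The table ADD_VARIANTS spells out how the single 4-group pairs with the two 2-groups. All Python names here are hypothetical, and values are modeled as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def add32(x, y, carry_in):
    """Return (result, carry_out, overflow) for a 32-bit add with carry-in."""
    total = (x & MASK32) + (y & MASK32) + carry_in
    result = total & MASK32
    carry_out = total >> 32
    sign_x, sign_y, sign_r = (x >> 31) & 1, (y >> 31) & 1, result >> 31
    # Overflow: operands agree in sign but the result does not.
    overflow = int(sign_x == sign_y and sign_x != sign_r)
    return result, carry_out, overflow


# (use the carry bit as carry-in, set condition codes); rightmost group varies fastest.
ADD_VARIANTS = {
    "add":    (False, False),
    "addcc":  (False, True),
    "addx":   (True,  False),
    "addxcc": (True,  True),
}


def add_instruction(name, a, b, carry_bit):
    use_carry, set_codes = ADD_VARIANTS[name]
    result, carry, overflow = add32(a, b, carry_bit if use_carry else 0)
    icc = None
    if set_codes:
        icc = {"N": result >> 31, "Z": int(result == 0), "V": overflow, "C": carry}
    return result, icc


print(add32(0x7FFFFFFF, 1, 0))                       # signed overflow, no carry
print(add_instruction("addxcc", 0xFFFFFFFF, 0, 1))   # carry out, zero result
```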
Reg64 structure shows how to use such a pair to hold a 64-bit
value.
To put the value in the pair, we put the most significant 32 bits in
the Y register and the least significant 32 bits in the
general-purpose register.
To recover the value, we zero-extend the general-purpose register to
64 bits, then insert the contents of the Y register in place of the
most significant 32 bits (which we know to be zeroes).
There are two ways to do this, and we show both of them.
The definition of get instantiates the zero-extension
operator zx explicitly with the size of its argument and result.
The definition of get' simply says that it returns a 64-bit value.
The resulting functions are equivalent.
<SPARC utilities>+= (U->) [<-D->]
module Reg64 is
fun set (reg, n) is Y := n@bits[32 bits at 32] | $r[reg] := n@bits[32 bits at 0]
fun get reg is bitInsert {wide is zx #32 #64 $r[reg], lsb is 32} Y
fun get' reg is bitInsert {wide is zx $r[reg], lsb is 32} Y : #64 bits
end
Defines get, get', set (links are to index).
The syntax for bitInsert could probably be improved.
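The split of a 64-bit value across the Y register and a general-purpose register amounts to two slices and one insertion. A minimal Python sketch, with hypothetical function names and the two registers passed as plain integers:

```python
MASK32 = 0xFFFFFFFF


def reg64_set(n):
    """Split a 64-bit value into (Y, reg): high 32 bits and low 32 bits."""
    return (n >> 32) & MASK32, n & MASK32


def reg64_get(y, reg):
    """Zero-extend reg to 64 bits, then place Y in the most significant 32 bits."""
    return ((y & MASK32) << 32) | (reg & MASK32)


y, reg = reg64_set(0x0123456789ABCDEF)
print(hex(y), hex(reg), hex(reg64_get(y, reg)))
```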
The multiply instructions produce a 64-bit result, and they have their own way of setting condition codes, so we define another pair of auxiliary functions.
<SPARC utilities>+= (U->) [<-D->]
fun multiply (mul, rs1, r_o_i, rd, {set_codes}) is
let val result is mul($r[rs1], r_o_i)
in Reg64.set(rd, result) | set_codes --> set_cc (result, ?, ?)
end
fun mul_cc(result) is set_cc(result, ?, ?)
Defines mul_cc, multiply (links are to index).
An undefined, rather than unspecified, value is the right thing for the V and C bits, because the manual says that ``specification of this condition code may change in a future revision to the architecture. Software should not test this condition code.''
<instruction defaults>+= (U->) [<-D->]
[u s]^mul^["" cc] (rs1, reg_or_imm, rd) is
multiply([mulu muls], rs1, reg_or_imm, rd, {set_codes is [false true]})
The SPARC processor can branch on any of 16 conditions, each of which
is a function of the 4 condition-code bits.
Here, we give the tests abstractly, without showing the functions.
The val declaration of the 16 tests hides the previous declaration
of the same identifiers as RTL operators.
<SPARC utilities ((simple))>+= [<-D->]
module IccTest is
rtlop [A N NE E G LE GE L GU LEU CC CS POS NEG VC VS] : #4 bits -> bool
val [A N NE E G LE GE L GU LEU CC CS POS NEG VC VS] is
[A N NE E G LE GE L GU LEU CC CS POS NEG VC VS] ICC : bool
val tests is {A is A, N is N, NE is NE, E is E, G is G, LE is LE, GE is GE,
L is L, GU is GU, LEU is LEU, CC is CC, CS is CS, POS is POS,
NEG is NEG, VC is VC, VS is VS}
end
Defines A, CC, CS, E, G, GE, GU, L, LE, LEU, N, NE, NEG, POS, tests, VC, VS (links are to index).
Here, we define the test conditions as specified in the manual.
We let Z, N, V, and C stand for the proper bits within
the condition code, and we use or, xor, and not
as
bit operations, not boolean operations.
Finally, after all the bit computations, we use the utility function
bool to turn the result into a boolean, so it can be used as a guard.
<SPARC utilities ((full))>+= [<-D->]
module IccTest is
val [A N NE E G LE GE L GU LEU CC CS POS NEG VC VS] is
let val [Z N V C] is [icc.Z icc.N icc.V icc.C]
val not is com -- make usage conform to manual
infixn 3 [or xor]
in bool [ 1 0
(not Z) Z
(not (Z or (N xor V))) (Z or (N xor V))
(not (N xor V)) (N xor V)
(not (C or Z)) (C or Z)
(not C) C
(not N) N
(not V) V]
end
val tests is {A is A, N is N, NE is NE, E is E, G is G, LE is LE, GE is GE,
L is L, GU is GU, LEU is LEU, CC is CC, CS is CS, POS is POS,
NEG is NEG, VC is VC, VS is VS}
end
Defines A, CC, CS, E, G, GE, GU, L, LE, LEU, N, NE, NEG, POS, VC, VS (links are to index).
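The sixteen tests in the full version can be tabulated in Python to check them against concrete condition codes. This sketch takes the four bits as integers 0 or 1 and returns booleans; the function icc_tests is a hypothetical name, and the formulas are transcribed from the chunk above.

```python
def icc_tests(n, z, v, c):
    """Evaluate all 16 branch conditions for the given condition-code bits."""
    return {
        "A": True,                     "N": False,
        "NE": not z,                   "E": bool(z),
        "G": not (z or (n ^ v)),       "LE": bool(z or (n ^ v)),
        "GE": not (n ^ v),             "L": bool(n ^ v),
        "GU": not (c or z),            "LEU": bool(c or z),
        "CC": not c,                   "CS": bool(c),
        "POS": not n,                  "NEG": bool(n),
        "VC": not v,                   "VS": bool(v),
    }


# Codes left by comparing 3 with 5 (a subtract that sets codes): negative result, borrow.
t = icc_tests(n=1, z=0, v=0, c=1)
print([name for name, holds in t.items() if holds])
```

Each test and its neighbor in the table are complements, which is why the branch conditions come in pairs.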
There is a branch instruction for each test.
The branch instructions assign to nPC, which is the SPARC model of
a delayed branch.
The code locally uses t to stand for IccTest.tests.
<instruction defaults>+= (U->) [<-D->]
b^[a n ne e g le ge l gu leu cc cs pos neg vc vs] (target, annul) is
let val t is IccTest.tests
in [t.A t.N t.NE t.E t.G t.LE t.GE t.L
t.GU t.LEU t.CC t.CS t.POS t.NEG t.VC t.VS
] --> nPC := target
end
To specify when the
instruction in the delay slot should be annulled, we define an
attribute annul_delay.
Annulling occurs only if the annul bit is set.
For the conditional branches, there is an additional requirement that
the branch not be taken.
<other attributes of instructions>= (U->)
attribute annul_delay of
b^[_ _ ne e g le ge l gu leu cc cs pos neg vc vs] (target, annul) is
let val t is IccTest.tests
in (annul <> 0 andalso
not ([t.A t.N t.NE t.E t.G t.LE t.GE t.L
t.GU t.LEU t.CC t.CS t.POS t.NEG t.VC t.VS]))
end
[ba bn] (target, annul) is annul <> 0
We use the wildcard _ to throw away the tests for A and
N, because ba and bn have special rules.
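The annulment rule reduces to a two-case function. A brief Python sketch, in which the condition is passed by mnemonic suffix and the outcome of its test as a boolean; annul_delay here is a hypothetical Python function that mirrors the attribute of the same name.

```python
def annul_delay(cond, annul, taken):
    """Return True if the delay-slot instruction is annulled."""
    if cond in ("a", "n"):
        # ba and bn: the annul bit alone decides.
        return annul != 0
    # Conditional branches: annul only when the branch is not taken.
    return annul != 0 and not taken


print(annul_delay("ne", annul=1, taken=True))    # taken conditional branch
print(annul_delay("ne", annul=1, taken=False))   # untaken conditional branch
print(annul_delay("a", annul=1, taken=True))     # branch always with annul bit
```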
<instruction defaults>+= (U->) [<-D->]
call (target) is nPC := target | Reg.out 7 := PC
jmpl (address, rd) is nPC := address | $r[rd] := PC
<trap semantics>+= (U->) [<-D->]
jmpl (address, rd) is alignTrap(address, 4)
<sparc.rtl>=
module Sparc is
import [RTL Vector]
from StdOperators import
[add and andalso bit bitInsert bool borrow carry com divu
modu muls mulu ne not or orelse quot shl shrl shra subtract sx xor zx
--> := | ; = <> + - < <= > >= ? IEEE754
]
from StdOperators.IEEE754 import
[fadd fcmp fdiv f2f f2i fabs fmulx fsqrt i2f fmul fneg fsub]
<SPARC basics>
<SPARC registers>
<register windows>
<address modes and operands>
<window dressing for trap semantics>
<SPARC utilities>
default attribute of
<instruction defaults>
<other attributes of instructions>
attribute trap of
<trap semantics>
end
All that is left to do is to define what we mean by trap.
Ideally, we would like a lambda-RTL abstraction mechanism that would let
us define trap as ``an unspecified function from a 7-bit code to
an effect.''
Because lambda-RTL lacks suitable abstraction mechanisms, we have to specify a
concrete effect.
Rather than try to specify the complete semantics of traps, we model a
trap as an assignment to a nonexistent location.
<window dressing for trap semantics>= (<-U)
storage
  't' is 1 cell of 7 bits called "trap code"
locations
  trap_code is $t[0]
fun trap k is $t[0] := k
Defines trap (links are to index).
These bindings include only the ``trap type'' of a handful of traps.
We have omitted trap priorities.
The trap instruction uses a family of codes starting at 0x80.
<window dressing for trap semantics ((full))>=
val [data_store_error
illegal_instruction
mem_address_not_aligned
tag_overflow
trap_instruction
] is [0x2b 0x02 0x07 0x0a 0x80] -- 7-bit constants to be passed to trap
val trap_instruction is \code.trap_instruction+code
Defines data_store_error, trap_instruction (links are to index).
In the simple version, we create an RTL operator for each trap code, so we can see them by name, not by number, in the processor supplement.
<window dressing for trap semantics ((simple))>=
rtlop [data_store_error
illegal_instruction
mem_address_not_aligned
tag_overflow
] : #7 bits
rtlop trap_instruction : #7 bits -> #7 bits
Defines data_store_error, trap_instruction (links are to index).
[*] The preceding sections illuminate the issues involved in describing the SPARC. Here, we describe the rest of the instruction set, with minimal commentary.
<SPARC basics>+= (<-U) [<-D->]
locations
[ g0 g1 g2 g3 g4 g5 g6 g7
o0 o1 o2 o3 o4 o5 o6 o7
l0 l1 l2 l3 l4 l5 l6 l7
i0 i1 i2 i3 i4 i5 i6 i7 ] is $r[[0..31]]
[sp fp] is [o6 i6]
<SPARC basics>+= (<-U) [<-D->]
storage
'f' is 32 cells of 32 bits called "floating-point registers"
aggregate using RTL.AGGB
'F' is 1 cell of 32 bits called "floating-point state register (FSR)"
locations
FSR is $F[0]
RD is FSR@loc[30..31] --- rounding modes
fcc is FSR@loc[10..11]
We have chosen to use the floating-point rounding operator
as rnd_RD, that is, we make it a function of the value of
the rounding-mode bits.
Strictly speaking, the proper machine-independent thing to do is to
use two functions, by making rnd depend on the IEEE
rounding modes, then use roundingMode to compute the rounding mode
from the RD bits as follows:
<SPARC basics ((full))>+= [<-D->]
locations
RD is FSR@loc[30..31]
val roundingMode is [. IEEE754.Round.nearest, IEEE754.Round.zero,
IEEE754.Round.up, IEEE754.Round.down .] sub RD
Defines roundingMode (links are to index).
Similar remarks apply to floating-point compare.
<SPARC basics ((full))>+= [<-D]
val floatingCompare is [. IEEE754.Comparison.=, IEEE754.Comparison.<,
IEEE754.Comparison.>, IEEE754.Comparison.unordered .] sub fcc
Defines floatingCompare (links are to index).
For compilation, we can get away with the scheme we're using by arranging for the RTL operators denoting the rounding modes and comparison results to have the right bit patterns for the machine we're interested in.
<SPARC basics>+= (<-U) [<-D->]
storage
'c' is cells of 32 bits called "coprocessor registers (implementation-dependent)"
aggregate using RTL.AGGB
'C' is 1 cell of 32 bits called "coprocessor status register"
locations
CSR is $C[0]
<instruction defaults>+= (<-U) [<-D->]
ld^[f df] (address, fd) is $f[fd] := [word dword] $m[address]
The full semantics of the ldfsr instruction is quite dramatic.
Luckily it's irrelevant for many purposes.
<instruction defaults>+= (<-U) [<-D->]
ldfsr (address) is FSR := $m[address]
Here's an imaginary picture of the full semantics. RTL operators of these types are flagrantly illegal.
<wishful thinking about ldfsr>=
rtlop drain_fpu_pipeline : rtl
-- `wait for all FPop instructions that have not finished execution
-- to complete' (p92)
rtlop delay : #n bits * rtl -> rtl -- delayed completion of an effect
default attribute of
ldfsr (address) is
drain_fpu_pipeline; FSR := $m[address] | delay(3, branch_fcc := fcc)
Defines delay, drain_fpu_pipeline (links are to index).
<instruction defaults>+= (<-U) [<-D->]
ld^[c dc] (address, cd) is $c[cd] := [word dword] $m[address]
ldcsr (address) is CSR := $m[address]
<trap semantics>+= (<-U) [<-D->]
ld^[c dc] (address, cd) is alignTrap(address, [4 8])
ldcsr (address) is alignTrap(address, 4)
<instruction defaults>+= (<-U) [<-D->]
st^[f df] (fd, address) is $m[address] := [word dword] $f[fd]
stfsr (address) is $m[address] := FSR
<instruction defaults>+= (<-U) [<-D->]
st^[c dc] (cd, address) is $m[address] := [word dword] $c[cd]
stcsr (address) is $m[address] := CSR
ldnext
and stnext, which take a register number but load and store the
adjacent register.
To refer to the floating-point register pair
(fd,fd+1),
I simply cast $f[fd] into a 64-bit location.
<instruction defaults>+= (<-U) [<-D->]
ldnext(fd, address) is ($f[fd] : #64 loc)@loc[0..31] := $m[address]
stnext(fd, address) is $m[address] := ($f[fd] : #64 loc)@bits[0..31]
I must temporarily hack ldnext because I can't generate C code
that assigns to a slice.
The hack must be voided because we can't deal with temporaries here.
<instruction defaults>+= (<-U) [<-D->]
-- ldnext(fd, address) is $f[fd+1] := $m[address]
imm22 into a word of all zeroes.
<instruction defaults>+= (<-U) [<-D->]
sethi(imm22, rd) is $r[rd] := bitInsert {wide is 0, lsb is 10} imm22
<instruction defaults>+= (<-U) [<-D->]
[sll srl sra] (rs1, reg_or_imm, rd) is
$r[rd] := [shl shrl shra](32, $r[rs1], reg_or_imm@bits[5 bits at 0])
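The sethi and shift specifications above can be checked with a few lines of Python. This is a sketch under the assumption that registers hold unsigned 32-bit integers; only the low 5 bits of the shift count are used, as in the chunk above, and the function names simply reuse the instruction mnemonics.

```python
MASK32 = 0xFFFFFFFF


def sethi(imm22):
    """Insert a 22-bit immediate at bit 10 of a word of zeroes."""
    return (imm22 & 0x3FFFFF) << 10


def sll(x, count):
    return (x << (count & 31)) & MASK32


def srl(x, count):
    return (x & MASK32) >> (count & 31)


def sra(x, count):
    # Reinterpret as signed so Python's >> replicates the sign bit.
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> (count & 31)) & MASK32


value = 0x12345678
high = sethi(value >> 10)                 # upper 22 bits
print(hex(high | (value & 0x3FF)))        # or in the low 10 bits
print(hex(sra(0x80000000, 31)), hex(srl(0x80000000, 31)))
```

The first print shows the usual two-instruction idiom for building a 32-bit constant: sethi supplies the upper 22 bits and a following or supplies the low 10.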
<SPARC utilities ((simple))>+= [<-D->]
rtlop tag_overflows : #32 bits * #32 bits -> bool
Defines tag_overflows (links are to index).
<SPARC utilities ((full))>+= [<-D->]
fun tag_overflows(left, right) is
  left@bits[0..1] <> 0 orelse right@bits[0..1] <> 0 orelse add_overflows(left, right, 0)
Defines tag_overflows (links are to index).
<instruction defaults>+= (<-U) [<-D->]
[taddcc taddcctv] (rs1, reg_or_imm, rd) is
add_instruction(rs1, reg_or_imm, 0, rd, {set_codes is true})
<trap semantics>+= (<-U) [<-D->]
taddcctv (rs1, reg_or_imm, rd) is
(tag_overflows($r[rs1], reg_or_imm) orelse add_overflows($r[rs1], reg_or_imm, 0)
--> trap(tag_overflow))
<SPARC utilities ((simple))>+= [<-D]
rtlop sub_overflows : #32 bits * #32 bits * #1 bits -> bool
Defines sub_overflows (links are to index).
<SPARC utilities ((full))>+= [<-D]
fun sub_overflows (x, y, b) is
let val {result, borrow} is subtract(x, y, b)
in x@bits[31] <> y@bits[31] andalso x@bits[31] <> result@bits[31]
end
Defines sub_overflows (links are to index).
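The sign-bit test in sub_overflows can be checked with a small Python model. The `subtract` helper below is an assumption made for illustration: it is taken to return the 32-bit difference x - y - b together with a borrow-out bit, which is how the record pattern `{result, borrow}` reads.

```python
MASK32 = 0xFFFFFFFF

def subtract(x: int, y: int, borrow: int):
    """Assumed model: 32-bit x - y - borrow, plus a borrow-out bit."""
    diff = (x & MASK32) - (y & MASK32) - borrow
    return diff & MASK32, 1 if diff < 0 else 0

def sub_overflows(x: int, y: int, b: int) -> bool:
    result, _borrow = subtract(x, y, b)
    sign_x = (x >> 31) & 1
    sign_y = (y >> 31) & 1
    sign_r = (result >> 31) & 1
    # Signed overflow: operands differ in sign and the result's sign
    # differs from that of the minuend.
    return sign_x != sign_y and sign_x != sign_r
```

Subtracting 1 from the most negative value (`0x80000000`) overflows, while `5 - 3` does not.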
<SPARC utilities>+= (<-U) [<-D->]
fun subtract_instruction (rs1, operand2, borrow, rd, {set_codes}) is
let val {result, borrow is borrow'} is subtract($r[rs1], operand2, borrow)
in $r[rd] := result
| set_codes -->
set_cc(result, bit(sub_overflows($r[rs1], operand2, borrow)), borrow')
end
Defines subtract_instruction (links are to index).
The explicit fetch is here because of a flaw in the type-inference algorithm currently used by lambda-RTL. This algorithm is slated for replacement.
<instruction defaults>+= (<-U) [<-D->]
[sub subcc subx subxcc] (rs1, reg_or_imm, rd) is
subtract_instruction(rs1, reg_or_imm, [0 carryBit], rd, {set_codes is [false true]})
<instruction defaults>+= (<-U) [<-D->]
[tsubcc tsubcctv] (rs1, reg_or_imm, rd) is
subtract_instruction(rs1, reg_or_imm, 0, rd, {set_codes is true})
<trap semantics>+= (<-U) [<-D->]
tsubcctv (rs1, reg_or_imm, rd) is
(tag_overflows($r[rs1], reg_or_imm) orelse sub_overflows($r[rs1], reg_or_imm, 0)
--> trap(tag_overflow))
N xor V into $r[rs1].
$r[rd].
$r[rs1].
shiftIn function to shift a bit into a 32-bit register.
<instruction defaults ((full))>= [D->]
mulscc(rs1, reg_or_imm, rd) is
let fun shiftIn (n, bit) is bitInsert {wide is shrl(32, n, 1), lsb is 31} bit
val multiplier is reg_or_imm -- step 1
val v2 is shiftIn($r[rs1], xor(icc.N, icc.V)) -- step 2
val add_args is
(v2, if Y@bits[0] = 0 then 0 else multiplier fi, 0) -- step 3
val {result, carry} is add add_args
in $r[rd] := result | -- step 4
set_cc(result, bit(add_overflows add_args), carry) | -- step 5
Y := shiftIn(Y, $r[rs1]@bits[0]) -- step 6
end
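The six steps of mulscc can be traced with the Python sketch below. It is illustrative only: `mulscc_step` and `shift_in` are hypothetical names, step 3 follows the architecture manual (the shifted value is added to the multiplier when the low bit of Y is 1, and to zero otherwise), and set_cc is assumed to derive N and Z from the result while taking V and C from its other two arguments.

```python
MASK32 = 0xFFFFFFFF

def shift_in(n: int, bit: int) -> int:
    # Shift n right by one and insert `bit` at bit 31.
    return ((n & MASK32) >> 1) | ((bit & 1) << 31)

def mulscc_step(rs1_val: int, operand2: int, y: int, n_flag: int, v_flag: int):
    multiplier = operand2 & MASK32                 # step 1
    v2 = shift_in(rs1_val, n_flag ^ v_flag)        # step 2
    addend = multiplier if (y & 1) else 0          # step 3
    total = v2 + addend
    result = total & MASK32                        # step 4: value for $r[rd]
    carry = total >> 32
    overflow = (v2 >> 31) == (addend >> 31) and (v2 >> 31) != (result >> 31)
    icc = {                                        # step 5 (assumed set_cc behavior)
        "N": result >> 31,
        "Z": int(result == 0),
        "V": int(overflow),
        "C": carry,
    }
    new_y = shift_in(y, rs1_val & 1)               # step 6
    return result, new_y, icc
```

Each call models one multiply step; a full multiplication would repeat the step once per multiplier bit, feeding the returned result, Y, and condition codes into the next call.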
<SPARC utilities>+= (<-U) [<-D]
rtlop [sparc_udiv_overflow sparc_sdiv_overflow] : #64 bits * #32 bits -> #1 bits
fun divide({signed}, rs1, r_o_i, rd, {set_codes}) is
let val (operator, overflow) is if signed then ((quot), sparc_sdiv_overflow)
else ((divu), sparc_udiv_overflow)
fi
val result is operator(Reg64.get rs1, r_o_i)
val V is overflow(Reg64.get rs1, r_o_i)
in $r[rd] := result | set_codes --> set_cc (result, V, 0)
end
Defines divide, sparc_sdiv_overflow, sparc_udiv_overflow (links are to index).
<instruction defaults>+= (<-U) [<-D->]
[u s]^div^["" cc] (rs1, reg_or_imm, rd) is
divide({signed is [false true]}, rs1, reg_or_imm, rd, {set_codes is [false true]})
Extra parentheses are needed because the division operators are normally infix.
To define the floating-point branches, we emulate the specification in the manual. One day we'll define extra attributes to cope with the fact that a delay is required between setting condition codes and testing them with this sort of instruction.
<instruction defaults ((full))>+= [<-D]
fb^[a n u g ug l ul lg ne e ue ge uge le ule o] (target, annul) is
let val [L E G U] is
fcc = [(IEEE754.Comparison.<) (IEEE754.Comparison.=)
(IEEE754.Comparison.>) (IEEE754.Comparison.unordered)]
val or is (orelse)
infixl 2 or
in [true false U G
(G or U) L (L or U) (L or G)
(L or G or U) E (E or U) (E or G)
(E or G or U) (E or L) (E or L or U) (E or L or G)
] --> nPC := target
end
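The table of tests above can be restated as data. In this Python sketch each mnemonic maps to the set of floating-point relations for which the branch is taken; the relation names L, E, G, and U follow the bindings in the chunk, while `FB_TAKEN` and `fb_taken` are hypothetical names. The numeric encoding of fcc is deliberately not modeled.

```python
L, E, G, U = "L", "E", "G", "U"  # less, equal, greater, unordered

# Relations under which each floating-point branch is taken,
# listed in the same order as the mnemonics in the specification.
FB_TAKEN = {
    "fba":   {L, E, G, U},   # always
    "fbn":   set(),          # never
    "fbu":   {U},
    "fbg":   {G},
    "fbug":  {G, U},
    "fbl":   {L},
    "fbul":  {L, U},
    "fblg":  {L, G},
    "fbne":  {L, G, U},
    "fbe":   {E},
    "fbue":  {E, U},
    "fbge":  {E, G},
    "fbuge": {E, G, U},
    "fble":  {E, L},
    "fbule": {E, L, U},
    "fbo":   {E, L, G},
}

def fb_taken(mnemonic: str, relation: str) -> bool:
    """True if the branch is taken for the given comparison outcome."""
    return relation in FB_TAKEN[mnemonic]
```

Writing the table this way makes the pairing visible: fbne is the complement of fbe, and fbo is the complement of fbu.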
If we don't want to see the details of the tests, we can make them abstract RTL operators.
<instruction defaults ((simple))>=
fb^[a n u g ug l ul lg ne e ue ge uge le ule o] (target, annul) is
let rtlop [fba fbn fbu fbg fbug fbl fbul fblg
fbne fbe fbue fbge fbuge fble fbule fbo] : #2 bits -> bool
in [fba fbn fbu fbg fbug fbl fbul fblg fbne fbe fbue fbge fbuge fble fbule fbo] fcc
--> nPC := target
end
rett is slightly different from restore in that there's no
destination register, so none of the assignments need be guarded.
<instruction defaults>+= (<-U) [<-D->]
rett (address) is
let fun restoreOut n is Reg.out n := Reg.in' n
fun restoreLocal n is Reg.local' n := $w[winptr-8 +zx n]
fun restoreIn n is Reg.in' n := $w[winptr-16+zx n]
in do8 restoreOut | do8 restoreLocal | do8 restoreIn | winptr := winptr - 16 |
S := PS | ET := 1 | nPC := address
end
<instruction defaults>+= (<-U) [<-D->]
t^[a n ne e g le ge l gu leu cc cs pos neg vc vs] (address) is RTL.SKIP
<trap semantics>+= (<-U) [<-D->]
t^[a n ne e g le ge l gu leu cc cs pos neg vc vs] (address) is
let val t is IccTest.tests
in [t.A t.N t.NE t.E t.G t.LE t.GE t.L
t.GU t.LEU t.CC t.CS t.POS t.NEG t.VC t.VS
] --> trap(trap_instruction address@bits[7 bits at 0])
end
<instruction defaults>+= (<-U) [<-D->]
rd^[y psr wim tbr] (rd) is $r[rd] := [Y PSR WIM TBR]
<instruction defaults>+= (<-U) [<-D->]
wr^[y psr wim tbr] (rs1, reg_or_imm, rd) is [Y PSR WIM TBR] := xor($r[rs1], reg_or_imm)
<SPARC basics>+= (<-U) [<-D]
storage 'b' is 1 cell of 1 bit called "store-barrier-pending flag"
locations store_barrier_pending is $b[0]
<instruction defaults>+= (<-U) [<-D->]
stbar () is store_barrier_pending := 1
<instruction defaults>+= (<-U) [<-D->]
unimp (imm22) is RTL.SKIP
<trap semantics>+= (<-U) [<-D]
unimp(imm22) is trap(illegal_instruction)
StdOperators.IEEE754
<instruction defaults>+= (<-U) [<-D->]
f ^[s d q]^toi (fs2, fd) is $f[fd] := f2i([word dword qword] $f[fs2],
IEEE754.Round.zero)
fito^[s d q] (fs2, fd) is $f[fd] := [word dword qword] (i2f($f[fs2], RD))
fsto^[d q] (fs2, fd) is $f[fd] := [dword qword] (f2f(word $f[fs2], RD))
fdto^[s q] (fs2, fd) is $f[fd] := [word qword] (f2f(dword $f[fs2], RD))
fqto^[s d] (fs2, fd) is $f[fd] := [word dword] (f2f(qword $f[fs2], RD))
<instruction defaults>+= (<-U) [<-D->]
fmovs (fs2, fd) is $f[fd] := $f[fs2]
fnegs (fs2, fd) is $f[fd] := fneg $f[fs2]
fabss (fs2, fd) is $f[fd] := fabs $f[fs2]
<instruction defaults>+= (<-U) [<-D->]
fsqrt^[s d q] (fs2, fd) is $f[fd] := fsqrt [#32 #64 #128] ($f[fs2], RD)
<instruction defaults>+= (<-U) [<-D->]
f^[add sub mul div]^[s d q] (fs1, fs2, fd) is
$f[fd] := [fadd fsub fmul fdiv] [#32 #64 #128] ($f[fs1], $f[fs2], RD)
The ``exact multiply'' instructions do not round, and it is easiest to give the size of arguments and results explicitly.
<instruction defaults>+= (<-U) [<-D->]
fsmuld (fs1, fs2, fd) is $f[fd] := fmulx #32 #64 ($f[fs1], $f[fs2])
fdmulq (fs1, fs2, fd) is $f[fd] := fmulx #64 #128 ($f[fs1], $f[fs2])
The fcmp* and fcmpe* families have the same standard semantics but different trap semantics. We haven't specified the trap semantics yet.
<instruction defaults>+= (<-U) [<-D]
fcmp ^[s d q] (fs1, fs2) is fcc := fcmp [#32 #64 #128] ($f[fs1], $f[fs2])
fcmpe^[s d q] (fs1, fs2) is fcc := fcmp [#32 #64 #128] ($f[fs1], $f[fs2])