Most machines, from the simplest hammers to the most sophisticated appliances,
- have a finite set of possible operations (e.g., a car can accelerate, turn, and decelerate)
- use control input to determine the operation to use (e.g., the steering wheel and pedals provide input to a car)
- interact with data to determine the results of the operation (e.g., the pitch of the roadway)
- result in modification of some part of their environment (e.g., the position and velocity of the car)
A general-purpose computing machine is a special machine where the control, data, and modified environment are all the same. This allows one to have the computer modify its own instructions, expanding the capability of the finite set of core operations to perform arbitrarily complex work. General-purpose computing machines have become so pervasive that they are now commonly called simply “computers,” a term formerly reserved for a human occupation.
The first known general-purpose computer to be designed was Charles Babbage’s Analytical Engine, designed and partially constructed in the 1830s. Although Babbage designed it, it was Ada King-Noel, Countess of Lovelace, who first realized it was general-purpose and published the first algorithm for it, thus making Babbage the first computer engineer and Lovelace the first computer scientist. Babbage’s machine was never completed and no subsequent work built on Babbage and Lovelace’s pioneering efforts.
In 1933 Kurt Gödel and Jacques Herbrand defined a class of general recursive functions; in 1936 Alonzo Church expanded on this to define the lambda calculus; and later in 1936 Alan Turing defined an abstract notion of a universal computing machine, later known as the Turing Machine. In his 1939 Ph.D. dissertation Turing (with his Ph.D. advisor Alonzo Church) proposed that these three are all equivalent, and that no more computationally-powerful machine can exist. Known as the Church-Turing thesis, the postulate that no more powerful machine can exist cannot be proven but is generally acknowledged as true. Of the three models, Turing’s had the advantage that Babbage’s had had 100 years earlier of being a description of a plausibly constructable machine, but was not realistic enough to actually be constructed until decades later, when people started creating them as historical novelties.
Although many people were involved in the development of functional universal computing machines1, the clearest progenitor to how most computers are designed is John von Neumann, as described in the 1945 publication First Draft of a Report on the EDVAC. This model is the subject of our next section.
The von Neumann architecture consists of the following parts:
The memory unit, which acts like an array or list of numbers. Given an address (an index into the array) it gives back the number stored there; given an address and a number, it stores the number at that address.
- The control unit, which has several parts:
- A program counter (sometimes also known as an instruction pointer), a register that holds the address where the next instruction is to be found in memory;
- an instruction register that holds the instruction currently being executed;
- logic to adjust the program counter based on its current contents and the contents of the instruction register; and
- logic to control what the ALU does and how to send data to and from memory based on the contents of the instruction register.
- The arithmetic logic unit (usually abbreviated as ALU), consisting of
- A small number of program registers for storing operands and results; and
- a set of logical gates implementing basic arithmetic operations using those registers as inputs and outputs.
- Ability to communicate with external devices, such as disks, keyboards, screens, etc.
Exercise: Write the basic von Neumann architecture simulator in the programming language of your choice. You should include:
A list or array of 256
0s as a representation of
memory, in such a way you can easily change this to have different specific values inside it.
A list or array of 4
0s as a representation of the program
irto store the contents of the program counter and instruction register, respectively.
newPCfunction that returns the value the
pcshould have next based on the current
ir; you can just return
workfunction that uses the contents of the
irto decide what to do at each step; you can have it do nothing at all for now.
cyclefunction that does the following:
- simulate a rising clock edge by
- loading register inputs into registers
- simulate a rising clock edge by
runfunction that repeatedly:
- prints the value of
- prints the value of
- stops if
pcis negative or larger than
- prints the value of
If you run the
run function you should see something like
pc = 0 ir = 0; r0=0, r1=0, r2=0, r3=0
We’ll come back to this basic simulator and flesh it out in subsequent sections.
The core idea of a general-purpose computer is that the contents of memory describe the actions to take. But memory just stores bytes, and there is no intrinsic right set of things to do on these bytes. Various computational theorists have proven than even very limited sets of operations can be “complete” in the sense that they can be used to emulate everything else, but the general model is to have a range of actions that let programmers easily make the computer do what they wish.
The set of instructions a computer provides, together with how those instructions are encoded, defines the computer’s Instruction Set Architecture or ISA.
The most common three categories of actions are
- moves (corresponding to assignment operators in most programming languages),
- maths (corresponding to mathematical operators in most programming languages), and
- jumps (corresponding to control constructs in most programming languages).
One of the pillars of the imperative programming paradigm2 is the assignment operator
=. All it does is take a value which exists in one place and put it into another place, but as you have already seen that operation allows a great deal of flexibility.
When building a machine, making circuitry that is as general as programming’s
= is not trivially possible. Recall that our
work() function must decide what to do based only on information it can see, generally meaning the number stored in
register = register is going to have to have a different number for it than is
register = register; and both will be different from
register = memory or
register = memory[register].
That said, once we enumerate all of the moves we think are important, we can make them all happen in
work() with a long but simple
if-statement, or equivalently with a large mux in hardware.
=, the next building block of most imperative programming languages is binary operators. These are big networks of gates:
+ we’ve already looked at, and
- is very similar;
* is basically one
+ for each bit in the input numbers; and
%, while quite a bit more complicated still, can still be handled with a lot of gates.
There are other operators we’ll need too, like
>= and its friends, and boolean operators as well, but in the end they’re all just a bunch of gates.
To make the machine use them, though, we have an additional complexity. If there were a lot of ways to combine a source and destination for moves, there are even more ways to combine two sources and a destination for maths. Even using just 8 registers, there are 8^3^ = 512 possible combinations of
register[?] = register[?] + register[?], and more when we get to operands in memory or literals.
That said, the
work() for maths is not dramatically different from the
work() for move; there is a lot more circuitry in
if (ir == 12345): register = register * register than in
if (ir == 12345): register = register, but the logic needed to make it work is still basically a big mux.
While moves and maths have direct correlation to components of common programming languages, jumps are somewhat less direct.
The concept of a jump is a program instruction that specifies the new
pc as something other than the next instruction. These can enable loops:
|repeat forever:||1. do stuff|
|do stuff||2. more stuff to|
|more stuff too||3. run instruction 1 next|
|if x < y:||1. if not(x < y) skip to 3|
|do stuff||2. do stuff|
|more stuff too||3. more stuff too|
and also, by combining those, all the other kind of control constructs:
while loops are loops with a conditional redirection, function calls and returns are (mostly just) jumps to and from the function code, etc.
There is no
work() in most jumps; instead, they add a mux to the
There is no one “right” way to encode instructions into binary, but it is common to do this by having particular sets of bits with particular meanings. For example, we can specify one of 4 registers using 2 bits; one of 8 operations using 3 bits; etc. Thus, a simple way to create an instruction encoding is to figure out all the information any action will need, figure out how many bits we need to encode that, and then reserve a set of bits in the instruction for that information.
Example: Suppose we want the following operations:
- Move a literal value into a register
- Jump to a specific address if a given register is non-zero
- Move into a register a value found at a memory address stored in another register
- Move a value from a register into a memory address stored in another register
a = a op bwhere
opis one of
That’s eight operations, so we need 3 bits to encode which one is which. We also need 4 bits to encode up 2 registers and 8 more to encode a literal value, for a total of 15 bits. 15 bits fits into 2 bytes, so we can make our encoding be the following:
Look up the high-order 3 bits of the first byte on the following table:
move a literal into a register
jump to address if register non-zero
move from memory to register
move from register to memory
add one register to another
subtract one register from another
multiply one register by another
set one register to 1 if it is less than another, 0 otherwise
The low-order 4 bits of the first byte give two registers; if the action changes a register, that is given by the higher-order two bits; if a register is not changed, it is given in the lower-order two bits.
The second byte is the literal value (for
000) or address (for
This is of course not the only encoding we could make, but it does contain all the needed information.
Using this encoding, to encode
x = 3 while x < 50 x *= x
we’d first translate that into steps our example ISA has access to
- r0 = 3
- r1 = 50
- r0 = r0 × r0
r2 = r0
since we don’t have this operation, we’ll do it as
a. r2 = 0 b. r2 = r2 + r0
- r2 = 1 if r2 < r1 else 0
- if (r2 ≠ 0) jump to step 3
and then encode those instructions into binary (using
_ for unspecified bits
000 00__ 00000011 000 01__ 00110010 110 0000 000 1000 00000000 100 1000 111 1001 001 10__ ????????
Finally we have to decide where in memory we are putting this so we know what value to put into the
???????? jump address. Let’s put it at address 0, so the index of the multiply instruction is 4
Now we assume all
_ are 0, prepend a 1 bit to each instruction, and turn that into hex so we can put it into a simulator and get the following contents of memory:
0x80,0x03, 0x84,0x32, 0xE0, 0x88,0x00, 0xC8, 0xF9, 0x98,0x04
Exercise: Create an encoding for the following code that computes
r1 = r2 / r3 by implementing the following:
r1 = 0 while r1 * r2 < r3 r1 += 1
We can change the loop to a jump by doing
r1 = -1
r1 += 1
r1 * r2 < r3, go to step 2 again
and reduce that to the few operations we have available to us
- r2 = a numerator
- r3 = a denominator
- r1 = −1
- r0 = 1
- r1 = r1 + r0
- r0 = r1
- r0 = r0 × r2
- r0 = 1 if r0 < r3 else 0
- if (r0 ≠ 0) jump to step 4
Encode the above sequence of instructions into bytes. Remember to use the two’s complement encoding of −1.
Exercise: Modify your simulator from the last exercise to implement the example instruction set encoding listed above. In particular,
- Refer to the section on Masks to remember how to get specific bits out of a number in code.
loop()we’ll still have
memory[pc](the first byte only) for simplicity.
newPCwill generally return
pc + 2since instructions are 2 bytes long; but if the top three bits of
000and the register indicated by the low-order two bits of
0, it will return the second byte of the instruction (i.e.,
work()will need to consider the high-order three bits of
irand act as per the table in the example above.
Verify that your simulator runs the two example programs above. The code from the example should put 81 (0x51) into
r0; the code from the exercise should put
r2 / r3 into
Not all instructions need the same amount of information to be represented. Should instructions that need less information take up fewer bytes than instructions that need a lot of information?
For such a simple question, there does not appear to be a simple answer. Fixed-length instructions make some of what the computer does simpler, theoretically providing cost and power benefits; variable-length instructions make more efficient use of memory, also theoretically providing cost and power benefits. Intel and AMD are known for variable-length instruction sets, while IBM and Motorola have more fixed-length instruction sets.
Exercise: Modify your simulator so it increases the
pc by 2 only if the instruction needs 2 bytes (i.e.,
001); otherwise only increase the
pc by 1.
This should let us shorten our example program
0x00,0x03, 0x04,0x32, 0xC0,0x00, 0x68,0x00, 0xE9,0x00, 0x22,0x04
0x00,0x03, 0x04,0x32, 0xC0, 0x68, 0xE9, 0x22,0x04
removing 3 of the original 12 bytes (a 25% space saving). How much space does this save in the division program?
How should we be able to represent operands of operations? Many different representations are useful, leading to a a diversity of addressing modes:
- The most fundamental and universal addressing mode is to operate on values stored in registers.
- Immediate addressing refers to putting the value directly in the instruction. This roughly correlates to using literals in source code.
Immediate addressing is never used for destinations of operations. Theoretically it would be possible to store a result into the program memory itself, but this is not done by any mainstream ISA.
- Because access to memory is generally through a low-capacity interface, most ISAs limit at most one operand to be in memory and some limit all memory addressing to move operations only.
There are multiple sub-types of memory addressing. The three basic building-blocks are:
- Immediate address
- The address is available as a literal value in the code. This is useful for encoding global variables and function addresses.
- Register address
- The address is available as the contents of a program register. This is useful for passing references to mutable values in variables.
- Relative address
- For various reasons (space efficiency, relocatable code, etc) it is sometimes convenient to compute addresses relative to some reference address, such as the current program counter or a dedicated base address register.
These three blocks can be usefully combined in various ways. For example, one common addressing mode in Intel’s x86 architecture is “Relative Immediate1 + Register1 + Register2 × 2Immediate2”, where Immediate2 is limited to numbers no larger than 4. While that may seem strangely specific and ideosyncratic, it is useful for several common code patterns, such as
Picking the right set of addressing modes in an ISA is challenging. Fixed-length encodings in particular often face problems in these, as something as simple as a plain immediate value requires that instructions contain more bits than does any legal constant. To void extra code bloat to make every instruction large enough to fit a full-length immediate value, mainstream ISAs that use fixed-length encoding generally can represent only a small subset of integer literals as immediate values and require all others to be constructed by instruction sequences, sometimes adding customized instructions to enable that construction to be relatively efficient. Variable-length ISAs do not suffer from that limitation, but generally have quite complicated rules on how to determine the meaning of each instruction anyway so there is little gain in clarity.
In addition to program registers and memory, processors generally store additional information. We’ve already seen a few of these: the PC and IR are registers but not program registers because they cannot be directly addressed by programs. Modern processors have hundreds of other similar hidden support state, a few of which are important enough to deserve additional conversation.
Many processors have a set of registers set automatically by some operations, called condition codes. The can be used to determine how certain other operations behave.
In the x86 series of ISAs, for example, every mathematical operation not only computes its resulting value but also sets several condition code bits that describe the result of the computation. These are used to determine if all conditional jumps (and other conditional operations) are executed or not. Thus instead of writing
r0 = 1 if r0 < r3 else 0 if (r0 ≠ 0) jump to step 4
in x86 you’d write the equivalent of
r0 -= r3 if (last result < 0) jump to step 4
The ARM family of ISAs also use similar condition codes, but unlike x86 where only a few instructions are conditional, in ARM 7 almost every instruction is conditional.
Other ISAs, such as PowerPC and MIPS, do not make as extensive use of condition codes, instead using explicit condition operands in conditional operations. Even these still have some condition codes, however, to track things like integer overflow and other unusual conditions.
There was a period in which enough code was written directly in machine language that hardware developers chose to start adding higher-level abstractions directly into their ISAs. While all of these still exist in older ISAs like x86, most are no longer much used. One, however, is very pervasive: the stack operations.
The concept of a stack, as implemented by the x86 ISA, is easier to explain in code than words:
stack_memory = big contiguous chunk of memory stack_pointer = length_of(stack_memory) how to push(value): 1. stack_pointer -= 1 2. stack_memory[stack_pointer] = value how to pop(): 1. result = stack_memory[stack_pointer] 2. stack_pointer += 1 3. return result
Example: Consider the following code:
push(3) push(4) push(5) push(6) pop() x = pop() pop()
At the end of the code
x contains the value
5 and the stack still contains one value (the
Exercise: Consider the following code:
push(3) push(4) x = pop() push(5) y = pop() z = pop()
What are in
z at the end of the code?3
In x86-64, the
stack_pointer in the above pseudocode is a program register,
rsp. The ISA instruction
pop rax does the equivalent of
rax = pop() in the above pseudocode: that is, it reads a value from memory, stores that value in
rax, and also increases the value in
pop, as well as related instructions like
ret, are extremely prevalent in code in the wild today.
For various reasons, some historical and some performance-based, some instructions assume particular registers are part of their operation. This is particularly common in x86 and x86-64.
Example: Intel’s ISA manual, volume 2, defines the 64-bit version of the
div instruction as dividing
rdx:rax by the source operand and storing the quotient in
rax, the remainder in
rdx. In other words, this is a 128-bit integer divided by a 64-bit integer with both result and remainder returned, using three source registers and two destination registers; but only one of the source registers is able to be specified.
This results in assembly code for
answer = numerator / denominator looking something like
- move numerator into
- zero out
It is also fairly common to have the computer processor logically or physically divided into distinct subprocessors, each with a specialized set of instructions and registers. x86, for example, uses separate registers for floating-point maths than for integer maths, and has more registers for various vector operations and so on.
In the 1990s it was common for computer architecture textbooks to describe two kinds of ISAs: the RISC (reduced instruction set) and the CISC (complex instruction set). These do not describe any major production ISAs, but are still common terms in computer architecture discussions. A summary follows:
|RISC||CISC||Today we see|
|Fixed-length encoding||Variable-length encoding||Both|
|Few, simple instructions||Many, complicated instructions||Mostly CISC|
|Few addressing modes||Many addressing modes||More CISC|
|Uniform operand notation||Idiosyncratic operand notation||Both|
|Many program registers||Few program registers||More RISC|
|Clean new design||Backwards-compatible design||Both|
|All program registers are interchangeable||Some operations only work with specific registers||More CISC|
|1 instruction = 1 cycle||some instructions take several cycles||CISC|
|Immediate values cannot represent all numbers||Immediate values can represent all numbers||Both|
|All maths operands are registers||One maths operand may be in memory||Both|
|Some instructions may not follow others||No limitations on instruction ordering||More CISC|
|Conditions are in registers||There are special “condition codes”||More CISC|
|Function arguments usually passed in registers||Function arguments usually passed on the stack||More RISC|
There are also architectures with significant design decisions beyond those in this table, such as the compiler-controlled instruction scheduling of the Itanium architecture, but they are beyond the scope of this course.
Executing most instructions involved several steps, some of which depend on others. For example, the
push instruction involves the following:
While each instruction has its own dependency graph of steps, they can all be fit into the following sequence:
- Fetch an instruction from memory
- Decode what the instruction intends to do
- Retrieve values from the register file
- Execute any math or logic operations the instruction requires
- Access memory
- Update values in the register file
Naming the steps.
Every computer architecture text I have seen names the steps in the above sequence, but the exact set of steps and names, as well as the set of work included in each step, varies. The names “Fetch”, “Decode”, and “Execute” are almost always included, but which one includes accessing program registers varies. Some texts also name steps “Memory”, “Writeback”, and/or “PC Update”.
Official ISA documentation also tends to use these names in some form, though often with additional steps as well.
This decomposition of work into steps is not merely academic; processor throughput of instructions can be dramatically increased by pipelining these steps. “Pipeline” is the name used in hardware design for the manufacturing notion of an assembly line: when the part of the chip that fetches instructions finishes fetching the first instruction it passes it off to the next part of the chip and proceeds to fetch the second instruction. Thus, at any given time the processor might have many instructions in different stages of completion.
There are many additional issues that arise when pipelining. A few of the more common techniques used to resolve these issues include:
- Instruction dependencies may need to cause one stage to stand idle, waiting for an earlier instruction in a later stage to finish.
- One instruction may grab information from a half-completed instruction further down the pipeline.
- A later instruction may be executed before an earlier one to give otherwise-stalled stages something to do.
There is indirect evidence that many of them were anonymous women: many women in part because most people with the profession “computer” were women and because there was a shortage of male labor during the second world war when computers were being pioneered; anonymous because the culture of the day did not generally acknowledge female input or provide scope for their leadership. ↩
It is likely that if you have learned to program but have not learned about programming paradigms, you learned a variant of the imperative paradigm. There are other approaches to programming that do not depend on variable assignment, the two most famous being the functional paradigm (where recursion is the key to expressivity) and the logical paradigm (where power is obtained from clever application of horn clause reduction rules in formal logic). Computer hardware that uses these paradigms directly is unknown to the author of this text. ↩
Answer: x = 4, y = 5, z = 3 ↩