Computer Architecture (Exam Revision)
1.1 The von Neumann Architecture
Four basic hardware components:
- Central processing unit (CPU)
- Main memory
- Input devices
- Output devices
These components are all connected to each other by the bus, which is simply a group of wires connecting the components.
1.4 CPU - controls and performs data processing
The CPU consists of an ALU, which performs data processing, and a controller, which controls program execution. A program is a sequence of instructions stored in main memory, and the computer executes it using the fetch-execute cycle:
- fetch - an instruction from memory
- execute - the instruction, with steps including
  - decode - the instruction (in binary form)
  - execute - the instruction by sending control signals to the ALU and other parts of the computer
  - send - the result back to memory and/or an output device
The bus provides for communication among the CPU, main memory and IO devices. During the execution of a program, three sorts of data flow along the bus:
- address bus - for passing the addresses of instructions and data in memory
- data bus - for transferring the instructions and data themselves
- control bus - for issuing control signals such as reading or writing
2.x Computer generations
- Vacuum tubes, 1945-55
- Transistors, 1955-65
- ICs, 1965-80
- VLSI, 1980-present
3.0 Computer levels
Level 7 - Applications level
The computer is viewed as a machine which performs a specific function. A user at this level shouldn't be concerned with any details of computer science or engineering.
Level 6 - High-level language level
The user writes programs in high-level languages such as C++, Java or C#, and a compiler converts the high-level language instructions into lower-level machine code or assembly code.
The computer at this level is also called a virtual machine, as it appears to the user as if the computer directly executes programs written in a high-level language.
Level 5 - Assembly language level
The computer is viewed as being run by a program of so-called assembly-level instructions: instructions that explicitly move data around between physical memory addresses, registers and I/O ports, and instructions that transform or combine their contents.
To be effective at this level, the user must have a model of the machine, in particular its instruction set. Assembly language is translated into machine language using an assembler.
Level 4 - Operating system level
At this level, OS software is in charge of running assembly language programs as well as handling memory management, process scheduling and so on.
Level 3 - Machine language level
Same sort of instructions as level 5, except they are coded as numbers instead of letters. They are ready at this stage for execution in the processor. Human programmers have greater difficulty understanding code written at this level.
Level 2 - Microprogram level
This level is concerned with the underlying engine that implements individual level 3 instructions by appropriately directing the components and their interconnections at that level.
Level 1 - Digital logic level
The level of gates and memory cells. The language is made up of 1s and 0s, i.e. binary numbers.
1.2 Transistor used as a switch
Transistors operate in a so-called switching mode.
1.3 Capacitor as computer memory
A capacitor is an electrical device which can be used to store a charge by application of a voltage. The simplest type of capacitor consists of two parallel plates with a voltage applied across them. A capacitor is characterized by its ratio of charge stored to applied voltage; this ratio is called capacitance and is measured in farads.
Two different physical states: 1 - Charged and 0 - discharged
2.3 Representing integer numbers in binary format
1. Divide decimal number by 2 and record the remainder
2. Keep dividing the quotient obtained by 2, recording each remainder, until the new quotient is 0
3. Concatenate remainder values in reverse order to get the value
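The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `to_binary` and the `width` parameter (used to pad the answer to a fixed length) are assumptions made for this example, not part of the notes.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Convert a non-negative integer to a binary string by repeated division."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    remainders = []
    while True:
        n, r = divmod(n, 2)          # step 1: divide by 2 and record the remainder
        remainders.append(str(r))
        if n == 0:                   # step 2: stop once the quotient is 0
            break
    bits = "".join(reversed(remainders))  # step 3: read the remainders in reverse
    return bits.rjust(width, "0")         # pad with leading zeros (hypothetical width)


print(to_binary(13))  # 00001101
```

For 13 the remainders come out as 1, 0, 1, 1; read in reverse they give 1101, which is then padded to eight bits.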
2.4 Binary numbers in computers
To represent significant amounts of information, bits are manipulated in groups, always in a fixed-length format, e.g. 8 bits, 16 bits, 32 bits or 64 bits (1 byte, 2 bytes, 4 bytes and 8 bytes respectively).
Always write binary numbers in a required fixed-length format. Every bit must be defined.
Given a binary number, the leftmost bit is the most significant bit (MSB) and the rightmost bit is the least significant bit (LSB). The bit that has the greatest impact on the size of the number (i.e. the leftmost bit) is the most significant.
2.5 Hexadecimal number representation
Converting a binary number to a hexadecimal number
1. Group every four bits in the binary number, starting from the LSB (append zeros at the MSB position if needed to make a group of 4)
2. Replace each group by its corresponding hexadecimal value
Always write binary in groups of 4 bits to make it easier to convert
$ is used for representing a hexadecimal number.
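A minimal Python sketch of the grouping method follows. The helper name `binary_to_hex` and the lookup string `HEX_DIGITS` are hypothetical names chosen for this example; the `$` prefix follows the convention used in these notes.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to hexadecimal by grouping four bits at a time."""
    pad = (-len(bits)) % 4                 # zeros needed at the MSB end
    bits = "0" * pad + bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]   # step 1: groups of 4
    digits = [HEX_DIGITS[int(group, 2)] for group in groups]   # step 2: one hex digit each
    return "$" + "".join(digits)


print(binary_to_hex("1111001110"))  # $3CE
```

The ten-bit input is padded to 0011 1100 1110, and each group maps to one hexadecimal digit.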
2.6 Converting integers between decimal and hexadecimal
Converting decimal to hexadecimal
This is similar to the method for converting a decimal number to binary, except this time divide by 16 instead of 2 and record the remainders. The remainders read backwards are the hexadecimal digits.
Converting hexadecimal to decimal
Note the base numbers for hexadecimal number systems are:
4096 (16^3), 256 (16^2), 16 (16^1), 1 (16^0)
e.g. $3CE = 3*256 + 12*16 + 14 = 974
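Both directions can be sketched in Python as below. The function names `decimal_to_hex` and `hex_to_decimal` are assumptions for illustration only.

```python
HEX_DIGITS = "0123456789ABCDEF"


def decimal_to_hex(n: int) -> str:
    """Repeatedly divide by 16; the remainders read backwards are the hex digits."""
    if n == 0:
        return "$0"
    digits = []
    while n > 0:
        n, r = divmod(n, 16)
        digits.append(HEX_DIGITS[r])
    return "$" + "".join(reversed(digits))


def hex_to_decimal(text: str) -> int:
    """Multiply each digit by its base number (a power of 16) and add up."""
    digits = text.lstrip("$").upper()
    return sum(HEX_DIGITS.index(ch) * 16 ** power
               for power, ch in enumerate(reversed(digits)))


print(decimal_to_hex(974))     # $3CE
print(hex_to_decimal("$3CE"))  # 974
```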
2.7 Capacity of representation
An n-bit binary number can represent 2^n different values
2.8 Binary arithmetic
The product of two numbers can be calculated using the shifting and adding algorithm. The multiplication of two 4-bit numbers gives an 8-bit result. The result of (byte x byte) must therefore be stored in at least a 2-byte unit, or accuracy will be lost.
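The shift-and-add idea can be sketched as follows. This is an illustrative sketch for unsigned operands; the name `shift_add_multiply` and the `bits` parameter are assumptions made for this example.

```python
def shift_add_multiply(a: int, b: int, bits: int = 4) -> int:
    """Multiply two unsigned `bits`-wide numbers by shifting and adding."""
    operand_mask = (1 << bits) - 1
    a &= operand_mask
    b &= operand_mask
    product = 0
    for i in range(bits):
        if (b >> i) & 1:          # if bit i of the multiplier is 1 ...
            product += a << i     # ... add the multiplicand shifted left by i places
    return product & ((1 << (2 * bits)) - 1)   # the result fits in 2 * bits


print(shift_add_multiply(13, 11))  # 143
```

The largest 4-bit product, 15 x 15 = 225, needs all eight result bits, which is why a result unit twice the operand width is required.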
Mask operation: using logical AND operation to force unwanted bits to zero, while leaving the other bits of interest unaffected.
Set operation: using logical OR to set chosen bits to one while leaving the other bits unaffected.
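A small Python sketch of the mask and set operations on a byte. The helper names `mask_bits` and `set_bits` and the sample value are hypothetical.

```python
def mask_bits(value: int, mask: int) -> int:
    """AND with the mask: bits where the mask is 0 are forced to 0."""
    return value & mask


def set_bits(value: int, mask: int) -> int:
    """OR with the mask: bits where the mask is 1 are forced to 1."""
    return value | mask


value = 0b10110110
print(format(mask_bits(value, 0b00001111), "08b"))  # 00000110 (keep the low four bits)
print(format(set_bits(value, 0b00000001), "08b"))   # 10110111 (set bit 0)
```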
3.2 Sign and magnitude representation
With n bits, one way of representing a signed number is to use the MSB to indicate the sign of the number (by convention 0 for positive and 1 for negative) and the remaining bits to represent the magnitude of the number.
With n bits, the range is
-(2^(n-1) - 1) to +(2^(n-1) - 1)
3.3 Complementary representation
A negative number is represented by the two's complement of the corresponding positive number. As a result, subtraction can be implemented using addition, so no subtraction (hardware or software) is required.
3.3.1 Obtaining the two's complement of a binary number
1. Invert each bit in the number, i.e. 1->0, 0->1
2. Add 1
Method for adding two numbers with two's complement
Subtraction is performed using addition, with the following procedures:
1. Replace the negative numbers in the formula by their two's complements (using the above method)
2. Perform addition with the complements
3. Discard the carry out (if any) on the MSB side.
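The procedure can be sketched in Python for 8-bit operands. The names `twos_complement` and `subtract` are assumptions for illustration; results are returned as raw bit patterns.

```python
def twos_complement(x: int, bits: int = 8) -> int:
    """Invert every bit, then add 1, staying within `bits` bits."""
    mask = (1 << bits) - 1
    inverted = ~x & mask           # step 1: invert each bit
    return (inverted + 1) & mask   # step 2: add 1


def subtract(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b using addition only."""
    mask = (1 << bits) - 1
    total = (a & mask) + twos_complement(b, bits)  # add the complement of b
    return total & mask                            # discard any carry out of the MSB


print(format(subtract(7, 5), "08b"))  # 00000010, i.e. +2
print(format(subtract(5, 7), "08b"))  # 11111110, the two's complement pattern for -2
```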
3.3.3 Interpreting numbers in two's complement arithmetic
In the two's complement domain, the base number (weight) associated with the MSB is negative: -2^(n-1) for an n-bit number. This is how negative numbers are represented.
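To illustrate, the sketch below interprets a bit pattern by giving the MSB a negative weight and every other bit its usual positive weight. The function name `from_twos_complement` is hypothetical.

```python
def from_twos_complement(pattern: int, bits: int = 8) -> int:
    """Interpret a `bits`-wide pattern as a signed two's complement value."""
    msb = (pattern >> (bits - 1)) & 1
    rest = pattern & ((1 << (bits - 1)) - 1)   # the remaining bits keep positive weights
    return -msb * (1 << (bits - 1)) + rest     # the MSB weighs -2^(bits-1)


print(from_twos_complement(0b11111110))  # -2  (-128 + 126)
print(from_twos_complement(0b01111111))  # 127
```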
3.4 Ranges of integer numbers
|Byte (8-bit)||Two bytes integer||Four bytes integer|
Overflow happens when the result of an operation does not fit the representation being used
Positive Overflow: positive + positive = negative
Negative Overflow: (discard carry) negative + negative = positive
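Both overflow cases share one test: the operands have the same sign and the result has the opposite sign. A minimal sketch for 8-bit patterns, with the hypothetical name `add_with_overflow`:

```python
def add_with_overflow(a: int, b: int, bits: int = 8):
    """Add two two's complement bit patterns; return (result pattern, overflow flag)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a + b) & mask                      # the carry out is discarded
    same_sign_inputs = (a & sign) == (b & sign)
    sign_changed = (result & sign) != (a & sign)
    return result, same_sign_inputs and sign_changed


print(add_with_overflow(100, 100))    # (200, True): positive overflow
print(add_with_overflow(0x80, 0x80))  # (0, True): negative overflow
print(add_with_overflow(100, 27))     # (127, False)
```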
3.6 Biased representation
Signed numbers may be converted into unsigned numbers by adding a positive constant bias b.
The values are stored as positive numbers after the bias has been added. The actual signed value is retrieved by subtracting the bias from the stored values.
Biased representation is not efficient in terms of computation. It is used to encode the exponent part in IEEE floating point numbers.
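A short sketch of storing and retrieving a value in biased form. The function names are hypothetical, and the default bias of 127 with an 8-bit stored field is an assumption chosen to match the exponent of IEEE 754 single precision.

```python
def encode_biased(value: int, bias: int = 127, bits: int = 8) -> int:
    """Add the bias so the stored number is an unsigned integer."""
    stored = value + bias
    if not 0 <= stored < (1 << bits):
        raise ValueError("value does not fit in the biased field")
    return stored


def decode_biased(stored: int, bias: int = 127) -> int:
    """Subtract the bias to retrieve the signed value."""
    return stored - bias


print(encode_biased(-3))   # 124
print(decode_biased(124))  # -3
```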
4.2 Binary representation of real numbers
Method for finding the binary representation of a fraction
1. Multiply by 2
2. Strip the integer part from the result
3. Repeat until 0 is reached, or the desired degree of precision
4. The stripped integers, read forwards, are the fractional binary digits
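The multiply-and-strip method can be sketched as below. The name `fraction_to_binary` and the `max_bits` precision limit are assumptions made for this example.

```python
def fraction_to_binary(x: float, max_bits: int = 8) -> str:
    """Return the binary expansion of a fraction 0 <= x < 1, up to max_bits digits."""
    if not 0 <= x < 1:
        raise ValueError("expected a fraction in the range [0, 1)")
    bits = []
    while x != 0 and len(bits) < max_bits:   # step 3: repeat until 0 or precision reached
        x *= 2                               # step 1: multiply by 2
        integer_part = int(x)
        bits.append(str(integer_part))       # step 2: strip the integer part
        x -= integer_part
    return "0." + ("".join(bits) or "0")     # step 4: digits read forwards


print(fraction_to_binary(0.625))  # 0.101
print(fraction_to_binary(0.1))    # 0.00011001 (cut off at 8 digits; 0.1 never terminates)
```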
4.3 IEEE standard 754
Real numbers represented using fixed-length binary numbers, in a floating-point format.
V = (-1)^s x 1.f x 2^(e-b)
s = a sign bit. 0 = positive, 1 = negative
e = exponent in excess b
f = fractional mantissa (bit after the binary point)
The sign-and-magnitude method is used to represent the sign of a real number (decided by the sign bit s). The biased method is used to represent the exponent. A hidden bit is implied, which increases the representation capacity without any hardware cost.
Range of representation for
Limited-length representation has limited precision. Use longer units (such as long double) to reduce the chances of getting into underflow or overflow, and to increase the representation precision.
NaN is typically obtained if 0 is divided by 0. Any arithmetic operation involving NaN always gives a NaN answer.
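As an illustration, the sketch below splits a single precision number into its s, e and f fields using Python's standard `struct` module, then rebuilds the value from the formula for V. It assumes the single precision layout (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits) and only handles normalised numbers; the function names are hypothetical.

```python
import math
import struct


def decode_single(x: float):
    """Return the (s, e, f) bit fields of x stored as a 32-bit IEEE 754 number."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    s = raw >> 31              # 1 sign bit
    e = (raw >> 23) & 0xFF     # 8 exponent bits, stored in excess 127
    f = raw & 0x7FFFFF         # 23 fraction bits (the leading 1 is hidden)
    return s, e, f


def rebuild(s: int, e: int, f: int, bias: int = 127) -> float:
    """Apply V = (-1)^s x 1.f x 2^(e-b) to a normalised number."""
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - bias)


print(decode_single(-6.25))                 # (1, 129, 4718592)
print(rebuild(*decode_single(-6.25)))       # -6.25
print(math.isnan(float("nan") + 1.0))       # True: NaN propagates through arithmetic
```

Here -6.25 = -1.5625 x 2^2, so the stored exponent is 2 + 127 = 129. Note that Python itself raises ZeroDivisionError for 0.0 / 0.0 rather than returning NaN, so the example builds a NaN with float("nan").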
5.0 ASCII and Unicode
UTF-8: uses one byte to encode English characters, and up to four bytes to encode other characters. Widely used in email systems and on the Internet.
UTF-16: uses two bytes to encode the most commonly used characters, and four bytes to encode each of the additional characters.
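The byte counts can be checked with Python's built-in `str.encode`. The helper name `encoded_lengths` and the sample characters are chosen purely for illustration; "utf-16-be" is used so that no byte order mark is added to the count.

```python
def encoded_lengths(ch: str):
    """Return (bytes in UTF-8, bytes in UTF-16) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


print(encoded_lengths("A"))           # (1, 2) English letter
print(encoded_lengths("\u20ac"))      # (3, 2) euro sign
print(encoded_lengths("\U0001F600"))  # (4, 4) an emoji outside the commonly used range
```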
A half adder adds two single bits to give a sum and a carry.
S = (notA and B) OR (A and notB)
C = A and B
A full adder also adds together two single bits, but it takes into account a Carry in, to generate a sum and a carry out. A full adder can be built from two half adders and an OR gate.
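The two adders can be modelled in Python with bits held as the integers 0 and 1. This is a sketch; the function names `half_adder` and `full_adder` are assumptions.

```python
def half_adder(a: int, b: int):
    """Return (sum, carry) for two single bits."""
    not_a, not_b = 1 - a, 1 - b
    s = (not_a & b) | (a & not_b)   # S = (notA and B) OR (A and notB)
    c = a & b                       # C = A and B
    return s, c


def full_adder(a: int, b: int, carry_in: int):
    """Two half adders and an OR gate; returns (sum, carry out)."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, carry_in)
    return s, c1 | c2


print(full_adder(1, 1, 1))  # (1, 1): 1 + 1 + 1 = binary 11
```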
4.2 Arithmetic and logic unit (ALU)
A one bit ALU (a bit-slice ALU) takes three single bits, A, B, and Carry-in as input and performs four operations simultaneously on them.
- A and B
- A or B
- not B
- A + B + Carry-in
Two function bits F0 and F1, determine which operation result is passed to the output.
An n-bit ALU can be made by connecting n one bit ALUs together.
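A rough model of the bit-slice idea is sketched below. The notes do not say which values of F0 and F1 select which operation, so the encoding used here (00 = AND, 01 = OR, 10 = NOT B, 11 = add) is an assumption, as are the names `alu_bit` and `alu`.

```python
def alu_bit(a: int, b: int, carry_in: int, f0: int, f1: int):
    """One-bit ALU: compute all four results, then select one with F0 and F1."""
    and_out = a & b
    or_out = a | b
    not_b = 1 - b
    sum_out = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    outputs = [and_out, or_out, not_b, sum_out]
    select = (f0 << 1) | f1            # assumed encoding, see the note above
    return outputs[select], carry_out


def alu(a: int, b: int, f0: int, f1: int, bits: int = 8):
    """Chain `bits` one-bit ALUs; each carry out feeds the next carry in."""
    result, carry = 0, 0
    for i in range(bits):
        out, carry = alu_bit((a >> i) & 1, (b >> i) & 1, carry, f0, f1)
        result |= out << i
    return result, carry               # the carry is only meaningful for addition


print(alu(200, 100, 1, 1))  # (44, 1): 300 does not fit in 8 bits, so a carry is produced
```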
5.0 Register and Memory
Registers refer to memory units located within the CPU, while memory refers to memory units outside the CPU which are connected to the CPU through system busses.
5.1 D flip-flop as memory unit and the construction of register
Used to store a single bit. Two inputs, D (data) and C (clock). Output Q.
When C = 1, Qnext = D
When C = 0, Qnext = Q (no change)
These can be used to build CPU registers. Because of the speed of the circuits making up the registers, and their proximity, data transfers to and from a CPU register are normally an order of magnitude faster than accesses to the external memories. D flip-flops can also be used to build static RAM.
Data from the inputs will only be stored when the clock signal is high.
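The behaviour described above can be modelled with a small Python class. This is a behavioural sketch only, not a gate-level model, and the class and method names (`DFlipFlop`, `Register`, `tick`, `load`, `read`) are hypothetical.

```python
class DFlipFlop:
    """Stores one bit: Q follows D while C = 1 and holds its value while C = 0."""

    def __init__(self):
        self.q = 0

    def tick(self, d: int, c: int) -> int:
        if c == 1:
            self.q = d      # Qnext = D
        return self.q       # otherwise Qnext = Q (no change)


class Register:
    """An n-bit register built from n flip-flops sharing one clock signal."""

    def __init__(self, bits: int = 8):
        self.cells = [DFlipFlop() for _ in range(bits)]

    def load(self, value: int, clock: int) -> None:
        for i, cell in enumerate(self.cells):
            cell.tick((value >> i) & 1, clock)

    def read(self) -> int:
        return sum(cell.q << i for i, cell in enumerate(self.cells))


reg = Register()
reg.load(0xA5, clock=1)   # clock high: the value is stored
reg.load(0xFF, clock=0)   # clock low: the inputs are ignored
print(hex(reg.read()))    # 0xa5
```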
A memory is a sequence of linked cells. Each cell is of a fixed size (typically a byte) and can store a binary number. Memory cells are numbered, so that each cell can be located by a unique number, called a memory address.
In computers, a decoder is used to convert number addresses to electronic signals to locate the required memory cells.
Random access vs sequential access. Random access means that memory at any address can be read from or written to in any order. Sequential access means that to get to memory address N, memory addresses have to be read from 0 to N sequentially. An example of this is tape.
RAM stores OS information, Applications, Users' programs and data.
Static RAM: a single bit of memory is simply a D flip-flop circuit and requires only continuous power to maintain its state.
Dynamic RAM: a bit of memory is a storage capacitor in either the charged or discharged condition. The term dynamic refers to the need to periodically renew or refresh the slowly discharging capacitor.
Both types are volatile. SRAM is simpler to use, and about ten times faster, and more reliable than DRAM. SRAM is more expensive, consumes more power and requires more physical space. DRAM is what most computer memory modules are made from. It must be refreshed hundreds of times per second in order to maintain the data stored in it.
ROM (read-only memory): cannot be modified, and is used to store the initial program that runs when the computer is powered on or otherwise begins execution.
5.3.1 Addressing Memory - Decoder
Memory cells are located by using their number addresses. Decoders use n address bits to select one memory cell among 2^n memory cells.
It should be noted that when data is going to be written, the data is made available to all cells. It is not stored unless the clock signal for any particular cell is high, which is selected using the address decoder.
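A toy model of address decoding: n address bits drive 2^n select lines, exactly one of which is high, and a write only takes effect in the selected cell. The names `decode` and `Memory` are assumptions for this sketch.

```python
def decode(address: int, n: int):
    """n-to-2^n decoder: return the select lines, with only line `address` high."""
    return [1 if line == address else 0 for line in range(2 ** n)]


class Memory:
    """2^n byte-sized cells addressed through the decoder."""

    def __init__(self, address_bits: int):
        self.n = address_bits
        self.cells = [0] * (2 ** address_bits)

    def write(self, address: int, data: int) -> None:
        # The data is offered to every cell; only the selected cell stores it.
        for i, selected in enumerate(decode(address, self.n)):
            if selected:
                self.cells[i] = data & 0xFF

    def read(self, address: int) -> int:
        return sum(cell for cell, selected
                   in zip(self.cells, decode(address, self.n)) if selected)


print(decode(2, 2))  # [0, 0, 1, 0]
```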
5.3.2 Memory Size Measure
1024 bytes = 1 KB = 2^10 bytes
1024 KB = 1 MB = 2^20 bytes
1024 MB = 1 GB = 2^30 bytes
1024 GB = 1 TB = 2^40 bytes
6 Timing Circuits
Used to synchronize all operations in a computer system. Generates square waves at constant intervals. Nearly everything that happens in a computer happens on the rising (or falling depending on the design) edge of the pulsed signal. A machine instruction typically takes several cycles to complete. Generally a faster clock signal leads to a higher processing speed.
2.1 CPU registers
Located within the CPU. Not visible to high level programming languages, but accessible with assembly languages. Because of the speed of the circuits making up the registers, and their proximity, data transfers to/from the CPU registers are normally an order of magnitude faster than to/from external memory.
Registers are generally divided into two groups:
General purpose registers: Usable in assembly programs
Special purpose registers: Used exclusively by the CPU for the control of the execution of programs.
Data registers can be used to store temporary data or frequently-used data during a calculation, which reduces the number of memory accesses, thereby speeding up the execution.
2.1.2 Address registers used as data registers
Using as data registers: when used this way, only word and long word operations are available; no instructions operate on the individual bytes of an address register.
Using to address memory:
Memory map: main memory divided up into different regions for OS, user program code, data, IO etc.
DS define storage: label DS.size #, where label takes the address of the first storage location, size is byte, word or long, and # is how many units of that size to reserve.
DC define constant
5.1 Condition code register (CCR) and condition flags
A special purpose 8-bit register containing five flag bits. The ALU sets them immediately after an arithmetic or logic operation is performed, to provide information about the result of the operation; in most cases they do not provide the result itself.
The values of these flags are used to form the conditions for branches.
Carry - C: Set to 1 if an addition operation produces a carry, otherwise set to 0
Overflow - V: Only useful for operations on signed integers. Set to 1 if the addition of two like-signed numbers produces a result that exceeds the 2's complement range of the operand; otherwise set to 0
Zero - Z: If the result is zero, set to 1, otherwise set to 0
Negative - N: Meaningful again only in signed number operations. Set to 1 if a negative result is produced, otherwise set to 0. The flag follows the MSB for an 8, 16 or 32-bit operand.
Extend - X: This bit functions as a carry for multiple precision operations.
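As a sketch of how the C, V, Z and N flags follow from an addition, the function below adds two 8-bit patterns and derives each flag from the rules listed above. The name `add_flags` is hypothetical, and the X flag is left out because its exact behaviour is not spelled out in these notes.

```python
def add_flags(a: int, b: int, bits: int = 8):
    """Add two bit patterns and return (result, flags) for C, V, Z and N."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "C": int(total > mask),                         # carry out of the MSB
        "V": int((a & sign) == (b & sign)
                 and (result & sign) != (a & sign)),    # like signs in, other sign out
        "Z": int(result == 0),                          # result is zero
        "N": int(bool(result & sign)),                  # follows the MSB
    }
    return result, flags


print(add_flags(0x7F, 0x01))  # (128, {'C': 0, 'V': 1, 'Z': 0, 'N': 1})
print(add_flags(0xFF, 0x01))  # (0, {'C': 1, 'V': 0, 'Z': 1, 'N': 0})
```

The first case overflows the signed range (127 + 1) without producing a carry; the second produces a carry but no signed overflow (-1 + 1 = 0).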
5.2 Unconditional branch instructions
Program Counter (PC): a special purpose register which stores the memory address of the instruction to be executed. After an instruction is fetched into the CPU for execution, the PC is automatically updated to the address of the next instruction.
Without intervention, PC will be incremented by one instruction location at a time. Therefore instructions stored in consecutive memory locations will be executed in sequence.
1.2 Instruction Cycle
Fetch execute cycle
1. Fetch an instruction from memory
2. Decode the instruction
3. Fetch operand (if required)
4. Execute the instruction
5. Write back the result generated by the operation (if applicable)
The address of the instruction to be fetched within each cycle is given by the contents of the program counter. Thus after fetching an instruction, the contents of PC must be updated to point to the next instruction
More detailed fetch execute cycle
- Copy instruction pointed to by PC from memory into instruction register
- Increment PC to address of the next instruction
- Decode the op-code into set of signals needed to control the ALU and other components
- If instruction requires operands, get them (from memory, IO) and load into CPU's internal registers
- Execute the instruction under controlling signals from the CU
- If required, write result into appropriate register, memory address or IO
- Recognise pending interrupts
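The cycle can be illustrated with a toy machine in Python. Everything about this machine is invented for the example: the single accumulator, the opcode names (LOAD, ADD, STORE, JMP, HALT) and the use of one memory cell per instruction. Interrupt recognition is omitted.

```python
def run(memory, max_steps: int = 100):
    """Run a toy program held in `memory` until HALT; return the final memory."""
    pc = 0     # program counter
    acc = 0    # a single internal register (accumulator)
    for _ in range(max_steps):
        ir = memory[pc]              # fetch the instruction pointed to by PC into IR
        pc += 1                      # increment PC to the next instruction
        opcode, operand = ir         # decode
        if opcode == "LOAD":         # fetch operand from memory, then execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":      # write the result back to memory
            memory[operand] = acc
        elif opcode == "JMP":        # a branch simply overwrites the PC
            pc = operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return memory


program = [
    ("LOAD", 5),    # address 0
    ("ADD", 6),     # address 1
    ("STORE", 7),   # address 2
    ("HALT", 0),    # address 3
    0,              # address 4: unused
    2,              # address 5: first operand
    3,              # address 6: second operand
    0,              # address 7: result goes here
]
print(run(program)[7])  # 5
```

Because the PC is incremented right after each fetch, instructions in consecutive locations run in sequence unless a branch such as JMP changes the PC.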
1.3 Control Unit (CU)
The circuitry that decodes and implements each instruction held in the instruction register (IR), by controlling the functional elements that internally compose a CPU. There are two different approaches which have been used to implement the CU and instructions:
Hardwired: each instruction is executed directly by a logic circuit (hardware).
Microprogrammed: each assembly instruction is translated into a sequence of even more primitive instructions called microinstructions, which specify step-by-step control signals for the hardware circuitry to implement the instruction.
The microinstructions associated with an assembly instruction are called a microprogram.
1.3.3 Microprogramming vs hardwiring
Microprogramming:
- Allows arbitrarily complex instructions to be built up
- More flexible
Hardwiring:
- Capable of decoding and executing an instruction in one clock period. This is a lot faster than the microprogrammed equivalent, which would usually require more than one cycle to complete an instruction
2.1 Memory Hierarchy
Each decreasing level in the hierarchy consists of modules of larger capacity, slower access time, and lower cost/bit. The goal of the memory hierarchy is to try to match the processor speed with the rate of information transfer from the highest element in the hierarchy.
2.2 Cache Memory
Temporal locality: The word referenced now is likely to be referenced again soon. Hence it is wise to keep the currently accessed word handy for a while
Spatial locality: Words near the currently referenced word are likely to be referenced soon. Hence it is wise to prefetch words near the currently referenced word and keep them handy for a while.
A cache is a small fast memory between the processor and main memory. It contains a subset of the contents of the main memory.
Because of the high speeds involved with the cache, management of the data transfer and storage in the cache is done in hardware - the OS doesn't know about the cache.
2.3 Main Memory
A collection of words used for storing programs or data. Each word may consist of one or more bytes. Physically, a memory of N words can be constructed using an N word SRAM or DRAM.
A byte in memory is often called a memory cell. Each cell in the memory can be located uniquely by its address.
The arrangement of multi-byte data in memory follows one of two conventions set independently by Intel and Motorola.
When transferring multi-byte data from one machine to another with different byte ordering, the byte order of the target data has to be appropriately reversed before they can be correctly interpreted by the target machine. Ignoring this will cause serious errors.
Little-endian (Intel): store the low-order byte first and the high-order byte last.
Big-endian (Motorola): store the high-order byte first and the low-order byte last.
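The two byte orders, commonly called little-endian (low-order byte first) and big-endian (high-order byte first), can be seen with Python's built-in `int.to_bytes`. The sample value and the helper name `swap_byte_order` are chosen only for illustration.

```python
value = 0x12345678

little = value.to_bytes(4, "little")   # low-order byte stored first
big = value.to_bytes(4, "big")         # high-order byte stored first

print(little.hex())  # 78563412
print(big.hex())     # 12345678


def swap_byte_order(data: bytes) -> bytes:
    """Reverse the bytes of one multi-byte item so the other convention can read it."""
    return data[::-1]


# Reading little-endian data as if it were big-endian gives the wrong number ...
print(hex(int.from_bytes(little, "big")))                   # 0x78563412
# ... unless the byte order is reversed first.
print(hex(int.from_bytes(swap_byte_order(little), "big")))  # 0x12345678
```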
2.4 Memory Mapped IO
The same address bus is used to address both memory and IO devices. The memory and registers of the IO devices are mapped to address values. So when an address is accessed by the CPU, it may refer to a portion of physical RAM, but it can also refer to memory of the IO device.
Thus, the CPU instructions used to access the memory can also be used for accessing devices. To accommodate the IO devices, areas of the addresses used by the CPU must be reserved for IO and must not be available for normal physical memory.
3.1 Interrupt and IO
Busy status programming: Keeping the CPU busy in a loop which repeatedly tests the interface for a change, wasting much time and performance waiting for something that may not happen.
Interrupt request signal: Let the keyboard tell the CPU, by sending an interrupt request signal, that it has been used and needs attention. This is called interrupt programming.
An interrupt is an event that forces the CPU to break the sequence of actions and jump to execute an interrupt-service routine (a piece of program stored at a different location, designed to handle the interrupt condition).
Interrupts occur at a time not controlled by the programmer. They can be generated by hardware or software.
3.3 IO Modules (IO Interfaces)
External devices are not generally connected directly into the bus structure of the computer. A wide variety of devices require their own respective logic interfaces because of things like mismatch of data rates, different data representations.
The IO module or interface provides a standard interface to the CPU and the bus, which is tailored to specific IO devices and its interface requirements. It relieves the CPU of the management of the IO devices.
The interface consists of control signals; status signals; data signals.
Actions taken by CPU when an interrupt request is accepted:
- Stop current execution
- Push PC and CCR on system stack
- Set new PC to start interrupt-service routine
The reason for storing the old PC is so that the CPU knows from which instruction to resume the interrupted main program, after the interrupt is serviced. The reason for saving the old CCR is to prevent important information for the main program from being altered by the interrupt, as an interrupt can occur at any point in a program.
4.2 Bus system
Typical buses consist of 100-150 lines and are divided into 3 parts.
Bus performance is limited by two factors: the data propagation delay through the bus itself (longer buses mean longer delays), and the aggregate demand for access to the bus from all devices connected to it.
To avoid bottlenecks, most systems use multiple buses arranged hierarchically, with high-speed, limited-access buses close to the processor and slower, general-access buses farther away from the processor.
Address bus:
- Used to pass addresses for memory or peripherals - specifying source/destination of data transfer
- Width determines the maximum memory capacity of the system
Data bus:
- Used to pass program instructions and data
- Width is key in determining overall performance
Control bus:
- Used to pass control signals to control access to and use of the memory and peripherals