The Processor
What is MIPS?
MIPS is a reduced instruction set computer (RISC) instruction set architecture (ISA).
Multiple revisions of the MIPS instruction set exist, including MIPS I, MIPS II,
MIPS III, MIPS IV, MIPS V, MIPS32, and MIPS64.
The current revisions are MIPS32 (for 32-bit implementations) and MIPS64 (for
64-bit implementations).
A Basic MIPS Implementation
The basic implementation covers a subset of the core MIPS instruction set:
• The memory-reference instructions load word (lw) and store word (sw)
• The arithmetic-logical instructions add, sub, and, or
• The instructions branch on equal (beq) and jump (j)
An Overview of the Implementation
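For every instruction, the first two steps are identical:
1. Send the program counter (PC) to the memory that contains the code and fetch the instruction from that memory.
2. Read one or two registers, using fields of the instruction to select the registers to read (lw needs only one register; most other instructions read two).
After these two steps, the actions depend on the instruction class: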
• Once the register operands have been fetched, they can be operated on to compute a
memory address (for a load or store), to compute an arithmetic result (for an integer
arithmetic-logical instruction), or a compare (for a branch).
• If the instruction is an arithmetic-logical instruction, the result from the ALU must be
written to a register.
• If the operation is a load or store, the ALU result is used as an address to either store a
value from the registers or load a value from memory into the registers. The result
from the ALU or memory is written back into the register file.
• Branches require the use of the ALU output to determine the next instruction address,
which comes from either the ALU (where the PC and branch offset are summed) or
from an adder that increments the current PC by 4.
The basic implementation of the MIPS subset including the
necessary multiplexors and control lines.
• The top multiplexor controls what value replaces the PC (PC + 4 or the
branch destination address); the multiplexor is controlled by the gate that
“ands” together the Zero output of the ALU and a control signal that
indicates that the instruction is a branch.
• The multiplexor whose output returns to the register file is used to steer
the output of the ALU (in the case of an arithmetic logical instruction) or
the output of the data memory (in the case of a load) for writing into the
register file.
• The added control lines are straightforward and determine the operation
performed at the ALU, whether the data memory should read or write,
and whether the registers should perform a write operation. The control
lines are shown in color to make them easier to see.
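The two multiplexor decisions described above can be sketched as plain functions. This is a behavioral model only; the signal names `branch`, `zero`, and `mem_to_reg` are hypothetical labels for the control lines in the figure.

```python
def next_pc(pc: int, branch_target: int, branch: bool, zero: bool) -> int:
    """Top multiplexor: choose the branch target or PC + 4.

    Its select line is the AND of the Branch control signal and the ALU's
    Zero output (beq subtracts its operands, so Zero means "equal").
    """
    pc_src = branch and zero
    return branch_target if pc_src else (pc + 4) & 0xFFFFFFFF


def write_back_value(alu_result: int, mem_data: int, mem_to_reg: bool) -> int:
    """Multiplexor in front of the register file's write-data input."""
    return mem_data if mem_to_reg else alu_result


print(next_pc(1000, 1200, branch=True, zero=True))    # 1200: beq taken
print(next_pc(1000, 1200, branch=True, zero=False))   # 1004: beq not taken
print(next_pc(1000, 1200, branch=False, zero=True))   # 1004: not a branch
print(write_back_value(7, 42, mem_to_reg=True))       # 42: lw writes loaded data
```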
Why must the single-cycle datapath conceptually have separate instruction and data memories?
1. The formats of data and instructions are different in MIPS, and hence different memories are needed.
2. Having separate memories is less expensive.
3. The processor operates in one cycle and cannot use a single-ported memory for two different accesses within that cycle.
Answer: 3. Options 1 and 2 are false: instructions and data can share one memory (as in any stored-program computer), and two memories cost more, not less.
Logic Design Conventions
• Before designing the datapath, we must decide how the logic implementing the machine will operate and how the machine is clocked.
• The functional units in the MIPS implementation consist of two different
types of logic elements: elements that operate on data values and
elements that contain state.
• The elements that operate on data values are all combinational, which means that their outputs depend only on the current inputs. Example: the ALU.
• An element contains state if it has some internal storage. We call these
elements state elements because, if we pulled the plug on the machine,
we could restart it by loading the state elements with the values they
contained before we pulled the plug. The instruction and data memories, as well as the registers, are all examples of state elements.
• The clock is used to determine when a state element should be written; a state element can be read at any time.
• Logic components that contain state are also called sequential because
their outputs depend on both their inputs and the contents of the internal
state.
• We will use the word asserted to indicate a signal that is logically high and
assert to specify that a signal should be driven logically high, and deassert
or deasserted to represent logical low.
Clocking Methodology
• A clocking methodology defines when signals can be read and when they can be written. It
is important to specify the timing of reads and writes because, if a signal is written at the
same time it is read, the value of the read could correspond to the old value, the newly
written value, or even some mix of the two!
• The figure shows two state elements surrounding a block of combinational logic, which operates in a single clock cycle: all signals must propagate from state element 1, through the combinational logic, and to state element 2 in the time of one clock cycle.
• The time necessary for the signals to reach state element 2 defines the length of the clock cycle.
An edge-triggered methodology allows a state element to be read and written
in the same clock cycle without creating a race that could lead to indeterminate
data values.
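To make the distinction between combinational logic and an edge-triggered state element concrete, here is a small Python model in which one register feeds combinational logic whose output returns to the same register. The class and method names are hypothetical; real hardware would be described in an HDL, not Python.

```python
class EdgeTriggeredRegister:
    """A state element whose stored value changes only at a clock edge."""

    def __init__(self, value: int = 0) -> None:
        self._q = value   # stored value, visible on the output
        self._d = value   # value currently presented at the input

    def read(self) -> int:
        # A state element can be read at any time.
        return self._q

    def set_input(self, d: int) -> None:
        # The input may change many times while combinational logic settles.
        self._d = d

    def clock_edge(self) -> None:
        # The stored value is updated only at the active clock edge.
        self._q = self._d


def increment(x: int) -> int:
    """Combinational logic: the output depends only on the current input."""
    return (x + 1) & 0xFFFFFFFF


r = EdgeTriggeredRegister(5)
r.set_input(increment(r.read()))  # read and compute within the same cycle
print(r.read())                   # 5: readers still see the old value
r.clock_edge()
print(r.read())                   # 6: the new value appears after the edge
```

Because the write happens only at the edge, reading and writing the same register in one cycle gives a well-defined result: the old value during the cycle, the new one after it.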
Building a Datapath
• A reasonable way to start a datapath design is to examine the major components required to execute each class of MIPS instruction.
• Let’s start by looking at which datapath elements each instruction needs.
• For instruction fetch, the state elements needed are the instruction memory and the program counter (PC); an adder is also needed to compute the address of the next instruction.
• The instruction memory need only provide read access because the datapath does
not write instructions.
• Since the instruction memory only reads, we treat it as combinational logic: the
output at any time reflects the contents of the location specified by the address
input, and no read control signal is needed.
• The program counter is a 32-bit register that will be written at the end of every
clock cycle and thus does not need a write control signal.
• The adder is an ALU wired to always perform an add of its two 32-bit inputs and
place the result on its output.
A portion of the datapath used for fetching instructions and incrementing
the program counter.
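The fetch portion of the datapath can be sketched as one function. The dictionary standing in for instruction memory and the name `fetch` are assumptions for illustration; the two words are the standard MIPS encodings of the instructions in the comments.

```python
def fetch(instruction_memory: dict[int, int], pc: int) -> tuple[int, int]:
    """Return the instruction at address pc and the value PC + 4."""
    instruction = instruction_memory[pc]  # read-only, no control signal
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF     # adder wired to always add 4
    return instruction, pc_plus_4


imem = {
    0: 0x8C080000,  # lw  $t0, 0($zero)
    4: 0x01095020,  # add $t2, $t0, $t1
}
pc = 0
for _ in range(2):
    word, pc = fetch(imem, pc)  # the PC is rewritten every cycle
    print(hex(word), pc)
```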
• The processor’s 32 general-purpose registers are stored in a structure called a register file.
• A register file is a collection of registers in which any register can be read or written by
specifying the number of the register in the file.
• The register file contains the register state of the machine. In addition, we will need an ALU
to operate on the values read from the registers.
• The register number inputs are 5 bits wide to specify one of 32 registers (32 = 2^5), whereas the data input and two data output buses are each 32 bits wide.
• The figure shows the ALU, which takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit Zero signal that is asserted if the result is 0.
Register file and ALU to implement R-format instructions
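A register file with two read ports and one write port, plus an ALU with a Zero output, might be modeled as below. This is a sketch under stated assumptions: the class and function names are hypothetical, only the four ALU operations of the subset are included, and register 0 is treated as the MIPS hardwired `$zero`.

```python
class RegisterFile:
    """32 x 32-bit registers: two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rs: int, rt: int) -> tuple[int, int]:
        # 5-bit register numbers select one of 2**5 = 32 registers.
        if not (0 <= rs < 32 and 0 <= rt < 32):
            raise ValueError("register numbers must fit in 5 bits")
        return self.regs[rs], self.regs[rt]

    def write(self, rd: int, value: int, reg_write: bool) -> None:
        # Written only when RegWrite is asserted; $zero always reads 0.
        if reg_write and rd != 0:
            self.regs[rd] = value & 0xFFFFFFFF


def alu(op: str, a: int, b: int) -> tuple[int, bool]:
    """Return the 32-bit result and the 1-bit Zero output."""
    results = {"add": a + b, "sub": a - b, "and": a & b, "or": a | b}
    result = results[op] & 0xFFFFFFFF
    return result, result == 0


rf = RegisterFile()
rf.write(8, 7, reg_write=True)    # $t0 = 7
rf.write(9, 7, reg_write=True)    # $t1 = 7
rf.write(0, 99, reg_write=True)   # ignored: $zero stays 0
a, b = rf.read(8, 9)
print(alu("sub", a, b))           # (0, True): equal operands assert Zero
print(alu("add", a, b))           # (14, False)
```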
Units to implement load and store
with sign extension
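The address computation for lw and sw adds the base register to the sign-extended 16-bit offset. A minimal sketch, with hypothetical function names, working on 32-bit unsigned bit patterns:

```python
def sign_extend_16(field: int) -> int:
    """Extend a 16-bit two's-complement field to a 32-bit bit pattern."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field


def effective_address(base: int, offset_field: int) -> int:
    """Address used by lw/sw: base register + sign-extended offset."""
    return (base + sign_extend_16(offset_field)) & 0xFFFFFFFF


print(hex(sign_extend_16(0x0008)))                  # 0x8
print(hex(sign_extend_16(0xFFFC)))                  # 0xfffffffc (-4)
print(hex(effective_address(0x10010000, 8)))        # 0x10010008
print(hex(effective_address(0x7FFFEFFC, 0xFFFC)))   # 0x7fffeff8: -4($sp)
```

Sign extension is what lets a negative offset such as -4 reach addresses below the base register.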
The beq instruction has three operands, two registers that are compared for
equality, and a 16-bit offset used to compute the branch target address
relative to the branch instruction address.
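• Note on the branch instruction: conceptually, beq is the familiar compare-then-branch pair
cmp $1, $2
beq l1
but in MIPS a single instruction, beq $1, $2, l1, both compares the registers and branches.
■ The instruction set architecture specifies that the base for the branch address calculation is PC + 4, the address of the instruction following the branch. Since the datapath already computes PC + 4, this value is used as the base for computing the branch target address.
■ The architecture also states that the offset field is shifted left 2 bits so that it is a word offset rather than a byte offset.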
• When the condition is true (i.e., the operands are equal), the branch target
address becomes the new PC, and we say that the branch is taken.
• If the operands are not equal, the incremented PC should replace the current PC
(just as for any other normal instruction); in this case, we say that the branch is not
taken.
The datapath for a branch uses the ALU to evaluate the branch condition and a
separate adder to compute the branch target as the sum of the incremented PC and
the sign-extended, lower 16 bits of the instruction (the branch displacement), shifted left 2 bits.
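The branch datapath can be summarized in two small functions: one computes the target as (PC + 4) + (sign-extended offset << 2), the other uses the subtraction's Zero result to pick the next PC. The function names are hypothetical.

```python
def branch_target(pc: int, offset_field: int) -> int:
    """beq target: (PC + 4) + (sign-extended 16-bit offset shifted left 2)."""
    offset_field &= 0xFFFF
    offset = offset_field - 0x10000 if offset_field & 0x8000 else offset_field
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def beq_next_pc(pc: int, rs_value: int, rt_value: int, offset_field: int) -> int:
    """Taken if the ALU's subtraction gives zero; otherwise PC + 4."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, offset_field) if zero else (pc + 4) & 0xFFFFFFFF


print(branch_target(40, 7))              # 72: forward branch, 44 + 28
print(branch_target(1004, 0xFFFD))       # 996: offset -3 words, 1008 - 12
print(beq_next_pc(1004, 5, 5, 0xFFFD))   # 996: taken
print(beq_next_pc(1004, 5, 6, 0xFFFD))   # 1008: not taken
```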
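Creating a Single Datapath
• Combine all the datapath components and add control to complete the implementation.
• This datapath executes every instruction in one clock cycle.
• No datapath resource can be used more than once per instruction, so any element needed more than once must be duplicated (for example, separate instruction and data memories, and separate adders for PC + 4 and the branch target).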
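Putting the pieces together, the single-cycle behavior can be sketched as a loop that executes one whole instruction per iteration. This is a behavioral sketch, not a hardware model: instructions are hypothetical Python tuples rather than 32-bit encodings, beq offsets are in words relative to PC + 4, and the j target is a word address combined with the upper 4 bits of PC + 4 as in MIPS.

```python
MASK = 0xFFFFFFFF


def sign_extend_16(field: int) -> int:
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def run_single_cycle(program, regs, memory, max_cycles=1000):
    """Execute one whole instruction per loop iteration (one clock cycle).

    program: list of tuples; instruction i is at byte address 4 * i.
    regs: list of 32 register values; memory: dict of byte address -> word.
    """
    pc = 0
    for _ in range(max_cycles):
        if pc // 4 >= len(program):
            break
        op, *fields = program[pc // 4]          # instruction fetch
        pc_plus_4 = pc + 4                      # dedicated PC adder
        next_pc = pc_plus_4
        if op in ("add", "sub", "and", "or"):   # R-format
            rd, rs, rt = fields
            a, b = regs[rs], regs[rt]
            result = {"add": a + b, "sub": a - b, "and": a & b, "or": a | b}[op]
            if rd != 0:                         # $zero stays 0
                regs[rd] = result & MASK
        elif op in ("lw", "sw"):
            rt, offset, rs = fields
            addr = (regs[rs] + sign_extend_16(offset)) & MASK  # ALU: address
            if op == "lw" and rt != 0:
                regs[rt] = memory.get(addr, 0)  # data memory read
            elif op == "sw":
                memory[addr] = regs[rt]         # data memory write
        elif op == "beq":
            rs, rt, offset = fields
            zero = ((regs[rs] - regs[rt]) & MASK) == 0        # ALU: subtract
            if zero:                                          # branch adder
                next_pc = pc_plus_4 + (sign_extend_16(offset) << 2)
        elif op == "j":
            (target,) = fields                                # word address
            next_pc = (pc_plus_4 & 0xF0000000) | (target << 2)
        else:
            raise ValueError(f"unsupported instruction: {op}")
        pc = next_pc & MASK
    return regs, memory


program = [
    ("beq", 8, 0, 3),      # 0:  if $t0 == 0, go to 16
    ("add", 10, 10, 11),   # 4:  $t2 = $t2 + $t3
    ("sub", 8, 8, 9),      # 8:  $t0 = $t0 - $t1
    ("j", 0),              # 12: jump to 0
    ("sw", 10, 0, 0),      # 16: mem[0 + $zero] = $t2
]
regs = [0] * 32
regs[8], regs[9], regs[11] = 3, 1, 2   # counter, decrement, step
regs, memory = run_single_cycle(program, regs, {})
print(regs[10], memory[0])             # 6 6
```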
Overview of pipelining
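Pipelining is an implementation technique in which multiple instructions are overlapped in execution; today it is nearly universal.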
MIPS Instruction execution steps
• Fetch instruction from memory
• Read registers while decoding the instruction
and determine the opcode of the instruction
• Execute the operation or calculate the address
• Access an operand in data memory
• Write the result into a register
• The stages are connected in series, and all of them operate simultaneously on different instructions.
• Pipelining improves performance over traditional sequential execution by increasing instruction throughput; it does not reduce the time needed to complete an individual instruction.
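Key advantages
• More efficient use of the processor: under ideal conditions (balanced stages, no hazards), the time between instructions is divided by the number of pipe stages. With 800 time units per instruction without pipelining and 5 stages:
$$T_{\text{pipelined}} = \frac{T_{\text{nonpipelined}}}{\text{number of pipe stages}} = \frac{800}{5} = 160 \text{ time units}$$
where T is the time between instructions.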
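The ideal-speedup arithmetic can be checked numerically. This sketch assumes the slide's numbers (800 time units per instruction without pipelining, 5 balanced stages) and adds the pipeline fill time, which the ideal formula ignores; the function names are hypothetical.

```python
def time_between_instructions(nonpipelined: float, stages: int) -> float:
    """Ideal pipelined time between instructions: balanced stages, no hazards."""
    return nonpipelined / stages


def pipelined_total(n: int, stages: int, stage_time: float) -> float:
    """The first instruction needs `stages` cycles; each later one adds one."""
    return (stages + n - 1) * stage_time


t = time_between_instructions(800, 5)
print(t)                                    # 160.0
n = 1000
sequential = n * 800
pipelined = pipelined_total(n, 5, t)
print(pipelined)                            # 160640.0
print(round(sequential / pipelined, 2))     # 4.98, approaching the 5x ideal
```

For long instruction sequences the fill time becomes negligible, so the speedup approaches the number of stages.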
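Designing instruction sets for pipelining
Otherwise, the pipeline would have to be split into 6 stages.
3. Memory operands appear only in loads and stores (LW, SW):
the execute stage can be used to calculate the memory address, and memory is accessed in the following stage.
Otherwise (if other instructions could operate on memory operands), the pipeline would need separate address, memory, and execute stages.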
Ins 1  IF  ID  E   M   WB
Ins 2      IF  ID  E   M   WB
Ins 3          IF  ID  E   M   WB
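• In MIPS we have separate memories for instructions and data, so an instruction fetch (IF) and a data access (M) in the same clock cycle never compete for one memory.
So the MIPS pipeline avoids this structural hazard.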
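The structural hazard that split memories avoid can be found mechanically: in a 5-stage pipeline issuing one instruction per cycle, instruction 1 is in M in the same cycle that instruction 4 is in IF. This sketch assumes, as a worst case, that every instruction accesses data memory in M (for example, all loads); the names are hypothetical.

```python
STAGES = ["IF", "ID", "E", "M", "WB"]


def stage_at(instr: int, cycle: int):
    """Stage of instruction `instr` (0-based, one issued per cycle) in clock
    cycle `cycle` (1-based), or None if it is not in the pipeline."""
    k = cycle - 1 - instr
    return STAGES[k] if 0 <= k < len(STAGES) else None


def memory_conflicts(n: int, split_memories: bool) -> list[int]:
    """Clock cycles in which one memory would need two accesses, assuming
    every instruction uses memory in M (worst case, e.g. all loads)."""
    conflicts = []
    for cycle in range(1, n + len(STAGES)):
        uses = {}
        for i in range(n):
            stage = stage_at(i, cycle)
            if stage in ("IF", "M"):
                if split_memories:
                    mem = "instruction memory" if stage == "IF" else "data memory"
                else:
                    mem = "single memory"
                uses[mem] = uses.get(mem, 0) + 1
        if any(count > 1 for count in uses.values()):
            conflicts.append(cycle)
    return conflicts


print(memory_conflicts(4, split_memories=False))  # [4]: Ins 1 in M, Ins 4 in IF
print(memory_conflicts(4, split_memories=True))   # []
```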
Data Hazard
• An occurrence in which a planned instruction
cannot execute in the proper clock cycle
because data that is needed to execute the
instruction is not yet available.
• E.g.
add $s0, $t0, $t1
sub $t2, $s0, $t3
Reason for Data hazards
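• The destination register of the first instruction is used as a source operand of the second instruction.
• In clock cycle 4, the second instruction (sub) needs $s0 as an ALU input, but add writes $s0 into the register file only in clock cycle 5 (its WB stage).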
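Detecting this kind of hazard amounts to comparing each instruction's destination register with the source registers of the instructions that follow it closely. This sketch assumes the register file is written in the first half of a cycle and read in the second half, so only consumers 1 or 2 instructions after the producer are at risk; the `Instr` record and function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]            # register written, or None (e.g. sw, beq)
    sources: tuple[str, ...]       # registers read


def raw_hazards(program: list[Instr], window: int = 2) -> list[tuple[int, int]]:
    """(producer, consumer) index pairs where the consumer reads a register
    the producer writes and is at most `window` instructions later."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest in (None, "$zero"):
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].sources:
                hazards.append((i, j))
            if program[j].dest == producer.dest:
                break  # a later write supersedes this producer
    return hazards


code = [
    Instr("add $s0, $t0, $t1", "$s0", ("$t0", "$t1")),
    Instr("sub $t2, $s0, $t3", "$t2", ("$s0", "$t3")),
]
print(raw_hazards(code))  # [(0, 1)]
```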
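Solution 1: Data forwarding (bypassing): pass a result directly from the pipeline stage that produced it to the stage that needs it, instead of waiting for it to be written to the register file.
Solution 2: Stalls: hold the dependent instruction (insert bubbles) until the value is available.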
Consider a load followed by an instruction that uses the loaded value (a load-use data hazard):
lw $s0, 20($t1)
sub $t2, $s0, $t3
In the pipeline diagrams, shading on the right half of the register file or memory means the element is read in that stage, and shading on the left half means it is written in that stage.
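The two solutions can be compared by counting stall cycles for a consumer placed `distance` instructions after its producer. This is a sketch under stated assumptions: a 5-stage pipeline, a register file written in the first half of a cycle and read in the second half, and, with forwarding, ALU results available to the next instruction's E stage but loaded data only after M. The function name is hypothetical.

```python
def stalls_needed(producer_is_load: bool, distance: int, forwarding: bool) -> int:
    """Stall cycles before a consumer `distance` instructions after its
    producer (distance 1 = the very next instruction)."""
    if distance < 1:
        raise ValueError("distance must be >= 1")
    if forwarding:
        # Only a load followed immediately by a use still needs a bubble.
        return 1 if producer_is_load and distance == 1 else 0
    # Without forwarding the consumer's ID must line up with the producer's
    # WB, which is 3 instructions later.
    return max(0, 3 - distance)


print(stalls_needed(False, 1, forwarding=True))   # 0: add -> sub
print(stalls_needed(True, 1, forwarding=True))    # 1: lw -> sub
print(stalls_needed(False, 1, forwarding=False))  # 2: add -> sub, no forwarding
```

So for the lw/sub pair above, forwarding alone is not enough: one stall remains.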
Try with forwarding
lw  $t1, 0($t0)    IF  ID  E   M   WB
lw  $t2, 4($t0)        IF  ID  E   M   WB
add $t3, $t1, $t2          IF  ID  E   M   WB
sw  $t3, 12($t0)               IF  ID  E   M   WB
lw  $t4, 8($t0)                    IF  ID  E   M   WB
add $t5, $t1, $t4                      IF  ID  E   M   WB
sw  $t5, 16($t0)                           IF  ID  E   M   WB
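Note: even with forwarding, each add has a data hazard, because one of its source operands comes from the load immediately before it, and loaded data is available only at the end of the load's M stage (the 4th stage), one cycle too late for the add's E stage. Each such load-use hazard costs one stall cycle.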
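The solution is to reorder the instructions without violating the data dependences:
lw  $t1, 0($t0)    IF  ID  E   M   WB
lw  $t2, 4($t0)        IF  ID  E   M   WB
lw  $t4, 8($t0)            IF  ID  E   M   WB
add $t3, $t1, $t2              IF  ID  E   M   WB
sw  $t3, 12($t0)                   IF  ID  E   M   WB
add $t5, $t1, $t4                      IF  ID  E   M   WB
sw  $t5, 16($t0)                           IF  ID  E   M   WB
Now no instruction uses a value loaded by the instruction immediately before it, so no stalls are needed.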
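The benefit of the reordering can be counted. This sketch assumes full forwarding, so the only remaining hazard is a load followed immediately by a use of its result (one stall each); the `Instr` record and function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    sources: tuple[str, ...]


def cycles_with_forwarding(program: list[Instr], stages: int = 5) -> int:
    """Total cycles: pipeline fill, one cycle per extra instruction, plus one
    stall for every load whose result the very next instruction uses."""
    stalls = sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev.op == "lw" and prev.dest in cur.sources
    )
    return stages + len(program) - 1 + stalls


LW1 = Instr("lw", "$t1", ("$t0",))
LW2 = Instr("lw", "$t2", ("$t0",))
LW4 = Instr("lw", "$t4", ("$t0",))
ADD3 = Instr("add", "$t3", ("$t1", "$t2"))
SW3 = Instr("sw", None, ("$t3", "$t0"))
ADD5 = Instr("add", "$t5", ("$t1", "$t4"))
SW5 = Instr("sw", None, ("$t5", "$t0"))

original = [LW1, LW2, ADD3, SW3, LW4, ADD5, SW5]
reordered = [LW1, LW2, LW4, ADD3, SW3, ADD5, SW5]
print(cycles_with_forwarding(original))   # 13
print(cycles_with_forwarding(reordered))  # 11
```

Moving the third load up removes both load-use stalls, saving two cycles.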
Control hazard
An occurrence in which the proper
instruction cannot execute in the proper
clock cycle because the instruction that was
fetched is not the one that is needed.
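Example (address on the left):
996    40: or  $7, $8, $9
1000       add $4, $5, $6
1004       beq $1, $2, 40
1008       lw  $3, 300($0)
While beq is still being evaluated, the pipeline has already fetched the next sequential instruction (lw at 1008). If the branch is taken, the instruction needed next is the or at 996 (label 40), so the fetched lw is the wrong one.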
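One solution is branch prediction: the processor predicts each branch as taken or not taken, fetches along the predicted path, and discards (flushes) the wrongly fetched instructions if the prediction turns out to be wrong.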
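Different prediction schemes can be compared by counting mispredictions on a sequence of branch outcomes. A backward branch such as the beq above often closes a loop, so this sketch uses a hypothetical loop branch taken 9 times and then not taken, run twice. It compares static prediction (always the same guess) with a simple 1-bit dynamic scheme that predicts whatever the branch did last time; these are illustrative choices, not the only ways to predict.

```python
def mispredictions_static(outcomes: list[bool], predict_taken: bool) -> int:
    """Static prediction: always guess the same direction."""
    return sum(1 for taken in outcomes if taken != predict_taken)


def mispredictions_one_bit(outcomes: list[bool], initial: bool = False) -> int:
    """Dynamic 1-bit prediction: guess what the branch did last time."""
    prediction, misses = initial, 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
        prediction = taken
    return misses


loop = ([True] * 9 + [False]) * 2          # True = taken
print(mispredictions_static(loop, predict_taken=False))  # 18
print(mispredictions_static(loop, predict_taken=True))   # 2
print(mispredictions_one_bit(loop))                      # 4
```

Each misprediction costs the cycles spent on the flushed instructions, so fewer mispredictions means fewer lost cycles.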