The Processor

Objective of this chapter
To construct a datapath and control unit for the MIPS instruction set.
What is MIPS?
MIPS stands for Microprocessor without Interlocked Pipeline Stages.
It is a reduced instruction set computer (RISC) architecture.
Its instruction set architecture (ISA) was developed by MIPS Technologies (formerly MIPS Computer Systems, Inc.).
Multiple revisions of the MIPS instruction set exist, including MIPS I, MIPS II,
MIPS III, MIPS IV, MIPS V, MIPS32, and MIPS64.
The current revisions are MIPS32 (for 32-bit implementations) and MIPS64 (for
64-bit implementations).
A Basic MIPS Implementation
It includes a subset of the core MIPS instruction set:
• The memory-reference instructions load word (lw) and store word (sw)
• The arithmetic-logical instructions add, sub, and, or
• The instructions branch equal (beq) and jump (j)
An Overview of the Implementation
For every instruction, the first two steps are the same:
1. Send the program counter (PC) to the memory that contains the code and fetch the instruction from that memory.
2. Read one or two registers, using fields of the instruction to select the registers to read. For the load word instruction, we need to read only one register, but most other instructions require that we read two registers.
All instruction classes except jump use the arithmetic-logical unit (ALU) after reading the registers.
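• The memory-reference instructions use the ALU for an address calculation, e.g.
lw $10, 20($1)
If $1 = 1000, the ALU computes 20 + 1000 = 1020; the word stored at address 1020 (say 100) is loaded, so $10 = 100.
• The arithmetic-logical instructions use the ALU to perform the operation, e.g.
add $1, $2, $3
• The branch instructions use the ALU for comparison, e.g.
beq $1, $2, L1
The ALU subtracts $2 from $1; if the result is zero ($1 == $2), the branch to L1 is taken.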
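The three uses of the ALU listed above can be sketched in Python. This is only an illustration: the function names and sample values are assumptions made for the sketch, not part of the MIPS definition.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_address(base, offset16):
    """Memory reference: base register + sign-extended 16-bit offset."""
    return (base + sign_extend16(offset16)) & MASK32


def alu_add(a, b):
    """Arithmetic-logical: 32-bit addition."""
    return (a + b) & MASK32


def alu_equal(a, b):
    """Branch: subtract the operands and test whether the result is zero."""
    return ((a - b) & MASK32) == 0


# lw $10, 20($1) with $1 = 1000 reads the word at address 1020
address = alu_address(1000, 20)
print(address)
```

A negative offset works the same way: the 16-bit field 0xFFFC is sign-extended to -4 before the addition.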
After using the ALU, the actions required to complete the various instruction classes differ.
Match each instruction class with the action it takes:
a. A memory-reference instruction
b. An arithmetic-logical instruction
c. A branch instruction
Actions:
• Must write the data from the ALU or memory back into a register
• Must access the memory, either to read data (load) or to write data (store)
• May change the next instruction address based on the comparison
An abstract view of the implementation of the MIPS subset showing
the major functional units and the major connections between them.
• All instructions start by using the program counter to supply the instruction address to
the instruction memory. After the instruction is fetched, the register operands used by
an instruction are specified by fields of that instruction.
• Once the register operands have been fetched, they can be operated on to compute a
memory address (for a load or store), to compute an arithmetic result (for an integer
arithmetic-logical instruction), or to perform a comparison (for a branch).
• If the instruction is an arithmetic-logical instruction, the result from the ALU must be
written to a register.
• If the operation is a load or store, the ALU result is used as an address to either store a
value from the registers or load a value from memory into the registers. The result
from the ALU or memory is written back into the register file.
• Branches require the use of the ALU output to determine the next instruction address,
which comes from either the ALU (where the PC and branch offset are summed) or
from an adder that increments the current PC by 4.
The basic implementation of the MIPS subset including the
necessary multiplexors and control lines.
• The top multiplexor controls what value replaces the PC (PC + 4 or the
branch destination address); the multiplexor is controlled by the gate that
“ands” together the Zero output of the ALU and a control signal that
indicates that the instruction is a branch.
• The multiplexor whose output returns to the register file is used to steer
the output of the ALU (in the case of an arithmetic logical instruction) or
the output of the data memory (in the case of a load) for writing into the
register file.
• Finally, the bottommost multiplexor is used to determine whether the
second ALU input is from the registers (for a nonimmediate arithmetic-logical
instruction) or from the offset field of the instruction (for an
immediate operation, a load or store, or a branch).
• The added control lines are straightforward and determine the operation
performed at the ALU, whether the data memory should read or write,
and whether the registers should perform a write operation. The control
lines are shown in color to make them easier to see.
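A small Python sketch of the three multiplexors described above. The control-signal names (branch, zero, mem_to_reg, alu_src) are assumptions chosen for readability; the slides do not name the signals.

```python
def mux(select, input0, input1):
    """Two-input multiplexor: select = 0 picks input0, select = 1 picks input1."""
    return input1 if select else input0


def next_pc(pc_plus_4, branch_target, branch, zero):
    """Top multiplexor: controlled by the AND of the branch signal and ALU Zero."""
    return mux(branch & zero, pc_plus_4, branch_target)


def write_back_value(alu_result, memory_data, mem_to_reg):
    """Multiplexor feeding the register file: ALU result or loaded data."""
    return mux(mem_to_reg, alu_result, memory_data)


def second_alu_input(register_data, extended_offset, alu_src):
    """Bottommost multiplexor: register operand or the instruction's offset field."""
    return mux(alu_src, register_data, extended_offset)


# A branch whose operands are equal (Zero = 1) replaces the PC with the target.
print(next_pc(1008, 996, branch=1, zero=1))
```

The PC is replaced by the branch target only when both the branch signal and Zero are 1; in every other case PC + 4 is selected.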
Which statement is correct?
The single-cycle datapath conceptually must have separate instruction and data memories because
1. the format of data and instructions is different in MIPS, and hence different memories are needed
2. having separate memories is less expensive
3. the processor operates in one cycle and cannot use a single-ported memory for two different accesses within that cycle
Logic Design Conventions
• This section reviews how the logic implementing the machine will operate and how the
machine is clocked.
• The functional units in the MIPS implementation consist of two different
types of logic elements: elements that operate on data values and
elements that contain state.
• The elements that operate on data values are all combinational, which
means that their outputs depend only on the current inputs. Example: the ALU.
• An element contains state if it has some internal storage. We call these
elements state elements because, if we pulled the plug on the machine,
we could restart it by loading the state elements with the values they
contained before we pulled the plug. The instruction and data memories
as well as the registers are all examples of state elements.
• The clock is used to determine when the state element should be written;
a state element can be read at any time
• Logic components that contain state are also called sequential because
their outputs depend on both their inputs and the contents of the internal
state.
• We will use the word asserted to indicate a signal that is logically high and
assert to specify that a signal should be driven logically high, and deassert
or deasserted to represent logical low.
Clocking Methodology
• A clocking methodology defines when signals can be read and when they can be written. It
is important to specify the timing of reads and writes because, if a signal is written at the
same time it is read, the value of the read could correspond to the old value, the newly
written value, or even some mix of the two!
• We will assume an edge-triggered clocking methodology, which means that any values
stored in a sequential logic element are updated only on a clock edge.
• The figure shows two state elements surrounding a block of combinational logic, which
operates in a single clock cycle: all signals must propagate from state element 1, through the
combinational logic, and to state element 2 in the time of one clock cycle.
• The time necessary for the signals to reach state element 2 defines the length of the clock cycle.
• An edge-triggered methodology allows a state element to be read and written
in the same clock cycle without creating a race that could lead to indeterminate
data values.
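The read-and-write-in-one-cycle property can be sketched with a toy model of an edge-triggered register. The class and method names are hypothetical; the model only illustrates that a written value becomes visible at the clock edge, not before.

```python
class EdgeTriggeredRegister:
    """Toy state element: writes become visible only at the clock edge."""

    def __init__(self, value=0):
        self.value = value    # output, stable for the whole clock cycle
        self._next = value    # value waiting at the data input

    def read(self):
        return self.value

    def write(self, value):
        self._next = value    # captured, but not yet visible

    def clock_edge(self):
        self.value = self._next


pc = EdgeTriggeredRegister(1000)
old = pc.read()                  # read during the cycle
pc.write(old + 4)                # written during the same cycle
seen_before_edge = pc.read()     # still the old value, so there is no race
pc.clock_edge()
seen_after_edge = pc.read()      # the new value appears after the edge
print(seen_before_edge, seen_after_edge)
```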
Building a Datapath
• A reasonable way to start a datapath design is to examine the major components required to
execute each class of MIPS instruction.
• Let’s start by looking at which datapath elements each instruction needs.
• The state elements are the instruction memory and the program counter.
• The instruction memory need only provide read access because the datapath does
not write instructions.
• Since the instruction memory is only read, we treat it as combinational logic: the
output at any time reflects the contents of the location specified by the address
input, and no read control signal is needed.
• The program counter is a 32-bit register that will be written at the end of every
clock cycle and thus does not need a write control signal.
• The adder is an ALU wired to always perform an add of its two 32-bit inputs and
place the result on its output.
A portion of the datapath used for fetching instructions and incrementing
the program counter.
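Instruction fetch and the PC increment can be sketched as follows. The instruction memory is modeled as a Python dictionary keyed by byte address, which is an assumption made only for this illustration.

```python
def fetch(instruction_memory, pc):
    """Return the instruction at address pc together with PC + 4."""
    instruction = instruction_memory[pc]      # combinational read, no read control
    return instruction, (pc + 4) & 0xFFFFFFFF


# Hypothetical instruction memory contents, one instruction per 4-byte word.
instruction_memory = {
    0: "lw $t1, 0($t0)",
    4: "lw $t2, 4($t0)",
    8: "add $t3, $t1, $t2",
}

pc = 0
fetched = []
for _ in range(3):
    instruction, pc = fetch(instruction_memory, pc)   # PC is rewritten every cycle
    fetched.append(instruction)
print(pc, fetched)
```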
• The processor’s 32 general-purpose registers are stored in a structure called a register file.
• A register file is a collection of registers in which any register can be read or written by
specifying the number of the register in the file.
• The register file contains the register state of the machine. In addition, we will need an ALU
to operate on the values read from the registers.
• The register number inputs are 5 bits wide to specify one of 32 registers (32 = 2^5), whereas
the data input and two data output buses are each 32 bits wide.
• The figure shows the ALU, which takes two 32-bit inputs and produces a 32-bit result, as well as
a 1-bit signal if the result is 0.
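A minimal sketch of a register file with two read ports and one write port, plus an ALU with a Zero output. The class, method, and parameter names are assumptions; only the widths (5-bit register numbers, 32-bit data) come from the text above.

```python
class RegisterFile:
    """32 registers, two read ports, one write port."""

    def __init__(self):
        self.registers = [0] * 32

    def read(self, read_register_1, read_register_2):
        # 5-bit register numbers select the two outputs
        return (self.registers[read_register_1 & 0x1F],
                self.registers[read_register_2 & 0x1F])

    def write(self, write_register, data, reg_write):
        # In MIPS, register $0 is hardwired to zero, so writes to it are ignored.
        if reg_write and (write_register & 0x1F) != 0:
            self.registers[write_register & 0x1F] = data & 0xFFFFFFFF


def alu(operation, a, b):
    """Return (32-bit result, zero flag) for add, sub, and, or."""
    results = {"add": a + b, "sub": a - b, "and": a & b, "or": a | b}
    result = results[operation] & 0xFFFFFFFF
    return result, int(result == 0)


# add $1, $2, $3 with $2 = 7 and $3 = 5
rf = RegisterFile()
rf.write(2, 7, True)
rf.write(3, 5, True)
a, b = rf.read(2, 3)
result, zero = alu("add", a, b)
rf.write(1, result, True)
print(rf.registers[1], zero)
```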
Register file and ALU to implement R-format instructions
Units to implement load and store with sign extension
The beq instruction has three operands, two registers that are compared for
equality, and a 16-bit offset used to compute the branch target address
relative to the branch instruction address.
• Points to note for the branch instruction (the ALU performs the comparison, as described earlier):
beq $1, $2, L1
■ Since we already compute PC + 4 (the address of the next instruction), we use this value as
the base for computing the branch target address.
■ The architecture also states that the offset field is shifted left 2 bits so that it is a
word offset.
• When the condition is true (i.e., the operands are equal), the branch target
address becomes the new PC, and we say that the branch is taken.
• If the operands are not equal, the incremented PC should replace the current PC
(just as for any other normal instruction); in this case, we say that the branch is not
taken.
The datapath for a branch uses the ALU to evaluate the branch condition and a
separate adder to compute the branch target as the sum of the incremented PC and
the sign-extended, lower 16 bits of the instruction (the branch displacement), shifted
left 2 bits.
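The branch target calculation can be sketched in Python. The function names are assumptions; the addresses 1004 and 996 are sample values chosen so that a beq at address 1004 jumps back three words from PC + 4.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the lower 16 bits of the instruction as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Target = (PC + 4) + (sign-extended offset shifted left 2 bits)."""
    pc_plus_4 = (pc + 4) & MASK32
    return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc_for_beq(pc, offset16, a, b):
    """Taken if the ALU subtraction gives zero, otherwise fall through to PC + 4."""
    taken = ((a - b) & MASK32) == 0
    return branch_target(pc, offset16) if taken else (pc + 4) & MASK32


# beq at address 1004 with a word offset of -3 (0xFFFD): target = 1008 - 12
print(branch_target(1004, 0xFFFD))
```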
Creating a single Datapath
• Combine all the datapath components and
add control to complete the implementation
• This datapath executes all instructions in one clock cycle.
• No datapath resource can be used more than once per instruction, so any element
needed more than once must be duplicated.
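To see the combined datapath at work, here is a rough behavioral sketch in Python that executes one instruction per simulated clock cycle. It is a simplification under stated assumptions: instructions are tuples rather than 32-bit words, memory is a dictionary, and the tuple layouts in the comments are invented for this sketch.

```python
def step(state, program):
    """Run one instruction in a single simulated clock cycle."""
    pc, regs, mem = state["pc"], state["regs"], state["mem"]
    op, a, b, c = program[pc]                # instruction fetch
    next_pc = pc + 4                         # PC + 4 adder
    if op in ("add", "sub", "and", "or"):    # (op, rd, rs, rt)
        x, y = regs[b], regs[c]
        result = {"add": x + y, "sub": x - y, "and": x & y, "or": x | y}[op]
        regs[a] = result & 0xFFFFFFFF        # write ALU result to the register file
    elif op == "lw":                         # (lw, rt, offset, rs)
        regs[a] = mem.get(regs[c] + b, 0)    # ALU computes the address, memory is read
    elif op == "sw":                         # (sw, rt, offset, rs)
        mem[regs[c] + b] = regs[a]           # ALU computes the address, memory is written
    elif op == "beq":                        # (beq, rs, rt, word_offset)
        if regs[a] == regs[b]:
            next_pc = pc + 4 + (c << 2)      # branch target from the separate adder
    else:
        raise ValueError(f"unsupported instruction: {op}")
    regs[0] = 0                              # register $0 always reads as zero
    state["pc"] = next_pc


# A = B + E, using register numbers 8 ($t0) to 11 ($t3); B = 3, E = 4.
program = {
    0: ("lw", 9, 0, 8),
    4: ("lw", 10, 4, 8),
    8: ("add", 11, 9, 10),
    12: ("sw", 11, 12, 8),
}
state = {"pc": 0, "regs": [0] * 32, "mem": {100: 3, 104: 4}}
state["regs"][8] = 100                       # base address in $t0
while state["pc"] in program:
    step(state, program)
print(state["mem"][112], state["pc"])
```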
Overview of pipelining
Pipelining is universal
MIPS Instruction execution steps
• Fetch instruction from memory
• Read registers while decoding the instruction
and determine the opcode of the instruction
• Execute the operation or calculate the address
• Access an operand in data memory
• Write the result into a register
• In a pipeline, the units for these steps are connected in series and
all of them operate simultaneously, each on a different instruction.
• Pipelining improves performance compared with the traditional
sequential execution of instructions.
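The overlap of the five steps can be sketched by generating a pipeline diagram. The sketch assumes an ideal pipeline with no stalls, where each instruction enters IF one clock cycle after the previous one; the function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_rows(instruction_count):
    """One row per instruction, one column per clock cycle (ideal, no stalls)."""
    total_cycles = instruction_count + len(STAGES) - 1
    rows = []
    for i in range(instruction_count):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage       # instruction i is in stage s during cycle i + s + 1
        rows.append(row)
    return rows


rows = pipeline_rows(3)
for row in rows:
    print(" ".join(f"{cell:<3}" for cell in row))
```

Three instructions finish in 7 clock cycles instead of the 15 that strictly sequential execution of five steps each would need.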
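Key advantages
• More efficient use of the processor
• Faster execution of a large number of instructions
If the stages are perfectly balanced, the time between instructions on the pipelined processor is
\[
\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}
\]
For example, with 800 time units between instructions on the nonpipelined processor and 5 pipe stages:
800/5 = 160 time units
Under these ideal conditions the speed-up approaches the number of pipe stages.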
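The arithmetic above can be checked with a short Python sketch. It assumes perfectly balanced stages and no stalls; the function names and the instruction count of 1000 are illustrative.

```python
def time_between_pipelined(nonpipelined_time, stages):
    """Ideal time between instructions when all stages are perfectly balanced."""
    return nonpipelined_time / stages


def total_time_nonpipelined(instruction_count, nonpipelined_time):
    return instruction_count * nonpipelined_time


def total_time_pipelined(instruction_count, nonpipelined_time, stages):
    """The first instruction needs all stages; each later one adds one stage time."""
    stage_time = nonpipelined_time / stages
    return (instruction_count + stages - 1) * stage_time


print(time_between_pipelined(800, 5))
speedup = total_time_nonpipelined(1000, 800) / total_time_pipelined(1000, 800, 5)
print(round(speedup, 2))
```

For a long instruction stream the speed-up gets close to, but never reaches, the number of stages, because the pipeline must first fill.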
Designing instruction sets for pipelining
1. If all instructions are the same length, fetching and decoding are easy;
otherwise pipelining is more challenging.
2. If the source register fields are in the same position in every instruction format,
the register file can be read while the instruction is being decoded;
otherwise the decode stage must be split, giving a pipeline of 6 stages.
3. If memory operands appear only in loads and stores (lw, sw), the execute stage
can calculate the memory address and the next stage can access memory;
otherwise separate address-calculation and execution stages are needed.
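Point 2 can be illustrated with a field extractor: because the register fields sit at fixed bit positions, they can be pulled out before the opcode is examined. The function name is an assumption, and the two instruction words were encoded by hand for this sketch.

```python
def decode_fields(word):
    """Extract the fixed-position fields of a 32-bit MIPS instruction word."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31-26
        "rs": (word >> 21) & 0x1F,       # bits 25-21, first source register
        "rt": (word >> 16) & 0x1F,       # bits 20-16, second source or load destination
        "rd": (word >> 11) & 0x1F,       # bits 15-11, R-format destination
        "immediate": word & 0xFFFF,      # bits 15-0, I-format offset
    }


add_word = 0x00430820    # add $1, $2, $3
lw_word = 0x8C2A0014     # lw $10, 20($1)

add_fields = decode_fields(add_word)
lw_fields = decode_fields(lw_word)
print(add_fields["rs"], add_fields["rt"], add_fields["rd"])
print(lw_fields["rs"], lw_fields["rt"], lw_fields["immediate"])
```

In both formats rs and rt come from the same bit positions, so the register file read can start before the instruction type is known.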
Pipelining Hazard
• A pipeline hazard refers to a situation in which
a correct program ceases to work correctly
due to implementing the processor with a
pipeline.
• There are 3 different pipeline hazards
a) Structural hazard
b) Data hazard
c) Control hazard
a) Structural hazards
A structural hazard occurs when a part of the
processor's hardware is needed by two or
more instructions at the same time.
e.g.
Clock cycle  1    2    3    4    5    6    7    8
Inst 1       IF   ID   E    M    WB
Inst 2            IF   ID   E    M    WB
Inst 3                 IF   ID   E    M    WB
Inst 4                      IF   ID   E    M    WB
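• The fourth instruction is fetched (IF) in the same clock cycle in which the
first instruction accesses data memory (M).
• In MIPS we have separate memories for instructions and data, so these two
accesses do not conflict.
So no structural hazard occurs in this MIPS pipeline.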
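A rough sketch of the memory conflict check. It assumes an ideal five-stage pipeline in which instruction i is fetched in clock cycle i and reaches its memory stage in cycle i + 3, and it pessimistically treats every instruction as accessing data memory; the function name is hypothetical.

```python
def memory_conflicts(instruction_count, unified_memory):
    """Clock cycles in which an instruction fetch and a data access collide."""
    if not unified_memory:
        return []    # separate instruction and data memories never collide
    fetch_cycles = {i for i in range(1, instruction_count + 1)}
    memory_cycles = {i + 3 for i in range(1, instruction_count + 1)}
    return sorted(fetch_cycles & memory_cycles)


print(memory_conflicts(4, unified_memory=True))
print(memory_conflicts(4, unified_memory=False))
```

With a single shared memory, four instructions already collide in clock cycle 4; with separate memories the list of conflicts is empty.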
Data Hazard
• An occurrence in which a planned instruction
cannot execute in the proper clock cycle
because data that is needed to execute the
instruction is not yet available.
• E.g.
add $s0, $t0, $t1
sub $t2, $s0, $t3
Reason for data hazards
• The destination register of the first instruction is used
as a source operand in the second instruction.
• In clock cycle 4 the second instruction needs $s0, but the first instruction
does not write it into the register file until clock cycle 5.
Solution 1: Data forwarding
Solution 2: Stalls
Consider
lw $s0, 20($t1)
sub $t2, $s0, $t3
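In the diagram:
• Shading on the right half of the register file or memory means the element is
read in that stage, and shading on the left half means it is written in that stage.
Try with forwarding: the loaded value leaves memory only at the end of the MEM stage
of lw, and passing data backward in time is not possible, so forwarding alone
cannot resolve this load-use hazard.
Now try with a stall plus forwarding.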
Identify any hazards in these instructions:
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
The problem comes from starting the next instruction before the first is finished:
dependencies that “go backward in time” are data hazards.
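The exercise can be approached with a small dependence finder. This is a sketch under assumptions: each instruction is described by hand as (text, destination, sources), and a dependence is counted as a hazard when the reader follows the writer by at most two instructions, which assumes the register file is written in the first half of a clock cycle and read in the second half.

```python
def raw_dependences(instructions):
    """Return (writer index, reader index, register) for each read-after-write pair."""
    found = []
    for i, (_, dest, _) in enumerate(instructions):
        if dest is None:
            continue
        for j in range(i + 1, len(instructions)):
            if dest in instructions[j][2]:
                found.append((i, j, dest))
            if instructions[j][1] == dest:
                break            # a later write redefines the register
    return found


program = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]

dependences = raw_dependences(program)
# Readers one or two instructions behind the writer see a stale register value.
hazards = [d for d in dependences if d[1] - d[0] <= 2]
for writer, reader, register in hazards:
    print(f"{program[reader][0]} needs {register} from {program[writer][0]}")
```

Under these assumptions all four later instructions depend on $2, but only the and and or instructions read it too early.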
Now try with stalls.
Now try to use the forwarding technique:
DADD R1,R2,R3
DSUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
XOR R10,R1,R11
Reordering code to avoid pipeline stalls
Consider the following code segment in C:
A = B + E;
C = B + F;
Here is the generated MIPS code, assuming $t0 holds the base address of the variables:
lw  $t1, 0($t0)     # load B
lw  $t2, 4($t0)     # load E
add $t3, $t1, $t2   # A = B + E
sw  $t3, 12($t0)    # store A into memory
lw  $t4, 8($t0)     # load F
add $t5, $t1, $t4   # C = B + F
sw  $t5, 16($t0)    # store C into memory
Clock cycle       1   2   3   4   5   6   7   8   9   10  11
lw  $t1, 0($t0)   IF  ID  E   M   WB
lw  $t2, 4($t0)       IF  ID  E   M   WB
add $t3, $t1, $t2         IF  ID  E   M   WB
sw  $t3, 12($t0)              IF  ID  E   M   WB
lw  $t4, 8($t0)                   IF  ID  E   M   WB
add $t5, $t1, $t4                     IF  ID  E   M   WB
sw  $t5, 16($t0)                          IF  ID  E   M   WB
($t1 is written in the WB stage of the first lw.)
Note: There is a data hazard for each add, because one of its source operands comes
from the load instruction immediately before it, and that operand is written to the
register file only when the load completes its fifth stage (WB).
The solution is to reorder the instructions without violating any dependence:
lw  $t1, 0($t0)     # load B
lw  $t2, 4($t0)     # load E
lw  $t4, 8($t0)     # load F
add $t3, $t1, $t2   # A = B + E
sw  $t3, 12($t0)    # store A into memory
add $t5, $t1, $t4   # C = B + F
sw  $t5, 16($t0)    # store C into memory
Clock cycle  1   2   3   4   5   6   7   8   9   10  11
lw  $t1      IF  ID  E   M   WB
lw  $t2          IF  ID  E   M   WB
lw  $t4              IF  ID  E   M   WB
add $t3                  IF  ID  E   M   WB
sw  $t3                      IF  ID  E   M   WB
add $t5                          IF  ID  E   M   WB
sw  $t5                              IF  ID  E   M   WB
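The benefit of reordering can be estimated with a short Python sketch. It assumes a five-stage pipeline with forwarding, where the only remaining stall is one cycle when an instruction uses the result of the lw immediately before it; the tuple format and function names are invented for the illustration.

```python
def load_use_stalls(program):
    """Count one stall for each lw whose result is used by the very next instruction."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        if previous[0] == "lw" and previous[1] in current[2]:
            stalls += 1
    return stalls


def total_cycles(program, stages=5):
    """Cycles to finish: fill the pipeline, one instruction per cycle, plus stalls."""
    return len(program) + stages - 1 + load_use_stalls(program)


# (opcode, destination register, source registers)
original = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]
reordered = [original[0], original[1], original[4], original[2],
             original[3], original[5], original[6]]

print(total_cycles(original), total_cycles(reordered))
```

Under these assumptions the original order needs two stall cycles, and moving the third lw up removes both.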
Control hazard
An occurrence in which the proper
instruction cannot execute in the proper
clock cycle because the instruction that was
fetched is not the one that is needed.
Example:
Address   Instruction
996       40: or  $7, $8, $9
1000          add $4, $5, $6
1004          beq $1, $2, 40
1008          lw  $3, 300($0)
Solution: use branch prediction, which predicts each branch as taken or not taken.
Branch prediction: a method of resolving a branch hazard that assumes a given
outcome for the branch and proceeds from that assumption rather than waiting
for the actual outcome.
Branch taken
Address   Instruction
996       40: or  $7, $8, $9
1000          add $4, $5, $6
1004          beq $1, $2, 40
1008          lw  $3, 300($0)
Assume the branch is always taken.
In programs, the branches at the bottom of loops jump back to the top of the
loop, so they are usually taken and can be predicted as taken.
• The next diagram demonstrates predicting branches as a solution to the
control hazard.
• The top diagram shows the lw instruction being fetched after beq, on the
assumption that the branch is untaken.
• If the branch condition turns out to be true, the lw in the decode stage must be
discarded: the system inserts a bubble at clock cycle 4 and starts fetching the
or instruction at label 40.
Branch will be untaken
One more solution for branch hazards: dynamic prediction
• One popular approach to dynamic prediction of
branches is keeping a history for each branch as
taken or untaken, and then using the recent
past behavior to predict the future.
• Dynamic branch predictors can correctly predict
branches with over 90% accuracy.
• When the guess is wrong, the pipeline must discard the wrongly fetched
instructions and restart from the proper branch address.
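As an illustration of keeping a history per branch, here is a sketch of the simplest such scheme, a 1-bit predictor that predicts each branch will do what it did last time. This is one possible scheme chosen for brevity, not necessarily the one behind the accuracy figure above; the class name and the loop pattern are assumptions.

```python
class OneBitPredictor:
    """Remember the last outcome of each branch and predict the same again."""

    def __init__(self):
        self.history = {}    # branch address -> last outcome (True = taken)

    def predict(self, address):
        # A branch with no history is predicted not taken.
        return self.history.get(address, False)

    def update(self, address, taken):
        self.history[address] = taken


def accuracy(predictor, address, outcomes):
    """Fraction of outcomes predicted correctly, updating the history as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(outcomes)


# A loop branch at address 1004: taken 9 times, then not taken; the loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
result = accuracy(OneBitPredictor(), 1004, outcomes)
print(result)
```

This simple predictor mispredicts on entry to and exit from each loop; predictors that keep more history per branch reduce those misses.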
Pipelined datapath and control
Remember
1. IF: Instruction fetch
2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back
Instructions and data generally move from left to right through the five stages.
There are two exceptions, where data flows from right to left:
1. The write-back stage, which places the result back into the register file in the
middle of the datapath (may lead to a data hazard)
2. The selection of the next value of the PC, choosing between the incremented PC
and the branch address from the MEM stage (may lead to a control hazard)
Single pipelined datapath
Datapath for load instruction
Datapath for store instruction
