Custom SPP

The document discusses custom single-purpose processors and their design. It begins by introducing custom single-purpose processors and noting they can be fast, small, and low power but require more design time and are less flexible than general purpose processors. It then covers basic digital logic components like transistors, logic gates, combinational logic, and sequential logic elements like flip-flops and registers that are used to build the data path and control unit of a custom processor. Finally, it discusses designing custom processors at the register-transfer level using common sequential and combinational components.
Copyright
© Attribution Non-Commercial (BY-NC)

CHAPTER 2

Custom single-purpose processors

Outline
• Introduction
• Combinational logic
• Sequential logic
• Custom single-purpose processor design
• RT-level custom single-purpose processor design
Introduction
• Processor
– Digital circuit that performs a computation task
– Consists of a controller and a data path
– General-purpose: handles a variety of computation tasks
– Single-purpose: handles one particular computation task
– Custom single-purpose: handles a non-standard task
• A custom single-purpose processor may be
– Fast, small, low power
– But high NRE (non-recurring engineering) cost, longer time-to-market, less flexible

CMOS transistor on silicon
• Transistor
– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at "gate" controls whether current flows from source to drain
– Don't confuse this "gate" with a logic gate.
[Figure: transistor symbol with source, gate and drain (conducts if gate=1); IC package and cross-section of the IC showing gate, oxide, source, channel and drain on a silicon substrate.]
CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels
– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types
– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence "complementary"
• Basic gates
– Inverter, NAND, NOR
[Figure: nMOS and pMOS transistor symbols, and transistor-level circuits for the inverter (F = x'), the NAND gate (F = (xy)') and the NOR gate (F = (x+y)').]
Basic logic gates

One-input gates:
Gate      Equation    x=0  x=1
Driver    F = x        0    1
Inverter  F = x'       1    0

Two-input gates (output F for each input combination x y):
Gate      Equation       00   01   10   11
AND       F = xy          0    0    0    1
OR        F = x + y       0    1    1    1
XOR       F = x ⊕ y       0    1    1    0
NAND      F = (xy)'       1    1    1    0
NOR       F = (x+y)'      1    0    0    0
XNOR      F = (x ⊕ y)'    1    0    0    1
Combinational logic design

A) Problem description
y is 1 if a is 1, or if b and c are both 1. z is 1 if b or c is 1, but not both, or if all three inputs are 1.

B) Truth table
Inputs     Outputs
a b c      y z
0 0 0      0 0
0 0 1      0 1
0 1 0      0 1
0 1 1      1 0
1 0 0      1 0
1 0 1      1 1
1 1 0      1 1
1 1 1      1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
K-map for y (rows a, columns bc):
a \ bc   00  01  11  10
0         0   0   1   0
1         1   1   1   1
y = a + bc

K-map for z (rows a, columns bc):
a \ bc   00  01  11  10
0         0   1   0   1
1         0   1   1   1
z = ab + b'c + bc'

E) Logic gates
[Figure: gate-level circuit with inputs a, b, c and outputs y, z.]
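The minimization step can be checked by brute force. The sketch below is an illustration only: the function names are hypothetical, and it simply compares the sum-of-minterms equations from step C with the minimized equations from step D on all eight input rows.

```c
#include <stdbool.h>

/* Step C: output equations read directly from the truth table. */
bool y_minterms(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_minterms(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Step D: minimized equations y = a + bc and z = ab + b'c + bc'. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Returns 1 if both minimized forms match the minterm forms on every row. */
int minimized_forms_agree(void) {
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (y_min(a, b, c) != y_minterms(a, b, c)) return 0;
        if (z_min(a, b, c) != z_minterms(a, b, c)) return 0;
    }
    return 1;
}
```

An exhaustive check like this is practical only because there are three inputs; it confirms equivalence, not minimality.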
RT-Level Combinational Components
• A multiplexor, sometimes called a selector, allows only one of its data inputs I_m to pass through to the output O.
• A decoder converts its binary input I into a one-hot output O. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as described above.
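As a behavioral illustration of these two components, here is a minimal C sketch. The sizes (a 4x1 multiplexor and a 2x4 decoder) and the function names are assumptions chosen for the example, not taken from the slides.

```c
#include <stdint.h>

/* 4x1 multiplexor: the 2-bit select s decides which data input reaches O. */
uint8_t mux4(const uint8_t in[4], unsigned s) {
    return in[s & 3u];
}

/* 2x4 decoder with enable: one-hot output, or all zeros when enable is 0. */
uint8_t decoder2x4(unsigned i, unsigned enable) {
    return enable ? (uint8_t)(1u << (i & 3u)) : 0;
}
```

For example, `decoder2x4(3, 1)` sets only output bit 3, while `decoder2x4(3, 0)` returns 0 regardless of the input.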
RT-Level Combinational Components
• An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry.
• A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B.
• An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B.
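The adder and comparator can be sketched behaviorally in C. This is a sketch assuming n = 8; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Result of an 8-bit adder: n-bit sum plus a 1-bit output carry. */
typedef struct { uint8_t sum; uint8_t carry; } add_result;

add_result add8(uint8_t a, uint8_t b) {
    unsigned wide = (unsigned)a + (unsigned)b;   /* 9 significant bits */
    add_result r = { (uint8_t)(wide & 0xFFu), (uint8_t)(wide >> 8) };
    return r;
}

/* Comparator outputs: exactly one of the three flags is 1. */
typedef struct { uint8_t less, equal, greater; } cmp_result;

cmp_result compare8(uint8_t a, uint8_t b) {
    cmp_result r = { a < b, a == b, a > b };
    return r;
}
```

With these definitions, `add8(200, 100)` yields sum 44 and carry 1, since 300 does not fit in 8 bits.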
Sequential logic design
• A sequential circuit is a digital circuit whose outputs are a function of the current as well as previous input values.
• One of the most basic sequential circuits is the flip-flop.
• The simplest type of flip-flop is the D flip-flop. It has two inputs: D and clock.
– When clock is 1, the value of D is stored in the flip-flop, and that value appears at an output Q.


Sequential logic design
• The SR flip-flop has three inputs: S, R and clock.
• When clock is 0, the previously stored bit is maintained and appears at output Q.
• When clock is 1, the inputs S and R are examined. If S is 1, a 1 is stored. If R is 1, a 0 is stored.
• If both are 0, there is no change. If both are 1, behavior is undefined. Thus, S stands for set, and R for reset.
Sequential logic design
• The JK flip-flop is the same as an SR flip-flop except that when both J and K are 1, the stored bit toggles from 1 to 0 or 0 to 1.
• To prevent unexpected behavior from signal glitches, flip-flops are typically designed to be edge triggered.
• They only pay attention to their non-clock inputs when the clock is rising from 0 to 1, or alternatively when the clock is falling from 1 to 0.
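The flip-flop behaviors described above can be summarized as next-state functions. The following C sketch is illustrative: the names are hypothetical, and returning -1 for the undefined SR case is an assumption made only so that the case is visible in software.

```c
/* Next stored bit of an SR flip-flop when it samples its inputs. */
int sr_next(int q, int s, int r) {
    if (s && r) return -1;   /* undefined in hardware; -1 is an assumption */
    if (s) return 1;         /* set */
    if (r) return 0;         /* reset */
    return q;                /* no change */
}

/* Next stored bit of a JK flip-flop: like SR, but J = K = 1 toggles. */
int jk_next(int q, int j, int k) {
    if (j && k) return !q;
    if (j) return 1;
    if (k) return 0;
    return q;
}

/* Rising-edge-triggered D flip-flop: D is sampled only on a 0 -> 1 clock. */
typedef struct { int q; int prev_clk; } dff;

void dff_tick(dff *f, int clk, int d) {
    if (!f->prev_clk && clk) f->q = d;   /* rising edge detected */
    f->prev_clk = clk;
}
```

Calling `dff_tick` repeatedly with `clk` held at 1 leaves `q` unchanged, which is the point of edge triggering: changes on D while the clock is high are ignored.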
RT-Level Sequential Components
• A register stores n bits from its n-bit data input I, with those stored bits appearing at its output O.
• A register usually has at least two control inputs, clock and load.
• For a rising-edge-triggered register, the inputs I are only stored when load is 1 and clock is rising from 0 to 1.
• The clock input is usually drawn as a small triangle, as shown in the figure. Another common register control input is clear, which resets all bits to 0, regardless of the value of I.


RT-Level Sequential Components
• Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register.
• A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge.
• A shift register has a one-bit data input I, and at least two control inputs, clock and shift.


RT-Level Sequential Components
• When clock is rising and shift is 1, the value of I is stored in the n-th bit, while the n-th bit is stored in the (n-1)-th bit, and likewise, until the second bit is stored in the first bit.
• The first bit is typically shifted out, meaning it appears at an output Q.
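A small C model may make the bit movement concrete. This is a sketch assuming n = 4, with bit 3 of a byte standing for the n-th bit and bit 0 for the first bit; the names are hypothetical.

```c
#include <stdint.h>

/* 4-bit shift register: bits 3..0 hold the 4th..1st stored bit. */
typedef struct { uint8_t bits; } shift_reg4;

/* One rising clock edge. When shift is 1, I enters at the 4th bit and
   every other bit moves one position toward the first bit. */
void shift_tick(shift_reg4 *r, int shift, int i) {
    if (shift)
        r->bits = (uint8_t)(((r->bits >> 1) | ((i & 1) << 3)) & 0xF);
}

/* Output Q shows the first bit. */
int shift_q(const shift_reg4 *r) {
    return r->bits & 1;
}
```

Shifting in the serial sequence 1, 0, 1, 1 leaves the register holding 1101, with the bit that entered first now visible at Q.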


RT-Level Sequential Components
• A counter is a register that can also increment (add binary 1 to) its stored binary value.
• A counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on the clock edge.
• A counter often also has a parallel-load data input and an associated control signal.


RT-Level Sequential Components
• A common counter feature is both up and down counting (incrementing and decrementing), requiring an additional control input to indicate the count direction.
• The control inputs discussed above can be either synchronous or asynchronous. A synchronous input's value only has an effect during a clock edge.
• An asynchronous input's value affects the circuit independent of the clock. Typically, clear control lines are asynchronous.
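The difference between synchronous and asynchronous inputs can be sketched in C by separating what happens on a clock edge from what happens immediately. The example assumes an 8-bit up/down counter and gives load priority over count; both choices, like the names, are assumptions for illustration.

```c
#include <stdint.h>

typedef struct { uint8_t value; } counter8;

/* Asynchronous clear: takes effect at once, independent of the clock. */
void counter_clear(counter8 *c) {
    c->value = 0;
}

/* Synchronous inputs: sampled only when this call models a clock edge.
   load has priority over count (an assumption of this sketch). */
void counter_edge(counter8 *c, int load, uint8_t data, int count, int up) {
    if (load)
        c->value = data;
    else if (count)
        c->value = (uint8_t)(up ? c->value + 1 : c->value - 1);
}
```

Counting down from 0 wraps to 255 here, which mirrors what an 8-bit hardware counter does.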


Sequential logic design

A) Problem description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State diagram
Four states, 0 to 3. In each state, a=1 advances to the next state (state 3 wraps around to state 0) and a=0 keeps the current state. The output is x=0 in states 0, 1 and 2, and x=1 in state 3.

C) Implementation model
A state register holds Q1 Q0. Combinational logic takes Q1, Q0 and the input a, and produces the next-state bits I1, I0 and the output x.

D) State table (Moore-type)
Inputs       Outputs
Q1 Q0 a      I1 I0 x
0  0  0      0  0  0
0  0  1      0  1  0
0  1  0      0  1  0
0  1  1      1  0  0
1  0  0      1  0  0
1  0  1      1  1  0
1  1  0      1  1  1
1  1  1      0  0  1

• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design

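Sequential logic design

E) Minimized output equations
K-map for I1 (rows a, columns Q1Q0):
a \ Q1Q0   00  01  11  10
0           0   0   1   1
1           0   1   0   1
I1 = Q1'Q0a + Q1a' + Q1Q0'

K-map for I0 (rows a, columns Q1Q0):
a \ Q1Q0   00  01  11  10
0           0   1   1   0
1           1   0   0   1
I0 = Q0a' + Q0'a

K-map for x (rows a, columns Q1Q0):
a \ Q1Q0   00  01  11  10
0           0   0   1   0
1           0   0   1   0
x = Q1Q0

F) Combinational logic
[Figure: gate-level circuit computing I1, I0 and x from a, Q1 and Q0.]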
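To see the implementation model at work, the minimized equations can be evaluated once per clock edge in C. This is a sketch with hypothetical names; `divider_pulses` is a helper added only to count how often x is 1.

```c
/* State register contents of the clock divider. */
typedef struct { int q1, q0; } divider_state;

/* Moore output: x = Q1Q0 depends on the state only. */
int divider_x(divider_state s) {
    return s.q1 && s.q0;
}

/* One clock edge: evaluate I1 and I0, then load them into the register. */
divider_state divider_step(divider_state s, int a) {
    divider_state n;
    n.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    n.q0 = (s.q0 && !a) || (!s.q0 && a);
    return n;
}

/* Counts the cycles in which x is 1, starting in state 0 with a held at 1. */
int divider_pulses(int cycles) {
    divider_state s = {0, 0};
    int pulses = 0;
    for (int i = 0; i < cycles; i++) {
        pulses += divider_x(s);
        s = divider_step(s, 1);
    }
    return pulses;
}
```

Over 8 cycles the sketch visits states 0, 1, 2, 3, 0, 1, 2, 3, so x is 1 in two of them, which is one output pulse per four clock cycles.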
Custom single-purpose processor design
• We can apply the above combinational and sequential logic design techniques to build data path components and controllers.
• Since a processor consists of a controller and a data path, building a custom single-purpose processor for a given program means building both.
• The data path stores and manipulates a system's data.
• It contains register units, functional units, and connection units like wires and multiplexors.
• A controller sets the data path control inputs, like register load and multiplexor select signals, of the register units, functional units and connection units to obtain the desired configuration at a particular time.
• It monitors external control inputs as well as data path control outputs, known as status signals, coming from functional units, and it sets external control outputs.
[Figure, left: controller and datapath. The controller receives external control inputs and datapath control outputs, and drives external control outputs and datapath control inputs. The datapath receives external data inputs and drives external data outputs.]
[Figure, right: a view inside the controller and datapath. The controller contains next-state and control logic plus a state register; the datapath contains registers and functional units.]

Example: greatest common divisor
• First create algorithm
• Convert algorithm to "complex" state machine
– Known as FSMD: finite-state machine with data path.
– Can use templates to perform such conversion.

(a) black-box view
Inputs go_i, x_i and y_i enter the GCD block; output d_o leaves it.

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
State   Action       Next state
1:                   2 if 1; leaves the loop if !1 (never taken)
2:                   2-J if !go_i; 3 if !(!go_i)
2-J:                 2
3:      x = x_i      4
4:      y = y_i      5
5:                   6 if x!=y; 9 if !(x!=y)
6:                   7 if x<y; 8 if !(x<y)
7:      y = y - x    6-J
8:      x = x - y    6-J
6-J:                 5-J
5-J:                 5
9:      d_o = x      1-J
1-J:                 1
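State diagram templates

Assignment statement
  a = b
  next statement
Template: a state with action "a = b", followed by a transition to the next statement.

Loop statement
  while (cond) {
    loop-body-statements
  }
  next statement
Template: a condition state C: with two transitions. On cond, go to the loop-body statements, then to a join state J:, then back to C:. On !cond, go to the next statement.

Branch statement
  if (c1)
    c1 stmts
  else if c2
    c2 stmts
  else
    other stmts
  next statement
Template: a condition state C: with three transitions. On c1, go to the c1 stmts; on !c1*c2, go to the c2 stmts; on !c1*!c2, go to the other stmts. All three branches meet at a join state J:, which leads to the next statement.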
Creating the data path
• Create a register for any declared variable.
• Create a functional unit for each arithmetic operation.
• Connect the ports, registers and functional units.
– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier
– for each data path component control input and output
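[Figure: the GCD FSMD (states 1: through 1-J:) next to the datapath derived from it. Datapath: inputs x_i and y_i each feed an n-bit 2x1 multiplexor (select inputs x_sel, y_sel) in front of registers x and y (load inputs x_ld, y_ld). A != comparator (state 5: x!=y) produces x_neq_y and a < comparator (state 6: x<y) produces x_lt_y. Two subtractors compute x-y (state 8) and y-x (state 7). Register d (load input d_ld, state 9) drives d_o.]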
Creating the controller's FSM
• Same structure as FSMD
• Replace complex actions/conditions with data path configurations.

Controller FSM (inputs go_i, x_neq_y, x_lt_y):
Encoding  State  Control outputs        Next state
0000      1:                            2 if 1
0001      2:                            2-J if !go_i; 3 if !(!go_i)
0010      2-J:                          2
0011      3:     x_sel = 0, x_ld = 1    4
0100      4:     y_sel = 0, y_ld = 1    5
0101      5:                            6 if x_neq_y; 9 if !x_neq_y
0110      6:                            7 if x_lt_y; 8 if !x_lt_y
0111      7:     y_sel = 1, y_ld = 1    6-J
1000      8:     x_sel = 1, x_ld = 1    6-J
1001      6-J:                          5-J
1010      5-J:                          5
1011      9:     d_ld = 1               1-J
1100      1-J:                          1

[Figure: the FSMD, the controller FSM above, and the datapath with its control inputs x_sel, y_sel, x_ld, y_ld, d_ld and status outputs x_neq_y, x_lt_y.]
Splitting into a controller and data path

Controller implementation model
Combinational logic takes the external input go_i, the status signals x_neq_y and x_lt_y, and the current state Q3 Q2 Q1 Q0. It produces the control signals x_sel, y_sel, x_ld, y_ld and d_ld, and the next-state bits I3 I2 I1 I0, which are loaded into the state register.

[Figure: (a) Controller FSM with states 0000 (1:) through 1100 (1-J:); state 5 branches on x_neq_y=1 / x_neq_y=0 and state 6 branches on x_lt_y=1 / x_lt_y=0. (b) Datapath with inputs x_i and y_i, two n-bit 2x1 multiplexors, registers x, y and d, comparators != and <, two subtractors, and output d_o.]
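The split can be simulated in C by keeping the controller (state encoding and control outputs) separate from the datapath (registers, multiplexors, subtractors, comparators). The sketch below rests on several assumptions that are not in the slides: 32-bit registers, go_i held at 1, positive inputs, and a simulation that stops at state 1-J instead of returning to state 1. All type and function names are hypothetical.

```c
#include <stdint.h>

/* Datapath registers. */
typedef struct { uint32_t x, y, d; } datapath;
/* Control word the controller drives in each state. */
typedef struct { int x_sel, y_sel, x_ld, y_ld, d_ld; } control;

/* One clock edge of the datapath under a given control word. */
void datapath_edge(datapath *dp, control c, uint32_t x_i, uint32_t y_i) {
    uint32_t x_next = c.x_sel ? dp->x - dp->y : x_i;   /* 2x1 mux for x */
    uint32_t y_next = c.y_sel ? dp->y - dp->x : y_i;   /* 2x1 mux for y */
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
}

/* Runs the controller FSM from state 0000 until state 1-J is reached. */
uint32_t gcd_processor(uint32_t x_i, uint32_t y_i) {
    datapath dp = {0, 0, 0};
    unsigned state = 0x0;            /* 0000 = state 1: */
    int go_i = 1;                    /* assumption: go is already asserted */
    for (;;) {
        control c = {0, 0, 0, 0, 0};
        unsigned next;
        int x_neq_y = dp.x != dp.y;  /* status signals from the datapath */
        int x_lt_y = dp.x < dp.y;
        switch (state) {
        case 0x0: next = 0x1; break;                           /* 1:   */
        case 0x1: next = go_i ? 0x3 : 0x2; break;              /* 2:   */
        case 0x2: next = 0x1; break;                           /* 2-J: */
        case 0x3: c.x_sel = 0; c.x_ld = 1; next = 0x4; break;  /* 3:   */
        case 0x4: c.y_sel = 0; c.y_ld = 1; next = 0x5; break;  /* 4:   */
        case 0x5: next = x_neq_y ? 0x6 : 0xB; break;           /* 5:   */
        case 0x6: next = x_lt_y ? 0x7 : 0x8; break;            /* 6:   */
        case 0x7: c.y_sel = 1; c.y_ld = 1; next = 0x9; break;  /* 7:   */
        case 0x8: c.x_sel = 1; c.x_ld = 1; next = 0x9; break;  /* 8:   */
        case 0x9: next = 0xA; break;                           /* 6-J: */
        case 0xA: next = 0x5; break;                           /* 5-J: */
        case 0xB: c.d_ld = 1; next = 0xC; break;               /* 9:   */
        default:  return dp.d;       /* 1-J: hardware would go back to 1: */
        }
        datapath_edge(&dp, c, x_i, y_i);
        state = next;
    }
}
```

Note that the controller never touches x, y or d directly; it only sees x_neq_y and x_lt_y and only drives the five control signals, which is the division of labor the implementation model describes.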
Completing the GCD custom single-purpose processor design
• We finished the data path
• We have a state table for the next state and control logic
– All that's left is combinational logic design
• This is not an optimized design, but we see the basic steps
[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]
Controller state table for the GCD Example
RT-level custom single-purpose processor design
• We often start with a state machine
– Rather than algorithm
– Cycle timing often too central to functionality
• Example
– Bus bridge that converts 4-bit bus to 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design

Problem specification
A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
[Figure: Sender connected to Bridge through rdy_in and data_in(4); Bridge connected to Receiver through rdy_out and data_out(8); clock.]

FSMD (Bridge)
Inputs: rdy_in: bit; data_in: bit[4];
Outputs: rdy_out: bit; data_out: bit[8]
Variables: data_lo, data_hi: bit[4];

State             Action                                      Transitions
WaitFirst4                                                    stays on rdy_in=0; RecFirst4Start on rdy_in=1
RecFirst4Start    data_lo=data_in                             stays on rdy_in=1; RecFirst4End on rdy_in=0
RecFirst4End                                                  WaitSecond4
WaitSecond4                                                   stays on rdy_in=0; RecSecond4Start on rdy_in=1
RecSecond4Start   data_hi=data_in                             stays on rdy_in=1; RecSecond4End on rdy_in=0
RecSecond4End                                                 Send8Start
Send8Start        data_out=data_hi & data_lo, rdy_out=1       Send8End
Send8End          rdy_out=0                                   WaitFirst4
RT-level custom single-purpose processor design (cont.)

(a) Controller (Bridge)
Same states and rdy_in transitions as the FSMD, with the actions replaced by control signals:
State             Control outputs
WaitFirst4
RecFirst4Start    data_lo_ld=1
RecFirst4End
WaitSecond4
RecSecond4Start   data_hi_ld=1
RecSecond4End
Send8Start        data_out_ld=1, rdy_out=1
Send8End          rdy_out=0

(b) Datapath
[Figure: data_in(4) feeds the registers data_hi and data_lo (load inputs data_hi_ld, data_lo_ld). Their outputs together feed the register data_out (load input data_out_ld), which drives data_out. clk goes to all registers. rdy_in enters and rdy_out leaves the controller.]
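One way to explore the bridge is a cycle-by-cycle C model of its state machine. This is a sketch, not the slides' solution to the exercise: it assumes the unconditional transitions RecFirst4End to WaitSecond4, RecSecond4End to Send8Start, Send8Start to Send8End and Send8End to WaitFirst4, and it assumes the sender keeps data_in stable through the cycle in which rdy_in falls. The `bridge` struct and `bridge_clock` are hypothetical names.

```c
#include <stdint.h>

typedef enum {
    WaitFirst4, RecFirst4Start, RecFirst4End,
    WaitSecond4, RecSecond4Start, RecSecond4End,
    Send8Start, Send8End
} bridge_state;

typedef struct {
    bridge_state state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    int rdy_out;
} bridge;

/* One clock cycle: perform the current state's action, then choose the
   next state from rdy_in. */
void bridge_clock(bridge *b, int rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WaitFirst4:
        b->state = rdy_in ? RecFirst4Start : WaitFirst4;
        break;
    case RecFirst4Start:
        b->data_lo = data_in & 0xF;
        b->state = rdy_in ? RecFirst4Start : RecFirst4End;
        break;
    case RecFirst4End:
        b->state = WaitSecond4;
        break;
    case WaitSecond4:
        b->state = rdy_in ? RecSecond4Start : WaitSecond4;
        break;
    case RecSecond4Start:
        b->data_hi = data_in & 0xF;
        b->state = rdy_in ? RecSecond4Start : RecSecond4End;
        break;
    case RecSecond4End:
        b->state = Send8Start;
        break;
    case Send8Start:
        /* data_hi & data_lo in the FSMD denotes concatenation */
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = 1;
        b->state = Send8End;
        break;
    case Send8End:
        b->rdy_out = 0;
        b->state = WaitFirst4;
        break;
    }
}
```

Sending 0xA and then 0x5, each with its own rdy_in pulse, produces 0x5A on data_out together with a one-cycle rdy_out pulse.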

Optimizing single-purpose processors
• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– original program
– FSMD
– Data path
– FSM
Optimizing the original program
• Analyze program attributes and look for areas of possible improvement
– number of computations
– size of variable
– time and space complexity
– operations used
  • multiplication and division very expensive
Optimizing the original program

original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

GCD(42, 8): 9 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

Replace the subtraction operation(s) with a modulo operation in order to speed up the program.

optimized program
0:  int x, y, r;
1:  while (1) {
2:    while (!go_i);
      // x must be the larger number
3:    if (x_i >= y_i) {
4:      x = x_i;
5:      y = y_i;
      }
6:    else {
7:      x = y_i;
8:      y = x_i;
      }
9:    while (y != 0) {
10:     r = x % y;
11:     x = y;
12:     y = r;
      }
13:   d_o = x;
    }

GCD(42, 8): 3 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (8, 2), (2, 0).
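The two loops can be compared directly in C. In this sketch the I/O signals go_i, x_i, y_i and d_o are replaced by function arguments and return values, and `evals` counts how many (x, y) pairs the loop condition evaluates, which is the counting that yields the 9 and 3 quoted for GCD(42, 8). The function names are hypothetical, and positive inputs are assumed.

```c
/* Original program: repeated subtraction. */
unsigned gcd_subtract(unsigned x, unsigned y, int *evals) {
    *evals = 1;                 /* first evaluation of x != y */
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
        (*evals)++;             /* the condition is evaluated again */
    }
    return x;
}

/* Optimized program: modulo, with x forced to be the larger number. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, int *evals) {
    unsigned x, y, r;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    *evals = 1;                 /* first evaluation of y != 0 */
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        (*evals)++;
    }
    return x;
}
```

Both functions return 2 for the inputs 42 and 8. The saving in loop evaluations comes at the price of a modulo unit, so whether the optimized program is better overall depends on how expensive that unit is in the target hardware.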

Optimizing the FSMD
• Areas of possible improvements
– merge states
  • states with constants on transitions can be eliminated, transition taken is already known
  • states with independent operations can be merged
– separate states
  • states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
– scheduling
Optimizing the FSMD (cont.)

original FSMD
int x, y;
States 1:, 2:, 2-J:, 3: x = x_i, 4: y = y_i, 5:, 6:, 7: y = y - x, 8: x = x - y, 6-J:, 5-J:, 9: d_o = x, 1-J:

Optimizations applied
• eliminate state 1 – transitions have constant values
• merge state 2 and state 2J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9

optimized FSMD
int x, y;
State   Action                 Next state
2:                             2 if !go_i; 3 if go_i
3:      x = x_i, y = y_i       5
5:                             7 if x<y; 8 if x>y; otherwise 9
7:      y = y - x              5
8:      x = x - y              5
9:      d_o = x                2
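The optimized FSMD is small enough to step through in C. In this sketch each loop pass models one state visit, go_i is assumed to be 1, inputs are assumed positive, and the function returns in state 9 where the hardware would go back to state 2. The names are hypothetical.

```c
typedef enum { S2, S3, S5, S7, S8, S9 } fsmd_state;

/* Runs the optimized FSMD once; *cycles receives the number of states
   visited, including the final state 9. */
unsigned gcd_fsmd_optimized(unsigned x_i, unsigned y_i, int *cycles) {
    unsigned x = 0, y = 0;
    fsmd_state s = S2;
    int go_i = 1;                /* assumption: go is already asserted */
    *cycles = 0;
    for (;;) {
        (*cycles)++;
        switch (s) {
        case S2: s = go_i ? S3 : S2; break;
        case S3: x = x_i; y = y_i; s = S5; break;   /* merged states 3, 4 */
        case S5: s = (x < y) ? S7 : (x > y) ? S8 : S9; break;
        case S7: y = y - x; s = S5; break;
        case S8: x = x - y; s = S5; break;
        case S9: return x;       /* d_o = x */
        }
    }
}
```

For equal inputs the sketch visits only states 2, 3, 5 and 9, which shows how few states remain on the shortest path after the merges.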

Optimizing the data path
• Sharing of functional units
– one-to-one mapping, as done previously, is not necessary
– if the same operation occurs in different states, those states can share a single functional unit
• Multi-functional units
– ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states
Summary
• Custom single-purpose processors
– Straightforward design techniques
– Can be built to execute algorithms
– Typically start with FSMD
– CAD tools can be of great assistance
