Custom Single Purpose Processor Design
Higher Performance
Smaller Size
Longer Time-to-market
Less flexible
E) Logic gates (inputs a, b, c; outputs y, z)
Minimized output equations:
y = a + bc
z = ab + b'c + bc'
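The minimized equations can be checked exhaustively in software. The sketch below is only an illustration: it assumes the second and third terms of z carry complement marks (b'c and bc'), and the function names logic_y, logic_z and truth_table are hypothetical helpers, not part of the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* y = a + bc, as given in the text. */
bool logic_y(bool a, bool b, bool c) { return a || (b && c); }

/* z = ab + b'c + bc' (complement marks are an assumption of this sketch). */
bool logic_z(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Evaluate f on all 8 input rows. Row index bits are a b c, with a as the
   most significant bit; bit i of the result holds f for row i. */
uint8_t truth_table(bool (*f)(bool, bool, bool)) {
    uint8_t table = 0;
    for (unsigned row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1u;
        bool b = (row >> 1) & 1u;
        bool c = row & 1u;
        if (f(a, b, c)) {
            table |= (uint8_t)(1u << row);
        }
    }
    return table;
}
```

Calling truth_table(logic_y) packs the y column of the truth table into one byte, which makes it easy to compare a minimized equation against the unminimized one.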
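[Figure: combinational components. n-bit adder: n-bit inputs, outputs carry and sum. log n x n decoder: outputs O(n-1) ... O1, O0. Component with select input S: O = A op B, where op is determined by S.]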
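[Figure: sequential logic example. State register with inputs I1, I0 and outputs Q1, Q0; input a; output x.]

Karnaugh maps (rows Q1Q0, columns a):

Q1Q0 | I1: a=0 a=1 | I0: a=0 a=1 | x: a=0 a=1
00   |      0   0  |      0   1  |     0   0
01   |      0   1  |      1   0  |     0   0
11   |      1   0  |      1   0  |     1   1
10   |      1   1  |      0   1  |     0   0

I1 = Q1'Q0a + Q1a' + Q1Q0'
I0 = Q0a' + Q0'a
x = Q1Q0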
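The next-state and output equations can be exercised with a small software model. This is a sketch under assumptions: I0 and x follow the equations in the text with complement marks restored (I0 = Q0a' + Q0'a, x = Q1Q0), and I1 is read off the I1 map as Q1'Q0a + Q1a' + Q1Q0'. The type State and the function names are hypothetical.

```c
#include <stdbool.h>

/* Contents of the 2-bit state register. */
typedef struct {
    bool q1;
    bool q0;
} State;

/* Combinational next-state logic: returns (I1, I0) for the current state and input a. */
State next_state(State s, bool a) {
    State n;
    n.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0); /* I1 (assumed form) */
    n.q0 = (s.q0 && !a) || (!s.q0 && a);                            /* I0 */
    return n;
}

/* Output logic: x = Q1Q0. */
bool output_x(State s) { return s.q1 && s.q0; }

/* Hold a at 1 for the given number of clock cycles, starting from state 00,
   and count the cycles in which x is 1 (x is sampled before each clock edge). */
int count_x_pulses(int cycles) {
    State s = { false, false };
    int pulses = 0;
    for (int t = 0; t < cycles; t++) {
        if (output_x(s)) {
            pulses++;
        }
        s = next_state(s, true);
    }
    return pulses;
}
```

With a held at 1 the register steps through 00, 01, 10, 11 and x is 1 in one cycle out of every four; with a at 0 the state is held.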
[Figure: controller (state register) and datapath (functional units) with ports y_i and d_o; FSMD fragment showing states 4: y = y_i, 5: x != y, 6: x < y, 7: y = y - x, and 6-J.]
Known as an FSMD: finite-state machine with datapath
Templates can be used to perform such a conversion
(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
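For reference, lines 3 through 9 of the desired functionality can be run on a desktop as an ordinary function. This is a minimal sketch, not the processor itself: the ports x_i, y_i and d_o become parameters and a return value, the go_i handshake is left out, and the zero-input guard is an addition of this sketch (the subtraction loop never terminates if exactly one input is 0).

```c
/* GCD by repeated subtraction, mirroring lines 3-9 of the desired functionality. */
unsigned gcd_subtract(unsigned x_i, unsigned y_i) {
    if (x_i == 0 || y_i == 0) {
        return 0;               /* guard added for this sketch only */
    }
    unsigned x = x_i;           /* 3: x = x_i */
    unsigned y = y_i;           /* 4: y = y_i */
    while (x != y) {            /* 5 */
        if (x < y) {            /* 6 */
            y = y - x;          /* 7 */
        } else {
            x = x - y;          /* 8 */
        }
    }
    return x;                   /* 9: d_o = x */
}
```

For example, gcd_subtract(42, 8) returns 2.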
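Assignment statement
  a = b
  next statement
  Template: one state performing a = b, followed by the state for the next statement.

Branch statement
  if (c1) c1 stmts
  else if (c2) c2 stmts
  else other stmts
  next statement
  Template: condition state C with three outgoing transitions: c1 to the c1 stmts, !c1*c2 to the c2 stmts, and !c1*!c2 to the other stmts. All branches rejoin at state J, which leads to the next statement.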
Create a register for any declared variable
Create a functional unit for each arithmetic operation
Connect the ports, registers and functional units
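Datapath
[Figure: registers x and y (labelled 0: x, 0: y) with load signals x_ld and y_ld; comparator != (5: x!=y) producing x_neq_y; comparator < (6: x<y) producing x_lt_y; subtractors for 8: x-y and 7: y-x; output register d (9: d) with load signal d_ld driving d_o. Shown next to the FSMD states 4: y = y_i through 9: d_o = x, including join states 5-J, 6-J and 1-J.]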
Controller
Same structure as the FSMD
Replace complex actions/conditions with datapath configurations
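(a) Controller (state encoding, FSMD state, datapath configuration)
0000  1:
0001  2:    !go_i
0010  2-J:
0011  3:    x_sel = 0, x_ld = 1
0100  4:    y_sel = 0, y_ld = 1
0101  5:    x_neq_y
0110  6:    x_lt_y
0111  7:    y_sel = 1, y_ld = 1
1000  8:    x_sel = 1, x_ld = 1
1001  6-J:
1010  5-J:
1011  9:    d_ld = 1
1100  1-J:

(b) Datapath
[Figure: inputs x_i and y_i feed n-bit 2x1 multiplexers (selects x_sel, y_sel) in front of registers x and y (loads x_ld, y_ld); comparators != (5: x!=y) and < (6: x<y); subtractors 8: x-y and 7: y-x; register d (9: d, load d_ld) drives d_o.]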
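Controller state table, states 0011 to 1111 (* = don't-care input, X = don't-care output)

Q3 Q2 Q1 Q0 | x_neq_y x_lt_y go_i | I3 I2 I1 I0 | x_sel y_sel x_ld y_ld d_ld
0  0  1  1  |    *      *     *   | 0  1  0  0  |   0     X    1    0    0
0  1  0  0  |    *      *     *   | 0  1  0  1  |   X     0    0    1    0
0  1  0  1  |    0      *     *   | 1  0  1  1  |   X     X    0    0    0
0  1  0  1  |    1      *     *   | 0  1  1  0  |   X     X    0    0    0
0  1  1  0  |    *      0     *   | 1  0  0  0  |   X     X    0    0    0
0  1  1  0  |    *      1     *   | 0  1  1  1  |   X     X    0    0    0
0  1  1  1  |    *      *     *   | 1  0  0  1  |   X     1    0    1    0
1  0  0  0  |    *      *     *   | 1  0  0  1  |   1     X    1    0    0
1  0  0  1  |    *      *     *   | 1  0  1  0  |   X     X    0    0    0
1  0  1  0  |    *      *     *   | 0  1  0  1  |   X     X    0    0    0
1  0  1  1  |    *      *     *   | 1  1  0  0  |   X     X    0    0    1
1  1  0  0  |    *      *     *   | 0  0  0  0  |   X     X    0    0    0
1  1  0  1  |    *      *     *   | 0  0  0  0  |   X     X    0    0    0
1  1  1  0  |    *      *     *   | 0  0  0  0  |   X     X    0    0    0
1  1  1  1  |    *      *     *   | 0  0  0  0  |   X     X    0    0    0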
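The split into controller and datapath can be simulated cycle by cycle. The sketch below is an illustration under stated assumptions: the table columns are read as Q3..Q0, x_neq_y, x_lt_y, go_i, I3..I0, x_sel, y_sel, x_ld, y_ld, d_ld; multiplexer input 0 is taken to be the external port and input 1 the subtractor; the simulation starts in state 0011 (go_i already seen) and stops when d_ld is asserted. All type and function names are hypothetical, and the zero-input guard is an addition of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { unsigned x, y, d; } Datapath;           /* registers x, y, d */
typedef struct {
    uint8_t next;                                         /* I3..I0 */
    bool x_sel, y_sel, x_ld, y_ld, d_ld;
} Control;

/* Controller logic for states 0011..1111. Don't-care outputs are driven to 0. */
Control controller(uint8_t state, bool x_neq_y, bool x_lt_y) {
    Control c = { 0x0, false, false, false, false, false };
    switch (state) {
    case 0x3: c.next = 0x4; c.x_ld = true; break;                    /* 3: x = x_i   */
    case 0x4: c.next = 0x5; c.y_ld = true; break;                    /* 4: y = y_i   */
    case 0x5: c.next = x_neq_y ? 0x6 : 0xB; break;                   /* 5: x != y ?  */
    case 0x6: c.next = x_lt_y ? 0x7 : 0x8; break;                    /* 6: x < y ?   */
    case 0x7: c.next = 0x9; c.y_sel = true; c.y_ld = true; break;    /* 7: y = y - x */
    case 0x8: c.next = 0x9; c.x_sel = true; c.x_ld = true; break;    /* 8: x = x - y */
    case 0x9: c.next = 0xA; break;                                   /* 6-J          */
    case 0xA: c.next = 0x5; break;                                   /* 5-J          */
    case 0xB: c.next = 0xC; c.d_ld = true; break;                    /* 9: d_o = x   */
    default:  c.next = 0x0; break;                                   /* 1100..1111   */
    }
    return c;
}

/* One clock edge of the datapath: all register inputs are computed from the
   old register values, then the enabled registers load. */
void datapath_clock(Datapath *dp, Control c, unsigned x_i, unsigned y_i) {
    unsigned x_minus_y = dp->x - dp->y;                   /* subtractor 8: x-y */
    unsigned y_minus_x = dp->y - dp->x;                   /* subtractor 7: y-x */
    unsigned x_mux = c.x_sel ? x_minus_y : x_i;           /* n-bit 2x1 mux     */
    unsigned y_mux = c.y_sel ? y_minus_x : y_i;
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_mux;
    if (c.y_ld) dp->y = y_mux;
}

/* Run from state 0011 until d_ld fires; optionally report the cycle count. */
unsigned run_gcd(unsigned x_i, unsigned y_i, unsigned *cycles) {
    Datapath dp = { 0, 0, 0 };
    uint8_t state = 0x3;
    unsigned n = 0;
    if (x_i == 0 || y_i == 0) {                           /* sketch-only guard */
        if (cycles != NULL) *cycles = 0;
        return 0;
    }
    for (;;) {
        Control c = controller(state, dp.x != dp.y, dp.x < dp.y);
        datapath_clock(&dp, c, x_i, y_i);
        n++;
        if (c.d_ld) break;
        state = c.next;
    }
    if (cycles != NULL) *cycles = n;
    return dp.d;
}
```

Counting cycles this way shows the cost of the join states: each pass through the loop visits five states (5, 6, 7 or 8, 6-J, 5-J), which is what the FSMD optimizations later in the section reduce.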
Example: output the Fibonacci numbers up to n
int i, j, k, n, outp;
while (1) {
  while (!go_i);
  n = n_i;
  i = 0; j = 1; k = 0;
  outp = i;
  outp = j;
  while (k <= n) {
    k = i + j;
    i = j;
    j = k;
    outp = k;
  }
}
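The inner part of this program can be tried out as a plain function. This is a sketch only: the ports go_i, n_i and outp are replaced by a parameter and an output buffer, and the function name fib_outputs and its buffer arguments are hypothetical. It assumes n is small enough that the sums do not overflow an unsigned value.

```c
#include <stddef.h>

/* Records every value the loop would write to outp, in order.
   Returns how many values were produced; only the first cap are stored. */
size_t fib_outputs(unsigned n, unsigned *out, size_t cap) {
    unsigned i = 0, j = 1, k = 0;
    size_t count = 0;

    if (count < cap) out[count] = i;    /* outp = i */
    count++;
    if (count < cap) out[count] = j;    /* outp = j */
    count++;

    while (k <= n) {
        k = i + j;
        i = j;
        j = k;
        if (count < cap) out[count] = k; /* outp = k */
        count++;
    }
    return count;
}
```

Because the loop tests k before updating it, the last value written can exceed n: for n = 10 the sequence produced is 0, 1, 1, 2, 3, 5, 8, 13.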
Problem Specification
Bridge: A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
[Figure: Sender -> Bridge -> Receiver. The sender drives rdy_in and data_in(4); the bridge drives rdy_out and data_out(8); clock input.]
Bridge Example: FSMD
Inputs: rdy_in: bit; data_in: bit[4];
Outputs: rdy_out: bit; data_out: bit[8]
Variables: data_lo, data_hi: bit[4];

State            Action                                     Transition
WaitFirst4                                                  stay while rdy_in=0, leave on rdy_in=1
RecFirst4Start   data_lo=data_in
RecFirst4End                                                stay while rdy_in=1, leave on rdy_in=0
WaitSecond4                                                 stay while rdy_in=0, leave on rdy_in=1
RecSecond4Start  data_hi=data_in
RecSecond4End                                               stay while rdy_in=1, leave on rdy_in=0
Send8Start       data_out=data_hi & data_lo; rdy_out=1
Send8End         rdy_out=0
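(a) Controller
[Figure: same states as the FSMD, with register loads in place of assignments. Labels shown: WaitFirst4 (rdy_in=0, rdy_in=1), RecFirst4Start (data_lo_ld=1), RecFirst4End (rdy_in=1), RecSecond4End (rdy_in=1), Send8End (rdy_out=0).]

(b) Datapath
[Figure: data_in(4) feeds registers data_hi and data_lo, which drive data_out; one signal is routed to all registers.]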
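The bridge FSMD can be modelled in software to check the handshake. The sketch below makes several assumptions that are not stated on the slides: each state's action is performed on the clock tick spent in that state, the sender holds rdy_in and data_in for at least two ticks per nibble, and the & in data_hi & data_lo is read as concatenation with data_hi in the upper four bits. The names Bridge, bridge_clock, pulse and bridge_transfer are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    WaitFirst4, RecFirst4Start, RecFirst4End,
    WaitSecond4, RecSecond4Start, RecSecond4End,
    Send8Start, Send8End
} BridgeState;

typedef struct {
    BridgeState state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    bool rdy_out;
} Bridge;

void bridge_init(Bridge *b) {
    b->state = WaitFirst4;
    b->data_lo = b->data_hi = b->data_out = 0;
    b->rdy_out = false;
}

/* One clock tick: perform the current state's action, then move on. */
void bridge_clock(Bridge *b, bool rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WaitFirst4:      if (rdy_in)  b->state = RecFirst4Start;  break;
    case RecFirst4Start:  b->data_lo = data_in & 0x0F;
                          b->state = RecFirst4End;                 break;
    case RecFirst4End:    if (!rdy_in) b->state = WaitSecond4;     break;
    case WaitSecond4:     if (rdy_in)  b->state = RecSecond4Start; break;
    case RecSecond4Start: b->data_hi = data_in & 0x0F;
                          b->state = RecSecond4End;                break;
    case RecSecond4End:   if (!rdy_in) b->state = Send8Start;      break;
    case Send8Start:      b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
                          b->rdy_out = true;
                          b->state = Send8End;                     break;
    case Send8End:        b->rdy_out = false;
                          b->state = WaitFirst4;                   break;
    }
}

/* Sender side: hold rdy_in high with the nibble for 3 ticks, then low for 2. */
static void pulse(Bridge *b, uint8_t nibble) {
    for (int i = 0; i < 3; i++) bridge_clock(b, true, nibble);
    for (int i = 0; i < 2; i++) bridge_clock(b, false, 0);
}

/* Send the low nibble, then the high nibble; return the byte seen on
   data_out when rdy_out goes high, or -1 if rdy_out never goes high. */
int bridge_transfer(Bridge *b, uint8_t lo, uint8_t hi) {
    pulse(b, lo);
    pulse(b, hi);
    for (int i = 0; i < 4 && !b->rdy_out; i++) bridge_clock(b, false, 0);
    return b->rdy_out ? b->data_out : -1;
}
```

After bridge_init, bridge_transfer(&b, 0x5, 0xA) delivers the low nibble first and yields 0xA5 on data_out.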
Optimization is the task of making design metric values the best possible
Optimization opportunities:
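  original program
  FSMD
  datapath
  FSM
Optimizing the original program: analyze program attributes such as the number of computations, the size of variables, time, and the operations used (for example, multiplication)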
GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2)
optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x = x_i;
5:     y = y_i;
     }
6:   else {
7:     x = y_i;
8:     y = x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }
GCD(42, 8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)
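The two iteration counts can be reproduced by instrumenting both loops. This is a minimal sketch: it counts every (x, y) pair on which the loop condition is evaluated, including the final pair that ends the loop, which is the counting that matches the value lists in the text. The struct GcdResult and both function names are hypothetical, and the zero-input guard in the subtraction version is an addition of the sketch.

```c
typedef struct {
    unsigned gcd;     /* result that would be written to d_o */
    unsigned pairs;   /* number of (x, y) pairs evaluated by the loop test */
} GcdResult;

/* Original program: repeated subtraction. */
GcdResult gcd_subtract_counted(unsigned x, unsigned y) {
    GcdResult r = { 0, 0 };
    if (x == 0 || y == 0) {
        return r;                 /* sketch-only guard: loop would not terminate */
    }
    for (;;) {
        r.pairs++;
        if (x == y) break;        /* while (x != y) */
        if (x < y) y = y - x;
        else       x = x - y;
    }
    r.gcd = x;
    return r;
}

/* Optimized program: remainder-based, with x forced to be the larger input. */
GcdResult gcd_modulo_counted(unsigned x_i, unsigned y_i) {
    GcdResult r = { 0, 0 };
    unsigned x, y;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    for (;;) {
        r.pairs++;
        if (y == 0) break;        /* while (y != 0) */
        unsigned rem = x % y;
        x = y;
        y = rem;
    }
    r.gcd = x;
    return r;
}
```

For the inputs 42 and 8 the first function reports 9 pairs and the second reports 3, both with a result of 2. The saving comes at the price of a modulo operation, which is more expensive in hardware than a subtractor.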
Optimizing the FSMD
  merge states
  separate states: states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
  scheduling
merge state 3 and state 4: assignment operations are independent of one another
merge state 5 and state 6: transitions from state 6 can be done in state 5
eliminate state 5-J and 6-J: transitions from each state can be done from state 7 and state 8, respectively
eliminate state 1-J: transition from state 1-J can be done directly from state 9
[Figure: optimized FSMD. Transition x<y leads to 7: y = y - x; transition x>y leads to 8: x = x - y; state 9: d_o = x.]
Sharing of functional units: a separate functional unit for every operation is not necessary; if the same operation occurs in different states, the states can share a single functional unit
Multi-functional units: ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states
State encoding: the task of assigning a unique bit pattern to each state in an FSM; the size of the state register and combinational logic vary with the encoding chosen
State minimization
task