High Level Synthesis - 01 - Introduction
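[Figure: levels of abstraction and the corresponding synthesis steps, each with a behavioral and a structural view.
Architectural level: architectural synthesis maps a behavioral description (e.g. For I=0 to I=15: Sum = Sum + array[I]) to a register-level structure (control, memory, adder).
Logic level: logic synthesis.
Circuit level: circuit synthesis (library).
Geometric level.]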
Architectural Synthesis Problem
Specification
Constraints
Objectives
Output:
• Datapath + controller
Tasks
Architectural Synthesis output
• Datapath
• Controller:
– Finite state machine (FSM)
– Controller + microprogram
• Synchronization scheme (e.g. single-phase global clock with master-slave registers)
[Figure: controller and datapath blocks with external control outputs and external data outputs]
Objective function
Main goals in classical approach
1. Minimum area
– Functional units, registers, memory, interconnect
2. Maximum speed
– latency in clock cycles
– cycle time
– throughput
Generally one parameter is set as a constraint
and the other one is optimized
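As a minimal sketch of this constrained formulation, the C fragment below picks, among candidate implementations, the one with minimum area whose latency meets a bound. The `design_point` struct, its fields, and the function name are hypothetical and only illustrate the idea; a real synthesis tool explores the design space rather than scanning a fixed list.

```c
#include <stddef.h>

/* Hypothetical description of one candidate implementation. */
typedef struct {
    int area;            /* e.g. functional units + registers + interconnect */
    int latency_cycles;  /* latency in clock cycles */
} design_point;

/* Return the index of the minimum-area point whose latency does not
 * exceed max_latency, or -1 if no point satisfies the constraint. */
int min_area_under_latency(const design_point *pts, size_t n, int max_latency)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (pts[i].latency_cycles > max_latency)
            continue;                       /* violates the constraint */
        if (best < 0 || pts[i].area < pts[best].area)
            best = (int)i;                  /* better objective value */
    }
    return best;
}
```

Swapping the roles (area as the constraint, latency as the objective) gives the dual problem.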
More sophisticated Objective functions for
high-level and system design
Additional goals in modern approaches
❑ More accurate estimation, such as
– Size of operands
– Sharing of hardware for similar operations (e.g. + and -)
❑ Testability
❑ Low power
• Power down, clock disabling
❑ Reliability
• Fault tolerance, self-test
Finite State Machine + Datapath
model
Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis
[Figure: controller and datapath views.
Controller: external control inputs and datapath control inputs feed the next-state and control logic; a state register holds the current state; outputs are the external control outputs and the datapath control outputs.
Datapath: external data inputs, registers, functional units, external data outputs.]
Example: greatest common divisor
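– Can use templates to convert the program into a state machine

Program:
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

[Figure: (a) black-box view of the GCD block with output d_o; state diagram with one state per numbered statement plus join states such as 5-J and 1-J. State 6 branches on x<y to 7: y = y - x and on !(x<y) to 8: x = x - y; state 9 performs d_o = x.]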
State diagram templates
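Assignment statement: a = b
– one state performing a = b, followed by the next statement
Loop statement: loop-body-statements guarded by cond
– C: condition state with outgoing transitions cond (to the loop-body-statements) and !cond
– J: join state, followed by the next statement
Branch statement: c1 stmts, c2 stmts, others
– C: condition state with outgoing transitions c1 (to c1 stmts), !c1*c2 (to c2 stmts) and !c1*!c2 (to others)
– J: join state, followed by the next statement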
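Applied to the GCD program, the templates give one state per numbered statement plus join states. The sketch below simulates such an FSMD in C, one `case` per state. The transitions follow the controller state table of the GCD example (5-J returns to 5, and 5 goes to 9 when the loop ends). States 1, 2 and 2-J (the outer loop and the wait on go_i) are left out, and the function name `gcd_fsmd` and the `states_visited` counter are illustrative additions.

```c
#include <assert.h>

/* One state per numbered statement, plus the join states (J). */
typedef enum { S3, S4, S5, S6, S7, S8, S6J, S5J, S9, DONE } state_t;

/* Simulate the GCD FSMD for one computation. Assumes x_i > 0 and y_i > 0
 * (subtraction-based GCD never terminates if either input is 0). */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, unsigned *states_visited)
{
    unsigned x = 0, y = 0, d_o = 0, n = 0;
    state_t s = S3;

    assert(x_i > 0 && y_i > 0);
    while (s != DONE) {
        n++;                                      /* one state per clock cycle */
        switch (s) {
        case S3:  x = x_i;   s = S4;   break;     /* assignment template */
        case S4:  y = y_i;   s = S5;   break;
        case S5:  s = (x != y) ? S6 : S9; break;  /* loop condition */
        case S6:  s = (x < y) ? S7 : S8;  break;  /* branch condition */
        case S7:  y = y - x; s = S6J;  break;
        case S8:  x = x - y; s = S6J;  break;
        case S6J: s = S5J;             break;     /* branch join */
        case S5J: s = S5;              break;     /* back to the loop condition */
        case S9:  d_o = x;   s = DONE; break;
        case DONE: break;
        }
    }
    if (states_visited)
        *states_visited = n;
    return d_o;
}
```

Calling `gcd_fsmd(42, 8, &n)` returns 2; the counter shows how many cycles the unoptimized FSMD spends in join states that do no work.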
Creating the datapath
Creating the controller’s FSM
❑ Same structure as FSMD
❑ Replace complex actions/conditions with datapath configurations

FSMD state (action or condition)     Controller state (datapath configuration or condition)
1:    1 / !1                         0000 1:
2:    !go_i / !(!go_i)               0001 2:    !go_i / go_i
2-J:                                 0010 2-J:
3:    x = x_i                        0011 3:    x_sel = 0, x_ld = 1
4:    y = y_i                        0100 4:    y_sel = 0, y_ld = 1
5:    x!=y / !(x!=y)                 0101 5:    x_neq_y / !x_neq_y
6:    x<y / !(x<y)                   0110 6:    x_lt_y / !x_lt_y
7:    y = y - x                      0111 7:    y_sel = 1, y_ld = 1
8:    x = x - y                      1000 8:    x_sel = 1, x_ld = 1

[Figure: datapath with inputs x_i and y_i, two n-bit 2x1 multiplexers controlled by x_sel and y_sel, and registers x and y loaded by x_ld and y_ld]
Splitting into a controller and
datapath
[Figure: controller (input go_i) connected to the datapath. Remaining controller states:]
1010 5-J:
1011 9:    d_ld = 1
1100 1-J:
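A minimal C sketch of the datapath side of this split follows. Each call to `datapath_clock` models one clock edge. The signal names (`x_sel`, `y_sel`, `x_ld`, `y_ld`, `d_ld`, `x_neq_y`, `x_lt_y`) come from the slides; the struct layout and the function names are assumptions made for illustration.

```c
#include <stdbool.h>

typedef struct { unsigned x, y, d; } datapath_regs;   /* registers x, y, d */
typedef struct { bool x_sel, y_sel, x_ld, y_ld, d_ld; } control_t;
typedef struct { bool x_neq_y, x_lt_y; } status_t;

/* Combinational status outputs sent back to the controller. */
status_t datapath_status(const datapath_regs *r)
{
    status_t s = { r->x != r->y, r->x < r->y };
    return s;
}

/* One clock edge: every next value is computed from the register
 * contents before the edge, as in hardware. */
void datapath_clock(datapath_regs *r, control_t c, unsigned x_i, unsigned y_i)
{
    unsigned x_next = c.x_sel ? r->x - r->y : x_i;   /* 2x1 mux in front of x */
    unsigned y_next = c.y_sel ? r->y - r->x : y_i;   /* 2x1 mux in front of y */
    unsigned d_next = r->x;

    if (c.x_ld) r->x = x_next;
    if (c.y_ld) r->y = y_next;
    if (c.d_ld) r->d = d_next;
}
```

The controller then only has to drive the five control bits and read the two status bits; it never sees the data values themselves.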
Controller state table for the GCD
example
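Inputs: Q3 Q2 Q1 Q0 (current state), x_neq_y, x_lt_y, go_i. Outputs: I3 I2 I1 I0 (next state), x_sel, y_sel, x_ld, y_ld, d_ld.
Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld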
0 0 0 0 * * * 0 0 0 1 X X 0 0 0
0 0 0 1 * * 0 0 0 1 0 X X 0 0 0
0 0 0 1 * * 1 0 0 1 1 X X 0 0 0
0 0 1 0 * * * 0 0 0 1 X X 0 0 0
0 0 1 1 * * * 0 1 0 0 0 X 1 0 0
0 1 0 0 * * * 0 1 0 1 X 0 0 1 0
0 1 0 1 0 * * 1 0 1 1 X X 0 0 0
0 1 0 1 1 * * 0 1 1 0 X X 0 0 0
0 1 1 0 * 0 * 1 0 0 0 X X 0 0 0
0 1 1 0 * 1 * 0 1 1 1 X X 0 0 0
0 1 1 1 * * * 1 0 0 1 X 1 0 1 0
1 0 0 0 * * * 1 0 0 1 1 X 1 0 0
1 0 0 1 * * * 1 0 1 0 X X 0 0 0
1 0 1 0 * * * 0 1 0 1 X X 0 0 0
1 0 1 1 * * * 1 1 0 0 X X 0 0 1
1 1 0 0 * * * 0 0 0 0 X X 0 0 0
1 1 0 1 * * * 0 0 0 0 X X 0 0 0
1 1 1 0 * * * 0 0 0 0 X X 0 0 0
1 1 1 1 * * * 0 0 0 0 X X 0 0 0
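The state table translates almost directly into code. The sketch below encodes the next-state columns (I3..I0) and the load outputs in C. Don't-care select outputs (X) are not modeled, the function names and the bit packing of the load outputs are illustrative choices, and the label 6-J for code 1001 is an assumption based on the FSMD.

```c
#include <stdbool.h>

/* Next-state logic. q is the current state Q3..Q0 as a 4-bit value. */
unsigned next_state(unsigned q, bool x_neq_y, bool x_lt_y, bool go_i)
{
    switch (q & 0xFu) {
    case 0x0: return 0x1;                   /* 1:   -> 2:               */
    case 0x1: return go_i ? 0x3 : 0x2;      /* 2:   wait for go_i       */
    case 0x2: return 0x1;                   /* 2-J: -> 2:               */
    case 0x3: return 0x4;                   /* 3:   x = x_i             */
    case 0x4: return 0x5;                   /* 4:   y = y_i             */
    case 0x5: return x_neq_y ? 0x6 : 0xB;   /* 5:   loop test           */
    case 0x6: return x_lt_y ? 0x7 : 0x8;    /* 6:   branch test         */
    case 0x7: return 0x9;                   /* 7:   y = y - x           */
    case 0x8: return 0x9;                   /* 8:   x = x - y           */
    case 0x9: return 0xA;                   /* 6-J (assumed label)      */
    case 0xA: return 0x5;                   /* 5-J: back to 5:          */
    case 0xB: return 0xC;                   /* 9:   d_o = x             */
    default:  return 0x0;                   /* 1-J: and unused codes    */
    }
}

/* Load-enable outputs packed as bit 2 = x_ld, bit 1 = y_ld, bit 0 = d_ld. */
unsigned load_outputs(unsigned q)
{
    switch (q & 0xFu) {
    case 0x3: case 0x8: return 0x4;         /* x_ld = 1 */
    case 0x4: case 0x7: return 0x2;         /* y_ld = 1 */
    case 0xB:           return 0x1;         /* d_ld = 1 */
    default:            return 0x0;
    }
}
```

In hardware the same table would be implemented as combinational logic in front of the 4-bit state register.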
Completing the GCD custom single-
purpose processor design
❑ We finished the datapath … …
Optimizing single-purpose processors
❑ Optimization opportunities
– original program
– FSMD
– datapath
– FSM
Optimizing the original program
Optimizing the original program (cont.)
original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x = x_i;
5:     y = y_i;
     }
6:   else {
7:     x = y_i;
8:     y = x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }

Replace the subtraction operation(s) with the modulo operation in order to speed up the program.

GCD(42, 8), original program: 9 iterations to complete the loop, counting each evaluation of the loop condition (8 subtractions). x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
GCD(42, 8), optimized program: 3 iterations to complete the loop, counted the same way (2 modulo operations). x and y values evaluated as follows: (42, 8), (8, 2), (2, 0).
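The two programs can be compared directly in C. The sketch below implements both loops as functions and counts how many times each loop condition is evaluated, which is the quantity the slide reports as iterations (9 versus 3 for GCD(42, 8)). The wait on go_i is left out, and the function names and the `checks` parameter are illustrative additions.

```c
#include <assert.h>

/* Original program: repeated subtraction. Requires x > 0 and y > 0.
 * *checks counts evaluations of the loop condition (x != y). */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *checks)
{
    assert(x > 0 && y > 0);
    *checks = 1;                    /* first evaluation of the condition */
    while (x != y) {
        if (x < y) y = y - x;
        else       x = x - y;
        (*checks)++;                /* condition is evaluated again */
    }
    return x;
}

/* Optimized program: modulo. *checks counts evaluations of (y != 0). */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *checks)
{
    unsigned x, y, r;

    /* x must be the larger number */
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }

    *checks = 1;
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        (*checks)++;
    }
    return x;
}
```

The modulo version needs fewer iterations, but each iteration requires a modulo unit, which is typically larger and slower than a subtractor; whether the trade pays off depends on the target.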
Optimizing the FSMD
Optimizing the FSMD (cont.)
[Figure: original FSMD with states 1:, 2:, 2-J:, 3: x = x_i, 4: y = y_i, 5:, 6:, 7: y = y - x, 8: x = x - y, 6-J:, 5-J:, 9: d_o = x, 1-J:]
[Figure: optimized FSMD with states 2: (go_i / !go_i), 3: x = x_i, y = y_i, 5:, 7: y = y - x, 8: x = x - y, 9: d_o = x]

❑ eliminate state 1 – transitions have constant values
❑ merge state 2 and state 2-J – no loop operation in between them
❑ merge state 5 and state 6 – transitions from state 6 can be done in state 5
❑ eliminate states 5-J and 6-J – transitions from each state can be done from state 7 and state 8, respectively
❑ eliminate state 1-J – the transition from state 1-J can be done directly from state 9
Optimizing the datapath
❑ Sharing of functional units
• one-to-one mapping, as done previously, is not necessary
• if the same operation occurs in different states, they can share a single functional unit
❑ Multi-functional units
[Figure: controller FSM, registers R1 and R2 with write enables we_r1 and we_r2, multiplexers mux1 and mux2, and a multiplier]
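As an illustration using the GCD example: states 7 (y = y - x) and 8 (x = x - y) never execute in the same cycle, so one subtractor can serve both if multiplexers swap its operands. The C sketch below models this; the `swap` control signal and the function names are hypothetical.

```c
#include <stdbool.h>

/* The single shared functional unit. */
static unsigned subtractor(unsigned a, unsigned b)
{
    return a - b;
}

/* Operand multiplexers in front of the shared subtractor.
 * swap = 0 computes x - y (state 8), swap = 1 computes y - x (state 7). */
unsigned shared_sub(unsigned x, unsigned y, bool swap)
{
    unsigned a = swap ? y : x;   /* operand mux A */
    unsigned b = swap ? x : y;   /* operand mux B */
    return subtractor(a, b);
}
```

The saving is one subtractor at the cost of two multiplexers and one extra control signal, so sharing pays off mainly for large functional units.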
Optimizing the FSM
❑ State encoding
– task of assigning a unique bit pattern to each state in an FSM
– size of state register and combinational logic vary
– can be treated as an ordering problem
❑ State minimization
– task of merging equivalent states into a single state
• two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
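A minimal sketch of state minimization in C follows. It uses iterative partition refinement, a standard technique that is not taken from the slides: it applies the equivalence test repeatedly, so that states whose successors are themselves equivalent are merged too. The `fsm_t` layout (a Moore machine with one output value per state), the size limits, and the function names are assumptions for illustration.

```c
#define MAX_STATES 16
#define MAX_INPUTS 4

/* Moore machine: output[s] is the output in state s,
 * next[s][i] is the successor of s on input symbol i. */
typedef struct {
    int n_states, n_inputs;
    int output[MAX_STATES];
    int next[MAX_STATES][MAX_INPUTS];
} fsm_t;

/* Two states stay together if they were in the same class and all
 * their successors were in the same classes. */
static int same_signature(const fsm_t *m, const int *cls, int a, int b)
{
    if (cls[a] != cls[b]) return 0;
    for (int i = 0; i < m->n_inputs; i++)
        if (cls[m->next[a][i]] != cls[m->next[b][i]]) return 0;
    return 1;
}

/* Fill cls[s] with the equivalence class of each state; return the
 * number of classes, i.e. the state count of the minimized machine. */
int minimize(const fsm_t *m, int *cls)
{
    int n_cls = 0, changed = 1;
    int tmp[MAX_STATES];

    /* Initial partition: group states by output. */
    for (int s = 0; s < m->n_states; s++) {
        int t;
        for (t = 0; t < s; t++)
            if (m->output[t] == m->output[s]) break;
        cls[s] = (t < s) ? cls[t] : n_cls++;
    }
    /* Refine until no class splits any further. */
    while (changed) {
        int n_new = 0;
        for (int s = 0; s < m->n_states; s++) {
            int t;
            for (t = 0; t < s; t++)
                if (same_signature(m, cls, t, s)) break;
            tmp[s] = (t < s) ? tmp[t] : n_new++;
        }
        changed = (n_new != n_cls);
        n_cls = n_new;
        for (int s = 0; s < m->n_states; s++) cls[s] = tmp[s];
    }
    return n_cls;
}
```

States that end up with the same class number can be merged into a single state, which may shrink both the state register and the next-state logic.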
A modern design flow
HLS: front-end
HLS: middle-end
HLS: back-end