Chapter 4 Retiming: 1 ECE734 VLSI Arrays For Digital Signal Processing
• Retiming
Retiming is a mapping from a given DFG, $G$,
to a retimed DFG, $G_r$, such that the
corresponding transfer functions of $G$ and $G_r$
differ by a pure delay $z^{-L}$.
• Purposes
– To facilitate pipelining to reduce clock cycle time
– To reduce number of registers needed.
(C) 2004-2006 by Yu Hen Hu ECE734 VLSI Arrays for Digital Signal Processing 2
Cut-set Retiming
Feed-forward Cut-Set Retiming
• Consider the FIR digital filter and its DFG:
  y(n) = b0 x(n) + b1 x(n-1)
• Retiming:
  ynew(n) = b0 x(n-1) + b1 x(n-2)
  ynew(n) = y(n-1)
• Critical path = max(TM, TA)
[Figure: DFG of the FIR filter (input x(n), delay D, multipliers b0 and b1, adder, output y(n)) before and after retiming]
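The relation ynew(n) = y(n-1) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes the feed-forward cut-set places one register on each multiplier output, so the adder works on the products of the previous cycle.

```python
def fir_original(x, b0, b1):
    """y(n) = b0*x(n) + b1*x(n-1), with zero initial state."""
    y, x_prev = [], 0.0
    for xn in x:
        y.append(b0 * xn + b1 * x_prev)
        x_prev = xn
    return y


def fir_retimed(x, b0, b1):
    """Same filter with a register on each multiplier output (assumed cut-set)."""
    y, x_prev = [], 0.0
    p0 = p1 = 0.0  # pipeline registers holding last cycle's products
    for xn in x:
        y.append(p0 + p1)              # adder only sees registered products
        p0, p1 = b0 * xn, b1 * x_prev  # multipliers run in parallel with the adder
        x_prev = xn
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = fir_original(x, 0.5, 0.25)
y_new = fir_retimed(x, 0.5, 0.25)
print(y_new[1:] == y[:-1])  # retimed output is the original delayed by one sample
```

In the retimed loop body no multiplication feeds an addition within the same cycle, which is why the critical path is limited by the slower of the two operations.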
Feed-back Cut Set Retiming
Timing Diagram
• Assume tM = tA = 1 t.u.
• Before retiming
x(1) x(2) x(3) x(4)
MAC 1 2 3 4
y(1) y(2) y(3) y(4)
• After retiming
Mul 0 1 2 3 4 5 6 7 8
Feed-back Cut Set Retiming
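[Figure: two feedback DFGs, each with an adder and a multiplier a in the loop: one with input x(n), output y(n) and loop delay D; one with input x(m), output y(m) and loop delay 2D]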
Slowdown + Retiming
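Start with y(n) = a y(n-1) + x(n)
Start with y(n) = a y(n-2) + x(n)
[Figure: feedback DFGs built from an adder, a multiplier a, and delays D, one with input x(n) and output y(n), one with input x(m) and output y(m)]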
Example 3.2.1
• Node delay = 1 t.u.
• Before retiming:
– Critical path: a3 → a4 → a5 → a6
– Clock cycle time = 4
– 2 delay units
• After cut-set retiming
– Critical path: a3 → a5, a4 → a6
– Clock cycle time = 2
– 6 delay units
• After additional retiming
– Critical path: none
– Clock cycle time = 1
– 11 delay units
[Figure: DFG with nodes a1 to a6 before retiming, after cut-set retiming, and after additional retiming]
Slow Down for Cut-Set Retiming
Node Retiming
• Transfer delay through a node in DFG:
[Figure: node v before and after retiming with r(v) = 2]
• $r(v)$ = # of delays transferred from out-going edges to incoming edges of node v
• $w(e)$ = # of delays on edge e
• $w_r(e)$ = # of delays on edge e after retiming
• Retiming equation: for an edge e from u to v,
  $w_r(e) = w(e) + r(v) - r(u)$, subject to $w_r(e) \ge 0$.
• Let p be a path from $v_0$ to $v_k$:
  $v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$
  then
  $$w_r(p) = \sum_{i=0}^{k-1} w_r(e_i) = \sum_{i=0}^{k-1} \bigl( w(e_i) + r(v_{i+1}) - r(v_i) \bigr) = w(p) + r(v_k) - r(v_0)$$
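The retiming equation and the path property can be tried on a small graph. This is a minimal sketch: the three-node loop, the retiming values, and the helper names `retime` and `path_weight` are all made up for illustration.

```python
def retime(edges, r):
    """Apply w_r(e) = w(e) + r(v) - r(u) to every edge (u, v, w)."""
    retimed = []
    for u, v, w in edges:
        w_r = w + r[v] - r[u]
        if w_r < 0:
            raise ValueError(f"edge {u}->{v} would get {w_r} delays")
        retimed.append((u, v, w_r))
    return retimed


def path_weight(edges, path):
    """Total number of delays along a path given as a list of nodes."""
    weight = {(u, v): w for u, v, w in edges}
    return sum(weight[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical loop 1 -> 2 -> 3 -> 1 with two delays on edge 3 -> 1.
edges = [(1, 2, 0), (2, 3, 0), (3, 1, 2)]
r = {1: 0, 2: 1, 3: 2}
edges_r = retime(edges, r)

p = [1, 2, 3]
print(path_weight(edges_r, p), path_weight(edges, p) + r[3] - r[1])
loop = [1, 2, 3, 1]
print(path_weight(edges_r, loop), path_weight(edges, loop))
```

For a closed path the start and end nodes coincide, so the term r(v_k) - r(v_0) vanishes and the number of delays in a loop is unchanged.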
Invariant Properties
Node Retiming Examples
r(2) = 1
DFG Illustration of the Example
Retiming for Minimizing Clock Period
Retiming Example Revisited
Solution continues
Systematic Solutions
Bellman-Ford Algorithm
Find the shortest path from an arbitrarily chosen origin node U to each node in a directed graph, if no negative cycle exists.
Given a directed graph:
w(m,n): weight on the edge from node m to node n; w(m,n) = ∞ if there is no edge from m to n.
r(i,j): length of the shortest path from node U to node i within j-1 steps (at most j edges).
r(i,1) = w(U,i),
$r(i,j+1) = \min_k \{ r(k,j) + w(k,i) \}$, j = 1, 2, …, N-1
If max(r(:,N-1) - r(:,N)) > 0, then there is a negative cycle. Else, r(i,N-1) gives the shortest path length from U to i.
[Figure: example directed graph with origin node U and nodes 1 to 4, its weight matrix W, and the resulting matrix r]
Note that in the example max(r(:,N-1) - r(:,N)) = 1 > 0, hence there is at least one negative cycle.
spbf.m
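The recurrence above can be sketched in a few lines of Python. This is not the course's spbf.m, only an illustration written from the recurrence; the function name `bellman_ford`, the 0-based node indices, and the 4-node weight matrix are assumptions.

```python
INF = float("inf")


def bellman_ford(W, origin):
    """Return (r, has_negative_cycle) for weight matrix W with w(i,i) = 0."""
    n = len(W)
    r = list(W[origin])          # r(i,1) = w(U,i)
    prev = r
    for _ in range(n - 1):       # j = 1, ..., N-1
        prev = r
        r = [min(prev[k] + W[k][i] for k in range(n)) for i in range(n)]
    # prev is column N-1 and r is column N: any further improvement
    # means some path keeps getting shorter, i.e. a negative cycle.
    has_negative_cycle = any(a > b for a, b in zip(prev, r))
    return r, has_negative_cycle


# Hypothetical graph (0-based): 0->1 (-3), 1->2 (1), 1->3 (2), 2->3 (2), 3->0 (1).
W = [
    [0, -3, INF, INF],
    [INF, 0, 1, 2],
    [INF, INF, 0, 2],
    [1, INF, INF, 0],
]
dist, negative = bellman_ford(W, origin=0)
print(dist, negative)
```

Because the diagonal of W is zero, each column of r is never larger than the previous one, so a strict decrease between the last two columns is the only thing the test needs to look for.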
Floyd-Warshall Algorithm
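Find the shortest path between all possible pairs of nodes in the graph, provided no negative cycle exists.
Algorithm:
Initialization: $R^{(1)} = W$;
For k = 1 to N
  $R^{(k+1)}(u,v) = \min_m \{ R^{(k)}(u,m) + R^{(k)}(m,v) \}$
If $R^{(k)}(u,u) < 0$ for any k, u, then a negative cycle exists. Else, $R^{(N+1)}(u,v)$ is the shortest path from u to v.
[Figure: 4-node directed graph with edges 1→2 (-3), 2→3 (1), 2→4 (2), 3→4 (2), 4→1 (1)]
$$W = \begin{pmatrix} 0 & -3 & \infty & \infty \\ \infty & 0 & 1 & 2 \\ \infty & \infty & 0 & 2 \\ 1 & \infty & \infty & 0 \end{pmatrix}, \quad
R^{(2)} = \begin{pmatrix} 0 & -3 & -2 & -1 \\ 3 & 0 & 1 & 2 \\ 3 & \infty & 0 & 2 \\ 1 & -2 & \infty & 0 \end{pmatrix}$$
$$R^{(3)} = R^{(4)} = R^{(5)} = \begin{pmatrix} 0 & -3 & -2 & -1 \\ 3 & 0 & 1 & 2 \\ 3 & 0 & 0 & 2 \\ 1 & -2 & -1 & 0 \end{pmatrix}$$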
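As a rough sketch, the update can be coded as a repeated min-plus product of R with itself, which is how the slide writes it (the textbook Floyd-Warshall loop instead fixes one intermediate node per iteration). The function names and the 4-node weight matrix are assumptions chosen for illustration.

```python
INF = float("inf")


def min_plus_step(R):
    """One update: R'(u,v) = min over m of R(u,m) + R(m,v)."""
    n = len(R)
    return [[min(R[u][m] + R[m][v] for m in range(n)) for v in range(n)]
            for u in range(n)]


def all_pairs_shortest_paths(W):
    """Iterate the update N times; raise if a negative cycle shows up."""
    R = [row[:] for row in W]        # R(1) = W
    for _ in range(len(W)):          # k = 1, ..., N
        R = min_plus_step(R)
        if any(R[u][u] < 0 for u in range(len(R))):
            raise ValueError("negative cycle detected")
    return R


# Assumed example: edges 1->2 (-3), 2->3 (1), 2->4 (2), 3->4 (2), 4->1 (1).
W = [
    [0, -3, INF, INF],
    [INF, 0, 1, 2],
    [INF, INF, 0, 2],
    [1, INF, INF, 0],
]
R2 = min_plus_step(W)
R = all_pairs_shortest_paths(W)
for row in R:
    print(row)
```

Each step doubles the number of edges a path may use, so the matrix usually stops changing well before the N-th iteration; running all N steps simply follows the loop bound given above.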
Retiming Example
Retiming Example
• Floyd-Warshall algorithm
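$$W = R^{(1)} = \begin{pmatrix} 0 & 1 & \infty & \infty & \infty \\ \infty & 0 & -1 & -1 & \infty \\ 0 & \infty & 0 & \infty & \infty \\ 1 & \infty & \infty & 0 & \infty \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
$$R^{(2)} = \begin{pmatrix} 0 & 1 & 0 & 0 & \infty \\ -1 & 0 & -1 & -1 & \infty \\ 0 & 1 & 0 & \infty & \infty \\ 1 & 2 & \infty & 0 & \infty \\ 0 & 0 & -1 & -1 & 0 \end{pmatrix}$$
$$R^{(3)} = R^{(4)} = R^{(5)} = R^{(6)} = \begin{pmatrix} 0 & 1 & 0 & 0 & \infty \\ -1 & 0 & -1 & -1 & \infty \\ 0 & 1 & 0 & 0 & \infty \\ 1 & 2 & 1 & 0 & \infty \\ -1 & 0 & -1 & -1 & 0 \end{pmatrix}$$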
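One way to read such a result is as the solution of a set of difference constraints. The sketch below rests on assumptions that the slide text does not spell out: W is taken as a 5-node constraint graph in which node 5 is an extra node with zero-weight edges to every node, an edge u→v of weight w stands for r(v) - r(u) ≤ w, and the last row of the final matrix is taken as a candidate set of retiming values.

```python
INF = float("inf")

# Assumed constraint graph: nodes 1..4 plus an extra node 5 (last row).
W = [
    [0, 1, INF, INF, INF],
    [INF, 0, -1, -1, INF],
    [0, INF, 0, INF, INF],
    [1, INF, INF, 0, INF],
    [0, 0, 0, 0, 0],
]


def shortest_paths(W):
    """Repeat the min-plus update N times, starting from R(1) = W."""
    n = len(W)
    R = [row[:] for row in W]
    for _ in range(n):
        R = [[min(R[u][m] + R[m][v] for m in range(n)) for v in range(n)]
             for u in range(n)]
    return R


R = shortest_paths(W)
r = R[-1][:-1]  # shortest path lengths from the extra node to nodes 1..4

# Check every constraint r(v) - r(u) <= w(u,v) among nodes 1..4.
violations = [(u + 1, v + 1) for u in range(4) for v in range(4)
              if W[u][v] != INF and r[v] - r[u] > W[u][v]]
print(r, violations)
```

Shortest path lengths from a common source always satisfy d(v) ≤ d(u) + w(u,v), which is why the last row passes the check whenever no negative cycle exists.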
Retiming to Reduce Registers
[Figure: delay reduction by retiming]
Time Scaling (Slow Down)
• Transforming each delay element (register) D to ND and reducing the sample frequency N-fold slows down the computation N times.
• During slow down, the processor clock cycle time remains unchanged. Only the sampling cycle time is increased.
• Provides opportunity for retiming and interleaving.
[Figure: feedback loop with delay D, input … x(3) x(2) x(1), output … y(3) y(2) y(1); its 2-slow version with delay 2D, input … -- x(3) -- x(2) -- x(1), output … y(3) -- y(2) -- y(1)]
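A small simulation may make the slow-down concrete. It is a sketch under assumptions: the first-order recursion y(n) = a y(n-1) + x(n) is used as the example system, null samples are modeled as zeros, and the helper names `iir` and `slow_down` are hypothetical.

```python
def iir(x, a, delay=1):
    """y(n) = a*y(n-delay) + x(n), with zero initial state."""
    y = []
    for n, xn in enumerate(x):
        feedback = y[n - delay] if n >= delay else 0.0
        y.append(a * feedback + xn)
    return y


def slow_down(x, a, N):
    """N-slow version: D -> ND, with N-1 null samples after each input."""
    padded = []
    for xn in x:
        padded.append(xn)
        padded.extend([0.0] * (N - 1))   # idle slots between valid samples
    y_slow = iir(padded, a, delay=N)
    return y_slow, y_slow[::N]           # full stream, and the valid outputs


x = [1.0, 2.0, 3.0, 4.0]
y = iir(x, 0.5)
y_slow, y_valid = slow_down(x, 0.5, N=2)
print(y_valid == y)
print(y_slow)
```

Every N-th output of the slowed system reproduces the original sequence, and the idle slots in between are the ones that interleaving can fill with an independent data stream.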