Exercises DSP Design
• Retiming:
ω_r(e) = ω(e) + r(V) − r(U)
• Folding:
D_F(U → V) = N·w(e) − P_U + v − u
• Unfolding:
• Unfolding of switch:
Switching instance
Edge:
U_{u%J} → V_{u%J}
L = lcm{W, J}
• Pipelining:
• Parallel processing:
Tpar = L · Tseq = (Cch · βVdd) / (k (βVdd − Vt)^2)
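The retiming relation ω_r(e) = ω(e) + r(V) − r(U) can be checked mechanically. The following is a minimal Python sketch; the three-node loop and the retiming values r are hypothetical and serve only to illustrate the formula.

```python
def retime(edges, r):
    """Apply w_r(e) = w(e) + r(V) - r(U) to every edge (U, V, w)."""
    retimed = []
    for u, v, w in edges:
        w_r = w + r[v] - r[u]
        if w_r < 0:
            # A legal retiming never produces a negative delay count.
            raise ValueError(f"edge {u}->{v} would get {w_r} delays")
        retimed.append((u, v, w_r))
    return retimed


# Hypothetical loop A -> B -> C -> A with two delays in total.
edges = [("A", "B", 0), ("B", "C", 1), ("C", "A", 1)]
r = {"A": 0, "B": 1, "C": 0}
retimed_edges = retime(edges, r)
print(retimed_edges)
```

The number of delays around the loop is the same before and after retiming, which is a quick sanity check for any retiming solution.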
Assignments
1. Short Questions
(a) List 3 architectural technologies that are used to implement Digital
Signal Processing. Differentiate them in terms of performance, energy
consumption, programmability and development time.
(b) What are FPGAs and ASICs? How are they different from each other?
(c) Describe/explain the following:
• FFT.
• FIR filter.
• IIR filter.
• Cut-off frequency in a digital filter.
• 0010.1101
• 1011.0011
4. Add the following three numbers in two steps, first carry out ν = ν1 + ν2 , then
ν + ν3 . Use 5 bits in 2’s complement notation for ν1 = 0.6875, ν2 = 0.8125,
and ν3 = −0.5625.
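The two-step addition can be traced with a small Python sketch. It assumes that the 5 bits are split into 1 sign bit and 4 fractional bits, and that the adder simply wraps around on overflow; both are assumptions, not statements from the exercise.

```python
BITS = 5                 # total wordlength
FRAC = 4                 # fractional bits (assumed: 1 sign bit + 4 fraction bits)
MASK = (1 << BITS) - 1


def encode(x):
    """Return the 5-bit two's complement word for x."""
    return round(x * (1 << FRAC)) & MASK


def decode(word):
    """Interpret a 5-bit word as a signed fractional number."""
    if word & (1 << (BITS - 1)):
        word -= 1 << BITS
    return word / (1 << FRAC)


def add_wrap(a, b):
    """Add two words and discard the carry out (wrap-around overflow)."""
    return (a + b) & MASK


v1, v2, v3 = encode(0.6875), encode(0.8125), encode(-0.5625)
v = add_wrap(v1, v2)      # intermediate sum, overflows the number range
total = add_wrap(v, v3)   # final sum is back inside the number range
print(format(v, "05b"), decode(v))
print(format(total, "05b"), decode(total))
```

The intermediate result is wrong because 1.5 does not fit, but the final result is correct since the exact total lies inside [−1, 1).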
(a) What is the safe scaling factor if h = [0.1, −0.3, 0.7, −0.3, 0.1]?
(b) Draw an architecture that uses the property of h.
(c) What is the complexity reduction compared to a direct implementation?
(d) Consider another impulse response g = [0.1, 0.5, 0, −0.5, −0.1]. Modify
the previous architecture to cope with g.
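A short Python sketch for checking the answers to (a) and the coefficient property used in (b) and (d). It assumes that "safe scaling" refers to the sum of the absolute values of the impulse response; the function names are illustrative only.

```python
import math


def safe_scaling_factor(h):
    """Sum of |h(n)|: the largest possible output gain for a bounded input."""
    return sum(abs(c) for c in h)


def symmetry(h):
    """Classify the impulse response as symmetric, antisymmetric or none."""
    pairs = list(zip(h, reversed(h)))
    if all(math.isclose(a, b, abs_tol=1e-12) for a, b in pairs):
        return "symmetric"
    if all(math.isclose(a, -b, abs_tol=1e-12) for a, b in pairs):
        return "antisymmetric"
    return "none"


h = [0.1, -0.3, 0.7, -0.3, 0.1]
g = [0.1, 0.5, 0, -0.5, -0.1]
print(safe_scaling_factor(h), symmetry(h), symmetry(g))
```

Dividing the input by the safe scaling factor guarantees that the output cannot overflow. A symmetric or antisymmetric response allows pairs of delayed inputs to be added (or subtracted) before the multiplication.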
Figure 1: A first-order all-pole IIR filter (a) and its scaled signal graph (b).
(a) Calculate the intervals for round-off and truncation errors of e(n).
(b) What is the mean and variance for the above two types of quantization
errors?
(c) Calculate the SQNR for rounding and truncation.
[Figure: FIR filter with input x(n), four delay elements, coefficients h0 to h4, and output y(n).]
[Figure: digital filter with adders A1 to A4, multipliers M1 and M2, and two delay elements between In and Out.]
(b) Find the iteration bound of the filter. Assume that a multiplication
requires 2 t.u. and an add operation 1 t.u.
(c) Transpose the filter.
10. In Figure 5, calculate the critical path and the iteration bound.
[Figure 5: DFG with nodes a to f; the numbers in parentheses are the node computation times.]
11. † Consider the graph shown in Figure 6. Assume that Tadd and Tmult are 1
t.u. and 2 t.u., respectively.
[Figure 6: DFG with nodes 1 to 4; the numbers in parentheses are the node computation times.]
12. † Consider the DFG in Figure 8 and assume the computation time for each
node to be T .
[Figure: filter with adders A1 to A4, multipliers M1 to M4, and two 4D delay elements between In and Out.]
Figure 8: A DFG.
13. † Consider the IIR filter in Figure 9. Assume Tmult = 2 t.u. and Tadd = 1 t.u.
[Figure 9: IIR filter with input x(n), output y(n), adders A1 to A4, multipliers M1 to M5, and two delay elements.]
(a) Pipeline this filter such that the clock period is approximately T .
(b) Draw a block filter for a block size of 3. Pipeline this filter such that the
clock period is T . What is the sample rate of the system?
(c) Pipeline the filter in (b) such that the clock period is T /2. Show the
cutset and label the outputs. What is the sample rate now?
15. † A recursive filter is defined by
18. † Two implementations of an 8-tap FIR filter are shown in Figure 10. Assume
the critical path of a multiplier to be twice that of an adder, that is, Tmult =
2Tadd . Therefore, the charging capacitance of a multiplier is twice that of
an adder. The critical path of the direct form structure in Figure 10(a) is
Tmult + 7Tadd = 9Tadd . The structure in Figure 10(b) can be operated with a
lower supply voltage to meet the clock period or sampling period constraint
of 9Tadd . Thus, the structure in Figure 10(b) can be used to reduce power
consumption. Assume that the structure in Figure 10(a) is operated with a
supply voltage of 4 V. Assume the technology threshold voltage to be 0.5 V.
The supply must be greater than 1.2 V to achieve the acceptable noise margin.
What is the minimum supply voltage at which the structure shown in
Figure 10(b) can be operated to achieve the desired sampling period of 9Tadd ?
Calculate the percentage of reduction in power consumption for this structure
as compared with the one from Figure 10(a). Neglect the propagation delay
and capacitance of delay elements in calculation of the critical path or power
consumption.
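Supply voltage scaling problems of this kind reduce to a quadratic equation in the scaling factor β. The sketch below assumes the delay model T = C·V / (k (V − Vt)^2) from the formula sheet and a critical path whose charging capacitance is reduced by a factor `speedup`; the function name and the example value speedup = 2 are hypothetical and not the solution of the exercise.

```python
import math


def supply_scaling(v_dd, v_t, speedup):
    """Solve speedup * (b*Vdd - Vt)^2 = b * (Vdd - Vt)^2 for b.

    The equation follows from requiring that the shortened critical path,
    run at the reduced supply b*Vdd, takes as long as the original one.
    """
    a = speedup * v_dd ** 2
    b = 2 * speedup * v_dd * v_t + (v_dd - v_t) ** 2
    c = speedup * v_t ** 2
    # The larger root is the only one with b*Vdd above the threshold voltage.
    return (b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


V_DD, V_T = 4.0, 0.5
beta = supply_scaling(V_DD, V_T, speedup=2.0)   # hypothetical speed-up factor
new_supply = beta * V_DD
power_ratio = beta ** 2   # valid if capacitance and sample rate are unchanged
print(new_supply, power_ratio)
```

The computed supply must still be compared with the noise-margin limit; if it falls below that limit, the limit itself is the lowest usable supply voltage.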
19. † A datapath has a total capacitance of Ctot . This datapath is pipelined by
M levels. Let Clatch represent the total capacitance of the latches used for 1
pipelining stage. The pipelined system is operated with lower supply voltage
to reduce the power consumption. Assume both systems are operated at the same
speed and assume the propagation delay of the latch to be negligible. Let
Ctot = 10Clatch , Vdd = 4 V, and Vt = 0.6 V. Calculate the power consumption
of the pipelined system as a percentage of that of the sequential system for
different values of M . What is the optimal M for least power consumption?
[Figure 10: Two implementations of an 8-tap FIR filter with input x(n), coefficients h0 to h7, and output y(n): (a) direct-form structure with seven delay elements on the input side, (b) structure with four delay elements on the input side and three on the output side.]
[Figure: 2-parallel FIR filter with inputs x(2n) and x(2n + 1), coefficients h0 to h3, and outputs y(2n) and y(2n + 1).]
[Figure: 2-parallel FIR filter with inputs x(2n) and x(2n + 1), subfilters h0, h1, h2, h3, h0 + h1, and h2 + h3, two subtractions, and outputs y(2n) and y(2n + 1).]
Y0 = X0 H0 + z^−2 X1 H1
Y1 = X0 H0 + X1 H1 − (X0 − X1)(H0 − H1)
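The two relations can be verified numerically. The Python sketch below is an illustration under the assumption that X0, X1 and H0, H1 are the even and odd polyphase components of the input and of the impulse response, and that z^−2 corresponds to a delay of one block; the test sequences are arbitrary.

```python
def conv(a, b):
    """Linear convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(a, b):
    """Element-wise sum, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]


def sub(a, b):
    return add(a, [-v for v in b])


def fast_fir_2parallel(x, h):
    """Compute Y0 and Y1 with three subfilter products instead of four."""
    x0, x1 = x[0::2], x[1::2]
    h0, h1 = h[0::2], h[1::2]
    p0 = conv(x0, h0)                        # X0 H0
    p1 = conv(x1, h1)                        # X1 H1
    p2 = conv(sub(x0, x1), sub(h0, h1))      # (X0 - X1)(H0 - H1)
    y0 = add(p0, [0] + p1)                   # one block delay on X1 H1
    y1 = sub(add(p0, p1), p2)
    return y0, y1


x = [5, 6, 7, 8, 9, 10]   # arbitrary even-length input
h = [1, 2, 3, 4]          # arbitrary 4-tap impulse response
y0, y1 = fast_fir_2parallel(x, h)
y = conv(x, h)            # reference: direct convolution
print(y0 == y[0::2], y1 == y[1::2])
```

The products X0 H0 and X1 H1 are shared between both outputs, which is where the saving in multiplications comes from.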
H(z) = a + b z^−1 + b z^−3 + c z^−4 + c z^−5 .
[Figure: DFG with nodes A, B, C, and D; the numbers in parentheses are the node computation times.]
26. † Recall the 8th-order IIR filter from Figure 7. Assume that Tmult = 2 ns and
Tadd = 1 ns.
(a) Retime the system in Figure 15(a) to achieve interblock pipelining, that
is, each interblock communicating edge should have at least one delay
element.
(b) To obtain interblock pipelining for the system shown in Figure 15(b), use
an appropriate slow-down approach and then use retiming. What is the
hardware utilization efficiency of this system?
[Figure 15: systems (a) and (b).]
[Figure: DFG with nodes A, B, and C; the numbers in parentheses are the node computation times.]
29. There are two sequences x and y with length N and M , respectively. How
many points are needed to represent the frequency response of their convolution?
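The length of the convolution result can be confirmed with a few lines of Python. The sequences below are arbitrary examples; the point is only the length of the output.

```python
def convolve(x, y):
    """Linear convolution of two finite sequences."""
    out = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


x = [1, 2, 3, 4]    # N = 4
y = [1, -1, 2]      # M = 3
z = convolve(x, y)
print(len(z), z)
```

A DFT of at least this length represents the result without time-domain aliasing.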
30. † The DFG in Figure 16 is subject to unfolding. The numbers in parentheses
are the computation times of the nodes.
(a) What is the iteration bound of this DFG? What is the actual iteration
period?
(b) Retime this DFG to minimize the iteration period. What is the actual
iteration period of the retimed DFG?
(c) Unfold both the original and the retimed DFG by a factor of 2. What
are their iteration periods?
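For small graphs the iteration bound can be found by enumerating all loops and taking the maximum ratio of loop computation time to loop delay count. The sketch below does this by depth-first search; the three-node DFG is hypothetical and is not the graph of Figure 16.

```python
from fractions import Fraction


def iteration_bound(times, edges):
    """Return max over all loops of (loop computation time / loop delays).

    times: {node: computation time}, edges: [(src, dst, delays), ...].
    """
    nodes = sorted(times)
    index = {n: i for i, n in enumerate(nodes)}
    adj = {n: [] for n in nodes}
    for u, v, d in edges:
        adj[u].append((v, d))
    best = Fraction(0)

    def dfs(start, node, t_sum, d_sum, visited):
        nonlocal best
        for nxt, d in adj[node]:
            if nxt == start:
                if d_sum + d == 0:
                    raise ValueError("delay-free loop")
                best = max(best, Fraction(t_sum, d_sum + d))
            elif nxt not in visited and index[nxt] > index[start]:
                # Only visit nodes after 'start' so each loop is found once.
                dfs(start, nxt, t_sum + times[nxt], d_sum + d, visited | {nxt})

    for s in nodes:
        dfs(s, s, times[s], 0, {s})
    return best


# Hypothetical DFG: loop A-B (6 t.u., 1 delay), loop A-B-C (7 t.u., 2 delays).
times = {"A": 2, "B": 4, "C": 1}
edges = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "A", 2)]
t_inf = iteration_bound(times, edges)
print(t_inf)
```

Exhaustive loop enumeration is only practical for exercise-sized graphs; it is used here because it mirrors the hand calculation.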
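[Figure 17: bit-serial adder with input X, output S, a delay element D, and a switch selecting '0' at time instance 12l.]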
31. † Unfold the bit-serial adder with wordlength 12 in Figure 17 by factors 4 and
5 to obtain the corresponding digit-serial adders.
32. Consider the 2-section biquad filter shown in Figure 18. Assume a folding
factor N = 5, a multiplier which is pipelined by two stages, and an adder
which is pipelined by one stage.
S_M1 = {M2, M1, M3, M6, M7}
S_M2 = {M4, ∅, M5, M8, M9}
S_A1 = {A4, ∅, A1, A2, A3}
S_A2 = {A5, A6, A7, A8, ∅}
[Figure 18: 2-section biquad filter with adders A1 to A8, multipliers M1 to M9, and four delay elements.]
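The folding equation D_F(U → V) = N·w(e) − P_U + v − u can be evaluated edge by edge once the folding orders are known. The Python sketch below uses the folding sets and pipeline depths given for the biquad filter; the two edges at the end are hypothetical, since the actual edges must be read from the filter graph, and the empty slots of the folding sets are modelled as None.

```python
def slot_table(folding_sets):
    """Map each operation to its folding order (position in its set)."""
    table = {}
    for ops in folding_sets.values():
        for slot, op in enumerate(ops):
            if op is not None:      # None marks a null operation
                table[op] = slot
    return table


def folded_delay(n, w, p_u, u, v):
    """Folding equation D_F(U -> V) = N*w(e) - P_U + v - u."""
    return n * w - p_u + v - u


N = 5
PIPELINE = {"M": 2, "A": 1}         # multiplier and adder pipeline stages
folding_sets = {
    "SM1": ["M2", "M1", "M3", "M6", "M7"],
    "SM2": ["M4", None, "M5", "M8", "M9"],
    "SA1": ["A4", None, "A1", "A2", "A3"],
    "SA2": ["A5", "A6", "A7", "A8", None],
}
slots = slot_table(folding_sets)


def edge_delay(src, dst, w):
    """Folded delay of an edge src -> dst carrying w delays."""
    return folded_delay(N, w, PIPELINE[src[0]], slots[src], slots[dst])


# Hypothetical edges, for illustration only.
d_m1_a2 = edge_delay("M1", "A2", 0)
d_a1_a3 = edge_delay("A1", "A3", 1)
print(d_m1_a2, d_a1_a3)
```

A negative result for any real edge indicates that the graph has to be retimed before folding.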
33. Recall the lattice filter from Figure 7 with a folding factor N = 2.
S_M1 = {M2, M1}
S_M2 = {M3, M4}
S_A1 = {A1, A2}
S_A2 = {A4, A3}
(a) Fold this architecture with folding factor 4 using the folding set
35. Fold the digital filter from Figure 3 with a folding factor N = 4 using the
provided folding set. Assume the multiplier to be pipelined by 2 stages and
the adder by 1 stage, that is, a multiplier requires 2 t.u. and an adder 1 t.u.
[Figure: digital filter with adders A1 to A4, multipliers M1 to M3, and one delay element between In and Out.]
(b) Draw the folded architecture and show the switching instances.
1. Consider the lattice filter in Figure 20. Assume that an addition takes 1 t.u.
and a multiplication 2 t.u.
[Figure 20: lattice filter with adders A1 to A8, multipliers M1 to M5, and two delay elements between In and Out.]
2. A 6th-order orthogonal filter is shown in Figure 21. All operations (R) are
CORDIC rotation operations. Each rotation takes T t.u.
(a) Determine the iteration bound and the critical path of this filter.
(b) Improve the filter to achieve a critical path of 2T t.u. Show all the cutset
locations used for retiming explicitly.
[Figure 21: 6th-order orthogonal filter built from CORDIC rotation blocks (R) and delay elements (D) between In and Out.]
3. From a specification you should design 2 FIR filter architectures. The filter
has 8 taps computed by MATLAB’s fir1 function resulting in
h = [0.0136, 0.0557, 0.1655, 0.2652, 0.2652, 0.1655, 0.0557, 0.0136]
A multiplier has a propagation delay of 3 ns and an adder of 1 ns. The delay
introduced by the registers is neglected. Furthermore, you can assume that
all signals are −1 ≤ x < 1.
(a) Calculate the values of the coefficients when a wordlength of 8 bits and
truncation are used.
(b) The coefficients have certain properties which can be used to reduce the
algorithmic strength and increase the precision of the calculation for a
given number of bits. Describe such possibilities.
(c) To avoid overflow we can either scale the input signals or allow for an
increased internal dynamic range. Describe one internal and one external
way to ensure avoidance of overflow.
(d) Design a filter architecture for a sampling rate greater than 300 Msamples/s.
Motivate your choice of architecture by comparing with alternative
architectures.
(e) In another application, we have a lower requirement on the sampling rate
(> 30 Msamples/s). Propose another architecture that uses the relaxed
requirement to reduce the area.
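For part (a), the truncated coefficients can be cross-checked with a short Python sketch. It assumes an 8-bit two's complement format with 1 sign bit and 7 fractional bits, which matches the stated signal range −1 ≤ x < 1; the helper name is illustrative.

```python
import math

WORDLENGTH = 8
FRAC_BITS = WORDLENGTH - 1   # assumed: 1 sign bit + 7 fractional bits

h = [0.0136, 0.0557, 0.1655, 0.2652, 0.2652, 0.1655, 0.0557, 0.0136]


def truncate(x, frac_bits=FRAC_BITS):
    """Return the integer code of x after truncation (rounding towards -inf)."""
    return math.floor(x * 2 ** frac_bits)


codes = [truncate(c) for c in h]
h_q = [c / 2 ** FRAC_BITS for c in codes]
for c, q in zip(codes, h_q):
    print(format(c, "08b"), q)
```

Comparing h_q with h shows the relative error of the small outer coefficients, which is relevant for the discussion in part (b).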
4. Shown in Figure 22 are the probability distributions of the quantization error
for two common quantization schemes. Assume that ∆ = 2^−(w−1) , where w
is the desired wordlength. Assign the names of the schemes and fill in the
numbers on the respective axes in terms of ∆. Calculate mean values and
variances for both schemes.
[Figure 22: probability distributions of the quantization error for the two schemes, panels (a) and (b).]
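The analytical means and variances can be compared against a numerical estimate. The Python sketch below assumes that the two schemes are rounding and two's complement truncation, that the error is defined as e = Q(x) − x, and that the input is uniformly distributed over [−1, 1); saturation at the upper range limit is ignored.

```python
import math


def rounding(x, delta):
    return math.floor(x / delta + 0.5) * delta


def truncation(x, delta):
    return math.floor(x / delta) * delta


def error_stats(quantize, w, points_per_step=1000):
    """Mean and variance of e = Q(x) - x on a uniform grid over [-1, 1)."""
    delta = 2.0 ** -(w - 1)
    n = round(2 / delta) * points_per_step
    errors = []
    for k in range(n):
        x = -1 + (k + 0.5) * 2 / n      # grid midpoints avoid rounding ties
        errors.append(quantize(x, delta) - x)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    return mean, var


W = 4
DELTA = 2.0 ** -(W - 1)
mean_r, var_r = error_stats(rounding, W)
mean_t, var_t = error_stats(truncation, W)
print(mean_r / DELTA, var_r / DELTA ** 2)
print(mean_t / DELTA, var_t / DELTA ** 2)
```

Printing the results in units of ∆ and ∆^2 makes them directly comparable with the hand calculation.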
6. Consider the FIR filter in Figure 24. The time required for a multiply
operation is 4T and addition takes T . Assume that Vdd = 5 V, Vt = 0.6 V,
Cadd = C, and Cmult = 4C.
(a) Transpose the filter and describe the transformation process.
(b) Pipeline the transposed filter to achieve a critical path of ≤ 4T and
compare the power consumption of the pipelined filter with the non-
pipelined version.
[Figure: DFG with nodes A, B, and C, delay elements, and switches at time instances 4l, 4l + 1, 4l + 2, and 4l + 3.]
(c) Parallelize the transposed filter 2 times and compare the power con-
sumption of the parallel filter with the two previous ones. Note that the
number of registers should be the same as in the original filter.
[Figure 24: FIR filter with input x(n), three delay elements, coefficients h0 to h3, and output y(n).]
[Figure: filter with adders A1 to A5, multipliers M1 to M3, and delay elements between In and Out.]
1. Consider the digital filter in Figure 26, where a multiplication takes 2 t.u.
and an addition takes 1 t.u. The units’ capacitances are proportional to their
execution times.
[Figure 26: digital filter with adders A1 to A4, multipliers M1 and M2, and two delay elements between In and Out.]
(a) Determine iteration bound and critical path of this filter. Pipeline and/or
retime to achieve a critical path equal to T∞ .
(b) Describe the concept of supply voltage scaling. What is the achievable
power reduction due to this scaling? Assume Vdd = 1.8 V and Vt = 0.6 V.
[Figure 27: DFG with nodes A and B between In and Out, delay elements, and switches at time instances 4l + 1, 4l + 2, and 4l + 3.]
[Figure 28: unit MAi with coefficient hi.]
4. Consider a 16-point FFT as in Figure 29, where the complex arithmetic unit
adder/subtractor has a delay Tadd = 1 ns and the multiplier Tmult = 3 ns. The
input and coefficient wordlengths are both 8 bits in 2’s complement notation
in the range [−1, 1). Furthermore, the inputs are limited to ±1/√2.
[Figure 29: 16-point FFT signal flow graph with inputs x(0) to x(15).]
(a) Label the outputs top to bottom in Figure 29. What is this addressing
mode called?
(b) What is the critical path? Pipeline the structure to achieve Tcrit = 4 ns.
(c) How much does the wordlength need to be increased to avoid overflow?
(d) Figure 30 shows a pipeline FFT structure, where each R2 BF block
consists of a radix-2 butterfly unit and a multiplier. Now consider a
1024-point FFT. How many R2 BF stages do we need?
(e) Assume the input data arrives in natural order, that is, x(0), x(1), . . ..
What is the memory requirement (in number of samples) for each stage?
Explain the functionality of the architecture.
[Figure 30: pipeline FFT structure built from cascaded R2 BF blocks.]
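Two of the FFT questions can be checked with a few lines of Python. The sketch assumes a radix-2 flow graph whose inputs are in natural order and whose outputs appear in bit-reversed order; whether this matches Figure 29 has to be confirmed from the figure itself. The stage count assumes one radix-2 stage per factor of 2 in the transform length.

```python
def bit_reverse(index, bits):
    """Reverse the lowest 'bits' bits of index."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (index & 1)
        index >>= 1
    return rev


# Output labels, top to bottom, for a 16-point transform (4 address bits).
order_16 = [bit_reverse(i, 4) for i in range(16)]

# Number of radix-2 stages for a 1024-point transform: log2(1024).
stages_1024 = (1024).bit_length() - 1
print(order_16, stages_1024)
```

Applying bit_reverse twice returns the original index, so the same routine also restores natural order.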
5. The filter in Figure 31 is subject to folding. The pipeline stages for adder
and multiplication units are 1 and 2, respectively. Assume a folding factor of
N = 6 and the following folding sets:
[Figure 31: filter with adders A1 to A3, multipliers M1 to M3, and one delay element between In and Out.]