Exercises in DSP Design 2020

& Exam from 2005-12-12
& Exam from 2004-12-13

Dept. of Electrical and Information Technology


Some helpful equations

• Retiming:
$w_r(e) = w(e) + r(V) - r(U)$

• Folding:
$D_F(U \to V) = N\,w(e) - P_U + v - u$

• Unfolding:
$U_i \to V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays, for $i = 0, \ldots, J-1$

• Unfolding of switch:
Switching instance: $Wl + u = J\left(W'l + \lfloor u/J \rfloor\right) + (u\%J)$, where $W' = W/J$
Edge: $U_{u\%J} \to V_{u\%J}$
If the wordlength is not a multiple of J, determine $L = \mathrm{lcm}\{W, J\}$ and replace $Wl + u$ with the $L/W$ instances $Ll + u + wW$, for $w = 0, \ldots, L/W - 1$.

• Pipelining:
$T_{seq} = \dfrac{C_{ch} V_{dd}}{k (V_{dd} - V_t)^2}, \qquad T_{pip} = \dfrac{(C_{ch}/M)\,\beta V_{dd}}{k (\beta V_{dd} - V_t)^2}$

• Parallel processing:
$T_{par} = L \cdot T_{seq} = \dfrac{C_{ch}\,\beta V_{dd}}{k (\beta V_{dd} - V_t)^2}$

Assignments

1. Short Questions

(a) List 3 architectural technologies that are used to implement Digital Signal Processing. Differentiate them in terms of performance, energy consumption, programmability, and development time.
(b) What are FPGAs and ASICs? How are they different from each other?
(c) Describe/explain the following:
• FFT
• FIR filter
• IIR filter
(d) Explain the cut-off frequency of a digital filter.

2. A digital word has a wordlength of 6 bits.

(a) What is the resolution if Vmax = 1 V?


(b) What is the dynamic range if VLSB = 0.1 V?
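A small numerical sketch for problem 2, assuming that the resolution is the LSB step Vmax/2^W and that the dynamic range is the ratio of full scale to one LSB (some texts count 2^W − 1 steps instead, which changes the numbers slightly). The function names are illustrative only.

```python
import math

W = 6  # wordlength in bits

def lsb_step(v_max: float, wordlength: int) -> float:
    """Resolution as the LSB step, assuming Vmax spans 2**W steps."""
    return v_max / 2 ** wordlength

def full_scale(v_lsb: float, wordlength: int) -> float:
    """Voltage range covered by 2**W steps of size v_lsb."""
    return v_lsb * 2 ** wordlength

def dynamic_range_db(wordlength: int) -> float:
    """Full scale over one LSB, in dB (about 6.02 dB per bit); independent of VLSB."""
    return 20 * math.log10(2 ** wordlength)

print(lsb_step(1.0, W))      # step size for Vmax = 1 V
print(full_scale(0.1, W))    # range covered when VLSB = 0.1 V
print(dynamic_range_db(W))
```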

3. Express the following 2’s complement numbers in decimal notation.

• 0010.1101
• 1011.0011

(a) Truncation at the binary point results in . . .?


(b) What is the difference compared with rounding?
(c) Explain the term DC error.
(d) Apply sign extension to the numbers above.
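A sketch for problem 3 (helper names are hypothetical). It reads a 2's complement fixed-point string, models truncation at the binary point as `floor` (discarding bits always moves a 2's complement value toward −∞, which is the origin of a DC error), and models rounding as adding half an LSB before truncating.

```python
import math

def twos_complement_value(bits: str) -> float:
    """Decimal value of a 2's complement fixed-point string such as '1011.0011'."""
    int_part, frac_part = bits.split(".")
    width = len(int_part) + len(frac_part)
    raw = int(int_part + frac_part, 2)
    if int_part[0] == "1":           # sign bit set: its weight is negative
        raw -= 1 << width
    return raw / (1 << len(frac_part))

def truncate(value: float) -> int:
    """Truncation at the binary point = rounding toward -infinity in 2's complement."""
    return math.floor(value)

def round_half_up(value: float) -> int:
    """Rounding: add half an LSB, then truncate."""
    return math.floor(value + 0.5)

def sign_extend(bits: str, int_bits: int) -> str:
    """Copy the sign bit to the left until the integer part has int_bits bits."""
    int_part, frac_part = bits.split(".")
    return int_part[0] * (int_bits - len(int_part)) + int_part + "." + frac_part

for word in ("0010.1101", "1011.0011"):
    v = twos_complement_value(word)
    print(word, v, truncate(v), round_half_up(v), sign_extend(word, 8))
```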

4. Add the following three numbers in two steps, first carry out ν = ν1 + ν2 , then
ν + ν3 . Use 5 bits in 2’s complement notation for ν1 = 0.6875, ν2 = 0.8125,
and ν3 = −0.5625.
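A bit-level sketch of problem 4, assuming the 5-bit word is one sign bit plus four fractional bits (range [−1, 1), LSB = 2^−4) and that the adder simply drops the carry out of the sign bit (wrap-around). Names are illustrative.

```python
WORDLENGTH = 5   # assumed: 1 sign bit + 4 fractional bits
FRAC_BITS = 4

def to_fixed(v: float) -> int:
    """Integer code q such that v = q * 2**-FRAC_BITS."""
    q = round(v * (1 << FRAC_BITS))
    assert -(1 << (WORDLENGTH - 1)) <= q < (1 << (WORDLENGTH - 1)), "not representable"
    return q

def wrap(q: int) -> int:
    """Keep only WORDLENGTH bits, as a 2's complement adder without overflow detection."""
    q &= (1 << WORDLENGTH) - 1
    return q - (1 << WORDLENGTH) if q >= 1 << (WORDLENGTH - 1) else q

def to_bits(q: int) -> str:
    return format(q & ((1 << WORDLENGTH) - 1), f"0{WORDLENGTH}b")

def to_value(q: int) -> float:
    return q / (1 << FRAC_BITS)

v1, v2, v3 = to_fixed(0.6875), to_fixed(0.8125), to_fixed(-0.5625)
v = wrap(v1 + v2)          # first step: nu = nu1 + nu2
result = wrap(v + v3)      # second step: nu + nu3
print("nu      :", to_bits(v), to_value(v))
print("nu + nu3:", to_bits(result), to_value(result))
```

Running it shows the intermediate sum ν wrapping around, while the final sum ends up inside the representable range; explaining why is part of the exercise.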

5. Describe the concept of safe scaling.

(a) What is the safe scaling factor if h = [0.1, −0.3, 0.7, −0.3, 0.1]?
(b) Draw an architecture that uses the property of h.
(c) What is the complexity reduction compared to a direct implementation?
(d) Consider another impulse response g = [0.1, 0.5, 0, −0.5, −0.1]. Modify
the previous architecture to cope with g.
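A sketch for problem 5, assuming "safe scaling" means bounding the output by the L1 norm of the impulse response, |y(n)| ≤ Σ|h(k)| · max|x|. The second function exploits the symmetry of h (or the antisymmetry of g) by adding (or subtracting) mirrored input samples before multiplying, so at most ⌈N/2⌉ multiplications are needed per output; it is checked against the direct form.

```python
import math

def safe_scaling_factor(h):
    """Scale the input by 1/sum|h| so that |y| <= max|x| cannot overflow."""
    return 1.0 / sum(abs(c) for c in h)

def direct_fir(h, x):
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]

def symmetric_fir(h, x, antisymmetric=False):
    """Linear-phase FIR: pre-add (or pre-subtract) mirrored taps, one multiply per pair."""
    N = len(h)
    sign = -1 if antisymmetric else 1
    xs = lambda i: x[i] if i >= 0 else 0
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(N // 2):
            acc += h[k] * (xs(n - k) + sign * xs(n - (N - 1 - k)))
        if N % 2:
            acc += h[N // 2] * xs(n - N // 2)    # centre tap (zero when antisymmetric)
        y.append(acc)
    return y

h = [0.1, -0.3, 0.7, -0.3, 0.1]
g = [0.1, 0.5, 0, -0.5, -0.1]
x = [1, -2, 3, 0.5, -1, 2, 0, 1]
print(safe_scaling_factor(h))
print(symmetric_fir(h, x))
print(symmetric_fir(g, x, antisymmetric=True))
```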


Figure 1: A first-order all-pole IIR filter (a) and its scaled signal graph (b).

6. What is the maximum value of x(n) for the single feedback section (|a| < 1) shown in Figure 1(a) when u(n) is the step function? What can we do to avoid internal overflows?
If the introduced error e(n) is uniformly distributed white noise and uncorrelated with all other signals:

(a) Calculate the intervals for round-off and truncation errors of e(n).
(b) What are the mean and variance of the two types of quantization errors above?
(c) Calculate the SQNR for rounding and truncation.
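A sketch for problem 6, assuming Figure 1(a) implements x(n) = a·x(n − 1) + u(n) and that the quantization step is Δ. For 0 < a < 1 the step response approaches 1/(1 − a), and for any |u(n)| ≤ 1 the output is bounded by 1/(1 − |a|). The error statistics are estimated on a dense uniform grid instead of random numbers so the result is deterministic; the function names are hypothetical.

```python
import math

def step_response(a: float, n_samples: int) -> list:
    """x(n) = a*x(n-1) + u(n) with u(n) = 1 for n >= 0 (assumed Figure 1(a) recursion)."""
    x, out = 0.0, []
    for _ in range(n_samples):
        x = a * x + 1.0
        out.append(x)
    return out

def round_q(v: float, delta: float) -> float:
    """Rounding to the nearest multiple of delta."""
    return delta * math.floor(v / delta + 0.5)

def trunc_q(v: float, delta: float) -> float:
    """2's complement truncation: discard LSBs, i.e. round toward -infinity."""
    return delta * math.floor(v / delta)

def error_stats(quantize, delta: float, points: int = 100_000):
    """Mean and variance of e = Q(v) - v over a uniform grid on [-4*delta, 4*delta)."""
    errs = []
    for i in range(points):
        v = (i + 0.5) / points * 8 * delta - 4 * delta
        errs.append(quantize(v, delta) - v)
    mean = sum(errs) / points
    var = sum((e - mean) ** 2 for e in errs) / points
    return mean, var

delta = 2 ** -7
print(max(step_response(0.9, 200)), 1 / (1 - 0.9))
print("rounding:  ", error_stats(round_q, delta))
print("truncation:", error_stats(trunc_q, delta))
```

Comparing the printed means and variances with Δ/2 and Δ²/12 is a useful sanity check for (b).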

Figure 2: A 5-tap FIR filter in direct form.

7. Consider the 5-tap FIR filter in Figure 2 in the time domain.

(a) Transpose the filter using the SFG.


(b) Show that the direct and transposed forms have the same functionality.
(c) Introduce a one-stage pipeline and show the functionality.
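A sketch for problem 7(b): both forms are simulated sample by sample with hypothetical integer coefficients and compared. In the transposed form the input is broadcast to all multipliers and the delays sit between the adders, yet the output sequence is the same.

```python
def fir_direct(h, x):
    """Direct form: tapped delay line on the input, adder chain on the output."""
    delay = [0.0] * len(h)                 # delay[k] holds x(n-k)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(hk * xk for hk, xk in zip(h, delay)))
    return y

def fir_transposed(h, x):
    """Transposed form: input broadcast to all taps, delays between the adders."""
    regs = [0.0] * (len(h) - 1)            # regs[k] holds a delayed partial sum
    y = []
    for sample in x:
        products = [hk * sample for hk in h]
        y.append(products[0] + (regs[0] if regs else 0.0))
        # each register takes the next product plus the partial sum further down the chain
        regs = [products[k] + (regs[k] if k < len(regs) else 0.0) for k in range(1, len(h))]
    return y

h = [1, 2, 3, 4, 5]                        # hypothetical coefficients h0..h4
x = [3, -1, 0, 2, 7, -4, 1, 0, 0, 0]
print(fir_direct(h, x))
print(fir_transposed(h, x))
```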

Figure 3: An IIR filter structure.

8. For the digital filter in Figure 3:

(a) Define the iteration bound and explain its meaning.



(b) Find the iteration bound of the filter. Assume that a multiplication
requires 2 t.u. and an add operation 1 t.u.
(c) Transpose the filter.
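The iteration bound is T∞ = max over all loops l of t_l / w_l, where t_l is the total computation time of the loop and w_l its number of delays. Below is a brute-force sketch that enumerates every simple loop (fine for exercise-sized graphs, exponential in general). The graph is hypothetical, not Figure 3.

```python
from fractions import Fraction

def iteration_bound(times, edges):
    """Max over all loops of (loop computation time) / (loop delays).

    times: dict node -> integer computation time
    edges: list of (u, v, delays)
    Each simple loop is enumerated once, starting from its smallest node.
    """
    adj = {n: [] for n in times}
    for u, v, w in edges:
        adj[u].append((v, w))
    best = Fraction(0)

    def dfs(start, node, visited, t_loop, w_loop):
        nonlocal best
        for nxt, w in adj[node]:
            if nxt == start:
                if w_loop + w == 0:
                    raise ValueError("delay-free loop: the DFG is not computable")
                best = max(best, Fraction(t_loop, w_loop + w))
            elif nxt > start and nxt not in visited:
                dfs(start, nxt, visited | {nxt}, t_loop + times[nxt], w_loop + w)

    for s in sorted(times):
        dfs(s, s, {s}, times[s], 0)
    return best

# Hypothetical DFG: nodes 1 and 3 are adders (1 t.u.), nodes 2 and 4 multipliers (2 t.u.)
times = {1: 1, 2: 2, 3: 1, 4: 2}
edges = [(1, 2, 0), (2, 1, 1),              # loop 1-2: 3 t.u., 1 delay
         (1, 3, 0), (3, 4, 0), (4, 1, 2)]   # loop 1-3-4: 4 t.u., 2 delays
print(iteration_bound(times, edges))
```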

9. Consider the lattice filter in Figure 4.

(a) What is the iteration bound expressed in Tadd and Tmult ?


(b) What is the critical path?

Figure 4: Lattice filter.

10. In Figure 5, calculate the critical path and the iteration bound.

Figure 5: A DFG with node computation times.
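The critical path is the longest path that uses only zero-delay edges. A minimal sketch with a hypothetical graph representation (node → computation time, edges as (U, V, delays)); the example graph is made up for illustration and is not Figure 5.

```python
from functools import lru_cache

def critical_path(times, edges):
    """Longest combinational path: sum of node times along zero-delay edges."""
    succ = {n: [] for n in times}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Assumes the zero-delay subgraph is acyclic, as in any computable DFG.
        return times[node] + max((longest_from(m) for m in succ[node]), default=0)

    return max(longest_from(n) for n in times)

# Hypothetical DFG: adders 1 and 3 (1 t.u.), multipliers 2 and 4 (2 t.u.)
times = {1: 1, 2: 2, 3: 1, 4: 2}
edges = [(1, 2, 0), (2, 1, 1), (1, 3, 0), (3, 4, 0), (4, 1, 2)]
print(critical_path(times, edges))
```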

11. † Consider the graph shown in Figure 6. Assume that Tadd = 1 t.u. and Tmult = 2 t.u.

(a) Compute the iteration bound according to the LPM-algorithm.


(b) The LPM algorithm is more suitable for calculation using a computer program. Construct a program (MATLAB, C, . . .) and verify it by calculating the iteration bound for the graph in Figure 6.

Figure 6: A DFG with node computation times.
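A sketch of the LPM algorithm for problem 11(b), written in Python rather than MATLAB or C. It assumes L^(1) has already been built by hand from the DFG (entry (i, j) is the longest computation time from delay element i to delay element j through zero-delay paths), then forms L^(m+1) = L^(1) ⊗ L^(m) in max-plus algebra and returns T∞ = max over i and m of l^(m)_ii / m. The matrix below is hypothetical, not derived from Figure 6.

```python
from fractions import Fraction

def lpm_iteration_bound(L1):
    """Iteration bound from the longest path matrix L1 (None = no path, written -1 by Parhi)."""
    d = len(L1)

    def max_plus(A, B):
        # (A (x) B)[i][j] = max_k A[i][k] + B[k][j], skipping missing paths
        C = [[None] * d for _ in range(d)]
        for i in range(d):
            for j in range(d):
                cands = [A[i][k] + B[k][j] for k in range(d)
                         if A[i][k] is not None and B[k][j] is not None]
                C[i][j] = max(cands) if cands else None
        return C

    best = Fraction(0)
    Lm = L1
    for m in range(1, d + 1):
        for i in range(d):
            if Lm[i][i] is not None:
                best = max(best, Fraction(Lm[i][i], m))
        Lm = max_plus(L1, Lm)
    return best

# Hypothetical example with two delay elements
L1 = [[2, 5],
      [4, None]]
print(lpm_iteration_bound(L1))
```

A simple way to validate such a program is to compare it with a loop-by-loop hand calculation on the same graph, as the exercise suggests.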

12. † Consider the DFG in Figure 8 and assume the computation time for each
node to be T .

(a) What is the maximum achievable sample rate?


Figure 7: A 4-level pipelined all-pass 8th-order IIR filter.

(b) Place feedforward registers at appropriate feedforward cutsets such that the sample rate will be approximately equal to 1/T. Count the number of registers.
(c) Only 4 pipeline registers are available. What is the achievable sample rate now?

Figure 8: A DFG.

13. † Consider the IIR filter in Figure 9. Assume Tmult = 2 t.u. and Tadd = 1 t.u.

(a) Calculate the critical path.


(b) Pipeline the filter to reduce the critical path to 3 t.u.

Figure 9: Digital IIR filter.

14. † A direct form implementation of the FIR filter is expressed as

y(n) = ax(n) + bx(n − 2) + cx(n − 3).

Assume the time for one multiply-add operation is T .

(a) Pipeline this filter such that the clock period is approximately T .

(b) Draw a block filter for a block size of 3. Pipeline this filter such that the
clock period is T . What is the sample rate of the system?
(c) Pipeline the filter in (b) such that the clock period is T /2. Show the
cutset and label the outputs. What is the sample rate now?
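A sketch for problem 14(b) that checks the block (3-parallel) equations against the direct form; the coefficients and inputs are arbitrary. With X0(k) = x(3k), X1(k) = x(3k + 1) and X2(k) = x(3k + 2), each block of three outputs needs only the current input block and the previous one. The sketch checks the algebra only, not the timing.

```python
def direct(a, b, c, x):
    """y(n) = a x(n) + b x(n-2) + c x(n-3)."""
    xs = lambda i: x[i] if i >= 0 else 0
    return [a * xs(n) + b * xs(n - 2) + c * xs(n - 3) for n in range(len(x))]

def block3(a, b, c, x):
    """Block filter with block size 3: three outputs per block clock cycle."""
    assert len(x) % 3 == 0
    X0, X1, X2 = x[0::3], x[1::3], x[2::3]
    prev = lambda seq, k: seq[k - 1] if k >= 1 else 0   # one block delay
    y = []
    for k in range(len(X0)):
        y0 = a * X0[k] + b * prev(X1, k) + c * prev(X0, k)   # y(3k)
        y1 = a * X1[k] + b * prev(X2, k) + c * prev(X1, k)   # y(3k + 1)
        y2 = a * X2[k] + b * X0[k] + c * prev(X2, k)         # y(3k + 2)
        y += [y0, y1, y2]
    return y

x = list(range(1, 13))
print(direct(2, 3, 5, x))
print(block3(2, 3, 5, x))
```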
15. † A recursive filter is defined by

x(n) = ax(n − 2) + u(n).

(a) Pipeline this multiply-add operator by 2 stages by first breaking up the multiply-add operation into 2 components and by redistributing the delay elements in the loop.
(b) Interleave the computation from above with

y(n) = by(n − 2) + v(n)

using the same hardware. Now pipeline the multiply-add operation by 4 stages. Show all the circuits needed for this implementation.
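One way to read problem 15(b), sketched as a simulation: put 4 delays in the loop of a single multiply-add unit and alternate the coefficient between a and b every cycle, so even cycles compute x and odd cycles compute y. The check only shows that the interleaved recursion reproduces both filters; the 4 loop delays are then available to be redistributed as pipeline stages.

```python
def separate(a, b, u, v, n):
    """x(n) = a x(n-2) + u(n) and y(n) = b y(n-2) + v(n), computed independently."""
    x, y = [], []
    for i in range(n):
        x.append(a * (x[i - 2] if i >= 2 else 0) + u[i])
        y.append(b * (y[i - 2] if i >= 2 else 0) + v[i])
    return x, y

def interleaved(a, b, u, v, n):
    """One multiply-add unit with a 4-delay loop: even cycles serve x, odd cycles y."""
    w = []
    for m in range(2 * n):
        coeff = a if m % 2 == 0 else b             # coefficient switches every cycle
        inp = u[m // 2] if m % 2 == 0 else v[m // 2]
        w.append(coeff * (w[m - 4] if m >= 4 else 0) + inp)
    return w[0::2], w[1::2]

u = [1, 2, 3, 4, 5, 6]
v = [6, 5, 4, 3, 2, 1]
print(separate(2, 3, u, v, 6))
print(interleaved(2, 3, u, v, 6))
```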
16. Name the different sources of power consumption in digital CMOS circuits. How does technology scaling influence these sources?
17. Consider a 5-tap FIR filter

$y(n) = \sum_{k=0}^{4} h_k\, x(n-k)$

Assume a multiplier delay of 3 t.u. and an adder delay of 1 t.u.


(a) Pipeline along 1 cutset in the direct form to reduce the critical path. You
should achieve the minimum Tcrit .
(b) Assume that Vdd = 3 V and Vt = 0.4 V. What is the power consumption
of the pipelined filter as a percentage of the original filter (power ratio)?
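Solving the pipelining equation for β by hand is error-prone, so here is a sketch. Setting T_pip = T_seq in the helpful equations gives M(βVdd − Vt)² = β(Vdd − Vt)², a quadratic in β; the physically meaningful root is the one with βVdd > Vt. With total capacitance and sample rate unchanged, the power ratio is β². By the T_par formula, the same equation holds with L in place of M for L-parallel processing, and with L·M for a parallel-pipelined structure. The value M = 2 in the usage line is hypothetical; for problem 17, M would be the factor by which the charged capacitance on the critical path shrinks (for example the ratio of the original to the pipelined critical path, assuming capacitance proportional to delay).

```python
import math

def supply_scaling(m: float, vdd: float, vt: float) -> float:
    """Return beta with m*(beta*vdd - vt)**2 == beta*(vdd - vt)**2 and beta*vdd > vt."""
    a = m * vdd ** 2
    b = 2 * m * vdd * vt + (vdd - vt) ** 2
    c = m * vt ** 2
    beta = (b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # larger root
    assert beta * vdd > vt
    return beta

def power_ratio(m: float, vdd: float, vt: float) -> float:
    """P_new / P_old = beta**2 when total capacitance and sample rate are unchanged."""
    return supply_scaling(m, vdd, vt) ** 2

print(supply_scaling(2, 3.0, 0.4), power_ratio(2, 3.0, 0.4))
```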

18. † Two implementations of an 8-tap FIR filter are shown in Figure 10. Assume
the critical path of a multiplier to be twice that of an adder, that is, Tmult =
2Tadd . Therefore, the charging capacitance of a multiplier is twice that of
an adder. The critical path of the direct form structure in Figure 10(a) is
Tmult + 7Tadd = 9Tadd . The structure in Figure 10(b) can be operated with a
lower supply voltage to meet the clock period or sampling period constraint
of 9Tadd . Thus, the structure in Figure 10(b) can be used to reduce power
consumption. Assume that the structure in Figure 10(a) is operated with a
supply voltage of 4 V. Assume the technology threshold voltage to be 0.5 V.
The supply must be greater than 1.2 V to achieve the acceptable noise margin.
What is the minimum supply voltage at which the structure shown in Figure 10(b) can be operated to achieve the desired sampling period of 9Tadd?
Calculate the percentage of reduction in power consumption for this structure
as compared with the one from Figure 10(a). Neglect the propagation delay
and capacitance of delay elements in calculation of the critical path or power
consumption.
19. † A datapath has a total capacitance of Ctot. This datapath is pipelined by M levels. Let Clatch represent the total capacitance of the latches used for 1 pipelining stage. The pipelined system is operated with a lower supply voltage to reduce the power consumption. Assume both systems are operated at the same speed and assume the propagation delay of the latch to be negligible. Let Ctot = 10Clatch, Vdd = 4 V, and Vt = 0.6 V. Calculate the power consumption of the pipelined system as a percentage of that of the sequential system for different values of M. What is the optimal M for least power consumption?

Figure 10: Two implementations of an 8-tap FIR filter, (a) and (b).

Figure 11: Decomposed 4-tap FIR filter.
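A sweep for problem 19 under explicitly assumed modeling choices: (1) M-level pipelining inserts M − 1 latch stages, each adding Clatch; (2) the capacitance on the critical path drops to Ctot/M and the latch delay is neglected, so β follows from M(βVdd − Vt)² = β(Vdd − Vt)²; (3) power is proportional to the total switched capacitance times the squared supply voltage at an unchanged clock rate. Other assumptions (for instance M rather than M − 1 latch stages) shift the optimum, so treat the printout as a way to check your own derivation.

```python
import math

C_TOT_OVER_C_LATCH = 10.0   # Ctot = 10 Clatch
VDD, VT = 4.0, 0.6

def beta_for(m: int) -> float:
    """Supply scaling factor keeping the original clock period (larger quadratic root)."""
    a = m * VDD ** 2
    b = 2 * m * VDD * VT + (VDD - VT) ** 2
    c = m * VT ** 2
    return (b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def power_ratio(m: int) -> float:
    """Assumed model: (Ctot + (m - 1) Clatch) / Ctot * beta**2."""
    cap_ratio = (C_TOT_OVER_C_LATCH + (m - 1)) / C_TOT_OVER_C_LATCH
    return cap_ratio * beta_for(m) ** 2

ratios = {m: power_ratio(m) for m in range(1, 41)}
best_m = min(ratios, key=ratios.get)
for m in (1, 2, 4, 8, 16, 32):
    print(f"M = {m:2d}: {100 * ratios[m]:.1f} %")
print("least power at M =", best_m)
```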

Figure 12: Fast FIR implementation of a 4-tap FIR filter.

20. † Calculate the power reduction of a computation if it is pipelined by 4 stages and processed using a block structure with block size 4, but is operated with the same sample rate as the original system. Assume that the original system was operated at a supply voltage of 5 V, and assume the threshold voltage of the CMOS process to be 0.4 V. Calculate the power consumption of the parallel-pipelined system as compared with the original system. What is the operating supply voltage of the parallel-pipelined system?
21. For the two 4-tap FIR filters in Figure 11 (decomposed form) and Figure 12 (fast FIR), show that their output sequences are the same as that of the original filter.
(a) What is the complexity per sample compared to the original filter?
(b) What is the critical path compared to the original filter, both direct and
transposed form?
(c) What happens with the complexity when the filter length is increased?
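A sketch for problem 21 that checks the fast FIR decomposition of Figure 12 numerically, with hypothetical coefficients and inputs. The subfilters are H0 = {h0, h2} and H1 = {h1, h3}; y(2k) combines H0X0 with the one-block-delayed H1X1, and y(2k + 1) is obtained as (H0 + H1)(X0 + X1) − H0X0 − H1X1, so three subfilters serve two outputs.

```python
def fir(h, x):
    """Convolution truncated to len(x) outputs, zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]

def fast_fir_2parallel(h, x):
    """2-parallel fast FIR with three subfilters (even-length h and x assumed)."""
    assert len(h) % 2 == 0 and len(x) % 2 == 0
    he, ho = h[0::2], h[1::2]                  # H0 = {h0, h2}, H1 = {h1, h3}
    x0, x1 = x[0::2], x[1::2]                  # even and odd input samples
    a = fir(he, x0)                            # H0 X0
    b = fir(ho, x1)                            # H1 X1
    c = fir([p + q for p, q in zip(he, ho)],   # (H0 + H1)(X0 + X1)
            [p + q for p, q in zip(x0, x1)])
    y = []
    for k in range(len(x0)):
        y.append(a[k] + (b[k - 1] if k >= 1 else 0))   # y(2k)
        y.append(c[k] - a[k] - b[k])                     # y(2k + 1)
    return y

h = [1, 2, 3, 4]
x = [1, -1, 2, 0, 3, 5, -2, 1]
print(fir(h, x))
print(fast_fir_2parallel(h, x))
```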
22. † Express the 2-parallel filter algorithm

$Y_0 = X_0 H_0 + z^{-2} X_1 H_1$
$Y_1 = X_0 H_0 + X_1 H_1 - (X_0 - X_1)(H_0 - H_1)$

in terms of a post-processing matrix, a diagonal matrix, and a pre-processing matrix. Obtain another 2-parallel structure using the transpose of this formulation.
23. Consider a 5-tap FIR filter defined by the transfer function

$H(z) = a + b z^{-1} + b z^{-3} + c z^{-4} + c z^{-5}.$

Design a filter with the minimum number of multipliers. Design a 2-parallel filter using the fast FIR approach which also uses a minimum number of multipliers.
24. † Consider the wave digital filter shown in Figure 13. Assume that Tmult = 20
ns and Tadd = 8 ns.

(a) Calculate the iteration bound.


(b) Calculate the critical path.
(c) Pipeline and/or retime the filter to achieve a critical path equal to the
iteration bound.

Figure 13: A wave digital filter structure.

25. † Consider the DFG in Figure 14.

(a) What is the maximum sample rate?


(b) What is the fundamental limit of the sample rate?
(c) Retime to minimize the clock period.

Figure 14: A DFG with node computation times.
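A sketch for problem 25(c) applying the retiming equation w_r(e) = w(e) + r(V) − r(U) and recomputing the critical path. A retiming is legal only if every retimed edge keeps a non-negative number of delays. The 3-node loop and the retiming values r are hypothetical, not taken from Figure 14.

```python
from functools import lru_cache

def retime(edges, r):
    """Apply w_r(e) = w(e) + r(V) - r(U) to every edge (U, V, w)."""
    out = []
    for u, v, w in edges:
        wr = w + r[v] - r[u]
        if wr < 0:
            raise ValueError(f"infeasible retiming: edge {u}->{v} gets {wr} delays")
        out.append((u, v, wr))
    return out

def critical_path(times, edges):
    """Longest path through zero-delay edges (zero-delay subgraph assumed acyclic)."""
    succ = {n: [] for n in times}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(node):
        return times[node] + max((longest_from(m) for m in succ[node]), default=0)

    return max(longest_from(n) for n in times)

times = {"A": 2, "B": 2, "C": 2}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]
r = {"A": 0, "B": 1, "C": 2}                 # hypothetical retiming values
retimed = retime(edges, r)
print(retimed)
print(critical_path(times, edges), critical_path(times, retimed))
```

Note that retiming never changes the number of delays around a loop, which is why it cannot beat the iteration bound.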

26. † Recall the 8th-order IIR filter from Figure 7. Assume that Tmult = 2 ns and Tadd = 1 ns.

(a) Calculate the iteration bound.


(b) Calculate the critical path.
(c) Pipeline and/or retime the filter to achieve a critical path of 2 ns.

27. † Interblock pipelining of recursive DFGs.

(a) Retime the system in Figure 15(a) to achieve interblock pipelining, that
is, each interblock communicating edge should have at least one delay
element.
(b) To obtain interblock pipelining for the system shown in Figure 15(b), use
an appropriate slow-down approach and then use retiming. What is the
hardware utilization efficiency of this system?

28. Recall the FFT algorithm.

(a) How many non-trivial twiddle factors has an 8-point FFT?


(b) Elaborate on hardware-mapped, pipelined, and time-multiplexed imple-
mentations of this algorithm.
(c) Describe the concept of bit-reversed addressing.
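For problem 28(c), a sketch of bit-reversed addressing: index i is read from address rev(i), obtained by reversing its log2(N) bits. The iterative radix-2 decimation-in-time FFT below consumes its input in bit-reversed order and produces output in natural order; it is checked against a direct DFT. This is a textbook software formulation, not a hardware mapping.

```python
import cmath

def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i, e.g. 001 -> 100 for bits = 3."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def bit_reversed_order(n: int) -> list:
    bits = n.bit_length() - 1
    if 1 << bits != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]

def fft_radix2(x):
    """Iterative radix-2 DIT FFT: bit-reversed input order, natural output order."""
    n = len(x)
    a = [x[j] for j in bit_reversed_order(n)]
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)    # twiddle factor W_size
        for start in range(0, n, size):
            w = 1
            for k in range(half):                    # butterflies of this stage
                t = w * a[start + k + half]
                a[start + k + half] = a[start + k] - t
                a[start + k] = a[start + k] + t
                w *= w_step
        size *= 2
    return a

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)) for k in range(n)]

print(bit_reversed_order(8))
```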
Figure 15: Two recursive DFGs, (a) and (b).


Figure 16: A DFG with node computation times.

29. There are two sequences x and y with lengths N and M, respectively. How many points are needed to represent the frequency response of their convolution?
30. † The DFG in Figure 16 is subject to unfolding. The numbers in parentheses
are the computation times of the nodes.
(a) What is the iteration bound of this DFG? What is the actual iteration
period?
(b) Retime this DFG to minimize the iteration period. What is the actual
iteration period of the retimed DFG?
(c) Unfold both the original and the retimed DFG by a factor of 2. What
are their iteration periods?
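A sketch of the unfolding rule from the helpful equations, for problem 30(c): every edge U → V with w delays becomes J edges U_i → V_((i+w)%J) carrying ⌊(i + w)/J⌋ delays. Node names such as "A0" are generated for illustration. The total number of delays on the J copies equals w.

```python
def unfold_edges(edges, J):
    """J-unfolding of a DFG given as edges (U, V, w)."""
    out = []
    for u, v, w in edges:
        for i in range(J):
            out.append((f"{u}{i}", f"{v}{(i + w) % J}", (i + w) // J))
    return out

for edge in unfold_edges([("A", "B", 3), ("B", "A", 0)], 2):
    print(edge)
```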

Figure 17: Bit-serial adder (the carry loop contains a delay D and a switch that inserts '0' at instance 12l).

31. † Unfold the bit-serial adder with wordlength 12 in Figure 17 by factors 4 and
5 to obtain the corresponding digit-serial adders.
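A sketch of the switch-unfolding rule for problem 31, assuming the switch in Figure 17 has a single instance 12l (u = 0) with wordlength W = 12. When J does not divide W, the instances are first expanded to L = lcm{W, J}, as in the helpful equations. Each result is (unit index u%J, new period W′ = L/J, new offset ⌊u/J⌋), i.e. the edge U_(u%J) → V_(u%J) switching at W′l + ⌊u/J⌋.

```python
from math import gcd

def unfold_switch(W: int, offsets: list, J: int) -> list:
    """Unfold switching instances W*l + u (u in offsets) by a factor J."""
    L = W * J // gcd(W, J)                                          # lcm{W, J}
    expanded = [u + w * W for u in offsets for w in range(L // W)]  # L*l + u + w*W
    w_prime = L // J
    return sorted((u % J, w_prime, u // J) for u in expanded)

print(unfold_switch(12, [0], 4))   # digit size 4
print(unfold_switch(12, [0], 5))   # digit size 5 (12 is not a multiple of 5)
```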
32. Consider the 2-section biquad filter shown in Figure 18. Assume a folding
factor N = 5, a multiplier which is pipelined by two stages, and an adder
which is pipelined by one stage.

(a) Perform retiming for folding.
(b) Draw the folded architecture with the following folding sets:

SM1 = {M2, M1, M3, M6, M7}
SM2 = {M4, —, M5, M8, M9}
SA1 = {A4, —, A1, A2, A3}
SA2 = {A5, A6, A7, A8, —}

Figure 18: Biquad filter with 2 sections.
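A sketch of the folding equation D_F(U → V) = N·w(e) − P_U + v − u from the helpful equations, for problem 32. Here u and v are the time slots of U and V within their folding sets, and P_U is the number of pipeline stages of the unit executing U. The two-node graph is hypothetical; a negative D_F flags an edge that must be retimed or pipelined before folding.

```python
def folded_delays(edges, slot, stages, N):
    """Return (U, V, D_F) for each edge (U, V, w) using D_F = N*w - P_U + v - u."""
    return [(u, v, N * w - stages[u] + slot[v] - slot[u]) for u, v, w in edges]

# Hypothetical loop: adder A1 in time slot 0, multiplier M1 in time slot 1
N = 2
slot = {"A1": 0, "M1": 1}
stages = {"A1": 1, "M1": 2}          # adder 1-stage, multiplier 2-stage pipelined
edges = [("A1", "M1", 0), ("M1", "A1", 1)]
for u, v, d in folded_delays(edges, slot, stages, N):
    print(f"{u} -> {v}: {d} delays" + ("  (negative: retime first)" if d < 0 else ""))
```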

33. Recall the lattice filter from Figure 7 with a folding factor N = 2.

(a) Retime for folding.
(b) Fold the retimed graph.
(c) Minimize the number of registers by lifetime analysis.

SM1 = {M2, M1}
SM2 = {M3, M4}
SA1 = {A1, A2}
SA2 = {A4, A3}

34. Consider the digital filter in Figure 19.

(a) Fold this architecture with folding factor 4 using the folding set

SA = {A4, A3, A2, A1}
SM = {M1, M3, M2, —}

Multipliers and adders are pipelined by 2 and 1 stages, respectively. Preprocess the DFG for folding by pipelining/retiming if necessary.
(b) Draw the folded architecture and show the switching instances.
(c) Make a lifetime chart and find the lowest number of registers needed.
(d) What are the advantages and disadvantages with folding?

35. Fold the digital filter from Figure 3 with a folding factor N = 4 using the
provided folding set. Assume the multiplier to be pipelined by 2 stages and
the adder by 1 stage, that is, a multiplier requires 2 t.u. and an adder 1 t.u.

(a) Preprocess the DFG for folding by pipelining if necessary.


Figure 19: An IIR filter structure.

(b) Draw the folded architecture and show the switching instances.

SA = {A1, A2, A3, A4}
SM = {—, —, M2, M1}
Exam from 2004-12-13

1. Consider the lattice filter in Figure 20. Assume that an addition takes 1 t.u.
and a multiplication 2 t.u.

(a) What is the critical path of this architecture?


(b) What is the formal definition of the iteration bound and what does it
mean? Determine the iteration bound of this algorithm.
(c) What can you do to improve the architecture so it achieves the iteration
bound? (3 approaches)
(d) Draw the improved architecture that results in minimum overhead.

Figure 20: The lattice filter for problem 1.

2. A 6th-order orthogonal filter is shown in Figure 21. All operations (R) are CORDIC rotation operations. Each rotation takes T t.u.

(a) Determine the iteration bound and the critical path of this filter.
(b) Improve the filter to achieve a critical path of 2T t.u. Show all the cutset
locations used for retiming explicitly.

Figure 21: A 6th-order orthogonal filter.

3. From a specification you should design 2 FIR filter architectures. The filter
has 8 taps computed by MATLAB’s fir1 function resulting in
h = [0.0136, 0.0557, 0.1655, 0.2652, 0.2652, 0.1655, 0.0557, 0.0136]
A multiplier has a propagation delay of 3 ns and an adder of 1 ns. The delay
introduced by the registers is neglected. Furthermore, you can assume that
all signals are −1 ≤ x < 1.
(a) Calculate the values of the coefficients when a wordlength of 8 bits and
truncation are used.
(b) The coefficients have certain properties which can be used to reduce the
algorithmic strength and increase the precision of the calculation for a
given number of bits. Describe such possibilities.
(c) To avoid overflow, we can either scale the input signals or allow for an increased internal dynamic range. Describe one internal and one external way to ensure that overflow is avoided.
(d) Design a filter architecture for a sampling rate greater than 300 Msamples/s. Motivate your choice of architecture by comparing with alternative architectures.
(e) In another application, we have a lower requirement on the sampling rate
(> 30 Msamples/s). Propose another architecture that uses the relaxed
requirement to reduce the area.
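A sketch for problem 3(a) and (b), assuming an 8-bit 2's complement coefficient format with 7 fractional bits (range [−1, 1), LSB 2^−7) and truncation implemented as floor. The second part illustrates one common idea for (b): since all |h| < 0.5, the coefficients can be scaled by a power of two (a shift that is compensated after the filter), which keeps more significant bits for the same wordlength.

```python
import math

h = [0.0136, 0.0557, 0.1655, 0.2652, 0.2652, 0.1655, 0.0557, 0.0136]
W = 8              # coefficient wordlength, 2's complement
FRAC = W - 1       # assumed format: 7 fractional bits

def quantize_trunc(c: float) -> int:
    """Integer code of c after truncation (floor) to FRAC fractional bits."""
    q = math.floor(c * 2 ** FRAC)
    assert -(2 ** FRAC) <= q < 2 ** FRAC, "coefficient out of range"
    return q

plain = [quantize_trunc(c) for c in h]
print("codes:", plain, "values:", [q / 2 ** FRAC for q in plain])

# Largest power-of-two scaling that keeps every coefficient inside [-1, 1)
k = 0
while max(abs(c) for c in h) * 2 ** (k + 1) < 1:
    k += 1
scaled = [quantize_trunc(c * 2 ** k) for c in h]
print("shift k =", k, "codes:", scaled)
```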
4. Shown in Figure 22 are the probability distributions of the quantization error
for two common quantization schemes. Assume that ∆ = 2−(w−1) , where w
is the desired wordlength. Assign the names of the schemes and fill in the
number on the respective axes in terms of ∆. Calculate mean values and
variances for both schemes.

Figure 22: Two probability distributions for quantization schemes, (a) and (b).

5. Consider the system in Figure 23.


(a) Unfold the system 2 times and draw the new system.
(b) Change the switching instances from 4l to 3l and unfold again. Note
that the instance 3l + 3 becomes obsolete.
(c) What are the advantages and disadvantages of the unfolded systems compared to the original one?

6. Consider the FIR filter in Figure 24. The time required for a multiply op-
eration is 4T and addition takes T . Assume that Vdd = 5 V, Vt = 0.6 V,
Cadd = C, and Cmult = 4C.
(a) Transpose the filter and describe the transformation process.
(b) Pipeline the transposed filter to achieve a critical path of ≤ 4T and
compare the power consumption of the pipelined filter with the non-
pipelined version.
Figure 23: The DFG for problem 5 (nodes A, B, and C; switching instances 4l, 4l + 1, 4l + 2, and 4l + 3; delays D and 3D).

(c) Parallelize the transposed filter 2 times and compare the power con-
sumption of the parallel filter with the two previous ones. Note that the
number of registers should be the same as in the original filter.

Figure 24: A 4-tap FIR filter in direct form.

7. Consider the digital filter in Figure 25.


(a) Fold the architecture with folding factor 3 using the folding set

SA = {A3, A5, —} and {A4, A1, A2}
SM = {M2, M1, M3}

Assume the multiplier to be pipelined by two stages and the adder by one stage. Preprocess the DFG for folding by pipelining/retiming if necessary.
(b) Draw the folded architecture and show the switching instances.
(c) Make a lifetime chart and find the lowest number of registers needed.
(d) What are the advantages and disadvantages with folding?

Figure 25: The IIR filter for problem 7.
Exam from 2005-12-12

1. Consider the digital filter in Figure 26, where a multiplication takes 2 t.u.
and an addition takes 1 t.u. The units’ capacitances are proportional to their
execution times.

Figure 26: The digital filter for problem 1.

(a) Determine the iteration bound and the critical path of this filter. Pipeline and/or retime to achieve a critical path equal to T∞.
(b) Describe the concept of supply voltage scaling. What is the achievable
power reduction due to this scaling? Assume Vdd = 1.8 V and Vt = 0.6 V.

2. Unfold the DFG in Figure 27 by a factor J = 3.


Figure 27: The DFG for problem 2 (nodes A and B; switching instances 4l, 4l + 1, 4l + 2, and 4l + 3).

3. Consider a 7-tap linear-phase FIR filter. Additions and multiplications require 2 ns and 4 ns, respectively.

(a) The coefficients have a wordlength of 8 bits in 2's complement notation. How do you ensure that the given number range [−1, 1) is used as efficiently as possible? Show a numerical example.
(b) Draw an area-efficient architecture that fulfills the filter’s sample rate
requirement of ≥ 90 Msamples/s. Motivate your choice by comparing to
other possible architectures.
(c) Now the sample rate only has to be ≥ 20 Msamples/s. Draw and explain
another area-efficient architecture that fulfills the new requirement.

Figure 28: Multiply-add unit MAi with coefficient hi.

(d) Figure 28 shows a multiply-add operation that involves coefficient hi. This block is to be used for a general 6-tap FIR filter. Draw a hardware-mapped architecture that uses MAi and assign the corresponding labels i.
(e) Fold the filter given that only three such multiply-add units are available. Each multiply-add unit is 1-stage pipelined. Derive a folding set by assuming that operation MAi is processed in unit $S_{\lfloor i/N \rfloor}$ at time $(i + 1)\,\%\,N$.
(f) What is the sample rate of the folded architecture? Compare to the
original architecture from (3d).
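A sketch for problem 3(e), assuming the six operations are labelled MA0 to MA5 and the folding factor is N = 6/3 = 2 (six operations shared by three units). Each operation MAi is placed in unit S_⌊i/N⌋ at time slot (i + 1)%N, and the script checks that no slot is used twice.

```python
NUM_TAPS = 6
NUM_UNITS = 3
N = NUM_TAPS // NUM_UNITS          # assumed folding factor

folding_sets = {s: [None] * N for s in range(NUM_UNITS)}
for i in range(NUM_TAPS):
    unit = i // N                  # S_floor(i/N)
    t = (i + 1) % N                # time slot (i + 1) % N
    assert folding_sets[unit][t] is None, "slot already occupied"
    folding_sets[unit][t] = f"MA{i}"

for unit, ops in folding_sets.items():
    print(f"S{unit} = {{{', '.join(ops)}}}")
```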

4. Consider a 16-point FFT as in Figure 29, where the complex arithmetic unit adder/subtractor has a delay Tadd = 1 ns and the multiplier Tmult = 3 ns. The input and coefficient wordlengths are both 8 bits in 2's complement notation in the range [−1, 1). Furthermore, the inputs are limited to ±1/√2.

Figure 29: 16-point FFT flow graph with inputs x(0), x(1), . . . , x(15) in natural order.

(a) Label the outputs top to bottom in Figure 29. What is this addressing mode called?
(b) What is the critical path? Pipeline the structure to achieve Tcrit = 4 ns.
(c) How much does the wordlength need to be increased to avoid overflow?
(d) Figure 30 shows a pipeline FFT structure, where each R2 BF block
consists of a radix-2 butterfly unit and a multiplier. Now consider a
1024-point FFT. How many R2 BF stages do we need?

(e) Assume the input data arrives in natural order, that is, x(0), x(1), . . ..
What is the memory requirement (in number of samples) for each stage?
Explain the functionality of the architecture.

Figure 30: Pipeline FFT structure built from cascaded R2 BF stages, each with its own memory (Mem).

5. The filter in Figure 31 is subject to folding. The pipeline stages for adder
and multiplication units are 1 and 2, respectively. Assume a folding factor of
N = 6 and the following folding sets:

SA = {A1, —, —, A2, A3, —}
SM = {—, M2, M3, —, M1, —}

Figure 31: The filter for problem 5.

(a) What is the iteration bound of this filter?


(b) Preprocess the filter to carry out folding.
(c) Draw the folded architecture.
(d) Determine the minimum number of registers needed and show the cor-
responding allocation table.
(e) Redraw the register-minimized architecture.