6-Structures For DSP
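$$x(n) = \mathcal{Z}^{-1}\{X(z)\} = \frac{1}{2\pi j}\oint_C X(z)\, z^{n-1}\, dz$$
The ROC is the set of points in the complex plane for which the Z-transform
converges, i.e.
$$\text{ROC} = \left\{ z : \sum_{n=-\infty}^{+\infty} \left| x(n)\, z^{-n} \right| < \infty \right\}$$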
Regions of convergence (ROC)
The ROC of a Z-transform can be the empty set. Also, different ROCs can
be associated with the same Z-transform expression, each one corresponding to a
different sequence. This means that the Z-transform identifies a unique sequence
only when its ROC is specified.
EXAMPLES
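1. Given the sequence x(n) = a^n (|a|<1) for n = -∞, …, +∞, the Z-transform
does not converge for any z (the ROC is the empty set).
2. Given the causal sequence x(n) = a^n u(n) (|a|<1), the Z-transform converges for |z|>|a|:
$$X(z) = \sum_{n=0}^{\infty} a^n z^{-n} = \sum_{n=0}^{\infty} \left(a z^{-1}\right)^n = \frac{1}{1 - a z^{-1}}, \qquad |z| > |a|$$
[Figure: z-plane with the unit circle and the |a|-radius circle; the ROC is the region outside the |a|-radius circle.]
3. Given the anticausal sequence x(n) = -a^n u(-n-1) (|a|<1), the Z-transform converges for |z|<|a|:
$$X(z) = -\sum_{n=-\infty}^{-1} a^n z^{-n} = -\sum_{k=1}^{\infty} \left(a^{-1} z\right)^k = 1 - \sum_{k=0}^{\infty} \left(a^{-1} z\right)^k = 1 - \frac{1}{1 - a^{-1} z} = \frac{1}{1 - a z^{-1}}, \qquad |z| < |a|$$
[Figure: z-plane with the unit circle and the |a|-radius circle; the ROC is the disc inside the |a|-radius circle.]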
In (2) and (3) Z-transforms are equal, but ROCs are complementary sets
Z-transform vs. DTFT relationship
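If the unit circle belongs to the ROC:
$$X(z)\Big|_{z=e^{j\omega}} = \sum_{n=-\infty}^{+\infty} x(n)\, e^{-j\omega n} = X(e^{j\omega}) \qquad \text{(DTFT)}$$
[Figure: unit circle in the z-plane, z = e^{jω}; the point z = -1 (ω = ±π) corresponds to the highest (Nyquist) frequency.]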
• If the unit circle belongs to the ROC of the Z-transform the DTFT certainly
exists.
Summary of Z-transform properties - 1
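• Linearity
If x(n) has a z-transform X(z) with a region of convergence Rx, and
if y(n) has a z-transform Y(z) with a region of convergence Ry, then
$$a\,x(n) + b\,y(n) \;\xleftrightarrow{\mathcal{Z}}\; a\,X(z) + b\,Y(z),$$
with a region of convergence that contains Rx ∩ Ry.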
Summary of Z-transform properties - 2
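• Multiplication by an exponential
If a sequence x(n) is multiplied by a complex exponential α^n:
$$\alpha^n x(n) \;\xleftrightarrow{\mathcal{Z}}\; X(\alpha^{-1} z), \qquad \text{ROC} = |\alpha|\, R_x$$
• Convolution theorem
If x(n) has a z-transform X(z) with a region of convergence Rx, and if
h(n) has a z-transform H(z) with a region of convergence Rh, then
$$x(n) * h(n) \;\xleftrightarrow{\mathcal{Z}}\; X(z)\, H(z),$$
with a region of convergence that contains Rx ∩ Rh.
• Conjugation
If X(z) is the z-transform of x(n), the z-transform of x*(n) is
$$x^*(n) \;\xleftrightarrow{\mathcal{Z}}\; X^*(z^*)$$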
Summary of Z-transform properties - 3
• Derivative
If X(z) is the z-transform of x(n), the z-transform of nx(n) is
$$n\,x(n) \;\xleftrightarrow{\mathcal{Z}}\; -z\,\frac{dX(z)}{dz}$$
• Initial value theorem
If X(z) is the z-transform of x(n) and x(n) is equal to zero for n<0,
the initial value x(0) may be found from X(z) as follows:
$$x(0) = \lim_{z \to \infty} X(z)$$
Summary of basic Z-transforms
Transfer function
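As we know, in the time domain an LTI system can be represented in
two equivalent forms: by its impulse response h(n) (convolution) and by a
difference equation. For a causal system:
$$y(n) = x(n) * h(n) = \sum_{i=0}^{\infty} h(i)\, x(n-i) \;\xleftrightarrow{\mathcal{Z}}\; Y(z) = H(z)\, X(z)$$
$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}}$$
• The coefficients a_k and b_k are real-valued and are the same as those of the
difference equation; the roots of the numerator are the zeros and the roots of
the denominator are the poles of H(z).
• If the ROC of H(z) includes the unit circle, H(z) evaluated for z = e^{jω} is the
frequency response of the system.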
Partial fraction decomposition
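If M ≥ N and all poles are simple, H(z) can be split into a polynomial part Q(z)
and a proper rational part F(z):
$$H(z) = Q(z) + F(z) = \sum_{r=0}^{M-N} B_r z^{-r} + \sum_{k=1}^{N} \frac{A_k}{1 - p_k z^{-1}}$$
where
$$A_k = \left. F(z)\left(1 - p_k z^{-1}\right) \right|_{z = p_k}$$
is the residue associated with the k-th pole. Taking the inverse Z-transform
(causal ROC, |z| > max_k |p_k|):
$$h(n) = \sum_{r=0}^{M-N} B_r\, \delta(n-r) + \sum_{k=1}^{N} A_k\, p_k^n\, u(n)$$
If all coefficients of the transfer function are real, all poles and residues
are either real numbers or complex conjugate pairs. For each complex
conjugate pair of poles (i.e. if p_k = r e^{jθ_0} and p_k^* = r e^{-jθ_0}) we have:
$$\frac{A_k}{1 - p_k z^{-1}} + \frac{A_k^*}{1 - p_k^* z^{-1}} \;\xleftrightarrow{\mathcal{Z}^{-1}}\; 2|A_k|\, r^n \cos\!\left(\theta_0 n + \operatorname{Arg}\{A_k\}\right) u(n)$$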
Stability criterion in the Z-domain
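• The term $F(z) = \sum_{k=1}^{N} \frac{A_k}{1 - p_k z^{-1}}$ of a transfer function associated to a …
• Theorem: a causal LTI system is stable if and only if all its poles are
inside the unit circle.
Proof:
If the system is stable and causal, then $\sum_{n=0}^{\infty} |h(n)| = B < \infty$. Then, for $|z| \ge 1$,
$$\left|\sum_{n=0}^{\infty} h(n)\, z^{-n}\right| \le \sum_{n=0}^{\infty} |h(n)|\, |z|^{-n} \le \sum_{n=0}^{\infty} |h(n)| = B,$$
so the ROC contains the whole region |z| ≥ 1 and no pole can lie there.
Conversely, if all poles are inside the unit circle, then
$$\text{ROC} = \left\{ z : |z| > \max_{k=1,\dots,N} |p_k| \right\}, \qquad \max_{k} |p_k| < 1,$$
so the ROC includes the unit circle and
$$\sum_{n=0}^{\infty} \left| h(n)\, z^{-n} \right| \Big|_{|z|=1} = \sum_{n=0}^{\infty} |h(n)| < \infty.$$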
Special cases
• If N=0 (no poles) and a_0=1:
$$H(z) = \sum_{k=0}^{M} b_k z^{-k}$$
All-zero transfer function
Inverse systems
[Figure: cascade x(n) → H1(z) → z(n) → H2(z) → y(n)]
$$Y(z) = H_1(z)\, H_2(z)\, X(z)$$
• Def. We say that H2(z) is the inverse of H1(z) if and only if
$$H_1(z)\, H_2(z) = 1 \quad \Longleftrightarrow \quad H_2(z) = \frac{1}{H_1(z)}$$
• For H2(z) to be causal and stable, its poles (namely the zeros of H1(z)) must be
within the unit circle.
• A causal and stable LTI system has an inverse which is also causal
and stable if and only if both its poles and its zeros lie inside the unit circle.
Such systems are called minimum-phase systems.
Differences in LTI implementations
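For a causal LTI system, the processing complexity depends on the
algorithm implementation:
• Impulse response:
$$y(n) = \sum_{k=0}^{\infty} h(k)\, x(n-k)$$
For an IIR system, the number of ADDs and MULs per output sample grows without
bound as the input sequence x(n) gets longer.
• Difference equation (a fixed number of operations per output sample):
$$y(n) = \sum_{k=0}^{M} \frac{b_k}{a_0}\, x(n-k) - \sum_{k=1}^{N} \frac{a_k}{a_0}\, y(n-k)$$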
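The difference equation needs the same number of operations for every output sample, whatever the length of x(n). A direct C sketch with zero initial conditions (the function name `difference_equation` is illustrative):

```c
#include <assert.h>

/* Causal LTI system evaluated through its difference equation:
 *   y(n) = [ sum_{k=0}^{M} b[k] x(n-k) - sum_{k=1}^{N} a[k] y(n-k) ] / a[0]
 * with zero initial conditions. If a[0] = 1 the division can be dropped,
 * leaving M+N+1 MULs and M+N ADDs per output sample. */
void difference_equation(const double *b, int M, const double *a, int N,
                         const double *x, double *y, int len)
{
    assert(a[0] != 0.0);
    for (int n = 0; n < len; n++) {
        double acc = 0.0;
        for (int k = 0; k <= M && k <= n; k++)   /* feed-forward part */
            acc += b[k] * x[n - k];
        for (int k = 1; k <= N && k <= n; k++)   /* recursive part */
            acc -= a[k] * y[n - k];
        y[n] = acc / a[0];
    }
}
```

For H(z) = 1/(1 - 0.5 z^{-1}) an impulse input produces 1, 0.5, 0.25, 0.125, …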
Block diagrams and signal flow graphs
• There are usually two equivalent ways of representing an LTI system
graphically:
– Block diagrams
– Signal flow graphs (or data flow graphs)
• They are very useful in DSP algorithm design & modelling, i.e.
before HW or SW implementation
Example
[Figure: the same elements drawn as a block diagram and as a signal flow graph: 2-input adder (+), delay unit (z-1), 2-input multiplier (x a)]
Properties of the signal flow graphs: basics
Graph representation is simpler than a block diagram and takes
advantage of the properties of graph theory
• Nodes with multiple inputs (usually two) and 1 output are
adders
• Nodes with 1 input and multiple outputs are branches
• Each edge of the graph has a direction and a value
called transference:
– If the transference is a number → multiplication
– If the transference is z-1 → delay element
– If no transference is indicated, it is 1
Properties of the signal flow graphs: feedback
• Definition: For a given oriented graph G, a loop can be defined
as a subset of G in which:
– Each node is incident to exactly two edges of the subset
– All edges are oriented in the same direction around the loop
Properties of the signal flow graphs: transposition
Property: For a given oriented graph modelling an LTI system, the graph
obtained by applying the following two rules:
– Invert the direction of each edge without changing the transference values;
– Exchange input and output
is equivalent to the original graph (it has the same H(z)).
Example
[Figure: a first-order recursive graph from x[n] to y[n] with a delay z-1 and a coefficient a, and its transposed (T) version]
Why is computability important?
• Graph computability is essential in DSP applications because only
systems that can be modelled with a computable graph can be
implemented as SW algorithms, i.e. as routines running on DSPs.
• A large number of DSP algorithms (e.g. IIR filters, FIR filters, digital
resonators), not necessarily linear, can be implemented by means of
difference equations.
Example: a delta-sigma… resonator (C code). It contains 3 nested loops, but it is computable:
/* The arrays and coefficients are assumed to be defined elsewhere
   (at least 101 elements each); index 0 holds the initial state. */
extern double y[], v1[], u1[], x1[], x2[], s0[], o[];
extern double a12, a21;

void Delta_sigma_resonator(void)
{
    int j;
    for (j = 1; j <= 100; j++) {
        y[j]  = (v1[j-1] >= 0.0) ? 1.0 : -1.0;  /* 1-bit quantizer (nonlinear) */
        x2[j] = x2[j-1] - a21*y[j];
        x1[j] = x1[j-1] + a12*o[j];
        u1[j] = u1[j-1] + s0[j] - y[j];
        v1[j] = v1[j-1] + u1[j] - y[j];
    }
}
LTI structures: direct form I
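• The Direct Form I LTI structure results immediately from the numerator
and the denominator of the transfer function expressed as polynomials
(assuming a_0 = 1):
$$y(n) = \sum_{k=0}^{M} b_k\, x(n-k) - \sum_{k=1}^{N} a_k\, y(n-k)$$
$$Y(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}\, X(z) = H_{den}(z)\left[H_{num}(z)\, X(z)\right], \qquad H_{num}(z) = \sum_{k=0}^{M} b_k z^{-k}, \quad H_{den}(z) = \frac{1}{1 + \sum_{k=1}^{N} a_k z^{-k}}$$
[Figure: Direct Form I: x(n) feeds an M-size delay ladder (z-1) with coefficients b0, b1, b2, …, followed by an N-size delay ladder with feedback coefficients a1, a2, … producing y(n)]
Cost:
• M+N+1 MUL
• M+N ADD
• M+N Delays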
LTI structures: direct form II (canonical)
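• Direct Form II can be obtained simply by swapping the order of the numerator
and the denominator sections, so that the recursive part H_den(z) comes first.
• In this way the delay lines can be shared and the number of memory
locations is roughly halved, i.e.:
$$Y(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}\, X(z) = H_{num}(z)\left[H_{den}(z)\, X(z)\right]$$
[Figure: Direct Form II: x[n] enters the feedback section (a1, a2, …); a single shared delay line (z-1) feeds the coefficients b0, b1, b2, … that produce y[n]]
Cost:
• M+N+1 MUL
• M+N ADD
• max(M,N) Delays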
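A sketch of a Direct Form II section in C, assuming a_0 = 1, the sign convention 1 + Σ a_k z^{-k} used above, and a hypothetical maximum order `DF2_MAX_ORDER`. A single delay line w(n-k) of length max(M, N) serves both the recursive and the feed-forward part:

```c
#include <assert.h>
#include <string.h>

#define DF2_MAX_ORDER 16   /* hypothetical limit for this sketch */

typedef struct {
    int M, N, K;                     /* K = max(M, N): number of delays */
    double b[DF2_MAX_ORDER + 1];
    double a[DF2_MAX_ORDER + 1];     /* a[0] must be 1 */
    double w[DF2_MAX_ORDER + 1];     /* shared delay line: w[k] = w(n-k) */
} df2_t;

void df2_init(df2_t *f, const double *b, int M, const double *a, int N)
{
    assert(M >= 0 && N >= 0 && M <= DF2_MAX_ORDER && N <= DF2_MAX_ORDER);
    assert(a[0] == 1.0);
    memset(f, 0, sizeof *f);         /* zero initial state */
    f->M = M;
    f->N = N;
    f->K = M > N ? M : N;
    memcpy(f->b, b, (size_t)(M + 1) * sizeof *b);
    memcpy(f->a, a, (size_t)(N + 1) * sizeof *a);
}

/* One output sample:
 *   w(n) = x(n) - sum_{k=1}^{N} a_k w(n-k)   (H_den first)
 *   y(n) = sum_{k=0}^{M} b_k w(n-k)          (then H_num, same delay line) */
double df2_step(df2_t *f, double x)
{
    double w0 = x;
    for (int k = 1; k <= f->N; k++)
        w0 -= f->a[k] * f->w[k];
    f->w[0] = w0;
    double y = 0.0;
    for (int k = 0; k <= f->M; k++)
        y += f->b[k] * f->w[k];
    for (int k = f->K; k > 0; k--)   /* shift the single delay line */
        f->w[k] = f->w[k - 1];
    return y;
}
```

For H(z) = (1 + z^{-1})/(1 - 0.5 z^{-1}) the impulse response starts 1, 1.5, 0.75, …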
LTI structures: transposed form II
It results from direct form II by transposing the signal flow graph:
1. Reverse IN, OUT and edge directions
2. Replace adders with branches and vice versa
3. Re-order nodes
[Figure: Direct Form II graph and its transposed (T) version, with coefficients b0, b1, b2, … and a1, a2, … and z-1 registers between the adders]
Cost:
• M+N+1 MUL
• M+N 2-term ADD
• max(M,N) Delays
Transposed forms are naturally pipeline-oriented, i.e. much more
suitable for HW implementations, because long data paths are broken by
registers into shorter data paths.
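The transposed Direct Form II replaces the shared delay line with state registers s_k, each updated by one short multiply-add chain. A sketch assuming a_0 = 1 and the denominator convention 1 + Σ a_k z^{-k}; the `TDF2_MAX_ORDER` limit and the function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

#define TDF2_MAX_ORDER 16   /* hypothetical limit for this sketch */

typedef struct {
    int K;                          /* max(M, N) = number of registers */
    double b[TDF2_MAX_ORDER + 1];   /* b0..bM, zero-padded up to K */
    double a[TDF2_MAX_ORDER + 1];   /* a0 = 1, a1..aN, zero-padded up to K */
    double s[TDF2_MAX_ORDER + 1];   /* state registers s[1..K] */
} tdf2_t;

void tdf2_init(tdf2_t *f, const double *b, int M, const double *a, int N)
{
    assert(M >= 0 && N >= 0 && M <= TDF2_MAX_ORDER && N <= TDF2_MAX_ORDER);
    assert(a[0] == 1.0);
    memset(f, 0, sizeof *f);        /* zero padding and zero initial state */
    f->K = M > N ? M : N;
    memcpy(f->b, b, (size_t)(M + 1) * sizeof *b);
    memcpy(f->a, a, (size_t)(N + 1) * sizeof *a);
}

/* y(n) = b0 x(n) + s_1
 * s_k  = b_k x(n) - a_k y(n) + s_{k+1}   for k = 1..K-1
 * s_K  = b_K x(n) - a_K y(n)
 * Registers are updated in increasing k, so s_{k+1} still holds its old value. */
double tdf2_step(tdf2_t *f, double x)
{
    double y = f->b[0] * x + f->s[1];
    for (int k = 1; k < f->K; k++)
        f->s[k] = f->b[k] * x - f->a[k] * y + f->s[k + 1];
    if (f->K > 0)
        f->s[f->K] = f->b[f->K] * x - f->a[f->K] * y;
    return y;
}
```

For the same coefficients it produces the same output as the Direct Form II, e.g. 1, 1.5, 0.75, … for H(z) = (1 + z^{-1})/(1 - 0.5 z^{-1}).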
LTI structures: cascade form (1)
• The LTI cascade structure results from the decomposition of both numerator
and denominator polynomials into a product of first- or second-order
polynomials.
• Each factor is implemented as a Direct I or Direct II form:
$$\frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}} = b_0\, \frac{\prod_{k=1}^{M_1} \left(1 - g_k z^{-1}\right) \prod_{k=1}^{M_2} \left(1 - h_k z^{-1}\right)\left(1 - h_k^* z^{-1}\right)}{\prod_{k=1}^{N_1} \left(1 - c_k z^{-1}\right) \prod_{k=1}^{N_2} \left(1 - d_k z^{-1}\right)\left(1 - d_k^* z^{-1}\right)}$$
where:
M=M1+2M2 is the number of H(z) zeros (M1 real, 2M2 complex)
N=N1+2N2 is the number of H(z) poles (N1 real, 2N2 complex)
LTI structures: cascade form (2)
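• In general, in DSP applications it is profitable to have modular
structures (i.e. regular compositions of similar blocks) working on
real (i.e. not complex) data:
$$Y(z) = b_0 \prod_{k=1}^{N_s} \tilde{H}_k(z)\, X(z), \qquad N_s = \left\lfloor \frac{N+1}{2} \right\rfloor$$
where each complex-conjugate pair is merged into one real second-order factor; e.g. for a pair of zeros z_1 = h_k, z_2 = h_k^*:
$$\left(1 - h_k z^{-1}\right)\left(1 - h_k^* z^{-1}\right) = 1 + \tilde{b}_{1k} z^{-1} + \tilde{b}_{2k} z^{-2}, \qquad \tilde{b}_{1k} = -(z_1 + z_2) = -2\,\mathrm{Re}\{h_k\}, \qquad \tilde{b}_{2k} = z_1 z_2 = |h_k|^2$$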
LTI structures: cascade form (3)
• The cascade structure consists of Ns Biquad structures that are
usually implemented in Direct Form II, with the common gain
b0 distributed among all terms, i.e. b0 = b01 b02 … b0Ns:
$$H_k(z) = \frac{b_{0k} + b_{1k} z^{-1} + b_{2k} z^{-2}}{1 + \tilde{a}_{1k} z^{-1} + \tilde{a}_{2k} z^{-2}}, \qquad b_{1k} = \tilde{b}_{1k}\, b_{0k}, \quad b_{2k} = \tilde{b}_{2k}\, b_{0k}$$
• If M=N (even), Ns=N/2 and all coefficients are different from 0
• If M=N (odd) or M<N, Ns=⌊(N+1)/2⌋ and some Biquad coefficients in some
sections are equal to 0
Cost (max):
• 5 Ns MUL
• 4 Ns ADD
• 2 Ns Delays
In the worst case the number of MULs is about 25% higher than in the direct
forms, but the cascade is more robust to finite precision issues (see later).
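A sketch of the cascade of biquads in C, each section in Direct Form II with the 5 MUL / 4 ADD / 2 delay cost listed above. The struct layout and function names are hypothetical, and the gains b_{0k} are assumed to be already distributed among the sections:

```c
#include <assert.h>
#include <stddef.h>

/* One second-order section in direct form II:
 *   H_k(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2) */
typedef struct {
    double b0, b1, b2, a1, a2;
    double w1, w2;                  /* w(n-1), w(n-2): 2 delays per section */
} biquad_t;

static double biquad_step(biquad_t *s, double x)
{
    double w0 = x - s->a1 * s->w1 - s->a2 * s->w2;          /* 2 MUL, 2 ADD */
    double y = s->b0 * w0 + s->b1 * s->w1 + s->b2 * s->w2;  /* 3 MUL, 2 ADD */
    s->w2 = s->w1;
    s->w1 = w0;
    return y;
}

/* Cascade of ns sections: the output of section k feeds section k+1. */
double cascade_step(biquad_t *sec, size_t ns, double x)
{
    assert(sec != NULL || ns == 0);
    for (size_t k = 0; k < ns; k++)
        x = biquad_step(&sec[k], x);
    return x;
}
```

Cascading two sections with H_k(z) = 1/(1 - 0.5 z^{-1}) gives the impulse response (n+1)·0.5^n, i.e. 1, 1, 0.75, 0.5, …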
LTI structures: parallel form (1)
• The parallel form structure of an LTI system is obtained by expanding the
transfer function H(z) in partial fractions:
$$\frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}} = \sum_{k=0}^{N_p} C_k z^{-k} + \sum_{k=1}^{N_1} \frac{A_k}{1 - p_k z^{-1}} + \sum_{k=1}^{N_2} \left( \frac{D_k}{1 - d_k z^{-1}} + \frac{D_k^*}{1 - d_k^* z^{-1}} \right)$$
where:
• N=N1+2N2 is the number of H(z) poles (N1 real, 2N2 complex)
• Np=M-N (the C_k terms are present only if M ≥ N); generally Np=0
• Ck results from the quotient between numerator and denominator
• Ak, Dk and D*k are the residues of the k-th pole, i.e. in the case of
simple (first-order) poles:
$$A_k = \lim_{z \to p_k} \left(1 - p_k z^{-1}\right) \frac{\sum_{i=0}^{M} b_i z^{-i}}{1 + \sum_{i=1}^{N} a_i z^{-i}}$$
LTI structures: parallel form - 2
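• Similarly to the cascade case, the terms of each pair of complex-conjugate
poles are usually grouped together into a single, real, second-order expression
(with A1 = D_k, A2 = D_k^*, p1 = d_k, p2 = d_k^*):
$$\frac{D_k}{1 - d_k z^{-1}} + \frac{D_k^*}{1 - d_k^* z^{-1}} = \frac{e_{0k} + e_{1k} z^{-1}}{1 + a_{1k} z^{-1} + a_{2k} z^{-2}}$$
$$e_{0k} = A_1 + A_2 = 2\,\mathrm{Re}\{D_k\}, \qquad e_{1k} = -(A_1 p_2 + A_2 p_1) = -2\,\mathrm{Re}\{D_k d_k^*\}$$
$$a_{1k} = -(p_1 + p_2) = -2\,\mathrm{Re}\{d_k\}, \qquad a_{2k} = p_1 p_2 = |d_k|^2$$
Generally, M = N and Ns = ⌊(N+1)/2⌋.
Cost (max):
• 4Ns+1 MUL
• 4Ns+1 ADD
• 2Ns Delays
Note: parallel structures are seldom used because they require a final Ns-input
adder and they are not robust (see later).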
LTI structures for FIR systems
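• For FIR systems:
$$Y(z) = \sum_{k=0}^{M} b_k z^{-k}\, X(z) \quad \Longleftrightarrow \quad y(n) = \sum_{k=0}^{M} b_k\, x(n-k)$$
[Figure: Direct Form (tapped delay line z-1 with coefficients b0, b1, b2, …, bM) and its Transposed Form (coefficients bM, bM-1, bM-2, …, b0)]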
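A minimal direct-form FIR sketch in C with an explicit delay line (the struct and the `FIR_MAX_ORDER` limit are illustrative assumptions):

```c
#include <assert.h>
#include <string.h>

#define FIR_MAX_ORDER 63   /* hypothetical limit for this sketch */

typedef struct {
    int M;                            /* order: M+1 taps, M delays */
    double b[FIR_MAX_ORDER + 1];      /* coefficients b0..bM */
    double x[FIR_MAX_ORDER + 1];      /* delay line: x[k] = x(n-k) */
} fir_t;

void fir_init(fir_t *f, const double *b, int M)
{
    assert(M >= 0 && M <= FIR_MAX_ORDER);
    memset(f, 0, sizeof *f);          /* zero initial state */
    f->M = M;
    memcpy(f->b, b, (size_t)(M + 1) * sizeof *b);
}

/* Direct form: y(n) = sum_{k=0}^{M} b_k x(n-k) (M+1 MUL, M ADD in the structure). */
double fir_step(fir_t *f, double xn)
{
    for (int k = f->M; k > 0; k--)    /* shift the delay line */
        f->x[k] = f->x[k - 1];
    f->x[0] = xn;
    double y = 0.0;
    for (int k = 0; k <= f->M; k++)
        y += f->b[k] * f->x[k];
    return y;
}
```

An impulse at the input reproduces the coefficients b0, b1, …, bM at the output, followed by zeros.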
Effects of finite numerical precision
• As is well known, finite numerical precision introduces errors in elementary
arithmetic operations, i.e. additions and multiplications.
Quantization of coefficients: overview
• Quantized coefficients differ from the expected ones → the transfer function
of the system differs from the specified one → risk of instability in IIR
systems
• In processors with wide data paths (e.g. 32 or 64 bits) and with
floating-point units, quantization effects are less relevant
Example:
Transfer function with complex poles at 0.4965±j0.8663
Quantization of coefficients - 1
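• Goal: determine how the quantization of coefficients affects
the position of zeros and poles. We will focus on poles;
the case of zeros is dual.
$$H(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}$$
[Figure: Direct Form II structure with coefficients b0, b1, b2, … and a1, a2, …]
• By using the Taylor series expansion of the denominator
of the transfer function around each pole p_i, it follows that (to first order):
$$\Delta p_i = \sum_{k=1}^{N} \frac{\partial p_i}{\partial a_k}\, \Delta a_k$$
where Δp_i is the position perturbation, ∂p_i/∂a_k is the sensitivity of p_i
to coefficient a_k, and Δa_k is the quantization error.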
Quantization of coefficients - 2
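• In general:
$$\frac{\partial p_i}{\partial a_k} = -\frac{p_i^{N-k}}{\prod_{l=1,\, l \ne i}^{N} \left(p_i - p_l\right)}$$
Poles that are close to each other (small |p_i - p_l|) are therefore very
sensitive to coefficient quantization.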
Quantization of coefficients: conclusions
• Transposed and direct forms are very sensitive to coefficient quantization →
the risk of instability is high for high-order systems
Rounding noise
• Rounding noise can be modeled as white noise, with one noise source
connected to every quantization point (noise analysis)
• The noise signal is processed by the system like any other
signal, i.e. it is colored and typically amplified at the output
• The system may introduce a high gain for the quantization noise,
depending on the transfer function and on the algorithm structure
Example
[Figure: ideal case (x[n] → y[n] with delay z-1 and coefficient a) vs. real case (same graph with the quantized coefficient QB(a))]
Rounding noise analysis
• The quantization effects of a software implementation can be
analyzed by using the previously defined noise models for either
fixed-point or floating-point arithmetic (floating-point effects are usually negligible)
Noise analysis example
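[Figure: first-order recursive system with delay z-1 and coefficient a; the rounding noise e(n) is added at the quantization point, so the output z(n) = y(n) + f(n) has a component y(n) due to x(n) and a component f(n) due to e(n)]
Hypotheses
1. e(n) is a white random process and each r.v. e[n] is uniformly
distributed in [-Δ/2, Δ/2] (conv. rounding)
2. e(n) and x(n) are uncorrelated
Given that z(n) = y(n)+f(n) (linear system), it can be shown that,
unlike the ideal case, the SNR at the output is finite:
$$\text{SNR} = \frac{\sigma_y^2}{\sigma_f^2}, \qquad \text{where} \quad \sigma_f^2 = \frac{\Delta^2}{12} \cdot \frac{1}{1 - a^2}$$
The output noise is colored; the situation is critical when |a| → 1.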
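The factor 1/(1 - a^2) is the noise gain Σ_n h^2(n) for h(n) = a^n u(n), the impulse response seen by e(n). A sketch comparing the truncated series with the closed form (the function names are illustrative):

```c
#include <assert.h>
#include <math.h>

/* sigma_f^2 = (delta^2/12) * sum_{n=0}^{terms-1} h(n)^2, with h(n) = a^n u(n):
 * white rounding noise e(n), uniform in [-delta/2, delta/2], filtered by
 * 1 / (1 - a z^-1). */
double noise_power_series(double a, double delta, int terms)
{
    assert(fabs(a) < 1.0 && terms > 0);
    double gain = 0.0, h = 1.0;
    for (int n = 0; n < terms; n++) {
        gain += h * h;
        h *= a;
    }
    return delta * delta / 12.0 * gain;
}

/* Closed form: sigma_f^2 = (delta^2/12) / (1 - a^2). */
double noise_power_closed(double a, double delta)
{
    assert(fabs(a) < 1.0);
    return delta * delta / 12.0 / (1.0 - a * a);
}
```

Going from a = 0.5 (gain ≈ 1.33) to a = 0.99 (gain ≈ 50.3) multiplies the output noise power by almost 38.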
Noise analysis: a more involved example
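[Figure: three versions of a system with coefficient b0 from x(n) to the output: y(n) in the ideal case, and z(n) with the noise sources ea(n) and eb(n)]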
Noise analysis: the general case
• The numerical noise can also be analyzed through computer
simulations
• The input signal can be random or a sine wave
• The DSP algorithm is executed in two parallel branches:
– simulation: computation with the same word length and quantized
coefficients as the real implementation
– “precise” realization: computation with quantized coefficients and
with the highest possible precision (e.g. double-precision numbers)
• The difference between the two results is the numerical noise
Reducing numerical noise
• In order to minimize the numerical problems:
– the overflow probability has to be minimized
– the dynamic range of the signals has to be maximized
[Figure: signal level at node k and risk of overflow]
Scaling example
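[Figure: ideal case vs. scaled implementation: the input is multiplied by C before the recursive section (node k, coefficient a, delay z-1) and the output by 1/C; e1[n] and e2[n] are rounding noise sources and the output is z(n) = y(n) + f(n); risk of overflow at node k]
If we assume that the noise sources are merged together at node k, we have:
$$\sigma_e^2 = 2\,\frac{\Delta^2}{12}, \qquad \sigma_f^2 = \frac{2\Delta^2}{12} \cdot \frac{1}{C^2 \left(1 - a^2\right)}$$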
Therefore, the noise power at the output is increased by 1/C2, whereas
the power of the output useful signal is the same. In general:
Choice of the scaling factor
• Generally, norms are used:
– It is easy to calculate the most probable maximum sample values at
each node for different types of input signals
– In order to calculate the scaling factor for node k we need the transfer
function Hk(z) (or the corresponding impulse response hk(n)) from the
input of the system to node k
Safe scaling: if |x(n)| < 1,
$$C = \frac{1}{\sum_{n=0}^{\infty} |h_k(n)|}$$
+ overflow at node k cannot occur after scaling
- limits the signal dynamic range
Lp norms:
$$\|H_k\|_p = \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| H_k(e^{j\omega}) \right|^p d\omega \right]^{1/p}, \qquad C = \frac{1}{\|H_k\|_p}$$
+ better SNR
- overflow may occur
Usually p = 2 or p = ∞.
Summary: a practical example
• Implement the following example cascade IIR filter
• The processor has a 16-bit memory word, the multiplier produces full-precision
results and the accumulator is 40 bits long
Summary: a practical example
• The difference equations of the direct form II structure for the first stage
(the expressions for the other sections are analogous) are:
w1(n) = a11*w1(n-1) + a12*w1(n-2) + x(n)
y(n) = b11*w1(n) + b12*w1(n-1) + b13*w1(n-2)
…
• In the current example, the quantization points are w1(n), w2(n), w3(n) and
possibly the output o(n), because these data have to be stored in memory,
whose width is 16 bits against the 40 bits of the accumulator.
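A hypothetical C model of the first stage, mirroring the assembly implementation of this example: Q15 coefficients, a 64-bit integer standing in for the 40-bit accumulator, the input scaled by S, and w1(n) stored as a 16-bit word (the quantization point). As in the assembly comments, a11 and b12 are stored as (a11-1) and (b12-1), b11 = b13 = 1, and the forward path continues from the full-precision accumulator. Saturation on store is an assumption of this sketch; the assembly clears OVM instead:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int16_t S, a12, a11m1, b12m1;   /* Q15: S, a12, (a11-1), (b12-1) */
    int16_t w1, w2;                 /* w1(n-1), w1(n-2): 16-bit memory words */
} stage_q15_t;

static int16_t sat16(int64_t v)
{
    return (int16_t)(v > INT16_MAX ? INT16_MAX : (v < INT16_MIN ? INT16_MIN : v));
}

/* One output sample; the accumulator holds Q30 values (Q15 x Q15).
 * Right-shifting a negative value is implementation-defined in C (arithmetic
 * on common compilers, i.e. truncation toward -infinity like STH). */
int16_t stage_q15_step(stage_q15_t *s, int16_t x)
{
    assert(s != 0);
    int64_t acc = (int64_t)s->S * x;              /* S*x(n) */
    acc += (int64_t)s->a12 * s->w2;               /* + a12*w1(n-2) */
    acc += (int64_t)s->a11m1 * s->w1;             /* + (a11-1)*w1(n-1) */
    acc += (int64_t)s->w1 * 32768;                /* + w1(n-1): now a11*w1(n-1) */
    int16_t w0 = sat16(acc >> 15);                /* quantization point w1(n) */
    acc += (int64_t)s->b12m1 * s->w1;             /* + (b12-1)*w1(n-1) */
    acc += (int64_t)s->w1 * 32768;                /* + w1(n-1): now b12*w1(n-1) */
    acc += (int64_t)s->w2 * 32768;                /* + w1(n-2) */
    s->w2 = s->w1;                                /* w1(n-2) <= w1(n-1) */
    s->w1 = w0;                                   /* w1(n-1) <= w1(n) */
    return sat16(acc >> 15);                      /* y1(n) stored as 16 bits */
}
```

With S = 0.5, a11 = 0.5, a12 = 0 and b12 = 1 (all exactly representable in Q15), an input impulse of 0.5 (16384) gives the outputs 8192, 12288, 14336, …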
Solution 1: assembly implementation without scaling
        .bss  W,3*3          ;space for w(n)
        .bss  X,1
        .data
COEF:
        .word 266            ;S*2^15
        .word -13682         ;a12*2^15
        .word 5435           ;(a11-1)*2^15
        .word 27601          ;(b12-1)*2^15
        .word -26688         ;a22*2^15
        ...
• In the .bss section, space is allocated for the input data and for the
critical (quantization) points
• In the .data section, the coefficients of the filter (mapped into
integers) are stored
• In the .text section (next slide), the body of the filter
implementation is stored
Solution 1: assembly implementation without scaling
        .text
        RSBX  OVM            ;overflow mode off (no saturation)
        SSBX  SXM            ;sign extension on
        SSBX  FRCT           ;fractional mode: autoshift after multiplier
        ...
        STM   #W+8,AR3
        STM   #COEF,AR4
        STM   #X,AR1
        PORTR 0H,*AR1        ; read input
        MPY   *AR4+,*AR1,A   ; A <= S*x(n) (x(n)=16-bit)
* feedback path
        MAC   *AR4+,*AR3-,A  ; x(n)+a12*w1(n-2)
        MAC   *AR4+,*AR3,A   ; x(n)+a12*w1(n-2)+(a11-1)*w1(n-1)
        ADD   *AR3-,16,A     ; x(n)+a12*w1(n-2)+a11*w1(n-1)
        STH   A,*AR3+        ; w1(n)=x(n)+a12*w1(n-2)+a11*w1(n-1)
* forward path
        MAC   *AR4+,*AR3,A   ; w1(n)+(b12-1)*w1(n-1)
        ADD   *AR3+,16,A     ; w1(n)+b12*w1(n-1)
        ADD   *AR3-,16,A     ; y1=w1(n)+b12*w1(n-1)+w1(n-2)
        DELAY *AR3-          ; w1(n-2) <= w1(n-1)
        DELAY *AR3-          ; w1(n-1) <= w1(n)
        ...                  ; AR3->w2(n-2), AR4->a22, A=y1(n)
Solution 1: results
Solution 2: implementation with scaling
• In order to maximize the SNR, the signal levels at the
quantization points have to be maximized
• For a given norm, the scaling coefficients C1, C2 and C3 of the
second-order sections are determined so that the chosen value of the norm is reached
• For example, assume that L∞ = 1 is the target value
First stage:
$$H_1(z) = \frac{S}{1 - a_{11} z^{-1} - a_{12} z^{-2}}$$
$$\left\| g_1 H_1(z) \right\|_\infty = 1 \quad \Rightarrow \quad g_1 = \frac{1}{\left\| H_1(z) \right\|_\infty}$$
Solution 2: implementation with scaling
Second stage:
$$\left\| g_2 H_2(z) \right\|_\infty = 1 \quad \Rightarrow \quad g_2 = \frac{1}{\left\| H_2(z) \right\|_\infty}$$
Solution 2: implementation with scaling
Third stage:
$$\left\| g_3 H_3(z) \right\|_\infty = 1 \quad \Rightarrow \quad g_3 = \frac{1}{\left\| H_3(z) \right\|_\infty}$$
Solution 2: implementation with scaling
S' = C1*S=g1 *S
b’1i = C2*b1i = (g2/g1)*b1i i = 1, 2, 3
b’2i = C3*b2i = (g3 /g2)*b2i i = 1, 2, 3
b’3i = b3i/(C1C2C3) = (1 /g3)*b3i i = 1, 2, 3
Solution 2: results