6-Structures For DSP


Systems and Techniques for Digital Signal Processing

6 – System analysis in the Z-domain and structures of LTI systems
• Z-transform definition and basic properties
• Block diagrams and signal flow (or data flow) graphs
• Basic structures for LTI systems implementation
• Direct forms
• Transposed forms
• Parallel and cascade forms
• Finite-precision numerical effects on LTI structures
Definition of Z-transform
The bilateral Z-transform of a discrete-time sequence is defined as:
 
X z   Z x n  
n  
x n z n

where z is a complex variable. The inverse Z-transform is instead defined as

x n   Z X z   X z z n 1dz
1 1
2j C

where C is a counterclockwise closed path encircling the origin and entirely


in the region of convergence (ROC) of the transform

The ROC is the set of points in the complex plane for which the Z-transform
converges, i.e.


 

ROC   z :


x n z   
n


 n   
2
Regions of convergence (ROC)
The ROC of a Z-transform can be the empty set. Also, different ROCs can
be associated with the same Z-transform expression. This means that the Z-transform
is uniquely defined if and only if the ROC is specified.
EXAMPLES
1. Given the two-sided sequence x(n) = a^n (|a| < 1) for n = -∞, …, +∞,
the Z-transform does not converge (empty ROC).

2. Given the causal sequence x(n) = a^n u(n) (|a| < 1), the Z-transform converges for |z| > |a|:

X(z) = \sum_{n=0}^{\infty} a^n z^{-n} = \sum_{n=0}^{\infty} (a z^{-1})^n = 1 / (1 - a z^{-1})    for |z| > |a|

(z-plane: the ROC is the region outside the circle of radius |a|, which includes the unit circle.)

3. Given the anticausal sequence x(n) = -a^n u(-n-1) (|a| < 1), the Z-transform converges for |z| < |a|:

X(z) = -\sum_{n=-\infty}^{-1} a^n z^{-n} = 1 - \sum_{k=0}^{\infty} (a^{-1} z)^k = 1 / (1 - a z^{-1})    for |z| < |a|

(z-plane: the ROC is the region inside the circle of radius |a|.)

In (2) and (3) the Z-transforms are equal, but the ROCs are complementary sets.
Z-transform vs. DTFT relationship

If the unit circle belongs to the ROC:

X(z)|_{z = e^{j\omega}} = \sum_{n=-\infty}^{+\infty} x(n) e^{-j\omega n} = X(e^{j\omega})    (DTFT)

(z-plane: the DTFT is X(z) evaluated on the unit circle; z = 1 corresponds to DC, i.e. zero frequency, and z = -1, i.e. ω = π, to the high (Nyquist) frequency.)

• If the unit circle belongs to the ROC of the Z-transform the DTFT certainly
exists.

• However, in the case of finite-energy sequences a (discontinuous) DTFT may exist even if the unit circle does not belong to the ROC.
Summary of Z-transform properties - 1
• Linearity
If x(n) has a z-transform X(z) with a region of convergence Rx, and
if y(n) has a z-transform Y(z) with a region of convergence Ry,

w(n) = a·x(n) + b·y(n)   →   W(z) = a·X(z) + b·Y(z)
and the ROC of W(z) will include the intersection of Rx and Ry.
• Shifting property
If x(n) has a z-transform X(z),

x(n - n_0)   →   z^{-n_0} X(z)
• Time reversal
If x(n) has a z-transform X(z) with a region of convergence Rx given by the ring α < |z| < β, then the z-transform

x(-n)   →   X(z^{-1})

has a region of convergence 1/Rx, i.e. 1/β < |z| < 1/α.
Summary of Z-transform properties - 2
• Multiplication by an exponential
If a sequence x(n) is multiplied by a complex exponential α^n:

α^n x(n)   →   X(α^{-1} z)
• Convolution theorem
If x(n) has a z-transform X(z) with a region of convergence Rx, and if
h(n) has a z-transform H(z) with a region of convergence Rh, then

y(n) = x(n) * h(n)   →   Y(z) = X(z)·H(z)

The ROC of Y(z) will include the intersection of Rx and Rh.

• Conjugation
If X(z) is the z-transform of x(n), the z-transform of x*(n) is

x^*(n)   →   X^*(z^*)
Summary of Z-transform properties - 3
• Derivative
If X(z) is the z-transform of x(n), the z-transform of nx(n) is

n·x(n)   →   -z · dX(z)/dz
• Initial value theorem
If X(z) is the z-transform of x(n) and x(n) is equal to zero for n<0,
the initial value x(0), may be found from X(z) as follows:

x(0) = lim_{z→∞} X(z)
Summary of basic Z-transforms

Transfer function
As we know, in the time-domain an LTI system can be represented in
two equivalent forms, i.e.

• Impulse response (H(z) is the transfer function):

h(n) = \sum_{i=0}^{\infty} h_i δ(n - i),    y(n) = x(n) * h(n)   →   Y(z) = H(z) X(z)
• Constant-coefficient finite difference equation:


\sum_{k=0}^{N} a_k y(n-k) = \sum_{k=0}^{M} b_k x(n-k)   →   \sum_{k=0}^{N} a_k Y(z) z^{-k} = \sum_{k=0}^{M} b_k X(z) z^{-k}
H(z) = Y(z) / X(z) = \sum_{k=0}^{M} b_k z^{-k} / \sum_{k=0}^{N} a_k z^{-k}    (numerator roots: zeros; denominator roots: poles)

• The transfer function is generally rational with real-valued coefficients
• The coefficients are the same as those of the difference equation
• If the ROC of H(z) includes the unit circle, H(z) evaluated for z = e^{jω} is the frequency response of the system
Partial fraction decomposition
If M ≥ N and all poles are simple:

H(z) = Q(z) + F(z) = \sum_{r=0}^{M-N} B_r z^{-r} + \sum_{k=1}^{N} A_k / (1 - p_k z^{-1})

where A_k = F(z) (1 - p_k z^{-1}) |_{z=p_k} is the residue associated with the k-th pole. Taking the inverse Z-transform (causal case):

x(n) = \sum_{r=0}^{M-N} B_r δ(n - r) + \sum_{k=1}^{N} A_k p_k^n u(n)

If all coefficients of the transfer function are real, all poles and residues
are either real numbers or complex-conjugate pairs. For each complex-conjugate
pair of poles (i.e. p_k = r e^{jω_0} and p_k^* = r e^{-jω_0}) we have that:

A_k / (1 - p_k z^{-1}) + A_k^* / (1 - p_k^* z^{-1})   →(Z^{-1})   2 |A_k| r^n cos(ω_0 n + Arg{A_k}) u(n)
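As a quick worked illustration (not part of the original slides), consider a transfer function with two simple real poles:

X(z) = 1 / [ (1 - 0.5 z^{-1})(1 - 0.25 z^{-1}) ],    ROC: |z| > 0.5

A_1 = (1 - 0.5 z^{-1}) X(z) |_{z=0.5} = 1 / (1 - 0.25/0.5) = 2
A_2 = (1 - 0.25 z^{-1}) X(z) |_{z=0.25} = 1 / (1 - 0.5/0.25) = -1

X(z) = 2 / (1 - 0.5 z^{-1}) - 1 / (1 - 0.25 z^{-1})   →(Z^{-1})   x(n) = [ 2·(0.5)^n - (0.25)^n ] u(n)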
Stability criterion in the Z-domain
• The term F(z) = \sum_{k=1}^{N} A_k / (1 - p_k z^{-1}) of a transfer function associated with a causal impulse response has a ROC given by the intersection of the ROCs of the individual terms, i.e.

ROC = \bigcap_{k=1,…,N} ROC_k = { z : |z| > max_{k=1,…,N} |p_k| }
• Theorem: an LTI causal system is stable if and only if all poles are inside the unit circle.

Proof:
If the system is stable and causal, then \sum_{n=0}^{\infty} |h(n)| = B < ∞. Then, for |z| ≥ 1,

| \sum_{n=0}^{\infty} h(n) z^{-n} | ≤ \sum_{n=0}^{\infty} |h(n)| |z|^{-n} ≤ B < ∞

so the ROC includes the region |z| ≥ 1 and all poles must lie inside the unit circle.
Conversely, if all poles are inside the unit circle, then ROC = { |z| > max_{k=1,…,N} |p_k| } includes the unit circle; evaluating \sum_{n=0}^{\infty} |h(n) z^{-n}| on |z| = 1 gives \sum_{n=0}^{\infty} |h(n)| < ∞, i.e. the system is stable.
Special cases
• If N = 0 (no poles) and a_0 = 1:   H(z) = \sum_{k=0}^{M} b_k z^{-k}   →   all-zero transfer function

All-zero transfer functions represent FIR systems. Since there are no poles (the poles at z = 0 should not be taken into account), all-zero systems are inherently stable.

• If M = 0 (no zeros) and a_0 = 1:   H(z) = 1 / \sum_{k=0}^{N} a_k z^{-k}   →   all-pole transfer function

All-pole transfer functions (the zeros at z = 0 can be neglected) represent IIR systems. A system of this type is stable if and only if all poles are inside the unit circle.
Any causal and stable IIR system (i.e. with h(n) → 0 as n → ∞) can be well approximated by an FIR system with a truncated impulse response:

h_t(n) = h(n) for n ≤ M,   h_t(n) = 0 for n > M

Always stable, but a large M is required to obtain a good approximation: larger computational complexity than the IIR recursive implementation.
Inverse systems
Cascade connection: x(n) → H1(z) → z(n) → H2(z) → y(n), so that

Y(z) = H1(z) H2(z) X(z)
• Def. We say that H2(z) is the inverse of H1(z) if and only if

H1 z H 2 z   1  H 2 z  
1
H1 z 
• For H2(z) to be stable, its poles (namely the zeros of H1(z)) must be
within the unit circle.
• A causal and stable LTI system has an inverse which is also causal
and stable if and only if both poles and zeros lie inside the unit circle.
Systems of this kind are called minimum-phase systems.

Differences in LTI implementations
For a causal LTI system, the processing complexity depends on
algorithm implementation

• Impulse response:  y(n) = \sum_{k=0}^{\infty} h(k) x(n-k)
The number of ADD and MUL operations tends to infinity when the impulse response is infinitely long (the sum over k does not terminate).
• Difference equation:  y(n) = \sum_{k=0}^{M} (b_k/a_0) x(n-k) - \sum_{k=1}^{N} (a_k/a_0) y(n-k)
It requires exactly M+N operations per output sample, plus a preliminary scaling of all the coefficients by a_0.
• Transfer function:  Y(z) = H(z)·X(z)
Requires a Z-transform and an inverse Z-transform; generally not profitable.

Block diagrams and signal flow graphs
• Usually, there are two equivalent ways of representing an LTI system graphically:
– Block diagrams
– Signal flow (or data flow graphs)
• They are very useful in DSP algorithm design & modelling, i.e.
before HW or SW implementation
Example: a first-order recursive section drawn as a block diagram (two-input adder, delay unit z^-1, two-input multiplier with coefficient a) and as the equivalent signal flow graph.
Properties of signal flow graphs: basics
The graph representation is simpler than the block diagram and takes advantage of graph-theory properties
• Nodes with multiple inputs (usually two) and 1 output are
adders
• Nodes with 1 input and multiple outputs are branches
• Each edge of the graph has a direction and a value called transference:
– if the transference is a number → multiplication
– if the transference is z^-1 → delay element
– if the transference is not indicated, it is equal to 1

Properties of signal flow graphs: feedback
• Definition: For a given oriented graph G, a loop can be defined
as a subset of G in which:
– Each node is associated exactly with two edges
– The direction of all edges is the same

• Property: If a given LTI system is an IIR system, then every possible signal flow graph representing that system contains at least one loop → IIR feedback may be unstable

• Note: This condition is not sufficient. There are some feedback systems that are not IIR, e.g. the recursive implementation of the moving average.

Properties of signal flow graphs: transposition
Property: For a given oriented graph modelling an LTI system, the new graph obtained by applying the following two rules:
– invert the direction of each edge without changing the transference values;
– exchange input and output
is perfectly equivalent to the original graph (the H(z) is the same).

Example: a first-order section and its transpose T (input and output exchanged, all edge directions reversed, transferences unchanged).

Important: thanks to the transposition property it is possible to obtain further equivalent structures, which differ from the computational point of view.
Signal flow graph analysis
• Analyzing a signal flow graph means writing the difference equation corresponding to the graph on a node-by-node basis
Steps
1. Associate a variable to each node
2. Write an equation for each node (except for the source)
3. Combine multiple equations until you reach an equation in which
there are only the input and output variables
Example (nodes 1…5: x[n] at node 1, w[n] at node 2, z[n] at node 3, u[n] at node 4 reached from node 3 through a delay z^-1, y[n] at node 5; the edge from node 4 back to node 2 has transference a):

2. w(n) = x(n) + a·u(n)
3. z(n) = w(n)
4. u(n) = z(n-1)
5. y(n) = z(n)

Combining the equations: y(n) = z(n) = w(n) = x(n) + a·u(n) = x(n) + a·y(n-1)   →   difference equation
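A minimal C sketch of the same analysis (added for illustration; the variable names and the coefficient value are arbitrary): each node is computed in a valid order, with the delayed node u(n) = z(n-1) kept as a state variable.

/* First-order recursive section y(n) = x(n) + a*y(n-1),
   computed node by node as in the signal flow graph analysis. */
#include <stdio.h>

int main(void)
{
    const double a = 0.5;                     /* edge transference (example value) */
    double x[8] = {1, 0, 0, 0, 0, 0, 0, 0};   /* impulse input                     */
    double z_prev = 0.0;                      /* state: z(n-1), initial condition  */

    for (int n = 0; n < 8; n++) {
        double u = z_prev;                    /* node 4: u(n) = z(n-1)       */
        double w = x[n] + a * u;              /* node 2: w(n) = x(n) + a*u(n) */
        double z = w;                         /* node 3: z(n) = w(n)         */
        double y = z;                         /* node 5: y(n) = z(n)         */
        printf("y(%d) = %f\n", n, y);
        z_prev = z;                           /* update the delay element    */
    }
    return 0;
}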
Computability of a graph
• Definition: A signal flow graph is referred to as computable if and only
if it is possible to compute all the variables of the graph according to a
specified order starting from given initial conditions
• Property: A signal flow graph is computable if and only if each loop in
the graph contains at least 1 delay element
Example: compare the graph of the previous slide (whose loop contains a delay z^-1) with a similar graph in which the delay is removed, so that the loop is delay-free:

2. w(n) = x(n) + a·z(n)
3. z(n) = w(n)
4. y(n) = z(n)

In the previous graph an order exists: eq. 4, 2, 3, 5 → computable.
In the delay-free graph it is impossible to decide whether to compute node 2 before node 3 or vice versa → not computable.
Why is computability important?
• Graph computability is essential in DSP applications because only
systems that can be modelled with a computable graph can be
implemented as SW algorithms, i.e. as routines in DSPs.

• A large number of DSP algorithms (e.g. IIR filters, FIR filters, digital resonators), not necessarily linear, can be implemented by means of difference equations
Example: a delta-sigma resonator (C code; the signal flow graph contains 3 nested feedback loops, but it is computable):

void delta_sigma_resonator(void)
{
    int n;
    for (n = 1; n <= 100; n++) {
        y[n]  = (v1[n-1] >= 0.0) ? 1.0 : -1.0;   /* 1-bit quantizer */
        x2[n] = x2[n-1] - a21*y[n];
        x1[n] = x1[n-1] + a12*o[n];
        u1[n] = u1[n-1] + s0[n] - y[n];
        v1[n] = v1[n-1] + u1[n] - y[n];
    }
}
LTI structures: direct form I
• The Direct Form I LTI structure results immediately from the numerator and the denominator of the transfer function expressed as polynomials (hp. a_0 = 1):

y(n) = \sum_{k=0}^{M} b_k x(n-k) - \sum_{k=1}^{N} a_k y(n-k)

Y(z) = H_num(z) · H_den(z) · X(z) = [ \sum_{k=0}^{M} b_k z^{-k} ] · [ 1 / (1 + \sum_{k=1}^{N} a_k z^{-k}) ] · X(z)
(Structure: the input x(n) feeds an M-size ladder of delays with multipliers b_0 … b_M; the output y(n) feeds back through an N-size ladder of delays with multipliers a_1 … a_N.)

Cost:
• M+N+1 MUL
• M+N ADD
• M+N delays
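A minimal C sketch of the Direct Form I recursion (illustrative, not from the original slides; it follows the sign convention y(n) = Σ b_k x(n-k) - Σ a_k y(n-k) used above, with a_0 assumed equal to 1):

/* Direct Form I: separate input and output delay lines.
   xd has length M+1 (xd[k] = x(n-k)), yd has length N+1 (yd[k] = y(n-k)). */
void direct_form_1(const double *b, int M, const double *a, int N,
                   double *xd, double *yd,
                   const double *x, double *y, int len)
{
    for (int n = 0; n < len; n++) {
        double acc = 0.0;
        xd[0] = x[n];
        for (int k = 0; k <= M; k++) acc += b[k] * xd[k];   /* feed-forward ladder */
        for (int k = 1; k <= N; k++) acc -= a[k] * yd[k];   /* feedback ladder     */
        y[n] = acc;
        yd[0] = acc;
        for (int k = N; k > 0; k--) yd[k] = yd[k-1];        /* shift output delays */
        for (int k = M; k > 0; k--) xd[k] = xd[k-1];        /* shift input delays  */
    }
}

Per output sample this costs M+N+1 multiplications, M+N additions and uses M+N delay elements, matching the figures above.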
LTI structures: direct form II (canonical)
• Direct Form II can be obtained simply by reversing the numerator transfer
function and the denominator transfer function.
• In this way the delay lines can be shared and the number of memory locations is roughly halved, i.e.:
Y(z) = [ \sum_{k=0}^{M} b_k z^{-k} / (1 + \sum_{k=1}^{N} a_k z^{-k}) ] X(z) = H_num(z) H_den(z) X(z) = H_den(z) H_num(z) X(z)

(Structure: a single delay line shared between the feedback coefficients a_1, a_2, … and the feed-forward coefficients b_0, b_1, b_2, ….)

Cost:
• M+N+1 MUL
• M+N ADD
• max(M,N) delays
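For comparison, a sketch of the canonical Direct Form II with the shared delay line (same assumptions and naming as the previous sketch):

/* Direct Form II: single delay line w[1..K], K = max(M,N), where w[k] = w(n-k). */
void direct_form_2(const double *b, int M, const double *a, int N,
                   double *w,            /* length K+1, index 0 unused */
                   const double *x, double *y, int len)
{
    int K = (M > N) ? M : N;
    for (int n = 0; n < len; n++) {
        double wn = x[n];
        for (int k = 1; k <= N; k++) wn -= a[k] * w[k];   /* w(n) = x(n) - sum a_k w(n-k) */
        double acc = b[0] * wn;
        for (int k = 1; k <= M; k++) acc += b[k] * w[k];  /* y(n) = sum b_k w(n-k)        */
        for (int k = K; k > 1; k--) w[k] = w[k-1];        /* shift the shared delay line  */
        w[1] = wn;
        y[n] = acc;
    }
}

Only max(M,N) state variables are needed, as stated in the cost figures.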
LTI structures: transposed form II
It results from the direct form II by transposing the signal flow graph:
1. Reverse input, output and all edge directions
2. Replace adders with branch nodes and vice versa
3. Re-order the nodes
Cost:
• M+N+1 MUL
• M+N two-input ADD
• max(M,N) delays
Transposed forms are naturally pipeline-oriented, i.e. much more suitable for HW implementations, because long data paths are broken by registers into shorter data paths.
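A sketch of the transposed Direct Form II update for the common biquad case (illustrative; state names are arbitrary): the delay registers hold partial sums, so every adder has only two inputs and the longest combinational path is short, which is what makes the form attractive for hardware.

/* Transposed Direct Form II biquad:
   H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).
   s[0] and s[1] are the delay registers, initialized to zero. */
double tdf2_biquad(double x, double *s,
                   double b0, double b1, double b2, double a1, double a2)
{
    double y = b0 * x + s[0];
    s[0] = b1 * x - a1 * y + s[1];   /* partial sums stored in the registers */
    s[1] = b2 * x - a2 * y;
    return y;
}

Calling tdf2_biquad() once per input sample produces the same output sequence as the direct form II with the same coefficients.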
LTI structures: cascade form (1)
• The LTI cascade structure results from the decomposition of both the numerator and denominator polynomials into a product of first- or second-order polynomials.
• Each polynomial is implemented as a Direct I or Direct II form

Y(z) = [ \sum_{k=0}^{M} b_k z^{-k} / (1 + \sum_{k=1}^{N} a_k z^{-k}) ] X(z)
     = b_0 [ \prod_{k=1}^{M_1} (1 - g_k z^{-1}) \prod_{k=1}^{M_2} (1 - h_k z^{-1})(1 - h_k^* z^{-1}) ] / [ \prod_{k=1}^{N_1} (1 - c_k z^{-1}) \prod_{k=1}^{N_2} (1 - d_k z^{-1})(1 - d_k^* z^{-1}) ] · X(z)

where:
M = M_1 + 2M_2 is the number of H(z) zeros (M_1 real, 2M_2 complex)
N = N_1 + 2N_2 is the number of H(z) poles (N_1 real, 2N_2 complex)

This is equivalent to a cascade of first-order systems (in general with complex coefficients).
LTI structures: cascade form (2)
• In general, in DSP applications it is profitable to have modular
structures (i.e. regular compositions of similar blocks) working on
real (i.e. not complex) data.

• It is better to group pairs of complex-conjugate poles and zeros together, thus turning the structure into a cascade of second-order systems instead of first-order systems, i.e. (if M ≤ N):

Y(z) = b_0 [ \prod_{k=1}^{N_s} H_k(z) ] X(z),    N_s = ⌊(N+1)/2⌋

where each biquad is

H_k(z) = (1 - h_k z^{-1})(1 - h_k^* z^{-1}) / [ (1 - d_k z^{-1})(1 - d_k^* z^{-1}) ]
       = (1 + b̃_{1k} z^{-1} + b̃_{2k} z^{-2}) / (1 + ã_{1k} z^{-1} + ã_{2k} z^{-2})

with
b̃_{1k} = -(z_1 + z_2) = -2 Re{h_k},    b̃_{2k} = z_1 z_2 = |h_k|^2
ã_{1k} = -(p_1 + p_2) = -2 Re{d_k},    ã_{2k} = p_1 p_2 = |d_k|^2
LTI structures: cascade form (3)
• The cascade structure consists of Ns Biquad structures that are
usually implemented using a Direct II form, and with the common gain
b0 distributed among all terms, i.e. b0= b01 b02… b0Ns
H_k(z) = (b_{0k} + b_{1k} z^{-1} + b_{2k} z^{-2}) / (1 + ã_{1k} z^{-1} + ã_{2k} z^{-2}),    with b_{1k} = b̃_{1k} b_{0k},  b_{2k} = b̃_{2k} b_{0k}
• If M = N with N even, N_s = N/2 and all coefficients are different from 0
• If M = N with N odd, or M < N, N_s = ⌈N/2⌉ and some biquad coefficients in some sections are equal to 0

Cost (max):
• 5·N_s MUL
• 4·N_s ADD
• 2·N_s delays
The number of MULs in the worst case is about 25% higher than for the direct forms, but the structure is more robust to finite-precision issues (see later).
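A sketch of an N_s-section cascade built from Direct Form II biquads (illustrative; the coefficient/state layout is an assumption, not the slide's):

/* Cascade of Ns second-order sections in Direct Form II.
   c[k] holds the coefficients and s[k] the two state variables of section k. */
typedef struct { double b0, b1, b2, a1, a2; } biquad_coef;
typedef struct { double w1, w2; } biquad_state;   /* w(n-1), w(n-2) */

double cascade_df2(double x, const biquad_coef *c, biquad_state *s, int Ns)
{
    for (int k = 0; k < Ns; k++) {
        double wn = x - c[k].a1 * s[k].w1 - c[k].a2 * s[k].w2;      /* feedback     */
        x = c[k].b0 * wn + c[k].b1 * s[k].w1 + c[k].b2 * s[k].w2;   /* feed-forward */
        s[k].w2 = s[k].w1;
        s[k].w1 = wn;
    }
    return x;   /* output of the last section */
}

Each section costs 5 MUL and 4 ADD per sample, i.e. 5·N_s MUL and 4·N_s ADD overall, as in the cost figures above.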
LTI structures: parallel form (1)
• The parallel form structure of a LTI is obtained by expanding the
transfer function H(z) in partial fractions
Y(z) = [ \sum_{k=0}^{M} b_k z^{-k} / (1 + \sum_{k=1}^{N} a_k z^{-k}) ] X(z)
     = [ \sum_{k=0}^{N_p} C_k z^{-k} + \sum_{k=1}^{N_1} A_k / (1 - p_k z^{-1}) + \sum_{k=1}^{N_2} ( D_k / (1 - d_k z^{-1}) + D_k^* / (1 - d_k^* z^{-1}) ) ] X(z)
where:
• N = N_1 + 2N_2 is the number of H(z) poles (N_1 real, 2N_2 complex)
• N_p = M - N, but generally it is equal to 0
• C_k results from the quotient between numerator and denominator
• A_k, D_k and D_k^* are the residues of the k-th pole, i.e. in the case of simple (first-order) poles:

A_k = lim_{z→p_k} (1 - p_k z^{-1}) H(z)
LTI structures: parallel form - 2
• Similarly to the cascade case, the terms associated with a pair of conjugate poles are usually grouped together in order to obtain a single real second-order expression:
D_k / (1 - d_k z^{-1}) + D_k^* / (1 - d_k^* z^{-1}) = (e_{0k} + e_{1k} z^{-1}) / (1 + a_{1k} z^{-1} + a_{2k} z^{-2})

with
e_{0k} = A_1 + A_2 = 2 Re{D_k},    e_{1k} = -(A_1 p_2 + A_2 p_1) = -2 Re{D_k d_k^*}
a_{1k} = -(p_1 + p_2) = -2 Re{d_k},    a_{2k} = p_1 p_2 = |d_k|^2

Generally M ≤ N and N_s = ⌊(N+1)/2⌋.

Cost (max):
• 4·N_s + 1 MUL
• 4·N_s + 1 ADD
• 2·N_s delays
Note: parallel structures are seldom used because they require a final N_s-input adder and they are not robust (see later).
LTI structures for FIR systems
• For FIR systems:
y(n) = \sum_{k=0}^{M} b_k x(n-k)   →   Y(z) = [ \sum_{k=0}^{M} b_k z^{-k} ] X(z)

– the direct I and II forms coincide (tapped delay line);
– the cascade form is possible but rarely used;
– the parallel form does not exist (no poles);
– the transposed form has the same computational complexity as the direct form, but it is much more profitable for hardware implementations.


(Direct form: tapped delay line with coefficients b_0 … b_M applied to the delayed inputs; transposed form: the same coefficients in reverse order, with delay registers between the partial sums.)
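A short sketch of the FIR direct form (tapped delay line) and of its transposed counterpart (illustrative; both compute the same y(n) = Σ b_k x(n-k), only the order of the operations differs):

/* FIR, direct form: xd[0..M] is the tapped delay line, xd[k] = x(n-k). */
double fir_direct(double x, double *xd, const double *b, int M)
{
    double acc = 0.0;
    xd[0] = x;
    for (int k = 0; k <= M; k++) acc += b[k] * xd[k];
    for (int k = M; k > 0; k--) xd[k] = xd[k-1];      /* shift the delay line */
    return acc;
}

/* FIR, transposed form (assumes M >= 1): s[0..M-1] hold the partial sums;
   the input is broadcast to all multipliers and no input shifting is needed. */
double fir_transposed(double x, double *s, const double *b, int M)
{
    double y = b[0] * x + s[0];
    for (int k = 0; k < M - 1; k++) s[k] = b[k+1] * x + s[k+1];
    s[M-1] = b[M] * x;
    return y;
}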
Effects of finite numerical precision
• As is well known, finite numerical precision introduces errors in elementary mathematical operations, i.e. additions and multiplications

• Such numerical issues tend to propagate when long sequences of operations are executed, as in filters or transforms

• Finite precision limitations can be divided into 3 groups:


– quantization of coefficients
– quantization (rounding) of intermediate results
– overflows

• Finite precision limitations have to be analyzed separately to


understand how they affect a DSP algorithm

Quantization of coefficients: overview
• Coefficients in the transfer function different from the expected ones → transfer function of the system different from the specified one → risk of instability in IIR systems
• In processors with large data paths (e.g. 32 or 64 bits) and in the
presence of floating point units the quantization effects are less relevant
Example:
Transfer function with complex poles at 0.4965 ± j0.8663.
After coefficient truncation to 3 bits after the binary point: transfer function with two poles at 1.08 → unstable!
Quantization of coefficients - 1
• Goal: determine how quantization of coefficients affects
the position of zeros and poles. We will focus on poles.
The case of zeros is dual.
(Direct form II structure with feedback coefficients a_k and feed-forward coefficients b_k.)

H(z) = \sum_{k=0}^{M} b_k z^{-k} / (1 + \sum_{k=1}^{N} a_k z^{-k})

• By using the Taylor series expansion of the denominator of the transfer function around each pole p_i it follows that:

Δp_i = \sum_{k=1}^{N} (∂p_i / ∂a_k) · Δa_k

where Δp_i is the pole position perturbation, Δa_k is the coefficient quantization error and ∂p_i/∂a_k is the sensitivity of p_i to the coefficient a_k.
Quantization of coefficients - 2
• In general:

∂p_i / ∂a_k = - p_i^{N-k} / \prod_{l=1, l≠i}^{N} (p_i - p_l)

• where |pi-pl| represents the distance between poles in


the complex plane

• When very selective filters are considered, the poles are very close to each other (clusters of poles) → high sensitivity

• A similar formula holds also for zeros, but it is less


critical, i.e. in the worst-case it modifies the expected
transfer function of the filter

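A compact derivation of the sensitivity expression above (standard implicit-differentiation argument, added for clarity): write the denominator in terms of its roots,

z^N + a_1 z^{N-1} + … + a_N = \prod_{l=1}^{N} (z - p_l) ≡ P(z)

Since P(p_i) = 0 for every set of coefficients, differentiating with respect to a_k gives

(∂P/∂z)|_{z=p_i} · (∂p_i/∂a_k) + (∂P/∂a_k)|_{z=p_i} = 0   ⇒   ∂p_i/∂a_k = - p_i^{N-k} / \prod_{l≠i} (p_i - p_l)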
Quantization of coefficients: conclusions
• Transposed and direct forms: very sensitive to the coefficients → the risk of instability is high for high-order systems

• Special cases: first- and second-order filters are robust

• Parallel form: the position of each pair of poles depends only on the coefficients of the corresponding section, but the position of the zeros is difficult to control

• Cascade form: each section is independent of the others. Finite precision affects only the zeros and poles of each individual section. Performance can be kept under control.
Rounding noise
• Rounding noise can be modeled as a white noise and these
sources are connected to every quantization point (noise
analysis)
• Noise signal is processed in the system like any other
signal, i.e. it is colored and typically amplified at the output
• System may introduce high gain for the quantization noise
depending on the transfer function and algorithm structure
Example (signal flow graphs): first-order recursive filter, ideal case with multiplier a vs. real case in which the multiplication result is quantized to B bits (Q_B).
Rounding noise analysis
• The quantization effects of a software implementation can be
analyzed by using previously defined noise models for either
fixed or floating point (usually negligible)

• Normally, quantization occurs either when an intermediate result


is stored or when a MAC operation is performed
1. First step is to define the locations where the quantization is
performed (quantization point)
2. A white noise source is added at every quantization point

Noise analysis example
(First-order recursive filter y(n) = x(n) + a·y(n-1) with a rounding-noise source e(n) injected at the quantization point; the output is z(n) = y(n) + f(n), where y(n) is the part due to x(n) and f(n) the part due to e(n).)
Hypotheses
1. e(n) is a white random process and each r.v. e[n] is uniformly
distributed in [-Δ/2, Δ/2] (conv. rounding)
2. e(n) and x(n) are uncorrelated

Given that z(n) = y(n) + f(n) (linear system), it can be shown that, compared to the ideal case, the SNR at the output is finite:

SNR = σ_y^2 / σ_f^2,    where    σ_f^2 = (Δ^2/12) · 1/(1 - a^2)

The noise at the output is colored; the situation is critical when |a| → 1.
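The value of σ_f^2 follows directly from the white-noise model (short derivation added for clarity): the source e(n) is filtered by the same system, whose impulse response is h(n) = a^n u(n), so

σ_f^2 = σ_e^2 \sum_{n=0}^{\infty} h^2(n) = (Δ^2/12) \sum_{n=0}^{\infty} a^{2n} = (Δ^2/12) · 1/(1 - a^2)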
Noise analysis: a more involved example
(Three signal flow graphs: the ideal direct form II structure; the linear noise model with one white-noise source for each MUL; and the equivalent model in which, thanks to the linearity of the graph, the noise contributions are grouped into two sources, e_a(n) injected at the feedback adder and e_b(n) injected at the output adder.)
e_a(n) = \sum_{k=1}^{N} e_{ak}(n),    e_b(n) = \sum_{k=0}^{M} e_{bk}(n)

z(n) = y(n) + f_a(n) + f_b(n)    (due to x(n), e_a(n) and e_b(n) respectively)

σ_{fa}^2 = N · (Δ^2/12) · \sum_{n=0}^{\infty} h^2(n)        (e_a is filtered by the whole system)
σ_{fb}^2 = (M+1) · (Δ^2/12)                                 (e_b is added directly at the output)

σ_f^2 = σ_{fa}^2 + σ_{fb}^2   →   the SNR deteriorates linearly as the filter order grows.
Noise analysis: the general case
• The numerical noise can also be analyzed through computer
simulations
• The input signal can be random or a sine wave
• The DSP algorithm is executed in two parallel branches:
– simulation: computation with the same word length and quantized
coefficients as the real implementation
– "precise” realization: computation with quantized coefficients and
with the highest possible precision (utilize double precision numbers)
• The difference between results is the numerical noise

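A minimal C sketch of this measurement (illustrative: the quantization step q, the first-order filter and the function names are assumptions, not part of the slides): the same input is run through the quantized branch and the high-precision branch, and the noise power is estimated from the difference.

#include <math.h>

/* Round a value to the grid of the fixed-point implementation (step q). */
static double quantize(double v, double q) { return q * floor(v / q + 0.5); }

/* Estimate the numerical-noise power of the filter y(n) = x(n) + a*y(n-1). */
double noise_power(const double *x, int len, double a, double q)
{
    double y_q = 0.0, y_ref = 0.0, p = 0.0;
    double a_q = quantize(a, q);                /* quantized coefficient in both branches */
    for (int n = 0; n < len; n++) {
        y_q   = quantize(x[n] + a_q * y_q, q);  /* simulation branch: quantized results   */
        y_ref = x[n] + a_q * y_ref;             /* "precise" branch: double precision     */
        double d = y_q - y_ref;
        p += d * d;
    }
    return p / len;                             /* average noise power                    */
}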
Reducing numerical noise
• In order to minimize the numerical problems:
– the overflow probability has to be minimized
– the dynamics of the system has to be maximized

• Scaling of the internal signals:
– A scaling factor C is inserted so that the signal levels inside the structure reach the desired value
– A factor 1/C is added to the output branches to compensate the overall effect
– The power of the noise is multiplied by 1/C^2. If C < 1 the overflow is avoided, but the quantization noise increases → worse SNR

(Diagram: the scaling factor C is placed before the node k at risk of overflow, and the compensating factor 1/C on the branch leaving node k.)
Scaling example
(Ideal case: first-order recursive filter. Scaled case: the factor C is inserted at the input of node k, at risk of overflow, the factor 1/C at the output, and the noise sources e1[n] and e2[n] model the quantizations; the output is z(n) = y(n) + f(n).)
If we assume that the noise sources are merged together at node k, we have that:

e(n) = e1(n) + e2(n),    z(n) = y(n) + f(n)

σ_e^2 = 2 · (Δ^2/12),    σ_f^2 = 2 · (Δ^2/12) · (1/C^2) · 1/(1 - a^2)

Therefore, the noise power at the output is increased by 1/C^2, whereas the power of the useful output signal is unchanged. In general:

SNR' = C^2 · SNR ≤ SNR    (for C ≤ 1)
Choice of the scaling factor
• Despite the possible decrease in SNR, scaling is essential in fixed-point algorithms

• Two questions are left:


1. Where to insert the scaling factor C
2. How large C must be

1. Summarizing, the scaling has to be used:


• before a major quantization is performed, to avoid an excessive
loss of resolution (C>1)
• before a node where an overflow may occur (C<1). If 2's-complement notation is used, only the final stage of a sequence of additions must be scaled.

2. The value of C depends on the amplitude of the internal signals

Choice of the scaling factor
• Generally, norms are used
– It is easy to calculate the most probable maximum sample values for
each node for different type of input signals

– In order to calculate the scaling factor for node k we need the transfer
function Hk(z) (or the corresponding impulse response hk(n)) from the
input of the system to node k

Safe scaling: if |x(n)| < 1,

C = 1 / \sum_{n=0}^{\infty} |h_k(n)|

+ the overflow probability is very small after scaling
− limits the dynamics
L_p norms:

|| H_k(z) ||_p = [ (1/2π) \int_{-π}^{π} | H_k(e^{jω}) |^p dω ]^{1/p},    C = 1 / || H_k(z) ||_p

+ better SNR
− overflow may occur
Usually p = 2 or p = ∞.
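A sketch of the safe-scaling computation for a single biquad path (illustrative; it assumes the impulse response decays fast enough that truncating the sum after LEN samples is acceptable): feed a unit impulse through the transfer function from the input to node k and accumulate |h_k(n)|.

#include <math.h>

/* Safe scaling factor C = 1 / sum_n |h_k(n)|, estimated over LEN samples
   of the impulse response of a direct form II biquad. */
double safe_scaling(double b0, double b1, double b2, double a1, double a2, int LEN)
{
    double w1 = 0.0, w2 = 0.0, sum = 0.0;
    for (int n = 0; n < LEN; n++) {
        double x  = (n == 0) ? 1.0 : 0.0;           /* unit impulse          */
        double wn = x - a1 * w1 - a2 * w2;          /* biquad recursion      */
        double h  = b0 * wn + b1 * w1 + b2 * w2;    /* h_k(n) at the node    */
        sum += fabs(h);
        w2 = w1;
        w1 = wn;
    }
    return 1.0 / sum;
}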
Summary: a practical example
• Implement the following cascade IIR filter as an example
• The processor has a 16-bit memory word width, the multiplier produces full-precision results and the accumulator is 40 bits long

Coefficients of the filter before quantization and scaling

Summary: a practical example
• The difference equation for the direct form II structure (e.g. for the 1st stage; the expressions for the other sections are analogous) is:
w1(n) = a11·w1(n-1) + a12·w1(n-2) + x(n)
y(n) = b11·w1(n) + b12·w1(n-1) + b13·w1(n-2)

• In the current example, quantization points are w1(n), w2(n), w3(n), and
possibly the output o(n) because data have to be stored in memory
whose width is 16 bits against 40 bits of the accumulator.

• Let us compare the implementation without and with scaling to realize


the benefits of scaling

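Before looking at the assembly, here is a hedged C sketch of the same idea (the names, the Q15 format and the 64-bit accumulator standing in for the 40-bit one are assumptions for illustration): products are accumulated at full precision and the result is rounded back to 16 bits only where w1(n) and the stage output are stored in memory.

#include <stdint.h>

/* One second-order stage with the plus-feedback convention of the slide:
   w(n) = x(n) + a1*w(n-1) + a2*w(n-2),  y(n) = b1*w(n) + b2*w(n-1) + b3*w(n-2).
   Signals and coefficients are Q15 integers. Overflow handling is omitted:
   keeping the levels in range is exactly what the scaling step below is for. */
static int16_t q15_round(int64_t acc) { return (int16_t)((acc + (1 << 14)) >> 15); }

int16_t stage_q15(int16_t x, int16_t *w,          /* w[0] = w(n-1), w[1] = w(n-2) */
                  int16_t a1, int16_t a2,
                  int16_t b1, int16_t b2, int16_t b3)
{
    int64_t acc;

    acc  = (int64_t)x << 15;                      /* align x(n) to the Q30 product scale */
    acc += (int64_t)a1 * w[0] + (int64_t)a2 * w[1];
    int16_t wn = q15_round(acc);                  /* quantization point: 16-bit w(n)     */

    acc = (int64_t)b1 * wn + (int64_t)b2 * w[0] + (int64_t)b3 * w[1];
    int16_t y = q15_round(acc);                   /* quantization point: 16-bit output   */

    w[1] = w[0];
    w[0] = wn;
    return y;
}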
Solution 1: assembly implementation without scaling

.bss W,3*3      ;space for w(n)
.bss X,1
.data
COEF:
.word 266 ;S*2^15
.word -13682 ;a12*2^15
.word 5435 ;(a11-1)*2^15
.word 27601 ;(b22-1)*2^15
.word -26688 ;a22*2^15
...
• In the .bss section, some space for both input data and
critical (quantization) points is allocated
• In the .data section, the coefficients of the filter (mapped into
integers) are stored
• In the .text section (next slide), the body of the filter
implementation is stored
Solution 1: assembly implementation without scaling

.text
RSBX OVM ;overflow
SSBX SXM ;sign ext
SSBX FRCT ;autoshift after multiplier
...
STM #W+8,AR3
STM #COEF,AR4
STM #X,AR1
PORTR 0H,*AR1 ; read input
MPY *AR4+,*AR1,A ; A <= S*x(n) (x(n)=16-bit)
* feedback path
MAC *AR4+,*AR3-,A ; x(n)+a12*w1(n-2)
MAC *AR4+,*AR3,A ; x(n)+a12*w1(n-2)+(a11-1)*w1(n-1)
ADD *AR3-,16,A ; x(n)+a12*w1(n-2)+a11*w1(n-1)
STH A,*AR3+ ; w1(n)=x(n)+a12*w1(n-2)+a11*w1(n-1)
* forward path
MAC *AR4+,*AR3,A ; w1(n)+(b12-1)*w1(n-1)
ADD *AR3+,16,A ; w1(n)+b12*w1(n-1)
ADD *AR3-,16,A ; y1=w1(n)+b12*w1(n-1)+w1(n-2)
DELAY *AR3- ; w1(n-2) <= w1(n-1)
DELAY *AR3- ; w1(n-1) <= w1(n)
... ; AR3->w2(n-2), AR4->a22, A=y1(n)

Solution 1: results

The SNR with the given input signal is 57.4 dB.
Solution 2: implementation with scaling
• In order to maximize the SNR, the signal levels at the quantization points have to be maximized
• Inside each second-order section of the filter, for a given norm, the scaling coefficients C1, C2 and C3 are determined so that the chosen value of the norm is reached
• For example, assume that L∞ = 1 is the target value

First stage:

H1(z) = S / (1 - a11 z^{-1} - a12 z^{-2})

L∞{ g1 · H1(z) } = 1   →   g1 = 1 / || H1(z) ||_∞
Solution 2: implementation with scaling
Second stage:

H2(z) = S · (b11 + b12 z^{-1} + b13 z^{-2}) / [ (1 - a11 z^{-1} - a12 z^{-2}) (1 - a21 z^{-1} - a22 z^{-2}) ]

L∞{ g2 · H2(z) } = 1   →   g2 = 1 / || H2(z) ||_∞
Solution 2: implementation with scaling
Third stage (similarly to the previous conditions):

H3(z) = S · (b11 + b12 z^{-1} + b13 z^{-2}) (b21 + b22 z^{-1} + b23 z^{-2}) / [ (1 - a11 z^{-1} - a12 z^{-2}) (1 - a21 z^{-1} - a22 z^{-2}) (1 - a31 z^{-1} - a32 z^{-2}) ]

L∞{ g3 · H3(z) } = 1   →   g3 = 1 / || H3(z) ||_∞
Solution 2: implementation with scaling

S'   = C1·S   = g1·S
b'1i = C2·b1i = (g2/g1)·b1i,   i = 1, 2, 3
b'2i = C3·b2i = (g3/g2)·b2i,   i = 1, 2, 3
b'3i = b3i / (C1·C2·C3) = (1/g3)·b3i,   i = 1, 2, 3
Solution 2: results

By replacing the new coefficient values in the assembly program, the output SNR with the given input signal becomes 82.0 dB, i.e. about 25 dB larger than without scaling.
