Chapter 3 and 4: Performance Surface and Search Methods

CHAPTER THREE
Ch3: Eigenanalysis and Performance Surface
Ch4: Search Methods
Adaptive Filters: Theory and Applications, Behrouz Farhang-Boroujeny
EIGENVALUES AND EIGENVECTORS

 Let R be an N-by-N correlation matrix.
 A nonzero N-by-1 vector q is said to be an eigenvector of R
if it satisfies the equation
Rq = \lambda q                                  (4.2)
for some scalar constant λ. The scalar λ is called the
eigenvalue of R associated with the eigenvector q.
 We note that if q is an eigenvector of R, then for any
nonzero scalar a, aq is also an eigenvector of R,
corresponding to the same eigenvalue λ.
 To find the eigenvalues and eigenvectors of R, we note that
Eq. (4.2) may be rearranged as
(R - \lambda I)q = 0,
which has a nonzero solution q only when det(R - \lambda I) = 0
(the characteristic equation of R).


Example: Find the eigenvalues and eigenvectors of R = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}.
Solution:
\det(R - \lambda I) = 0 \Rightarrow \det\left(\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right) = 0
\det\begin{bmatrix} 2-\lambda & 0 \\ 0 & 3-\lambda \end{bmatrix} = 0 \Rightarrow (2-\lambda)(3-\lambda) = 0
\lambda_1 = 2 \text{ and } \lambda_2 = 3
For λ = 2:
(R - \lambda I)q = 0 \Rightarrow \begin{bmatrix} 2-2 & 0 \\ 0 & 3-2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\begin{bmatrix} 0 \cdot x + 0 \cdot y \\ 0 \cdot x + 1 \cdot y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \Rightarrow y = 0
q_1 = \begin{bmatrix} x \\ 0 \end{bmatrix} where x ≠ 0; for simplicity, take q_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
In the same manner, for λ = 3:
q_2 = \begin{bmatrix} 0 \\ y \end{bmatrix} where y ≠ 0; for simplicity, take q_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.

Example 1: Find the eigenvalues and the eigenvectors of the matrix A = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix}.
Solution:
A - \lambda I = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1-\lambda & 3 \\ 4 & 2-\lambda \end{bmatrix}
In general, to form A - λI from A one simply subtracts λ
from the diagonal entries.
∴ The characteristic equation is
\det(A - \lambda I) = \begin{vmatrix} 1-\lambda & 3 \\ 4 & 2-\lambda \end{vmatrix} = (1-\lambda)(2-\lambda) - 3 \cdot 4 = 0
∴ \lambda^2 - 3\lambda + 2 - 12 = 0
\lambda^2 - 3\lambda - 10 = 0
So the eigenvalues are
\lambda_1 = 5 \text{ and } \lambda_2 = -2

For λ1 = 5:
Let V = \begin{bmatrix} x \\ y \end{bmatrix} be an eigenvector of the matrix A
corresponding to λ1 = 5.
∵ (A - \lambda I)V = 0
∴ \left(\begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix} - 5 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\begin{bmatrix} 1-5 & 3 \\ 4 & 2-5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\begin{bmatrix} -4x + 3y \\ 4x - 3y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
So
-4x + 3y = 0
4x - 3y = 0

Since the second equation is the negative of the
first, any solution to the first equation is also a
solution to the second. So it suffices to solve the
first equation:
x = (3/4) y

  y | 1   | 4 | 8 | ...
  x | 3/4 | 3 | 6 | ...

So any multiple of the vector V_1 = \begin{bmatrix} 3 \\ 4 \end{bmatrix} is an eigenvector
for λ1 = 5.

For λ2 = -2:
Let V = \begin{bmatrix} x \\ y \end{bmatrix} be an eigenvector of the matrix A
corresponding to λ2 = -2.
∵ (A - \lambda I)V = 0
∴ \left(\begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix} + 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\begin{bmatrix} 1+2 & 3 \\ 4 & 2+2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\begin{bmatrix} 3x + 3y \\ 4x + 4y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
So
3x + 3y = 0
4x + 4y = 0


 Since the second equation is a multiple of the
first, any solution to the first equation is also a
solution to the second. So it suffices to solve the
first equation:
x = -y

  y | -1 | 1  | 2  | ...
  x | 1  | -1 | -2 | ...

Any multiple of the vector V_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} is an eigenvector for λ2 = -2.
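As a quick numerical check of the two worked examples above, the following sketch (added here, not part of the original notes) uses NumPy; note that numpy.linalg.eig normalizes its eigenvectors to unit length, so they may differ from the hand-derived ones by a scale factor.

```python
import numpy as np

# First example: R is diagonal, so its eigenvalues are the diagonal entries.
R = np.array([[2.0, 0.0],
              [0.0, 3.0]])
vals_R, vecs_R = np.linalg.eig(R)
print("eigenvalues of R:", vals_R)            # [2. 3.]

# Example 1: A = [[1, 3], [4, 2]].
A = np.array([[1.0, 3.0],
              [4.0, 2.0]])
vals_A, vecs_A = np.linalg.eig(A)
print("eigenvalues of A:", vals_A)            # [ 5. -2.]

# Each returned eigenvector is a scaled version of the hand-derived ones,
# e.g. [3, 4]^T for lambda = 5 and [1, -1]^T for lambda = -2.
for lam, v in zip(vals_A, vecs_A.T):
    print(f"lambda = {lam:+.1f}:  A v - lambda v =", A @ v - lam * v)   # ~[0, 0]
```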

PROPERTIES OF
EIGENVALUES AND EIGENVECTORS

 Some of the properties derived here are directly related
to the fact that the correlation matrix R is Hermitian
and nonnegative definite.
 A matrix A, in general, is said to be Hermitian (or
self-adjoint) if A = A^H, i.e., A equals its own conjugate transpose.
 The N-by-N Hermitian matrix A is said to be
nonnegative definite, or positive semidefinite, if
v^H A v ≥ 0 for every N-by-1 vector v.
 The fact that A is Hermitian implies that v^H A v is
real-valued.
 In practice, the correlation matrix R is almost always
positive definite.


1. The eigenvalues of the correlation matrix R are
all real and nonnegative.
2. If q_i and q_j are two eigenvectors of the
correlation matrix R that correspond
to two of its distinct eigenvalues, then
q_i^H q_j = 0.
In other words, eigenvectors associated with the
distinct eigenvalues of the correlation
matrix R are mutually orthogonal.
3. Assume the eigenvectors q_0, q_1, . . . , q_{N-1} are all
normalized to have a length of unity. The
N-by-N matrix Q = [q_0 q_1 … q_{N-1}] is then a unitary
matrix, i.e., Q^H Q = I. This implies that the matrices
Q and Q^H are the inverses of each other, i.e., Q^{-1} = Q^H.

4. For any N-by-N correlation matrix R, one can
always find a set of N mutually orthogonal
eigenvectors. Such a set may be used as a basis
to express any vector in the N-dimensional
space of complex vectors.
5. Unitary Similarity Transformation. The
correlation matrix R can always be decomposed
as
R = Q \Lambda Q^H = \sum_{i=0}^{N-1} \lambda_i q_i q_i^H,
where \Lambda = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{N-1}).
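The sketch below illustrates properties 3 and 5 (and the trace property stated next) numerically; the 2-by-2 correlation matrix is an arbitrary illustrative choice, not one from the slides.

```python
import numpy as np

# An illustrative symmetric, positive-definite correlation matrix.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# eigh is the appropriate routine for Hermitian/symmetric matrices:
# it returns real eigenvalues and an orthonormal set of eigenvectors.
lam, Q = np.linalg.eigh(R)
Lam = np.diag(lam)

print(np.allclose(Q.T @ Q, np.eye(2)))     # property 3: Q is unitary (orthogonal here)
print(np.allclose(Q @ Lam @ Q.T, R))       # property 5: R = Q Lambda Q^H
print(np.isclose(lam.sum(), np.trace(R)))  # property 6 (next slide): sum of eigenvalues = tr[R]
```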


6. Let λ_0, λ_1, . . . , λ_{N−1} be the eigenvalues of the
correlation matrix R. Then
\sum_{i=0}^{N-1} \lambda_i = \mathrm{tr}[R],
where tr[R] denotes the trace of R and is defined as the
sum of the diagonal elements of R.
7. Minimax Theorem. With the eigenvalues arranged in descending order,
λ_0 ≥ λ_1 ≥ · · · ≥ λ_{N−1}, the standard (Courant–Fischer) statement is
\lambda_i = \min_{\dim \mathcal{S} = N-i} \; \max_{q \in \mathcal{S},\, q \neq 0} \frac{q^H R q}{q^H q},
for i = 1, 2, . . . , N - 1.

8. The eigenvalues of the correlation matrix R of a
discrete-time stationary stochastic process {x(n)}
are bounded by the minimum and maximum
values of the power spectral density, Φ_xx(e^{jω}),
of the process:
\min_{\omega} \Phi_{xx}(e^{j\omega}) \le \lambda_i \le \max_{\omega} \Phi_{xx}(e^{j\omega}), \quad i = 0, 1, \ldots, N-1.
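To illustrate property 8 numerically, the sketch below (an example added here, not taken from the slides) builds the correlation matrix of a first-order autoregressive process x(n) = a x(n-1) + w(n), whose autocorrelation is r(k) = σ_x² a^{|k|} with σ_x² = σ_w²/(1 - a²) and whose power spectral density is Φ_xx(e^{jω}) = σ_w²/|1 - a e^{-jω}|², and checks that every eigenvalue lies between the PSD extremes.

```python
import numpy as np

a, sigma_w2, N = 0.8, 1.0, 8              # AR(1) coefficient, noise variance, matrix size
sigma_x2 = sigma_w2 / (1 - a**2)          # variance of the AR(1) process

# N-by-N Toeplitz correlation matrix: R[i, j] = sigma_x^2 * a^|i - j|
idx = np.arange(N)
R = sigma_x2 * a ** np.abs(idx[:, None] - idx[None, :])
eigvals = np.linalg.eigvalsh(R)

# Power spectral density of the AR(1) process on a dense frequency grid.
w = np.linspace(0, np.pi, 4096)
psd = sigma_w2 / np.abs(1 - a * np.exp(-1j * w))**2

print("eigenvalues :", np.round(np.sort(eigvals), 3))
print("PSD min/max :", psd.min(), psd.max())
print("bounded?    :", psd.min() <= eigvals.min() and eigvals.max() <= psd.max())
```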


9. Karhunen–Loève expansion: a zero-mean random vector x may be
expanded as x = \sum_{i=0}^{N-1} \kappa_i q_i, where the coefficients
\kappa_i = q_i^H x are uncorrelated and E[|\kappa_i|^2] = \lambda_i.

THE CANONICAL FORM OF
THE ERROR-PERFORMANCE SURFACE

 We recall from the last lecture that the performance
function (the mean-squared error, MSE) of a
transversal Wiener filter with a real-valued input
sequence x(n) and a desired output sequence d(n)
is
\xi = E[e^2(n)] = E[d^2(n)] - 2 w^T p + w^T R w,
where R = E[x(n) x^T(n)] is the correlation matrix of the tap-input
vector and p = E[d(n) x(n)] is the cross-correlation vector.
 Also, we recall that the optimum value of the
Wiener filter tap-weight vector is obtained from
the Wiener-Hopf equation
R w_o = p.


 The performance function ξ may be
rearranged as follows:
\xi = \xi_{\min} + (w - w_o)^T R (w - w_o),
where \xi_{\min} = E[d^2(n)] - w_o^T p is the minimum MSE, attained at w = w_o.
Defining the translated coordinate vector v = w - w_o, this becomes
\xi = \xi_{\min} + v^T R v.

 We use the eigen-decomposition to express the
correlation matrix R of the tap-input vector in
terms of its eigenvalues and associated
eigenvectors (see Appendix E in the Haykin textbook):
R = Q \Lambda Q^T.
Substituting this into \xi = \xi_{\min} + v^T R v and defining the rotated
coordinate vector v' = Q^T v gives
\xi = \xi_{\min} + v'^T \Lambda v'.


 This new formulation of the mean-square error
contains no cross-product terms, as shown by
\xi = \xi_{\min} + \sum_{k=0}^{N-1} \lambda_k v_k'^2,
where v_k' is the kth component of the vector v'.
This is the canonical form of the error-performance surface: its
principal axes are aligned with the eigenvectors of R, and the
curvature along the kth axis is determined by λ_k.
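As a check on the algebra above, the short sketch below evaluates the MSE both from the direct expansion and from its canonical form for an arbitrary two-tap example (the numbers are illustrative, not the textbook's).

```python
import numpy as np

# Illustrative second-order statistics (not the textbook example).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])
Ed2 = 2.0                          # E[d^2(n)]

w_o = np.linalg.solve(R, p)        # Wiener-Hopf: R w_o = p
xi_min = Ed2 - w_o @ p             # minimum MSE

lam, Q = np.linalg.eigh(R)         # R = Q diag(lam) Q^T

w = np.array([0.2, -0.4])          # an arbitrary tap-weight vector
xi_direct = Ed2 - 2 * w @ p + w @ R @ w

v_prime = Q.T @ (w - w_o)          # rotated, translated coordinates
xi_canonical = xi_min + np.sum(lam * v_prime**2)

print(xi_direct, xi_canonical)     # the two values agree
```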

 Example 4.3: Consider the case where a two-tap
transversal Wiener filter is characterized by the
following parameters (given in the textbook in terms of a
parameter α):
We want to explore the performance surface of this
filter for values of α ranging from 0 to 1.
The performance function of the filter is obtained
by substituting the above parameters into Eq. (4.81).
This gives ξ as a function of the two tap weights and the parameter α.


 Solving the Wiener–Hopf equation, w_o = R^{-1} p, we obtain the
optimum tap weights of the filter.

 To convert this to its canonical form, we should
first find the eigenvalues and eigenvectors of R.
To find the eigenvalues of R, we solve the
characteristic equation
\det(R - \lambda I) = 0.


SEARCH METHODS
1. METHOD OF STEEPEST DESCENT


 We recall from Chapter 2 that the optimum tap-
weight vector w_o is the one that minimizes the
performance function
\xi = E[e^2(n)],
where e(n) = d(n) - y(n) is the estimation error of the
Wiener filter. Also, we recall that the performance
function ξ can be expanded as
\xi = E[d^2(n)] - 2 w^T p + w^T R w.
 Here, we assume that R and p are available, so that w_o could, in
principle, be computed directly as w_o = R^{-1} p.
 However, this direct approach requires difficult arithmetic
circuits (it involves the computationally challenging
inversion of the matrix R) and is not suitable for many
applications. Therefore, in the next section we
introduce another approach to find the tap-weight
vector w.

ITERATIVE SEARCH METHOD

 In this chapter we present a set of algorithms that
iteratively search for the minimum of the cost function.
 They do this based (at least) on the gradient of the cost
function, so they are often called deterministic gradient
algorithms.
 In order for the cost function to depend only on the filter w,
the statistics Rx and rxd must be given.
 In this way, these algorithms solve the Wiener-Hopf
equation iteratively, most of them without requiring the
(computationally challenging) inversion of the matrix Rx.
 However, all the information from the environment is
captured in the second-order statistics and these
algorithms do not have a learning mechanism for adapting
to changes in the environment. In the next chapter we will
see how adaptive filters solve this issue.


STEEPEST DESCENT ALGORITHM

 The method of steepest descent is a general scheme
that uses the following steps to search for the
minimum point of any convex function of a set of
parameters:
1. Start with an initial guess of the parameters whose
optimum values are to be found for minimizing the
function.
2. Find the gradient of the function with respect to
these parameters at the present point.
3. Update the parameters by taking a step in the
opposite direction of the gradient vector obtained in
Step 2. This corresponds to a step in the direction of
steepest descent in the cost function at the present
point. Furthermore, the size of the step taken is
chosen proportional to the size of the gradient
vector.
4. Repeat Steps 2 and 3 until no further significant
change is observed in the parameters.

 To implement this procedure in the case of the
transversal filter shown in Figure 5.1, we recall
from Chapter 2 that the gradient of the performance function with
respect to the tap-weight vector w is
\nabla \xi = 2(R w - p).
Using this gradient in Step 3 gives the steepest-descent recursion
w(k+1) = w(k) - \mu \nabla \xi(k) = w(k) + 2\mu \big(p - R w(k)\big),
where μ is the step-size parameter and k is the iteration index.
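The following sketch (added here, not from the slides) implements the four steps above for the quadratic MSE cost, using the recursion just written, w(k+1) = w(k) - μ∇ξ(k); the statistics R and p, the step size, and the stopping tolerance are illustrative choices.

```python
import numpy as np

def steepest_descent(R, p, mu, w0, tol=1e-10, max_iter=10_000):
    """Minimize xi(w) = E[d^2] - 2 w^T p + w^T R w by steepest descent."""
    w = w0.astype(float).copy()
    for k in range(max_iter):
        grad = 2 * (R @ w - p)                 # Step 2: gradient of the quadratic cost
        w_new = w - mu * grad                  # Step 3: move against the gradient
        if np.linalg.norm(w_new - w) < tol:    # Step 4: stop when the change is negligible
            return w_new, k + 1
        w = w_new
    return w, max_iter

# Illustrative statistics (not the textbook's example values).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])

w_sd, n_iter = steepest_descent(R, p, mu=0.3, w0=np.zeros(2))
print("steepest descent:", w_sd, "after", n_iter, "iterations")
print("Wiener solution :", np.linalg.solve(R, p))
```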


 As we shall soon show, the convergence of w(k) to
the optimum solution w_o, and the speed at which
this convergence takes place, depend on
the size of the step-size parameter μ.
 A large step size may result in divergence of this
recursive equation, which may be written compactly as
w(k+1) = (I - 2\mu R)\, w(k) + 2\mu p,
where I is the N-by-N identity matrix.

THE V-AXES

 We substitute for p from Eq. (5.6), i.e., p = R w_o. Also, we
subtract w_o from both sides of Eq. (5.11) and
rearrange the result to obtain, with v(k) = w(k) - w_o,
v(k+1) = (I - 2\mu R)\, v(k).
 This is the tap-weight update equation in terms
of the v-axes.


THE V'-AXES
 Recall that R has the following unitary similarity
decomposition:
R = Q \Lambda Q^T.
Substituting this into the v-axes recursion and defining v'(k) = Q^T v(k)
gives
v'(k+1) = (I - 2\mu \Lambda)\, v'(k).

 The vector recursive Eq. (5.18) may be separated
into the scalar recursive equations
v_i'(k+1) = (1 - 2\mu \lambda_i)\, v_i'(k), \quad i = 0, 1, \ldots, N-1,
so that v_i'(k) = (1 - 2\mu \lambda_i)^k v_i'(0).
 These modes converge to zero provided that |1 - 2μλ_i| < 1 for all i,
i.e., the step-size parameter μ is selected so that
0 < \mu < \frac{1}{\lambda_{\max}},
where λ_max is the largest eigenvalue of R.
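A small numerical check of this decoupling, using the same illustrative statistics and update convention as in the earlier sketch:

```python
import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.7, 0.3])
mu = 0.3
w_o = np.linalg.solve(R, p)

lam, Q = np.linalg.eigh(R)          # R = Q diag(lam) Q^T

w = np.zeros(2)                     # initial guess w(0)
v_prime0 = Q.T @ (w - w_o)          # initial v'(0)

for k in range(1, 6):
    w = w - mu * 2 * (R @ w - p)                   # steepest-descent update on w
    v_prime = Q.T @ (w - w_o)                      # transform to the v'-axes
    v_pred = (1 - 2 * mu * lam) ** k * v_prime0    # decoupled scalar modes
    print(k, np.allclose(v_prime, v_pred))         # True: the modes evolve independently
```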


Starting with an initial value w(0) = [w_0(0) w_1(0)]^T and letting the
recursive equation (5.29) run, we get two sequences of the tap-weight
variables w_0(k) and w_1(k).


LEARNING CURVE

 The curve obtained by plotting ξ(k) as a function
of the iteration index, k, is called the learning curve.
The learning curve of the steepest-descent
algorithm, as can be seen from Eq. (5.31),
consists of a sum of N exponentially decaying
terms, each of which corresponds to one of the
modes of convergence of the algorithm.
 Each exponential term may be characterized by a
time constant, which is obtained as follows.
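A sketch of the standard argument, under the canonical form and update convention used above:
\xi(k) = \xi_{\min} + \sum_{i=0}^{N-1} \lambda_i\, v_i'^2(k)
       = \xi_{\min} + \sum_{i=0}^{N-1} \lambda_i\, (1 - 2\mu\lambda_i)^{2k}\, v_i'^2(0).
Identifying the decay of the ith term with an exponential, (1 - 2\mu\lambda_i)^{2k} = e^{-k/\tau_i}, gives the time constant
\tau_i = \frac{-1}{2 \ln|1 - 2\mu\lambda_i|} \approx \frac{1}{4\mu\lambda_i} \quad \text{for } \mu\lambda_i \ll 1.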


 The existence of two distinct time constants on the learning curve in
Figure 5.5 is clearly observed.



EFFECT OF EIGENVALUE SPREAD

 Our study in the last two sections shows that the
performance of the steepest-descent algorithm is highly
dependent on the eigenvalues of the correlation matrix R.
 In general, a wider spread of the eigenvalues results in a
poorer performance of the steepest-descent algorithm.
 To gain further insight into this property of the steepest-
descent algorithm, we find the optimum value of the step-
size parameter μ, which results in the fastest possible
convergence of the steepest-descent algorithm.

THE GEOMETRICAL RATIO FACTOR

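A sketch of the standard result behind this heading, under the same update convention as above: the ith mode decays with geometric ratio |1 - 2μλ_i|, and the overall convergence speed is limited by the largest of these ratios. Minimizing that worst-case ratio over μ,
\mu_{\mathrm{opt}} = \arg\min_{\mu}\ \max_i |1 - 2\mu\lambda_i|
\;\Rightarrow\; 1 - 2\mu_{\mathrm{opt}}\lambda_{\min} = -(1 - 2\mu_{\mathrm{opt}}\lambda_{\max})
\;\Rightarrow\; \mu_{\mathrm{opt}} = \frac{1}{\lambda_{\min} + \lambda_{\max}},
and the resulting worst-case geometric ratio is
\beta = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = \frac{\chi - 1}{\chi + 1}, \qquad \chi = \frac{\lambda_{\max}}{\lambda_{\min}},
so a larger eigenvalue spread χ pushes β toward 1, i.e., slower convergence of the steepest-descent algorithm.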


NEWTON’S METHOD
 Our discussions in the last few sections show that
the steepest-descent algorithm may suffer from
slow modes of convergence, which arise due to
the spread in the eigenvalues of the correlation
matrix R.
 This means that if we can somehow get rid of the
eigenvalue spread, we can get a much better
convergence performance. This is exactly what
Newton’s method does.


 To derive Newton's method for the quadratic
case, we start from the steepest-descent
algorithm given in Eq. (5.10). Using p = R w_o, Eq.
(5.10) becomes
w(k+1) = w(k) - 2\mu R \big(w(k) - w_o\big).
Newton's method premultiplies the gradient by R^{-1}, which removes
the dependence on the eigenvalue spread:
w(k+1) = w(k) - 2\mu R^{-1} \big(R w(k) - p\big) = (1 - 2\mu)\, w(k) + 2\mu\, w_o,
so every mode converges with the same geometric ratio (1 - 2μ), and
for μ = 1/2 the algorithm reaches w_o in a single step.
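A sketch comparing the two recursions numerically (illustrative values; the step-size conventions follow the equations written above rather than any particular textbook equation numbers):

```python
import numpy as np

R = np.array([[1.0, 0.9],
              [0.9, 1.0]])           # large eigenvalue spread: eigenvalues 0.1 and 1.9
p = np.array([0.5, 0.2])
w_o = np.linalg.solve(R, p)
mu_sd, mu_nt = 0.4, 0.5              # 0 < mu_sd < 1/lambda_max; mu_nt = 1/2

w_sd = np.zeros(2)
w_nt = np.zeros(2)
for k in range(1, 21):
    w_sd = w_sd - mu_sd * 2 * (R @ w_sd - p)                    # steepest descent
    w_nt = w_nt - mu_nt * 2 * np.linalg.solve(R, R @ w_nt - p)  # Newton step
    if k in (1, 5, 20):
        print(k, np.linalg.norm(w_sd - w_o), np.linalg.norm(w_nt - w_o))
# Newton reaches w_o after the first iteration (mu = 1/2), while the
# steepest-descent error decays slowly along the lambda = 0.1 mode.
```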

