CSC445: Neural Networks

CHAPTER 03
THE LEAST-MEAN SQUARE
ALGORITHM
Prof. Dr. Mostafa Gadal-Haqq M. Mostafa
Computer Science Department
Faculty of Computer & Information Sciences
AIN SHAMS UNIVERSITY

(Most of the figures in this presentation are copyrighted by Pearson Education, Inc.)


Model Building Through Regression

• Introduction
• Filtering Structure of the LMS Algorithm
• Unconstrained Optimization: A Review
  • Method of Steepest Descent
  • Newton's Method
  • Gauss-Newton Method
• The Least-Mean Square Algorithm
• Computer Experiment


Introduction
• The Least-Mean-Square (LMS) algorithm, developed by Widrow
and Hoff (1960), was the first linear adaptive-filtering
algorithm (inspired by the perceptron) for solving problems
such as prediction.

• Some features of the LMS algorithm:
  • Linear computational complexity with respect to the adjustable
    parameters.
  • Simple to code.
  • Robust with respect to external disturbances.



Filtering Structure of the LMS Algorithm
• The unknown dynamic system is described by the training data:

  $\mathcal{T}: \{\mathbf{x}(i), d(i)\}, \quad i = 1, \ldots, n, \ldots$

  where
  $\mathbf{x}(i) = [x_1(i), x_2(i), \ldots, x_M(i)]^T$
  • M is the input dimensionality.
  • The stimulus vector x(i) can arise in either of the following ways:
    • Snapshot data: the M input elements of x originate at
      different points in space.
    • Temporal data: the M input elements of x represent the set of
      present and (M-1) past values of some excitation.
Figure 3.1 (a) Unknown dynamic system. (b) Signal-flow graph of adaptive model
for the system; the graph embodies a feedback loop set in color.
Filtering Structure of the LMS Algorithm
• The problem addressed is "how to design a multi-input,
single-output model of the unknown dynamic system by building it
around a single linear neuron."
• Adaptive filtering (system identification):
  • Start from an arbitrary setting of the adjustable weights.
  • Adjustments of the weights are made on a continuous basis.
  • Computation of the adjustments to the weights is completed
    inside one interval that is one sampling period long.
• The adaptive filter consists of two continuous processes:
  • Filtering process: computation of the output signal y(i) and
    the error signal e(i).
  • Adaptive process: the automatic adjustment of the weights
    according to the error signal.
Filtering Structure of the LMS Algorithm
• These two processes together constitute a feedback loop acting
around the neuron.
• Since the neuron is linear:

  $y(i) = v(i) = \sum_{k=1}^{M} w_k(i)\, x_k(i) = \mathbf{w}^T(i)\,\mathbf{x}(i)$

  where
  $\mathbf{w}(i) = [w_1(i), w_2(i), \ldots, w_M(i)]^T$
• The error signal

  $e(i) = d(i) - y(i)$

  is determined by the cost function used to derive the
  adaptive-filtering algorithm, which is closely related to the
  optimization process.
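As a concrete illustration of the filtering process, here is a minimal sketch of one cycle for a single linear neuron, assuming NumPy vectors; the function name and arguments are illustrative, not taken from the slides.

```python
import numpy as np

def filter_step(w, x, d):
    """One cycle of the filtering process: the linear neuron output
    y(i) = w^T x(i) and the error signal e(i) = d(i) - y(i)."""
    y = np.dot(w, x)   # neuron output
    e = d - y          # error against the desired response d(i)
    return y, e

# Example with a 3-tap filter started from zero weights.
y, e = filter_step(np.zeros(3), np.array([1.0, -2.0, 0.5]), d=0.7)
```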
Unconstrained Optimization: A Review
• Consider a cost function E(w) that is a continuously
differentiable function of the unknown weight vector w. We need to
find the optimal solution w* that satisfies:

  $\mathcal{E}(\mathbf{w}^*) \le \mathcal{E}(\mathbf{w})$

• That is, we need to solve the unconstrained-optimization problem
stated as: "Minimize the cost function E(w) with respect to the
weight vector w."
• The necessary condition for optimality:

  $\nabla \mathcal{E}(\mathbf{w}^*) = \mathbf{0}$

• Usually the solution is based on the idea of local iterative
descent: starting with an initial guess w(0), generate a sequence of
weight vectors w(1), w(2), ..., such that:

  $\mathcal{E}(\mathbf{w}(n+1)) < \mathcal{E}(\mathbf{w}(n))$
Unconstrained Optimization: A Review
Method of Steepest Descent
• The successive adjustments applied to the weight vector w are in
the direction of steepest descent, that is, in a direction opposite
to the gradient of E(w).
• If $\mathbf{g} = \nabla \mathcal{E}(\mathbf{w})$, then the steepest-descent algorithm is:

  $\mathbf{w}(n+1) = \mathbf{w}(n) - \eta\, \mathbf{g}(n)$

  where η is a positive constant called the step-size, or
  learning-rate, parameter. In each step, the algorithm applies the
  correction:

  $\Delta\mathbf{w}(n) = \mathbf{w}(n+1) - \mathbf{w}(n) = -\eta\, \mathbf{g}(n)$

• HW: Prove the convergence of this algorithm and show how it is
influenced by the learning rate η. (Sec. 3.3 of Haykin)
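The update rule above translates directly into a short iteration; the following is a minimal sketch assuming the gradient of the cost is available as a callable (the function names and the quadratic example are illustrative, not from the slides).

```python
import numpy as np

def steepest_descent(grad, w0, eta=0.1, n_steps=100):
    """Iterate w(n+1) = w(n) - eta * g(n) for a fixed number of steps."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        g = grad(w)        # gradient g(n) of the cost at the current point
        w = w - eta * g    # correction Delta w(n) = -eta * g(n)
    return w

# Example: the quadratic cost E(w) = 0.5 * ||w||^2 has gradient g = w,
# so steepest descent should drive w toward the origin.
w_star = steepest_descent(lambda w: w, w0=[2.0, -3.0], eta=0.2)
```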



Unconstrained Optimization: A Review
• When η is small, the transient response of the algorithm is
overdamped, i.e., w(n) follows a smooth path.
• When η is large, the transient response of the algorithm is
underdamped, i.e., w(n) follows a zigzagging (oscillatory) path.
• When η exceeds a critical value, the algorithm becomes unstable.

Figure 3.2 Trajectory of the method of steepest descent in a
two-dimensional space for two different values of the learning-rate
parameter: (a) small η, (b) large η. The coordinates w1 and w2 are
elements of the weight vector w; they both lie in the W-plane.
Unconstrained Optimization: A Review
Newton’s Method
• The idea is to minimize the quadratic approximation of the cost
function E(w) around the current point w(n):

  $\Delta\mathcal{E}(\mathbf{w}(n)) = \mathcal{E}(\mathbf{w}(n+1)) - \mathcal{E}(\mathbf{w}(n)) \approx \mathbf{g}^T(n)\,\Delta\mathbf{w}(n) + \tfrac{1}{2}\,\Delta\mathbf{w}^T(n)\,\mathbf{H}(n)\,\Delta\mathbf{w}(n)$

• where H(n) is the Hessian matrix of E(w):

  $\mathbf{H}(n) = \nabla^2 \mathcal{E}(\mathbf{w}(n)) =
  \begin{bmatrix}
  \frac{\partial^2 \mathcal{E}}{\partial w_1^2} & \frac{\partial^2 \mathcal{E}}{\partial w_1 \partial w_2} & \cdots & \frac{\partial^2 \mathcal{E}}{\partial w_1 \partial w_M} \\
  \frac{\partial^2 \mathcal{E}}{\partial w_2 \partial w_1} & \frac{\partial^2 \mathcal{E}}{\partial w_2^2} & \cdots & \frac{\partial^2 \mathcal{E}}{\partial w_2 \partial w_M} \\
  \vdots & \vdots & \ddots & \vdots \\
  \frac{\partial^2 \mathcal{E}}{\partial w_M \partial w_1} & \frac{\partial^2 \mathcal{E}}{\partial w_M \partial w_2} & \cdots & \frac{\partial^2 \mathcal{E}}{\partial w_M^2}
  \end{bmatrix}$

• The weights are updated by minimizing this quadratic approximation
with respect to Δw(n), which gives:

  $\mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n) = \mathbf{w}(n) - \mathbf{H}^{-1}(n)\,\mathbf{g}(n)$



Unconstrained Optimization: A Review

Newton’s Method

• Generally, Newton's method converges quickly and does not exhibit
the zigzagging behavior of the method of steepest descent.
• However, Newton's method has two main disadvantages:
  • The Hessian matrix H(n) has to be positive definite for all n,
    which is not guaranteed by the algorithm. This is addressed by
    the modified Gauss-Newton method.
  • It has high computational complexity.
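For reference, one Newton update can be sketched as follows; the callables `grad` and `hess` are illustrative placeholders for the gradient and Hessian of the cost, and the linear system is solved directly rather than forming the inverse of H(n) explicitly.

```python
import numpy as np

def newton_step(grad, hess, w):
    """One Newton update w(n+1) = w(n) - H^{-1}(n) g(n)."""
    g = grad(w)                       # gradient vector g(n)
    H = hess(w)                       # Hessian matrix H(n); assumed positive definite
    return w - np.linalg.solve(H, g)  # solve H * delta = g instead of inverting H
```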



The Least-Mean Square Algorithm
• The aim of the LMS algorithm is to minimize the instantaneous
value of the cost function:

  $\mathcal{E}(\hat{\mathbf{w}}) = \tfrac{1}{2} e^2(n)$

• Differentiating E(ŵ) with respect to ŵ yields:

  $\frac{\partial \mathcal{E}(\hat{\mathbf{w}})}{\partial \hat{\mathbf{w}}} = e(n)\, \frac{\partial e(n)}{\partial \hat{\mathbf{w}}}$

• As with the least-squares filters, the LMS algorithm operates on a
linear neuron, so we can write:

  $e(n) = d(n) - \hat{\mathbf{w}}^T(n)\,\mathbf{x}(n) \quad\Rightarrow\quad \frac{\partial e(n)}{\partial \hat{\mathbf{w}}} = -\mathbf{x}(n)$

  and

  $\hat{\mathbf{g}}(n) = \frac{\partial \mathcal{E}(\hat{\mathbf{w}})}{\partial \hat{\mathbf{w}}} = -\mathbf{x}(n)\, e(n)$

• Finally:

  $\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) - \eta\, \hat{\mathbf{g}}(n) = \hat{\mathbf{w}}(n) + \eta\, \mathbf{x}(n)\, e(n)$
The Least-Mean Square Algorithm
• The inverse of the learning rate acts as a memory of the LMS
algorithm: the smaller the learning rate η, the longer the memory
span over the past data, which leads to more accurate results but
with a slower convergence rate.
• In the steepest-descent algorithm, the weight vector w(n) follows
a well-defined trajectory in the weight space for a prescribed η.
• In contrast, in the LMS algorithm the weight vector ŵ(n) traces a
random trajectory. For this reason, the LMS algorithm is sometimes
referred to as a "stochastic gradient algorithm."
• Unlike steepest descent, the LMS algorithm does not require
knowledge of the statistics of the environment; it produces an
instantaneous estimate of the weight vector.



The Least-Mean Square Algorithm
• Summary of the LMS Algorithm
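The summary appears as a table in the original slides (not reproduced here); the following is a minimal sketch of the recursion it describes, assuming the training pairs (x(n), d(n)) are stored in NumPy arrays and the weights are initialized to zero (the function name and learning rate are illustrative).

```python
import numpy as np

def lms(X, d, eta=0.01):
    """LMS algorithm: for each sample n, compute e(n) = d(n) - w^T x(n)
    and update w(n+1) = w(n) + eta * x(n) * e(n).
    X is an (N, M) array of input vectors, d an (N,) array of desired responses."""
    w = np.zeros(X.shape[1])          # initialization: w_hat(0) = 0
    for n in range(X.shape[0]):
        e = d[n] - np.dot(w, X[n])    # filtering: error signal e(n)
        w = w + eta * X[n] * e        # adaptation: weight update
    return w
```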



Virtues and Limitations of the LMS Alg.

Computational Simplicity and Efficiency:

• The algorithm is very simple to code, requiring only two or three
lines of code (see the sketch below).
• The computational complexity of the algorithm is linear in the
number of adjustable parameters.
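As a rough illustration of that claim, the core of the LMS update fits in two lines; the surrounding setup values are assumed purely to make the snippet executable.

```python
import numpy as np

# Assumed example values so that the two core lines below can run.
w, x, d, eta = np.zeros(3), np.array([1.0, 0.5, -2.0]), 1.0, 0.1

e = d - np.dot(w, x)   # error signal e(n)
w = w + eta * e * x    # LMS weight update w(n+1) = w(n) + eta * e(n) * x(n)
```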



A COMPUTATIONAL
EXAMPLE

Cost Function

• Linear models usually produce convex (bowl-shaped) cost functions.


Figure copyright of Andrew Ng.



Cost Function

J(0,1)

1
0

Different starting point could lead to different solution


Figure copyright of Andrew Ng.



Finding a Solution

• The corresponding solution (contour plot of the cost function).


Figure copyright of Andrew Ng.



Virtues and Limitations of the LMS Alg.
Robustness
• Since the LMS algorithm is model independent, it is robust with
respect to disturbances.
• That is, the LMS algorithm is optimal in accordance with the H∞
norm (of the transfer operator T of the estimator).
• The philosophy of this optimality is to accommodate the worst-case
scenario:
  • "If you don't know what you are up against, plan for the worst
    scenario and optimize."

Figure 3.8 Formulation of the optimal H∞ estimation problem. The generic estimation error at the transfer
operator’s output could be the weight-error vector, the explanational error, etc.



Virtues and Limitations of the LMS Alg.
Factors Limiting the LMS Performance
• The primary limitations of the LMS algorithm are:
  • its slow rate of convergence, which becomes serious when the
    dimensionality of the input space is high (it typically requires
    a number of iterations equal to about 10 times the dimensionality
    of the input data space to converge to a stable solution), and
  • its sensitivity to variations in the eigenstructure of the input.
• The sensitivity to changes in the environment becomes particularly
acute when the condition number of the correlation matrix is high.
• The condition number is defined as

  $\chi(\mathbf{R}) = \lambda_{\max} / \lambda_{\min}$

  where λmax and λmin are the maximum and minimum eigenvalues of the
  correlation matrix R_xx.
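The condition number can be estimated directly from the data; the sketch below assumes the input vectors are stacked row-wise in a NumPy array (the function name is illustrative).

```python
import numpy as np

def condition_number(X):
    """Estimate chi(R) = lambda_max / lambda_min of the correlation matrix."""
    R = X.T @ X / X.shape[0]       # sample estimate of the correlation matrix R_xx
    eig = np.linalg.eigvalsh(R)    # eigenvalues of the symmetric matrix R
    return eig.max() / eig.min()   # chi(R) = lambda_max / lambda_min
```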



Learning-rate Annealing Schedules
• The slow rate of convergence may be attributed to keeping the
learning-rate parameter η constant.
• In stochastic approximation, the learning-rate parameter is a
time-varying parameter.
• The most commonly used form is:

  $\eta(n) = \frac{c}{n}$

• An alternative form is the search-then-converge schedule:

  $\eta(n) = \frac{\eta_0}{1 + (n/\tau)}$

Figure 3.9 Learning-rate annealing schedules: the horizontal axis,
printed in color, pertains to the standard LMS algorithm.
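Both schedules are one-liners in code; this sketch uses assumed example values for c, η0, and τ.

```python
def eta_stochastic(n, c=1.0):
    """Stochastic-approximation schedule eta(n) = c / n."""
    return c / n

def eta_search_then_converge(n, eta0=0.1, tau=100.0):
    """Search-then-converge schedule eta(n) = eta0 / (1 + n / tau)."""
    return eta0 / (1.0 + n / tau)
```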



Computer Experiment
• d = 1

Figure 3.6 LMS classification with distance 1, based on the
double-moon configuration of Fig. 1.8.
Computer Experiment
• d = -4

Figure 3.7 LMS classification with distance –4, based on the double-moon
configuration of Fig. 1.8.
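For readers who want to reproduce an experiment of this kind, here is a rough sketch of a double-moon generator plus an LMS training loop; the moon radius (10), width (6), learning rate, and sample count are assumptions, not values taken from these slides.

```python
import numpy as np

def double_moon(n, d=1.0, r=10.0, w=6.0, rng=None):
    """Two-moon data in the style of the double-moon configuration;
    r (radius) and w (width) are assumed values."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, np.pi, n)            # angular position on each moon
    rad = r + rng.uniform(-w / 2.0, w / 2.0, n)   # radial spread of width w
    upper = np.column_stack((rad * np.cos(theta), rad * np.sin(theta)))
    lower = np.column_stack((rad * np.cos(theta) + r, -rad * np.sin(theta) - d))
    X = np.vstack((upper, lower))
    labels = np.hstack((np.ones(n), -np.ones(n))) # desired responses +1 / -1
    return X, labels

# LMS classification with distance d = 1.
rng = np.random.default_rng(0)
X, d_resp = double_moon(1000, d=1.0, rng=rng)
Xb = np.column_stack((X, np.ones(len(X))))        # append 1 to carry a bias weight
w_hat = np.zeros(Xb.shape[1])
for n in rng.permutation(len(Xb)):                # present samples in random order
    e = d_resp[n] - w_hat @ Xb[n]                 # error signal e(n)
    w_hat = w_hat + 1e-4 * e * Xb[n]              # LMS update
errors = np.sum(np.sign(Xb @ w_hat) != d_resp)    # training misclassifications
```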
Homework 3
• Problems: 3.3
• Computer Experiment: 3.10, 3.13

Next Time

Multilayer Perceptrons

