• Introduction
• Adaline (Adaptive Linear Neuron) Networks
• Derivation of the LMS algorithm
• Example
• Limitation of Adaline
Limits of the Perceptron Learning Rule
• Quadratic optimization can be used to find the optimal hyperplane, both when the
two classes are linearly separable and when they are not.
• If there is no separating hyperplane, the perceptron will never
classify the samples 100% correctly.
• But there is nothing to stop it from trying, so we need to add something to
stop the training, such as:
• Put a limit on the number of iterations, so that the algorithm will terminate
even if the sample set is not linearly separable.
• Include an error bound. The algorithm can stop as soon as the portion of
misclassified samples is less than this bound. This idea is developed in the
Adaline training algorithm (see the sketch below).
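A minimal sketch of a perceptron-style training loop with both stopping rules added (the data format, learning rate, and threshold values are illustrative assumptions, not from the lecture):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, max_iter=1000, error_bound=0.05):
    """Perceptron learning with two added stopping rules:
    an iteration limit and a misclassification-rate bound.
    Targets t are assumed to be +1 / -1."""
    w = np.zeros(X.shape[1])              # arbitrary starting weights
    b = 0.0                               # bias
    for _ in range(max_iter):             # rule 1: limit the number of iterations
        for x, target in zip(X, t):
            y = 1 if x @ w + b >= 0 else -1
            if y != target:               # perceptron rule: update only on mistakes
                w += eta * target * x
                b += eta * target
        predictions = np.where(X @ w + b >= 0, 1, -1)
        if np.mean(predictions != t) < error_bound:   # rule 2: error-bound stop
            break
    return w, b
```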
Error Correcting Learning
• The objective of this learning is to start from an
arbitrary initial error and then move toward a global
minimum error, in a step-by-step fashion.
• The arbitrary starting error is determined by the initial values
assigned to the synaptic weights.
• It is closed-loop feedback learning.
• Examples of error-correction learning:
• the least-mean-square (LMS) algorithm (Widrow and Hoff),
also called the delta rule,
• and its generalization known as the back-propagation (BP)
algorithm.
Error-Correcting Learning as a Search Problem
• The task can be seen as a search problem in the weight
space:
• Start from a random position (defined by the initial weights)
and find a set of weights that minimizes the error on the given
training set.
Let:
• Initial state: a random set of weights.
• Goal state: a set of weights that minimizes the error on the training set.
• Evaluation function (performance index, or cost function): an error function.
• Operators: how to move from one state to the other; defined by the learning algorithm.
[Figure: error surface over the weights, showing a local minimum and the global minimum.]
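Viewed this way, training is iterated search in weight space; the sketch below makes the three ingredients concrete (the quadratic error function, step size, and iteration count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Initial state: a random set of weights.
w = rng.normal(size=2)

# Evaluation function (performance index / cost function): an error function
# over the weights -- here an arbitrary quadratic bowl to make the search concrete.
target = np.array([1.0, -2.0])
def error(w):
    return 0.5 * np.sum((w - target) ** 2)

# Operator: how to move from one state to the next, defined by the learning
# algorithm -- here a plain gradient-descent step.
eta = 0.1
for _ in range(200):
    w = w - eta * (w - target)       # the gradient of the bowl is (w - target)

print(error(w))                      # near 0: the goal state (global minimum)
```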
Adaline (Adaptive Linear Neuron) Networks
• 1960 - Bernard Widrow and his student Marcian Hoff
introduced the ADALINE network and its learning rule,
which they called the least mean square (LMS) algorithm
(also known as the Widrow-Hoff algorithm or the delta rule).
• The Widrow-Hoff algorithm
• can only train single-layer networks.
• Both the Perceptron and Adaline can only solve linearly
separable problems
• (i.e., the input patterns can be separated by a linear plane into
groups, like AND and OR problems).
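As a quick illustration that a problem like AND is linearly separable, a single linear unit with a hard threshold can compute it; the weights and bias below are just one choice of separating hyperplane (an assumption for illustration):

```python
def linear_threshold(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    """One linear unit with a hard threshold; these weights implement AND."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", linear_threshold(x1, x2))   # matches x1 AND x2
```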
Adaline Architecture
Given:
x_k(n): an input value for neuron k at iteration n,
d_k(n): the desired response or the target response for neuron k.
Let:
y_k(n): the actual response of neuron k.
The error function is the Mean Square Error
• ADALINEs use the Widrow-Hoff algorithm, or Least
Mean Square (LMS) algorithm, to adjust the weights
of the linear network in order to minimize the mean
square error.
• Error: the difference between the target and the actual
network output (delta rule).
• Error signal for neuron k at iteration n: e_k(n) = d_k(n) - y_k(n)
• Mean square error in batch mode:
• batch mode means taking the mean squared error over all m
training patterns:

$$E(n) = \frac{1}{m}\sum_{p=1}^{m}\frac{1}{2}\big(d_p - y_p\big)^2$$
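A small numeric sketch of this batch-mode computation for a linear neuron (the array shapes and example values are assumed purely for illustration):

```python
import numpy as np

# m = 2 training patterns, each with 3 inputs (example values only)
X = np.array([[1.0,  1.0, -1.0],
              [1.0, -1.0, -1.0]])
d = np.array([-1.0, 1.0])            # desired responses d_p
w = np.zeros(3)                      # current weights of the linear neuron

y = X @ w                            # actual responses y_p
E = np.mean(0.5 * (d - y) ** 2)      # E(n) = (1/m) * sum_p (1/2)(d_p - y_p)^2
print(E)                             # 0.5 for this example
```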
Error Landscape in Weight Space
• Total error signal is a function of the weights
Ideally, we would like to find the global minimum (i.e. the
optimal solution)
[Figure: E(w) plotted against w1 and against w2, with arrows indicating the direction of decreasing E(w).]
Error Landscape in Weight Space, cont.
• The error surface of a linear network (ADALINE) is a parabola
(in 1-D: one weight vs. error) or a paraboloid (in higher dimensions),
• and it has only one minimum, called the global minimum.
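To see why, note that for a linear neuron the error is a quadratic function of the weights; a short sketch, assuming a single linear output y = w·x and one training pattern:

$$E(\mathbf{w}) = \frac{1}{2}\big(d - \mathbf{w}^\top\mathbf{x}\big)^2 = \frac{1}{2}d^2 - d\,\mathbf{x}^\top\mathbf{w} + \frac{1}{2}\,\mathbf{w}^\top\mathbf{x}\mathbf{x}^\top\mathbf{w}$$

This is quadratic in w: a parabola with one weight, a paraboloid with many.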
Error Landscape in Weight Space, cont.
Take steps downhill: from (w1, w2) to (w1 + Δw1, w2 + Δw2), where

$$\Delta w = -\eta\,\frac{\partial E}{\partial w}$$
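For instance, taking one such step on the simple bowl E(w) = (1/2)(w1^2 + w2^2) from the point (w1, w2) = (2, -1) with η = 0.1 (values chosen purely for illustration):

$$\Delta w_1 = -\eta\,\frac{\partial E}{\partial w_1} = -0.1 \times 2 = -0.2, \qquad \Delta w_2 = -\eta\,\frac{\partial E}{\partial w_2} = -0.1 \times (-1) = 0.1$$

so the new point is (1.8, -0.9), and the error drops from 2.5 to 2.025.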
LMS Algorithm - Derivation
• Steepest gradient descent rule for change of the
weights:
Given
• x_k(n): an input value for neuron k at iteration n,
• d_k(n): the desired response or the target response for neuron k.
Let:
• y_k(n): the actual response of neuron k.
• e_k(n): error signal = d_k(n) - y_k(n)
Train the w_i's such that they minimize the squared error after each iteration:

$$E(n) = \frac{1}{2}\,e_k^2(n)$$
LMS Algorithm – Derivation, cont.
• The derivative of the error with respect to each weight w_ij can be written as:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left(\frac{1}{2}\,e_i^2\right) = \frac{\partial}{\partial w_{ij}}\left(\frac{1}{2}\,\big(d_i - y_i\big)^2\right)$$

Next we use the chain rule to split this into two derivatives:

$$\frac{\partial E}{\partial w_{ij}} = -\big(d_i - y_i\big)\,\frac{\partial y_i}{\partial w_{ij}} = -\big(d_i - y_i\big)\,x_j\,f'\!\left(\sum_{j=1}^{R} w_{ij}\,x_j\right)$$
LMS Algorithm – Derivation, cont.
The weight update follows the steepest descent rule Δw_ij = -η ∂E/∂w_ij, so:

$$\Delta w_{ij} = -\eta\,\frac{\partial E}{\partial w_{ij}} = \eta\,\big(d_i - y_i\big)\,x_j\,f'(\mathrm{net}_i)$$

For a linear activation function, f'(net_i) = 1, so:

$$\Delta w_{ij} = \eta\,\big(d_i - y_i\big)\,x_j$$
• The Widrow-Hoff learning rule is:

$$w_{ij}(n+1) = w_{ij}(n) + \eta\,e_i(n)\,x_j(n)$$

• Stopping criterion: if the mean squared error across all the training samples is less than a
specified value, stop the training.
Otherwise, cycle through the training set again (go to step 2).
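Putting the update rule and the stopping criterion together, a minimal ADALINE training sketch (the function name, learning rate, MSE goal, and epoch limit are illustrative assumptions):

```python
import numpy as np

def train_adaline(X, d, eta=0.1, mse_goal=0.01, max_epochs=1000):
    """Widrow-Hoff / LMS training of a single linear neuron (a sketch)."""
    w = np.zeros(X.shape[1])
    for epoch in range(max_epochs):
        for x, target in zip(X, d):        # step 2: cycle through the training set
            y = x @ w                      # linear output: f(net) = net
            e = target - y                 # delta-rule error e = d - y
            w += eta * e * x               # w_j <- w_j + eta * e * x_j
        mse = np.mean(0.5 * (d - X @ w) ** 2)
        if mse < mse_goal:                 # stop once the batch MSE is small enough
            break
    return w
```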
Convergence Phenomenon
• The performance of an ADALINE neuron depends heavily
on the choice of the learning rate η.
• How to choose it?
• Too big: the system will oscillate and will not converge.
• Too small: the system will take a long time to converge.
• Typically, η is selected by trial and error:
• typical range: 0.01 < η < 1.0
• often start at 0.1
• sometimes it is suggested that:
0.1/m < η < 1.0/m
where m is the number of inputs.
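The effect of η can be seen even on a one-weight quadratic error surface; in this toy sketch (the error function and the η values are assumptions for illustration), a large η makes the weight oscillate and diverge, while a very small η converges slowly:

```python
def descend(eta, steps=20, w=1.0):
    """Gradient descent on E(w) = 0.5 * w**2, whose gradient is w."""
    for _ in range(steps):
        w -= eta * w
    return w

print(descend(eta=2.5))    # |w| grows every step: oscillation / divergence
print(descend(eta=0.01))   # still far from the minimum after 20 steps
print(descend(eta=0.5))    # converges quickly toward the minimum at w = 0
```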
Example
• The input/target pairs for our test problem are {P1, t1 = -1} and {P2, t2 = 1},
with initial weights w(0) = [0, 0, 0] and learning rate η = 0.4.
• First iteration – P1:
y = w(0) · P1 = 0
e = t1 - y = -1 - 0 = -1
w(1) = w(0) + η · e · P1 = -0.4 · P1
Example Iteration Two
• Second iteration – P2:
y = w(1) · P2 = -0.4
e = t2 - y = 1 - (-0.4) = 1.4
w(2) = w(1) + η · e · P2 = w(1) + 0.4(1.4) · P2
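The two iterations above can be reproduced numerically. In the sketch below the concrete input vectors P1 and P2 are assumed values (chosen so that the intermediate results match the errors shown: e = -1, then e = 1.4); only the targets, η = 0.4, and w(0) = 0 are taken from the example itself:

```python
import numpy as np

eta = 0.4
w = np.zeros(3)                            # w(0) = [0, 0, 0]

# Assumed example patterns (an illustrative choice consistent with the numbers above)
P1, t1 = np.array([1.0, -1.0, -1.0]), -1.0
P2, t2 = np.array([1.0,  1.0, -1.0]),  1.0

# Iteration one
y = w @ P1                                 # y = 0
e = t1 - y                                 # e = -1 - 0 = -1
w = w + eta * e * P1                       # w(1) = w(0) + eta * e * P1
print(e, w)                                # -1.0  [-0.4  0.4  0.4]

# Iteration two
y = w @ P2                                 # y = -0.4
e = t2 - y                                 # e = 1 - (-0.4) = 1.4
w = w + eta * e * P2                       # w(2) = w(1) + eta * e * P2
print(e, w)                                # 1.4  [ 0.16  0.96 -0.16]
```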