Lecture 2
ADALINE
ADAptive LINear NEuron
Proposed by Widrow & Hoff (1960)
Typically uses bipolar (+1, -1) activations for its inputs and targets, but is not restricted to such values
Architecture
Bipolar input, bipolar target
Net input: $y_{in} = \mathbf{w}^T \mathbf{x} + b$
If the net is being used for pattern classification with bipolar class labels, a threshold function (with threshold = 0) is applied to the net input to obtain the activation.
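As a concrete illustration of this architecture, here is a minimal Python/NumPy sketch of the ADALINE forward pass (the names net_input and activation, and the example numbers, are illustrative, not from the slides):

    import numpy as np

    def net_input(x, w, b):
        # y_in = w^T x + b
        return np.dot(w, x) + b

    def activation(y_in):
        # bipolar threshold function with threshold = 0
        return 1 if y_in >= 0 else -1

    w, b = np.array([0.5, 0.5]), -0.5      # example parameters
    x = np.array([1, -1])                  # one bipolar input pattern
    print(net_input(x, w, b))              # -0.5
    print(activation(net_input(x, w, b)))  # -1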
Difference between Perceptron and Adaline
The perceptron updates its weights using the thresholded output, and only when a pattern is misclassified; Adaline updates using the raw net input $y_{in}$ on every pattern.
Perceptron training stops as soon as all training patterns are classified correctly; the delta rule instead keeps moving the weights toward the minimum of the mean squared error, even for patterns already on the correct side of the boundary.
Training
The learning rule minimizes the mean squared error between the activation (equal to the net input $y_{in}$ during training) and the target value.
It is known as the Delta Rule, the LMS (Least Mean Squares) rule, or the Widrow-Hoff rule.
Training: Proof for Delta Rule
Error over all P samples (mean squared error):

$$E = \frac{1}{P} \sum_{p=1}^{P} \bigl(t(p) - y_{in}(p)\bigr)^2$$

Differentiating with respect to $w_i$, and using $\partial y_{in}(p) / \partial w_i = x_i(p)$:

$$\frac{\partial E}{\partial w_i} = \frac{2}{P} \sum_{p=1}^{P} \bigl(t(p) - y_{in}(p)\bigr) \frac{\partial}{\partial w_i} \bigl(t(p) - y_{in}(p)\bigr) = -\frac{2}{P} \sum_{p=1}^{P} \bigl(t(p) - y_{in}(p)\bigr)\, x_i(p)$$

Gradient descent changes each weight in the direction that decreases E:

$$\Delta w_i \propto -\frac{\partial E}{\partial w_i} = \frac{2}{P} \sum_{p=1}^{P} \bigl(t(p) - y_{in}(p)\bigr)\, x_i(p)$$
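As a quick sanity check of this derivation, the analytic gradient can be compared against a finite-difference estimate of $\partial E / \partial w_i$; the patterns, targets, and starting weights below are made up for illustration:

    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]])  # P = 3 made-up patterns
    t = np.array([1.0, -1.0, -1.0])                       # made-up bipolar targets
    w, b = np.array([0.2, -0.3]), 0.1

    def E(w):
        # mean squared error over all P samples
        return np.mean((t - (X @ w + b)) ** 2)

    # analytic gradient: -(2/P) * sum_p (t(p) - y_in(p)) * x_i(p)
    grad = -(2.0 / len(t)) * X.T @ (t - (X @ w + b))

    # finite-difference estimate of dE/dw_0
    eps = 1e-6
    w_shift = w.copy(); w_shift[0] += eps
    print(grad[0], (E(w_shift) - E(w)) / eps)  # the two numbers should match closely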
[Figure: error surface (sum squared error as a function of Weight W and Bias B), shown alongside the corresponding error contour plot]
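A surface like this one can be computed directly; the following sketch (assuming a toy single-input task, not the slides' original data) evaluates the sum squared error over a grid of weight and bias values:

    import numpy as np

    x = np.array([1.0, -1.0])   # hypothetical single-input patterns
    t = np.array([1.0, -1.0])   # hypothetical targets
    W, B = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))

    # sum squared error for every (w, b) pair on the grid
    SSE = sum((t_p - (W * x_p + B)) ** 2 for x_p, t_p in zip(x, t))
    # SSE can then be rendered as a surface or contour plot (e.g. with matplotlib)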
Training…
Application of the Delta Rule
Method 1 (sequential mode): change $w_i$ after each training pattern by $\alpha\,(t(p) - y_{in}(p))\,x_i$.
Method 2 (batch mode): change $w_i$ at the end of each epoch; within an epoch, accumulate $\alpha\,(t(p) - y_{in}(p))\,x_i$ over every pattern $(x(p), t(p))$.
Method 2 is slower but may give slightly better results, because Method 1 can be sensitive to the ordering of the samples (both modes are sketched below).
Notes:
E decreases monotonically until the system reaches a state of (locally) minimum E, where a small change of any $w_i$ would cause E to increase.
At such a local-minimum state, $\partial E / \partial w_i = 0$ for all $i$, but E itself is not guaranteed to be zero.
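A minimal sketch of the two modes (names are illustrative; both assume a NumPy array X of patterns and a vector t of targets):

    import numpy as np

    def train_sequential(X, t, w, b, alpha):
        # Method 1: update the weights after each training pattern
        for x_p, t_p in zip(X, t):
            err = t_p - (np.dot(w, x_p) + b)
            w = w + alpha * err * x_p
            b = b + alpha * err
        return w, b

    def train_batch(X, t, w, b, alpha):
        # Method 2: accumulate the changes and apply them once per epoch
        dw, db = np.zeros_like(w), 0.0
        for x_p, t_p in zip(X, t):
            err = t_p - (np.dot(w, x_p) + b)
            dw += alpha * err * x_p
            db += alpha * err
        return w + dw, b + db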
Training Algorithm
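A sketch of a standard ADALINE training procedure (Fausett-style: small random initial weights, sequential delta-rule updates, stop when the largest weight change in an epoch falls below a tolerance; the names and default values are illustrative):

    import numpy as np

    def adaline_train(X, t, alpha=0.1, tol=1e-3, max_epochs=1000):
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.5, 0.5, X.shape[1])  # small random initial weights
        b = rng.uniform(-0.5, 0.5)
        for epoch in range(max_epochs):
            largest_change = 0.0
            for x_p, t_p in zip(X, t):
                err = t_p - (np.dot(w, x_p) + b)  # t(p) - y_in(p)
                w += alpha * err * x_p            # delta rule update
                b += alpha * err
                largest_change = max(largest_change, abs(alpha * err * x_p).max())
            if largest_change < tol:              # stopping condition
                break
        return w, b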
Parameter Initialization
Weights and bias: typically initialized to small random values.
Learning rate: Hecht-Nielsen (1990) suggests choosing $\alpha$ so that $0.1 \le n\alpha \le 1.0$, where $n$ is the number of input units.
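As a worked instance of this guideline ($n = 4$ is an arbitrary illustration): for $n = 4$, $0.1 \le 4\alpha \le 1.0$, i.e. $0.025 \le \alpha \le 0.25$.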
Application
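A typical application is two-class pattern classification with bipolar labels. As a self-contained sketch (the task, data, and parameter values are illustrative, assuming the bipolar AND function), train with the sequential delta rule and classify by thresholding the net input at 0:

    import numpy as np

    X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)  # bipolar inputs
    t = np.array([1, -1, -1, -1], dtype=float)                       # bipolar AND targets

    w, b, alpha = np.zeros(2), 0.0, 0.1
    for epoch in range(100):                  # fixed epoch budget for brevity
        for x_p, t_p in zip(X, t):
            err = t_p - (np.dot(w, x_p) + b)  # delta rule, sequential mode
            w += alpha * err * x_p
            b += alpha * err

    print(np.where(X @ w + b >= 0, 1, -1))  # all four patterns classified correctly
    print(w, b)                             # near the MSE optimum w = [0.5, 0.5], b = -0.5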