Perceptron
Simple Perceptrons
- Perform supervised learning: correct I/O associations are provided
- Feed-forward networks: connections are one-directional
- One layer: input layer + output layer
Notations
- N: dimension of the input vector
- M: dimension of the output vector
- inputs $x_j$, $j = 1, \ldots, N$
- real outputs $y_i$, $i = 1, \ldots, M$
- weights $w_{ij}$, $i = 1, \ldots, M$, $j = 1, \ldots, N$
- activation function $g$

[Figure: one-layer feed-forward network with inputs $x_1, \ldots, x_4$, outputs $y_1, y_2, y_3$, and $w_{34}$ labeling one connection]

$y_i = g(\mathrm{net}_i) = g\Big(\sum_{j=1}^{N} w_{ij} x_j\Big)$
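As a concrete illustration of this notation, here is a minimal sketch (Python with NumPy; the names `W`, `x`, and `g` mirror the symbols above but the example values are assumptions, not from the slides):

```python
import numpy as np

def forward(W, x, g=np.sign):
    """One-layer perceptron: y_i = g(sum_j W[i, j] * x[j]).

    W : (M, N) weight matrix, one row per output unit
    x : (N,) input vector
    g : activation function (sign by default)
    """
    net = W @ x          # net_i = sum_j w_ij x_j
    return g(net)        # y_i = g(net_i)

# Example: M = 3 outputs, N = 4 inputs
W = np.random.randn(3, 4)
x = np.array([1.0, 0.0, -1.0, 0.5])
print(forward(W, x))
```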
Perceptron Training
- Given:
  - input patterns $\mathbf{x}^u$
  - desired output patterns $O^u$
- How do we adapt the connection weights so that the actual outputs conform to the desired outputs?

$O_i^u = y_i^u, \quad i = 1, \ldots, M$
Simplest case
- inputs from two classes (+ and −)
- binary outputs (−1, 1)
- thresholding: $\mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}^u) = \mathrm{sgn}(w_1 x_1^u + w_2 x_2^u + w_0) = y^u$

[Figure: decision line in the $(x_1, x_2)$ plane with normal vector $(w_1, w_2)$]
Examples
$g(\mathrm{net}) = \mathrm{sgn}(w_1 x_1^u + w_2 x_2^u - 1.5)$, with $w_0 = 1.5$, $w_1 = 1$, $w_2 = 1$ (AND):

x1  x2 |  O
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 |  1

XOR:

x1  x2 |  O
 0   0 | -1
 0   1 |  1
 1   0 |  1
 1   1 | -1

[Figure: in the $(x_1, x_2)$ plane, a single line separates the AND patterns, but no single line separates the XOR patterns]
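A quick sketch (assumed setup, not from the slides) verifying that the weights above realize AND on ±1 outputs, and illustrating via a coarse grid search that no weights of the same form reproduce XOR (an illustration of non-separability, not a proof):

```python
import numpy as np

def perceptron(x1, x2, w1, w2, w0):
    return np.sign(w1 * x1 + w2 * x2 - w0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = [-1, 1, 1, -1]

# The slide's weights reproduce AND exactly.
print([int(perceptron(x1, x2, 1, 1, 1.5)) for x1, x2 in inputs])  # [-1, -1, -1, 1]

# A coarse brute-force search finds no (w1, w2, w0) realizing XOR.
grid = np.linspace(-2, 2, 41)
found = any(
    all(int(perceptron(x1, x2, w1, w2, w0)) == t
        for (x1, x2), t in zip(inputs, XOR))
    for w1 in grid for w2 in grid for w0 in grid
)
print(found)  # False: XOR is not linearly separable
```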
Linear separability
Is it at all possible to learn the desired I/O associations?
- Yes, if weights $w_{ij}$ can be found such that

  $O_i^u = \mathrm{sgn}\Big(\sum_{j=1}^{N} w_{ij} x_j^u - w_{i0}\Big) = y_i^u$ for all $i$ and $u$

- No, otherwise
Perceptron Learning
Linearly separable or not, how do we find the set of weights?
- Using labeled samples:
  - closed-form solution
  - iterative solutions
Closed Form Solution
Stack the training patterns (with a constant 1 for the bias) into a matrix and solve the least-squares system:

$\begin{pmatrix} x_1^1 & \cdots & x_n^1 & 1 \\ x_1^2 & \cdots & x_n^2 & 1 \\ \vdots & & \vdots & \vdots \\ x_1^u & \cdots & x_n^u & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_0 \end{pmatrix} = \begin{pmatrix} O^1 \\ O^2 \\ \vdots \\ O^u \end{pmatrix}$

$A W = B, \qquad W = (A^T A)^{-1} A^T B$
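A sketch of this closed-form solution in NumPy (the AND data below is assumed for illustration); in practice the pseudo-inverse or `np.linalg.lstsq` is preferred over forming $(A^T A)^{-1}$ explicitly:

```python
import numpy as np

# Training patterns (rows) for AND on {0,1} inputs, targets in {-1,+1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
B = np.array([-1, -1, -1, 1], dtype=float)

# Append the constant 1 column for the bias term: A = [X | 1]
A = np.hstack([X, np.ones((X.shape[0], 1))])

# W = (A^T A)^{-1} A^T B, computed stably via the pseudo-inverse
W = np.linalg.pinv(A) @ B

print(W)               # [1.0, 1.0, -1.5] = (w1, w2, w0): the AND weights above
print(np.sign(A @ W))  # [-1, -1, -1, 1] after thresholding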
Perceptron Learning Rule (cont.)
If a positive pattern is misclassified as negative ($\mathbf{w} \cdot \mathbf{x} < 0$), move the weight vector toward the pattern.

[Figure: weight vector $(w_1, w_2)$ rotating toward the misclassified positive pattern in the $(x_1, x_2)$ plane]
Perceptron Learning Rule (cont.)
If a negative pattern is misclassified as positive ($\mathbf{w} \cdot \mathbf{x} > 0$), move the weight vector away from the pattern.

[Figure: weight vector $(w_1, w_2)$ rotating away from the misclassified negative pattern in the $(x_1, x_2)$ plane]
$w^{(k+1)} = \begin{cases} w^{(k)} + c\,\mathbf{x} & \text{if } w^{(k)} \cdot \mathbf{x} < 0 \text{ and } \mathbf{x} \in + \\ w^{(k)} - c\,\mathbf{x} & \text{if } w^{(k)} \cdot \mathbf{x} > 0 \text{ and } \mathbf{x} \in - \\ w^{(k)} & \text{otherwise} \end{cases}$
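A minimal sketch of this update rule (assumed setup): with labels in {−1, +1} the two cases fold into one line, since the update is $c \cdot \text{label} \cdot \mathbf{x}$ whenever $\text{label} \cdot (w \cdot \mathbf{x})$ is negative. The boundary case $w \cdot \mathbf{x} = 0$ is also treated as a mistake here so training can leave the all-zero start:

```python
import numpy as np

def perceptron_train(X, labels, c=1.0, epochs=100):
    """Iterative perceptron rule: on a mistake, move w by c * label * x.

    X      : (P, N) patterns, already including a bias column of 1s
    labels : (P,) targets in {-1, +1}
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, labels):
            if label * (w @ x) <= 0:     # misclassified (or on the boundary)
                w += c * label * x       # w + cx for x in +, w - cx for x in -
                mistakes += 1
        if mistakes == 0:                # converged: all patterns correct
            break
    return w

# AND with a bias column
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
labels = np.array([-1, -1, -1, 1])
print(perceptron_train(X, labels))
```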
Perceptron Learning Rule (cont.)
- The weight is a signed linear combination of training points.
- Only the informative points are used (those on which the classifier made a mistake; the rule is mistake-driven).
- This is VERY important: it leads later to the generalization to Support Vector Machines.
Comparison
- Version space: the $(w_1, w_2)$ space of all feasible solutions
- Perceptron learning: greedy gradient descent that often ends up at the boundary of the version space, with little margin for error
- SVM learning: the center of the largest sphere embedded in the version space (maximum margin)
- Bayes point machine: the centroid of the version space

[Figure: version space in the $(w_1, w_2)$ plane, comparing the perceptron solution (near the boundary), the SVM solution (center of the largest embedded sphere), and the Bayes point (centroid)]
Perceptron Usage Rules
After the weights have been determined, the output is computed from inner products with the training points alone:

$y = \mathbf{w} \cdot \mathbf{x} = \Big(\sum_i \alpha_i y_i \mathbf{x}_i\Big) \cdot \mathbf{x} = \sum_i \alpha_i y_i\, (\mathbf{x}_i \cdot \mathbf{x})$

where $\alpha_i$ counts the updates made on training point $\mathbf{x}_i$ and $y_i$ is its label.
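A sketch of this dual form (assumed setup): training tracks only the mistake counts `alpha`, the weight vector stays implicit as the mistake-weighted sum of labeled points, and prediction needs only inner products $\mathbf{x}_i \cdot \mathbf{x}$, which is the property kernel methods such as SVMs later exploit:

```python
import numpy as np

def dual_perceptron_train(X, labels, epochs=100):
    """Dual perceptron: track per-point mistake counts alpha instead of w."""
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i, (x, label) in enumerate(zip(X, labels)):
            # Implicit w = sum_k alpha_k * labels_k * X_k
            score = np.sum(alpha * labels * (X @ x))
            if label * score <= 0:
                alpha[i] += 1            # mistake-driven update
    return alpha

def dual_predict(alpha, X, labels, x):
    # y = sum_i alpha_i y_i (x_i . x), then threshold
    return np.sign(np.sum(alpha * labels * (X @ x)))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
labels = np.array([-1, -1, -1, 1])
alpha = dual_perceptron_train(X, labels)
print([int(dual_predict(alpha, X, labels, x)) for x in X])
```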
Hebb's Learning Rule
Synapse strength should be increased when both the pre- and post-synaptic neurons fire vigorously. For binary ($\pm 1$) outputs:

$w_{ij}^{new} = w_{ij}^{old} + \Delta w_{ij}$

$\Delta w_{ij}^u = \begin{cases} 2\eta\, O_i^u x_j^u & \text{if } y_i^u \neq O_i^u \\ 0 & \text{otherwise} \end{cases}$

$\Delta w_{ij}^u = \eta\, (1 - y_i^u O_i^u)\, O_i^u x_j^u = \eta\, \big(O_i^u - (O_i^u)^2\, y_i^u\big)\, x_j^u = \eta\, \underbrace{(O_i^u - y_i^u)}_{\delta}\, x_j^u$

(using $(O_i^u)^2 = 1$; the factor $1 - y_i^u O_i^u$ equals 2 on a mistake and 0 otherwise)
Case 1: $O = -1$, $y = 1$: $\Delta w = -2\eta\,\mathbf{x}$, so $w^{new} = w^{old} - 2\eta\,\mathbf{x}$.
Case 2: $O = 1$, $y = -1$: $\Delta w = +2\eta\,\mathbf{x}$, so $w^{new} = w^{old} + 2\eta\,\mathbf{x}$.

[Figure: $w^{old}$, $w^{new}$, and $\pm 2\eta\,\mathbf{x}$ in the $(x_1, x_2)$ plane for both cases]
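Both cases reduce to one line of code (a sketch; $\eta$ and the example numbers are assumed), since $\Delta w = \eta\,(O - y)\,\mathbf{x}$ is zero when the output is correct and $\pm 2\eta\,\mathbf{x}$ on a mistake:

```python
import numpy as np

def hebbian_delta_update(w, x, O, eta=0.1):
    """Mistake-driven Hebbian update for +-1 outputs: dw = eta * (O - y) * x."""
    y = np.sign(w @ x)
    return w + eta * (O - y) * x   # 0 if correct, +-2*eta*x on a mistake

w = np.array([1.0, 0.5])
x = np.array([0.8, -0.2])
print(hebbian_delta_update(w, x, O=-1))  # Case 1: y=+1, O=-1 -> w - 2*eta*x
```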
LMS (Widrow-Hoff, Delta) Rule
- Not restricted to binary outputs
- Gradient search on the squared output error:

$E(\mathbf{w}) = \frac{1}{2} \sum_u \sum_i (O_i^u - y_i^u)^2 = \frac{1}{2} \sum_u \sum_i \Big( O_i^u - g\big(\textstyle\sum_{j=1}^{N} w_{ij} x_j^u\big) \Big)^2$

$\frac{\partial E(\mathbf{w})}{\partial w_{ij}} = -\sum_u \big(O_i^u - g(\mathrm{net}_i^u)\big)\, g'(\mathrm{net}_i^u)\, x_j^u$

$w_{ij}^{new} = w_{ij}^{old} + \Delta w_{ij}, \qquad \Delta w_{ij} = -\eta\, \frac{\partial E(\mathbf{w})}{\partial w_{ij}}$
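A sketch of one LMS gradient step in matrix form (tanh is an assumed choice of differentiable $g$; the toy data below is likewise an assumption):

```python
import numpy as np

def lms_epoch(W, X, O, eta=0.1):
    """One gradient step on E(w) = 1/2 sum_u sum_i (O_i^u - y_i^u)^2.

    W : (M, N) weights   X : (P, N) patterns   O : (P, M) real-valued targets
    """
    net = X @ W.T                      # net_i^u
    y = np.tanh(net)                   # y_i^u = g(net_i^u)
    g_prime = 1.0 - y ** 2             # g'(net) for g = tanh
    delta = (O - y) * g_prime          # (O - g(net)) g'(net)
    grad = -delta.T @ X                # dE/dw_ij, summed over patterns u
    return W - eta * grad              # w_new = w_old - eta * dE/dw

# Toy usage: 1 output, 2 inputs + bias column, soft targets in (-1, 1)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
O = np.array([[-0.9], [-0.9], [-0.9], [0.9]])
W = np.zeros((1, 3))
for _ in range(200):
    W = lms_epoch(W, X, O)
print(W, np.tanh(X @ W.T).round(2))
```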
Nothing but the Chain Rule

$\frac{\partial E(\mathbf{w})}{\partial w_{ij}} = \sum_u \frac{1}{2} \cdot \frac{\partial (O_i^u - y_i^u)^2}{\partial (O_i^u - y_i^u)} \cdot \frac{\partial (O_i^u - y_i^u)}{\partial y_i^u} \cdot \frac{\partial y_i^u}{\partial \mathrm{net}_i^u} \cdot \frac{\partial \mathrm{net}_i^u}{\partial w_{ij}}$

With the factor $\frac{1}{2}$ from $E$, the product evaluates to $-(O_i^u - y_i^u)\, g'(\mathrm{net}_i^u)\, x_j^u$, matching the gradient on the previous slide.
$O = g(\mathrm{net}) = \mathrm{sgn}(w_1 x_1 + w_2 x_2 + b)$, with inputs $x_1 = x$, $x_2 = y$ and learned weights $w_1 = 0.4299$, $w_2 = -0.2793$, $b = -0.1312$.

[Figure: two-input, one-output perceptron diagram]
[Figures: final training results; error vs. training epoch]
$y = g(\mathrm{net}) = \mathrm{sgn}(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)$, with inputs $x_1 = x$, $x_2 = y$, $x_3 = z$ and learned weights $w_1 = 0.4232$, $w_2 = -0.7411$, $w_3 = -0.3196$, $b = 0.7550$.

[Figure: three-input, one-output perceptron diagram]

[Figures: final training results; error vs. training epoch]
A two-output perceptron:

$y_1 = g(w_{11} x_1 + w_{12} x_2 + b_1)$
$y_2 = g(w_{21} x_1 + w_{22} x_2 + b_2)$

[Figure: two-input, two-output network diagram with outputs $O_1$, $O_2$ and weights $w_{11}$, $w_{12}$, $w_{21}$, $w_{22}$]

[Figures: final training results; error vs. training epoch]
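In matrix form this is just the $M = 2$ case of the earlier forward pass; a sketch (the weight values here are placeholders, not the slides' trained weights):

```python
import numpy as np

# Rows of W are (w_i1, w_i2); b holds (b_1, b_2)
W = np.array([[0.5, -0.3],
              [-0.2, 0.8]])
b = np.array([0.1, -0.4])

def two_output_perceptron(x, g=np.sign):
    # y_i = g(w_i1 x_1 + w_i2 x_2 + b_i)
    return g(W @ x + b)

print(two_output_perceptron(np.array([1.0, 0.0])))  # [ 1. -1.]
```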
[Figures: final training results; error vs. training epoch]