Perceptron 1

Simple Perceptrons

• Perform supervised learning
  – the correct I/O associations are provided
• Feed-forward networks
  – connections are one-directional
• One layer
  – input layer + output layer

Notations

• N: dimension of the input vector
• M: dimension of the output vector
• inputs x_j, j = 1, ..., N
• real outputs y_i, i = 1, ..., M
• weights w_ij, i = 1, ..., M, j = 1, ..., N
• activation function g

[Figure: a single-layer network with inputs x_1, ..., x_4, outputs y_1, y_2, y_3, and connection weights w_ij]

        y_i = g(net_i) = g( Σ_{j=1}^{N} w_ij x_j )

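As a concrete illustration of the forward-pass formula above, here is a minimal NumPy sketch (the array shapes, example values, and the use of np.sign as the activation g are illustrative assumptions, not part of the slides):

import numpy as np

def forward(W, x, g=np.sign):
    """Single-layer perceptron forward pass.

    W : (M, N) weight matrix, one row of weights per output unit
    x : (N,)   input vector
    g : activation function applied to each net input
    """
    net = W @ x            # net_i = sum_j w_ij * x_j
    return g(net)          # y_i = g(net_i)

# example: M = 3 outputs, N = 4 inputs (illustrative values)
W = np.random.randn(3, 4)
x = np.array([1.0, 0.0, -1.0, 0.5])
print(forward(W, x))
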
Perceptron Training

• Given
  – input patterns x^u
  – desired output patterns O^u
• How do we adapt the connection weights so that the actual outputs conform to the desired outputs?

        O_i^u = y_i^u,   i = 1, ..., M

Simplest case

• Two types of inputs
• Binary outputs (−1, 1)
• Thresholding:

        sgn(w · x^u) = sgn(w_1 x_1^u + w_2 x_2^u + w_0) = y^u

[Figure: a separating line in the (x_1, x_2) plane with normal vector (w_1, w_2)]

Examples

        g(net) = sgn(w_1 x_1 + w_2 x_2 − 1.5)

• First table (the AND function): linearly separable with w_0 = 1.5, w_1 = 1, w_2 = 1

        x_1   x_2   O
         0     0   −1
         0     1   −1
         1     0   −1
         1     1    1

• Second table (the XOR function): not linearly separable

        x_1   x_2   O
         0     0   −1
         0     1    1
         1     0    1
         1     1   −1

[Figure: the patterns plotted in the (x_1, x_2) plane; a single line separates the first set but not the second]

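To check the first table numerically, the short sketch below (plain Python; the function and variable names are illustrative) evaluates g(net) = sgn(w_1 x_1 + w_2 x_2 − 1.5) on the four input patterns. No single choice of (w_1, w_2, w_0) reproduces the second (XOR) table, which is what the separability condition on the next slide formalizes.

def sgn(v):
    return 1 if v >= 0 else -1

w1, w2, w0 = 1.0, 1.0, 1.5      # weights and threshold from the AND example

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    o = sgn(w1 * x1 + w2 * x2 - w0)
    print(x1, x2, "->", o)       # prints -1, -1, -1, 1: the AND table
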
Linear separability

• Is it at all possible to learn the desired I/O associations?
  – yes, if weights w_ij can be found such that

        O_i^u = sgn( Σ_{j=1}^{N} w_ij x_j^u − w_i0 ) = y_i^u   for all i and u

  – no, otherwise
• The single-layer perceptron is severely limited in what it can learn

Perceptron Learning

• Linearly separable or not, how do we find the set of weights?
• Using tagged samples:
  – closed-form solution
  – iterative solutions

Closed Form Solution

        | x_1^1  ...  x_n^1  1 |   | w_1 |     | O^1 |
        | x_1^2  ...  x_n^2  1 |   | w_2 |     | O^2 |
        |   ⋮          ⋮     ⋮ | · |  ⋮  |  =  |  ⋮  |
        | x_1^u  ...  x_n^u  1 |   | w_0 |     | O^u |

        A W = B
        W = (AᵀA)⁻¹ AᵀB

• Not practical when the number of samples is large (the most likely case)

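A minimal sketch of the closed-form solution above, assuming NumPy, a pattern matrix X of shape (u, n), and a vector O of desired outputs (all names here are illustrative):

import numpy as np

def closed_form_weights(X, O):
    """Least-squares weights for one output unit: W = (A^T A)^(-1) A^T B."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # append the constant-1 bias column
    # np.linalg.lstsq solves the same least-squares problem but is numerically
    # safer than forming the inverse of A^T A explicitly
    W, *_ = np.linalg.lstsq(A, O, rcond=None)
    return W                                        # (w_1, ..., w_n, w_0)
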
Perceptron Learning Rule

• If a pattern is correctly classified, no action is taken

[Figure: a correctly classified pattern in the (x_1, x_2) plane; the weight vector (w_1, w_2) is left unchanged]

Perceptron Learning Rule (cont.)

• If a positive pattern is misclassified as a negative pattern, the weight vector is moved toward it

[Figure: the weight vector (w_1, w_2) rotated toward the misclassified positive pattern]

Perceptron Learning Rule (cont.)

• If a negative pattern is misclassified as a positive pattern, the weight vector is moved away from it

[Figure: the weight vector (w_1, w_2) rotated away from the misclassified negative pattern]

The update rule combining the three cases above:

        w^(k+1) = w^(k) + c x    if w^(k) · x < 0 and x ∈ +
                = w^(k) − c x    if w^(k) · x > 0 and x ∈ −
                = w^(k)          otherwise

or, written compactly with the label y ∈ {−1, +1}:

        w^(k+1) = w^(k) + c y x   if y (w^(k) · x) < 0   (x ∈ + or −)
                = w^(k)           otherwise

• How should c be decided?
  – fixed increment
  – fractional correction

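A minimal fixed-increment implementation of the rule above (a sketch assuming NumPy; the dataset, learning rate c, and epoch limit are illustrative choices):

import numpy as np

def train_perceptron(X, y, c=1.0, epochs=100):
    """Fixed-increment perceptron learning rule.

    X : (P, N) training patterns (augmented with a constant-1 column for the bias)
    y : (P,)   labels in {-1, +1}
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_u, y_u in zip(X, y):
            if y_u * (w @ x_u) <= 0:       # misclassified (or on the boundary)
                w = w + c * y_u * x_u      # w^(k+1) = w^(k) + c y x
                mistakes += 1
        if mistakes == 0:                  # converged: every pattern is correct
            break
    return w

# the AND problem from the earlier slide, with a bias input appended
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
print(train_perceptron(X, y))
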
Perceptron Learning Rule (cont.)

• The weight vector is a signed, linear combination of training points
• Only the informative points are used (those on which the classifier made a mistake; the rule is mistake-driven)
• This is VERY important: it leads later to the generalization to Support Vector Machines

Comparison

• Version space
  – the region of (w_1, w_2) weight space containing all feasible solutions
• Perceptron learning
  – greedy gradient descent that often ends up at the boundary of the version space, with little margin for error
• SVM learning
  – the center of the largest sphere embedded in the version space (maximum margin)
• Bayes point machine
  – the centroid of the version space

[Figure: the version space in the (w_1, w_2) plane, with the perceptron, SVM, and Bayes-point solutions marked]

Perceptron Usage Rules

• After the weights have been determined:

        y = w · x = ( Σ_i α_i y_i x_i ) · x = Σ_i α_i y_i (x_i · x)

• Classification involves inner products between training samples and test samples
• This is again VERY important: it leads later to the generalization to kernel methods

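A minimal sketch of classification in this dual form (assuming NumPy; the coefficient vector alpha is an illustrative stand-in for the per-sample coefficients accumulated during training):

import numpy as np

def classify_dual(X_train, y_train, alpha, x_test):
    """y = sum_i alpha_i y_i (x_i . x_test): only inner products are needed.

    X_train : (P, N) training samples x_i
    y_train : (P,)   labels y_i in {-1, +1}
    alpha   : (P,)   nonnegative per-sample coefficients
    """
    score = sum(a * y_i * (x_i @ x_test)
                for a, y_i, x_i in zip(alpha, y_train, X_train))
    return np.sign(score)

Because the test point enters only through the inner products x_i · x, the dot product can later be replaced by a kernel function, which is the generalization to kernel methods mentioned above.
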
Hebb’s Learning Rule

• Synapse strength should be increased when both the pre- and post-synaptic neurons fire vigorously
• For binary outputs:

        w_ij^new = w_ij^old + Δw_ij

        Δw_ij^u = 2η y_i^u x_j^u   if y_i^u ≠ O_i^u
                = 0                 otherwise

                = η ( 1 − y_i^u O_i^u ) y_i^u x_j^u
                = η ( y_i^u − (y_i^u)² O_i^u ) x_j^u
                = η ( y_i^u − O_i^u ) x_j^u        (the δ term)

• Case 1: O = −1, y = 1        • Case 2: O = 1, y = −1

[Figure: in each case the weight vector moves from w_old to w_new by the correction ±2η x]

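As a small worked instance of the update formula (values chosen purely for illustration): take η = 0.5, x^u = (1, 0), y^u = +1 and O^u = −1 as in Case 1. Since y^u ≠ O^u, Δw = 2η y^u x^u = 2 · 0.5 · 1 · (1, 0) = (1, 0); the equivalent form η (y^u − O^u) x^u = 0.5 · (1 − (−1)) · (1, 0) gives the same (1, 0).
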
LMS (Widrow-Hoff, Delta)

• Not restricted to binary outputs
• Gradient search

        E(w) = ½ Σ_u Σ_i ( O_i^u − y_i^u )²  =  ½ Σ_u Σ_i ( O_i^u − g( Σ_{j=1}^{N} w_ij x_j^u ) )²

        ∂E(w)/∂w_ij = − Σ_u ( O_i^u − g(net_i^u) ) g'(net_i^u) x_j^u

        w_ij^new = w_ij^old + Δw_ij
                 = w_ij^old + η Σ_u ( O_i^u − g(net_i^u) ) g'(net_i^u) x_j^u

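A minimal batch-gradient sketch of the delta rule above, assuming NumPy and a differentiable activation g = tanh (so that g'(net) = 1 − g(net)²); the data shapes, learning rate, and epoch count are illustrative:

import numpy as np

def lms_train(X, O, eta=0.1, epochs=500):
    """Batch delta-rule (LMS) training for a single output unit.

    X : (P, N) input patterns, one per row
    O : (P,)   desired outputs (need not be binary)
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        net = X @ w                    # net^u for every pattern u
        y = np.tanh(net)               # y^u = g(net^u)
        gprime = 1.0 - y ** 2          # g'(net^u) for tanh
        # delta w_j = eta * sum_u (O^u - y^u) g'(net^u) x_j^u
        w += eta * X.T @ ((O - y) * gprime)
    return w
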
Nothing but Chain Rule

        ∂E(w)/∂w_ij = [ ∂E(w)/∂(O_i^u − y_i^u) ] · [ ∂(O_i^u − y_i^u)/∂y_i^u ] · [ ∂y_i^u/∂net_i^u ] · [ ∂net_i^u/∂w_ij ]

                    = − Σ_u ( O_i^u − g(net_i^u) ) g'(net_i^u) x_j^u

Example: one output unit, two inputs.

        O = g(net) = sgn(w_1 x_1 + w_2 x_2 + b)
        w_1 = 0.4299,  w_2 = −0.2793,  b = −0.1312
        (inputs: x_1 = x, x_2 = y)

[Figures: final training results; error vs. training epoch]

Example: one output unit, three inputs.

        y = g(net) = sgn(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)
        w_1 = 0.4232,  w_2 = −0.7411,  w_3 = −0.3196,  b = 0.7550
        (inputs: x_1 = x, x_2 = y, x_3 = z)

[Figures: final training results; error vs. training epoch]

Example: two output units (O_1, O_2), two inputs (x_1, x_2).

        y_1 = g( w_11 x_1 + w_12 x_2 + b_1 )
        y_2 = g( w_21 x_1 + w_22 x_2 + b_2 )

[Figures: final training results; error vs. training epoch]
