
CS3244 Machine Learning Semester 1, 2012/13


Solution to Tutorial 4

1. What are the values of weights $w_0$, $w_1$, and $w_2$ for the perceptron whose decision surface is illustrated in Figure 4.3? Assume the surface crosses the $x_1$ axis at $-1$, and the $x_2$ axis at 2.
Answer:
The line for the decision surface corresponds to the equation $x_2 = 2x_1 + 2$, and since all points above the line should be classified as positive, we have $x_2 - 2x_1 - 2 > 0$. Hence $w_0 = -2$, $w_1 = -2$, and $w_2 = 1$.
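
As a quick numeric check (a small Python sketch that is not part of the original solution; the test points are made up), these weights classify points above the line $x_2 = 2x_1 + 2$ as positive and points below it as negative:

# Recovered perceptron: w0 = -2, w1 = -2, w2 = 1.
# Points above the line x2 = 2*x1 + 2 should give a positive activation.
def perceptron(x1, x2, w0=-2.0, w1=-2.0, w2=1.0):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

print(perceptron(0.0, 3.0))    # above the line      -> 1
print(perceptron(0.0, 1.0))    # below the line      -> -1
print(perceptron(-1.0, 0.5))   # just above (-1, 0)  -> 1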

2. Consider two perceptrons defined by the threshold expression $w_0 + w_1 x_1 + w_2 x_2 > 0$.
Perceptron A has weight values

$w_0 = 1$, $w_1 = 2$, $w_2 = 1$

and Perceptron B has weight values

$w_0 = 0$, $w_1 = 2$, $w_2 = 1$

True or false? Perceptron A is more-general-than perceptron B. (More-general-than is defined in Chapter 2.)
Answer:
True. Perceptron A is more general than B: the two perceptrons share $w_1$ and $w_2$, and A's threshold sum exceeds B's by the constant $w_0 = 1$, so any instance classified positive by B is also classified positive by A. In the notation of the more-general-than definition in Chapter 2:

$(\forall x \in X)\,[(B(x) = 1) \rightarrow (A(x) = 1)]$

[Figure: the two parallel decision lines A and B in the $(x_1, x_2)$ plane; A's line lies below B's, so the positive (+) region above B's line is contained in the positive region above A's line.]

3. Derive a gradient descent training rule for a single unit with output $o$, where

$o = w_0 + w_1 x_1 + w_1 x_1^2 + \ldots + w_n x_n + w_n x_n^2$
Answer:
First, the error function is defined as:

$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$

The update rule is the same, namely:

$w_i := w_i + \Delta w_i$, where $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$

For $w_0$:

$\frac{\partial E}{\partial w_0} = \frac{\partial}{\partial w_0} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 = \frac{1}{2} \sum_{d \in D} 2 (t_d - o_d) \frac{\partial}{\partial w_0} (t_d - o_d) = \sum_{d \in D} (t_d - o_d)(-1)$

Thus

$\Delta w_0 = \eta \sum_{d \in D} (t_d - o_d)$

For $w_1, w_2, \ldots, w_n$:

$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 = \frac{1}{2} \sum_{d \in D} 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) = \sum_{d \in D} (t_d - o_d)\,(-(x_{id} + x_{id}^2))$

Thus

$\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)(x_{id} + x_{id}^2)$

4. Consider a two-layer feedforward ANN with two inputs $a$ and $b$, one hidden unit $c$, and one output unit $d$. This network has five weights ($w_{ca}$, $w_{cb}$, $w_{c0}$, $w_{dc}$, $w_{d0}$), where $w_{x0}$ represents the threshold weight for unit $x$. Initialize these weights to the values (0.1, 0.1, 0.1, 0.1, 0.1), then give their values after each of the first two training iterations of the BACKPROPAGATION algorithm. Assume learning rate $\eta = 0.3$, momentum $\alpha = 0.9$, incremental weight updates, and the following training examples:

a b d
1 0 1
0 1 0
Answer:
The network and the sigmoid activation function are as follows:

[Figure: inputs a and b feed hidden unit c through weights $w_{ca}$ and $w_{cb}$; unit c feeds output unit d through $w_{dc}$; $w_{c0}$ and $w_{d0}$ are the threshold weights of units c and d.]

$\sigma(y) = \frac{1}{1 + e^{-y}}$

Training example 1:
The outputs of the two neurons, noting that a = 1 and b = 0:

$o_c = \sigma(0.1 \times 1 + 0.1 \times 0 + 0.1 \times 1) = \sigma(0.2) = 0.5498$
$o_d = \sigma(0.1 \times 0.5498 + 0.1 \times 1) = \sigma(0.15498) = 0.53867$

The error terms for the two neurons, noting that d = 1:

$\delta_d = o_d (1 - o_d)(t - o_d) = 0.53867 (1 - 0.53867)(1 - 0.53867) = 0.1146$
$\delta_c = o_c (1 - o_c)\, w_{dc}\, \delta_d = 0.5498 (1 - 0.5498) \times 0.1 \times 0.1146 = 0.002836$

Compute the correction terms as follows, noting that a = 1, b = 0 and $\eta = 0.3$:

$\Delta w_{d0} = 0.3 \times 0.1146 \times 1 = 0.0342$
$\Delta w_{dc} = 0.3 \times 0.1146 \times 0.5498 = 0.0189$
$\Delta w_{c0} = 0.3 \times 0.002836 \times 1 = 0.000849$
$\Delta w_{ca} = 0.3 \times 0.002836 \times 1 = 0.000849$
$\Delta w_{cb} = 0.3 \times 0.002836 \times 0 = 0$

and the new weights become:

$w_{d0} = 0.1 + 0.0342 = 0.1342$
$w_{dc} = 0.1 + 0.0189 = 0.1189$
$w_{c0} = 0.1 + 0.000849 = 0.100849$
$w_{ca} = 0.1 + 0.000849 = 0.100849$
$w_{cb} = 0.1 + 0 = 0.1$

Training example 2:
The outputs of the two neurons, noting that a = 0 and b = 1:

$o_c = \sigma(0.100849 \times 0 + 0.1 \times 1 + 0.100849 \times 1) = \sigma(0.200849) = 0.55$
$o_d = \sigma(0.1189 \times 0.55 + 0.1342 \times 1) = \sigma(0.1996) = 0.5497$

The error terms for the two neurons, noting that d = 0:

$\delta_d = 0.5497 (1 - 0.5497)(0 - 0.5497) = -0.1361$
$\delta_c = 0.55 (1 - 0.55) \times 0.1189 \times (-0.1361) = -0.004$

Compute the correction terms as follows, noting that a = 0, b = 1, $\eta = 0.3$ and $\alpha = 0.9$:

$\Delta w_{d0} = 0.3 \times (-0.1361) \times 1 + 0.9 \times 0.0342 = -0.01$
$\Delta w_{dc} = 0.3 \times (-0.1361) \times 0.55 + 0.9 \times 0.0189 = -0.0055$
$\Delta w_{c0} = 0.3 \times (-0.004) \times 1 + 0.9 \times 0.000849 = -0.0004$
$\Delta w_{ca} = 0.3 \times (-0.004) \times 0 + 0.9 \times 0.000849 = 0.00076$
$\Delta w_{cb} = 0.3 \times (-0.004) \times 1 + 0.9 \times 0 = -0.0012$

and the new weights become:

$w_{d0} = 0.1342 - 0.01 = 0.1242$
$w_{dc} = 0.1189 - 0.0055 = 0.1134$
$w_{c0} = 0.100849 - 0.0004 = 0.100449$
$w_{ca} = 0.100849 + 0.00076 = 0.1016$
$w_{cb} = 0.1 - 0.0012 = 0.0988$
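
The two iterations above can be reproduced with a short script (a Python sketch of incremental backpropagation with momentum for this particular 2-1-1 network; the variable names are mine, not from Table 4.2). The printed weights should match the hand-computed values up to small rounding differences:

import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

# Weights w_ca, w_cb, w_c0, w_dc, w_d0, all initialised to 0.1.
w = {"ca": 0.1, "cb": 0.1, "c0": 0.1, "dc": 0.1, "d0": 0.1}
prev = {k: 0.0 for k in w}               # previous updates, for the momentum term
eta, alpha = 0.3, 0.9

for a, b, t in [(1, 0, 1), (0, 1, 0)]:   # the two training examples (a, b, target)
    # Forward pass.
    o_c = sigmoid(w["ca"] * a + w["cb"] * b + w["c0"])
    o_d = sigmoid(w["dc"] * o_c + w["d0"])
    # Error terms for sigmoid units.
    delta_d = o_d * (1 - o_d) * (t - o_d)
    delta_c = o_c * (1 - o_c) * w["dc"] * delta_d
    # Incremental updates with momentum: dw = eta*delta*input + alpha*previous dw.
    grads = {"d0": delta_d, "dc": delta_d * o_c,
             "c0": delta_c, "ca": delta_c * a, "cb": delta_c * b}
    for k, g in grads.items():
        dw = eta * g + alpha * prev[k]
        w[k] += dw
        prev[k] = dw
    print({k: round(v, 4) for k, v in w.items()})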



5. Revise the BACKPROPAGATION algorithm in Table 4.2 so that it operates on units using the squashing function tanh in place of the sigmoid function. That is, assume the output of a single unit is $o = \tanh(\vec{w} \cdot \vec{x})$. Give the weight update rule for output layer weights and hidden layer weights. Hint: $\tanh'(x) = 1 - \tanh^2(x)$.
Answer:
Steps T4.3 and T4.4 in Table 4.2 will become as follows, respectively:

$\delta_k \leftarrow (1 - o_k^2)(t_k - o_k)$

$\delta_h \leftarrow (1 - o_h^2) \sum_{k \in outputs} w_{kh}\, \delta_k$

The weight update step itself, $w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, x_{ji}$, is unchanged.
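
In code, the only change from the sigmoid case is the derivative factor: $o(1 - o)$ becomes $(1 - o^2)$. A minimal sketch of the two revised error terms (the function names are mine, not from Table 4.2):

def output_delta_tanh(o_k, t_k):
    # Revised step T4.3: tanh'(net) expressed through the output is (1 - o_k^2).
    return (1.0 - o_k**2) * (t_k - o_k)

def hidden_delta_tanh(o_h, w_kh, deltas_k):
    # Revised step T4.4: w_kh are the weights from hidden unit h to each
    # downstream output k, deltas_k the corresponding error terms.
    return (1.0 - o_h**2) * sum(w * d for w, d in zip(w_kh, deltas_k))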



6. Consider the alternative error function described in Section 4.8.1:

$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 + \gamma \sum_{i,j} w_{ji}^2$

Derive the gradient descent update rule for this definition of E. Show that it can be implemented by multiplying each weight by some constant before performing the standard gradient descent update given in Table 4.2.
Answer:

$w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = -\eta \frac{\partial E(\vec{w})}{\partial w_{ji}}$

$\frac{\partial E(\vec{w})}{\partial w_{ji}} = \frac{\partial}{\partial w_{ji}} \left[ \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 \right] + \frac{\partial}{\partial w_{ji}} \left[ \gamma \sum_{i,j} w_{ji}^2 \right]$

The first term on the R.H.S. of the above equation can be derived in the same manner as in equation (4.27), while the second term contributes $2\gamma w_{ji}$. For output nodes, this leads to:

$\frac{\partial E(\vec{w})}{\partial w_{ji}} = -(t_j - o_j)\, o_j (1 - o_j)\, x_{ji} + 2\gamma w_{ji}$

$w_{ji} \leftarrow w_{ji} + \eta (t_j - o_j)\, o_j (1 - o_j)\, x_{ji} - 2\eta\gamma w_{ji} = \beta w_{ji} + \eta\, \delta_j\, x_{ji}$

where $\beta = 1 - 2\eta\gamma$ and $\delta_j = (t_j - o_j)\, o_j (1 - o_j)$.

Similarly, for hidden units, we can derive:

$w_{ji} \leftarrow \beta w_{ji} + \eta\, \delta_j\, x_{ji}$

where $\beta = 1 - 2\eta\gamma$ and $\delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}$.

The above shows the update rule can be implemented by multiplying each weight by the constant $\beta = 1 - 2\eta\gamma$ before performing the standard gradient descent update given in Table 4.2.
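
This is the familiar weight-decay update. A minimal sketch of the per-example output-layer step under the assumptions above (sigmoid output unit; the function name and the default $\eta$, $\gamma$ values are made up):

def weight_decay_update(w_ji, x_ji, o_j, t_j, eta=0.1, gamma=0.01):
    # Gradient descent step for the regularised error function of Question 6:
    # shrink the weight by beta = 1 - 2*eta*gamma, then apply the standard
    # Table 4.2 update eta * delta_j * x_ji.
    delta_j = (t_j - o_j) * o_j * (1 - o_j)
    beta = 1 - 2 * eta * gamma
    return beta * w_ji + eta * delta_j * x_ji
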
7. Assume the following error function:

$E(w) = \frac{1}{2} \alpha w^2 - 2\beta w + \gamma$

where $\alpha$, $\beta$ and $\gamma$ are constants. The weight $w$ is updated according to gradient descent with a positive learning rate $\eta$. Write down the update equation for $w(k+1)$ given $w(k)$. Find the optimum weight $w$ that gives the minimal error $E(w)$. What is the value of the minimal $E(w)$? (8 marks)

Answer:

$\frac{\partial E}{\partial w} = \alpha w - 2\beta$

$\Delta w = -\eta \frac{\partial E}{\partial w} = -\eta(\alpha w - 2\beta)$

$w(k+1) = w(k) + \Delta w = w(k) - \eta(\alpha\, w(k) - 2\beta) = (1 - \eta\alpha)\, w(k) + 2\eta\beta$

When $E(w)$ becomes the smallest, $\frac{\partial E}{\partial w} = 0$.

Thus, $w_{optimal} = \frac{2\beta}{\alpha}$

Minimal error:

$E(w_{optimal}) = \frac{1}{2}\alpha \left(\frac{2\beta}{\alpha}\right)^2 - 2\beta \cdot \frac{2\beta}{\alpha} + \gamma = \frac{2\beta^2}{\alpha} - \frac{4\beta^2}{\alpha} + \gamma = \gamma - \frac{2\beta^2}{\alpha}$
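
A small numeric check of the update equation and the optimum, using the error function as written above (made-up constants $\alpha = 2$, $\beta = 0.5$, $\gamma = 1$, and a learning rate chosen so that $|1 - \eta\alpha| < 1$):

def E(w, a, b, c):
    # E(w) = (1/2)*a*w**2 - 2*b*w + c
    return 0.5 * a * w**2 - 2 * b * w + c

def gradient_step(w, a, b, eta):
    # w(k+1) = w(k) - eta * dE/dw = (1 - eta*a)*w(k) + 2*eta*b
    return w - eta * (a * w - 2 * b)

a, b, c, eta = 2.0, 0.5, 1.0, 0.1
w = 0.0
for _ in range(200):
    w = gradient_step(w, a, b, eta)

print(w, 2 * b / a)                       # converges to w_optimal = 2*beta/alpha = 0.5
print(E(w, a, b, c), c - 2 * b**2 / a)    # minimal error gamma - 2*beta^2/alpha = 0.75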

8. WEKA outputs the following confusion matrix after training a J48 decision tree classifier with the contact-lenses dataset. (a) Count the number of True Positives, True Negatives, False Positives and False Negatives for each of the three classes, i.e. soft, hard and none. (b) Calculate the TP rate (Recall), FP rate, Precision and F-measure for each class.

  a  b  c   <-- classified as
  4  0  1 |  a = soft
  0  1  3 |  b = hard
  1  2 12 |  c = none
Answer:

soft:
(a) TP = 4
    TN = 18
    FP = 1
    FN = 1

(b) TP rate = Recall = TP / (TP + FN) = 4/5 = 0.8
    FP rate = FP / (FP + TN) = 1/19 = 0.053
    Precision = TP / (TP + FP) = 4/5 = 0.8
    F-Measure = 2 × 0.8 × 0.8 / (0.8 + 0.8) = 0.8

hard:
(a) TP = 1
    TN = 18
    FP = 2
    FN = 3

(b) TP rate = Recall = TP / (TP + FN) = 1/4 = 0.25
    FP rate = FP / (FP + TN) = 2/20 = 0.1
    Precision = TP / (TP + FP) = 1/3 = 0.333
    F-Measure = 2 × 0.25 × 0.333 / (0.25 + 0.333) = 0.286

none:
(a) TP = 12
    TN = 5
    FP = 4
    FN = 3

(b) TP rate = Recall = TP / (TP + FN) = 12/15 = 0.8
    FP rate = FP / (FP + TN) = 4/9 = 0.444
    Precision = TP / (TP + FP) = 12/16 = 0.75
    F-Measure = 2 × 0.8 × 0.75 / (0.8 + 0.75) = 0.774
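
The per-class counts and metrics can also be computed directly from the 3×3 matrix (a small Python sketch, not WEKA output; the function name is mine):

def per_class_metrics(cm, labels):
    # cm[i][j] = number of instances of class i classified as class j.
    total = sum(sum(row) for row in cm)
    results = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                    # actual i, predicted something else
        fp = sum(row[i] for row in cm) - tp     # predicted i, actually something else
        tn = total - tp - fn - fp
        recall = tp / (tp + fn)
        fp_rate = fp / (fp + tn)
        precision = tp / (tp + fp)
        f_measure = 2 * precision * recall / (precision + recall)
        results[label] = {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
                          "Recall": round(recall, 3), "FP rate": round(fp_rate, 3),
                          "Precision": round(precision, 3), "F": round(f_measure, 3)}
    return results

cm = [[4, 0, 1], [0, 1, 3], [1, 2, 12]]         # rows/columns: soft, hard, none
print(per_class_metrics(cm, ["soft", "hard", "none"]))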
