
Exam of Machine Learning (Version 0)

2 February 2021. Duration: 2:30

This exam has i) multiple-choice questions and ii) open questions. Multiple-choice questions should
be answered in a text file (question number followed by the letter of the selected option). Open questions
should also be answered in the text file if no equations are involved; otherwise, they should be answered
on separate sheets of paper, photographed, and sent via Fenix together with the text file.
Portuguese-speaking students should answer the questions in Portuguese.

x1 x2 y

1 −1 −5
−1 −3 5
2 −1 7

Problem 1 (2 points)
Consider the table above. We wish to predict the variable y ∈ R knowing the features x1 , x2 , using a
linear regression model ŷ = β0 + β1 x1 + β2 x2 . Please assume that β0 is known and is equal to 1.
Find the coefficients β = (β1 , β2 ) that minimize the sum of squared errors (SSE) criterion.

a) (−12/11, 1/3) b) (1, −2) c) (1/3, −12/11) d) (1, −1) e) none of the others
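With β0 fixed, minimizing the SSE over (β1, β2) is ordinary least squares on the shifted target y − β0. A minimal numpy sketch for checking a candidate answer:

```python
import numpy as np

# Data from the table: rows are (x1, x2); targets are y.
X = np.array([[ 1.0, -1.0],
              [-1.0, -3.0],
              [ 2.0, -1.0]])
y = np.array([-5.0, 5.0, 7.0])

beta0 = 1.0  # fixed by the problem statement

# With beta0 known, the SSE minimizer solves X^T X beta = X^T (y - beta0).
beta, *_ = np.linalg.lstsq(X, y - beta0, rcond=None)
print(beta)  # -> [ 0.3333..., -1.0909...], i.e. (1/3, -12/11)
```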

x1 x2 y x1 x2 y

1 2 0 −2 1 2
2 2 0 −3 1 2
3 1 1 −3 0 3
3 3 1 −1 2 3

Problem 2 (1 point)
Consider the training set defined in the tables above, where (x1, x2) ∈ R² denotes a feature vector
and y ∈ {0, 1, 2, 3} the class label.
Find the class predicted by the Nearest Neighbor (NN) classifier for the input vector (−1/2, 1):

a) 2 b) 0 c) 1 d) 3 e) none of the others
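The prediction can be checked by brute force: compute the distance from the query point to all eight training points and return the label of the nearest one. A short numpy sketch:

```python
import numpy as np

# The eight training points from the two tables, with their class labels.
X = np.array([[ 1, 2], [ 2, 2], [ 3, 1], [ 3, 3],
              [-2, 1], [-3, 1], [-3, 0], [-1, 2]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])

q = np.array([-0.5, 1.0])
d2 = ((X - q) ** 2).sum(axis=1)  # squared Euclidean distances to the query
print(y[d2.argmin()])            # the nearest point is (-1, 2), class 3
```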


Problem 3 (1 point)
A supervised model M is said to overfit the data if the model performs

a) much better in the training set than in the test set b) very well in the training set
c) much better in the test set than in the training set d) very well in the test set
e) none of the others

Problem 4 (1 point)
Regularization methods are used to

a) estimate hyperparameters b) avoid overfitting


c) make the model more flexible d) increase the number of parameters
e) none of the others

Problem 5 (1 point)
Ridge regression tends to

a) increase the regression coefficients b) decrease the regression coefficients


c) make some coefficients equal to zero d) make the model more compact
e) none of the others

Problem 6 (2 points)
Consider a random variable x ∈ R₀⁺ with conditional probability density functions

p(x | y = 0) = 1 for 0 ≤ x < 1, and 0 otherwise;
p(x | y = 1) = αe^(−αx) for x ≥ 0, and 0 for x < 0,

where y ∈ {0, 1} is a binary class label. Consider a classifier with decision regions R1 = [0, T[ and
R0 = [T, +∞[, where T ∈ [0, 1] is a threshold.
Compute the element P11 of the confusion matrix P = (Pij ), i, j = 0, 1.

a) e^(−αT) b) T c) 1 − e^(−αT) d) 1 − T e) none of the others
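Reading Pij as the probability of deciding class j when the true class is i (conventions vary between texts, so this is one reasonable reading), P11 is the probability mass of p(x|y = 1) inside the decision region R1:

$$P_{11} = \int_{R_1} p(x \mid y = 1)\,dx = \int_0^T \alpha e^{-\alpha x}\,dx = 1 - e^{-\alpha T}.$$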


x1 x2 y

0.8 −1.2 0.5

Problem 7 (2+2 points)


Consider the multi-layer perceptron defined above and the training example specified in the table.
Assume that all the weights are initialized with value 0.1 (including biases) and that the units of the
hidden layer have ReLU activation functions. The output of the network is obtained as a weighted sum,
without a nonlinearity.
1. The network output for the training example is

a) 0.018 b) 0.139 c) −0.009 d) 0 e) none of the others

2. Consider the squared loss L(y, ŷ) = (y − ŷ)2 . Compute the partial derivative of the loss L(y, ŷ)
with respect to the weight w11 for the training example

a) 0.077 b) 0 c) 0.134 d) −0.077 e) none of the others
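The architecture figure is not reproduced in this transcription, so the sketch below only illustrates the mechanics under an assumed 2-2-1 network (two inputs, two ReLU hidden units, one linear output), with w11 assumed to be the weight from input 1 to hidden unit 1; the printed numbers will match the exam's options only for the actual architecture in the figure.

```python
import numpy as np

# Assumed 2-2-1 architecture (the exam's figure is not available here);
# every weight and bias is initialized to 0.1, as the problem states.
x = np.array([0.8, -1.2]); y = 0.5
W1 = np.full((2, 2), 0.1); b1 = np.full(2, 0.1)  # hidden layer (ReLU)
w2 = np.full(2, 0.1);      b2 = 0.1              # linear output unit

z = W1 @ x + b1          # hidden pre-activations
h = np.maximum(z, 0.0)   # ReLU
y_hat = w2 @ h + b2      # network output (no output nonlinearity)

# Chain rule for L = (y - y_hat)^2 through the ReLU, w.r.t. the assumed w11:
dL_dw11 = -2.0 * (y - y_hat) * w2[0] * float(z[0] > 0.0) * x[0]
print(y_hat, dL_dw11)
```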

x1 x2 y

0 0 −1

0 2 −1
1 2 +1
−√2 −√2 +1

Problem 8 (2+1+1 points) Consider the training set defined above. We wish to train a non-linear
support vector machine (SVM) in feature space, using the nonlinear transformation

(x̃1, x̃2) = φ(x1, x2) = (x1², x1² + x2²).




1. Find the decision hyperplane in feature space using the training set (defined in input space):

a) w = (1/2, −1/2), b = 2 b) w = (1, 1), b = 2 c) w = (1, −1), b = 2
d) w = (1/2, 1/2), b = −2 e) none of the others

2. Geometrically, the decision boundary in the input space is:

a) hyperbola b) parabola c) ellipse d) hyperplane e) none of the others


3. What is the kernel function K(x, y) corresponding to this feature map?

a) 2x1²y1² + x2²y2² + x1²y2² + x2²y1² b) 2x1²y1² + x2²y2²
c) x1²y1² + x2²y2² − x1²y2² − x2²y1² d) x1²y1² + x2²y2² e) none of the others
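A sketch for checking parts 1 and 3 (assuming scikit-learn is available): map the four points through φ, fit a linear SVM with a large C to approximate the hard-margin solution, and expand the induced kernel K(x, y) = φ(x) · φ(y) by direct substitution.

```python
import numpy as np
from sklearn.svm import SVC

# Training set from the table (input space) and its labels.
X = np.array([[0.0, 0.0], [0.0, 2.0], [1.0, 2.0],
              [-np.sqrt(2), -np.sqrt(2)]])
y = np.array([-1, -1, 1, 1])

# Feature map phi(x1, x2) = (x1^2, x1^2 + x2^2).
Z = np.column_stack([X[:, 0]**2, X[:, 0]**2 + X[:, 1]**2])

clf = SVC(kernel="linear", C=1e6).fit(Z, y)  # large C ~ hard margin
print(clf.coef_, clf.intercept_)             # canonical (w, b) in feature space

# Induced kernel, by substitution:
#   K(x, y) = x1^2 y1^2 + (x1^2 + x2^2)(y1^2 + y2^2)
#           = 2 x1^2 y1^2 + x2^2 y2^2 + x1^2 y2^2 + x2^2 y1^2
```

For part 2, substituting φ back into w1 x̃1 + w2 x̃2 + b = 0 gives (w1 + w2) x1² + w2 x2² + b = 0 in the input space, and the conic type follows from the signs of the coefficients.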

Problem 9 (2 points) - Open question


Suppose you want to train a non-linear SVM using the RBF kernel. Explain how you would choose
the parameters C and σ².
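One standard answer, sketched below with scikit-learn: choose both hyperparameters by k-fold cross-validation over a logarithmic grid; note that sklearn parameterizes the RBF kernel by gamma, which corresponds to 1/(2σ²). X_train and y_train are placeholder names for your labelled data.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Logarithmic grids for both hyperparameters; gamma = 1 / (2 * sigma^2).
param_grid = {
    "C":     [0.01, 0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1, 10],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train: your labelled data
# print(search.best_params_)     # best (C, gamma) found by cross-validation
```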

Problem 10 (2 points) - Open question


Explain the differences in creating and training an MLP for regression and for classification.
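For concreteness, a minimal PyTorch sketch of the usual contrast (layer sizes are arbitrary): a regression MLP ends in a single linear unit trained with squared error, while a K-class classifier ends in K logits trained with cross-entropy, with the softmax folded into the loss.

```python
import torch.nn as nn

# Regression: one linear output unit, squared-error loss.
regressor = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
reg_loss  = nn.MSELoss()

# Classification with K classes: K output logits, cross-entropy loss
# (nn.CrossEntropyLoss applies log-softmax to the logits internally).
K = 4
classifier = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, K))
cls_loss   = nn.CrossEntropyLoss()
```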

Exam of Machine Learning (Version 1)

2 February 2021. Duration: 2:30

This exam has i) multiple-choice questions and ii) open questions. Multiple-choice questions should
be answered in a text file (question number followed by the letter of the selected option). Open questions
should also be answered in the text file if no equations are involved; otherwise, they should be answered
on separate sheets of paper, photographed, and sent via Fenix together with the text file.
Portuguese-speaking students should answer the questions in Portuguese.

x1 x2 y

1 −1 −5
−1 −3 5
2 −1 7

Problem 1 (2 points)
Consider the table above. We wish to predict the variable y ∈ R knowing the features x1 , x2 , using a
linear regression model ŷ = β0 + β1 x1 + β2 x2 . Please assume that β0 is known and is equal to −1.
Find the coefficients β = (β1 , β2 ) that minimize the sum of squared errors (SSE) criterion.

a) (1/3, −12/11) b) (1, −2) c) (−12/11, 1/3) d) (1, −1) e) none of the others

x1 x2 y x1 x2 y

1 2 0 −2 1 2
2 2 0 −3 1 2
3 1 1 −3 0 3
3 3 1 −1 2 3

Problem 2 (1 point)
Consider the training set defined in the tables above, where (x1, x2) ∈ R² denotes a feature vector
and y ∈ {0, 1, 2, 3} the class label.
Find the class predicted by the Nearest Neighbor (NN) classifier for the input vector (−1/2, 1):

a) 3 b) 2 c) 1 d) 0 e) none of the others


Problem 3 (1 point)
A supervised model M is said to overfit the data if the model performs

a) much better in the test set than in the training set b) very well in the training set
c) much better in the training set than in the test set d) very well in the test set
e) none of the others

Problem 4 (1 point)
Regularization methods are used to

a) avoid overfitting b) estimate hyperparameters


c) make the model more flexible d) increase the number of parameters
e) none of the others

Problem 5 (1 point)
Ridge regression tends to

a) make the model more compact b) increase the regression coefficients


c) make some coefficients equal to zero d) decrease the regression coefficients
e) none of the others

Problem 6 (2 points)
Consider a random variable x ∈ R₀⁺ with conditional probability density functions

p(x | y = 0) = 1 for 0 ≤ x < 1, and 0 otherwise;
p(x | y = 1) = αe^(−αx) for x ≥ 0, and 0 for x < 0,

where y ∈ {0, 1} is a binary class label. Consider a classifier with decision regions R1 = [0, T[ and
R0 = [T, +∞[, where T ∈ [0, 1] is a threshold.
Compute the element P00 of the confusion matrix P = (Pij ), i, j = 0, 1.

a) 1 − e^(−αT) b) 1 − T c) e^(−αT) d) T e) none of the others


x1 x2 y

−1.2 0.8 0.5

Problem 7 (2+2 points)


Consider the multi-layer perceptron defined above and the training example specified in the table.
Assume that all the weights are initialized with value 0.1 (including biases) and that the units of the
hidden layer have ReLU activation functions. The output of the network is obtained as a weighted sum,
without a nonlinearity.
1. The network output for the training example is

a) −0.009 b) 0.139 c) 0.018 d) 0 e) none of the others

2. Consider the squared loss L(y, ŷ) = (y − ŷ)2 . Compute the partial derivative of the loss L(y, ŷ)
with respect to the weight w11 for the training example

a) 0 b) −0.077 c) 0.134 d) 0.077 e) none of the others

x1 x2 y

0 0 −1

0 2 −1
1 2 +1
−√2 −√2 +1

Problem 8 (2+1+1 points) Consider the training set defined above. We wish to train a non-linear
support vector machine (SVM) in feature space, using the nonlinear transformation

(x̃1, x̃2) = φ(x1, x2) = (x1², x1² + x2²).




1. Find the decision hyperplane in feature space using the training set (defined in input space):

a) w = (1, 1), b = 2 b) w = (1/2, 1/2), b = −2 c) w = (1, −1), b = 2
d) w = (1/2, −1/2), b = 2 e) none of the others

2. Geometrically, the decision boundary in the input space is:

a) hyperplane b) parabola c) ellipse d) hyperbola e) none of the others


3. What is the kernel function K(x, y) corresponding to this feature map?

a) x1²y1² + x2²y2² b) x1²y1² + x2²y2² − x1²y2² − x2²y1²
c) 2x1²y1² + x2²y2² d) 2x1²y1² + x2²y2² + x1²y2² + x2²y1² e) none of the others

Problem 9 (2 points) - Open question


Suppose you want to train a non-linear SVM using the RBF kernel. Explain how you would choose
the parameters C and σ².

Problem 10 (2 points) - Open question


Explain the differences in creating and training an MLP for regression and for classification.

Exam of Machine Learning (Version 2)

2 February 2021. Duration: 2:30

This exam has i) multiple-choice questions and ii) open questions. Multiple-choice questions should
be answered in a text file (question number followed by the letter of the selected option). Open questions
should also be answered in the text file if no equations are involved; otherwise, they should be answered
on separate sheets of paper, photographed, and sent via Fenix together with the text file.
Portuguese-speaking students should answer the questions in Portuguese.

x1 x2 y

1 −1 −5
−1 −3 5
2 −1 7

Problem 1 (2 points)
Consider the table above. We wish to predict the variable y ∈ R knowing the features x1 , x2 , using a
linear regression model ŷ = β0 + β1 x1 + β2 x2 . Please assume that β0 is known and is equal to 1.
Find the coefficients β = (β1 , β2 ) that minimize the sum of squared errors (SSE) criterion.

a) (−12/11, 1/3) b) (1, −2) c) (1, −1) d) (1/3, −12/11) e) none of the others

x1 x2 y x1 x2 y

1 2 0 −2 1 2
2 2 0 −3 1 2
3 1 1 −3 0 3
3 3 1 −1 2 3

Problem 2 (1 point)
Consider the training set defined in the tables above, where (x1, x2) ∈ R² denotes a feature vector
and y ∈ {0, 1, 2, 3} the class label.
Find the class predicted by the Nearest Neighbor (NN) classifier for the input vector (−1/2, 1):

a) 2 b) 0 c) 3 d) 1 e) none of the others


Problem 3 (1 point)
A supervised model M is said to overfit the data if the model performs

a) very well in the training set b) much better in the test set than in the training set
c) very well in the test set d) much better in the training set than in the test set
e) none of the others

Problem 4 (1 point)
Regularization methods are used to

a) estimate hyperparameters b) make the model more flexible


c) avoid overfitting d) increase the number of parameters
e) none of the others

Problem 5 (1 point)
Ridge regression tends to

a) decrease the regression coefficients b) increase the regression coefficients


c) make some coefficients equal to zero d) make the model more compact
e) none of the others

Problem 6 (2 points)
Consider a random variable x ∈ R₀⁺ with conditional probability density functions

p(x | y = 0) = 1 for 0 ≤ x < 1, and 0 otherwise;
p(x | y = 1) = αe^(−αx) for x ≥ 0, and 0 for x < 0,

where y ∈ {0, 1} is a binary class label. Consider a classifier with decision regions R1 = [0, T[ and
R0 = [T, +∞[, where T ∈ [0, 1] is a threshold.
Compute the element P11 of the confusion matrix P = (Pij ), i, j = 0, 1.

a) 1 − e^(−αT) b) T c) e^(−αT) d) 1 − T e) none of the others


x1 x2 y

0.8 −1.2 0.5

Problem 7 (2+2 points)


Consider the multi-layer perceptron defined above and the training example specified in the table.
Assume that all the weights are initialized with value 0.1 (including biases) and that the units of the
hidden layer have ReLU activation functions. The output of the network is obtained as a weighted sum,
without a nonlinearity.
1. The network output for the training example is

a) 0 b) 0.139 c) −0.009 d) 0.018 e) none of the others

2. Consider the squared loss L(y, ŷ) = (y − ŷ)2 . Compute the partial derivative of the loss L(y, ŷ)
with respect to the weight w11 for the training example

a) 0.077 b) −0.077 c) 0.134 d) 0 e) none of the others

x1 x2 y

0 0 −1

0 2 −1
1 2 +1
−√2 −√2 +1

Problem 8 (2+1+1 points) Consider the training set defined above. We wish to train a non-linear
support vector machine (SVM) in feature space, using the nonlinear transformation

(x̃1, x̃2) = φ(x1, x2) = (x1², x1² + x2²).




1. Find the decision hyperplane in feature space using the training set (defined in input space):

a) w = (1/2, 1/2), b = −2 b) w = (1, 1), b = 2 c) w = (1, −1), b = 2
d) w = (1/2, −1/2), b = 2 e) none of the others

2. Geometrically, the decision boundary in the input space is:

a) ellipse b) parabola c) hyperbola d) hyperplane e) none of the others


3. What is the kernel function K(x, y) corresponding to this feature map?

a) x1²y1² + x2²y2² − x1²y2² − x2²y1² b) 2x1²y1² + x2²y2²
c) 2x1²y1² + x2²y2² + x1²y2² + x2²y1² d) x1²y1² + x2²y2² e) none of the others

Problem 9 (2 points) - Open question


Suppose you want to train a non-linear SVM using the RBF kernel. Explain how you would choose
the parameters C and σ².

Problem 10 (2 points) - Open question


Explain the differences in creating and training an MLP for regression and for classification.

Exam of Machine Learning (Version 3)

2 February 2021. Duration: 2:30

This exam has i) multiple-choice questions and ii) open questions. Multiple-choice questions should
be answered in a text file (question number followed by the letter of the selected option). Open questions
should also be answered in the text file if no equations are involved; otherwise, they should be answered
on separate sheets of paper, photographed, and sent via Fenix together with the text file.
Portuguese-speaking students should answer the questions in Portuguese.

x1 x2 y

1 −1 −5
−1 −3 5
2 −1 7

Problem 1 (2 points)
Consider the table above. We wish to predict the variable y ∈ R knowing the features x1 , x2 , using a
linear regression model ŷ = β0 + β1 x1 + β2 x2 . Please assume that β0 is known and is equal to −1.
Find the coefficients β = (β1 , β2 ) that minimize the sum of squared errors (SSE) criterion.

a) (1, −2) b) (1/3, −12/11) c) (−12/11, 1/3) d) (1, −1) e) none of the others

x1 x2 y x1 x2 y

1 2 0 −2 1 2
2 2 0 −3 1 2
3 1 1 −3 0 3
3 3 1 −1 2 3

Problem 2 (1 point)
Consider the training set defined in the tables above, where (x1, x2) ∈ R² denotes a feature vector
and y ∈ {0, 1, 2, 3} the class label.
Find the class predicted by the Nearest Neighbor (NN) classifier for the input vector (−1/2, 1):

a) 2 b) 3 c) 1 d) 0 e) none of the others


Problem 3 (1 point)
A supervised model M is said to overfit the data if the model performs

a) very well in the test set b) much better in the training set than in the test set
c) very well in the training set d) much better in the test set than in the training set
e) none of the others

Problem 4 (1 point)
Regularization methods are used to

a) increase the number of parameters b) estimate hyperparameters


c) make the model more flexible d) avoid overfitting
e) none of the others

Problem 5 (1 point)
Ridge regression tends to

a) make the model more compact b) increase the regression coefficients


c) decrease the regression coefficients d) make some coefficients equal to zero
e) none of the others

Problem 6 (2 points)
Consider a random variable x ∈ R₀⁺ with conditional probability density functions

p(x | y = 0) = 1 for 0 ≤ x < 1, and 0 otherwise;
p(x | y = 1) = αe^(−αx) for x ≥ 0, and 0 for x < 0,

where y ∈ {0, 1} is a binary class label. Consider a classifier with decision regions R1 = [0, T[ and
R0 = [T, +∞[, where T ∈ [0, 1] is a threshold.
Compute the element P00 of the confusion matrix P = (Pij ), i, j = 0, 1.

a) 1 − e^(−αT) b) T c) e^(−αT) d) 1 − T e) none of the others


x1 x2 y

−1.2 0.8 0.5

Problem 7 (2+2 points)


Consider the multi-layer perceptron defined above and the training example specified in the table.
Assume that all the weights are initialized with value 0.1 (including biases) and that the units of the
hidden layer have ReLU activation functions. The output of the network is obtained as a weighted sum,
without a nonlinearity.
1. The network output for the training example is

a) −0.009 b) 0.018 c) 0.139 d) 0 e) none of the others

2. Consider the squared loss L(y, ŷ) = (y − ŷ)2 . Compute the partial derivative of the loss L(y, ŷ)
with respect to the weight w11 for the training example

a) 0.134 b) −0.077 c) 0 d) 0.077 e) none of the others

x1 x2 y

0 0 −1

0 2 −1
1 2 +1
−√2 −√2 +1

Problem 8 (2+1+1 points) Consider the training set defined above. We wish to train a non-linear
support vector machine (SVM) in feature space, using the nonlinear transformation

(x̃1, x̃2) = φ(x1, x2) = (x1², x1² + x2²).




1. Find the decision hyperplane in feature space using the training set (defined in input space):

a) w = (1, 1), b = 2 b) w = (1, −1), b = 2 c) w = (1/2, 1/2), b = −2
d) w = (1/2, −1/2), b = 2 e) none of the others

2. Geometrically, the decision boundary in the input space is:

a) hyperplane b) parabola c) hyperbola d) ellipse e) none of the others


3. What is the kernel function K(x, y) corresponding to this feature map?

a) x1²y1² + x2²y2² b) 2x1²y1² + x2²y2² + x1²y2² + x2²y1²
c) 2x1²y1² + x2²y2² d) x1²y1² + x2²y2² − x1²y2² − x2²y1² e) none of the others

Problem 9 (2 points) - Open question


Suppose you want to train a non-linear SVM using the RBF kernel. Explain how you would choose
the parameters C and σ².

Problem 10 (2 points) - Open question


Explain the differences in creating and training an MLP for regression and for classification.
