Assignment 4 (Sol.)
Introduction to Machine Learning
Prof. B. Ravindran
1. Which of the following are convex functions?
(a) f(x) = (∑_{i=1}^{n} x_i^p)^{1/p}, where x ∈ R^n and p ≥ 0
(b) f(x) = log(∑_{i=1}^{n} exp(x_i)), where x ∈ R^n
(c) f(x) = ∑_{i=1}^{n} sin(x_i), where x ∈ R^n
(d) f(x) = ∑_{i=1}^{n} x_i log x_i, where x ∈ R^n and x_i > 0
Solution - (b), (d)
a - This holds only for p ≥ 1 (the p-norm); for 0 ≤ p < 1 the function is not convex.
b - The log-sum-exp function can be shown to be convex by proving that its Hessian is PSD.
c - sin is neither convex nor concave on R, so a sum of sines is not convex.
d - Convexity again follows from the Hessian: the domain restricts each x_i to be positive, so the Hessian diag(1/x_1, . . . , 1/x_n) is clearly PSD.
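As an optional sanity check (not part of the original solution), the two Hessians can be evaluated numerically; the sketch below assumes NumPy and uses the standard closed forms diag(z) − zzᵀ for log-sum-exp (z the softmax of x) and diag(1/x_i) for ∑ x_i log x_i.

import numpy as np

# Illustrative check only: evaluate both Hessians at a random point and
# confirm their smallest eigenvalue is non-negative (i.e. they are PSD).
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 5.0, size=5)         # x_i > 0, as required for option (d)

z = np.exp(x - x.max())
z /= z.sum()                              # softmax of x
H_lse = np.diag(z) - np.outer(z, z)       # Hessian of log-sum-exp
H_xlogx = np.diag(1.0 / x)                # Hessian of sum_i x_i log x_i

for name, H in (("log-sum-exp", H_lse), ("x log x", H_xlogx)):
    print(name, "min eigenvalue:", np.linalg.eigvalsh(H).min())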
2. We discussed two approaches to classification, one that learns the discriminant functions, and
the other that is based on modeling hyperplanes. Which of these approaches is more suitable
for multi-class problems and why?
(a) Discriminant functions; because they allow for a probabilistic interpretation of the predictions.
(b) Hyperplane methods; because they allow for a probabilistic interpretation of the predictions.
(c) Discriminant functions; because an appropriate set of functions will allow us to efficiently disambiguate class predictions.
(d) Hyperplane methods; because we can use basis expansion to transform the input to a space where class boundaries are linear.
Solution - c
3. Consider the following optimization problem:
min x² + 1
s.t. (x − 2)(x − 4) ≤ 0
Solution - a, c
The Lagrangian is L(x, λ) = x² + 1 + λ(x − 2)(x − 4) = (1 + λ)x² − 6λx + 1 + 8λ. Minimizing over x gives the dual function
g(λ) = −9λ²/(1 + λ) + 1 + 8λ if λ > −1; g(λ) = −∞ otherwise.
Note that for λ ≤ −1 the Lagrangian is unbounded below in x (the quadratic either opens downward or degenerates to a line), so the dual takes the value −∞ there.
We now maximize g; the maximum occurs at λ = 2, giving x̂ = 2 and a dual optimum of 5, which equals the primal optimum. The given optimization problem is convex and strictly feasible points exist in the relative interior of the domain (e.g. x = 3), so Slater's condition is satisfied and strong duality holds.
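A small symbolic check of this derivation (illustrative only, assuming SymPy is available):

import sympy as sp

# Dual of: min x^2 + 1  s.t. (x - 2)(x - 4) <= 0
x, lam = sp.symbols('x lam', real=True)
L = x**2 + 1 + lam * (x - 2) * (x - 4)          # Lagrangian

x_star = sp.solve(sp.diff(L, x), x)[0]          # minimizer over x (valid for lam > -1)
g = sp.simplify(L.subs(x, x_star))              # dual function g(lam)
lam_star = [s for s in sp.solve(sp.diff(g, lam), lam) if s > -1][0]

print(g)                                        # -9*lam**2/(lam + 1) + 8*lam + 1, up to rearrangement
print(lam_star, g.subs(lam, lam_star))          # lam = 2, dual optimum = 5 = primal optimum at x = 2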
4. Which of the following is/are true about the SVM classifier?
(a) For two dimensional data points, the separating hyperplane learnt by a linear SVM will
be a straight line.
(b) In theory, a Gaussian kernel SVM can model any complex separating hyperplane.
(c) For every kernel function used in an SVM, one can obtain an equivalent closed-form basis expansion.
(d) Overfitting in an SVM is a function of the number of support vectors.
Solution - a, b, d
b - The Gaussian kernel can be written as a Taylor expansion and viewed as a basis expansion of infinite dimension, hence in theory giving it the ability to model any separating hyperplane.
d - The larger the number of support vectors, the higher the chance that the classifier has overfit.
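For intuition behind (b) and (c), consider the one-dimensional Gaussian kernel with unit bandwidth (a simplified setting, not the general statement):
exp(−(x − y)²) = exp(−x²) exp(−y²) ∑_{k=0}^{∞} (2^k / k!) x^k y^k = ⟨φ(x), φ(y)⟩, with φ_k(x) = exp(−x²) √(2^k / k!) x^k.
The induced feature map φ has infinitely many components, so the kernel corresponds to a basis expansion of infinite dimension and admits no finite closed-form expansion.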
Figure 1: Q6
6. Consider a two-class problem whose training points are distributed as shown in the figure. One possible separating hyperplane is also shown.
(a) A classifier can be learnt using the perceptron training algorithm.
(b) A linear SVM will not work well.
(c) A linear SVM is sufficient for this data.
(d) A non-zero C value is essential for this data.
Solution - a, c
The perceptron algorithm can learn a linear classifier for linearly separable data, and a linear SVM will work for the same reason. Option (d) is false because C represents the cost of misclassifying a point, and here, irrespective of the cost, a solution to the optimization problem can be found.
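To illustrate the point: the figure's data is not available, so the sketch below (assuming scikit-learn) uses synthetic, linearly separable blobs as a stand-in; both a perceptron and a linear SVM fit such data perfectly.

from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

# Synthetic stand-in for the linearly separable data shown in the figure.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.8, random_state=0)

for clf in (Perceptron(max_iter=1000), SVC(kernel="linear", C=1.0)):
    clf.fit(X, y)
    print(type(clf).__name__, "training accuracy:", clf.score(X, y))  # expected to be 1.0 for well-separated blobs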
Figure 2: Q7
7. For a two-class classification problem, we use an SVM classifier and obtain the separating hyperplane shown in Figure 2. We have marked 4 instances of the training data. Identify the point whose removal will have the most impact on the shape of the boundary.
(a) 1
(b) 2
(c) 3
(d) 4
Solution - a
We need to identify the support vectors on which the hyperplane is supported. The support vectors lie on the margin, at a fixed distance from the separating hyperplane. If point 1 is removed, the separating hyperplane will change, since that point lies on the margin.
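This can be reproduced on any toy dataset; the sketch below (assuming scikit-learn; the four marked points from the figure are not available, so synthetic data stands in) compares the fitted weight vector before and after deleting a single point.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=1)
base = SVC(kernel="linear", C=1.0).fit(X, y)

sv_idx = int(base.support_[0])                                # a point on or inside the margin
far_idx = int(np.argmax(np.abs(base.decision_function(X))))   # a point far from the boundary

for name, idx in (("support vector", sv_idx), ("far-away point", far_idx)):
    mask = np.ones(len(X), dtype=bool)
    mask[idx] = False
    refit = SVC(kernel="linear", C=1.0).fit(X[mask], y[mask])
    # Removing a support vector typically shifts the hyperplane; removing a
    # point far from the margin leaves it essentially unchanged.
    print(name, "-> change in w:", np.linalg.norm(refit.coef_ - base.coef_))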
8. For dataset 1, train linear and radial basis function (RBF) kernel SVMs. What is the number of support vectors in each case?
(a) 100, 100
(b) 10, 105
(c) 3, 104
(d) 500, 50
Solution - c
The point of observing the number of support vectors is to assess, in some sense, the quality of the learnt model. A model with a very large number of support vectors has likely overfit the given data and might not work well on unseen data. This can be used as an indicator when deciding which kernel to use, as you will notice in a later question.
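The experiment can be reproduced along the following lines; the assignment's dataset 1 is not included here, so the sketch (assuming scikit-learn) uses a synthetic, roughly linearly separable dataset as a placeholder, and the exact counts will of course differ.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Placeholder for the assignment's dataset 1.
X, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, class_sep=2.0, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, "support vectors:", int(clf.n_support_.sum()))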
9. For dataset 2, train SVMs with a degree-5 polynomial kernel (degree = 5, coef0 = 0), a degree-10 polynomial kernel (degree = 10, coef0 = 0) and a radial basis function kernel. What is the number of support vectors in each case?
(a) 10, 300, 56
(b) 324, 20, 27
(c) 43, 98, 76
(d) 12, 27, 20
Solution - b
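The analogous experiment for this question; again, the assignment's dataset 2 is not provided, so a non-linearly-separable toy set (two moons) stands in and the counts printed are purely illustrative.

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Placeholder for the assignment's dataset 2.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

for params in ({"kernel": "poly", "degree": 5, "coef0": 0.0},
               {"kernel": "poly", "degree": 10, "coef0": 0.0},
               {"kernel": "rbf"}):
    clf = SVC(C=1.0, **params).fit(X, y)
    print(params, "->", int(clf.n_support_.sum()))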
10. Based on the previous experiments, which would you consider the ideal classifier for dataset 1?
(a) Linear SVM
(b) Polynomial SVM
(c) Radial basis SVM
Solution - a
The linear SVM yields the classifier with the fewest support vectors, which suggests the least overfitting; thus, it would be the best choice.