Assignment 2: Introduction To Machine Learning Prof. B. Ravindran
Sol. (a)
2. Consider forward selection, backward selection and best subset selection with respect to the
same data set. Which of the following is true?
(a) Best subset selection can be computationally more expensive than forward selection
(b) Forward selection and backward selection always lead to the same result
(c) Best subset selection can be computationally less expensive than backward selection
(d) Best subset selection and forward selection are computationally equally expensive
(e) both (b) and (d)
Sol. (a)
Explanation: Best subset selection has to explore all possible subsets of the features, which takes exponential time, so it can be far more expensive than forward selection. Forward selection and backward selection are greedy procedures that are not guaranteed to lead to the same result, and both are computationally much cheaper than best subset selection.
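As a rough illustration (not part of the original solution), the sketch below counts how many candidate models each strategy would fit for p features: best subset selection evaluates every non-empty subset (exponential in p), while greedy forward selection fits at most p + (p - 1) + ... + 1 models.

    # Illustrative sketch (assumed for exposition, not from the assignment):
    # number of candidate models each selection strategy evaluates for p features.

    def best_subset_fits(p):
        return 2 ** p - 1            # every non-empty subset of the p features

    def forward_selection_fits(p):
        return p * (p + 1) // 2      # p candidates in step 1, then p-1, ..., then 1

    for p in (5, 10, 20):
        print(p, best_subset_fits(p), forward_selection_fits(p))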
3. Adding interaction terms (such as products of two dimensions) along with original features in
linear regression
(a) can reduce training error.
(b) can increase training error.
(c) cannot affect training error.
Sol. (a)
Adding interaction terms as extra features gives the model additional freedom. Since the original model is a special case of the expanded one (obtained by setting the interaction coefficients to zero), the least-squares fit on the training data can only stay the same or improve, so the training error cannot increase.
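A minimal sketch on synthetic data (the data and feature names below are assumptions for illustration, not from the assignment) showing that adding an interaction term x1*x2 does not increase the training error of a least-squares fit:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # The expanded model nests the original one, so its training MSE can only
    # stay the same or decrease.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=100)

    X_interact = np.column_stack([X, X[:, 0] * X[:, 1]])   # original features + interaction

    mse_plain = mean_squared_error(y, LinearRegression().fit(X, y).predict(X))
    mse_interact = mean_squared_error(y, LinearRegression().fit(X_interact, y).predict(X_interact))
    print(mse_plain, mse_interact)   # mse_interact <= mse_plain on the training data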
4. Consider the following five training examples
X = [2 3 4 5 6]
(a) (4 3)
(b) (5 3)
(c) (5 1)
(d) (1 5)
Sol. (b)
5. A study was conducted to understand the effect of the number of hours students spent studying on their performance in the final exam. You are given the following 8 samples from the study. What is the best linear fit for this dataset?
Number of hours spent studying (x)  |  Score in the final exam, 0-100 (y)
10  |  95
9   |  80
2   |  10
15  |  50
10  |  45
16  |  98
11  |  38
16  |  93
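Since the answer options and worked solution for this question are not reproduced here, the following sketch simply computes the ordinary least-squares line on the eight samples above using NumPy (the approximate values in the comment come from this fit, not from the answer key):

    import numpy as np

    # Least-squares line for the 8 (hours studied, exam score) samples above.
    x = np.array([10, 9, 2, 15, 10, 16, 11, 16], dtype=float)
    y = np.array([95, 80, 10, 50, 45, 98, 38, 93], dtype=float)

    slope, intercept = np.polyfit(x, y, deg=1)
    print(f"y = {slope:.2f} x + {intercept:.2f}")   # roughly y = 4.59 x + 12.58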
6. Which of the following shrinkage methods is more likely to lead to a sparse solution?
7. Consider the design matrix X of dimension N × (p + 1). Which of the following statements
are true?
(a) The rowspace of X is the same as the columnspace of Xᵀ
(b) The rowspace of X is the same as the rowspace of Xᵀ
(c) both (a) and (b)
(d) none of the above
Sol. (a)
Explanation: The rows of X are, by definition, the columns of Xᵀ, so the rowspace of X and the columnspace of Xᵀ are the same subspace of R^(p+1). The rowspace of Xᵀ, on the other hand, is the columnspace of X, which is a subspace of R^N and in general is not the same as the rowspace of X (they even live in different ambient spaces when N ≠ p + 1).
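A small numeric sanity check of this statement (illustrative only, with an arbitrary random matrix assumed for the demo):

    import numpy as np

    # Every row of X lies in the column space of X.T, so rowspace(X) ⊆ colspace(X.T);
    # the same argument applied to X.T gives the reverse inclusion. Note also that
    # rowspace(X) sits in R^(p+1) while rowspace(X.T) sits in R^N, so (b) fails
    # whenever N != p + 1.
    rng = np.random.default_rng(1)
    N, p = 6, 3
    X = rng.normal(size=(N, p + 1))

    for row in X:
        w, *_ = np.linalg.lstsq(X.T, row, rcond=None)   # solve X.T @ w ≈ row
        print(np.allclose(X.T @ w, row))                # True: row is in colspace(X.T)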
8. How does LASSO differ from Ridge Regression? (multiple options may be correct)
(a) LASSO uses L1 regularization while Ridge Regression uses L2 regularization.
(b) LASSO uses L2 regularization while Ridge Regression uses L1 regularization.
(c) The LASSO constraint is a high-dimensional rhomboid while the Ridge Regression constraint is a high-dimensional ellipsoid.
(d) Ridge Regression shrinks more coefficients to 0 compared to LASSO.
(e) The Ridge Regression constraint is a high-dimensional rhomboid while the LASSO constraint is a high-dimensional ellipsoid.
(f) Ridge Regression shrinks fewer coefficients to 0 compared to LASSO.
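As a side illustration (synthetic data assumed for the demo; this is not the assignment's answer key), the sketch below shows the typical behaviour behind these options: the L1 penalty in LASSO tends to drive many coefficients exactly to zero, while the L2 penalty in Ridge Regression only shrinks them towards zero.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    true_coef = np.zeros(20)
    true_coef[:3] = [4.0, -2.0, 1.5]        # only 3 of the 20 features actually matter
    y = X @ true_coef + rng.normal(scale=0.5, size=200)

    lasso = Lasso(alpha=0.1).fit(X, y)
    ridge = Ridge(alpha=1.0).fit(X, y)

    print("zero coefficients (LASSO):", np.sum(lasso.coef_ == 0))
    print("zero coefficients (Ridge):", np.sum(ridge.coef_ == 0))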
(a) X
(b) S
(c) Xc
(d) V
(e) U
Sol. (d)
10. Let v1, v2, ..., vp denote the Principal Components of some data X, as extracted by Principal Components Analysis, where v1 is the First Principal Component. What can you say about the variance of X in the directions defined by v1, v2, ..., vp? (multiple options may be correct)
(a) X has the highest variance along v1
(b) X has the lowest variance along vp
(c) X has the lowest variance along v1
(d) X has the highest variance along vp
(e) Order of variance: v1 ≥ v2 ≥ ... ≥ vp
(f) Order of variance: vp ≥ vp−1 ≥ ... ≥ v1
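A short illustrative check (synthetic data assumed; not the assignment's answer key) that the variance of the data along the principal components is non-increasing, i.e. v1 carries the most variance and vp the least:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Columns scaled differently so the directions have clearly different variances.
    X = rng.normal(size=(500, 5)) @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5])

    pca = PCA(n_components=5).fit(X)
    variances = pca.explained_variance_
    print(variances)
    print(np.all(np.diff(variances) <= 0))   # True: variance ordered v1 >= v2 >= ... >= vp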