Week 5
Note:
1. In the following assignment, $X$ denotes the data matrix of shape $d \times n$, where $d$ and $n$ are
the number of features and samples, respectively.
2. $x_i$ denotes the $i$-th sample and $y_i$ denotes the corresponding label.
3. $w$ denotes the weight vector (parameter) in the linear regression model.
Question 1
Statement
An ML engineer comes up with two different models for the same dataset. The performances of
these two models on the training dataset and test dataset are as follows:
Options
(a)
Model 1
(b)
Model 2
Answer
(a)
Solution
In Model 1, the test error is very low compared to Model 2, even though the training error is high
in Model 1. We choose Model 1 because it works well on unseen data.
Question 2
Statement
Consider a model for the given $d$-dimensional training data points $x_1, \dots, x_n$ and
corresponding labels $y_1, \dots, y_n$ as follows:
$h(x_i) = \bar{y}$
where $\bar{y}$ is the average of all the labels. Which of the following error functions will always give
zero training error for the above model?
Options
(a)
(b)
(c)
(d)
Answer
(c)
Solution
The sum of squared errors and the sum of absolute errors are zero only if the predicted values are the
same as the actual values for all the examples.
The error function in option (c) gives zero error for the above model.
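The contrast between these error functions can be checked numerically. The sketch below assumes that the model predicts the label mean for every training point and that the error function in question is the plain sum of signed errors; neither formula is preserved in the text above, so both are assumptions. The labels are illustrative values.

```python
import numpy as np

# Illustrative labels (not taken from the question).
y = np.array([5.0, 7.0, 6.0])

# Assumed model: predict the label mean for every training point.
y_pred = np.full_like(y, y.mean())

residuals = y_pred - y
signed_sum = residuals.sum()            # positive and negative errors cancel
squared_sum = (residuals ** 2).sum()    # zero only if every residual is zero
absolute_sum = np.abs(residuals).sum()  # zero only if every residual is zero

print(signed_sum, squared_sum, absolute_sum)
```

Under these assumptions the signed errors cancel around the mean, while the squared and absolute errors stay positive unless all labels are identical.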
$x$     label ($y$)
-1      5
 0      7
 1      6
Question 3
Statement
We want to fit a linear regression model of the form . Assume that the initial weight
vector is . What will be the weight after one iteration using the gradient descent algorithm
assuming the squared loss function? Assume the learning rate is .
Answer
No range is required
Solution
At iteration , we have
At , we have
Here
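A single gradient descent update can be sketched as follows. This is only an illustration: the model form, initial weight, and learning rate of the question are not preserved in the text, so the one-parameter model `w * x`, the loss scaling `0.5 * sum(...)`, and the values `w0 = 0.0` and `lr = 0.1` are all hypothetical. The data comes from the table preceding Question 3.

```python
import numpy as np

# Data points from the table preceding Question 3: feature x and label y.
x = np.array([-1.0, 0.0, 1.0])
y = np.array([5.0, 7.0, 6.0])

def gd_step(w, x, y, lr):
    """One gradient descent step for L(w) = 0.5 * sum((w * x - y)^2)."""
    grad = np.sum((w * x - y) * x)  # dL/dw for the assumed model y = w * x
    return w - lr * grad

w0 = 0.0   # hypothetical initial weight
lr = 0.1   # hypothetical learning rate
w1 = gd_step(w0, x, y, lr)
print(w1)
```

With a different model form or loss scaling (for example, without the factor 0.5), the gradient and therefore the updated weight would change accordingly.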
Question 4
Statement
If we stop the algorithm at the weight calculated in Question 3, what will be the prediction for the
data point ?
Answer
8 (no range is required)
Solution
The model is given as
at ,
Question 5
Statement
Assume that $w^{(t)}$ denotes the updated weight after the $t$-th iteration of stochastic gradient
descent. At each step, a random sample of the data points is considered for the weight update. What
will be the final weight after $T$ iterations?
Options
(a)
(b)
(c)
(d)
any of the
Answer
(c)
Solution
The final weight is the average of the weights obtained in all the iterations, which is why
option (c) is correct.
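The averaging step can be sketched in code, i.e. $w = \frac{1}{T} \sum_{t=1}^{T} w^{(t)}$. This is a minimal illustration, assuming one randomly chosen data point per step, the squared loss, and a data matrix of shape (d, n); the function name `sgd_average` and the toy data are hypothetical.

```python
import numpy as np

def sgd_average(X, y, lr=0.1, T=2000, seed=0):
    """SGD for the squared loss; returns the average of the T iterates.

    X has shape (d, n): one column per sample.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for _ in range(T):
        i = rng.integers(n)                        # pick one sample at random
        grad = 2 * (w @ X[:, i] - y[i]) * X[:, i]  # gradient on that sample only
        w = w - lr * grad
        w_sum += w                                 # accumulate every iterate
    return w_sum / T                               # average over all iterations

# Toy noiseless data generated from a known weight vector.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
w_true = np.array([2.0, -1.0])
y = X.T @ w_true
w_avg = sgd_average(X, y)
print(w_avg)
```

Averaging smooths out the noise of individual updates, so the returned weight depends less on which sample happened to be drawn last.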
Question 6
Statement
Which data point plays the most important role in predicting the outcome for an unseen data
point? Write the index of the data point as per the data matrix $X$, assuming indices start from 1.
Answer
3 (no range is required)
Solution
Since the weight vector is written as a linear combination of the data points, the data point associated
with the highest weight (coefficient) has the most importance. The third data point is associated with
the highest coefficient; therefore, the third data point has the highest importance.
Question 7
Statement
What will be the prediction for the data point ?
Answer
6.5 (no range is required)
Solution
The polynomial kernel of degree is given by
Since,
That is
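A prediction with a polynomial kernel can be sketched as below. The data matrix, coefficients, test point, and degree of the original question are not preserved in the text, so every value here is hypothetical, and the kernel form $(a^T b + 1)^p$ is the common textbook definition, assumed rather than taken from the source.

```python
import numpy as np

def poly_kernel(a, b, degree):
    """Polynomial kernel (a . b + 1)^degree (assumed form)."""
    return (a @ b + 1) ** degree

def predict(X, alpha, x_test, degree):
    """Kernel regression prediction: sum_i alpha_i * k(x_i, x_test).

    X has shape (d, n): one column per training point.
    """
    return sum(alpha[i] * poly_kernel(X[:, i], x_test, degree)
               for i in range(X.shape[1]))

# Hypothetical values, chosen only for illustration.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
alpha = np.array([0.5, -1.0, 2.0])
x_test = np.array([1.0, 1.0])

print(predict(X, alpha, x_test, degree=2))
```

Each training point contributes its coefficient times its kernel similarity to the test point, which is why the coefficients matter for the importance of a data point.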
Question 8
Statement
If $w^*$ is the solution to the optimization problem of the linear regression model, which of the
following expressions is always correct?
Options
(a)
(b)
(c)
(d)
Answer
(b)
Solution
We know that $X^T w^*$ is the projection of the labels $y$ onto the subspace spanned by the features; that is,
the residual $y - X^T w^*$ is orthogonal to $X^T w^*$. For details, see lecture 5.4.
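The orthogonality property can be verified numerically. The sketch below assumes a data matrix of shape (d, n), as in the note at the top, and uses random data with a least-squares solver; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 20
X = rng.normal(size=(d, n))   # data matrix, one column per sample
y = rng.normal(size=n)        # labels

# Least-squares solution of min_w ||X^T w - y||^2.
w_star, *_ = np.linalg.lstsq(X.T, y, rcond=None)

y_hat = X.T @ w_star          # projection of y onto the span of the features
residual = y - y_hat

print(residual @ y_hat)       # inner product with the predictions
print(X @ residual)           # inner product with each feature
```

Both printed quantities should be zero up to floating-point error, since the residual is orthogonal to every feature and hence to the predictions.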
Question 9
Statement
Gradient descent with a constant learning rate of $\eta$ for a convex function starts oscillating
around the local minimum. What should be the ideal response in this case?
Options
(a)
(b)
Answer
(b)
Solution
One possible reason for the oscillation is that the weight vector jumps over the local minimum because the
step size $\eta$ is too large. If we decrease the value of $\eta$, the weight vector may no longer jump over the local
minimum, and gradient descent will converge to it.
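The effect of the step size can be seen on a one-dimensional example. This is a minimal sketch using the convex function $f(w) = w^2$, chosen here for illustration only; the learning rates are hypothetical.

```python
def gradient_descent(lr, w0=1.0, steps=5):
    """Gradient descent on f(w) = w^2, whose gradient is 2w."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - lr * 2 * w   # constant learning rate
        path.append(w)
    return path

print(gradient_descent(lr=1.0))   # jumps back and forth across the minimum
print(gradient_descent(lr=0.1))   # moves steadily toward the minimum at 0
```

With `lr=1.0` each update maps `w` to `-w`, so the iterates keep jumping across the minimum; with `lr=0.1` each update shrinks `w` by a factor of 0.8.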
Question 10
Statement
Is the following statement true or false?
Options
(a)
True
(b)
False
Answer
(a)
Solution
In the regression model, we assume that the error follows a Gaussian distribution with
zero mean and constant variance.