Week 5

Graded Assignment

Note:

1. In the following assignment, $X$ denotes the data matrix of shape $d \times n$, where $d$ and $n$ are the number of features and samples, respectively.
2. $x_i$ denotes the $i$-th sample and $y_i$ denotes the corresponding label.
3. $w$ denotes the weight vector (parameter) in the linear regression model.

Question 1
Statement
An ML engineer comes up with two different models for the same dataset. The performances of
these two models on the training dataset and test dataset are as follows:

Model 1: Training error = ; Test error =


Model 2: Training error = ; Test error =

Which model would you select?

Options
(a)

Model 1

(b)

Model 2

Answer
(a)

Solution
The test error of model 1 is much lower than that of model 2, even though the training error of model 1 is higher. We choose model 1 because it generalizes better to unseen data.

Question 2
Statement
Consider a model for given $d$-dimensional training data points $x_1, \ldots, x_n$ and corresponding labels $y_1, \ldots, y_n$ as follows:

$\hat{y}_i = \bar{y}$

where $\bar{y}$ is the average of all the labels. Which of the following error functions will always give zero training error for the above model?
Options
(a)

(b)

(c)

(d)

Answer
(c)

Solution
The sum of squared errors and the sum of absolute errors are zero only if the predicted values equal the actual values for all examples.

But for option (c), the plain sum of errors, we have

$\sum_{i=1}^{n} (y_i - \hat{y}_i) = \sum_{i=1}^{n} (y_i - \bar{y}) = \sum_{i=1}^{n} y_i - n\bar{y} = 0$

This error function will give zero error for the above model.
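A quick numerical check of this identity, sketched in Python with arbitrary illustrative labels:

```python
import numpy as np

y = np.array([5.0, 7.0, 6.0])          # arbitrary labels for illustration
y_hat = np.full_like(y, y.mean())      # the mean-predictor model: y_hat_i = y_bar

print(np.sum(y - y_hat))               # plain sum of errors: exactly 0
print(np.sum((y - y_hat) ** 2))        # sum of squared errors: positive here
```

The deviations from the mean always cancel, so the plain sum of errors is zero regardless of the labels, while the squared error is zero only if every label equals the mean.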

Common data for questions 3 and 4

Consider the following dataset with one feature $x$:

$x$    label ($y$)
-1     5
 0     7
 1     6

Question 3
Statement
We want to fit a linear regression model. Assume that the initial weight vector is $w^0$. What will be the weight after one iteration of the gradient descent algorithm, assuming the squared loss function and a learning rate $\eta$?

Answer
No range is required

Solution
At iteration $t$, the gradient descent update is

$w^{t+1} = w^t - \eta\, \nabla L(w^t)$

For the squared loss $L(w) = \sum_{i} (w^\top x_i - y_i)^2$, the gradient is

$\nabla L(w) = 2X(X^\top w - y)$

Substituting the dataset values together with the given $w^0$ and $\eta$ yields the weight after one iteration.
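The update can be sketched in Python on the dataset above. The initial weight, learning rate, and bias-augmented model form below are illustrative assumptions, not the assignment's actual values:

```python
import numpy as np

# Dataset from the question: one feature x and label y
X = np.array([[-1.0], [0.0], [1.0]])       # samples as rows
y = np.array([5.0, 7.0, 6.0])

# Augment with a bias column so the model is y_hat = w0 + w1 * x (illustrative form)
A = np.hstack([np.ones((3, 1)), X])

# Hypothetical choices (the assignment's actual values are not shown here)
w = np.zeros(2)                            # initial weight vector w^0
eta = 0.1                                  # learning rate

# One gradient-descent step on the squared loss L(w) = sum_i (a_i . w - y_i)^2
grad = 2 * A.T @ (A @ w - y)
w = w - eta * grad
print(w)                                   # weight after one iteration
```

With these illustrative values the single step yields $w^1 = (3.6,\, 0.2)$; the assignment's own numbers would follow the same computation.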

Question 4
Statement
If we stop the algorithm at the weight calculated in question 3, what will be the prediction for the given data point?

Answer
8 No range is required

Solution
Substituting the given data point into the model with the weight obtained in question 3 gives the prediction 8.

Question 5
Statement
Assume that denotes the updated weight after the iteration in the stochastic gradient
descent. At each step, a random sample of the data points is considered for weight update. What
will be the final weight after iterations?

Options
(a)

(b)

(c)

(d)

any of the

Answer
(c)

Solution
The final weight is given by the average of all the weights updated in all the iterations. That is why
option (c) is correct.
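A minimal sketch of SGD with iterate averaging on the earlier toy dataset; the learning rate, iteration count, and batch size $k = 1$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[-1.0], [0.0], [1.0]])
y = np.array([5.0, 7.0, 6.0])
A = np.hstack([np.ones((3, 1)), X])       # bias-augmented samples as rows

w = np.zeros(2)
eta = 0.05
weights = []                               # store every iterate w^t

for t in range(100):
    i = rng.integers(len(y))               # random single sample (k = 1)
    grad = 2 * A[i] * (A[i] @ w - y[i])    # stochastic gradient of squared loss
    w = w - eta * grad
    weights.append(w.copy())

# Option (c): the final weight is the average of all iterates
w_final = np.mean(weights, axis=0)
print(w_final)
```

Averaging the iterates smooths out the noise that the random single-sample gradients inject into the trajectory, which is exactly why the averaged weight, rather than the last iterate, is returned.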

Common data for questions 6 and 7

Kernel regression with a polynomial kernel of degree two is applied to a dataset $X$. Let the weight vector be $w = \sum_i \alpha_i \phi(x_i)$ for a coefficient vector $\alpha$.

Question 6
Statement
Which data point plays the most important role in predicting the outcome for an unseen data point? Write the data point's index in the matrix $X$, assuming indices start from 1.

Answer
3, No range is required
Solution
Since $w$ is written as $w = \sum_i \alpha_i \phi(x_i)$, the data point associated with the highest weight (coefficient) has the most importance. The third data point is associated with the highest coefficient; therefore, the third data point has the highest importance.

Question 7
Statement
What will be the prediction for the given data point?

Answer
6.5 No range is required

Solution
The polynomial kernel of degree 2 is given by

$k(x, x') = (1 + x^\top x')^2$

The prediction for a point $x$ is given by

$\hat{y} = \sum_i \alpha_i\, k(x_i, x)$

Substituting the coefficient vector $\alpha$ and the kernel values for the given point yields the prediction 6.5.
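The prediction rule can be sketched as follows. The training points, the coefficient vector `alpha`, and the query point below are illustrative stand-ins (the assignment's actual values are not reproduced here), and the kernel is assumed to be the standard $(1 + x^\top x')^2$ form:

```python
import numpy as np

# Hypothetical training points and coefficients (third coefficient is largest,
# mirroring the situation in question 6)
X_train = np.array([[-1.0], [0.0], [1.0]])
alpha = np.array([0.5, 1.0, 2.0])

def poly_kernel(x, z, degree=2):
    """Polynomial kernel k(x, z) = (1 + x . z)^degree."""
    return (1.0 + x @ z) ** degree

def predict(x_new):
    # y_hat = sum_i alpha_i * k(x_i, x_new)
    return sum(a * poly_kernel(xi, x_new) for a, xi in zip(alpha, X_train))

print(predict(np.array([0.5])))
```

Note that the point with the largest $\alpha_i$ contributes most to every prediction, which is the reasoning used in question 6.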

Question 8
Statement
If $w^*$ is the solution to the optimization problem of the linear regression model, which of the following expressions is always correct?

Options

(a)
(b)

(c)

(d)

Answer
(b)

Solution
We know that $X^\top w^*$ is the projection of the label vector $y$ onto the subspace spanned by the features; that is, $X^\top w^* - y$ is orthogonal to that subspace, so $X(X^\top w^* - y) = 0$. For details, check lecture 5.4.
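The orthogonality property can be verified numerically. This sketch uses the course's $d \times n$ convention for $X$ with random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 5
X = rng.standard_normal((d, n))      # d x n data matrix (features x samples)
y = rng.standard_normal(n)

# Least-squares solution from the normal equations: (X X^T) w* = X y
w_star = np.linalg.solve(X @ X.T, X @ y)

residual = X.T @ w_star - y          # predictions minus labels
print(X @ residual)                  # orthogonality: X (X^T w* - y) = 0
```

The residual vector is orthogonal to every feature row of $X$, which is precisely the normal-equation condition the solution states.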

Question 9

Statement
The gradient descent with a constant learning rate of for a convex function starts oscillating
around the local minima. What should be the ideal response in this case?

Options
(a)

Increase the value of $\eta$

(b)

Decrease the value of $\eta$

Answer
(b)

Solution
One possible reason of oscillation is that the weight vector jumps the local minima due to greater
step size . That is if we decrease the value of , the weight vector may not jump the local
minima and the GD will converge to that local minima.
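A minimal illustration on the convex function $f(w) = w^2$, whose gradient is $2w$: with a large $\eta$ the iterates flip sign around the minimum at 0 on every step, while a small $\eta$ decays smoothly:

```python
# Gradient descent on f(w) = w^2, so the update is w <- w - eta * 2w
def gd(eta, steps=5, w=1.0):
    path = [w]
    for _ in range(steps):
        w = w - eta * 2 * w
        path.append(w)
    return path

print(gd(eta=0.9))   # iterates alternate in sign: oscillation around the minimum
print(gd(eta=0.1))   # smaller eta: smooth monotone decay toward 0
```

With $\eta = 0.9$ each step multiplies $w$ by $(1 - 2\eta) = -0.8$, so the iterate jumps across the minimum on every update; reducing $\eta$ makes the factor positive and the descent monotone.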

Question 10
Statement
Is the following statement true or false?

Error in the linear regression model is assumed to have constant variance.

Options
(a)

True

(b)

False

Answer
(a)

Solution
In the linear regression model, we assume that the error follows a Gaussian distribution with zero mean and constant variance.
