Mid Sem Exam

This document contains a mid-semester exam for a foundations of data science course. The exam contains 6 questions covering linear regression, linear discriminant analysis, logistic regression, KNN, MAP estimation with a beta prior, and polynomial regression. Students are instructed to show their work and upload all files to Google Classroom.

MA515 - Foundations of Data Science

(August - December 2021)


September 29, 2021

MA515 Mid Sem Exam 30 Marks

Question    Points    Score
1           6
2           8
3           4
4           5
5           4
6           3
Total:      30

Instructions

• This question paper contains 3 pages. (CHECK)


• Write all the necessary steps on the paper.
• The intermediate calculations can be done using a calculator, Python, or Excel.
• For questions 1, 2 and 6, if Excel is used, the Excel sheets must be uploaded
along with the important details on paper.
• Upload all the files on Google Classroom.
• If two or more students upload duplicate sheets on Google Classroom, all of them
will get zero marks.

1. (6 points) Consider the following data where x represents the independent variable and
y represents the dependent variable.

Answer the following questions in relation to the linear regression model y = β0 + β1 x + ε.

(a) Find the estimates of the simple linear regression coefficients. [2 Marks]
(b) Test the hypotheses for the regression coefficients β0 and β1. For which parameter is the
null hypothesis rejected, and at what significance level? [2 Marks]
(c) Find R² and the sum of squared residual errors. [1 Mark]
(d) Estimate the error term variance σ². [1 Mark]
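
The quantities asked for in parts (a) to (d) can be sketched in plain Python as follows. The data table for this question is not reproduced here, so the x and y values below are purely hypothetical placeholders, and the function name simple_ols is an illustrative choice. The sketch uses the standard least-squares formulas and estimates σ² by RSS / (n − 2); the t statistics would still need to be compared against a t distribution with n − 2 degrees of freedom.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x with basic diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                 # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate

    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                     # total sum of squares
    sigma2 = rss / (n - 2)         # unbiased estimate of the error variance

    se_b1 = math.sqrt(sigma2 / sxx)
    se_b0 = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))

    return {
        "b0": b0, "b1": b1,
        "rss": rss, "r2": 1.0 - rss / tss, "sigma2": sigma2,
        "t_b0": b0 / se_b0,        # t statistic for H0: beta0 = 0
        "t_b1": b1 / se_b1,        # t statistic for H0: beta1 = 0
    }

# Hypothetical data, NOT the exam's data set.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
print(fit)
```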

2. (8 points) Consider the given data, namely “LDA Data”, where age and estimated
salary are the independent variables (or features) and purchased is the dependent variable
(or label). Assuming the LDA model, estimate the following:

(a) The priors π1 and π2. [1 Mark]
(b) The mean vectors µ1 and µ2. [1 Mark]
(c) The variance-covariance matrix Σ. [2 Marks]
(d) The decision boundary of the LDA classifier. [3 Marks]
(e) For a person aged 30 years with an estimated salary of 82,000, predict the category,
i.e., whether the person purchased or not. [1 Mark]
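
A minimal sketch of the LDA estimates for two features is given below. The “LDA Data” file is not reproduced here, so the rows and the 0/1 labels are hypothetical, salaries are written in thousands for readability, and the helper names (lda_fit, inv2, discriminant, lda_predict) are illustrative. The pooled covariance is divided by n − K, which is one common convention; some courses divide by n instead. The decision boundary is the set of points where the two discriminant scores are equal.

```python
import math

def lda_fit(X, y):
    """Estimate class priors, class means and the pooled covariance (2 features)."""
    classes = sorted(set(y))
    n = len(X)
    priors, means = {}, {}
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    for k in classes:
        rows = [x for x, label in zip(X, y) if label == k]
        priors[k] = len(rows) / n
        mu = [sum(r[j] for r in rows) / len(rows) for j in range(2)]
        means[k] = mu
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    scatter[a][b] += d[a] * d[b]
    # Assumption: unbiased pooled estimate, divisor n - K.
    sigma = [[v / (n - len(classes)) for v in row] for row in scatter]
    return priors, means, sigma

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def discriminant(x, prior, mu, sigma_inv):
    """delta_k(x) = x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k + log(pi_k)."""
    w = [sigma_inv[0][0] * mu[0] + sigma_inv[0][1] * mu[1],
         sigma_inv[1][0] * mu[0] + sigma_inv[1][1] * mu[1]]
    return (x[0] * w[0] + x[1] * w[1]
            - 0.5 * (mu[0] * w[0] + mu[1] * w[1])
            + math.log(prior))

def lda_predict(x, priors, means, sigma):
    """Return the class with the largest discriminant score."""
    sigma_inv = inv2(sigma)
    return max(priors, key=lambda k: discriminant(x, priors[k], means[k], sigma_inv))

# Hypothetical rows (age, salary in thousands), NOT the exam's "LDA Data".
X = [(25, 40), (30, 50), (28, 45), (45, 90), (50, 100), (48, 85)]
y = [0, 0, 0, 1, 1, 1]
priors, means, sigma = lda_fit(X, y)
print(priors, means, sigma)
print(lda_predict((30, 82), priors, means, sigma))
```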


3. (4 points) Suppose we have a logistic regression model with variable X1 = hours studied,
X2 = undergrad CGPA and Y = receive an A. The estimated coefficients are β̂0 = −6,
β̂1 = 0.05 and β̂2 = 1.

(a) Estimate the probability that a student who studies for 40 hours and has an undergrad
CGPA of 3.5 gets an A in the class. [2 Marks]
(b) How many hours would the student in part (a) need to study to have a 50% chance
of getting an A in the class? [2 Marks]
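
As a sketch of the arithmetic behind both parts, the logistic model gives P(Y = 1) = 1 / (1 + exp(−(β̂0 + β̂1 X1 + β̂2 X2))), and a 50% chance corresponds to a log-odds of zero. The coefficients below are the ones stated in the question; the function names are illustrative.

```python
import math

def logistic_probability(hours, cgpa, b0=-6.0, b1=0.05, b2=1.0):
    """P(A) under the fitted logistic model with the question's coefficients."""
    z = b0 + b1 * hours + b2 * cgpa
    return 1.0 / (1.0 + math.exp(-z))

def hours_for_probability(p, cgpa, b0=-6.0, b1=0.05, b2=1.0):
    """Solve b0 + b1 * hours + b2 * cgpa = log(p / (1 - p)) for hours."""
    log_odds = math.log(p / (1.0 - p))
    return (log_odds - b0 - b2 * cgpa) / b1

p_a = logistic_probability(40, 3.5)          # about 0.378
hours_50 = hours_for_probability(0.5, 3.5)   # about 50 hours
print(round(p_a, 4), round(hours_50, 2))
```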

4. (5 points) Using the KNN algorithm with K = 3 (i.e., 3 nearest neighbors) on the given
data, predict the value for the point (29, 10, 7). Show all the necessary steps. Use the
Euclidean distance to identify the neighbors. [5 Marks]
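
The steps (compute distances, sort, keep the 3 nearest, vote) can be sketched as below. The data table for this question is not reproduced here, so the training points and labels are hypothetical, and the sketch assumes a classification task decided by majority vote; for a numeric target one would average the neighbors' values instead.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from the query to every training point.
    distances = [(math.dist(point, query), label) for point, label in train]
    # Step 2: sort by distance and keep the k nearest neighbors.
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors

# Hypothetical training data, NOT the exam's table.
train = [
    ((30, 9, 8), "A"),
    ((28, 11, 6), "A"),
    ((10, 2, 1), "B"),
    ((31, 12, 7), "B"),
    ((5, 20, 15), "B"),
]
label, neighbors = knn_predict(train, (29, 10, 7), k=3)
print(label, neighbors)
```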

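5. (4 points) A random variable Θ is said to follow a beta distribution with parameters α
and β if its pdf is given by

$$\frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}, \quad 0 < \theta < 1, \quad \text{where } B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$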

Further, Γ(α) = (α − 1)! if α is a positive integer. We denote this by Θ ∼ B(α, β). Suppose a
coin is tossed 5 times with the outcomes {H, T, H, T, H}. Let the probability of heads,
denoted by Θ, have a B(3, 4) prior. Find the estimate of the probability of heads using
Maximum a Posteriori (MAP) estimation. [4 Marks]
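
A short sketch of the MAP computation follows, relying on the standard conjugacy result: a B(α, β) prior combined with h heads and t tails gives a B(α + h, β + t) posterior, whose mode is (α + h − 1) / (α + β + h + t − 2) when both posterior parameters exceed 1. The function name is illustrative; the prior and the tosses are those given in the question.

```python
from fractions import Fraction

def beta_map_estimate(alpha, beta, heads, tails):
    """Mode of the Beta(alpha + heads, beta + tails) posterior."""
    post_alpha = alpha + heads
    post_beta = beta + tails
    if post_alpha <= 1 or post_beta <= 1:
        raise ValueError("the mode formula needs both posterior parameters > 1")
    return Fraction(post_alpha - 1, post_alpha + post_beta - 2)

tosses = ["H", "T", "H", "T", "H"]
heads = tosses.count("H")   # 3
tails = tosses.count("T")   # 2
theta_map = beta_map_estimate(3, 4, heads, tails)
print(theta_map)            # 1/2
```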

6. (3 points) For the Polynomial Regression data, fit a degree-4 polynomial regression
model between Level (independent variable) and Salary (dependent variable). Predict
the expected Salary at Level 11. [3 Marks]
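
One way to carry out the fit without external libraries is sketched below: build the polynomial design matrix, solve the normal equations, and evaluate the fitted polynomial at Level 11. The Polynomial Regression data file is not reproduced here, so the levels and salaries are hypothetical. Rescaling Level to roughly [−1, 1] before fitting is a numerical-stability choice of this sketch, not something the question asks for; it does not change the fitted curve.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit; returns a prediction function."""
    center = sum(xs) / len(xs)
    scale = (max(xs) - min(xs)) / 2 or 1.0
    zs = [(x - center) / scale for x in xs]          # rescaled inputs
    rows = [[z ** p for p in range(degree + 1)] for z in zs]
    size = degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    coef = solve_linear(ata, aty)                    # normal equations

    def predict(x):
        z = (x - center) / scale
        return sum(c * z ** p for p, c in enumerate(coef))

    return predict

# Hypothetical data (salary in thousands), NOT the exam's data set.
levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
salaries = [45, 50, 60, 80, 110, 150, 200, 300, 500, 1000]
predict = fit_poly(levels, salaries, 4)
print(predict(11))
```

With NumPy available, numpy.polyfit(levels, salaries, 4) followed by numpy.polyval would be the usual shortcut for the same least-squares fit.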

