Midterm Exam
Instructors: Eric Xing, Ziv Bar-Joseph
17 November, 2015
Name:
Andrew ID:
1 Basic Probability and MLE - 10 points
1. You are trapped in a dark cave with three indistinguishable exits on the walls. One exit takes 3 hours to travel and leads outside. Of the other two exits, one takes 1 hour to travel and the other takes 2 hours, but both drop you back in the original cave. You have no way of marking which exits you have already attempted. What is the expected time it takes for you to get outside?
2. Let X1 , · · · , Xn be iid data from a uniform distribution over the disc of radius θ in R2 . Thus, Xi ∈ R2 and

    p(x; θ) = 1/(πθ²) if ‖x‖ ≤ θ,   and   p(x; θ) = 0 otherwise.

Find the maximum likelihood estimate (MLE) of θ.
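As an illustration of this density (not part of the original question), here is a small sketch, with made-up data and a hypothetical helper name, that evaluates the model's log-likelihood at a few candidate values of θ:

    import numpy as np

    def log_likelihood(theta, X):
        """Log-likelihood of iid points X (an n x 2 array) under the uniform-disc model:
        -inf if any point lies outside the disc of radius theta,
        otherwise n * log(1 / (pi * theta^2))."""
        radii = np.linalg.norm(X, axis=1)
        if np.any(radii > theta):
            return -np.inf
        return -X.shape[0] * np.log(np.pi * theta ** 2)

    # Made-up data: 100 points drawn uniformly from the disc of radius 2.
    rng = np.random.default_rng(0)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
    radii = 2.0 * np.sqrt(rng.uniform(0.0, 1.0, size=100))  # sqrt makes the density uniform over the disc
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

    for theta in [1.5, 2.0, 2.5, 3.0]:
        print(theta, log_likelihood(theta, X))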
2 Decision Trees - 10 points
1. The following figure presents the top two levels of a decision tree learned to predict the attractiveness of a movie. What should be the value of A if the decision tree was learned using the algorithm discussed in class? (You can either say 'At most X', 'At least X', or 'Equal to X', where you should replace X with a number based on your calculation.) Explain your answer.
2. We now focus on all samples assigned to the left side of the tree (i.e., those that are longer than 120 minutes). We know that we have a binary feature, 'American director', which after the 'Action movie' split provides a perfect split for the data (i.e., all samples on one side are 'like' and all those on the other side are 'didn't like'). Fill in the missing values in the picture below:
3 Naïve Bayes & Logistic Regression - 6 points
1. In online learning, we can update the decision boundary of a classifier based on new data without reprocessing the old data. Now, for a new data point that is an outlier, which of the following classifiers is likely to be affected more severely: NB, LR, or SVM? Please give a one-sentence explanation for your answer.
2. Now, to build a classifier on discrete features using small training data, one needs to consider the scenario where some features have rare values that were never observed in the training data (e.g., the term 'Buenos Aires' does not appear in the training set of a text classification problem). To train a generalizable classifier, would you use NB or LR, and how would you augment the original formulation of the classifier under a Bayesian or regularization setting?
3. Now, to build a classifier on high-dimensional features using small training data, one needs to consider the scenario where many features are simply irrelevant noise. To train a generalizable classifier, would you use NB or LR, and how would you augment the original formulation of the classifier under a Bayesian or regularization setting?
4 Deep Neural Networks - 10 points
In homework 3, we counted the model parameters of a convolutional neural network (CNN), which gives us a sense of how much memory a CNN will consume. Now we estimate the computational overhead of a CNN by counting FLOPs (floating point operations). For simplicity we only consider the forward pass.
Consider a convolutional layer C followed by a max pooling layer P . The input of layer C has 50 channels, each of which is of size 12 × 12. Layer C has 20 filters, each of which is of size 4 × 4. The convolution padding is 1 and the stride is 2. Layer P performs max pooling over each of C's output feature maps, with 3 × 3 local receptive fields and stride 1.
Given scalars x1 , x2 , · · · , xn , we assume:
• A scalar multiplication xi · xj accounts for one FLOP;
• A scalar addition xi + xj accounts for one FLOP;
2. How many FLOPs do layers C and P conduct in total during one forward pass?
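As a geometry sanity check (not part of the exam statement; the helper names are ours), the sketch below computes the spatial output size of layer C and of layer P from the stated input size, filter/pooling size, padding, and stride, using the standard output-size formula:

    def conv_output_size(in_size, filter_size, padding, stride):
        """Spatial output size of a convolution: floor((in + 2*pad - filter) / stride) + 1."""
        return (in_size + 2 * padding - filter_size) // stride + 1

    def pool_output_size(in_size, pool_size, stride):
        """Spatial output size of pooling with no padding."""
        return (in_size - pool_size) // stride + 1

    out_c = conv_output_size(12, 4, padding=1, stride=2)  # side length of each of C's output maps
    out_p = pool_output_size(out_c, 3, stride=1)          # side length of each of P's output maps
    print(out_c, out_p)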
5 SVM - 12 points
Recall that the soft-margin primal SVM problem is
    min_{w,b,ξ}  (1/2) w^T w + C Σ_{i=1}^{n} ξi
    s.t.  ξi ≥ 0,  ∀i ∈ {1, · · · , n}                                    (1)
          (w^T xi + b) yi ≥ 1 − ξi ,  ∀i ∈ {1, · · · , n}
We can get the kernel SVM by taking the dual of the primal problem and then replacing the inner product xi^T xj with k(xi , xj ), where k(·, ·) is the kernel function.
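For concreteness (this is our addition, not part of the exam), here are a few standard choices of k(·, ·) that could be plugged in for xi^T xj; the linear kernel simply recovers the original dot product:

    import numpy as np

    def linear_kernel(xi, xj):
        """k(xi, xj) = xi . xj -- recovers the original linear SVM."""
        return np.dot(xi, xj)

    def polynomial_kernel(xi, xj, degree=2, c=1.0):
        """k(xi, xj) = (xi . xj + c)^degree."""
        return (np.dot(xi, xj) + c) ** degree

    def rbf_kernel(xi, xj, gamma=1.0):
        """k(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
        return np.exp(-gamma * np.sum((xi - xj) ** 2))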
Figure 1 plots SVM decision boundaries resulting from using different kernels and/or different slack
penalties. In Figure 1, there are two classes of training data, with labels yi ∈ {−1, 1}, represented by circles
and squares respectively. The SOLID circles and squares represent the support vectors. Label each plot in
Figure 1 with the letter of the optimization problem below. You are NOT required to explain the reasons.
6 Bias-Variance Decomposition - 14 points
1. To understand bias and variance, we will create a graphical visualization using a bulls-eye. Imagine
that the center of the target is our true model (a model that perfectly predicts the correct values).
As we move away from the bulls-eye, our predictions get worse and worse. Imagine we can repeat
our entire model building process to get a number of separate hits on the target. Each hit represents
an individual realization of our model, given the chance variability in the training data we gather.
Sometimes we will get a good distribution of training data so we predict very well and we are close
to the bulls-eye, while sometimes our training data might be full of outliers or non-standard values
resulting in poorer predictions. Consider these four different realizations resulting from a scatter of hits
on the target. Characterize the bias and variance of the estimates of the following models on the data
with respect to the true model as low or high by circling the appropriate entries below each diagram.
2. Explain what effect the following operations will have on the bias and variance of your model. Fill in one of 'increases', 'decreases', or 'no change' in each of the cells:
    Regularizing the weights in a linear/logistic regression model:          Bias __________   Variance __________
    Increasing k in k-nearest neighbor models:                                Bias __________   Variance __________
    Pruning a decision tree (to a certain depth, for example):                Bias __________   Variance __________
    Increasing the number of hidden units in an artificial neural network:    Bias __________   Variance __________
    Using dropout to train a deep neural network:                             Bias __________   Variance __________
    Removing all the non-support vectors in SVM:                              Bias __________   Variance __________
7 Gaussian Mixture Model - 6 points
Consider a mixture distribution given by
    p(x) = Σ_{k=1}^{K} πk p(x | zk).        (2)
Suppose that we partition the vector x into two parts as x = (x1 , x2 ). Then the conditional distribution p(x2 | x1) is also a mixture distribution:

    p(x2 | x1) = Σ_{k=1}^{K} λk p(x2 | x1, zk),        (3)

where λ1 , · · · , λK are new mixing coefficients.
8 Semi-Supervised learning - 12 points
1. We would like to use semi-supervised learning to classify text documents. We are using the ‘bag of
words’ representation discussed in class with binary indicators for the presence of 10000 words in each
document (so each document is represented by a binary vector of length 10000).
For the following classifiers and learning methods discussed in class, state whether the method can be
applied to improve the classifier (Yes) or not (No) and provide a brief explanation.
2. Unlike all other classifiers we discussed, KNN does not have any parameters to tune. For each of the following semi-supervised methods, state whether a KNN classifier (where K is fixed and not allowed to change) learned for some data using both labeled and unlabeled data could differ from a KNN classifier learned using only the labeled data in this dataset (no need to explain).
9 Learning Theory, PAC learning - 10 points
In class we learned the following agnostic PAC learning bound:
Theorem 1. Let H be a finite concept class. Let D be an arbitrary, fixed unknown distribution over X. For
any ε, δ > 0, if we draw a sample S from D of size

    m ≥ (1 / (2ε²)) ( ln |H| + ln (2/δ) ),        (4)

then with probability at least 1 − δ, all hypotheses h ∈ H have |errD(h) − errS(h)| ≤ ε.
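To make the bound concrete (the numbers below are made up, and the function name is ours), here is a small sketch that plugs |H|, ε, and δ into (4):

    import math

    def pac_sample_size(H_size, epsilon, delta):
        """Smallest integer m satisfying m >= (1 / (2 * eps^2)) * (ln|H| + ln(2/delta))."""
        return math.ceil((math.log(H_size) + math.log(2.0 / delta)) / (2.0 * epsilon ** 2))

    # Hypothetical numbers: |H| = 1000 hypotheses, epsilon = 0.1, delta = 0.05.
    print(pac_sample_size(1000, 0.1, 0.05))  # about 530 examples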
Our friend Yan is trying to solve a learning problem that fits the assumptions above.
1. Yan tried a training set of 100 examples and observed some gap between the training error and the test error, so he wants to cut the overfitting in half. How many examples should Yan use, according to the above PAC bound?
2. Yan took your suggestion and ran his algorithm again; however, the overfitting did not halve. Do you think this is possible? Explain briefly.
10 Bayes Networks - 10 points
Consider the Bayesian network in Figure 2. We use (X ⊥⊥ Y |Z) to denote the fact that X and Y are
independent given Z. Answer the following questions:
1. Are there any pairs of variables that are (marginally) independent? If your answer is yes, please list all such pairs.
2. Does (B ⊥⊥ C | A, D) hold? Briefly explain.
3. Does (B ⊥⊥ F | A, D) hold? Briefly explain.
4. Assuming that each of these variables can take d = 10 values (say 1 to 10), how many parameters do we need to model the full joint distribution without using the knowledge encoded in the graph (i.e., no independence / conditional independence assumptions)? How many parameters do we need for the Bayesian network in this setting? (You do not need to provide the exact number; a close approximation or a tight upper / lower bound will do.)