ML PG Assignment 3
Monsoon 2022
Release Time: October 08, 2022, 5:00 pm; Submission Time: October 30, 2022, 11:59 pm
Instructions
• This assignment should be attempted individually. All questions are compulsory.
• Theory.pdf : For conceptual questions, either a typed or hand-written .pdf file of solutions is acceptable.
• Code Files: For programming questions, the use of any one programming language throughout this assignment is acceptable. For Python, either a .ipynb or a .py file is acceptable. For other programming languages, submit the files accordingly. Make sure the submission is self-complete and replicable, i.e., your results can be reproduced from the submitted files alone.
• Regarding Coding Exercises: You can use modules from sklearn, statsmodels, or any similar library for writing the code. Use a random seed wherever applicable to ensure reproducibility.
• Report.pdf : Create a .pdf report of the programming questions that contains your applied approach, preprocessing, assumptions, analysis, visualizations, etc. Anything not in the report will not be evaluated. Alternatively, a well-documented .ipynb file with answers to all the questions may be submitted as both code file and report.
• File Submission: Submit a .zip named A1 RollNo.zip (e.g., A1 PhD22100.zip) file containing Theory.pdf,
Report.pdf, and Code files.
• Submission Policy: Turn in your submission as early as possible to avoid late submission. Expect No Extensions. A submission within 10 minutes after the deadline will incur a 20% penalty on the total marks of this assignment. Beyond this, late submissions will not be evaluated and hence will be awarded zero marks.
• Resource Constraints: In any question, if there is a resource constraint in terms of computational capabilities at your end, you are allowed to sub-sample the data (the sample must be stratified). Make sure to explicitly mention this in the report, with proper details about the platform that did not work for you.
• Clarifications: Symbols have their usual meaning. Assume any missing information. You are free to use any
libraries and need not do anything from scratch unless specifically stated otherwise. Use Google Classroom
for any queries. In order to keep it fair for all, no email queries will be entertained. You may attend office/TA
hours for personal resolutions. No queries will be answered in Google Classroom comments when 12 hours or
less are left for the submission deadline.
• Compliance: The questions in this assignment are structured to meet the Course Outcomes CO2, CO3, and
CO4, as described in the course directory.
• Institute Plagiarism Policy Applicable. Both programming and theoretical questions will be subjected
to strict plagiarism check.
• There could be multiple ways to approach a question. Please explain your approach briefly in the report.
(a) Maximize f(x, y) = xy subject to x + y² ≤ 2 and x, y > 0 using the KKT conditions. (7 points)
(b) True or False (with justification): Given linearly separable data, the margin of the decision boundary produced by an SVM will always be greater than or equal to the margin of the decision boundary produced by any other hyperplane that perfectly classifies the given training dataset. (3 points)
When training SVMs, we begin by computing the kernel matrix K over our training data {x_1, ..., x_n}. The kernel matrix, defined as K_{i,i′} = K(x_i, x_{i′}), expresses the kernel function applied between all pairs of training points.
Mercer's theorem tells us that any function K that yields a positive semi-definite kernel matrix is a valid kernel, i.e., corresponds to a matrix of dot products under some basis ϕ. Therefore, instead of using an explicit basis, we can build kernel functions directly that fulfill this property. A particularly nice benefit of this theorem is that it allows us to build more expressive kernels by composition.
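As an illustration of Mercer's condition (not part of the assignment tasks), the sketch below builds the kernel matrix of a known-valid kernel, the RBF kernel, over some synthetic points and checks that its eigenvalues are non-negative; the data size and the gamma value are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed for reproducibility
X = rng.normal(size=(20, 3))          # 20 synthetic points in R^3 (arbitrary)

# Kernel matrix K_{i,i'} = K(x_i, x_{i'}) for the RBF kernel
# K(x, x') = exp(-gamma * ||x - x'||^2), a known valid kernel.
gamma = 0.5
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-gamma * sq_dists)

# Mercer's condition: K must be positive semi-definite, i.e. all
# eigenvalues non-negative (up to floating-point tolerance).
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig >= -1e-10)              # True for a valid kernel
```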
In this problem, you are tasked with using Mercer's theorem and the definition of a kernel matrix to prove that the following compositions are valid kernels, assuming K^(1) and K^(2) are valid kernels.
Recall that a positive semi-definite matrix K requires z^T K z ≥ 0, ∀z ∈ R^n.
(a) K(x, x′) = c K^(1)(x, x′); c > 0 (4 points)
(b) K(x, x′) = K^(1)(x, x′) + K^(2)(x, x′) (4 points)
(c) K(x, x′) = f(x) K^(1)(x, x′) f(x′), where f is any function from R^m to R (4 points)
(d) K(x, x′) = K^(1)(x, x′) K^(2)(x, x′) (4 points)
Hint: Use the property that for any ϕ(x), K(x, x′) = ϕ(x)^T ϕ(x′) forms a positive semi-definite kernel matrix.
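As a numerical sanity check (not a proof, and using arbitrary synthetic data), one can verify that the sum in (b) and the elementwise product in (d) of two known-valid kernels again yield positive semi-definite kernel matrices:

```python
import numpy as np

rng = np.random.default_rng(42)       # arbitrary seed for reproducibility
X = rng.normal(size=(15, 2))          # synthetic training points

# Two known valid base kernels: linear and RBF.
K1 = X @ X.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K2 = np.exp(-sq_dists)

def is_psd(K, tol=1e-8):
    """Check positive semi-definiteness via the smallest eigenvalue."""
    return np.linalg.eigvalsh(K).min() >= -tol

# Sum (composition (b)) and elementwise product (composition (d)).
print(is_psd(K1 + K2), is_psd(K1 * K2))  # True True
```

A passing check on one dataset is evidence, not proof; the assignment still requires the general argument from the definition above.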
5. Theory Question (19 points)
Consider the following dataset that has 3 points in 1-D.
(a) Are the classes {+, −} linearly separable? (1 point)
(b) Map each point to 3-D using the new feature vector ϕ(x) = [1, √2 x, x²]^T. Are the classes now linearly separable? If yes, find a separating hyperplane. (3 points)
(c) Define a class variable y_i ∈ {−1, +1} which denotes the class of x_i, and let w = (w_1, w_2, w_3)^T. The max-margin SVM classifier solves the following problem.
min_{w,b} (1/2) ||w||_2²  s.t.  y_i (w^T ϕ(x_i) + b) ≥ 1, i = 1, 2, 3
Using the method of Lagrange multipliers, show that the solution is ŵ = (0, 0, −2)^T, b = 1, and that the margin is 1/||ŵ||_2. (8 points)
(d) Show that the solution remains the same if the constraints are changed to
y_i (w^T ϕ(x_i) + b) ≥ ρ, i = 1, 2, 3
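As an illustrative aside on the feature map in part (b) above (not a solution to any subpart): the map ϕ(x) = [1, √2 x, x²]^T realizes the quadratic kernel, since ϕ(x)·ϕ(x′) = 1 + 2xx′ + x²x′² = (1 + xx′)². A quick check, with the two scalar inputs chosen arbitrarily:

```python
import numpy as np

def phi(x):
    """Feature map phi(x) = [1, sqrt(2)*x, x**2] for scalar x."""
    return np.array([1.0, np.sqrt(2) * x, x ** 2])

# The dot product in feature space equals the quadratic kernel:
# phi(x) . phi(x') = 1 + 2*x*x' + (x*x')**2 = (1 + x*x')**2.
x, xp = 1.5, -0.7
print(np.isclose(phi(x) @ phi(xp), (1 + x * xp) ** 2))  # True
```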