
CS 771A: Intro to Machine Learning, IIT Kanpur
Midsem Exam (18 Jun 2023), 40 marks

Name: MELBO    Roll No: 230007    Dept: AWSM

Instructions:
1. This question paper contains 2 pages (4 sides of paper). Please verify.
2. Write your name, roll number, department in block letters with ink on each page.
3. Write your final answers neatly with a blue/black pen. Pencil marks may get smudged.
4. Don’t overwrite/scratch answers especially in MCQ – ambiguous cases will get 0 marks.

Q1 (Optimal DT) Melbo has a multiclass problem with three classes +, ×, □. There are 16 datapoints
in total, each with a 2D feature vector (𝑥, 𝑦), where 𝑥 and 𝑦 can each take value 0 or 1. The table
below describes each data point. All 16 points are at the root of a decision tree. Melbo wishes to
learn a decision stump based on the entropy reduction principle to split this node into two children.
Help Melbo finish this task. Hint: take logs to base 2 so there is no need for a calculator. (8 x 0.5 = 4 marks)

SNo Class (𝑥, 𝑦) SNo Class (𝑥, 𝑦) SNo Class (𝑥, 𝑦) SNo Class (𝑥, 𝑦)
1 + (0,1) 5 + (0,1) 9 × (1,0) 13 □ (1,0)
2 + (1,1) 6 + (0,1) 10 × (1,0) 14 □ (0,0)
3 + (0,1) 7 + (1,1) 11 × (0,0) 15 □ (1,0)
4 + (1,1) 8 + (1,1) 12 × (0,0) 16 □ (0,0)

Class proportions at the root are (1/2, 1/4, 1/4).

What is the entropy of the root node?
  𝐻root = −(1/2) log₂(1/2) − (1/4) log₂(1/4) − (1/4) log₂(1/4) = 1.5
What is the entropy of the two child nodes (give answers for the two nodes separately) if the
split is done using the 𝑥 feature (𝑥 = 0 becomes left child, 𝑥 = 1 becomes right child)?
  Class proportions remain (1/2, 1/4, 1/4) in both children, i.e., 𝐻left = 1.5 and 𝐻right = 1.5.

What is the reduction in entropy (i.e., 𝐻root − 𝐻children) if the split is done using the 𝑥 feature
as described above?
  𝐻root − (1/2)(𝐻left + 𝐻right) = 1.5 − (1/2)(1.5 + 1.5) = 0

What is the entropy of the two child nodes (give answers for the two nodes separately) if the
split is done using the 𝑦 feature (𝑦 = 0 becomes left child, 𝑦 = 1 becomes right child)?
  Class proportions are (0, 1/2, 1/2) in the left child and (1, 0, 0) in the right child, i.e.,
  𝐻left = 1 and 𝐻right = 0.

What is the reduction in entropy (i.e., 𝐻root − 𝐻children) if the split is done using the 𝑦 feature
as described above?
  𝐻root − (1/2)(𝐻left + 𝐻right) = 1.5 − (1/2)(1 + 0) = 1

To get the most entropy reduction, should we split using the 𝑥 feature or the 𝑦 feature?
  We should split using the 𝑦 feature.
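For readers who want to verify these numbers, here is a short Python sketch (my addition, not part of the original key) that recomputes the root and child entropies and both reductions from the 16-point table; the symbols '+', 'x', 's' stand in for the classes +, ×, □.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# (class, x, y) for the 16 points in the table above
data = [('+', 0, 1), ('+', 1, 1), ('+', 0, 1), ('+', 1, 1),
        ('+', 0, 1), ('+', 0, 1), ('+', 1, 1), ('+', 1, 1),
        ('x', 1, 0), ('x', 1, 0), ('x', 0, 0), ('x', 0, 0),
        ('s', 1, 0), ('s', 0, 0), ('s', 1, 0), ('s', 0, 0)]

root = [c for c, _, _ in data]
print(entropy(root))  # 1.5

for feat in (1, 2):  # tuple index: 1 = x feature, 2 = y feature
    left = [p[0] for p in data if p[feat] == 0]
    right = [p[0] for p in data if p[feat] == 1]
    # weighted child entropy; both children happen to hold 8 points each
    h_children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(data)
    print('xy'[feat - 1], entropy(root) - h_children)  # x: 0.0, y: 1.0
```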
Q2. Write T or F for True/False in the box. Also, give justification. (4 x (1+3) = 16 marks)
1. [T] Recall that ‖𝐯‖₀ is the number of non-zero coordinates of the vector 𝐯. Then for
any two vectors 𝐚, 𝐛 ∈ ℝ³, we always have ‖𝐚 + 𝐛‖₀ ≤ ‖𝐚‖₀ + ‖𝐛‖₀. If true, give
a brief proof, else give a counterexample of two 3D vectors that violate this
inequality. Show brief calculations in either case.
Note that if 𝑎𝑖 = 0 = 𝑏𝑖 for some 𝑖 ∈ [𝑑], then 𝑎𝑖 + 𝑏𝑖 = 0. This means that if 𝑎𝑖 + 𝑏𝑖 ≠ 0, then
either 𝑎𝑖 or 𝑏𝑖 must be non-zero. Another way of saying this is to write
𝕀{𝑎𝑖 + 𝑏𝑖 ≠ 0} ≤ 𝕀{𝑎𝑖 ≠ 0} + 𝕀{𝑏𝑖 ≠ 0},
where 𝕀{𝐴} = 1 if the statement 𝐴 is true and 𝕀{𝐴} = 0 if 𝐴 is false. Taking sums over
𝑖 ∈ [𝑑] gives us ∑𝑖∈[𝑑] 𝕀{𝑎𝑖 + 𝑏𝑖 ≠ 0} ≤ ∑𝑖∈[𝑑] 𝕀{𝑎𝑖 ≠ 0} + ∑𝑖∈[𝑑] 𝕀{𝑏𝑖 ≠ 0}. However, for any
vector 𝐯, we have ‖𝐯‖₀ = ∑𝑖∈[𝑑] 𝕀{𝑣𝑖 ≠ 0}. This tells us ‖𝐚 + 𝐛‖₀ ≤ ‖𝐚‖₀ + ‖𝐛‖₀.
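As a quick sanity check (my addition), the inequality can be stress-tested on random sparse 3D vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def l0(v):
    """Number of non-zero coordinates of v."""
    return int(np.count_nonzero(v))

for _ in range(10000):
    # entries in {-1, 0, 1} so exact zeros occur often
    a = rng.integers(-1, 2, size=3)
    b = rng.integers(-1, 2, size=3)
    assert l0(a + b) <= l0(a) + l0(b)
```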

2. [T] The function 𝑁(𝑝) ≝ 𝑝 ln 𝑝 is convex over the interval 𝑝 ∈ (0, ∞). If true, give a
proof via the chord definition or tangent definition or second derivative definition of
convex functions. If false, give a counterexample that violates any one definition.
To use the tangent definition, we have to show that for any 𝑝, 𝑞 ∈ (0, ∞), we always have
𝑁(𝑞) ≥ 𝑁(𝑝) + 𝑁′(𝑝)(𝑞 − 𝑝).
Since 𝑁′(𝑝) = 1 + ln 𝑝, we have 𝑁(𝑝) + 𝑁′(𝑝)(𝑞 − 𝑝) = 𝑝 ln 𝑝 + (1 + ln 𝑝)(𝑞 − 𝑝) = 𝑞 ln 𝑝 + 𝑞 − 𝑝,
so the above requirement reduces to showing
𝑞 ln 𝑞 − 𝑞 ≥ 𝑞 ln 𝑝 − 𝑝.
Consider the function 𝑓(𝑥) ≝ 𝑞 ln 𝑥 − 𝑥. We have 𝑓′(𝑥) = 𝑞/𝑥 − 1 and 𝑓″(𝑥) = −𝑞/𝑥² < 0. Thus,
𝑓′(𝑞) = 0 and 𝑓 is concave, i.e., 𝑥 = 𝑞 is the global maximum of 𝑓. This shows that 𝑓(𝑞) ≥ 𝑓(𝑝), or
in other words, 𝑞 ln 𝑞 − 𝑞 ≥ 𝑞 ln 𝑝 − 𝑝, which proves that the function 𝑁(⋅) is convex.
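The chord definition can likewise be checked numerically; this sketch (my addition) samples random 𝑝, 𝑞 and mixing weights 𝑡:

```python
import numpy as np

N = lambda p: p * np.log(p)  # the function N(p) = p ln p
rng = np.random.default_rng(0)
for _ in range(10000):
    p, q = rng.uniform(1e-3, 10.0, size=2)
    t = rng.uniform(0.0, 1.0)
    # chord definition of convexity: N(t*p + (1-t)*q) <= t*N(p) + (1-t)*N(q)
    assert N(t * p + (1 - t) * q) <= t * N(p) + (1 - t) * N(q) + 1e-12
```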

3. [F] The optimum for argmin_{𝐰∈ℝᵈ} ‖𝐰 − 𝐰₀‖₂² + ‖𝑋𝐰 − 𝐲‖₂² is always achieved at 𝐰₀. Justify
by deriving the optimum. 𝑋 ∈ ℝⁿˣᵈ, 𝐲 ∈ ℝⁿ, 𝐰₀ ∈ ℝᵈ are all constants.

As this is an unconstrained problem with a differentiable objective, first-order optimality tells
us that the gradient must vanish at the optimum. This means 2(𝐰 − 𝐰₀) + 2𝑋⊤(𝑋𝐰 − 𝐲) = 𝟎,
i.e., 𝐰 = (𝑋⊤𝑋 + 𝐼)⁻¹(𝐰₀ + 𝑋⊤𝐲). This means that the optimum is not always 𝐰₀ (it equals 𝐰₀
only in special cases, e.g., when 𝑋𝐰₀ = 𝐲).
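A small numpy check (my addition) confirms that the gradient vanishes at the derived closed form and that the optimum generally differs from 𝐰₀; the sizes n = 20, d = 5 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w0 = rng.normal(size=d)

# closed form derived above: w* = (X^T X + I)^{-1} (w0 + X^T y)
w_star = np.linalg.solve(X.T @ X + np.eye(d), w0 + X.T @ y)

grad = (w_star - w0) + X.T @ (X @ w_star - y)  # gradient / 2
print(np.allclose(grad, 0))    # True: first-order optimality holds
print(np.allclose(w_star, w0)) # False in general: optimum is not w0
```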

4. [T] For any two vectors 𝐮, 𝐯 ∈ ℝ³, we always have 𝐮⊤𝐯 ≤ (‖𝐮‖₂² + ‖𝐯‖₂²)/2. If true,
give a brief proof, else give a counterexample and calculations with two 3D vectors.

By the Cauchy-Schwarz inequality, we have 𝐮⊤𝐯 ≤ ‖𝐮‖₂ ⋅ ‖𝐯‖₂. However, we always have
‖𝐮‖₂ ⋅ ‖𝐯‖₂ ≤ (‖𝐮‖₂² + ‖𝐯‖₂²)/2
since (‖𝐮‖₂ − ‖𝐯‖₂)² ≥ 0. This completes the proof.
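Again, a quick randomized check (my addition):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(10000):
    u, v = rng.normal(size=3), rng.normal(size=3)
    # u^T v <= (||u||^2 + ||v||^2) / 2, with a tiny tolerance for float error
    assert u @ v <= (u @ u + v @ v) / 2 + 1e-12
```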

Q3 (Absolute Tilt) Consider the optimization problem min_{𝑥∈ℝ} 𝑓(𝑥) with objective 𝑓: ℝ → ℝ defined
as 𝑓(𝑥) ≝ |𝑥| + 𝑎 ⋅ 𝑥, where 𝑎 ∈ ℝ is a constant (maybe pos/neg/zero). Find the point 𝑥* at which
the optimum is achieved and 𝑓(𝑥*). Note: 𝑥* and 𝑓(𝑥*) depend on 𝑎. Both 𝑥*, 𝑓(𝑥*) can be ∞ or
−∞ for certain cases. You must tell us, for each possible case, where the optimum is achieved, i.e.,
𝑥*, and what 𝑓(𝑥*) is. E.g., you might say that case 1 is 𝑎 < 1, in which case we get 𝑓(𝑥*) = 1 at
𝑥* = 0.5, and case 2 is 𝑎 ≥ 1, in which case we get 𝑓(𝑥*) = −1 at 𝑥* = ∞. You may use at most
3 cases to describe your solution. If you don't need that many cases, leave cases blank. Give brief
derivations. Hint: you should not have to derive the dual to solve this problem. (8 marks)
Case 1: condition |𝑎| ≤ 1. Optimum reached at 𝑥* = 0; optimal objective value 𝑓(𝑥*) = 0.
Case 2: condition |𝑎| > 1. Optimum reached at 𝑥* = −sign(𝑎) ⋅ ∞ (i.e., −∞ if 𝑎 > 1, +∞ if 𝑎 < −1);
optimal objective value 𝑓(𝑥*) = −∞.
Case 3: not needed.
Give brief derivation below.

Note that the first term |𝑥| takes only non-negative values, whereas the second term 𝑎 ⋅ 𝑥 can take
negative values and reduce the objective value.
Notice that if |𝑎| ≤ 1, the first term |𝑥| dominates since 𝑎 ⋅ 𝑥 ≥ −|𝑎 ⋅ 𝑥| = −|𝑎| ⋅ |𝑥| ≥ −|𝑥|, i.e.,
|𝑥| + 𝑎 ⋅ 𝑥 ≥ |𝑥| − |𝑥| = 0. In this case, the smallest objective value is 0, which is indeed
achieved at 𝑥* = 0.
However, if |𝑎| > 1, then the term 𝑎 ⋅ 𝑥 dominates and we can push the objective value to −∞. To
see this, consider 𝑥 = −𝑀 ⋅ sign(𝑎) for some real 𝑀 > 0. We have 𝑓(𝑥) = 𝑀 − |𝑎| ⋅ 𝑀 < 0
since |𝑎| > 1. Taking 𝑀 → +∞ tells us that lim_{𝑀→∞} 𝑓(−𝑀 ⋅ sign(𝑎)) = −∞, which completes the
argument.
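A grid-based illustration of the two cases (my addition): on a finite grid, the |𝑎| > 1 cases simply pin the minimizer to the grid edge in the direction of −sign(𝑎), hinting at the unbounded behaviour.

```python
import numpy as np

f = lambda x, a: np.abs(x) + a * x
xs = np.linspace(-100, 100, 200001)  # grid includes x = 0 exactly
for a in (0.5, -0.9, 2.0, -3.0):
    i = np.argmin(f(xs, a))
    print(a, xs[i], f(xs[i], a))
# a = 0.5, -0.9: minimum 0 at x = 0
# a = 2.0:  minimizer at the left edge x = -100 (grows unboundedly negative)
# a = -3.0: minimizer at the right edge x = +100
```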

Q4. (Parallel Classifier) Create a feature map 𝜙: ℝ2 → ℝ𝐷 for


some 𝐷 > 0 so that for any 𝐳 = (𝑥, 𝑦) ∈ ℝ2 , sign(𝟏⊤ 𝜙(𝐳)) takes
value +1 if 𝐳 is in the dark cross-hatched region and −1 if 𝐳 is in
the light dotted region (see fig). E.g., (0,0) is labelled −1 while the
points (2,5) and (−6,1) are both labelled +1. The lines in the
figure are 𝑥 + 𝑦 = 4 and 𝑥 + 𝑦 = −4. We don’t care what values
are taken on points lying on these two lines (as these are the
decision boundaries). 𝟏 = (1,1, … ,1) ∈ ℝ𝐷 is the all-ones vector.
No need for derivation – just give the final map below. (5 marks)
𝜙(𝑥, 𝑦) = (𝑥 2 , 2𝑥𝑦, 𝑦 2 , −16)
Intuition: The cross-hatched area is where 𝑥 + 𝑦 ≥ 4 or 𝑥 + 𝑦 ≤ −4 i.e., where (𝑥 + 𝑦)2 ≥ 16
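A tiny check (my addition) that sign(𝟏⊤𝜙(𝐳)) labels the three example points from the question correctly:

```python
import numpy as np

def phi(x, y):
    # feature map from the answer: 1^T phi(z) = (x + y)^2 - 16
    return np.array([x**2, 2 * x * y, y**2, -16.0])

ones = np.ones(4)
for (x, y) in [(0, 0), (2, 5), (-6, 1)]:
    print((x, y), int(np.sign(ones @ phi(x, y))))
# (0, 0) -> -1, (2, 5) -> +1, (-6, 1) -> +1
```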

Q5 (CM to the rescue) Consider the following problem where 𝐚, 𝐛, 𝐜 ∈ ℝᵈ and 𝜆 ∈ ℝ are constants
and 𝜆 > 0. Design a coordinate minimization algorithm (choose coordinates cyclically) to solve the
primal. Give brief calculations on how you will create a simplified unidimensional problem for a
chosen coordinate 𝑖 ∈ [𝑑] and then show how to get the optimal value of 𝑥𝑖. (7 marks)

    min_{𝐱∈ℝᵈ}  (1/2)‖𝐱‖₂² + 𝐚⊤𝐱
    s.t.  𝐛⊤𝐱 ≤ 𝜆,  𝐜⊤𝐱 ≤ 𝜆
Suppose cyclic coordinate choice has presented a coordinate 𝑖 ∈ [𝑑]. We extract the portions of the
optimization problem that depend on 𝑥𝑖 (and treat 𝑥𝑗, 𝑗 ≠ 𝑖 as constants) to get the following
simplified 1D optimization problem (where 𝑚𝑖 ≝ 𝜆 − ∑𝑗≠𝑖 𝑏𝑗𝑥𝑗 and 𝑛𝑖 ≝ 𝜆 − ∑𝑗≠𝑖 𝑐𝑗𝑥𝑗):

    min_{𝑥𝑖∈ℝ}  (1/2)𝑥𝑖² + 𝑎𝑖𝑥𝑖              min_{𝑥𝑖∈ℝ}  (1/2)𝑥𝑖² + 𝑎𝑖𝑥𝑖
    s.t.  𝑏𝑖𝑥𝑖 ≤ 𝑚𝑖, 𝑐𝑖𝑥𝑖 ≤ 𝑛𝑖        ⇒      s.t.  𝑥𝑖 ∈ [𝑙𝑖, 𝑢𝑖]
Now we clean up the constraints to get a single box constraint as shown below:

    Case   𝑏𝑖 > 0, 𝑐𝑖 > 0          𝑏𝑖 > 0, 𝑐𝑖 < 0   𝑏𝑖 < 0, 𝑐𝑖 > 0   𝑏𝑖 < 0, 𝑐𝑖 < 0
    𝑙𝑖     −∞                      𝑛𝑖/𝑐𝑖            𝑚𝑖/𝑏𝑖            max{𝑚𝑖/𝑏𝑖, 𝑛𝑖/𝑐𝑖}
    𝑢𝑖     min{𝑚𝑖/𝑏𝑖, 𝑛𝑖/𝑐𝑖}       𝑚𝑖/𝑏𝑖            𝑛𝑖/𝑐𝑖            +∞
If either 𝑏𝑖 or 𝑐𝑖 is 0, that constraint is ignored since it no longer involves 𝑥𝑖. Having converted the
pair of constraints into a single box constraint, we can now apply the QUIN trick to solve this
problem and obtain the optimal value 𝑥𝑖* in two simple steps:
1. Find the unconstrained minimum of (1/2)𝑥𝑖² + 𝑎𝑖𝑥𝑖, which turns out to be 𝑧𝑖 ≝ −𝑎𝑖.
2. If 𝑧𝑖 ∈ [𝑙𝑖, 𝑢𝑖], then 𝑥𝑖* = 𝑧𝑖; else if 𝑧𝑖 < 𝑙𝑖, then 𝑥𝑖* = 𝑙𝑖; else 𝑥𝑖* = 𝑢𝑖.
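Putting the pieces together, here is a minimal Python sketch (my addition) of the full cyclic coordinate minimization; the function name coordinate_min and the fixed sweep count are my own simplifications, not from the paper.

```python
import numpy as np

def coordinate_min(a, b, c, lam, sweeps=100):
    d = len(a)
    x = np.zeros(d)  # the all-zeros point is feasible since lam > 0
    for _ in range(sweeps):
        for i in range(d):  # cyclic coordinate choice
            m_i = lam - b @ x + b[i] * x[i]  # lam - sum_{j != i} b_j x_j
            n_i = lam - c @ x + c[i] * x[i]  # lam - sum_{j != i} c_j x_j
            l_i, u_i = -np.inf, np.inf
            # turn b_i x_i <= m_i and c_i x_i <= n_i into a box [l_i, u_i]
            for coef, rhs in ((b[i], m_i), (c[i], n_i)):
                if coef > 0:
                    u_i = min(u_i, rhs / coef)
                elif coef < 0:
                    l_i = max(l_i, rhs / coef)
                # coef == 0: constraint no longer involves x_i, ignore it
            z_i = -a[i]                    # unconstrained 1D minimum
            x[i] = np.clip(z_i, l_i, u_i)  # QUIN trick: project onto the box
    return x

# toy usage on random data
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
x = coordinate_min(a, b, c, lam=1.0)
print(x, b @ x <= 1.0 + 1e-9, c @ x <= 1.0 + 1e-9)
```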
