—————————————————————————————————————

Learning From Data


Caltech - edX CS1156x
https://www.edx.org/course/learning-data-introductory-machine-caltechx-cs1156x

Fall 2016
—————————————————————————————————————

Homework # 2

Due Monday, October 10, 2016, at 22:00 GMT/UTC

All questions have multiple-choice answers ([a], [b], [c], ...). You can collaborate
with others, but do not discuss the selected or excluded choices in the answers. You
can consult books and notes, but not other people’s solutions. Your solutions should
be based on your own work. Definitions and notation follow the lectures.

Note about the homework

• The goal of the homework is to facilitate a deeper understanding of the course
material. The questions are not designed to be puzzles with catchy answers.
They are meant to make you roll up your sleeves, face uncertainties, and
approach the problem from different angles.

• The problems range from easy to difficult, and from practical to theoretical.
Some problems require running a full experiment to arrive at the answer.

• The answer may not be obvious or numerically close to one of the choices,
but one (and only one) choice will be correct if you follow the instructions
precisely in each problem. You are encouraged to explore the problem further
by experimenting with variations on these instructions, for the learning benefit.

• You are also encouraged to take part in the discussion forum. Please make
sure you don’t discuss specific answers, or specific excluded answers, before the
homework is due.


© 2012-2016 Yaser Abu-Mostafa. All rights reserved. No redistribution in any
format. No translation or derivative products without written permission.

• Hoeffding Inequality
Run a computer simulation for flipping 1,000 virtual fair coins. Flip each coin
independently 10 times. Focus on 3 coins as follows: c_1 is the first coin flipped, c_rand is a
coin chosen randomly from the 1,000, and c_min is the coin which had the minimum
frequency of heads (pick the earlier one in case of a tie). Let ν_1, ν_rand, and ν_min be
the fraction of heads obtained for the 3 respective coins out of the 10 tosses.
Run the experiment 100,000 times in order to get a full distribution of ν_1, ν_rand, and
ν_min (note that c_rand and c_min will change from run to run).
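
The experiment described above could be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference solution; the function names `run_experiment` and `average_nus`, the seed, and the reduced run count in the demo are assumptions made for the example.

```python
import random

def run_experiment(rng, n_coins=1000, n_flips=10):
    """One run: return (nu_1, nu_rand, nu_min) as fractions of heads."""
    # Each coin's 10 fair flips are the bits of a random 10-bit integer;
    # the number of 1 bits is the number of heads.
    nu = [bin(rng.getrandbits(n_flips)).count("1") / n_flips
          for _ in range(n_coins)]
    nu_1 = nu[0]                          # first coin flipped
    nu_rand = nu[rng.randrange(n_coins)]  # coin chosen uniformly at random
    nu_min = min(nu)                      # same value whichever tied coin is picked
    return nu_1, nu_rand, nu_min

def average_nus(n_runs, seed=0):
    """Average nu_1, nu_rand, nu_min over n_runs independent runs."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_runs):
        for i, value in enumerate(run_experiment(rng)):
            totals[i] += value
    return tuple(t / n_runs for t in totals)

if __name__ == "__main__":
    # The problem asks for 100,000 runs; a smaller count gives a quick preview.
    print(average_nus(n_runs=200))
```

To study the full distributions rather than the averages, the per-run values returned by `run_experiment` can be collected into lists and histogrammed.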

1. The average value of νmin is closest to:

[a] 0
[b] 0.01
[c] 0.1
[d] 0.5
[e] 0.67

2. Which coin(s) has a distribution of ν that satisfies the (single-bin) Hoeffding
Inequality?

[a] c_1 only
[b] c_rand only
[c] c_min only
[d] c_1 and c_rand
[e] c_min and c_rand

• Error and Noise


Consider the bin model for a hypothesis h that makes an error with probability µ in
approximating a deterministic target function f (both h and f are binary functions).
If we use the same h to approximate a noisy version of f given by:

\[
P(y \mid x) =
\begin{cases}
\lambda     & \text{if } y = f(x) \\
1 - \lambda & \text{if } y \neq f(x)
\end{cases}
\]
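
One way to build intuition for this noise model is to simulate it and compare the empirical error rate with the candidate expressions. The sketch below is an illustration only; it assumes that the noise on y is independent of whether h errs on f, and the name `noisy_error_rate` and its parameters are hypothetical.

```python
import random

def noisy_error_rate(mu, lam, n_trials=200_000, seed=0):
    """Estimate P[h(x) != y] by simulation under the noisy target."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_trials):
        h_wrong_on_f = rng.random() < mu   # h(x) != f(x) with probability mu
        y_equals_f = rng.random() < lam    # y = f(x) with probability lam
        # Outputs are binary, so h disagrees with y in two cases:
        #   h is wrong about f and y equals f, or
        #   h is right about f and y differs from f.
        if h_wrong_on_f == y_equals_f:
            errors += 1
    return errors / n_trials

if __name__ == "__main__":
    for lam in (0.0, 0.25, 0.75, 1.0):
        print(lam, noisy_error_rate(mu=0.2, lam=lam, n_trials=50_000))
```

Sweeping `lam` for a few different values of `mu` shows how the dependence on `mu` changes with the noise level.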

3. What is the probability of error that h makes in approximating y? Hint: Two
wrongs can make a right!

[a] µ
[b] λ
[c] 1-µ
[d] (1 − λ) ∗ µ + λ ∗ (1 − µ)
[e] (1 − λ) ∗ (1 − µ) + λ ∗ µ

4. At what value of λ will the performance of h be independent of µ?

[a] 0
[b] 0.5
[c] 1/√2
[d] 1
[e] No values of λ

• Linear Regression
In these problems, we will explore how Linear Regression for classification works. As
with the Perceptron Learning Algorithm in Homework # 1, you will create your own
target function f and data set D. Take d = 2 so you can visualize the problem, and
assume X = [−1, 1] × [−1, 1] with uniform probability of picking each x ∈ X. In
each run, choose a random line in the plane as your target function f (do this by
taking two random, uniformly distributed points in [−1, 1] × [−1, 1] and taking the
line passing through them), where one side of the line maps to +1 and the other maps
to −1. Choose the inputs x_n of the data set as random points (uniformly in X), and
evaluate the target function on each x_n to get the corresponding output y_n.
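
The setup above might be sketched with NumPy as follows. This is a hedged illustration, not the course's reference implementation: the helper names (`random_target`, `make_data`, `linear_regression`, `classification_error`) and the seed are assumptions, and Linear Regression is computed here with the pseudo-inverse.

```python
import numpy as np

def random_target(rng):
    """Return weights (w0, w1, w2) of the line through two random points."""
    (x1, y1), (x2, y2) = rng.uniform(-1, 1, size=(2, 2))
    # Line through the two points: a*(x - x1) + b*(y - y1) = 0
    a, b = y2 - y1, -(x2 - x1)
    return np.array([-(a * x1 + b * y1), a, b])

def make_data(rng, n, w_target):
    """Sample n points uniformly from X and label them with the target."""
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
    y = np.sign(X @ w_target)
    return X, y

def linear_regression(X, y):
    """Least-squares weights via the pseudo-inverse: w = pinv(X) y."""
    return np.linalg.pinv(X) @ y

def classification_error(w, X, y):
    """Fraction of points where sign(w . x) disagrees with y."""
    return np.mean(np.sign(X @ w) != y)

rng = np.random.default_rng(0)
w_f = random_target(rng)
X_train, y_train = make_data(rng, 100, w_f)
g = linear_regression(X_train, y_train)

# Fresh points drawn from the same distribution estimate the out-of-sample error.
X_test, y_test = make_data(rng, 1000, w_f)
print("in-sample error:", classification_error(g, X_train, y_train))
print("out-of-sample estimate:", classification_error(g, X_test, y_test))
```

Wrapping the last block in a loop over independent runs, with a new target each time, gives the averages the problems ask for.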

5. Take N = 100. Use Linear Regression to find g and evaluate E_in, the fraction of
in-sample points which got classified incorrectly. Repeat the experiment 1000
times and take the average (keep the g’s as they will be used again in Problem
6). Which of the following values is closest to the average E_in? (Closest is the
option that makes the expression |your answer − given option| closest to 0. Use
this definition of closest here and throughout.)

[a] 0
[b] 0.001
[c] 0.01
[d] 0.1
[e] 0.5

6. Now, generate 1000 fresh points and use them to estimate the out-of-sample
error E_out of g that you got in Problem 5 (number of misclassified out-of-sample
points / total number of out-of-sample points). Again, run the experiment 1000
times and take the average. Which value is closest to the average E_out?

[a] 0
[b] 0.001
[c] 0.01
[d] 0.1
[e] 0.5

7. Now, take N = 10. After finding the weights using Linear Regression, use
them as a vector of initial weights for the Perceptron Learning Algorithm. Run
PLA until it converges to a final vector of weights that completely separates
all the in-sample points. Among the choices below, what is the closest value to
the average number of iterations (over 1000 runs) that PLA takes to converge?
(When implementing PLA, have the algorithm choose a point randomly from
the set of misclassified points at each iteration.)

[a] 1
[b] 15
[c] 300
[d] 5000
[e] 10000
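
The procedure in Problem 7 could be sketched as below: fit Linear Regression first, then hand its weights to PLA as the starting point. This is an illustrative sketch under assumptions; the function name `pla`, the `max_iters` safety cap, and the seed are not part of the problem statement.

```python
import numpy as np

def pla(X, y, w_init, rng, max_iters=100_000):
    """Run PLA from w_init; return (weights, number of updates made)."""
    w = np.array(w_init, dtype=float)
    for iterations in range(max_iters):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return w, iterations          # all in-sample points separated
        i = rng.choice(misclassified)     # random misclassified point
        w += y[i] * X[i]                  # PLA update rule
    return w, max_iters                   # cap reached (assumed safeguard)

rng = np.random.default_rng(0)

# Random target line through two uniformly chosen points.
(x1, y1), (x2, y2) = rng.uniform(-1, 1, size=(2, 2))
a, b = y2 - y1, -(x2 - x1)
w_f = np.array([-(a * x1 + b * y1), a, b])

# N = 10 training points labelled by the target.
X = np.column_stack([np.ones(10), rng.uniform(-1, 1, size=(10, 2))])
y = np.sign(X @ w_f)

w_lr = np.linalg.pinv(X) @ y              # Linear Regression weights
w_final, n_updates = pla(X, y, w_lr, rng)
print("PLA updates after Linear Regression start:", n_updates)
```

Comparing `n_updates` against a run started from `np.zeros(3)` on the same data shows what the Linear Regression initialization contributes.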

• Nonlinear Transformation
In these problems, we again apply Linear Regression for classification. Consider the
target function:

f(x_1, x_2) = sign(x_1^2 + x_2^2 − 0.6)

Generate a training set of N = 1000 points on X = [−1, 1] × [−1, 1] with a uniform
probability of picking each x ∈ X. Generate simulated noise by flipping the sign of
the output in a randomly selected 10% subset of the generated training set.
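
The data generation just described might look like the sketch below. It is one possible reading of the instructions: the function name `make_noisy_circle_data` and the seed are assumptions, and the noise is applied by flipping exactly 10% of the labels, chosen without replacement.

```python
import numpy as np

def make_noisy_circle_data(rng, n=1000, noise_fraction=0.1):
    """Sample n points in [-1, 1]^2, label them with f, then flip some labels."""
    pts = rng.uniform(-1, 1, size=(n, 2))
    y = np.sign(pts[:, 0] ** 2 + pts[:, 1] ** 2 - 0.6)
    # Flip the sign of the output on a randomly selected subset of the points.
    n_flip = int(round(noise_fraction * n))
    flip = rng.choice(n, size=n_flip, replace=False)
    y[flip] = -y[flip]
    return pts, y

rng = np.random.default_rng(0)
pts, y = make_noisy_circle_data(rng)
print(pts.shape, y.shape)
```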

8. Carry out Linear Regression without transformation, i.e., with feature vector:

(1, x_1, x_2),

to find the weight w. What is the closest value to the classification in-sample
error E_in? (Run the experiment 1000 times and take the average E_in to reduce
variation in your results.)

[a] 0
[b] 0.1
[c] 0.3
[d] 0.5
[e] 0.8

9. Now, transform the N = 1000 training data into the following nonlinear feature
vector:
(1, x_1, x_2, x_1 x_2, x_1^2, x_2^2)
Find the vector w̃ that corresponds to the solution of Linear Regression. Which
of the following hypotheses is closest to the one you find? Closest here means
agrees the most with your hypothesis (has the highest probability of agreeing on
a randomly selected point). Average over a few runs to make sure your answer
is stable.

[a] g(x_1, x_2) = sign(−1 − 0.05x_1 + 0.08x_2 + 0.13x_1 x_2 + 1.5x_1^2 + 1.5x_2^2)
[b] g(x_1, x_2) = sign(−1 − 0.05x_1 + 0.08x_2 + 0.13x_1 x_2 + 1.5x_1^2 + 15x_2^2)
[c] g(x_1, x_2) = sign(−1 − 0.05x_1 + 0.08x_2 + 0.13x_1 x_2 + 15x_1^2 + 1.5x_2^2)
[d] g(x_1, x_2) = sign(−1 − 1.5x_1 + 0.08x_2 + 0.13x_1 x_2 + 0.05x_1^2 + 0.05x_2^2)
[e] g(x_1, x_2) = sign(−1 − 0.05x_1 + 0.08x_2 + 1.5x_1 x_2 + 0.15x_1^2 + 0.15x_2^2)
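
The notion of "closest" used in Problem 9 can be estimated numerically by sampling random points and counting how often two hypotheses give the same sign. The sketch below illustrates this under assumptions: `transform` and `agreement` are hypothetical helper names, and `w_candidate` is a placeholder weight vector chosen for illustration, deliberately not one of the listed choices.

```python
import numpy as np

def transform(pts):
    """Map (x1, x2) to the feature vector (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones(len(pts)), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def agreement(w_a, w_b, rng, n=100_000):
    """Estimate the probability that two hypotheses agree on a random point."""
    Z = transform(rng.uniform(-1, 1, size=(n, 2)))
    return np.mean(np.sign(Z @ w_a) == np.sign(Z @ w_b))

rng = np.random.default_rng(0)

# Noisy training set: N = 1000 points, 10% of the labels flipped.
pts = rng.uniform(-1, 1, size=(1000, 2))
y = np.sign(pts[:, 0] ** 2 + pts[:, 1] ** 2 - 0.6)
flip = rng.choice(1000, size=100, replace=False)
y[flip] = -y[flip]

# Linear Regression in the transformed space.
w_tilde = np.linalg.pinv(transform(pts)) @ y

# Placeholder candidate for illustration only; substitute each listed choice.
w_candidate = np.array([-1.0, 0.0, 0.0, 0.0, 1.0, 1.0])
print("agreement:", agreement(w_tilde, w_candidate, rng))
```

Evaluating `agreement` once per listed hypothesis, and averaging over a few training sets, gives a stable comparison.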

10. What is the closest value to the classification out-of-sample error E_out of your
hypothesis from Problem 9? (Estimate it by generating a new set of 1000 points
and adding noise, as before. Average over 1000 runs to reduce the variation in
your results.)

[a] 0
[b] 0.1
[c] 0.3
[d] 0.5
[e] 0.8
