Lecture 9: Classification, LDA
Reading: Chapter 4
Review: Main strategy in Chapter 4

Classify an input x_0 to the most probable class under the estimated posterior:

$$\hat{y}_0 = \operatorname{argmax}_y \; \hat{P}(Y = y \mid X = x_0).$$
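As a toy illustration (all numbers made up), the rule is just an argmax over the estimated posteriors:

```python
import numpy as np

# Hypothetical estimated posteriors P^(Y = y | X = x0) for 3 classes.
posterior = np.array([0.2, 0.5, 0.3])

# y^_0 = argmax_y P^(Y = y | X = x0)
y_hat = np.argmax(posterior)
print(y_hat)  # 1: the class with the largest estimated posterior
```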
Linear Discriminant Analysis (LDA)

By Bayes' rule,

$$\hat{P}(Y = k \mid X = x) = \frac{\hat{P}(X = x \mid Y = k)\,\hat{P}(Y = k)}{\hat{P}(X = x)} = \frac{\hat{P}(X = x \mid Y = k)\,\hat{P}(Y = k)}{\sum_j \hat{P}(X = x \mid Y = j)\,\hat{P}(Y = j)}.$$
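A small numpy sketch of this computation, with made-up class-conditional densities and priors at a fixed input x:

```python
import numpy as np

# Hypothetical estimates at a fixed input x, for 3 classes j = 0, 1, 2.
density = np.array([0.05, 0.30, 0.10])   # P^(X = x | Y = j)
prior   = np.array([0.50, 0.20, 0.30])   # P^(Y = j)

# Numerator of Bayes' rule for every class at once.
joint = density * prior

# The denominator sum_j P^(X = x | Y = j) P^(Y = j) normalizes the
# numerators into posterior probabilities that sum to one.
posterior = joint / joint.sum()
print(posterior, posterior.sum())
```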
Linear Discriminant Analysis (LDA)

Instead of estimating P(Y | X) directly, we will estimate the two quantities in the numerator: the class-conditional distribution P̂(X = x | Y = k) and the prior P̂(Y = k).

[Figure: two scatterplots of X1 vs. X2 showing the inputs of each class.]
LDA has linear decision boundaries

Suppose that:

- We know P(Y = k) = π_k exactly.
- P(X = x | Y = k) is Multivariate Normal with density:

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)}$$

where µ_k is the mean of class k and Σ is a covariance matrix common to all classes.
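A direct numpy transcription of this density; the parameters µ_k and Σ below are made up for illustration:

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate Normal density f_k evaluated at x."""
    p = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5
    exponent = -0.5 * diff @ np.linalg.solve(Sigma, diff)
    return np.exp(exponent) / norm_const

# Illustrative parameters for one class in p = 2 dimensions.
mu_k = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(gaussian_density(np.array([0.0, 0.0]), mu_k, Sigma))
```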
LDA has linear decision boundaries

By Bayes' rule, the probability of category k, given the input x, is:

$$P(Y = k \mid X = x) = \frac{f_k(x)\,\pi_k}{P(X = x)} = C \times f_k(x)\,\pi_k,$$

where C = 1/P(X = x) is the same for every class. Substituting the Gaussian density:

$$P(Y = k \mid X = x) = \frac{C\,\pi_k}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)}.$$
LDA has linear decision boundaries

Taking the logarithm and dropping every term that does not depend on k (including the quadratic term $-\frac{1}{2}x^T \Sigma^{-1} x$, which is shared by all classes), we may equivalently maximize the discriminant function:

$$\delta_k(x) = \log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k,$$

which is linear in x.
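A quick numerical check of this step (all parameters made up): the discriminant δ_k(x) differs from the log numerator log(π_k f_k(x)) only by terms constant in k, so both rank the classes identically:

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 2, 3
x = rng.normal(size=p)
mus = rng.normal(size=(K, p))          # made-up class means
pis = np.array([0.5, 0.3, 0.2])        # made-up priors
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)            # made-up shared covariance
Sinv = np.linalg.inv(Sigma)

# Log numerator: log pi_k - 1/2 (x - mu_k)^T Sigma^{-1} (x - mu_k),
# dropping the normalizing constant, which is the same for all k.
log_post = np.array([np.log(pis[k]) - 0.5 * (x - mus[k]) @ Sinv @ (x - mus[k])
                     for k in range(K)])

# Linear discriminant: log pi_k - 1/2 mu_k^T Sinv mu_k + x^T Sinv mu_k
delta = np.array([np.log(pis[k]) - 0.5 * mus[k] @ Sinv @ mus[k] + x @ Sinv @ mus[k]
                  for k in range(K)])

# They differ by the constant -1/2 x^T Sinv x, so the argmax agrees.
print(log_post - delta)                  # constant across k
print(np.argmax(log_post), np.argmax(delta))
```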
LDA has linear decision boundaries

The boundary between classes k and ℓ is the set of points where the discriminants are equal:

$$\delta_k(x) = \delta_\ell(x)$$

$$\log \pi_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k = \log \pi_\ell - \frac{1}{2}\mu_\ell^T \Sigma^{-1} \mu_\ell + x^T \Sigma^{-1} \mu_\ell$$

This is a linear equation in x.

[Figure: two scatterplots of X1 vs. X2 with the linear decision boundaries between the classes.]
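Rearranging, the boundary has the form a^T x = b with a = Σ^{-1}(µ_k − µ_ℓ) and b = log(π_ℓ/π_k) + ½(µ_k^T Σ^{-1} µ_k − µ_ℓ^T Σ^{-1} µ_ℓ). A short sketch with made-up parameters:

```python
import numpy as np

# Made-up parameters for two classes k and l in p = 2 dimensions.
mu_k, mu_l = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
pi_k, pi_l = 0.6, 0.4
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
Sinv = np.linalg.inv(Sigma)

# Boundary {x : a^T x = b}, a straight line in 2 dimensions.
a = Sinv @ (mu_k - mu_l)
b = np.log(pi_l / pi_k) + 0.5 * (mu_k @ Sinv @ mu_k - mu_l @ Sinv @ mu_l)
print(a, b)
```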
Estimating π_k

Estimate each class prior by its sample frequency:

$$\hat{\pi}_k = \frac{\#\{i : y_i = k\}}{n}$$
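In code (labels made up):

```python
import numpy as np

y = np.array([0, 1, 1, 2, 0, 1, 2, 1])   # made-up class labels
K = 3

# pi^_k = #{i : y_i = k} / n for each class.
pi_hat = np.array([np.mean(y == k) for k in range(K)])
print(pi_hat)
```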
Estimating the parameters of f_k(x)

Estimate the center of each class, µ_k, by the sample mean of the inputs in class k:

$$\hat{\mu}_k = \frac{1}{\#\{i : y_i = k\}} \sum_{i : y_i = k} x_i$$

The common covariance Σ is estimated by the pooled sample covariance:

$$\hat{\Sigma} = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i : y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T.$$
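A numpy sketch of both estimates on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 90, 2, 3
X = rng.normal(size=(n, p))      # made-up inputs
y = rng.integers(K, size=n)      # made-up labels

# Class means: mu^_k = average of the x_i with y_i = k.
mu_hat = np.array([X[y == k].mean(axis=0) for k in range(K)])

# Pooled covariance: total within-class scatter, divided by n - K.
Sigma_hat = sum((X[y == k] - mu_hat[k]).T @ (X[y == k] - mu_hat[k])
                for k in range(K)) / (n - K)
print(mu_hat)
print(Sigma_hat)
```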
LDA prediction

For an input x, predict the class with the largest:

$$\hat{\delta}_k(x) = \log \hat{\pi}_k - \frac{1}{2}\hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + x^T \hat{\Sigma}^{-1} \hat{\mu}_k$$

The decision boundaries are defined by:

$$\log \hat{\pi}_k - \frac{1}{2}\hat{\mu}_k^T \hat{\Sigma}^{-1} \hat{\mu}_k + x^T \hat{\Sigma}^{-1} \hat{\mu}_k = \log \hat{\pi}_\ell - \frac{1}{2}\hat{\mu}_\ell^T \hat{\Sigma}^{-1} \hat{\mu}_\ell + x^T \hat{\Sigma}^{-1} \hat{\mu}_\ell$$

These are the solid lines in:

[Figure: two scatterplots of X1 vs. X2 with the estimated LDA boundaries drawn as solid lines.]
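Putting the pieces together, a minimal end-to-end LDA sketch on synthetic data; it mirrors the formulas above rather than any particular library's implementation:

```python
import numpy as np

def lda_fit(X, y, K):
    """Estimate (pi_k, mu_k, Sigma) from training data."""
    n = len(y)
    pi = np.array([np.mean(y == k) for k in range(K)])
    mu = np.array([X[y == k].mean(axis=0) for k in range(K)])
    Sigma = sum((X[y == k] - mu[k]).T @ (X[y == k] - mu[k])
                for k in range(K)) / (n - K)
    return pi, mu, Sigma

def lda_predict(X, pi, mu, Sigma):
    """Classify each row of X to the class with the largest delta^_k."""
    Sinv = np.linalg.inv(Sigma)
    # delta_k(x) = log pi_k - 1/2 mu_k^T Sinv mu_k + x^T Sinv mu_k
    const = np.log(pi) - 0.5 * np.einsum('kp,pq,kq->k', mu, Sinv, mu)
    scores = X @ Sinv @ mu.T + const          # shape (n, K)
    return np.argmax(scores, axis=1)

# Synthetic two-class data with a shared covariance.
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[-1, -1], size=(100, 2))
X1 = rng.normal(loc=[+1, +1], size=(100, 2))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 100)

pi, mu, Sigma = lda_fit(X, y, K=2)
print(np.mean(lda_predict(X, pi, mu, Sigma) == y))   # training accuracy
```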
Quadratic discriminant analysis (QDA)

The assumption that the inputs of every class have the same covariance Σ can be quite restrictive:

[Figure: two scatterplots of X1 vs. X2, illustrating classes whose covariances differ.]
Quadratic discriminant analysis (QDA)

In QDA, each class has its own covariance matrix: P(X = x | Y = k) is Multivariate Normal with mean µ_k and covariance Σ_k. Repeating the derivation above gives the discriminant

$$\delta_k(x) = \log \pi_k - \frac{1}{2}\log |\Sigma_k| - \frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k),$$

which is quadratic in x, so the decision boundaries are quadratic curves.
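A numpy sketch of this discriminant (parameters made up); relative to LDA, the only changes are the per-class Σ_k and the extra log-determinant term:

```python
import numpy as np

def qda_discriminant(x, pi_k, mu_k, Sigma_k):
    """delta_k(x) = log pi_k - 1/2 log|Sigma_k| - 1/2 (x-mu_k)^T Sigma_k^{-1} (x-mu_k)."""
    diff = x - mu_k
    _, logdet = np.linalg.slogdet(Sigma_k)
    return np.log(pi_k) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(Sigma_k, diff)

# Two made-up classes with different covariances.
x = np.array([0.5, 0.5])
d0 = qda_discriminant(x, 0.5, np.array([0.0, 0.0]), np.eye(2))
d1 = qda_discriminant(x, 0.5, np.array([1.0, 1.0]), np.array([[2.0, 0.8],
                                                              [0.8, 2.0]]))
print(np.argmax([d0, d1]))   # predicted class for x
```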
Quadratic discriminant analysis (QDA)

- Bayes boundary (– – –)
- LDA (· · · · · ·)
- QDA (——)

[Figure: two scatterplots of X1 vs. X2 comparing the Bayes, LDA, and QDA decision boundaries.]
Evaluating a classification method

Beyond the overall error rate, we care about the kinds of errors a classifier makes: false positives and false negatives.
Example. Predicting default

Used LDA to predict credit card default in a dataset of 10K people.
Example. Predicting default

Note that when the prediction threshold is lowered, the rate of false positives becomes higher! That is the price to pay for fewer false negatives.
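A sketch of the trade-off with made-up labels and posterior probabilities; lowering the threshold t for predicting default cuts false negatives at the cost of more false positives:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
y = rng.random(n) < 0.1                     # made-up default indicators (~10%)
# Made-up posterior probabilities, higher on average for true defaulters.
p_hat = np.clip(rng.normal(0.2 + 0.4 * y, 0.15), 0, 1)

for t in [0.5, 0.2]:                        # two illustrative thresholds
    pred = p_hat > t
    fp = np.mean(pred[~y])                  # false positive rate
    fn = np.mean(~pred[y])                  # false negative rate
    print(f"threshold {t}: FP rate {fp:.2f}, FN rate {fn:.2f}")
```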
Example. Predicting default

[Figure: Error Rate (y-axis, 0.0 to 0.6) as a function of the classification Threshold (x-axis).]
Example. The ROC curve

- Displays the performance of the classifier at every threshold at once, by plotting the true positive rate against the false positive rate.

[Figure: ROC Curve for the LDA classifier, with rates ranging from 0.0 to 1.0.]
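A minimal sketch that traces the ROC curve by sweeping the threshold over made-up scores, and computes the area under it by the trapezoidal rule:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.random(500) < 0.1                                  # made-up labels
scores = np.clip(rng.normal(0.2 + 0.4 * y, 0.15), 0, 1)   # made-up P^(default | x)

# Sweep thresholds from high to low; each threshold gives one (FPR, TPR) point.
thresholds = np.linspace(1, 0, 101)
fpr = np.array([np.mean(scores[~y] > t) for t in thresholds])
tpr = np.array([np.mean(scores[y] > t) for t in thresholds])

# Area under the curve via the trapezoidal rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(f"AUC = {auc:.3f}")
```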
Next time