
Lecture 9: Classification, LDA

Reading: Chapter 4

STATS 202: Data mining and analysis

Jonathan Taylor, 10/12


Slide credits: Sergio Bacallado

1 / 21
Review: Main strategy in Chapter 4

Find an estimate P̂(Y | X). Then, given an input x0, we predict the
response as in a Bayes classifier:

ŷ0 = argmax_y P̂(Y = y | X = x0).

2 / 21
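
A minimal sketch of this step in Python (the array posterior is a hypothetical vector of estimated probabilities P̂(Y = y | X = x0), one entry per class):

import numpy as np

# Hypothetical estimated posteriors P-hat(Y = y | X = x0) for three classes.
posterior = np.array([0.2, 0.5, 0.3])

# The Bayes classifier predicts the class with the largest estimated posterior.
y_hat = int(np.argmax(posterior))   # here y_hat == 1
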
Linear Discriminant Analysis (LDA)

Instead of estimating P(Y | X), we will estimate:

1. P̂(X | Y): Given the response, what is the distribution of the inputs?

2. P̂(Y): How likely is each of the categories?

Then, we use Bayes rule to obtain the estimate:

P̂(Y = k | X = x) = P̂(X = x | Y = k) P̂(Y = k) / Σ_j P̂(X = x | Y = j) P̂(Y = j)

3 / 21
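
A small numerical sketch of this Bayes-rule step in Python (the densities and priors below are made-up numbers, not estimates from any dataset):

import numpy as np

# Hypothetical values of P-hat(X = x | Y = k) at one input x, and priors P-hat(Y = k).
f_x = np.array([0.05, 0.20, 0.01])   # class-conditional densities f_k(x)
prior = np.array([0.3, 0.5, 0.2])    # pi_k, one per class

# Bayes rule: posterior proportional to f_k(x) * pi_k, normalized over all classes j.
unnorm = f_x * prior
posterior = unnorm / unnorm.sum()    # P-hat(Y = k | X = x)
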
Linear Discriminant Analysis (LDA)
Instead of estimating P(Y | X), we will estimate:

1. We model P̂(X = x | Y = k) = f̂_k(x) as a multivariate normal distribution:

[Figure: two panels plotting the inputs X1 and X2, illustrating multivariate normal class densities.]

2. P̂(Y = k) = π̂_k is estimated by the fraction of training samples of class k.

4 / 21
LDA has linear decision boundaries

Suppose that:

- We know P(Y = k) = π_k exactly.
- P(X = x | Y = k) is multivariate normal with density:

  f_k(x) = (1 / ((2π)^{p/2} |Σ|^{1/2})) exp(−(1/2)(x − µ_k)^T Σ^{−1} (x − µ_k))

  µ_k: Mean of the inputs for category k.
  Σ: Covariance matrix (common to all categories).

Then, what is the Bayes classifier?

5 / 21
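
A direct transcription of this density into Python (a sketch; in practice one would use scipy.stats.multivariate_normal, but the explicit version shows each piece of the formula):

import numpy as np

def mvn_density(x, mu, Sigma):
    """Multivariate normal density f_k(x) with mean mu and covariance Sigma."""
    p = len(mu)
    diff = x - mu
    Sigma_inv = np.linalg.inv(Sigma)
    norm_const = 1.0 / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ Sigma_inv @ diff)
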
LDA has linear decision boundaries
By Bayes rule, the probability of category k, given the input x, is:

P(Y = k | X = x) = f_k(x) π_k / P(X = x)

The denominator does not depend on the response k, so we can write it as a constant:

P(Y = k | X = x) = C × f_k(x) π_k

Now, expanding f_k(x):

P(Y = k | X = x) = (C π_k / ((2π)^{p/2} |Σ|^{1/2})) exp(−(1/2)(x − µ_k)^T Σ^{−1} (x − µ_k))

6 / 21
LDA has linear decision boundaries

P(Y = k | X = x) = (C π_k / ((2π)^{p/2} |Σ|^{1/2})) exp(−(1/2)(x − µ_k)^T Σ^{−1} (x − µ_k))

Now, let us absorb everything that does not depend on k into a constant C′:

P(Y = k | X = x) = C′ π_k exp(−(1/2)(x − µ_k)^T Σ^{−1} (x − µ_k))

and take the logarithm of both sides:

log P(Y = k | X = x) = log C′ + log π_k − (1/2)(x − µ_k)^T Σ^{−1} (x − µ_k).

The term log C′ is the same for every category k, so to maximize the left-hand side over k we only need to maximize the remaining terms over k.

7 / 21
LDA has linear decision boundaries

Goal: maximize the following over k:

log π_k − (1/2)(x − µ_k)^T Σ^{−1} (x − µ_k)

  = log π_k − (1/2)(x^T Σ^{−1} x + µ_k^T Σ^{−1} µ_k − 2 x^T Σ^{−1} µ_k)

  = C″ + log π_k − (1/2) µ_k^T Σ^{−1} µ_k + x^T Σ^{−1} µ_k

where C″ = −(1/2) x^T Σ^{−1} x does not depend on k. We define the objective:

δ_k(x) = log π_k − (1/2) µ_k^T Σ^{−1} µ_k + x^T Σ^{−1} µ_k

At an input x, we predict the response with the highest δ_k(x).

8 / 21
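
A sketch of δ_k(x) in Python (here mu_k, Sigma, and pi_k are taken as known, as assumed on this slide; the estimated versions appear later):

import numpy as np

def delta_k(x, mu_k, Sigma, pi_k):
    """LDA discriminant: log pi_k - 0.5 mu_k^T Sigma^{-1} mu_k + x^T Sigma^{-1} mu_k."""
    Sigma_inv = np.linalg.inv(Sigma)
    return np.log(pi_k) - 0.5 * mu_k @ Sigma_inv @ mu_k + x @ Sigma_inv @ mu_k

# Prediction at an input x: the class k with the largest delta_k(x), e.g.
# y_hat = max(range(K), key=lambda k: delta_k(x, mu[k], Sigma, pi[k]))
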
LDA has linear decision boundaries

What is the decision boundary? It is the set of points at which two classes do just as well:

δ_k(x) = δ_ℓ(x)

log π_k − (1/2) µ_k^T Σ^{−1} µ_k + x^T Σ^{−1} µ_k = log π_ℓ − (1/2) µ_ℓ^T Σ^{−1} µ_ℓ + x^T Σ^{−1} µ_ℓ

This is a linear equation in x.

[Figure: two panels over the inputs X1 and X2 showing the linear decision boundaries.]

9 / 21
Estimating πk

π̂_k = #{i : y_i = k} / n

In English: the fraction of training samples of class k.

10 / 21
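
In Python, with a hypothetical label vector y of length n:

import numpy as np

y = np.array([0, 1, 1, 2, 0, 1])                     # hypothetical training labels
classes, counts = np.unique(y, return_counts=True)
pi_hat = counts / len(y)                             # fraction of samples in each class
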
Estimating the parameters of f_k(x)
Estimate the center of each class µ_k:

µ̂_k = (1 / #{i : y_i = k}) Σ_{i : y_i = k} x_i

Estimate the common covariance matrix Σ:

- One predictor (p = 1):

  σ̂² = (1 / (n − K)) Σ_{k=1}^{K} Σ_{i : y_i = k} (x_i − µ̂_k)².

- Many predictors (p > 1): Compute the vectors of deviations (x_1 − µ̂_{y_1}), (x_2 − µ̂_{y_2}), ..., (x_n − µ̂_{y_n}) and use an unbiased estimate of their covariance matrix as Σ̂ (a code sketch follows this slide).

11 / 21
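
A sketch of both estimates in Python, assuming a hypothetical design matrix X of shape (n, p) and integer label vector y of length n:

import numpy as np

def estimate_lda_params(X, y):
    """Return the class means mu_hat (K x p) and the pooled covariance Sigma_hat (p x p)."""
    classes = np.unique(y)
    n, p = X.shape
    K = len(classes)
    # Per-class means mu_hat_k.
    mu_hat = np.array([X[y == k].mean(axis=0) for k in classes])
    # Deviation of each sample from its own class mean.
    dev = X - mu_hat[np.searchsorted(classes, y)]
    # Pooled within-class covariance with the unbiased n - K denominator.
    Sigma_hat = dev.T @ dev / (n - K)
    return mu_hat, Sigma_hat
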
LDA prediction
For an input x, predict the class with the largest:

δ̂_k(x) = log π̂_k − (1/2) µ̂_k^T Σ̂^{−1} µ̂_k + x^T Σ̂^{−1} µ̂_k

The decision boundaries are defined by:

log π̂_k − (1/2) µ̂_k^T Σ̂^{−1} µ̂_k + x^T Σ̂^{−1} µ̂_k = log π̂_ℓ − (1/2) µ̂_ℓ^T Σ̂^{−1} µ̂_ℓ + x^T Σ̂^{−1} µ̂_ℓ

These are the solid lines in:

[Figure: two panels over the inputs X1 and X2 with the estimated decision boundaries drawn as solid lines.]

12 / 21
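
Putting the pieces together, a minimal end-to-end prediction sketch in Python (it reuses the hypothetical estimate_lda_params from the sketch above; class indices refer to the sorted unique labels):

import numpy as np

def lda_predict(X_new, pi_hat, mu_hat, Sigma_hat):
    """Predict, for each row of X_new, the class index with the largest delta_hat_k(x)."""
    Sigma_inv = np.linalg.inv(Sigma_hat)
    # Constant part: log pi_k - 0.5 mu_k^T Sigma^{-1} mu_k, one value per class.
    const = np.log(pi_hat) - 0.5 * np.einsum('kp,pq,kq->k', mu_hat, Sigma_inv, mu_hat)
    # Linear part: x^T Sigma^{-1} mu_k for every input and class.
    scores = X_new @ Sigma_inv @ mu_hat.T + const    # shape (n_new, K)
    return np.argmax(scores, axis=1)
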
Quadratic discriminant analysis (QDA)

The assumption that the inputs of every class have the same
covariance Σ can be quite restrictive:
[Figure: two panels plotting the inputs X1 and X2, illustrating classes whose covariance structures differ.]

13 / 21
Quadratic discriminant analysis (QDA)

In quadratic discriminant analysis we estimate a mean µ̂_k and a covariance matrix Σ̂_k for each class separately.

Given an input, it is easy to derive an objective function:

δ_k(x) = log π_k − (1/2) µ_k^T Σ_k^{−1} µ_k + x^T Σ_k^{−1} µ_k − (1/2) x^T Σ_k^{−1} x − (1/2) log |Σ_k|

This objective is now quadratic in x, and so are the decision boundaries.

14 / 21
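
A sketch of the QDA objective in Python (per-class mu_k, Sigma_k, and pi_k are assumed to be given, e.g. estimated separately for each class):

import numpy as np

def qda_delta_k(x, mu_k, Sigma_k, pi_k):
    """QDA discriminant; quadratic in x because of the -0.5 x^T Sigma_k^{-1} x term."""
    Sigma_inv = np.linalg.inv(Sigma_k)
    return (np.log(pi_k)
            - 0.5 * mu_k @ Sigma_inv @ mu_k
            + x @ Sigma_inv @ mu_k
            - 0.5 * x @ Sigma_inv @ x
            - 0.5 * np.log(np.linalg.det(Sigma_k)))
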
Quadratic discriminant analysis (QDA)

- Bayes boundary (– – –)
- LDA (· · · · · ·)
- QDA (——)

[Figure: two panels over the inputs X1 and X2 comparing the Bayes, LDA, and QDA decision boundaries.]

15 / 21
Evaluating a classification method

We have talked about the 0-1 loss:

(1/m) Σ_{i=1}^{m} 1(y_i ≠ ŷ_i).

It is possible to make the wrong prediction for some classes more often than for others; the 0-1 loss doesn't tell you anything about this.

A much more informative summary of the error is a confusion matrix:

16 / 21
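
For a two-class problem like the default example below, a confusion matrix can be tabulated directly; a sketch in Python with hypothetical 0/1 label arrays:

import numpy as np

def confusion_matrix_2x2(y_true, y_pred):
    """Rows: true class (0 = no, 1 = yes). Columns: predicted class."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
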
Example. Predicting default
Used LDA to predict credit card default in a dataset of 10K people.

Predicted "yes" if P(default = yes | X) > 0.5.

- The error rate among people who do not default (the false positive rate) is very low.
- However, the rate of false negatives is 76%.
- It is possible that false negatives are a bigger source of concern!
- One possible solution: change the threshold.
17 / 21
Example. Predicting default

Changing the threshold to 0.2 makes it easier to classify as "yes".

Predicted "yes" if P(default = yes | X) > 0.2.

Note that the rate of false positives became higher! That is the price to pay for fewer false negatives.

18 / 21
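
Changing the threshold only changes the final comparison against the estimated posterior; a sketch in Python (p_default is a hypothetical vector of P̂(default = yes | X)):

import numpy as np

p_default = np.array([0.03, 0.15, 0.45, 0.72])   # hypothetical posteriors
pred_at_05 = (p_default > 0.5).astype(int)       # default threshold -> [0, 0, 0, 1]
pred_at_02 = (p_default > 0.2).astype(int)       # lower threshold   -> [0, 0, 1, 1]
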
Example. Predicting default

Let’s visualize the dependence of the error on the threshold:

[Figure: error rate (vertical axis, 0.0 to 0.6) as a function of the threshold (horizontal axis, 0.0 to 0.5).]

- – – – False negative rate (error for defaulting customers)
- · · · · · False positive rate (error for non-defaulting customers)
- —— 0-1 loss, or total error rate.

19 / 21
Example. The ROC curve

[Figure: ROC curve, plotting the true positive rate against the false positive rate.]

- Displays the performance of the method for any choice of threshold.
- The area under the curve (AUC) measures the quality of the classifier:
  - 0.5 is the AUC for a random classifier.
  - The closer the AUC is to 1, the better.

20 / 21
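
One way to trace the ROC curve is to sweep the threshold over the estimated posteriors; a sketch in Python (scikit-learn's roc_curve and roc_auc_score compute the same quantities):

import numpy as np

def roc_points(y_true, scores):
    """False and true positive rates as the threshold sweeps over the scores."""
    thresholds = np.unique(scores)[::-1]
    pos = (y_true == 1).sum()
    neg = (y_true == 0).sum()
    fpr, tpr = [], []
    for t in thresholds:
        pred = scores >= t
        tpr.append((pred & (y_true == 1)).sum() / pos)
        fpr.append((pred & (y_true == 0)).sum() / neg)
    return np.array(fpr), np.array(tpr)

# AUC by the trapezoidal rule over the swept points:
# fpr, tpr = roc_points(y_true, scores); auc = np.trapz(tpr, fpr)
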
Next time

- Comparison of logistic regression, LDA, QDA, and KNN classification.
- Start Chapter 5: Resampling.

21 / 21
