CS3491-AI ML-Chapter 5


INTRODUCTION TO MACHINE LEARNING

CHAPTER 5: Multivariate Methods
Multivariate Data
• Multiple measurements (sensors)
• d inputs/features/attributes: d-variate
• N instances/observations/examples

$$\mathbf{X} =
\begin{bmatrix}
X_1^1 & X_2^1 & \cdots & X_d^1 \\
X_1^2 & X_2^2 & \cdots & X_d^2 \\
\vdots & \vdots & & \vdots \\
X_1^N & X_2^N & \cdots & X_d^N
\end{bmatrix}$$
Multivariate Parameters
Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$

Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T\right] =
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2
\end{bmatrix}$$
Parameter Estimation
Sample mean $\mathbf{m}$: $\; m_i = \dfrac{\sum_{t=1}^{N} x_i^t}{N}, \quad i = 1, \ldots, d$

Covariance matrix $\mathbf{S}$: $\; s_{ij} = \dfrac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$

Correlation matrix $\mathbf{R}$: $\; r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
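To make these estimators concrete, here is a minimal NumPy sketch (not part of the original slides); the toy data matrix X and its dimensions are assumptions chosen for illustration:

```python
import numpy as np

# X: N x d data matrix (rows = instances, columns = features); toy data for the example
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
N, d = X.shape

m = X.sum(axis=0) / N           # sample mean m_i
Xc = X - m                      # centered data
S = (Xc.T @ Xc) / N             # covariance matrix s_ij (maximum-likelihood estimate, divides by N)
s = np.sqrt(np.diag(S))         # per-feature standard deviations s_i
R = S / np.outer(s, s)          # correlation matrix r_ij = s_ij / (s_i * s_j)
```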
Estimation of Missing Values
• What to do if certain instances have missing attributes?
• Ignore those instances: not a good idea if the sample is small
• Use 'missing' as an attribute value: it may carry information
• Imputation: fill in the missing value (a small sketch follows below)
  • Mean imputation: use the most likely value (e.g., the mean)
  • Imputation by regression: predict the missing value from the other attributes
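A minimal NumPy sketch of mean imputation, assuming missing entries are coded as NaN; the toy matrix is hypothetical:

```python
import numpy as np

# Toy data matrix with missing attributes coded as NaN
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 6.0]])

col_means = np.nanmean(X, axis=0)               # per-attribute mean, ignoring missing entries
X_filled = np.where(np.isnan(X), col_means, X)  # fill each missing value with its column mean
```

Imputation by regression would instead fit a predictor of the incomplete attribute from the complete ones and fill in its prediction.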
Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$$
Multivariate Normal Distribution

• Mahalanobis distance: $(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$ (it normalizes for differences in variances and correlations)

• Bivariate case ($d = 2$):

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right], \qquad z_i = \frac{x_i - \mu_i}{\sigma_i}$$
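A small sketch of the Mahalanobis distance and the $\mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ density defined above; the particular values of mu, Sigma, and x are made-up examples:

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
x = np.array([1.0, 1.5])

diff = x - mu
maha2 = diff @ np.linalg.solve(Sigma, diff)   # squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)

d = len(mu)
p = np.exp(-0.5 * maha2) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))  # density p(x)
```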
Bivariate Normal

[Figure: bivariate normal density and contour plots]
Independent Inputs: Naive Bayes

• If the $x_i$ are independent, the off-diagonal entries of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$

• If the variances are also equal, this reduces to the Euclidean distance
Parametric Classification
• If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right]$$

• The discriminant functions are

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) + \log P(C_i)$$
Estimation of Parameters
P̂ C  
 r
t i
t

i
N

mi 
t i x
r t t

t i
r t

r x  
T

Si 
t i
t t t
 mi x  mi
t i
r t

1 1
gi x   log Si  x  mi  Si x  mi   log P̂ C i 
T 1

2 2

13
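A hedged NumPy sketch that computes these per-class estimates and evaluates the discriminant g_i(x); the function names and the assumption that y is a length-N label array are illustrative, not from the slides:

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate P_hat(C_i), m_i, S_i from labelled data (X: N x d array, y: length-N label array)."""
    priors, means, covs = {}, {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)     # P_hat(C_i)
        means[c] = Xc.mean(axis=0)       # m_i
        D = Xc - means[c]
        covs[c] = (D.T @ D) / len(Xc)    # S_i (maximum-likelihood estimate)
    return priors, means, covs

def discriminant(x, prior, m, S):
    """g_i(x) = -1/2 log|S_i| - 1/2 (x - m_i)^T S_i^{-1} (x - m_i) + log P_hat(C_i)."""
    diff = x - m
    return (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * diff @ np.linalg.solve(S, diff)
            + np.log(prior))
```

Classification then assigns x to the class with the largest g_i(x).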
Different Si
• With a different $\mathbf{S}_i$ per class, $g_i(\mathbf{x})$ is a quadratic discriminant:

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}\left(\mathbf{x}^T \mathbf{S}_i^{-1}\mathbf{x} - 2\,\mathbf{x}^T \mathbf{S}_i^{-1}\mathbf{m}_i + \mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i\right) + \log \hat{P}(C_i) = \mathbf{x}^T \mathbf{W}_i\,\mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)$$
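The quadratic form can be precomputed once per class; a short sketch, assuming S_i, m_i, and the estimated prior come from the estimation step above:

```python
import numpy as np

def quadratic_discriminant_params(S_i, m_i, prior_i):
    """Return (W_i, w_i, w_i0) so that g_i(x) = x^T W_i x + w_i^T x + w_i0."""
    S_inv = np.linalg.inv(S_i)
    W_i = -0.5 * S_inv
    w_i = S_inv @ m_i
    w_i0 = (-0.5 * m_i @ S_inv @ m_i
            - 0.5 * np.log(np.linalg.det(S_i))
            + np.log(prior_i))
    return W_i, w_i, w_i0
```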
[Figure: class likelihoods, the discriminant where P(C1 | x) = 0.5, and the posterior for C1]
Common Covariance Matrix S
• Shared common sample covariance S:

$$\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$$

• The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$$
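A sketch of pooling the class covariances and forming the linear discriminant parameters; the dictionary interface mirrors the earlier sketch and is an assumption:

```python
import numpy as np

def linear_discriminant_params(priors, means, covs):
    """Pool S = sum_i P_hat(C_i) S_i and return (w_i, w_i0) for each class."""
    S = sum(priors[c] * covs[c] for c in priors)   # shared covariance matrix
    S_inv = np.linalg.inv(S)
    params = {}
    for c in priors:
        w_i = S_inv @ means[c]
        w_i0 = -0.5 * means[c] @ S_inv @ means[c] + np.log(priors[c])
        params[c] = (w_i, w_i0)
    return params
```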
Common Covariance Matrix S

[Figure: classes sharing a common covariance matrix S]
Diagonal S
• When the $x_j$, $j = 1, \ldots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal and $p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i)$ (the naive Bayes assumption)

$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

• Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean
Diagonal S

[Figure: diagonal S; the variances may be different]
Diagonal S, equal variances
• Nearest mean classifier: classify based on the Euclidean distance to the nearest mean

$$g_i(\mathbf{x}) = -\frac{\lVert\mathbf{x}-\mathbf{m}_i\rVert^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d}\left(x_j - m_{ij}\right)^2 + \log \hat{P}(C_i)$$

• Each mean can be considered a prototype or template, and this is template matching
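A minimal nearest-mean (template-matching) sketch; the class means, priors, and shared variance s2 are assumed to be given:

```python
import numpy as np

def nearest_mean_predict(x, means, priors, s2=1.0):
    """Pick the class maximizing g_i(x) = -||x - m_i||^2 / (2 s^2) + log P_hat(C_i)."""
    best_c, best_g = None, -np.inf
    for c, m in means.items():
        g = -np.sum((x - m) ** 2) / (2 * s2) + np.log(priors[c])
        if g > best_g:
            best_c, best_g = c, g
    return best_c
```

With equal priors this is simply the class whose mean is closest to x in Euclidean distance.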
Diagonal S, equal variances

[Figure: diagonal S with equal variances]
Model Selection
Assumption                     Covariance matrix          Number of parameters
Shared, hyperspheric           S_i = S = s^2 I            1
Shared, axis-aligned           S_i = S, with s_ij = 0     d
Shared, hyperellipsoidal       S_i = S                    d(d+1)/2
Different, hyperellipsoidal    S_i                        K d(d+1)/2

• As we increase complexity (a less restricted S), bias decreases and variance increases
• Assume simple models (allow some bias) to control variance (regularization)
Discrete Features
• Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the $x_j$ are independent (naive Bayes),

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}\,(1 - p_{ij})^{(1 - x_j)}$$

and the discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$

• Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
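A sketch of estimating the p_ij and evaluating the linear discriminant for binary features; the 0/1 data matrix X, the label array y, and the small eps used to avoid log(0) are assumptions for the example:

```python
import numpy as np

def fit_bernoulli_nb(X, y, eps=1e-9):
    """Per class: P_hat(C_i) and p_hat_ij = sum_t x_j^t r_i^t / sum_t r_i^t."""
    priors, p = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        p[c] = np.clip(Xc.mean(axis=0), eps, 1 - eps)   # clip to keep the logs finite
    return priors, p

def bernoulli_nb_discriminant(x, prior, p_i):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    return np.sum(x * np.log(p_i) + (1 - x) * np.log(1 - p_i)) + np.log(prior)
```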
Discrete Features
• Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

If the $x_j$ are independent,

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d}\prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$
Multivariate Regression
r g x | w ,w ,...,w  
t t
0 1 d

 Multivariate linear model


w 0  w1x1t  w 2x2t    w d xdt
1

E w 0 ,w1 ,...,w d | X   t r t  w 0  w1x1t    w d xdt
2

2

 Multivariate polynomial model:


Define new higher-order variables
z1=x1, z2=x2, z3=x12, z4=x22, z5=x1x2
and use the linear model in this new z space
(basis functions, kernel trick, SVM: Chapter 10)

25
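A least-squares sketch of the multivariate linear model; the toy data and the use of np.linalg.lstsq (rather than writing out the normal equations) are choices made for the example:

```python
import numpy as np

# Toy regression data: N = 50 instances, d = 3 inputs (made up for illustration)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
r = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=50)

# Augment with a column of ones for the bias w_0, then minimize the squared error E
X1 = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(X1, r, rcond=None)   # w = [w_0, w_1, ..., w_d]
predictions = X1 @ w
```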
