CS3491-AI ML-Chapter 5
Machine Learning
CHAPTER 5: Multivariate Methods
Multivariate Data
Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples
Data matrix (N instances, d attributes):
X = [ X1^1  X2^1  …  Xd^1
      X1^2  X2^2  …  Xd^2
       ⋮
      X1^N  X2^N  …  Xd^N ]
Multivariate Parameters
Mean: E[x] = μ = [μ1, …, μd]^T
Covariance matrix:
Σ ≡ Cov(X) = E[(X – μ)(X – μ)^T]
  = [ σ1²   σ12  …  σ1d
      σ21   σ2²  …  σ2d
       ⋮
      σd1   σd2  …  σd² ]
Parameter Estimation
Sample mean m:  mi = Σt xi^t / N,   i = 1, …, d
Covariance matrix S:  sij = Σt (xi^t – mi)(xj^t – mj) / N
Correlation matrix R:  rij = sij / (si sj)
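A minimal NumPy sketch of these estimators, assuming the data sit in an N × d array X (the synthetic data and variable names are illustrative):

```python
import numpy as np

# Synthetic N x d data matrix (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # N = 100 instances, d = 3 attributes
N, d = X.shape

m = X.mean(axis=0)                   # sample mean m_i
Xc = X - m                           # centered data
S = (Xc.T @ Xc) / N                  # covariance matrix S (1/N normalization)
s = np.sqrt(np.diag(S))              # standard deviations s_i
R = S / np.outer(s, s)               # correlation matrix R, r_ij = s_ij / (s_i s_j)

# Cross-check against NumPy's built-in estimators
assert np.allclose(S, np.cov(X, rowvar=False, bias=True))
assert np.allclose(R, np.corrcoef(X, rowvar=False))
```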
Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small
Use ‘missing’ as an attribute: may give information
Imputation: Fill in the missing value
Mean imputation: Use the most likely value (e.g., mean)
Imputation by regression: Predict based on other attributes
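A minimal sketch of mean imputation and imputation by regression, assuming missing entries are coded as np.nan; the function names are illustrative, and the regression variant assumes the other attributes of the incomplete rows are observed:

```python
import numpy as np

def mean_impute(X):
    """Fill each missing entry with its column (attribute) mean."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def regression_impute(X, j):
    """Fill missing values of attribute j by least-squares regression on the
    other attributes, fit on the rows that are fully observed."""
    X = X.copy()
    others = [k for k in range(X.shape[1]) if k != j]
    complete = ~np.isnan(X).any(axis=1)          # rows with no missing values
    missing = np.isnan(X[:, j])                  # rows where attribute j is missing
    A = np.c_[np.ones(complete.sum()), X[np.ix_(complete, others)]]
    w, *_ = np.linalg.lstsq(A, X[complete, j], rcond=None)
    B = np.c_[np.ones(missing.sum()), X[np.ix_(missing, others)]]
    X[missing, j] = B @ w                        # predicted values replace the NaNs
    return X
```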
Multivariate Normal Distribution
x ~ Nd(μ, Σ)
p(x) = [1 / ((2π)^(d/2) |Σ|^(1/2))] exp[ –½ (x – μ)^T Σ⁻¹ (x – μ) ]
Multivariate Normal Distribution
Mahalanobis distance: (x – μ)^T Σ⁻¹ (x – μ) measures the distance from x to μ in terms of Σ (normalizes for difference in variances and correlations)
Bivariate: d = 2
Σ = [ σ1²      ρσ1σ2
      ρσ1σ2    σ2²   ]
p(x1, x2) = [1 / (2π σ1 σ2 √(1 – ρ²))] exp[ –(z1² – 2ρ z1 z2 + z2²) / (2(1 – ρ²)) ]
where zi = (xi – μi) / σi
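A small sketch of the Mahalanobis distance and the Nd(μ, Σ) density for a bivariate case; σ1 = 1, σ2 = 2 and ρ = 0.5 are assumed, illustrative values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative bivariate parameters
mu = np.array([0.0, 0.0])
s1, s2, rho = 1.0, 2.0, 0.5
Sigma = np.array([[s1**2,         rho * s1 * s2],
                  [rho * s1 * s2, s2**2        ]])

x = np.array([1.0, 1.0])
diff = x - mu
maha2 = diff @ np.linalg.inv(Sigma) @ diff       # squared Mahalanobis distance

d = len(mu)
p = np.exp(-0.5 * maha2) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

# scipy's multivariate_normal gives the same density value
assert np.isclose(p, multivariate_normal(mu, Sigma).pdf(x))
```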
Bivariate Normal
[Figures omitted]
Independent Inputs: Naive Bayes
If xi are independent, off-diagonals of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σi) Euclidean distance:
p(x) = ∏i pi(xi) = [1 / ((2π)^(d/2) ∏i σi)] exp[ –½ Σi ((xi – μi) / σi)² ]
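A quick numerical check of this factorization (illustrative μ and σ values), showing that the product of univariate densities matches the full density with a diagonal Σ:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Illustrative parameters: independent inputs, so Sigma is diagonal
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([1.0, 0.5, 2.0])
x = np.array([0.3, 0.8, -2.0])

# Product of the univariate densities p_i(x_i)
p_naive = np.prod(norm.pdf(x, loc=mu, scale=sigma))

# Full multivariate density with diagonal covariance gives the same value
p_full = multivariate_normal(mu, np.diag(sigma**2)).pdf(x)
assert np.isclose(p_naive, p_full)
```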
Parametric Classification
If p (x | Ci ) ~ N ( μi , ∑i )
p(x | Ci) = [1 / ((2π)^(d/2) |Σi|^(1/2))] exp[ –½ (x – μi)^T Σi⁻¹ (x – μi) ]
Discriminant functions are
gi(x) = log p(x | Ci) + log P(Ci)
      = –(d/2) log 2π – ½ log|Σi| – ½ (x – μi)^T Σi⁻¹ (x – μi) + log P(Ci)
Estimation of Parameters
P̂(Ci) = Σt ri^t / N
mi = Σt ri^t x^t / Σt ri^t
Si = Σt ri^t (x^t – mi)(x^t – mi)^T / Σt ri^t
(where ri^t = 1 if x^t ∈ Ci and 0 otherwise)
gi(x) = –½ log|Si| – ½ (x – mi)^T Si⁻¹ (x – mi) + log P̂(Ci)
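A minimal sketch of these per-class estimates, assuming labelled data X (N × d) with integer labels y in {0, …, K–1}; the function name is illustrative:

```python
import numpy as np

def estimate_class_params(X, y, K):
    """Return priors P_hat(C_i), means m_i and covariance matrices S_i per class."""
    N, d = X.shape
    priors = np.zeros(K)
    means = np.zeros((K, d))
    covs = np.zeros((K, d, d))
    for i in range(K):
        Xi = X[y == i]                      # instances with r_i^t = 1
        priors[i] = len(Xi) / N             # P_hat(C_i)
        means[i] = Xi.mean(axis=0)          # m_i
        Xc = Xi - means[i]
        covs[i] = (Xc.T @ Xc) / len(Xi)     # S_i (maximum-likelihood, 1/N_i)
    return priors, means, covs
```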
Different Si
Quadratic discriminant
gi(x) = –½ log|Si| – ½ (x^T Si⁻¹ x – 2 x^T Si⁻¹ mi + mi^T Si⁻¹ mi) + log P̂(Ci)
      = x^T Wi x + wi^T x + wi0
where
Wi = –½ Si⁻¹
wi = Si⁻¹ mi
wi0 = –½ mi^T Si⁻¹ mi – ½ log|Si| + log P̂(Ci)
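A sketch of evaluating the quadratic discriminant in the Wi, wi, wi0 form above, reusing the kind of estimates returned by the previous snippet (helper name is illustrative):

```python
import numpy as np

def quadratic_discriminant(x, prior, m, S):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0 for one class."""
    S_inv = np.linalg.inv(S)
    W = -0.5 * S_inv
    w = S_inv @ m
    w0 = -0.5 * m @ S_inv @ m - 0.5 * np.log(np.linalg.det(S)) + np.log(prior)
    return x @ W @ x + w @ x + w0

# Classification: choose the class with the largest g_i(x), e.g.
# y_pred = np.argmax([quadratic_discriminant(x, priors[i], means[i], covs[i])
#                     for i in range(K)])
```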
[Figure: class likelihoods, the discriminant where P(C1|x) = 0.5, and the posterior for C1]
Common Covariance Matrix S
Shared common sample covariance S
S = Σi P̂(Ci) Si
Discriminant reduces to
gi(x) = –½ (x – mi)^T S⁻¹ (x – mi) + log P̂(Ci)
which is a linear discriminant
gi(x) = wi^T x + wi0
where
wi = S⁻¹ mi
wi0 = –½ mi^T S⁻¹ mi + log P̂(Ci)
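A corresponding sketch for the shared-covariance case, which gives the linear discriminant above (helper name is illustrative):

```python
import numpy as np

def linear_discriminant(x, prior, m, S_shared):
    """g_i(x) = w_i^T x + w_i0 with a covariance matrix shared by all classes."""
    S_inv = np.linalg.inv(S_shared)
    w = S_inv @ m
    w0 = -0.5 * m @ S_inv @ m + np.log(prior)
    return w @ x + w0

# Shared covariance as the prior-weighted average of the class covariances:
# S_shared = sum(priors[i] * covs[i] for i in range(K))
```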
Common Covariance Matrix S
[Figure omitted]
Diagonal S
When xj j = 1,..d, are independent, ∑ is diagonal
p (x|Ci) = ∏j p (xj |Ci) (Naive Bayes’ assumption)
gi(x^t) = –½ Σj ((xj^t – mij) / sj)² + log P̂(Ci)
Diagonal S
[Figure: variances may be different]
Diagonal S, equal variances
Nearest mean classifier: Classify based on Euclidean distance to the nearest mean
gi(x^t) = –‖x^t – mi‖² / (2s²) + log P̂(Ci)
        = –(1 / (2s²)) Σj (xj^t – mij)² + log P̂(Ci)
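A minimal sketch of the nearest mean classifier; with equal priors and equal variances it simply picks the class whose mean is closest in Euclidean distance (names are illustrative):

```python
import numpy as np

def nearest_mean_predict(x, means):
    """Assign x to the class whose mean m_i is nearest in Euclidean distance."""
    dists = np.linalg.norm(means - x, axis=1)   # ||x - m_i|| for each class
    return int(np.argmin(dists))
```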
Diagonal S, equal variances
[Figure omitted]
Model Selection
Assumption                    Covariance matrix       No. of parameters
Shared, Hyperspheric          Si = S = s²I            1
Shared, Axis-aligned          Si = S, with sij = 0    d
Shared, Hyperellipsoidal      Si = S                  d(d+1)/2
Different, Hyperellipsoidal   Si                      K·d(d+1)/2
As we increase complexity (less restricted S), bias decreases and variance increases
Assume simple models (allow some bias) to control variance (regularization)
Discrete Features
Binary features: pij ≡ p(xj = 1 | Ci)
If xj are independent (Naive Bayes’)
p(x | Ci) = ∏j pij^(xj) (1 – pij)^(1 – xj)
Estimator: p̂ij = Σt xj^t ri^t / Σt ri^t
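A minimal sketch of estimating p̂ij and scoring with this Bernoulli model, assuming binary data X in {0, 1} and labels y; the small smoothing constant alpha is an added assumption to avoid log 0:

```python
import numpy as np

def fit_bernoulli_nb(X, y, K, alpha=1e-3):
    """p_hat[i, j] = P(x_j = 1 | C_i), estimated as a smoothed class-wise mean."""
    p_hat = np.array([(X[y == i].sum(axis=0) + alpha) /
                      (np.sum(y == i) + 2 * alpha) for i in range(K)])
    priors = np.array([np.mean(y == i) for i in range(K)])
    return priors, p_hat

def bernoulli_nb_score(x, prior, p):
    """log p(x | C_i) + log P(C_i) under the independence assumption."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) + np.log(prior)
```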
Discrete Features
Multinomial (1-of-nj) features: xj ∈ {v1, v2, …, vnj}
pijk ≡ p(zjk = 1 | Ci) = p(xj = vk | Ci)
If xj are independent
p(x | Ci) = ∏j ∏k pijk^(zjk)
Estimator: p̂ijk = Σt zjk^t ri^t / Σt ri^t
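A short sketch of the 1-of-nj estimate for a single categorical attribute j, assuming its values are coded as integers 0 … nj – 1 (names are illustrative):

```python
import numpy as np

def fit_multinomial_feature(xj, y, i, n_j):
    """p_hat_ijk = fraction of class-i instances with x_j = v_k."""
    in_class = (y == i)
    counts = np.bincount(xj[in_class], minlength=n_j)   # sum over t of z_jk^t r_i^t
    return counts / in_class.sum()                       # divided by sum over t of r_i^t
```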
Multivariate Regression
r^t = g(x^t | w0, w1, …, wd) + ε
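If g is taken to be linear, g(x | w0, w1, …, wd) = w0 + w1 x1 + … + wd xd, the weights can be fit by least squares; a minimal sketch on synthetic data (all names illustrative):

```python
import numpy as np

# Synthetic regression data: r^t = g(x^t | w) + noise
rng = np.random.default_rng(1)
N, d = 200, 3
X = rng.normal(size=(N, d))
w_true = np.array([2.0, -1.0, 0.5, 3.0])                 # [w_1, w_2, w_3, w_0]
r = X @ w_true[:d] + w_true[d] + 0.1 * rng.normal(size=N)

# Augment with a column of ones for w_0 and solve by least squares
A = np.c_[X, np.ones(N)]
w_hat, *_ = np.linalg.lstsq(A, r, rcond=None)            # estimates [w_1, ..., w_d, w_0]
```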