CS3491-AI ML-Chapter 5
Machine Learning
CHAPTER 5: Multivariate Methods
Multivariate Data
Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples
$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$
Multivariate Parameters
Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$
Covariance:
$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$
Parameter Estimation
Sample mean $\mathbf{m}$:
$$m_i = \frac{1}{N}\sum_{t=1}^{N} x_i^t, \qquad i = 1, \ldots, d$$
Covariance matrix $\mathbf{S}$:
$$s_{ij} = \frac{1}{N}\sum_{t=1}^{N}\left(x_i^t - m_i\right)\left(x_j^t - m_j\right)$$
Correlation matrix $\mathbf{R}$:
$$r_{ij} = \frac{s_{ij}}{s_i s_j}$$
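These estimators map directly onto array operations; a minimal numpy sketch (the sample data and the names `X`, `m`, `S`, `R` are illustrative, not from the slides):

```python
import numpy as np

# X is an N x d data matrix: one row per instance, one column per feature.
X = np.array([[2.0, 1.0], [3.0, 4.0], [5.0, 2.0], [4.0, 3.0]])
N = X.shape[0]

m = X.mean(axis=0)          # sample mean m, one entry per feature
D = X - m                   # deviations from the mean
S = (D.T @ D) / N           # covariance matrix (divides by N, as on the slide)
s = np.sqrt(np.diag(S))     # per-feature standard deviations s_i
R = S / np.outer(s, s)      # correlation matrix r_ij = s_ij / (s_i s_j)
```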
Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small
Use 'missing' as an attribute: may give information
Imputation: Fill in the missing value
Mean imputation: Use the most likely value (e.g., mean); see the sketch below
Imputation by regression: Predict based on other attributes
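A minimal sketch of mean imputation, assuming missing entries are encoded as NaN (that encoding is an assumption, not from the slides):

```python
import numpy as np

X = np.array([[2.0, 1.0], [3.0, np.nan], [5.0, 2.0], [np.nan, 3.0]])

col_means = np.nanmean(X, axis=0)    # per-feature mean, ignoring missing entries
missing = np.isnan(X)
# Fill each NaN with the mean of its own column.
X[missing] = np.take(col_means, np.where(missing)[1])
```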
Multivariate Normal Distribution
$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$
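A sketch that evaluates this density directly from the formula (the example values of μ and Σ are made up):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density p(x) for a d-dimensional x."""
    d = len(mu)
    diff = x - mu
    # Compute Sigma^{-1} (x - mu) without forming the explicit inverse.
    maha = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
print(mvn_pdf(np.array([1.0, 1.0]), mu, Sigma))
```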
Multivariate Normal Distribution
Mahalanobis distance: $(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$ (it normalizes for differences in variances and correlations).
Bivariate case ($d = 2$):
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$
where $z_i = (x_i - \mu_i)/\sigma_i$.
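A short sketch of the Mahalanobis distance; note how two points at the same Euclidean distance from μ can sit at very different Mahalanobis distances once correlation is taken into account:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])   # strong positive correlation
# Both points are at Euclidean distance sqrt(2) from mu:
print(mahalanobis_sq(np.array([1.0, 1.0]), mu, Sigma))    # along the correlation: small
print(mahalanobis_sq(np.array([1.0, -1.0]), mu, Sigma))   # against it: much larger
```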
Bivariate Normal
[Figure: bivariate normal density and its isoprobability contours]
Independent Inputs: Naive Bayes
If $x_i$ are independent, the off-diagonal entries of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:
$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i} \exp\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2\right]$$
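A minimal sketch of this factored density for independent inputs:

```python
import numpy as np

def diag_normal_pdf(x, mu, sigma):
    """p(x) for independent inputs: a product of univariate normal densities."""
    z = (x - mu) / sigma
    norm = (2 * np.pi) ** (len(x) / 2) * np.prod(sigma)
    return np.exp(-0.5 * np.sum(z ** 2)) / norm

x = np.array([1.0, 2.0, 0.5])
mu = np.array([0.0, 1.0, 0.0])
sigma = np.array([1.0, 2.0, 0.5])
print(diag_normal_pdf(x, mu, sigma))
```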
Parametric Classification
If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:
$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right]$$
Discriminant functions are
$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \log P(C_i)$$
Estimation of Parameters
$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}, \qquad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$
where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise. Plugging these into the discriminant (and dropping the constant $-\frac{d}{2}\log 2\pi$):
$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T\mathbf{S}_i^{-1}(\mathbf{x}-\mathbf{m}_i) + \log\hat{P}(C_i)$$
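A sketch of these plug-in estimates, assuming integer class labels `y` in {0, ..., K−1} stand in for the indicators $r_i^t$:

```python
import numpy as np

def estimate_class_params(X, y, K):
    """Per-class priors, means, and covariances (plug-in estimates)."""
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = X[y == i]                     # instances with r_i^t = 1
        mi = Xi.mean(axis=0)
        Di = Xi - mi
        priors.append(len(Xi) / len(X))    # P_hat(C_i)
        means.append(mi)                   # m_i
        covs.append(Di.T @ Di / len(Xi))   # S_i
    return np.array(priors), np.array(means), np.array(covs)
```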
Different Si
Quadratic discriminant
$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}\left(\mathbf{x}^T\mathbf{S}_i^{-1}\mathbf{x} - 2\mathbf{x}^T\mathbf{S}_i^{-1}\mathbf{m}_i + \mathbf{m}_i^T\mathbf{S}_i^{-1}\mathbf{m}_i\right) + \log\hat{P}(C_i)$$
$$= \mathbf{x}^T\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^T\mathbf{x} + w_{i0}$$
where
$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)$$
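A sketch computing $\mathbf{W}_i$, $\mathbf{w}_i$, and $w_{i0}$ for one class and evaluating $g_i(\mathbf{x})$; the inputs are assumed to be estimated as above:

```python
import numpy as np

def quadratic_discriminant(x, Si, mi, prior):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0 for one class."""
    Si_inv = np.linalg.inv(Si)
    Wi = -0.5 * Si_inv
    wi = Si_inv @ mi
    wi0 = (-0.5 * mi @ Si_inv @ mi
           - 0.5 * np.log(np.linalg.det(Si))
           + np.log(prior))
    return x @ Wi @ x + wi @ x + wi0

# Classify x by choosing the class with the largest g_i(x).
```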
[Figure: class likelihoods, the posterior for C1, and the discriminant at P(C1|x) = 0.5]
Common Covariance Matrix S
Shared common sample covariance $\mathbf{S}$:
$$\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$$
The discriminant reduces to
$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T\mathbf{S}^{-1}(\mathbf{x}-\mathbf{m}_i) + \log\hat{P}(C_i)$$
which is a linear discriminant
$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}$$
where
$$\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}^{-1}\mathbf{m}_i + \log\hat{P}(C_i)$$
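With a shared $\mathbf{S}$, the per-class weights can be precomputed once; a minimal sketch:

```python
import numpy as np

def linear_discriminant_weights(S, means, priors):
    """w_i = S^{-1} m_i and w_i0 = -0.5 m_i^T S^{-1} m_i + log P_hat(C_i)."""
    S_inv = np.linalg.inv(S)
    W = means @ S_inv.T                               # row i holds w_i
    w0 = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return W, w0

def classify(x, W, w0):
    return np.argmax(W @ x + w0)                      # largest g_i(x) wins
```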
Common Covariance Matrix S
[Figure: with a shared covariance matrix, the decision boundaries are linear]
Diagonal S
When $x_j$, $j = 1, \ldots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal:
$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(Naive Bayes' assumption)}$$
$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log\hat{P}(C_i)$$
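A minimal sketch of this discriminant, with `s` holding the shared per-feature standard deviations (assumed already estimated):

```python
import numpy as np

def diag_discriminant(x, mi, s, prior):
    """g_i(x) with diagonal S: a weighted Euclidean distance plus log prior."""
    z = (x - mi) / s
    return -0.5 * np.sum(z ** 2) + np.log(prior)
```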
Diagonal S
[Figure: with diagonal S, contours are axis-aligned; variances may be different]
Diagonal S, equal variances
Nearest mean classifier: Classify based on Euclidean distance to the nearest mean:
$$g_i(\mathbf{x}) = -\frac{\lVert\mathbf{x}-\mathbf{m}_i\rVert^2}{2s^2} + \log\hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d}\left(x_j^t - m_{ij}\right)^2 + \log\hat{P}(C_i)$$
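Since the $2s^2$ factor is shared by all classes, with equal priors classification reduces to picking the class whose mean is closest; a minimal sketch:

```python
import numpy as np

def nearest_mean(x, means):
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    dists = np.sum((means - x) ** 2, axis=1)   # squared distance to each m_i
    return np.argmin(dists)
```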
Diagonal S, equal variances
[Figure: with equal variances, each instance is assigned to the class of the nearest mean]
Model Selection
Assumption                  | Covariance matrix       | No. of parameters
Shared, Hyperspheric        | S_i = S = s^2 I         | 1
Shared, Axis-aligned        | S_i = S, with s_ij = 0  | d
Shared, Hyperellipsoidal    | S_i = S                 | d(d+1)/2
Different, Hyperellipsoidal | S_i                     | K d(d+1)/2

As we increase complexity (less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).
Discrete Features
Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$
If $x_j$ are independent (Naive Bayes'):
$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}\left(1-p_{ij}\right)^{1-x_j}$$
with the estimator
$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
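A sketch of the estimator and log-likelihood, assuming `X` is a 0/1 matrix and the labels `y` play the role of $r_i^t$; the clipping constant `eps` is an added safeguard against log(0), not from the slides:

```python
import numpy as np

def bernoulli_nb_fit(X, y, K, eps=1e-9):
    """p_hat_ij = fraction of class-i instances with x_j = 1."""
    return np.array([X[y == i].mean(axis=0) for i in range(K)]).clip(eps, 1 - eps)

def bernoulli_log_lik(x, p_i):
    """log p(x | C_i) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)]."""
    return np.sum(x * np.log(p_i) + (1 - x) * np.log(1 - p_i))
```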
Discrete Features
Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$
$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$
If $x_j$ are independent:
$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d}\prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}, \qquad \hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$
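The same estimator in code, assuming each $x_j$ is stored as an integer category index 0..`n_vals`−1 so that counting recovers the 1-of-$n_j$ sums (the uniform `n_vals` per feature is a simplifying assumption):

```python
import numpy as np

def multinomial_nb_fit(Xcat, y, i, n_vals):
    """p_hat_ijk = fraction of class-i instances with x_j = v_k, per feature j."""
    Xi = Xcat[y == i]                                  # class-i instances
    # One row per feature j, one column per value k.
    return np.array([np.bincount(Xi[:, j], minlength=n_vals) / len(Xi)
                     for j in range(Xcat.shape[1])])
```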
Multivariate Regression
$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon$$
In the multivariate linear model, $g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) = w_0 + w_1 x_1^t + \cdots + w_d x_d^t$.
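For the linear choice of $g$, the least-squares weights follow from a standard solver; a minimal sketch (the sample data are made up):

```python
import numpy as np

# N instances, d features; prepend a column of ones for the bias w_0.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
r = np.array([3.1, 2.4, 4.6, 7.0])

A = np.hstack([np.ones((len(X), 1)), X])     # design matrix [1, x_1, ..., x_d]
w, *_ = np.linalg.lstsq(A, r, rcond=None)    # w = [w_0, w_1, ..., w_d]
print(w)
```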