
UNIT - V

MULTIVARIATE
ANALYSIS
BY
E.MATHIVADHANA, M.Sc.,M.PHIL.
ASSISTANT PROFESSOR
DEPARTMENT OF MATHEMATICS
IFETCE/H&S-II/MATHS/MATHIVADHANA/I YEAR/M.E.(CSE)/I-SEM/MA7155/APPLIED PROBABILITY AND STATISTICS/UNIT–V/PPT/VER1.1
SYLLABUS

Random Vectors and Matrices - Mean vectors and Covariance matrices - Multivariate Normal density and its properties - Principal components - Population principal components - Principal components from standardized variables.

Random Vectors & Matrices
A random vector is a vector whose elements are random variables.
Similarly, a random matrix is a matrix whose elements are random variables.

Expected Value of a Random Matrix
The expected value of a random matrix (or vector) is the matrix
(vector) consisting of the expected values of each of the elements.

Mean Vectors

Covariance Matrices

Covariance Matrix
⚫ The covariance matrix captures the variance and linear
correlation in multivariate/multidimensional data.
⚫ If the data form an n × p matrix, the covariance matrix is a p × p
square matrix.
⚫ Think of n as the number of data instances (rows) and p as
the number of attributes (columns).
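As a sketch of this definition (pure Python, no libraries; the 4 × 2 data matrix and the population divisor n are my own illustrative assumptions — the slides do not state a divisor):

```python
# Build the p x p covariance matrix from an n x p data matrix.
# Divisor n (population form) is assumed; use n - 1 for the sample form.

def covariance_matrix(data):
    n = len(data)          # number of data instances (rows)
    p = len(data[0])       # number of attributes (columns)
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
             for j in range(p)]
            for i in range(p)]

# A 4 x 2 data matrix yields a 2 x 2 covariance matrix.
S = covariance_matrix([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
```

The result is symmetric, with the variances of the two columns on the diagonal.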

Covariance
⚫ The covariance of the return is

⚫ It is always true that

⚫ i.

⚫ ii.

Mean Matrix

Covariance Matrix

Covariance Matrix

Example
Find the mean and covariance matrix for the two random variables X1 and X2
with the given joint probability function P12(x1, x2).

Soln:
Marginal Distribution of X1

X1      -1    0    1
P(X1)  0.3  0.3  0.4

Example

Marginal Distribution of X2

X2 0 1
P(X2) 0.8 0.2

Example

Example

Sample Covariance
⚫ Example. The table provides the returns on three assets
over three years

Asset  Year 1  Year 2  Year 3
A        10      12      11
B        10      14      12
C        12       6       9

⚫ Mean returns

Sample Covariance
⚫ Covariance between A and B is

⚫ Covariance between A and C is

Variance-Covariance Matrix
⚫ Covariance between B and C is

⚫ The matrix is symmetric
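The whole asset example can be checked numerically; a hedged sketch (the population divisor n is an assumption of mine — with the sample divisor n − 1 every covariance below is simply multiplied by 3/2):

```python
# Mean returns and pairwise covariances for the three assets, divisor n assumed.
returns = {"A": [10, 12, 11], "B": [10, 14, 12], "C": [12, 6, 9]}

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

names = ["A", "B", "C"]
# The full variance-covariance matrix; it comes out symmetric.
V = [[cov(returns[a], returns[b]) for b in names] for a in names]
```

The symmetry cov(A, B) = cov(B, A) holds directly from the formula, since the summand is unchanged when the roles of the two assets are swapped.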

Variance-Covariance Matrix
⚫ For the example the variance-covariance matrix is

Correlation Coefficient
Let the population correlation coefficient matrix be the p × p symmetric
matrix

Standard Deviation
Let the p × p standard deviation matrix be

Then it is verified that
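A small numeric check of the relation between Σ, the standard deviation matrix V^(1/2), and the correlation matrix ρ — namely ρik = σik/(√σii √σkk) and Σ = V^(1/2) ρ V^(1/2) (the 2 × 2 covariance matrix below is illustrative, not from the slides):

```python
import math

Sigma = [[4.0, 2.0], [2.0, 9.0]]            # illustrative covariance matrix
sd = [math.sqrt(Sigma[i][i]) for i in range(2)]

# Correlation matrix: divide each covariance by both standard deviations.
rho = [[Sigma[i][k] / (sd[i] * sd[k]) for k in range(2)] for i in range(2)]

# Reassemble Sigma from rho and the standard deviations.
rebuilt = [[sd[i] * rho[i][k] * sd[k] for k in range(2)] for i in range(2)]
```

The diagonal of ρ is all ones, and the reassembled matrix matches Σ entry by entry.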

Example

Linear Combination of Random
Variables
Prove that the linear combination cʹX = aX1 + bX2 has
Mean = E(cʹX) = cʹμ
Var = Var(cʹX) = cʹΣc
where μ = E(X) and Σ = cov(X)
Soln:

The previous result can be extended to a linear combination of
p random variables:
The linear combination cʹX = c1 X1 + c2 X2 +… + cpXp has
Mean = E(cʹX) = cʹμ
Var = Var(cʹX) = cʹΣc

In general, consider q linear combinations Z = CX of the p
random variables X1, X2, …, Xp

μZ = E(Z) = E(CX) = C μX
ΣZ = cov(Z) = cov(CX) = CΣXCʹ
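A minimal sketch of the mean and variance of a single linear combination, expanding c′μ and the quadratic form c′Σc directly (all numbers are my own illustration):

```python
# Verify E(c'X) = c'mu and Var(c'X) = c' Sigma c on a small 2-variable example.
mu = [1.0, 2.0]
Sigma = [[4.0, 1.0], [1.0, 9.0]]
c = [3.0, -2.0]

mean_cX = sum(ci * mi for ci, mi in zip(c, mu))                 # c' mu
# Double sum c_i * sigma_ij * c_j; for c' = [a, b] this is
# a^2*s11 + 2ab*s12 + b^2*s22, matching the aX1 + bX2 case above.
var_cX = sum(c[i] * Sigma[i][j] * c[j] for i in range(2) for j in range(2))
```

Because Σ is positive semi-definite, the quadratic form is guaranteed non-negative, as a variance must be.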

Example

Multivariate Normal Distribution
The multivariate normal density is a generalization of the univariate
normal density to p ≥ 2 dimensions.

The p-dimensional normal density for the random vector
X′ = [X1, X2, …, Xp] has the form

Bivariate Normal Distribution
The Bivariate normal density

The Multivariate Normal Distribution
The univariate normal distribution has a generalized form in p dimensions
– the p-dimensional normal density function is

f(x) = (2π)^(-p/2) |Σ|^(-1/2) exp[ -(x − μ)′ Σ⁻¹ (x − μ) / 2 ]

where -∞ < xi < ∞, i = 1, …, p; the exponent contains the squared
generalized distance from x to μ.

This p-dimensional normal density function is denoted by X ~ Np(μ, Σ),
where μ = E(X) and Σ = cov(X).
The simplest multivariate normal distribution is the bivariate
(2-dimensional) normal distribution, which has the density function

f(x1, x2) = (2π)^(-1) |Σ|^(-1/2) exp[ -(x − μ)′ Σ⁻¹ (x − μ) / 2 ]

where -∞ < xi < ∞, i = 1, 2; the exponent is again the squared
generalized distance from x to μ.

This 2-dimensional normal density function is denoted by N2(μ, Σ), where

μ′ = [μ1, μ2] and Σ = [ σ11  σ12 ; σ12  σ22 ].
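The bivariate density can be evaluated directly in its scalar form; a minimal pure-Python sketch (the parameter values in the check are illustrative):

```python
import math

def bivariate_normal_pdf(x1, x2, mu1, mu2, s11, s22, rho):
    # Standardize each coordinate, then form the squared generalized distance
    # (z1^2 - 2*rho*z1*z2 + z2^2) / (1 - rho^2).
    z1 = (x1 - mu1) / math.sqrt(s11)
    z2 = (x2 - mu2) / math.sqrt(s22)
    q = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    norm = 2 * math.pi * math.sqrt(s11 * s22 * (1 - rho * rho))
    return math.exp(-q / 2) / norm

# With rho = 0 the density factors into the two univariate normal densities.
```

Setting ρ12 = 0 in the formula makes the exponent split into a sum, so the joint density is exactly the product of the two marginals — the independence case discussed later.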
We can easily find the inverse of the covariance matrix (by using
Gauss-Jordan elimination or some other technique):

Σ⁻¹ = 1/(σ11σ22 − σ12²) [ σ22  −σ12 ; −σ12  σ11 ]

Now we use the previously established relationship σ12 = ρ12 √σ11 √σ22

to establish that |Σ| = σ11σ22 − σ12² = σ11σ22(1 − ρ12²).
By substitution we can now write the squared distance as

(x − μ)′ Σ⁻¹ (x − μ) = [ z1² − 2ρ12 z1 z2 + z2² ] / (1 − ρ12²),
where z1 = (x1 − μ1)/√σ11 and z2 = (x2 − μ2)/√σ22.
which means that we can rewrite the bivariate normal probability density
function as

f(x1, x2) = 1/(2π√(σ11σ22(1 − ρ12²))) · exp{ −[ z1² − 2ρ12 z1 z2 + z2² ] / (2(1 − ρ12²)) }

Graphically, the bivariate normal probability density function looks like
this:

[Figure: bell-shaped density surface over the (X1, X2) plane with
elliptical contours.]

All points of equal density are called a contour, defined for p-dimensions
as all x such that

(x − μ)′ Σ⁻¹ (x − μ) = c²
The contours

{ x : (x − μ)′ Σ⁻¹ (x − μ) = c² }

form concentric ellipsoids centered at μ with axes ±c√λi ei,
where Σei = λiei, i = 1, …, p.

[Figure: density surface f(X1, X2) with the elliptical contour for
constant c drawn in the (X1, X2) plane.]
The general form of contours for a bivariate normal probability
distribution where the variables have equal variance (σ11 = σ22) is
relatively easy to derive:
First we need the eigenvalues of Σ — solving |Σ − λI| = 0 for
Σ = [ σ11  σ12 ; σ12  σ11 ] gives λ1 = σ11 + σ12 and λ2 = σ11 − σ12.

Next we need the eigenvectors of Σ:

e1′ = [ 1/√2, 1/√2 ] and e2′ = [ 1/√2, −1/√2 ].
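A numeric check of the equal-variance case (s and c below are illustrative values for σ11 and σ12, my own choice):

```python
import math

s, c = 2.0, 0.5                               # sigma11 = sigma22 = s, sigma12 = c
Sigma = [[s, c], [c, s]]
lam1, lam2 = s + c, s - c                     # claimed eigenvalues
e1 = [1 / math.sqrt(2), 1 / math.sqrt(2)]     # 45-degree direction
e2 = [1 / math.sqrt(2), -1 / math.sqrt(2)]    # perpendicular direction

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Sigma e1 should equal lam1 * e1, and Sigma e2 should equal lam2 * e2.
```

For positive σ12, λ1 = σ11 + σ12 is the larger eigenvalue, so the major axis of the contour ellipse indeed runs along the 45° direction e1.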
- for a positive covariance σ12, the first eigenvalue and its associated
eigenvector lie along the 45° line running through the centroid μ:

[Figure: elliptical contour for constant c of f(X1, X2) with its major
axis along the 45° line through μ.]

- for a negative covariance σ12, the second eigenvalue and its associated
eigenvector lie at right angles to the 45° line running through the
centroid μ:

[Figure: elliptical contour for constant c of f(X1, X2) with its major
axis perpendicular to the 45° line through μ.]

What do you suppose happens when the two random variables X1 and X2
are uncorrelated (i.e., ρ12 = 0)?

[Figure: the joint density factors into the product of the marginal
densities f(X1) and f(X2).]
- for a covariance σ12 of zero, the two eigenvalues are equal and the
eigenvectors can be chosen (up to sign) so that one runs along the 45°
line through the centroid μ and the other is perpendicular to it:

[Figure: circular contour for constant c of f(X1, X2) centered at μ.]

Contours also have an important probability interpretation – the solid
ellipsoid of x values satisfying

(x − μ)′ Σ⁻¹ (x − μ) ≤ χ²p(α)

has probability 1 – α, i.e.,

P[ (X − μ)′ Σ⁻¹ (X − μ) ≤ χ²p(α) ] = 1 − α

where χ²p(α) is the upper (100α)th percentile of the chi-square
distribution with p degrees of freedom.
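For p = 2 this statement can be checked by Monte Carlo, because χ²₂(α) has the closed form −2 ln α (the sampling scheme and parameter values below are my own illustration, not from the slides):

```python
import math
import random

random.seed(0)
rho, alpha = 0.6, 0.05
c2 = -2.0 * math.log(alpha)                  # chi-square(2) upper quantile

inside, trials = 0, 20000
for _ in range(trials):
    # Correlated standard bivariate normal via a Cholesky-style construction.
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    x1, x2 = u, rho * u + math.sqrt(1 - rho * rho) * v
    # Squared generalized distance for unit variances and correlation rho.
    d2 = (x1 * x1 - 2 * rho * x1 * x2 + x2 * x2) / (1 - rho * rho)
    if d2 <= c2:
        inside += 1

coverage = inside / trials                   # should be close to 1 - alpha = 0.95
```

The empirical coverage lands near 0.95, matching the claimed probability of the ellipsoid.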

Bivariate Normal Distribution

Bivariate Normal Distribution

Properties

Properties

Properties

Properties

Principal Components Analysis
A. The Basic Principle
We wish to explain/summarize the underlying variance-covariance structure
of a large set of variables through a few linear combinations of these
variables. The objectives of principal components analysis are

- data reduction

- interpretation

The results of principal components analysis are often used as inputs to

- regression analysis

- cluster analysis

B. Population Principal Components
Suppose we have a population measured on p random variables X1,…,Xp.
Note that these random variables represent the p-axes of the Cartesian
coordinate system in which the population resides. Our goal is to develop a
new set of p axes (linear combinations of the original p axes) in the directions
of greatest variability:

[Figure: data scatter in the (X1, X2) plane with new axes drawn along
the directions of greatest variability.]
This is accomplished by rotating the axes.

Consider our random vector

X′ = [X1, X2, …, Xp]

with covariance matrix Σ and eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λp ≥ 0.
We can construct p linear combinations

Yi = ai′X = ai1X1 + ai2X2 + ⋯ + aipXp, i = 1, 2, …, p.
It is easy to show that

E(Yi) = ai′μ, Var(Yi) = ai′Σai, and Cov(Yi, Yk) = ai′Σak.

The principal components are those uncorrelated linear combinations Y1,…,Yp
whose variances are as large as possible.

Thus the first principal component is the linear combination of maximum
variance, i.e., we wish to solve the nonlinear optimization problem

max a1′Σa1 subject to a1′a1 = 1

(the quadratic objective is the source of nonlinearity; the constraint
restricts attention to coefficient vectors of unit length).
The second principal component is the linear combination of maximum
variance that is uncorrelated with the first principal component, i.e., we wish
to solve the nonlinear optimization problem

max a2′Σa2 subject to a2′a2 = 1 and a2′Σa1 = 0

(the second constraint restricts the covariance with Y1 to zero).

The third principal component is the solution to the nonlinear optimization
problem

max a3′Σa3 subject to a3′a3 = 1, a3′Σa1 = 0, and a3′Σa2 = 0

(the last two constraints restrict the covariances with Y1 and Y2 to zero).
Generally, the ith principal component is the linear combination of maximum
variance that is uncorrelated with all previous principal components, i.e., we
wish to solve the nonlinear optimization problem

max ai′Σai subject to ai′ai = 1 and ai′Σak = 0 for all k < i.

We can show that, for random vector X with covariance matrix Σ and
eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λp ≥ 0, the ith principal component is given by

Yi = ei′X = ei1X1 + ei2X2 + ⋯ + eipXp, with Var(Yi) = λi and
Cov(Yi, Yk) = 0 for i ≠ k.

Note that the principal components are not unique if some eigenvalues are
equal.
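One way to sketch the first principal component computationally is power iteration on Σ — an assumption of mine for illustration, not the slides' procedure, and it presumes a strictly dominant eigenvalue (the 2 × 2 matrix here is illustrative):

```python
import math

def power_iteration(M, iters=200):
    # Repeatedly apply M and renormalize; converges to the top eigenvector
    # when the largest eigenvalue is strictly dominant.
    v = [1.0] * len(M)
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' M v gives the associated eigenvalue.
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(len(v)))
              for i in range(len(M)))
    return lam, v

Sigma = [[2.0, 0.5], [0.5, 2.0]]
lam1, e1 = power_iteration(Sigma)
# Y1 = e1' X then has variance lam1, the largest eigenvalue of Sigma.
```

For this matrix the largest eigenvalue is 2.5 with eigenvector along the 45° direction, consistent with the equal-variance discussion earlier.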

We can also show for random vector X with covariance matrix Σ and
eigenvalue-eigenvector pairs (λ1 , e1), …, (λp , ep), where λ1 ≥ λ2 ≥ ⋯ ≥ λp,

σ11 + σ22 + ⋯ + σpp = λ1 + λ2 + ⋯ + λp,

so we can assess how well a subset of the principal components Yi
summarizes the original random variables Xi – one common method of doing
so is

proportion of total population variance due to the kth principal
component = λk / (λ1 + λ2 + ⋯ + λp), k = 1, 2, …, p.

If a large proportion of the total population variance can be attributed to
relatively few principal components, we can replace the original p variables
with these principal components without loss of much information!
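A sketch of the proportion-of-variance computation (the eigenvalues below are illustrative, not those of the slide example):

```python
# Proportion of total population variance explained by each principal
# component: lambda_k / (lambda_1 + ... + lambda_p).
eigenvalues = [6.0, 3.0, 1.0]                 # lambda_1 >= lambda_2 >= lambda_3
total = sum(eigenvalues)
proportions = [lam / total for lam in eigenvalues]
cumulative = [sum(proportions[: k + 1]) for k in range(len(proportions))]
```

Here the first two components already account for 90% of the total variance, so dropping the third loses little information.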
We can also easily find the correlations between the original random
variables Xk and the principal components Yi:

ρ(Yi, Xk) = eik √λi / √σkk.

These values are often used in interpreting the principal components Yi.

Example: Suppose we have the following population of four observations
made on three random variables X1, X2, and X3:

Find the three population principal components Y1, Y2, and Y3:

First we need the covariance matrix Σ:

and the corresponding eigenvalue-eigenvector pairs:

so the principal components are:

Note that

and the proportion of total population variance due to each principal
component is

Note that the third principal component is relatively irrelevant!

Next we obtain the correlations between the original random variables Xi
and the principal components Yi:

We can display these results in a correlation matrix:

Here we can easily see that

- the first principal component (Y1) is a mixture of all three random variables
(X1, X2, and X3)

- the second principal component (Y2) is a trade-off between X1 and X3


- the third principal component (Y3) is a residual of X1

When the principal components are derived from an X ~ Np(μ,Σ) distributed
population, the density of X is constant on the μ-centered ellipsoids

(x − μ)′ Σ⁻¹ (x − μ) = c²

which have axes ±c√λi ei, i = 1, …, p,

where (λi, ei) are the eigenvalue-eigenvector pairs of Σ.

We can set μ = 0 w.l.o.g. – we can then write

c² = x′Σ⁻¹x

where the yi = ei′x are the principal components of x.

Setting yi = ei′x and substituting into the previous expression yields

c² = y1²/λ1 + y2²/λ2 + ⋯ + yp²/λp

which defines an ellipsoid (note that λi > 0 ∀ i) in a coordinate system with
axes y1,…,yp lying in the directions of e1,…,ep, respectively.
The major axis lies in the direction determined by the eigenvector e1
associated with the largest eigenvalue λ1 - the remaining minor axes lie in
the directions determined by the other eigenvectors.

Example: For the principal components derived from the following
population of four observations made on three random variables X1, X2, and
X3:

plot the major and minor axes.

We will need the centroid μ:

The direction of the major axis is given by

while the directions of the two minor axes are given by

We first graph the centroid:

[Figure: the centroid (3.0, 10.0, 15.0) plotted in the (X1, X2, X3)
coordinate system.]
…then use the first eigenvector to find a second point on the first principal
axis:

[Figure: the centroid and the second point plotted in the (X1, X2, X3)
coordinate system.]

The line connecting these two points is the Y1 axis.
…then do the same thing with the second eigenvector:

[Figure: the Y1 axis and a second pair of points plotted in the
(X1, X2, X3) coordinate system.]

The line connecting these two points is the Y2 axis.
…and do the same thing with the third eigenvector:

[Figure: the Y1 and Y2 axes and a third pair of points plotted in the
(X1, X2, X3) coordinate system.]

The line connecting these two points is the Y3 axis.
What we have done is a rotation…

[Figure: the rotated axes Y1, Y2, Y3 overlaid on the original X1, X2, X3
axes.]
…and a translation in p = 3 dimensions.

[Figure: the translated and rotated Y1, Y2, Y3 axes in the (X1, X2, X3)
coordinate system.]

Note that the rotated axes remain orthogonal!
Note that we can also construct principal components for the standardized
variables Zi:

Zi = (Xi − μi) / √σii, i = 1, 2, …, p,

which in matrix notation is

Z = (V^(1/2))⁻¹ (X − μ)

where V^(1/2) is the diagonal standard deviation matrix.

Obviously E(Z) = 0 and cov(Z) = (V^(1/2))⁻¹ Σ (V^(1/2))⁻¹ = ρ.
This suggests that the principal components for the standardized variables
Zi may be obtained from the eigenvectors of the correlation matrix ρ! The
operations are analogous to those used in conjunction with the covariance
matrix.
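A sketch of why this works: standardizing the data (subtract each mean, divide by each standard deviation) turns its covariance matrix into its correlation matrix, so correlation-based PCA is just covariance-based PCA on the Zi (illustrative data; the population divisor n is my assumption):

```python
import math

data = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0], [4.0, 20.0]]
n, p = len(data), len(data[0])
means = [sum(row[j] for row in data) / n for j in range(p)]
sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n)
       for j in range(p)]

# Standardized data: z_ij = (x_ij - mean_j) / sd_j.
z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]

# Covariance matrix of the standardized data (means are now zero).
cov_z = [[sum(z[r][i] * z[r][j] for r in range(n)) / n for j in range(p)]
         for i in range(p)]
# cov_z has ones on the diagonal: it is the correlation matrix of the data.
```

Each diagonal entry comes out exactly 1 and each off-diagonal entry is a correlation, bounded by 1 in absolute value.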
We can show that, for random vector Z of standardized variables with
covariance matrix ρ and eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λp ≥ 0, the ith principal
component is given by

Yi = ei′Z

where (λi, ei) are now the eigenvalue-eigenvector pairs of ρ.

Note again that the principal components are not unique if some eigenvalues
are equal.

We can also show for random vector Z with covariance matrix ρ and
eigenvalue-eigenvector pairs (λ1 , e1), …, (λp , ep), where λ1 ≥ λ2 ≥ ⋯ ≥ λp,

λ1 + λ2 + ⋯ + λp = p,

and we can again assess how well a subset of the principal components Yi
summarizes the original random variables Xi by using

proportion of total population variance due to the kth principal
component = λk / p, k = 1, 2, …, p.

If a large proportion of the total population variance can be attributed to
relatively few principal components, we can replace the original p variables
with these principal components without loss of much information!

Example: Suppose we have the following population of four observations
made on three random variables X1, X2, and X3:

Find the three population principal components variables Y1, Y2, and Y3 for
the standardized random variables Z1, Z2, and Z3:

We could standardize the variables X1, X2, and X3, then work with the
resulting covariance matrix Σ, but it is much easier to proceed directly with
the correlation matrix ρ:

and the corresponding eigenvalue-eigenvector pairs:

These results differ from the covariance-based principal components!

so the principal components are:

Note that

and the proportion of total population variance due to each principal
component is

Note that the third principal component is again relatively irrelevant!


Next we obtain the correlations between the original random variables Xi
and the principal components Yi:

We can display these results in a correlation matrix:

Here we can easily see that

- the first principal component (Y1) is a mixture of all three random variables
(X1, X2, and X3)

- the second principal component (Y2) is a trade-off between X1 and X3


- the third principal component (Y3) is a trade-off between X1 and X2

