MULTIVARIATE ANALYSIS
BY
E. MATHIVADHANA, M.Sc., M.Phil.
ASSISTANT PROFESSOR
DEPARTMENT OF MATHEMATICS
M.E. (CSE), I Semester – MA7155 Applied Probability and Statistics – UNIT V
SYLLABUS
Random Vectors & Matrices
A random vector is a vector whose elements are random variables.
Similarly, a random matrix is a matrix whose elements are random variables.
Expected Value of a Random Matrix
The expected value of a random matrix (or vector) is the matrix
(vector) consisting of the expected values of each of the elements.
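A sketch of this definition written out (the slide's own display is not reproduced here):
\[
E(\mathbf{X}) = \bigl[\,E(X_{ij})\,\bigr], \qquad \text{i.e., the } (i,j)\text{ entry of } E(\mathbf{X}) \text{ is } E(X_{ij}).
\]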
Mean Vectors
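The slide's display is not available; a standard sketch of the mean vector of a p-dimensional random vector X:
\[
\boldsymbol{\mu} = E(\mathbf{X}) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}.
\]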
Covariance Matrices
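The slide's display is not available; a standard sketch of the population covariance matrix:
\[
\boldsymbol{\Sigma} = \operatorname{Cov}(\mathbf{X}) = E\bigl[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})'\bigr],
\qquad
\sigma_{ik} = E\bigl[(X_i-\mu_i)(X_k-\mu_k)\bigr],
\]
so the diagonal entries σii are the variances and the off-diagonal entries σik are the covariances.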
Covariance Matrix
⚫ The covariance matrix captures the variance and linear correlation in multivariate/multidimensional data.
⚫ If the data is an n x p matrix, the covariance matrix is a p x p square matrix.
⚫ Think of n as the number of data instances (rows) and p as the number of attributes (columns); a sketch of this computation follows below.
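A minimal NumPy sketch (hypothetical data, not from the slides) showing how an n x p data matrix yields a p x p covariance matrix:

import numpy as np

# hypothetical data: n = 4 instances (rows), p = 3 attributes (columns)
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 1.5],
              [3.0, 4.0, 2.5],
              [4.0, 3.0, 3.5]])

S = np.cov(X, rowvar=False)   # p x p sample covariance matrix (divisor n - 1)
print(S.shape)                # (3, 3)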
Covariance
⚫ The covariance of the returns can be written in two equivalent ways, listed in the sketch below.
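A sketch of the two standard forms (the slide's own displays are not reproduced here):
\[
\text{i. } \operatorname{Cov}(X_i, X_k) = E\bigl[(X_i-\mu_i)(X_k-\mu_k)\bigr];
\qquad
\text{ii. } \operatorname{Cov}(X_i, X_k) = E(X_i X_k) - \mu_i \mu_k .
\]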
Mean Matrix
Covariance Matrix
Example
Find the mean vector and covariance matrix for the two random variables X1 and X2 with the given joint probability function p12(x1, x2).
Soln:
Marginal Distribution of X1
X1 -1 0 1
P(X1) 0.3 0.3 0.4
Example
Marginal Distribution of X2
X2 0 1
P(X2) 0.8 0.2
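From these marginals the means and variances follow directly (a worked sketch; the covariance itself requires the joint probabilities, which are not reproduced here):
\[
E(X_1) = (-1)(0.3) + 0(0.3) + 1(0.4) = 0.1,
\qquad
\operatorname{Var}(X_1) = E(X_1^2) - [E(X_1)]^2 = 0.7 - 0.01 = 0.69,
\]
\[
E(X_2) = 0(0.8) + 1(0.2) = 0.2,
\qquad
\operatorname{Var}(X_2) = 0.2 - 0.04 = 0.16,
\qquad
\operatorname{Cov}(X_1, X_2) = E(X_1 X_2) - E(X_1)E(X_2).
\]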
Example
Sample Covariance
⚫ Example. The table provides the returns on three assets
over three years
Sample Covariance
⚫ Covariance between A and B is
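A sketch of the sample covariance formula assumed here (the slide's numerical work is not reproduced):
\[
s_{AB} = \frac{1}{n-1}\sum_{t=1}^{n} (A_t - \bar{A})(B_t - \bar{B}),
\]
with divisor n rather than n - 1 if the three years are treated as the entire population.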
Variance-Covariance Matrix
⚫ Covariance between B and C is
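The analogous sketch:
\[
s_{BC} = \frac{1}{n-1}\sum_{t=1}^{n} (B_t - \bar{B})(C_t - \bar{C}).
\]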
Variance-Covariance Matrix
⚫ For the example the variance-covariance matrix is
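Its general 3 x 3 layout (a sketch; the example's numerical entries are not reproduced here):
\[
\mathbf{S} = \begin{pmatrix} s_{AA} & s_{AB} & s_{AC} \\ s_{AB} & s_{BB} & s_{BC} \\ s_{AC} & s_{BC} & s_{CC} \end{pmatrix},
\]
with the variances on the diagonal; each covariance appears twice, so the matrix is symmetric.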
Correlation Coefficient
Let the population correlation coefficient matrix be the p x p symmetric
matrix
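A sketch of its standard form:
\[
\boldsymbol{\rho} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{12} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1p} & \rho_{2p} & \cdots & 1 \end{pmatrix},
\qquad
\rho_{ik} = \frac{\sigma_{ik}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{kk}}} .
\]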
Standard Deviation
Let the p x p standard deviation matrix be
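A sketch of the standard deviation matrix and its (standard) relation to Σ and ρ:
\[
\mathbf{V}^{1/2} = \operatorname{diag}\bigl(\sqrt{\sigma_{11}},\ldots,\sqrt{\sigma_{pp}}\bigr),
\qquad
\boldsymbol{\Sigma} = \mathbf{V}^{1/2}\boldsymbol{\rho}\,\mathbf{V}^{1/2},
\qquad
\boldsymbol{\rho} = \bigl(\mathbf{V}^{1/2}\bigr)^{-1}\boldsymbol{\Sigma}\bigl(\mathbf{V}^{1/2}\bigr)^{-1}.
\]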
Example
Linear Combination of Random Variables
Prove that the linear combination cʹX = aX1 + bX2 has
Mean = E(cʹX) = cʹμ
Var = Var(cʹX) = cʹΣc
where μ = E(X) and Σ = Cov(X).
Soln:
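The slide's worked steps are not reproduced; a sketch of the standard derivation, with cʹ = (a, b):
\[
E(\mathbf{c}'\mathbf{X}) = aE(X_1) + bE(X_2) = a\mu_1 + b\mu_2 = \mathbf{c}'\boldsymbol{\mu},
\]
\[
\operatorname{Var}(\mathbf{c}'\mathbf{X})
= E\bigl[(\mathbf{c}'\mathbf{X} - \mathbf{c}'\boldsymbol{\mu})^2\bigr]
= E\bigl[\mathbf{c}'(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})'\mathbf{c}\bigr]
= \mathbf{c}'\boldsymbol{\Sigma}\mathbf{c}
= a^2\sigma_{11} + 2ab\sigma_{12} + b^2\sigma_{22}.
\]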
The previous result can be extended to a linear combination of p random variables:
The linear combination cʹX = c1X1 + c2X2 + … + cpXp has
Mean = E(cʹX) = cʹμ
Var = Var(cʹX) = cʹΣc
More generally, for q linear combinations collected as Z = CX, where C is a q x p matrix of constants,
μZ = E(Z) = E(CX) = C μX
ΣZ = Cov(Z) = Cov(CX) = C ΣX Cʹ
Example
Multivariate Normal Distribution
The multivariate normal density is the generalization of the univariate normal density to p ≥ 2 dimensions:
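A sketch of the standard form assumed here (the slide's display is not reproduced):
\[
f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\lvert\boldsymbol{\Sigma}\rvert^{1/2}}
\exp\Bigl\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\Bigr\},
\qquad -\infty < x_i < \infty .
\]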
Bivariate Normal Distribution
The bivariate normal density is the special case of this density with p = 2.
The Multivariate Normal Distribution
The univariate normal distribution has a generalized form in p dimensions – the p-dimensional normal density function (shown above) is defined for -∞ < xi < ∞, i = 1, …, p. The exponent (x − μ)ʹΣ⁻¹(x − μ) is the squared generalized distance from x to μ. This p-dimensional normal density function is denoted by X ~ Np(μ, Σ), where μ = E(X) and Σ = Cov(X).
The simplest multivariate normal distribution is the bivariate (2-dimensional) normal distribution, whose density function is defined for -∞ < xi < ∞, i = 1, 2; its exponent is again the squared generalized distance from x to μ. This 2-dimensional normal density function is denoted by X ~ N2(μ, Σ), where μ and Σ take the bivariate forms sketched below.
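A sketch of the standard bivariate forms assumed here:
\[
\boldsymbol{\mu} = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix},
\qquad
\boldsymbol{\Sigma} = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix},
\qquad
f(x_1,x_2) = \frac{1}{2\pi\lvert\boldsymbol{\Sigma}\rvert^{1/2}}
\exp\Bigl\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\Bigr\}.
\]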
We can easily find the inverse of this 2 x 2 covariance matrix (by using Gauss-Jordan elimination or some other technique) to establish that
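A sketch of the standard result assumed here:
\[
\lvert\boldsymbol{\Sigma}\rvert = \sigma_{11}\sigma_{22} - \sigma_{12}^{2},
\qquad
\boldsymbol{\Sigma}^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^{2}}
\begin{pmatrix}\sigma_{22} & -\sigma_{12}\\ -\sigma_{12} & \sigma_{11}\end{pmatrix}.
\]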
By substitution we can now write the squared distance as
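Using σ12 = ρ12 √σ11 √σ22, the standard form of the squared distance (a sketch, assumed here) is
\[
(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})
= \frac{1}{1-\rho_{12}^{2}}
\left[
\Bigl(\tfrac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\Bigr)^{2}
+ \Bigl(\tfrac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\Bigr)^{2}
- 2\rho_{12}\Bigl(\tfrac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\Bigr)\Bigl(\tfrac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\Bigr)
\right].
\]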
which means that we can rewrite the bivariate normal probability density
function as
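A sketch of the standard ρ12 form:
\[
f(x_1,x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^{2})}}
\exp\left\{-\frac{1}{2(1-\rho_{12}^{2})}
\left[
\Bigl(\tfrac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\Bigr)^{2}
+ \Bigl(\tfrac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\Bigr)^{2}
- 2\rho_{12}\Bigl(\tfrac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\Bigr)\Bigl(\tfrac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\Bigr)
\right]\right\}.
\]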
Graphically, the bivariate normal probability density function looks like this:
[Figure: bell-shaped bivariate normal density surface over the (X1, X2) plane, with its contours.]
All points of equal density are called a contour, defined for p dimensions as all x such that
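A sketch of the standard definition:
\[
\bigl\{\mathbf{x} : (\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^{2}\bigr\}.
\]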
The contours
[Figure: a contour of f(X1, X2) for constant c, plotted in the (X1, X2) plane.]
Each such contour is an ellipsoid centered at μ whose axes are determined by the eigenvalues and eigenvectors of Σ, as sketched below.
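A sketch of the standard result assumed here:
\[
\boldsymbol{\Sigma}\mathbf{e}_i = \lambda_i\mathbf{e}_i, \quad i = 1,\ldots,p,
\qquad \text{with axes } \pm c\sqrt{\lambda_i}\,\mathbf{e}_i .
\]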
The general form of contours for a bivariate normal probability distribution where the variables have equal variance (σ11 = σ22) is relatively easy to derive:
First we need the eigenvalues of Σ.
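A sketch of the standard computation for this equal-variance case:
\[
\boldsymbol{\Sigma} = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{11}\end{pmatrix},
\qquad
0 = \lvert\boldsymbol{\Sigma} - \lambda\mathbf{I}\rvert = (\sigma_{11}-\lambda)^2 - \sigma_{12}^2
\;\Rightarrow\;
\lambda_1 = \sigma_{11} + \sigma_{12}, \quad \lambda_2 = \sigma_{11} - \sigma_{12}.
\]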
Next we need the eigenvectors of Σ
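A sketch, continuing the equal-variance case:
\[
\mathbf{e}_1 = \tfrac{1}{\sqrt{2}}\begin{pmatrix}1\\ 1\end{pmatrix},
\qquad
\mathbf{e}_2 = \tfrac{1}{\sqrt{2}}\begin{pmatrix}1\\ -1\end{pmatrix}.
\]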
- for a positive covariance σ12, the first eigenvalue and its associated eigenvector lie along the 45° line running through the centroid μ:
[Figure: contour of constant f(X1, X2) elongated along the 45° line in the (X1, X2) plane.]
- for a negative covariance σ12, the second eigenvalue and its associated eigenvector lie at right angles to the 45° line running through the centroid μ:
[Figure: contour of constant f(X1, X2) elongated perpendicular to the 45° line in the (X1, X2) plane.]
What do you suppose happens when the two random variables X1 and X2 are uncorrelated (i.e., r12 = 0)?
[Figure: the two marginal densities f(X1) and f(X2).]
- for a covariance σ12 of zero, the two eigenvalues are equal and the eigenvectors can be chosen (up to sign) so that one runs along the 45° line through the centroid μ and the other is perpendicular to it:
[Figure: contour of constant f(X1, X2), circular when σ11 = σ22, centered at the centroid in the (X1, X2) plane.]
Contours also have an important probability interpretation – the solid
ellipsoid of x values satisfying:
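A sketch of the standard result assumed here:
\[
\bigl\{\mathbf{x} : (\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) \le \chi^{2}_{p}(\alpha)\bigr\}
\quad\text{has probability } 1-\alpha,
\]
where χ²p(α) is the upper (100α)th percentile of the chi-square distribution with p degrees of freedom.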
Bivariate Normal Distribution
Properties
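The slides' displays are not reproduced; a sketch of the standard properties of X ~ Np(μ, Σ) that these slides presumably summarize:
\[
\mathbf{a}'\mathbf{X} \sim N(\mathbf{a}'\boldsymbol{\mu},\ \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}),
\qquad
\mathbf{A}\mathbf{X} + \mathbf{d} \sim N_q(\mathbf{A}\boldsymbol{\mu} + \mathbf{d},\ \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}'),
\]
all subsets of X are normally distributed, zero covariance between jointly normal components implies their independence, and the conditional distributions of subsets given other subsets are again (multivariate) normal.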
Principal Components Analysis
A. The Basic Principle
We wish to explain/summarize the underlying variance-covariance structure
of a large set of variables through a few linear combinations of these
variables. The objectives of principal components analysis are
- data reduction
- interpretation
- regression analysis
- cluster analysis
B. Population Principal Components
Suppose we have a population measured on p random variables X1,…,Xp.
Note that these random variables represent the p-axes of the Cartesian
coordinate system in which the population resides. Our goal is to develop a
new set of p axes (linear combinations of the original p axes) in the directions
of greatest variability:
[Figure: a scatter of points in the (X1, X2) plane with new axes drawn along the directions of greatest variability.]
This is accomplished by rotating the axes.
Consider our random vector
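A sketch of the standard setup assumed here (the slide's display is not reproduced):
\[
\mathbf{X}' = [X_1, X_2, \ldots, X_p], \qquad \operatorname{Cov}(\mathbf{X}) = \boldsymbol{\Sigma},
\]
and consider the linear combinations Yi = ciʹX = ci1X1 + ci2X2 + … + cipXp, i = 1, …, p.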
It is easy to show that Var(Yi) = ciʹΣci and Cov(Yi, Yk) = ciʹΣck. The first principal component is the linear combination of maximum variance. Because ciʹΣci (the source of nonlinearity in the optimization) can be made arbitrarily large by scaling ci, we restrict attention to coefficient vectors of unit length.
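A sketch of the resulting optimization problem:
\[
\max_{\mathbf{c}_1}\ \mathbf{c}_1'\boldsymbol{\Sigma}\mathbf{c}_1
\quad\text{subject to}\quad \mathbf{c}_1'\mathbf{c}_1 = 1 .
\]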
The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component, i.e., we wish to solve the nonlinear optimization problem with an additional constraint that restricts the covariance with the first component to zero (and, for later components, restricts the covariances with all earlier components to zero).
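A sketch of the second-component problem:
\[
\max_{\mathbf{c}_2}\ \mathbf{c}_2'\boldsymbol{\Sigma}\mathbf{c}_2
\quad\text{subject to}\quad \mathbf{c}_2'\mathbf{c}_2 = 1,\ \ \mathbf{c}_1'\boldsymbol{\Sigma}\mathbf{c}_2 = 0 .
\]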
Generally, the ith principal component is the linear combination of maximum
variance that is uncorrelated with all previous principal components, i.e., we
wish to solve the nonlinear optimization problem
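A sketch of the general problem:
\[
\max_{\mathbf{c}_i}\ \mathbf{c}_i'\boldsymbol{\Sigma}\mathbf{c}_i
\quad\text{subject to}\quad \mathbf{c}_i'\mathbf{c}_i = 1,\ \ \mathbf{c}_k'\boldsymbol{\Sigma}\mathbf{c}_i = 0 \ \text{for } k < i .
\]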
We can show that, for a random vector X with covariance matrix Σ and eigenvalues λ1 ≥ λ2 ≥ … ≥ λp ≥ 0, the ith principal component is given by
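A sketch of the standard result:
\[
Y_i = \mathbf{e}_i'\mathbf{X},
\qquad
\operatorname{Var}(Y_i) = \mathbf{e}_i'\boldsymbol{\Sigma}\mathbf{e}_i = \lambda_i,
\qquad
\operatorname{Cov}(Y_i, Y_k) = \mathbf{e}_i'\boldsymbol{\Sigma}\mathbf{e}_k = 0 \ \ (i \ne k),
\]
where (λi, ei) are the eigenvalue-eigenvector pairs of Σ.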
Note that the principal components are not unique if some eigenvalues are
equal.
We can also show, for random vector X with covariance matrix Σ and eigenvalue-eigenvector pairs (λ1, e1), …, (λp, ep) where λ1 ≥ λ2 ≥ … ≥ λp, the proportion of total population variance due to the kth principal component:
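A sketch of the standard result:
\[
\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \operatorname{tr}(\boldsymbol{\Sigma}) = \lambda_1 + \lambda_2 + \cdots + \lambda_p,
\qquad
\frac{\lambda_k}{\lambda_1 + \cdots + \lambda_p} = \text{proportion of total population variance due to the $k$th principal component}.
\]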
These values are often used in interpreting the principal components Yi.
Example: Suppose we have the following population of four observations
made on three random variables X1, X2, and X3:
Find the three population principal components Y1, Y2, and Y3:
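The slide's data table and numerical results are not reproduced here. As a minimal NumPy sketch of the same procedure (with hypothetical data, not the slide's values):

import numpy as np

# hypothetical data: 4 observations on X1, X2, X3 (NOT the slide's table)
X = np.array([[1.0, 6.0, 9.0],
              [4.0, 12.0, 10.0],
              [3.0, 12.0, 15.0],
              [4.0, 10.0, 12.0]])

mu = X.mean(axis=0)                          # centroid
Sigma = np.cov(X, rowvar=False, bias=True)   # population covariance (divisor n)

lam, E = np.linalg.eigh(Sigma)               # eigenvalues in ascending order
order = np.argsort(lam)[::-1]                # reorder so lambda1 >= lambda2 >= lambda3
lam, E = lam[order], E[:, order]

Y = (X - mu) @ E                             # principal component scores Y1, Y2, Y3
prop = lam / lam.sum()                       # proportion of variance per component
corr = E * np.sqrt(lam) / np.sqrt(np.diag(Sigma))[:, None]   # corr(Y_i, X_k)
print(prop)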
First we need the covariance matrix Σ:
so the principal components are:
Note that
and the proportion of total population variance due to each principal component is
Next we obtain the correlations between the original random variables Xi
and the principal components Yi:
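A sketch of the standard formula assumed here:
\[
\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}},
\qquad i, k = 1, 2, \ldots, p,
\]
where eik denotes the kth element of the eigenvector ei.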
We can display these results in a correlation matrix:
- the first principal component (Y1) is a mixture of all three random variables
(X1, X2, and X3)
When the principal components are derived from an X ~ Np(μ,Σ) distributed
population, the density of X is constant on the μ-centered ellipsoids
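A sketch of the standard form:
\[
(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^{2},
\]
which have axes ±c√λi ei, where Σei = λiei, i = 1, …, p.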
We can set μ = 0 without loss of generality – we can then write
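A sketch of the standard expansion, using Σ⁻¹ = Σi (1/λi) ei eiʹ:
\[
\mathbf{x}'\boldsymbol{\Sigma}^{-1}\mathbf{x}
= \sum_{i=1}^{p} \frac{1}{\lambda_i}\bigl(\mathbf{e}_i'\mathbf{x}\bigr)^{2}
= \sum_{i=1}^{p} \frac{y_i^{2}}{\lambda_i},
\]
so the constant-density ellipsoids have axes along the principal components yi = eiʹx.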
Example: For the principal components derived from the following
population of four observations made on three random variables X1, X2, and
X3:
We will need the centroid μ:
We first graph the centroid μʹ = (3.0, 10.0, 15.0):
[Figure: the centroid plotted in the (X1, X2, X3) coordinate system.]
…then use the first eigenvector to find a second point on the first principal axis:
[Figure: the first principal axis Y1 drawn through the centroid in the (X1, X2, X3) coordinate system.]
…then do the same thing with the second eigenvector:
[Figure: the second principal axis Y2 added, perpendicular to Y1, in the (X1, X2, X3) coordinate system.]
What we have done is a rotation…
[Figure: the rotated axes Y1, Y2, Y3 shown in the (X1, X2, X3) coordinate system.]
…and a translation in p = 3 dimensions.
[Figure: the rotated axes Y1, Y2, Y3 with their origin translated to the centroid.]
Note that we can also construct principal components for the standardized variables Zi.
Obviously, the standardized vector Z has mean vector 0 and covariance matrix equal to the correlation matrix ρ of X.
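A sketch of the standard definitions assumed here:
\[
Z_i = \frac{X_i - \mu_i}{\sqrt{\sigma_{ii}}}, \quad i = 1,\ldots,p,
\qquad
\mathbf{Z} = \bigl(\mathbf{V}^{1/2}\bigr)^{-1}(\mathbf{X}-\boldsymbol{\mu}),
\qquad
E(\mathbf{Z}) = \mathbf{0}, \quad \operatorname{Cov}(\mathbf{Z}) = \boldsymbol{\rho}.
\]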
This suggests that the principal components for the standardized variables
Zi may be obtained from the eigenvectors of the correlation matrix ρ! The
operations are analogous to those used in conjunction with the covariance
matrix.
Note again that the principal components are not unique if some eigenvalues
are equal.
For random vector Z with covariance matrix ρ and eigenvalue-eigenvector pairs (λ1, e1), …, (λp, ep) where λ1 ≥ λ2 ≥ … ≥ λp, we can again assess how well a subset of the principal components Yi summarizes the original random variables Xi by using the proportion of total population variance due to the kth principal component:
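A sketch of the standard result for standardized variables:
\[
\lambda_1 + \lambda_2 + \cdots + \lambda_p = \operatorname{tr}(\boldsymbol{\rho}) = p,
\qquad
\frac{\lambda_k}{p} = \text{proportion of total population variance due to the $k$th principal component}.
\]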
If a large proportion of the total population variance can be attributed to
relatively few principal components, we can replace the original p variables
with these principal components without loss of much information!
Example: Suppose we have the following population of four observations
made on three random variables X1, X2, and X3:
Find the three population principal components variables Y1, Y2, and Y3 for
the standardized random variables Z1, Z2, and Z3:
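The slide's table and results are not reproduced; a minimal NumPy sketch of the correlation-based procedure, with hypothetical data:

import numpy as np

# hypothetical data (NOT the slide's table): 4 observations on X1, X2, X3
X = np.array([[1.0, 6.0, 9.0],
              [4.0, 12.0, 10.0],
              [3.0, 12.0, 15.0],
              [4.0, 10.0, 12.0]])

rho = np.corrcoef(X, rowvar=False)   # correlation matrix = Cov(Z)

lam, E = np.linalg.eigh(rho)         # eigenvalues in ascending order
lam, E = lam[::-1], E[:, ::-1]       # reorder: lambda1 >= lambda2 >= lambda3
prop = lam / lam.sum()               # equals lam / p, since trace(rho) = p
print(prop)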
We could standardize the variables X1, X2, and X3 and then work with the resulting covariance matrix, but it is much easier to proceed directly with the correlation matrix ρ:
so the principal components are:
Note that
and the proportion of total population variance due to each principal component is
We can display these results in a correlation matrix:
- the first principal component (Y1) is a mixture of all three random variables
(X1, X2, and X3)