Pattern Recognition Techniques
Principal Component Analysis
Introduction to PCA & its Purpose
Large datasets are increasingly widespread in many
disciplines. In order to interpret such datasets, methods are
required to drastically reduce their dimensionality in an
interpretable way, such that most of the information in the
data is preserved.
PCA is one of the oldest and most widely used techniques for this
purpose.
Its idea is simple: reduce the dimensionality of a dataset
while preserving as much 'variability' (i.e. variance) as possible.
The technique is also adaptable, since variants of it
have been developed that are tailored to various
different data types and structures.
Geometric intuition
Let's take an example of a dataset with two features, f1
and f2. The data is standardized, i.e. mean-centered
with unit variance.
We plot the points in the feature space.
Then we rotate our axes to find the principal
components z1 and z2 such that the variance of
the points x_i projected onto z1 is maximal.
We later drop z2, so z1 becomes our only
needed direction.
In PCA we are not changing the points present in
space. We change the axes, so the coordinates of
each point change, but the actual location of
the points remains unchanged.
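The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the dataset (features f1, f2) is synthetic and made up for the example, and the principal axes are found by eigendecomposition of the covariance matrix.

```python
import numpy as np

# Hypothetical two-feature dataset; f1 and f2 are illustrative names.
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
f2 = 0.8 * f1 + 0.3 * rng.normal(size=200)   # f2 correlated with f1
X = np.column_stack([f1, f2])

# Standardize: mean-center and scale each feature to unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the covariance matrix gives the rotated axes z1, z2.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # returned in ascending order
order = np.argsort(eigvals)[::-1]            # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rotating the axes changes coordinates, not the points themselves:
Z = X @ eigvecs            # coordinates of each point in the (z1, z2) basis
X_back = Z @ eigvecs.T     # rotating back recovers the original points
assert np.allclose(X_back, X)

# Dropping z2 keeps only the direction of maximal projected variance.
X_1d = Z[:, :1]
print(f"variance captured by z1: {eigvals[0] / eigvals.sum():.2%}")
```

Because f2 is strongly correlated with f1, almost all of the variance of the standardized data lies along z1, which is why dropping z2 loses little information.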
Mathematical objective function
Here d1 and d2 are the perpendicular distances of the points x1 and x2 from the line spanned by the chosen
unit vector.
In the distance-minimization interpretation of PCA, we choose the unit vector that minimizes the sum of squared
distances of the points from it; since the total squared length of each point is fixed, this is the same as
maximizing the spread (variance) of the projections along that axis.
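The equivalence between minimizing distances and maximizing spread can be checked numerically. The sketch below assumes mean-centered data X and an arbitrary unit vector u (both invented for the example): by Pythagoras, each point's squared length splits into its squared projection onto u plus its squared distance from the line, so the two objectives trade off one-for-one.

```python
import numpy as np

# Synthetic mean-centered data; u is an arbitrary unit direction.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
X -= X.mean(axis=0)

theta = 0.7                                    # arbitrary angle for the axis
u = np.array([np.cos(theta), np.sin(theta)])   # unit vector, ||u|| = 1

proj = X @ u                        # signed projection lengths onto u
residual = X - np.outer(proj, u)    # components perpendicular to u
d_sq = (residual ** 2).sum(axis=1)  # squared distances d_i^2 from the line

# Pythagoras per point: ||x_i||^2 = (u . x_i)^2 + d_i^2, so the fixed total
# squared length splits into projected variance plus squared distances.
lhs = (X ** 2).sum()
rhs = (proj ** 2).sum() + d_sq.sum()
assert np.allclose(lhs, rhs)
```

Since the left-hand side does not depend on u, making the sum of squared distances smaller necessarily makes the sum of squared projections (the spread along the axis) larger, and vice versa.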
PCA: Dimensionality reduction