4-1-PCA
• Extremely useful when working with data sets that have a large number of
features.
• Applications include image processing, compression, genome research, etc.
PCA (cont.)
• Transforms high-dimensional data into a lower-dimensional representation while
retaining as much information as possible.
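• A minimal Python sketch of this use of PCA with scikit-learn (the random data set and the choice of 2 components are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 100 samples with 5 correlated features.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = base @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

# Project onto the 2 directions that retain the most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)   # (100, 5) -> (100, 2)
print(pca.explained_variance_ratio_)    # share of variance kept by each component
```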
How PCA Works
• A multi-step process, built on a simple idea: variance measures information.
• The greater the variance, the more the information (and vice versa).
• The sample variance of a variable $x$ over $n$ observations is
$$\mathrm{Var}(x) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
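• The same formula translated directly into NumPy (the sample values are arbitrary); ddof=1 selects the n − 1 denominator:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
var_manual = np.sum((x - x.mean()) ** 2) / (len(x) - 1)
print(var_manual)        # 6.666...
print(x.var(ddof=1))     # same value, computed by NumPy
```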
Variance vs. Information
Person   Height      Person      Height
Alex     145         Daniel      172
Ben      160         Elsa        171
Chris    185         Fernandez   173
Variance vs. Information (cont.)
• Applying the variance formula to the two groups of heights above:
$$\mathrm{Var}_{\text{left}} = \frac{(145-163.3)^2+(160-163.3)^2+(185-163.3)^2}{3-1} \approx 408.3,
\qquad
\mathrm{Var}_{\text{right}} = \frac{(172-172)^2+(171-172)^2+(173-172)^2}{3-1} = 1$$
• The heights in the left group vary widely (high variance), so the height feature distinguishes those people well and carries a lot of information; in the right group the heights are nearly identical (low variance), so the same feature carries very little information.
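• A quick NumPy check of the two group variances, using only the heights from the table above:

```python
import numpy as np

group1 = np.array([145.0, 160.0, 185.0])   # Alex, Ben, Chris
group2 = np.array([172.0, 171.0, 173.0])   # Daniel, Elsa, Fernandez

print(group1.var(ddof=1))   # ~408.3 -> heights spread out, feature is informative
print(group2.var(ddof=1))   # 1.0    -> heights nearly identical, feature carries little information
```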
2- Covariance Matrix
• To identify correlations between variables, the covariance matrix is used.
• For instance, for a 3-dimensional dataset with variables $x$, $y$, $z$, the covariance matrix is:
$$\begin{pmatrix}
\mathrm{Var}(x) & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z) \\
\mathrm{Cov}(y,x) & \mathrm{Var}(y) & \mathrm{Cov}(y,z) \\
\mathrm{Cov}(z,x) & \mathrm{Cov}(z,y) & \mathrm{Var}(z)
\end{pmatrix}$$
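• A small NumPy sketch of building such a covariance matrix (the 3-variable random data set is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))          # 50 samples, 3 variables (x, y, z)

C = np.cov(X, rowvar=False)           # 3x3 covariance matrix (variables are the columns)
print(C.shape)                        # (3, 3)
print(np.allclose(C, C.T))            # True: Cov(x, y) = Cov(y, x), the matrix is symmetric
print(np.allclose(np.diag(C), X.var(axis=0, ddof=1)))  # diagonal entries are the variances
```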
3- Identify Principal Components
• Principal components are new variables that are constructed as linear combinations
of the initial variables.
• These combinations are made in such a way that the principal components are
uncorrelated, and most of the information in the initial variables is squeezed into
the first component, then the maximum remaining information into the second, and so on.
3- Identify Principal Components (cont.)
• Geometrically speaking, principal components represent the directions of the data that
explain a maximal amount of variance, that is to say, the lines that capture the most
information in the data.
3- Identify Principal Components (cont.)
• Can you guess the first principal component within the data?
• It’s approximately the line through the purple marks: it passes through the origin, and it is the line on which
the projections of the points (the red dots in the figure) are the most spread out.
• Equivalently, it’s the line that maximizes the variance of the projections, i.e., the average of the squared
distances from the projected points to the origin.
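• A small numerical check of this claim, assuming some illustrative 2-D data: among many candidate directions through the origin, the one with the largest projection variance coincides (up to sign) with the top eigenvector of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)
X = X - X.mean(axis=0)                                  # center so candidate lines pass through the origin

angles = np.linspace(0, np.pi, 360)                     # candidate directions (unit vectors)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
proj_var = ((X @ dirs.T) ** 2).mean(axis=0)             # average squared projection per direction

best_dir = dirs[np.argmax(proj_var)]                    # direction with the most spread-out projections
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigvecs[:, -1]                                    # eigenvector with the largest eigenvalue
print(best_dir, pc1)                                    # nearly parallel (up to sign)
print(abs(best_dir @ pc1))                              # ~1.0
```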
3- Identify Principal Components (cont.)
• The second principal component is calculated in the same way, with the condition
that it is uncorrelated with (i.e., perpendicular to) the first principal component
and that it accounts for the next highest variance.
• This continues until a total of p principal components have been calculated, equal
to the original number of variables.
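• A minimal NumPy sketch of steps 2-3 together, on an illustrative correlated data set: build the covariance matrix, take its eigenvectors, and sort them by descending eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))                                       # 2 underlying factors
X = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))   # 4 correlated variables
X = X - X.mean(axis=0)                                                   # center the data

C = np.cov(X, rowvar=False)                  # step 2: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)         # step 3: eigh handles symmetric matrices

order = np.argsort(eigvals)[::-1]            # sort descending: PC1 explains the most variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)                               # variance explained by PC1, PC2, PC3, PC4
print(abs(eigvecs[:, 0] @ eigvecs[:, 1]))    # ~0: the principal components are orthogonal
```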
4- Feature Vector
• Let’s suppose that our data set is 2-dimensional, with two variables x and y, and that the
covariance matrix has eigenvectors v1 and v2 with corresponding eigenvalues λ1 and λ2.
• If we rank the eigenvalues in descending order, we get λ1>λ2, which means that the
eigenvector that corresponds to the first principal component (PC1) is v1 and the one that
corresponds to the second principal component (PC2) is v2.
• PC1 and PC2 carry respectively λ1/(λ1 + λ2) = 96% and λ2/(λ1 + λ2) = 4% of the variance of the data.
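• The percentages are just each eigenvalue divided by the sum of all eigenvalues; the eigenvalues below are made-up numbers chosen only to reproduce the 96% / 4% split:

```python
eigvals = [1.28, 0.05]               # assumed example eigenvalues, lambda1 > lambda2
total = sum(eigvals)
ratios = [v / total for v in eigvals]
print([f"{r:.0%}" for r in ratios])  # ['96%', '4%']
```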
4- Feature Vector (cont.)
• We can either form a feature vector with both of the eigenvectors v1 and v2, i.e., a matrix
whose columns are v1 and v2;
• or discard the eigenvector v2, which is the one of lesser significance, and form a feature
vector with v1 only, i.e., a matrix with the single column v1.
• Discarding the eigenvector v2 will reduce dimensionality by 1 and will consequently cause
a loss of information in the final data set. But given that v2 carries only 4 percent of
the information, the loss will therefore not be important, and we will still keep the 96 percent
of the information carried by v1.
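• A sketch of the two options in NumPy, using illustrative orthonormal eigenvectors (the feature vector is simply a matrix whose columns are the kept eigenvectors):

```python
import numpy as np

# Illustrative eigenvectors as columns, already sorted so the first column is v1.
eigvecs = np.array([[0.6, -0.8],
                    [0.8,  0.6]])

feature_vector_both = eigvecs          # keep v1 and v2: no dimensionality reduction
feature_vector_v1 = eigvecs[:, :1]     # keep only v1: reduce from 2 dimensions to 1

print(feature_vector_both.shape)       # (2, 2)
print(feature_vector_v1.shape)         # (2, 1)
```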
5- Recast Data
• In this step, the aim is to use the feature vector formed using the eigenvectors of the
covariance matrix, to reorient the data from the original axes to the ones represented by
the principal components.
• This can be done by multiplying the transpose of the feature vector by the transpose of
the original data set:
$$\text{FinalDataSet} = \text{FeatureVector}^{T} \times \text{OriginalDataSet}^{T}$$
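• A minimal end-to-end sketch of this last step on illustrative data (standardization and the choice of keeping 2 components are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))                          # illustrative data: 100 samples, 3 features

# Standardize, then build the feature vector from the top-2 eigenvectors of the covariance matrix.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:2]]                 # keep the 2 leading principal components

# FinalDataSet = FeatureVector^T @ DataSet^T, transposed back to samples-as-rows.
final_data = (feature_vector.T @ X_std.T).T            # same result as X_std @ feature_vector
print(final_data.shape)                                # (100, 2)
```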
Further Reading
• Feature extraction vs. Feature selection
• Feature extraction: transforming the existing features into a lower-dimensional space,
such as PCA and LDA
• Feature (subset) selection: Selecting a subset of existing features without a
transformation
• Feature selection requires
• A search strategy to select candidate subsets, such as sequential forward / backward
selection
• An objective function to evaluate these candidates, such as correlation
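• A minimal sketch of sequential forward selection; the R² of a least-squares fit is used here as one possible objective function, and the data set and target are illustrative assumptions:

```python
import numpy as np

def subset_score(X, y, cols):
    """Objective: R^2 of a least-squares fit of y on the chosen columns (one possible choice)."""
    A = np.column_stack([X[:, cols], np.ones(len(y))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

def sequential_forward_selection(X, y, k):
    """Greedily add the feature that most improves the objective of the current subset."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: subset_score(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 2] - 2 * X[:, 5] + 0.1 * rng.normal(size=200)   # y depends mainly on features 2 and 5
print(sequential_forward_selection(X, y, k=2))               # likely [2, 5]
```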
Source
• https://towardsdatascience.com/
• https://medium.com/
• https://builtin.com/data-science/step-step-explanation-principal-component-analysis