
Principal Component Analysis
What is PCA?
• Principal component analysis (PCA) is a dimensionality reduction method that is often used on large data sets: it transforms a large set of variables into a smaller one that still contains most of the information in the original set.

• Why reduce dimensionality?
• Smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze the data points much faster without extraneous variables to process.
• 1D (dot), 2D (line) or 3D (plane) models are easy to interpret and visualize, compared to 4D, 5D or nD models, which cannot be visualized at all.
• PCA reduces the number of variables of a data set while preserving as much information as possible.
• Goal: identify the most important features from a set of features (feature selection).
Why PCA?
• PCA is extremely useful when working with data sets that have a lot of features.
• Problems with many features:
• Long model training time
• Difficulty in visualization
• Difficulty in interpreting the model
• Curse of dimensionality, which reduces the accuracy of machine learning models
• Overfitting
• Increased computation time

• Finding the time to read a 1,000-page book is a luxury that few can afford. Wouldn't it be nice if we could summarize the most important points in just 2 or 3 pages so that the information is easily digestible even by the busiest person? We may lose some information in the process, but at least we get the big picture. The same applies to PCA.

• How will PCA know which are the most important features?
• Can PCA understand which part of the data is most important?
• Can we mathematically quantify the amount of information embedded within the data?
• Variance can: the greater the variance, the more the information.
PCA
• Common applications
• Image processing and genome research routinely deal with thousands, if not tens of thousands, of columns.
How does PCA work?
• The greater the variance, the more the information.
• Variance measures the average degree to which each point differs from the mean (see the formula below).
• How is variance associated with the amount of information?

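For reference, variance can be written down directly. For a feature $x$ with mean $\bar{x}$ over $n$ samples (the standard population form; sample variance divides by $n-1$ instead):

$$\mathrm{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$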

How is variance associated with the amount of information?
• Playing a guessing game with friends: we need to guess who's who based solely on their height.
• Without a doubt, I am going to say that Person A is Chris, Person B is Alex, and Person C is Ben.
How is variance associated with the amount of information?
• Previous group vs. a new group: can you guess who's who? It's tough when they are very similar in height.
• Earlier, we had no trouble differentiating a 185cm person from a 160cm and a 145cm person, because their heights were so different (large variance).
How is variance associated with the amount of information?
• In Principal Component Analysis, it is assumed that the information is carried in the variance of the features: the higher the variation in a feature, the more information that feature carries.
• Therefore, PCA chooses features with higher variance for dimensionality reduction.
• PCA formally defined:
• Principal Component Analysis (PCA) is a technique for dimensionality reduction that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data. The principal components are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance. The total variance captured by all the principal components is equal to the total variance in the original dataset.
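In symbols (a standard textbook formulation, consistent with the definition above): with $p$ original variables $x_1, \dots, x_p$, each principal component is a linear combination

$$\mathrm{PC}_k = w_{k1}x_1 + w_{k2}x_2 + \cdots + w_{kp}x_p, \qquad k = 1, \dots, p,$$

where the weight vectors $w_k$ are orthonormal eigenvectors of the data's covariance matrix, ordered so that $\mathrm{Var}(\mathrm{PC}_1) \ge \mathrm{Var}(\mathrm{PC}_2) \ge \cdots \ge \mathrm{Var}(\mathrm{PC}_p)$.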
How is variance associated with the amount of information? Continued…
• Round two: let's guess a person based on both height and weight.
• We have doubled the amount of data on our friends.
• However, the weight differences are small (small variance), so weight won't help much in differentiating friends.
• So we are again mostly relying on height (reducing the data from 2 dimensions to 1 dimension).
• PCA's idea: selectively keep the variables with higher variances and forget about the variables with lower variances.
What if features have the same variance?
• What if height and weight have the same variance?
• Does that mean we can no longer reduce the dimensionality of the data set?
• It's very difficult to choose which variable to delete: if we throw away either one of the variables, we are throwing away half of the information.
• Can we keep both?


What if features have the same variance?
• What if height and weight have the same variance?
• Does that mean we can no longer reduce the dimensionality of the data set?
• The maximum amount of variance lies not along the x-axis, nor along the y-axis, but along a diagonal line across the data.
• The second-largest variance lies along a line at 90 degrees that cuts through the first.
What if features have the same variance?
• To represent these 2 lines, PCA combines both height and weight to create two brand new variables. It could be 30% height and 70% weight, or 87.2% height and 13.8% weight, or any other combination, depending on the data that we have.
• These two new variables are called the first principal component (PC1) and the second principal component (PC2). Rather than using height and weight on the two axes, we can use PC1 and PC2 respectively.
What if features have the same variance?
• In the ideal case above, PC1 alone can capture the total variance of Height and Weight combined. Since PC1 has all the information, you already know the drill: we can be very comfortable removing PC2 and know that our new data is still representative of the original data.
• With real data, however, we won't get a principal component that captures 100% of the variance (see the next slide).
PCA
• Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables.
• These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components.
• So the idea is: 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on; a scree plot of the variance explained per component shows this drop-off.
• Organizing information in principal components this way allows you to reduce dimensionality without losing much information, by discarding the components with low information and treating the remaining components as your new variables.
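As a quick illustration of this ordering, here is a minimal sketch using scikit-learn; the random data and its shape are made up for the example. The explained_variance_ratio_ attribute holds exactly the values a scree plot would display.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # hypothetical data: 200 samples, 10 variables

pca = PCA()                      # no n_components given: keep all 10 components
pca.fit(X)

# Fraction of total variance captured by each component, in decreasing order;
# plotting these values against component index gives a scree plot.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())  # sums to 1.0 when all are kept
```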
Proportion of variance explained by each feature
When it comes to real data, more often than not, we won't get a principal component that captures 100% of the variance.
Performing a PCA will give us N principal components, where N is equal to the dimensionality of our original data.
From this list of principal components, we generally choose the smallest number of principal components that would explain the largest amount of our original data.
Steps for PCA
1. Calculate the mean and standard deviation for each variable
• Use the mean and standard deviation to standardize the data, converting all of the variables so that they are on the same scale.
• If variables are on different scales, some may have more impact on the principal components than others, which can lead to biased results.
• You can use techniques like Z-score standardization, as in the sketch below.
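A minimal NumPy sketch of this standardization step; the small data matrix here is hypothetical.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are variables on different scales.
X = np.array([[170.0, 65.0],
              [160.0, 58.0],
              [180.0, 75.0],
              [175.0, 70.0]])

mean = X.mean(axis=0)        # per-variable mean
std = X.std(axis=0, ddof=1)  # per-variable sample standard deviation

X_std = (X - mean) / std     # Z-score standardization: mean 0, std 1 per column
```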
Steps for PCA
2. Calculate the covariance matrix
• A square matrix that tells you how each variable relates to every other variable.
3. Calculate the eigenvectors and eigenvalues
• These identify what the principal components are.
• Principal components are the vectors that define the new coordinate system for the data, meaning they are the directions along which the data varies most.
• Eigenvectors are vectors that describe the direction of a linear transformation, while eigenvalues are scalars that tell you the magnitude of the transformation.
• The aim of calculating these is to find out what the principal components in your data are, so you can transform your data into a new set of variables that is easier to visualize and interpret.
• Multiplying the covariance matrix by one of its eigenvectors gives a new vector that points in the direction of the eigenvector, scaled by the corresponding eigenvalue.
• Sort the eigenvectors in descending order of the eigenvalues to identify the principal components (see the sketch below).
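A sketch of steps 2 and 3 in NumPy, repeating the hypothetical standardized data from step 1 so the snippet runs on its own.

```python
import numpy as np

# Hypothetical standardized data (see step 1); rows are samples, columns variables.
X = np.array([[170.0, 65.0], [160.0, 58.0], [180.0, 75.0], [175.0, 70.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: covariance matrix (variables in columns).
cov = np.cov(X_std, rowvar=False)

# Step 3: eigendecomposition; eigh is the right routine for symmetric matrices.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort by descending eigenvalue so the first column is the top principal direction.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
```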
Steps for PCA
4. Choose the principal components to keep
• Create a feature vector that helps you decide which principal components to keep.
• A feature vector is a matrix that tells you how important each principal component is: it consists of the eigenvectors that correspond to the largest eigenvalues.
• The number of principal components you choose to keep depends on how much variance you want to explain.
• A common rule of thumb is to choose enough components to explain at least 85% of the variance in your data.
• For example, if you have 10 variables in your data, you might keep only the first few components if they already explain most of the variance (see the sketch below).
• Discard the components with lower eigenvalues, which are less significant, meaning they don't explain as much of the variance.
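Continuing the sketch, one common way to pick the number of components from the sorted eigenvalues; the eigenvalues here are hypothetical and the 85% threshold mirrors the guideline above.

```python
import numpy as np

eigenvalues = np.array([2.9, 0.7, 0.25, 0.15])  # hypothetical, sorted descending

explained = eigenvalues / eigenvalues.sum()     # proportion of variance per component
cumulative = np.cumsum(explained)

k = int(np.searchsorted(cumulative, 0.85)) + 1  # smallest k reaching 85% of variance
print(k, cumulative)                            # here k = 2, covering 90%
```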
Steps for PCA
5. Transform your data
• Transform your data into the new coordinate system that the principal components define. You can do this by multiplying your original (standardized) data matrix by the matrix of kept eigenvectors created in the prior step.
• This gives you a new matrix with the same number of rows but fewer columns. This is your transformed data, and it's in a form that's easier to visualize and interpret.
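A sketch of this final projection, recomputing the earlier hypothetical quantities so it is self-contained; k is the number of components chosen in step 4.

```python
import numpy as np

# Hypothetical inputs carried over from the earlier steps.
X = np.array([[170.0, 65.0], [160.0, 58.0], [180.0, 75.0], [175.0, 70.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
eigenvectors = eigenvectors[:, np.argsort(eigenvalues)[::-1]]

k = 1                    # e.g., keep one component (chosen in step 4)
W = eigenvectors[:, :k]  # feature vector: top-k principal directions as columns
X_pca = X_std @ W        # transformed data: same number of rows, only k columns
print(X_pca)
```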
PCA Solved
Given the data in the Table, reduce the dimension from 2 to 1 using the Principal Component Analysis (PCA) algorithm. [The data table and the worked numeric calculations were images in the original slides and are not preserved here.]

Step 1: Calculate the means of X1 and X2.

Step 2: Calculate the covariance matrix.

Step 3: Compute the eigenvalues of the covariance matrix.

Step 4: Compute the eigenvectors.
To find the first principal component, we need only compute the eigenvector corresponding to the largest eigenvalue. In the present example, the largest eigenvalue is λ1, so we compute the eigenvector corresponding to λ1. The eigenvector corresponding to λ = λ1 is a vector U1 scaled by some real number t; taking t as 1 gives a particular eigenvector.
To find a unit eigenvector, we compute the length of U1 (its Euclidean norm) and divide by it; the unit eigenvectors represent the directions of maximum variance in the data. By carrying out similar computations, the unit eigenvector e2 corresponding to the eigenvalue λ = λ2 can be found.

Step 5: Compute the first principal components.
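Since the slide's table did not survive extraction, here is an end-to-end sketch of the same 2D-to-1D reduction on made-up data; the X1, X2 values are hypothetical stand-ins, not the original table.

```python
import numpy as np

# Hypothetical stand-in for the slide's table: two variables, X1 and X2.
X = np.array([[4.0, 11.0], [8.0, 4.0], [13.0, 5.0], [7.0, 14.0]])

# Step 1: means of X1 and X2.
mean = X.mean(axis=0)

# Step 2: covariance matrix of the data.
cov = np.cov(X, rowvar=False)

# Steps 3-4: eigenvalues/eigenvectors; eigh returns unit-length eigenvectors,
# so the column for the largest eigenvalue is already a unit eigenvector e1.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
e1 = eigenvectors[:, np.argmax(eigenvalues)]

# Step 5: first principal components = centered data projected onto e1.
pc1 = (X - mean) @ e1
print(pc1)
```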