
Principal Component Analysis
What is PCA?
• Principal component analysis (PCA) is a dimensionality reduction method that is often used on large data sets: it transforms a large set of variables into a smaller one that still contains most of the information in the original set.

• Why reduce dimensionality?
• Smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze the data points much faster without extraneous variables to process.
• 1D (dot), 2D (line) or 3D (plane) models are easy to interpret and visualize, compared to 4D, 5D or nD models, which cannot be visualized at all.
• PCA reduces the number of variables of a data set while preserving as much information as possible.
• Goal: identify the most important features from a set of features (feature selection).
Why PCA?
• PCA is extremely useful when working with data sets that have a lot of features.
• Problems with many features:
• Long model training time
• Difficulty in visualization
• Difficulty in interpreting the model
• Curse of dimensionality, which reduces the accuracy of machine learning models
• Overfitting
• Increased computation time

• Finding the time to read a 1,000-page book is a luxury that few can afford. Wouldn't it be nice if we could summarize the most important points in just 2 or 3 pages so that the information is easily digestible even by the busiest person? We may lose some information in the process, but at least we get the big picture. The same applies to PCA.

• How will PCA know which are the most important features?
• Can PCA understand which part of the data is most important?
• Can we mathematically quantify the amount of information embedded within the data?
• Variance can: the greater the variance, the more the information.
PCA
• Common applications
• Image processing and genome research routinely deal with thousands, if not tens of thousands, of columns.
How does PCA work?
• The greater the variance, the more the information.
• Variance measures the average degree to which each point differs from the mean (see the formula below).
• How is variance associated with the amount of information?

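For reference, variance can be written down directly. For a feature $x$ with mean $\bar{x}$ over $n$ samples (the standard population form; sample variance divides by $n-1$ instead):

$$\mathrm{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$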

How is variance associated with the amount of information?
• Playing a guessing game with friends: we need to guess who's who based solely on their height.
• Without a doubt, I am going to say that Person A is Chris, Person B is Alex, and Person C is Ben.
How is variance associated with the amount of information?
• Previous group vs. a new group: can you guess who's who? It's tough when they are very similar in height.
• Earlier, we had no trouble differentiating a 185cm person from a 160cm and a 145cm person, because their heights were so different (large variance).
How is variance associated with the amount of information?
• In Principal Component Analysis, it is assumed that the information is carried in the variance of the features: the higher the variation in a feature, the more information that feature carries.
• Therefore, PCA chooses features with higher variance for dimensionality reduction.
• PCA formally defined:
• Principal Component Analysis (PCA) is a technique for dimensionality reduction that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data. The principal components are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance. The total variance captured by all the principal components is equal to the total variance in the original dataset.
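In symbols (a standard textbook formulation, consistent with the definition above): with $p$ original variables $x_1, \dots, x_p$, each principal component is a linear combination

$$\mathrm{PC}_k = w_{k1}x_1 + w_{k2}x_2 + \cdots + w_{kp}x_p, \qquad k = 1, \dots, p,$$

where the weight vectors $w_k$ are orthonormal eigenvectors of the data's covariance matrix, ordered so that $\mathrm{Var}(\mathrm{PC}_1) \ge \mathrm{Var}(\mathrm{PC}_2) \ge \cdots \ge \mathrm{Var}(\mathrm{PC}_p)$.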
How is variance associated with the amount of information? Continued…
• Round two: let's guess a person based on both height and weight.
• We have doubled the amount of data on our friends.
• However, the weight differences are small (small variance), so weight won't help much in differentiating friends.
• So we are again mostly relying on height (reducing the data from 2 dimensions to 1 dimension).
• PCA's idea: selectively keep the variables with higher variances and forget about the variables with lower variances.
What if features have the same variance?
• What if height and weight have the same variance?
• Does that mean we can no longer reduce the dimensionality of the data set?
• It's very difficult to choose which variable to delete: if we throw away either one of the variables, we are throwing away half of the information.
• Can we keep both?


What if features have the same variance?
• What if height and weight have the same variance?
• Does that mean we can no longer reduce the dimensionality of the data set?
• The maximum amount of variance lies not along the x-axis, nor along the y-axis, but along a diagonal line across the data.
• The second-largest variance lies along a line at 90 degrees that cuts through the first.
What if features have the same variance?
• To represent these 2 lines, PCA combines both height and weight to create two brand new variables. It could be 30% height and 70% weight, or 87.2% height and 13.8% weight, or any other combination, depending on the data that we have.
• These two new variables are called the first principal component (PC1) and the second principal component (PC2). Rather than using height and weight on the two axes, we can use PC1 and PC2 respectively.
What if features have the same variance?
• In the ideal case above, PC1 alone can capture the total variance of Height and Weight combined. Since PC1 has all the information, you already know the drill: we can be very comfortable removing PC2 and know that our new data is still representative of the original data.
• With real data, however, we won't get a principal component that captures 100% of the variance (see the next slide).
PCA
• Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables.
• These combinations are done in such a way that the new variables (i.e., principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components.
• So the idea is: 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on; a scree plot of the variance explained per component shows this drop-off.
• Organizing information in principal components this way allows you to reduce dimensionality without losing much information, by discarding the components with low information and treating the remaining components as your new variables.
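As a quick illustration of this ordering, here is a minimal sketch using scikit-learn; the random data and its shape are made up for the example. The explained_variance_ratio_ attribute holds exactly the values a scree plot would display.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # hypothetical data: 200 samples, 10 variables

pca = PCA()                      # no n_components given: keep all 10 components
pca.fit(X)

# Fraction of total variance captured by each component, in decreasing order;
# plotting these values against component index gives a scree plot.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())  # sums to 1.0 when all are kept
```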
Proportion of variance explained by each feature
When it comes to real data, more often than not, we won't get a principal component that captures 100% of the variance.
Performing a PCA will give us N principal components, where N is equal to the dimensionality of our original data.
From this list of principal components, we generally choose the smallest number of principal components that would explain the largest amount of our original data.
Steps for PCA
1. Calculate the mean and standard deviation for each variable
• Use the mean and standard deviation to standardize the data, converting all of the variables so that they are on the same scale.
• If variables are on different scales, some may have more impact on the principal components than others, which can lead to biased results.
• You can use techniques like Z-score standardization, as in the sketch below.
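A minimal NumPy sketch of this standardization step; the small data matrix here is hypothetical.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are variables on different scales.
X = np.array([[170.0, 65.0],
              [160.0, 58.0],
              [180.0, 75.0],
              [175.0, 70.0]])

mean = X.mean(axis=0)        # per-variable mean
std = X.std(axis=0, ddof=1)  # per-variable sample standard deviation

X_std = (X - mean) / std     # Z-score standardization: mean 0, std 1 per column
```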
Steps for PCA
2. Calculate the covariance matrix
• A square matrix that tells you how each variable relates to every other variable.
3. Calculate the eigenvectors and eigenvalues
• These identify what the principal components are.
• Principal components are the vectors that define the new coordinate system for the data, meaning they are the directions along which the data varies most.
• Eigenvectors are vectors that describe the direction of a linear transformation, while eigenvalues are scalars that tell you the magnitude of the transformation.
• The aim of calculating these is to find out what the principal components in your data are, so you can transform your data into a new set of variables that is easier to visualize and interpret.
• Multiplying the covariance matrix by one of its eigenvectors gives a new vector that points in the direction of the eigenvector, scaled by the corresponding eigenvalue.
• Sort the eigenvectors in descending order of the eigenvalues to identify the principal components (see the sketch below).
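A sketch of steps 2 and 3 in NumPy, repeating the hypothetical standardized data from step 1 so the snippet runs on its own.

```python
import numpy as np

# Hypothetical standardized data (see step 1); rows are samples, columns variables.
X = np.array([[170.0, 65.0], [160.0, 58.0], [180.0, 75.0], [175.0, 70.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: covariance matrix (variables in columns).
cov = np.cov(X_std, rowvar=False)

# Step 3: eigendecomposition; eigh is the right routine for symmetric matrices.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort by descending eigenvalue so the first column is the top principal direction.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
```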
Steps for PCA
4. Choose the principal components to keep
• Create a feature vector that helps you decide which principal components to keep.
• A feature vector is a matrix that tells you how important each principal component is: it consists of the eigenvectors that correspond to the largest eigenvalues.
• The number of principal components you choose to keep depends on how much variance you want to explain.
• A common rule of thumb is to choose enough components to explain at least 85% of the variance in your data.
• For example, if you have 10 variables in your data, you might keep only the first few components if they already explain most of the variance (see the sketch below).
• Discard the components with lower eigenvalues, which are less significant, meaning they don't explain as much of the variance.
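Continuing the sketch, one common way to pick the number of components from the sorted eigenvalues; the eigenvalues here are hypothetical and the 85% threshold mirrors the guideline above.

```python
import numpy as np

eigenvalues = np.array([2.9, 0.7, 0.25, 0.15])  # hypothetical, sorted descending

explained = eigenvalues / eigenvalues.sum()     # proportion of variance per component
cumulative = np.cumsum(explained)

k = int(np.searchsorted(cumulative, 0.85)) + 1  # smallest k reaching 85% of variance
print(k, cumulative)                            # here k = 2, covering 90%
```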
Steps for PCA
5. Transform your data
• Transform your data into the new coordinate system that the principal components define. You can do this by multiplying your original (standardized) data matrix by the matrix of kept eigenvectors created in the prior step.
• This gives you a new matrix with the same number of rows but fewer columns. This is your transformed data, and it's in a form that's easier to visualize and interpret.
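A sketch of this final projection, recomputing the earlier hypothetical quantities so it is self-contained; k is the number of components chosen in step 4.

```python
import numpy as np

# Hypothetical inputs carried over from the earlier steps.
X = np.array([[170.0, 65.0], [160.0, 58.0], [180.0, 75.0], [175.0, 70.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
eigenvectors = eigenvectors[:, np.argsort(eigenvalues)[::-1]]

k = 1                    # e.g., keep one component (chosen in step 4)
W = eigenvectors[:, :k]  # feature vector: top-k principal directions as columns
X_pca = X_std @ W        # transformed data: same number of rows, only k columns
print(X_pca)
```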
PCA Solved
Given the data in the Table, reduce the dimension from 2 to 1 using the Principal Component Analysis (PCA) algorithm. [The data table and the worked numeric calculations were images in the original slides and are not preserved here.]

Step 1: Calculate the means of X1 and X2.

Step 2: Calculate the covariance matrix.

Step 3: Compute the eigenvalues of the covariance matrix.

Step 4: Compute the eigenvectors.
To find the first principal component, we need only compute the eigenvector corresponding to the largest eigenvalue. In the present example, the largest eigenvalue is λ1, so we compute the eigenvector corresponding to λ1. The eigenvector corresponding to λ = λ1 is a vector U1 scaled by some real number t; taking t as 1 gives a particular eigenvector.
To find a unit eigenvector, we compute the length of U1 (its Euclidean norm) and divide by it; the unit eigenvectors represent the directions of maximum variance in the data. By carrying out similar computations, the unit eigenvector e2 corresponding to the eigenvalue λ = λ2 can be found.

Step 5: Compute the first principal components.
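Since the slide's table did not survive extraction, here is an end-to-end sketch of the same 2D-to-1D reduction on made-up data; the X1, X2 values are hypothetical stand-ins, not the original table.

```python
import numpy as np

# Hypothetical stand-in for the slide's table: two variables, X1 and X2.
X = np.array([[4.0, 11.0], [8.0, 4.0], [13.0, 5.0], [7.0, 14.0]])

# Step 1: means of X1 and X2.
mean = X.mean(axis=0)

# Step 2: covariance matrix of the data.
cov = np.cov(X, rowvar=False)

# Steps 3-4: eigenvalues/eigenvectors; eigh returns unit-length eigenvectors,
# so the column for the largest eigenvalue is already a unit eigenvector e1.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
e1 = eigenvectors[:, np.argmax(eigenvalues)]

# Step 5: first principal components = centered data projected onto e1.
pc1 = (X - mean) @ e1
print(pc1)
```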