4-1-PCA
• Extremely useful when working with data sets that have a large number of
features.
• Applications include image processing, compression, genome research, etc.
PCA (cont.)
• Transforms high-dimensional data into a lower-dimensional representation while
retaining as much information as possible.
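• A minimal Python sketch of this use of PCA with scikit-learn (the random data set and the choice of 2 components are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 100 samples with 5 correlated features.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = base @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

# Project onto the 2 directions that retain the most variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)   # (100, 5) -> (100, 2)
print(pca.explained_variance_ratio_)    # share of variance kept by each component
```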
How PCA Works
• A multi-step process, built on a simple idea: variance measures information.
• The greater the variance, the more the information (and vice versa).
• The sample variance of a variable $x$ over $n$ observations is
$$\mathrm{Var}(x) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
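• The same formula translated directly into NumPy (the sample values are arbitrary); ddof=1 selects the n − 1 denominator:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
var_manual = np.sum((x - x.mean()) ** 2) / (len(x) - 1)
print(var_manual)        # 6.666...
print(x.var(ddof=1))     # same value, computed by NumPy
```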
Variance vs. Information
Person   Height      Person      Height
Alex     145         Daniel      172
Ben      160         Elsa        171
Chris    185         Fernandez   173
Variance vs. Information (cont.)
• Applying the variance formula to the two groups of heights above:
$$\mathrm{Var}_{\text{left}} = \frac{(145-163.3)^2+(160-163.3)^2+(185-163.3)^2}{3-1} \approx 408.3,
\qquad
\mathrm{Var}_{\text{right}} = \frac{(172-172)^2+(171-172)^2+(173-172)^2}{3-1} = 1$$
• The heights in the left group vary widely (high variance), so the height feature distinguishes those people well and carries a lot of information; in the right group the heights are nearly identical (low variance), so the same feature carries very little information.
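• A quick NumPy check of the two group variances, using only the heights from the table above:

```python
import numpy as np

group1 = np.array([145.0, 160.0, 185.0])   # Alex, Ben, Chris
group2 = np.array([172.0, 171.0, 173.0])   # Daniel, Elsa, Fernandez

print(group1.var(ddof=1))   # ~408.3 -> heights spread out, feature is informative
print(group2.var(ddof=1))   # 1.0    -> heights nearly identical, feature carries little information
```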
2- Covariance Matrix
• To identify correlations between variables, the covariance matrix is used.
• For instance, for a 3-dimensional dataset with variables $x$, $y$, $z$, the covariance matrix is:
$$\begin{pmatrix}
\mathrm{Var}(x) & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z) \\
\mathrm{Cov}(y,x) & \mathrm{Var}(y) & \mathrm{Cov}(y,z) \\
\mathrm{Cov}(z,x) & \mathrm{Cov}(z,y) & \mathrm{Var}(z)
\end{pmatrix}$$
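• A small NumPy sketch of building such a covariance matrix (the 3-variable random data set is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))          # 50 samples, 3 variables (x, y, z)

C = np.cov(X, rowvar=False)           # 3x3 covariance matrix (variables are the columns)
print(C.shape)                        # (3, 3)
print(np.allclose(C, C.T))            # True: Cov(x, y) = Cov(y, x), the matrix is symmetric
print(np.allclose(np.diag(C), X.var(axis=0, ddof=1)))  # diagonal entries are the variances
```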
3- Identify Principal Components
• Principal components are new variables that are constructed as linear combinations
of the initial variables.
• These combinations are made in such a way that the principal components are
uncorrelated, and most of the information in the initial variables is squeezed into
the first component, then the maximum remaining information into the second, and so on.
3- Identify Principal Components (cont.)
• Geometrically speaking, principal components represent the directions of the data that
explain a maximal amount of variance, that is to say, the lines that capture the most
information in the data.
3- Identify Principal Components (cont.)
• Can you guess the first principal component within the data?
• It’s approximately the line through the purple marks: it passes through the origin, and it is the line on which
the projections of the points (the red dots in the figure) are the most spread out.
• Equivalently, it’s the line that maximizes the variance of the projections, i.e., the average of the squared
distances from the projected points to the origin.
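• A small numerical check of this claim, assuming some illustrative 2-D data: among many candidate directions through the origin, the one with the largest projection variance coincides (up to sign) with the top eigenvector of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)
X = X - X.mean(axis=0)                                  # center so candidate lines pass through the origin

angles = np.linspace(0, np.pi, 360)                     # candidate directions (unit vectors)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
proj_var = ((X @ dirs.T) ** 2).mean(axis=0)             # average squared projection per direction

best_dir = dirs[np.argmax(proj_var)]                    # direction with the most spread-out projections
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigvecs[:, -1]                                    # eigenvector with the largest eigenvalue
print(best_dir, pc1)                                    # nearly parallel (up to sign)
print(abs(best_dir @ pc1))                              # ~1.0
```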
3- Identify Principal Components (cont.)
• The second principal component is calculated in the same way, with the condition
that it is uncorrelated with (i.e., perpendicular to) the first principal component
and that it accounts for the next highest variance.
• This continues until a total of p principal components have been calculated, equal
to the original number of variables.
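• A minimal NumPy sketch of steps 2-3 together, on an illustrative correlated data set: build the covariance matrix, take its eigenvectors, and sort them by descending eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))                                       # 2 underlying factors
X = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))   # 4 correlated variables
X = X - X.mean(axis=0)                                                   # center the data

C = np.cov(X, rowvar=False)                  # step 2: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)         # step 3: eigh handles symmetric matrices

order = np.argsort(eigvals)[::-1]            # sort descending: PC1 explains the most variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)                               # variance explained by PC1, PC2, PC3, PC4
print(abs(eigvecs[:, 0] @ eigvecs[:, 1]))    # ~0: the principal components are orthogonal
```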
4- Feature Vector
• Let’s suppose that our data set is 2-dimensional, with two variables x and y, and that the
covariance matrix has eigenvectors v1 and v2 with corresponding eigenvalues λ1 and λ2.
• If we rank the eigenvalues in descending order, we get λ1>λ2, which means that the
eigenvector that corresponds to the first principal component (PC1) is v1 and the one that
corresponds to the second principal component (PC2) is v2.
• PC1 and PC2 carry respectively λ1/(λ1 + λ2) = 96% and λ2/(λ1 + λ2) = 4% of the variance of the data.
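• The percentages are just each eigenvalue divided by the sum of all eigenvalues; the eigenvalues below are made-up numbers chosen only to reproduce the 96% / 4% split:

```python
eigvals = [1.28, 0.05]               # assumed example eigenvalues, lambda1 > lambda2
total = sum(eigvals)
ratios = [v / total for v in eigvals]
print([f"{r:.0%}" for r in ratios])  # ['96%', '4%']
```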
4- Feature Vector (cont.)
• We can either form a feature vector with both of the eigenvectors v1 and v2, i.e., a matrix
whose columns are v1 and v2;
• or discard the eigenvector v2, which is the one of lesser significance, and form a feature
vector with v1 only, i.e., a matrix with the single column v1.
• Discarding the eigenvector v2 will reduce dimensionality by 1 and will consequently cause
a loss of information in the final data set. But given that v2 carries only 4 percent of
the information, the loss will therefore not be important, and we will still keep the 96 percent
of the information carried by v1.
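• A sketch of the two options in NumPy, using illustrative orthonormal eigenvectors (the feature vector is simply a matrix whose columns are the kept eigenvectors):

```python
import numpy as np

# Illustrative eigenvectors as columns, already sorted so the first column is v1.
eigvecs = np.array([[0.6, -0.8],
                    [0.8,  0.6]])

feature_vector_both = eigvecs          # keep v1 and v2: no dimensionality reduction
feature_vector_v1 = eigvecs[:, :1]     # keep only v1: reduce from 2 dimensions to 1

print(feature_vector_both.shape)       # (2, 2)
print(feature_vector_v1.shape)         # (2, 1)
```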
5- Recast Data
• In this step, the aim is to use the feature vector formed using the eigenvectors of the
covariance matrix, to reorient the data from the original axes to the ones represented by
the principal components.
• This can be done by multiplying the transpose of the feature vector by the transpose of
the original data set:
$$\text{FinalDataSet} = \text{FeatureVector}^{T} \times \text{OriginalDataSet}^{T}$$
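• A minimal end-to-end sketch of this last step on illustrative data (standardization and the choice of keeping 2 components are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))                          # illustrative data: 100 samples, 3 features

# Standardize, then build the feature vector from the top-2 eigenvectors of the covariance matrix.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:2]]                 # keep the 2 leading principal components

# FinalDataSet = FeatureVector^T @ DataSet^T, transposed back to samples-as-rows.
final_data = (feature_vector.T @ X_std.T).T            # same result as X_std @ feature_vector
print(final_data.shape)                                # (100, 2)
```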
Further Reading
• Feature extraction vs. Feature selection
• Feature extraction: transforming the existing features into a lower-dimensional space,
such as PCA and LDA
• Feature (subset) selection: Selecting a subset of existing features without a
transformation
• Feature selection requires
• A search strategy to select candidate subsets, such as sequential forward / backward
selection
• An objective function to evaluate these candidates, such as correlation
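• A minimal sketch of sequential forward selection; the R² of a least-squares fit is used here as one possible objective function, and the data set and target are illustrative assumptions:

```python
import numpy as np

def subset_score(X, y, cols):
    """Objective: R^2 of a least-squares fit of y on the chosen columns (one possible choice)."""
    A = np.column_stack([X[:, cols], np.ones(len(y))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

def sequential_forward_selection(X, y, k):
    """Greedily add the feature that most improves the objective of the current subset."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: subset_score(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 2] - 2 * X[:, 5] + 0.1 * rng.normal(size=200)   # y depends mainly on features 2 and 5
print(sequential_forward_selection(X, y, k=2))               # likely [2, 5]
```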
Source
• https://towardsdatascience.com/
• https://medium.com/
• https://builtin.com/data-science/step-step-explanation-principal-component-analysis