
Principal Component Analysis in Bank Rate Prediction Model to Reduce Multicollinearity

Linear Algebra

Under the guidance of

SUBMITTED BY
INDEX

S No. Topic

1 Introduction
2 Math Behind PCA
3 Problem Statement
4 Solution
5 Conclusion
6 References
INTRODUCTION

Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform large datasets into a smaller set of variables known as principal components. Developed by Karl Pearson in 1901, PCA gained popularity with the advent of high-end computers, which enabled fast multivariate statistical computation at scale. By converting potentially correlated variables into principal components, PCA simplifies data visualization, exploration, and model building, making it particularly effective for high-dimensional datasets. PCA's utility extends beyond regression analysis to various domains, including pattern recognition, signal processing, and image processing.

PCA finds extensive application in preprocessing data for machine learning algorithms. It
extracts informative features while retaining the most relevant information, thus mitigating the
"curse of dimensionality," where adding features can degrade model performance. By projecting
high-dimensional data into a reduced feature space, PCA addresses issues like multicollinearity
and overfitting. Multicollinearity, which arises from highly correlated independent variables, can
hinder causal modeling, while overfit models generalize poorly to new data.
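As a minimal sketch of this preprocessing step (using scikit-learn on synthetic data; the correlated columns and the 95% variance threshold are illustrative assumptions, not part of the bank rate model discussed later), correlated features can be standardized and projected onto a few principal components before a model is fit:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Five features, two of which are near-duplicates of others (multicollinearity).
base = rng.normal(size=(200, 3))
X = np.column_stack([
    base,
    base[:, 0] + 0.01 * rng.normal(size=200),
    base[:, 1] - base[:, 2] + 0.01 * rng.normal(size=200),
])

# Standardize so every feature contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Keep just enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)      # e.g. (200, 5) -> (200, 3)
print(pca.explained_variance_ratio_)       # variance captured per retained component

The reduced matrix X_reduced can then be fed to a regression or classification model in place of the original correlated features.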

Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are dimension
reduction techniques, with PCA being versatile for both supervised and unsupervised tasks,
while LDA is primarily used in supervised learning. In unsupervised scenarios, PCA can reduce
dimensions without considering class labels or categories, unlike LDA. Additionally, PCA is
closely related to factor analysis, as both methods aim to minimize information loss while
reducing the number of dimensions or variables in a dataset.

Comparing PCA to K-means clustering, both are unsupervised techniques serving distinct
purposes. PCA transforms a dataset by creating new variables (principal components) as linear
combinations of the original variables, effectively reducing dimensionality. In contrast, K-means
clustering identifies clusters within the data based on the similarity of data points to cluster
centers. PCA is favored for exploratory data analysis and dimensionality reduction, while K-
means clustering is useful for identifying clusters in the data. Each technique plays a vital role in
data analysis, catering to different objectives and complementing each other in various
applications.
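The contrast can be made concrete with a short sketch (scikit-learn on synthetic data; all parameter choices here are illustrative): PCA returns new coordinates for every observation, whereas K-means returns a cluster label for every observation, and the two are often chained by clustering in the reduced space.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))              # illustrative data with 6 variables

# PCA: build 2 new variables (PC1, PC2) as linear combinations of the originals.
scores = PCA(n_components=2).fit_transform(X)                             # shape (300, 2)

# K-means: assign each observation to one of 3 cluster centres.
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)   # shape (300,)

# The techniques are complementary: a common pattern is to cluster in the
# PCA-reduced space, e.g. for visualisation or noise reduction.
labels_2d = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(scores)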

MATH BEHIND PCA

Principal Component Analysis (PCA) condenses the information within large datasets into a
smaller set of independent variables called principal components. These components, formed as
linear combinations of original variables, capture the maximum variance in the data. PCA
leverages linear algebra and matrix operations to transform the dataset into a new coordinate
system defined by these principal components, derived from eigenvectors and eigenvalues of the
covariance matrix.
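A minimal sketch of this derivation in NumPy (the data matrix is randomly generated purely for illustration): centre the variables, form the covariance matrix, take its eigenvectors as the principal directions, and project the data onto them.

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))              # 100 observations of 4 variables (illustrative)

# 1. Centre each variable: PCA works on deviations from the mean.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix of the variables (4 x 4, symmetric).
C = np.cov(Xc, rowvar=False)

# 3. Eigen-decomposition: eigenvectors give the principal directions,
#    eigenvalues give the variance captured along each direction.
eigvals, eigvecs = np.linalg.eigh(C)       # eigh is suited to symmetric matrices
order = np.argsort(eigvals)[::-1]          # sort from largest to smallest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project the centred data onto the new coordinate system: these are the
#    principal component scores (column 0 is PC1, column 1 is PC2, ...).
scores = Xc @ eigvecs

print(eigvals / eigvals.sum())             # proportion of variance per component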

Eigenvectors signify the directions of variance in the data, while eigenvalues quantify how much variance lies along each of those directions. Since principal components represent the directions of maximal variance, they align with the eigenvectors of the covariance matrix. The two primary principal components, denoted PC1 and PC2, capture the highest and second-highest variances, respectively. PC2 is always orthogonal to PC1 to ensure independence between the principal components. Any further principal components that are included likewise remain uncorrelated with one another and explain the remaining variation in the data.

PC1 points in the direction of highest variance, encapsulating the majority of the information in the original dataset, while PC2 captures the next highest variance. The relationship between PC1 and PC2 is depicted in the scatterplot below, where the axes are perpendicular.
(Zakaria, 2024)
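A scatterplot of this kind can be generated as follows (a sketch using matplotlib on synthetic correlated data; the data merely stand in for the original dataset):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
x = rng.normal(size=300)
# Three correlated variables standing in for the original dataset.
X = np.column_stack([x, 0.6 * x + 0.4 * rng.normal(size=300), rng.normal(size=300)])

scores = PCA(n_components=2).fit_transform(X)

plt.scatter(scores[:, 0], scores[:, 1], s=10)
plt.xlabel("PC1 (direction of highest variance)")
plt.ylabel("PC2 (orthogonal to PC1)")
plt.title("Data projected onto the first two principal components")
plt.axhline(0, linewidth=0.5)
plt.axvline(0, linewidth=0.5)
plt.show()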
