
30-09-2020

Data Preprocessing
Data Reduction

Data Reduction
• Data reduction techniques are applied to obtain a
reduced representation of the dataset that is much
smaller in volume, yet closely maintains the integrity of
the original data
• The mining on the reduced dataset should produce
the same or almost same analytical results
• Different strategies:
– Attribute subset selection (feature selection):
• Irrelevant, weakly relevant, or redundant attributes
(dimensions) are detected and removed
– Dimensionality reduction:
• Encoding mechanisms are used to reduce the dataset size


Attribute (Feature) Subset Selection


• In the context of machine learning, it is termed feature subset selection
• Irrelevant or redundant features are detected using
correlation analysis
• Two strategies:
– First strategy:
• Perform the correlation analysis between every pair of
attributes
• Drop one among the two attributes when they are highly
correlated
– Second strategy:
• Perform the correlation analysis between each attribute
and target attribute
• Drop the attributes that are less correlated with the target attribute

Attribute (Feature) Subset Selection


• Second strategy:
– Perform the correlation
analysis between each
attribute and target attribute
– Drop the attributes that are less correlated with the target attribute
• Example:
– Predicting Rain (target
attribute) based on
Temperature, Humidity and
Pressure
– Rain dependent on
Temperature, Humidity and
Pressure
– Correlation analysis of
Temperature, Humidity,
Pressure with Rain
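As a rough illustration of the second strategy, the sketch below (Python with pandas, using a small hypothetical weather table with a numeric Rain column) computes the correlation of each attribute with the target and keeps only the strongly correlated ones; the 0.3 threshold is an arbitrary choice for the example, not a prescribed value.

```python
import pandas as pd

# Hypothetical weather data; in practice this would be the real dataset.
df = pd.DataFrame({
    "Temperature": [30, 28, 25, 33, 27, 24, 31, 26],
    "Humidity":    [60, 75, 90, 50, 85, 95, 55, 88],
    "Pressure":    [1010, 1005, 998, 1012, 1002, 996, 1011, 1000],
    "Rain":        [0, 1, 1, 0, 1, 1, 0, 1],   # target attribute
})

target = "Rain"
threshold = 0.3   # arbitrary cut-off, chosen only for this illustration

# Correlation of every attribute with the target attribute
corr_with_target = df.corr()[target].drop(target).abs()

# Keep only attributes whose correlation with the target exceeds the threshold
selected = corr_with_target[corr_with_target > threshold].index.tolist()
print("Correlation with Rain:\n", corr_with_target)
print("Selected attributes:", selected)
```

The first strategy would instead inspect the full pairwise matrix df.corr() and drop one attribute from each highly correlated pair.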


Dimensionality Reduction

Dimensionality Reduction
• Data encoding or transformations are applied so as to
obtain a reduced or compressed representation of the
original data
[Block diagram: Data → Feature Extraction (vector x, dimension d) → Dimension Reduction (reduced representation a, dimension l) → Pattern Analysis Task]

• If the original data can be reconstructed from


compressed data without any loss of information, the
data reduction is called lossless
• If only an approximation of the original data can be
reconstructed from compressed data, then the data
reduction is called lossy
• One of the popular and effective methods of lossy
dimensionality reduction is principal component
analysis (PCA)


Tuple (Data Vector) – Attribute (Dimension)


• A tuple (one row) is referred to as a vector
• An attribute is referred to as a dimension
• In this example:
– Number of vectors =
number of rows = 20
– Dimension of a vector
= number of
attributes = 5
– Size of data matrix is
20x5

[Table: example data matrix with 20 tuples (rows) and 5 attributes (columns)]

Principal Component Analysis (PCA)


• Suppose the data to be reduced consists of N tuples (data vectors), each described by d attributes (d dimensions):

  $D = \{\mathbf{x}_n\}_{n=1}^{N}, \quad \mathbf{x}_n \in \mathbb{R}^d, \quad \mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T$

• Let $\mathbf{q}_i$, i = 1, 2, …, d, be d orthonormal vectors in the d-dimensional space, $\mathbf{q}_i \in \mathbb{R}^d$
  – These are unit vectors that each point in a direction perpendicular to the others:

    $\mathbf{q}_i^T \mathbf{q}_j = 0,\ i \neq j; \qquad \mathbf{q}_i^T \mathbf{q}_i = 1$

• PCA searches for l orthonormal vectors that can best be used to represent the data, where l < d
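As a quick numerical check of the orthonormality conditions above, the short NumPy sketch below builds an orthonormal set for a small d (via QR decomposition of a random matrix, purely as a stand-in) and verifies that $Q^T Q$ is the identity.

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)

# Build d orthonormal vectors (columns of Q) from a random matrix via QR;
# any orthonormal basis would do for this check.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# q_i^T q_j should be 1 when i == j and 0 otherwise, i.e. Q^T Q = I.
print(np.allclose(Q.T @ Q, np.eye(d)))   # True
```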


Principal Component Analysis (PCA)


• These orthonormal vectors are also called directions of projection
• The original data (each tuple, i.e. data vector $\mathbf{x}_n$) is then projected onto each of the l orthonormal vectors to get the principal components:

  $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$

  – $a_{ni}$ is the i-th principal component of $\mathbf{x}_n$

[Figure: projection of a data vector x_n onto a direction q_i in the x1–x2 plane]

• This transforms each d-dimensional vector (tuple) into an l-dimensional vector:

  $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• Task:
  – How to obtain the orthonormal vectors?
  – Which l orthonormal vectors to choose?

Principal Component Analysis (PCA)


• Thus the original data is projected onto a much smaller space, resulting in dimensionality reduction
• PCA combines the essence of the attributes by creating an alternative, smaller set of variables (attributes)
• It is possible to reconstruct a good approximation of the original data $\mathbf{x}_n$ as a linear combination of the directions of projection $\mathbf{q}_i$ and the principal components $a_{ni}$:

  $\hat{\mathbf{x}}_n = \sum_{i=1}^{l} a_{ni}\, \mathbf{q}_i$

• The Euclidean distance between the original and the approximated tuple gives the reconstruction error:

  $\text{Error} = \|\mathbf{x}_n - \hat{\mathbf{x}}_n\| = \sqrt{\sum_{i=1}^{d} (x_{ni} - \hat{x}_{ni})^2}$


PCA for Dimension Reduction


• Given: data with N samples, $D = \{\mathbf{x}_n\}_{n=1}^{N},\ \mathbf{x}_n \in \mathbb{R}^d$
• Remove the mean of each attribute (dimension) from the data samples (tuples)
• Then construct a data matrix X using the mean-subtracted samples, $X \in \mathbb{R}^{N \times d}$
  – Each row of the matrix X corresponds to one sample (tuple or data vector)
• Compute the correlation matrix $C = X^T X$
• Perform the eigen analysis of the correlation matrix C:

  $C\mathbf{q}_i = \lambda_i \mathbf{q}_i, \quad i = 1, 2, \ldots, d$

  – As the correlation (covariance) matrix is symmetric and positive semidefinite:
    • The eigenvalues $\lambda_i$ are real and non-negative
    • The eigenvectors $\mathbf{q}_i$ can be chosen to be orthonormal
    • The eigenvalues indicate the variance (strength) of the data along the corresponding eigenvectors
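A minimal NumPy sketch of these steps is shown below; it uses synthetic stand-in data, assumes the data matrix is small enough to form $C = X^T X$ directly, and uses numpy.linalg.eigh, which is the appropriate routine for symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 20, 5
data = rng.standard_normal((N, d))           # stand-in for the N x d dataset

X = data - data.mean(axis=0)                 # mean-subtracted data matrix
C = X.T @ X                                  # correlation (scatter) matrix, d x d

# Eigen analysis; eigh is for symmetric matrices and returns eigenvalues in
# ascending order with orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)                               # all non-negative (up to round-off)
```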

PCA for Dimension Reduction


• Project $\mathbf{x}_n$ onto each of the directions (eigenvectors) to get the principal components:

  $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, d$

  – $a_{ni}$ is the i-th principal component of $\mathbf{x}_n$
• Thus, each training example $\mathbf{x}_n$ is transformed to a new representation $\mathbf{a}_n$ by projecting onto the d orthonormal basis vectors (eigenvectors):

  $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nd}]^T$

• It is possible to reconstruct the original data $\mathbf{x}_n$ without error as a linear combination of the directions of projection $\mathbf{q}_i$ and the principal components $a_{ni}$:

  $\mathbf{x}_n = \sum_{i=1}^{d} a_{ni}\, \mathbf{q}_i$


PCA for Dimension Reduction


• In general, we are interested in representing the data
using fewer dimensions such that the data has high
variance along these dimensions
• Idea: Select l out of d orthonormal basis vectors
(eigenvectors) that contain high variance of data (i.e.
more information content)
• Rank-order the eigenvalues ($\lambda_i$'s) such that

  $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$

• Based on Definition 1, consider the l ($l \ll d$) eigenvectors corresponding to the l significant eigenvalues
  – Definition 1: Let $\lambda_1, \lambda_2, \ldots, \lambda_d$ be the eigenvalues of a d × d matrix A. $\lambda_1$ is called the dominant (significant) eigenvalue of A if $|\lambda_1| \geq |\lambda_i|$, i = 1, 2, …, d
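A short NumPy sketch of this rank-ordering and selection step, again on synthetic stand-in data (the names X, C, Q_l are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 5))
X = X - X.mean(axis=0)                       # mean-subtracted data matrix
C = X.T @ X                                  # correlation matrix

eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order

# Rank-order eigenvalues in descending order and keep the l leading eigenvectors
order = np.argsort(eigvals)[::-1]
l = 2
lead_vals = eigvals[order[:l]]               # l significant eigenvalues
Q_l = eigvecs[:, order[:l]]                  # corresponding eigenvectors, shape (d, l)
print(lead_vals)
```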

PCA for Dimension Reduction


• Project $\mathbf{x}_n$ onto each of the l directions (eigenvectors) to get the reduced-dimensional representation:

  $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$

• Thus, each training example $\mathbf{x}_n$ is transformed to a new reduced-dimensional representation $\mathbf{a}_n$ by projecting onto the l orthonormal basis vectors (eigenvectors):

  $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• The eigenvalue $\lambda_i$ corresponds to the variance of the projected data along $\mathbf{q}_i$


PCA for Dimension Reduction


• Since the strongest l directions are considered for obtaining the reduced-dimensional representation, it should be possible to reconstruct a good approximation of the original data
• An approximation of the original data $\mathbf{x}_n$ is obtained as a linear combination of the directions of projection (the strongest eigenvectors) $\mathbf{q}_i$ and the principal components $a_{ni}$:

  $\hat{\mathbf{x}}_n = \sum_{i=1}^{l} a_{ni}\, \mathbf{q}_i$

PCA: Basic Procedure


• Given: data with N samples, $D = \{\mathbf{x}_n\}_{n=1}^{N},\ \mathbf{x}_n \in \mathbb{R}^d$
1. Remove the mean of each attribute (dimension) from the data samples (tuples)
2. Then construct a data matrix X using the mean-subtracted samples, $X \in \mathbb{R}^{N \times d}$
   – Each row of the matrix X corresponds to one sample (tuple)
3. Compute the correlation matrix $C = X^T X$
4. Perform the eigen analysis of the correlation matrix C:

   $C\mathbf{q}_i = \lambda_i \mathbf{q}_i, \quad i = 1, 2, \ldots, d$

   – As the correlation matrix is symmetric and positive semidefinite:
     • The eigenvalues $\lambda_i$ are real and non-negative
     • The eigenvectors $\mathbf{q}_i$ can be chosen to be orthonormal
     • The eigenvalues indicate the variance (strength) of the data along the corresponding eigenvectors


PCA for Dimension Reduction


• In general, we are interested in representing the data using fewer dimensions such that the data has high variance along these dimensions
5. Rank-order the eigenvalues ($\lambda_i$'s) in sorted order such that

   $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$

6. Consider the l ($l \ll d$) eigenvectors corresponding to the l significant eigenvalues
7. Project $\mathbf{x}_n$ onto each of the l directions (eigenvectors) to get the reduced-dimensional representation:

   $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$

PCA for Dimension Reduction


8. Thus, each training example $\mathbf{x}_n$ is transformed to a new reduced-dimensional representation $\mathbf{a}_n$ by projecting onto the l orthonormal basis vectors:

   $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• The components of the new reduced representation $\mathbf{a}_n$ are uncorrelated
• The eigenvalue $\lambda_i$ corresponds to the variance of the projected data along $\mathbf{q}_i$
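Putting steps 1-8 together, here is a compact NumPy sketch of the whole procedure as one function; the function name pca_reduce and the synthetic data are illustrative choices, not part of the original slides.

```python
import numpy as np

def pca_reduce(data, l):
    """Reduce an (N, d) data matrix to (N, l) following the procedure above."""
    mean = data.mean(axis=0)
    X = data - mean                          # steps 1-2: mean-subtracted data matrix
    C = X.T @ X                              # step 3: correlation matrix

    eigvals, eigvecs = np.linalg.eigh(C)     # step 4: eigen analysis (ascending order)
    order = np.argsort(eigvals)[::-1]        # step 5: sort eigenvalues, descending
    Q_l = eigvecs[:, order[:l]]              # step 6: l leading eigenvectors, (d, l)

    A = X @ Q_l                              # steps 7-8: a_n = Q_l^T x_n for every row
    return A, Q_l, mean

# Example usage on synthetic data
rng = np.random.default_rng(3)
data = rng.standard_normal((20, 5))
A, Q_l, mean = pca_reduce(data, l=2)
print(A.shape)                               # (20, 2)

# Approximate reconstruction from the reduced representation
data_hat = A @ Q_l.T + mean
print(np.linalg.norm(data - data_hat))       # reconstruction error
```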


Illustration: PCA
• Atmospheric data:
  – N = number of tuples (data vectors) = 20
  – d = number of attributes (dimensions) = 5
• Mean of each dimension:
  23.42  93.63  1003.55  448.88  14.4

Illustration: PCA
• Step 1: Subtract the mean from each attribute
[Table: mean-subtracted data matrix]


Illustration: PCA
• Step 2: Compute the correlation matrix from the data matrix
[Table: 5 × 5 correlation matrix]

Illustration: PCA
• Step 4: Perform eigen analysis on the correlation matrix
  – Get the eigenvalues and eigenvectors
• Step 5: Sort the eigenvalues in descending order
• Step 6: Arrange the eigenvectors in the descending order of their corresponding eigenvalues
[Figure: eigenvalues and eigenvectors of the correlation matrix]


Illustration: PCA
• Step 7: Consider the two leading (significant) eigenvalues and their corresponding eigenvectors
• Step 8: Project the mean-subtracted data matrix onto the two eigenvectors corresponding to the leading eigenvalues
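As a sanity check on this kind of worked example, scikit-learn's PCA (if available) should produce an equivalent 2-dimensional representation, up to the sign of each component; the snippet below is a sketch on synthetic stand-in data, since the actual 20 × 5 atmospheric table from the slides is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
data = rng.standard_normal((20, 5))          # stand-in for the 20 x 5 atmospheric table

# Manual route: mean-subtract, eigen-decompose X^T X, keep the 2 leading eigenvectors
X = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
A_manual = X @ eigvecs[:, order[:2]]

# Library route: sklearn's PCA (uses SVD internally, same principal directions)
A_sklearn = PCA(n_components=2).fit_transform(data)

# The two results should agree column by column, up to sign
print(np.allclose(np.abs(A_manual), np.abs(A_sklearn)))
```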

Eigenvalues and Eigenvectors


• What happens when a vector is multiplied by a matrix?

  $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad \mathbf{q} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad A\mathbf{q} = \begin{bmatrix} 7 \\ 5 \end{bmatrix}$

• The vector gets transformed into a new vector
  – Its direction changes
• The vector may also get scaled (elongated or shortened) in the process

[Figure: the vector q = [1, 3]^T and the transformed vector Aq = [7, 5]^T in the q1–q2 plane]


Eigenvalues and Eigenvectors


• For a given square symmetric matrix A, there exist special vectors which do not change direction when multiplied by A

  $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad \mathbf{q} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A\mathbf{q} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3\,\mathbf{q}$

• These vectors are called eigenvectors
• More formally, $A\mathbf{q} = \lambda\mathbf{q}$
  – $\lambda$ is the eigenvalue
  – The eigenvalue indicates the scaling magnitude along the eigenvector
• The vector only gets scaled but does not change its direction

[Figure: the eigenvector q = [1, 1]^T is mapped by A to 3q, i.e. scaled without changing direction]

• So what is so special about eigenvalues and eigenvectors?
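A quick NumPy check of this 2 × 2 example (purely illustrative) confirms that [1, 1]^T is an eigenvector of A with eigenvalue 3, and that the other eigenvalue is -1:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

eigvals, eigvecs = np.linalg.eigh(A)         # A is symmetric, so eigh applies
print(eigvals)                               # [-1.  3.]

q = np.array([1.0, 1.0])
print(A @ q)                                 # [3. 3.] == 3 * q, so q is an eigenvector with λ = 3
```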

Linear Algebra: Basic Definitions


• Basis: a set of vectors in $\mathbb{R}^d$ is called a basis if
  – the vectors are linearly independent, and
  – every vector in $\mathbb{R}^d$ can be expressed as a linear combination of these basis vectors
• Linearly independent vectors:
  – A set of d vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_d$ is linearly independent if no vector in the set can be expressed as a linear combination of the remaining d - 1 vectors
  – In other words, the only solution to

    $c_1\mathbf{q}_1 + c_2\mathbf{q}_2 + \cdots + c_d\mathbf{q}_d = \mathbf{0}$ is $c_1 = c_2 = \cdots = c_d = 0$

    • Here the $c_i$ are scalars


Linear Algebra: Basic Definitions


• For example, consider the space $\mathbb{R}^2$
• Consider the vectors

  $\mathbf{q}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{q}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

• Any vector $[z_1\ z_2]^T$ can be expressed as a linear combination of these two vectors:

  $\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = z_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + z_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

• Further, $\mathbf{q}_1$ and $\mathbf{q}_2$ are linearly independent
  – The only solution to $c_1\mathbf{q}_1 + c_2\mathbf{q}_2 = \mathbf{0}$ is $c_1 = c_2 = 0$

[Figure: the standard basis vectors q1 = [1, 0]^T and q2 = [0, 1]^T and a vector z = [z1, z2]^T in the plane]

Linear Algebra: Basic Definitions


• It turns out that $\mathbf{q}_1$ and $\mathbf{q}_2$ are unit vectors in the directions of the coordinate axes
• And indeed we are used to representing all vectors in $\mathbb{R}^2$ as a linear combination of these two vectors

[Figure: the same plot as before, with q1 and q2 along the coordinate axes]


Linear Algebra: Basic Definitions


• We could have chosen any 2 linearly independent vectors in $\mathbb{R}^2$ as the basis vectors
• For example, consider the linearly independent vectors $\mathbf{q}_1 = [4\ 2]^T$ and $\mathbf{q}_2 = [5\ 7]^T$
• Any vector $\mathbf{z} = [z_1\ z_2]^T$ can be expressed as a linear combination of these two vectors:

  $\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \lambda_1 \begin{bmatrix} 4 \\ 2 \end{bmatrix} + \lambda_2 \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \qquad \mathbf{z} = \lambda_1 \mathbf{q}_1 + \lambda_2 \mathbf{q}_2$

  $z_1 = 4\lambda_1 + 5\lambda_2, \qquad z_2 = 2\lambda_1 + 7\lambda_2$

• We can find $\lambda_1$ and $\lambda_2$ by solving this system of linear equations

[Figure: the basis vectors q1 = [4, 2]^T and q2 = [5, 7]^T and a vector z in the plane]
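A small NumPy sketch of this change of basis (with an arbitrary example vector z, chosen only for illustration) solves the 2 × 2 linear system for $\lambda_1$ and $\lambda_2$:

```python
import numpy as np

# Basis vectors as the columns of Q
Q = np.array([[4.0, 5.0],
              [2.0, 7.0]])
z = np.array([3.0, 5.0])                     # arbitrary example vector

lam = np.linalg.solve(Q, z)                  # solves Q @ lam = z for [λ1, λ2]
print(lam)
print(np.allclose(Q @ lam, z))               # True: z = λ1*q1 + λ2*q2
```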

Linear Algebra: Basic Definitions


• In general, given a set of linearly independent vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_d \in \mathbb{R}^d$,
  – we can express any vector $\mathbf{z} \in \mathbb{R}^d$ as a linear combination of these vectors:

  $\mathbf{z} = \lambda_1 \mathbf{q}_1 + \lambda_2 \mathbf{q}_2 + \cdots + \lambda_d \mathbf{q}_d$

  $\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_d \end{bmatrix} = \lambda_1 \begin{bmatrix} q_{11} \\ q_{12} \\ \vdots \\ q_{1d} \end{bmatrix} + \lambda_2 \begin{bmatrix} q_{21} \\ q_{22} \\ \vdots \\ q_{2d} \end{bmatrix} + \cdots + \lambda_d \begin{bmatrix} q_{d1} \\ q_{d2} \\ \vdots \\ q_{dd} \end{bmatrix} = \begin{bmatrix} q_{11} & q_{21} & \cdots & q_{d1} \\ q_{12} & q_{22} & \cdots & q_{d2} \\ \vdots & \vdots & & \vdots \\ q_{1d} & q_{2d} & \cdots & q_{dd} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_d \end{bmatrix}$

  $\mathbf{z} = Q\,\boldsymbol{\lambda}$


Linear Algebra: Basic Definitions


• Let us see what happens if we have an orthonormal basis:

  $\mathbf{q}_i^T \mathbf{q}_i = 1 \quad \text{and} \quad \mathbf{q}_i^T \mathbf{q}_j = 0,\ i \neq j$

• We can express any vector $\mathbf{z} \in \mathbb{R}^d$ as a linear combination of these vectors:

  $\mathbf{z} = \lambda_1 \mathbf{q}_1 + \lambda_2 \mathbf{q}_2 + \cdots + \lambda_d \mathbf{q}_d$

  – Multiply both sides by $\mathbf{q}_1^T$:

  $\mathbf{q}_1^T \mathbf{z} = \lambda_1 \mathbf{q}_1^T \mathbf{q}_1 + \lambda_2 \mathbf{q}_1^T \mathbf{q}_2 + \cdots + \lambda_d \mathbf{q}_1^T \mathbf{q}_d = \lambda_1$

• Similarly, $\lambda_2 = \mathbf{q}_2^T \mathbf{z}$, ..., $\lambda_d = \mathbf{q}_d^T \mathbf{z}$
• An orthogonal basis is the most convenient basis that one can hope for

[Figure: orthonormal basis vectors q1 and q2 and a vector z in the x1–x2 plane]

Eigenvalues and Eigenvectors


• What does any of this have to do with eigenvectors?
• Eigenvectors can form a basis
• Theorem 1: The eigenvectors of a matrix $A \in \mathbb{R}^{d \times d}$ having distinct eigenvalues are linearly independent
• Theorem 2: The eigenvectors of a square symmetric matrix are orthogonal
• Definition 1: Let $\lambda_1, \lambda_2, \ldots, \lambda_d$ be the eigenvalues of a d × d matrix A. $\lambda_1$ is called the dominant (significant) eigenvalue of A if $|\lambda_1| \geq |\lambda_i|$, i = 1, 2, …, d
• We will put all of this to use for principal component
analysis



Principal Component Analysis (PCA)


• Each point (vector) here is represented using a linear combination of the x1 and x2 axes
• In other words, we are using p1 and p2 as the basis

[Figure: a cloud of 2-D points shown with the coordinate axes p1 and p2 as the basis]

Principal Component Analysis (PCA)


• Let us consider orthonormal vectors q1 and q2 as a basis instead of p1 and p2
• We observe that all the points have a very small component in the direction of q2 (almost noise)

[Figure: the same point cloud with the rotated orthonormal basis q1 (along the data spread) and q2 (nearly perpendicular to it)]


Principal Component Analysis (PCA)


• Let us consider orthonormal vectors q1 and q2 as a basis instead of p1 and p2
• We observe that all the points have a very small component in the direction of q2 (almost noise)
• Now the same data can be represented in 1 dimension, in the direction of q1, by making a smarter choice for the basis
• Why do we not care about q2?
  – The variance of the data in this direction is very small
  – All data points have almost the same value in the q2 direction

[Figure: the same point cloud represented along q1, discarding the small q2 components]

Principal Component Analysis (PCA)

• If we were to build a classifier on top of this data, then q2 would not contribute to the classifier
  – The points are not distinguishable along this direction
• In general, we are interested in representing the data using fewer dimensions such that
  – the data has high variance along these dimensions
  – the dimensions are linearly independent (uncorrelated)
• PCA preserves the geometrical locality of the transformed data with respect to the original data

[Figure: the same point cloud; along q2 the points are not separable]


PCA: Basic Procedure


• Given: data with N samples, $D = \{\mathbf{x}_n\}_{n=1}^{N},\ \mathbf{x}_n \in \mathbb{R}^d$
1. Remove the mean of each attribute (dimension) from the data samples (tuples)
2. Then construct a data matrix X using the mean-subtracted samples, $X \in \mathbb{R}^{N \times d}$
   – Each row of the matrix X corresponds to one sample (tuple)
3. Compute the correlation matrix $C = X^T X$
4. Perform the eigen analysis of the correlation matrix C:

   $C\mathbf{q}_i = \lambda_i \mathbf{q}_i, \quad i = 1, 2, \ldots, d$

   – As the correlation matrix is symmetric and positive semidefinite:
     • The eigenvalues $\lambda_i$ are real and non-negative
     • The eigenvectors $\mathbf{q}_i$ can be chosen to be orthonormal
     • The eigenvalues indicate the variance (strength) of the data along the corresponding eigenvectors

PCA for Dimension Reduction


• In general, we are interested in representing the data using fewer dimensions such that the data has high variance along these dimensions
5. Rank-order the eigenvalues ($\lambda_i$'s) in sorted order such that

   $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$

6. Consider the l ($l \ll d$) eigenvectors corresponding to the l significant eigenvalues
7. Project $\mathbf{x}_n$ onto each of the l directions (eigenvectors) to get the reduced-dimensional representation:

   $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$


PCA for Dimension Reduction


8. Thus, each training example $\mathbf{x}_n$ is transformed to a new reduced-dimensional representation $\mathbf{a}_n$ by projecting onto the l orthonormal basis vectors:

   $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• The components of the new reduced representation $\mathbf{a}_n$ are uncorrelated
• The eigenvalue $\lambda_i$ corresponds to the variance of the projected data along $\mathbf{q}_i$

Illustration: PCA
• Handwritten digit images [1]:
  – Size of each image: 28 × 28
  – Dimension after linearizing: 784
  – Total number of training examples: 5000 (500 per class)

[1] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Intelligent Signal Processing, pp. 306-351, IEEE Press, 2001


Illustration: PCA
• Handwritten digit images:
  – [Plot: all 784 eigenvalues]

Illustration: PCA
• Handwritten digit images:
  – [Plot: the leading 100 eigenvalues]


Illustration: PCA-Reconstructed Images


[Figure: original digit images and their PCA reconstructions using l = 1, l = 20, and l = 100 principal components]
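For readers who want to reproduce this kind of figure, here is a hedged sketch that fetches MNIST via scikit-learn and reconstructs one digit at several values of l. The exact 5000-image subset used in the slides is not specified, so the sketch simply takes the first 5000 images; it assumes network access, scikit-learn, and matplotlib are available.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml

# Fetch MNIST (28 x 28 images flattened to 784); the slides use a 5000-image subset.
X = fetch_openml("mnist_784", version=1, as_frame=False).data[:5000].astype(float)

mean = X.mean(axis=0)
Xc = X - mean
# Eigenvectors of X^T X (784 x 784) via eigh, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
Q = eigvecs[:, np.argsort(eigvals)[::-1]]

img = Xc[0]                                   # first image, mean-subtracted
fig, axes = plt.subplots(1, 4, figsize=(8, 2))
axes[0].imshow((img + mean).reshape(28, 28), cmap="gray")
axes[0].set_title("Original")
for ax, l in zip(axes[1:], [1, 20, 100]):
    recon = Q[:, :l] @ (Q[:, :l].T @ img) + mean   # reconstruction with l components
    ax.imshow(recon.reshape(28, 28), cmap="gray")
    ax.set_title(f"l={l}")
plt.show()
```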

