
30-09-2020

Data Preprocessing
Data Reduction

Data Reduction
• Data reduction techniques are applied to obtain a
reduced representation of the dataset that is much
smaller in volume, yet closely maintains the integrity of
the original data
• The mining on the reduced dataset should produce
the same or almost same analytical results
• Different strategies:
– Attribute subset selection (feature selection):
• Irrelevant, weakly relevant, or redundant attributes
(dimensions) are detected and removed
– Dimensionality reduction:
• Encoding mechanisms are used to reduce the dataset size


Attribute (Feature) Subset Selection


• In the context of machine learning, it is termed feature subset selection
• Irrelevant or redundant features are detected using
correlation analysis
• Two strategies:
– First strategy:
• Perform the correlation analysis between every pair of
attributes
• Drop one among the two attributes when they are highly
correlated
– Second strategy:
• Perform the correlation analysis between each attribute
and target attribute
• Drop the attributes that are less correlated with the target attribute

Attribute (Feature) Subset Selection


• Second strategy:
– Perform the correlation
analysis between each
attribute and target attribute
– Drop the attributes that are less correlated with the target attribute
• Example:
– Predicting Rain (target
attribute) based on
Temperature, Humidity and
Pressure
– Rain dependent on
Temperature, Humidity and
Pressure
– Correlation analysis of
Temperature, Humidity,
Pressure with Rain
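As a rough illustration of the second strategy, the sketch below (Python with pandas, using a small hypothetical weather table with a numeric Rain column) computes the correlation of each attribute with the target and keeps only the strongly correlated ones; the 0.3 threshold is an arbitrary choice for the example, not a prescribed value.

```python
import pandas as pd

# Hypothetical weather data; in practice this would be the real dataset.
df = pd.DataFrame({
    "Temperature": [30, 28, 25, 33, 27, 24, 31, 26],
    "Humidity":    [60, 75, 90, 50, 85, 95, 55, 88],
    "Pressure":    [1010, 1005, 998, 1012, 1002, 996, 1011, 1000],
    "Rain":        [0, 1, 1, 0, 1, 1, 0, 1],   # target attribute
})

target = "Rain"
threshold = 0.3   # arbitrary cut-off, chosen only for this illustration

# Correlation of every attribute with the target attribute
corr_with_target = df.corr()[target].drop(target).abs()

# Keep only attributes whose correlation with the target exceeds the threshold
selected = corr_with_target[corr_with_target > threshold].index.tolist()
print("Correlation with Rain:\n", corr_with_target)
print("Selected attributes:", selected)
```

The first strategy would instead inspect the full pairwise matrix df.corr() and drop one attribute from each highly correlated pair.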


Dimensionality Reduction

Dimensionality Reduction
• Data encoding or transformations are applied so as to
obtain a reduced or compressed representation of the
original data
[Block diagram: Data → Feature Extraction (vector x, dimension d) → Dimension Reduction (reduced representation a, dimension l) → Pattern Analysis Task]

• If the original data can be reconstructed from


compressed data without any loss of information, the
data reduction is called lossless
• If only an approximation of the original data can be
reconstructed from compressed data, then the data
reduction is called lossy
• One of the popular and effective methods of lossy
dimensionality reduction is principal component
analysis (PCA)


Tuple (Data Vector) – Attribute (Dimension)


• A tuple (one row) is referred to as a vector
• An attribute is referred to as a dimension
• In this example:
– Number of vectors =
number of rows = 20
– Dimension of a vector
= number of
attributes = 5
– Size of data matrix is
20x5

[Table: example data matrix with 20 tuples (rows) and 5 attributes (columns)]

Principal Component Analysis (PCA)


• Suppose the data to be reduced consists of N tuples (data vectors), each described by d attributes (d dimensions):

  $D = \{\mathbf{x}_n\}_{n=1}^{N}, \quad \mathbf{x}_n \in \mathbb{R}^d, \quad \mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T$

• Let $\mathbf{q}_i$, i = 1, 2, …, d, be d orthonormal vectors in the d-dimensional space, $\mathbf{q}_i \in \mathbb{R}^d$
  – These are unit vectors that each point in a direction perpendicular to the others:

    $\mathbf{q}_i^T \mathbf{q}_j = 0,\ i \neq j; \qquad \mathbf{q}_i^T \mathbf{q}_i = 1$

• PCA searches for l orthonormal vectors that can best be used to represent the data, where l < d
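As a quick numerical check of the orthonormality conditions above, the short NumPy sketch below builds an orthonormal set for a small d (via QR decomposition of a random matrix, purely as a stand-in) and verifies that $Q^T Q$ is the identity.

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)

# Build d orthonormal vectors (columns of Q) from a random matrix via QR;
# any orthonormal basis would do for this check.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# q_i^T q_j should be 1 when i == j and 0 otherwise, i.e. Q^T Q = I.
print(np.allclose(Q.T @ Q, np.eye(d)))   # True
```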


Principal Component Analysis (PCA)


• These orthonormal vectors are also called directions of projection
• The original data (each tuple, i.e. data vector $\mathbf{x}_n$) is then projected onto each of the l orthonormal vectors to get the principal components:

  $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$

  – $a_{ni}$ is the i-th principal component of $\mathbf{x}_n$

[Figure: projection of a data vector x_n onto a direction q_i in the x1–x2 plane]

• This transforms each d-dimensional vector (tuple) into an l-dimensional vector:

  $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• Task:
  – How to obtain the orthonormal vectors?
  – Which l orthonormal vectors to choose?

Principal Component Analysis (PCA)


• Thus the original data is projected onto a much smaller space, resulting in dimensionality reduction
• PCA combines the essence of the attributes by creating an alternative, smaller set of variables (attributes)
• It is possible to reconstruct a good approximation of the original data $\mathbf{x}_n$ as a linear combination of the directions of projection $\mathbf{q}_i$ and the principal components $a_{ni}$:

  $\hat{\mathbf{x}}_n = \sum_{i=1}^{l} a_{ni}\, \mathbf{q}_i$

• The Euclidean distance between the original and the approximated tuple gives the reconstruction error:

  $\text{Error} = \|\mathbf{x}_n - \hat{\mathbf{x}}_n\| = \sqrt{\sum_{i=1}^{d} (x_{ni} - \hat{x}_{ni})^2}$


PCA for Dimension Reduction


• Given: data with N samples, $D = \{\mathbf{x}_n\}_{n=1}^{N},\ \mathbf{x}_n \in \mathbb{R}^d$
• Remove the mean of each attribute (dimension) from the data samples (tuples)
• Then construct a data matrix X using the mean-subtracted samples, $X \in \mathbb{R}^{N \times d}$
  – Each row of the matrix X corresponds to one sample (tuple or data vector)
• Compute the correlation matrix $C = X^T X$
• Perform the eigen analysis of the correlation matrix C:

  $C\mathbf{q}_i = \lambda_i \mathbf{q}_i, \quad i = 1, 2, \ldots, d$

  – As the correlation (covariance) matrix is symmetric and positive semidefinite:
    • The eigenvalues $\lambda_i$ are real and non-negative
    • The eigenvectors $\mathbf{q}_i$ can be chosen to be orthonormal
    • The eigenvalues indicate the variance (strength) of the data along the corresponding eigenvectors
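A minimal NumPy sketch of these steps is shown below; it uses synthetic stand-in data, assumes the data matrix is small enough to form $C = X^T X$ directly, and uses numpy.linalg.eigh, which is the appropriate routine for symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 20, 5
data = rng.standard_normal((N, d))           # stand-in for the N x d dataset

X = data - data.mean(axis=0)                 # mean-subtracted data matrix
C = X.T @ X                                  # correlation (scatter) matrix, d x d

# Eigen analysis; eigh is for symmetric matrices and returns eigenvalues in
# ascending order with orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)                               # all non-negative (up to round-off)
```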

PCA for Dimension Reduction


• Project $\mathbf{x}_n$ onto each of the directions (eigenvectors) to get the principal components:

  $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, d$

  – $a_{ni}$ is the i-th principal component of $\mathbf{x}_n$
• Thus, each training example $\mathbf{x}_n$ is transformed to a new representation $\mathbf{a}_n$ by projecting onto the d orthonormal basis vectors (eigenvectors):

  $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nd}]^T$

• It is possible to reconstruct the original data $\mathbf{x}_n$ without error as a linear combination of the directions of projection $\mathbf{q}_i$ and the principal components $a_{ni}$:

  $\mathbf{x}_n = \sum_{i=1}^{d} a_{ni}\, \mathbf{q}_i$


PCA for Dimension Reduction


• In general, we are interested in representing the data
using fewer dimensions such that the data has high
variance along these dimensions
• Idea: Select l out of d orthonormal basis vectors
(eigenvectors) that contain high variance of data (i.e.
more information content)
• Rank-order the eigenvalues ($\lambda_i$'s) such that

  $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$

• Based on Definition 1, consider the l ($l \ll d$) eigenvectors corresponding to the l significant eigenvalues
  – Definition 1: Let $\lambda_1, \lambda_2, \ldots, \lambda_d$ be the eigenvalues of a d × d matrix A. $\lambda_1$ is called the dominant (significant) eigenvalue of A if $|\lambda_1| \geq |\lambda_i|$, i = 1, 2, …, d
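A short NumPy sketch of this rank-ordering and selection step, again on synthetic stand-in data (the names X, C, Q_l are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 5))
X = X - X.mean(axis=0)                       # mean-subtracted data matrix
C = X.T @ X                                  # correlation matrix

eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order

# Rank-order eigenvalues in descending order and keep the l leading eigenvectors
order = np.argsort(eigvals)[::-1]
l = 2
lead_vals = eigvals[order[:l]]               # l significant eigenvalues
Q_l = eigvecs[:, order[:l]]                  # corresponding eigenvectors, shape (d, l)
print(lead_vals)
```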

PCA for Dimension Reduction


• Project $\mathbf{x}_n$ onto each of the l directions (eigenvectors) to get the reduced-dimensional representation:

  $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$

• Thus, each training example $\mathbf{x}_n$ is transformed to a new reduced-dimensional representation $\mathbf{a}_n$ by projecting onto the l orthonormal basis vectors (eigenvectors):

  $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• The eigenvalue $\lambda_i$ corresponds to the variance of the projected data along $\mathbf{q}_i$


PCA for Dimension Reduction


• Since the strongest l directions are considered for obtaining the reduced-dimensional representation, it should be possible to reconstruct a good approximation of the original data
• An approximation of the original data $\mathbf{x}_n$ is obtained as a linear combination of the directions of projection (the strongest eigenvectors) $\mathbf{q}_i$ and the principal components $a_{ni}$:

  $\hat{\mathbf{x}}_n = \sum_{i=1}^{l} a_{ni}\, \mathbf{q}_i$

PCA: Basic Procedure


• Given: data with N samples, $D = \{\mathbf{x}_n\}_{n=1}^{N},\ \mathbf{x}_n \in \mathbb{R}^d$
1. Remove the mean of each attribute (dimension) from the data samples (tuples)
2. Then construct a data matrix X using the mean-subtracted samples, $X \in \mathbb{R}^{N \times d}$
   – Each row of the matrix X corresponds to one sample (tuple)
3. Compute the correlation matrix $C = X^T X$
4. Perform the eigen analysis of the correlation matrix C:

   $C\mathbf{q}_i = \lambda_i \mathbf{q}_i, \quad i = 1, 2, \ldots, d$

   – As the correlation matrix is symmetric and positive semidefinite:
     • The eigenvalues $\lambda_i$ are real and non-negative
     • The eigenvectors $\mathbf{q}_i$ can be chosen to be orthonormal
     • The eigenvalues indicate the variance (strength) of the data along the corresponding eigenvectors


PCA for Dimension Reduction


• In general, we are interested in representing the data using fewer dimensions such that the data has high variance along these dimensions
5. Rank-order the eigenvalues ($\lambda_i$'s) in sorted order such that

   $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$

6. Consider the l ($l \ll d$) eigenvectors corresponding to the l significant eigenvalues
7. Project $\mathbf{x}_n$ onto each of the l directions (eigenvectors) to get the reduced-dimensional representation:

   $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$

PCA for Dimension Reduction


8. Thus, each training example $\mathbf{x}_n$ is transformed to a new reduced-dimensional representation $\mathbf{a}_n$ by projecting onto the l orthonormal basis vectors:

   $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• The components of the new reduced representation $\mathbf{a}_n$ are uncorrelated
• The eigenvalue $\lambda_i$ corresponds to the variance of the projected data along $\mathbf{q}_i$
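Putting steps 1-8 together, here is a compact NumPy sketch of the whole procedure as one function; the function name pca_reduce and the synthetic data are illustrative choices, not part of the original slides.

```python
import numpy as np

def pca_reduce(data, l):
    """Reduce an (N, d) data matrix to (N, l) following the procedure above."""
    mean = data.mean(axis=0)
    X = data - mean                          # steps 1-2: mean-subtracted data matrix
    C = X.T @ X                              # step 3: correlation matrix

    eigvals, eigvecs = np.linalg.eigh(C)     # step 4: eigen analysis (ascending order)
    order = np.argsort(eigvals)[::-1]        # step 5: sort eigenvalues, descending
    Q_l = eigvecs[:, order[:l]]              # step 6: l leading eigenvectors, (d, l)

    A = X @ Q_l                              # steps 7-8: a_n = Q_l^T x_n for every row
    return A, Q_l, mean

# Example usage on synthetic data
rng = np.random.default_rng(3)
data = rng.standard_normal((20, 5))
A, Q_l, mean = pca_reduce(data, l=2)
print(A.shape)                               # (20, 2)

# Approximate reconstruction from the reduced representation
data_hat = A @ Q_l.T + mean
print(np.linalg.norm(data - data_hat))       # reconstruction error
```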


Illustration: PCA
• Atmospheric data:
  – N = number of tuples (data vectors) = 20
  – d = number of attributes (dimensions) = 5
• Mean of each dimension:
  23.42  93.63  1003.55  448.88  14.4

Illustration: PCA
• Step 1: Subtract the mean from each attribute
[Table: mean-subtracted data matrix]


Illustration: PCA
• Step 2: Compute the correlation matrix from the data matrix
[Table: 5 × 5 correlation matrix]

Illustration: PCA
• Step 4: Perform eigen analysis on the correlation matrix
  – Get the eigenvalues and eigenvectors
• Step 5: Sort the eigenvalues in descending order
• Step 6: Arrange the eigenvectors in the descending order of their corresponding eigenvalues
[Figure: eigenvalues and eigenvectors of the correlation matrix]


Illustration: PCA
• Step 7: Consider the two leading (significant) eigenvalues and their corresponding eigenvectors
• Step 8: Project the mean-subtracted data matrix onto the two eigenvectors corresponding to the leading eigenvalues
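As a sanity check on this kind of worked example, scikit-learn's PCA (if available) should produce an equivalent 2-dimensional representation, up to the sign of each component; the snippet below is a sketch on synthetic stand-in data, since the actual 20 × 5 atmospheric table from the slides is not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
data = rng.standard_normal((20, 5))          # stand-in for the 20 x 5 atmospheric table

# Manual route: mean-subtract, eigen-decompose X^T X, keep the 2 leading eigenvectors
X = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
A_manual = X @ eigvecs[:, order[:2]]

# Library route: sklearn's PCA (uses SVD internally, same principal directions)
A_sklearn = PCA(n_components=2).fit_transform(data)

# The two results should agree column by column, up to sign
print(np.allclose(np.abs(A_manual), np.abs(A_sklearn)))
```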

Eigenvalues and Eigenvectors


• What happens when a vector is multiplied by a matrix?

  $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad \mathbf{q} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad A\mathbf{q} = \begin{bmatrix} 7 \\ 5 \end{bmatrix}$

• The vector gets transformed into a new vector
  – Its direction changes
• The vector may also get scaled (elongated or shortened) in the process

[Figure: the vector q = [1, 3]^T and the transformed vector Aq = [7, 5]^T in the q1–q2 plane]


Eigenvalues and Eigenvectors


• For a given square symmetric matrix A, there exist special vectors which do not change direction when multiplied by A

  $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad \mathbf{q} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A\mathbf{q} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3\,\mathbf{q}$

• These vectors are called eigenvectors
• More formally, $A\mathbf{q} = \lambda\mathbf{q}$
  – $\lambda$ is the eigenvalue
  – The eigenvalue indicates the scaling magnitude along the eigenvector
• The vector only gets scaled but does not change its direction

[Figure: the eigenvector q = [1, 1]^T is mapped by A to 3q, i.e. scaled without changing direction]

• So what is so special about eigenvalues and eigenvectors?
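A quick NumPy check of this 2 × 2 example (purely illustrative) confirms that [1, 1]^T is an eigenvector of A with eigenvalue 3, and that the other eigenvalue is -1:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

eigvals, eigvecs = np.linalg.eigh(A)         # A is symmetric, so eigh applies
print(eigvals)                               # [-1.  3.]

q = np.array([1.0, 1.0])
print(A @ q)                                 # [3. 3.] == 3 * q, so q is an eigenvector with λ = 3
```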

Linear Algebra: Basic Definitions


• Basis: a set of vectors in $\mathbb{R}^d$ is called a basis if
  – the vectors are linearly independent, and
  – every vector in $\mathbb{R}^d$ can be expressed as a linear combination of these basis vectors
• Linearly independent vectors:
  – A set of d vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_d$ is linearly independent if no vector in the set can be expressed as a linear combination of the remaining d - 1 vectors
  – In other words, the only solution to

    $c_1\mathbf{q}_1 + c_2\mathbf{q}_2 + \cdots + c_d\mathbf{q}_d = \mathbf{0}$ is $c_1 = c_2 = \cdots = c_d = 0$

    • Here the $c_i$ are scalars


Linear Algebra: Basic Definitions


• For example, consider the space $\mathbb{R}^2$
• Consider the vectors

  $\mathbf{q}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{q}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

• Any vector $[z_1\ z_2]^T$ can be expressed as a linear combination of these two vectors:

  $\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = z_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + z_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

• Further, $\mathbf{q}_1$ and $\mathbf{q}_2$ are linearly independent
  – The only solution to $c_1\mathbf{q}_1 + c_2\mathbf{q}_2 = \mathbf{0}$ is $c_1 = c_2 = 0$

[Figure: the standard basis vectors q1 = [1, 0]^T and q2 = [0, 1]^T and a vector z = [z1, z2]^T in the plane]

Linear Algebra: Basic Definitions


• It turns out that $\mathbf{q}_1$ and $\mathbf{q}_2$ are unit vectors in the directions of the coordinate axes
• And indeed we are used to representing all vectors in $\mathbb{R}^2$ as a linear combination of these two vectors

[Figure: the same plot as before, with q1 and q2 along the coordinate axes]


Linear Algebra: Basic Definitions


• We could have chosen any 2 linearly independent vectors in $\mathbb{R}^2$ as the basis vectors
• For example, consider the linearly independent vectors $\mathbf{q}_1 = [4\ 2]^T$ and $\mathbf{q}_2 = [5\ 7]^T$
• Any vector $\mathbf{z} = [z_1\ z_2]^T$ can be expressed as a linear combination of these two vectors:

  $\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \lambda_1 \begin{bmatrix} 4 \\ 2 \end{bmatrix} + \lambda_2 \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \qquad \mathbf{z} = \lambda_1 \mathbf{q}_1 + \lambda_2 \mathbf{q}_2$

  $z_1 = 4\lambda_1 + 5\lambda_2, \qquad z_2 = 2\lambda_1 + 7\lambda_2$

• We can find $\lambda_1$ and $\lambda_2$ by solving this system of linear equations

[Figure: the basis vectors q1 = [4, 2]^T and q2 = [5, 7]^T and a vector z in the plane]
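A small NumPy sketch of this change of basis (with an arbitrary example vector z, chosen only for illustration) solves the 2 × 2 linear system for $\lambda_1$ and $\lambda_2$:

```python
import numpy as np

# Basis vectors as the columns of Q
Q = np.array([[4.0, 5.0],
              [2.0, 7.0]])
z = np.array([3.0, 5.0])                     # arbitrary example vector

lam = np.linalg.solve(Q, z)                  # solves Q @ lam = z for [λ1, λ2]
print(lam)
print(np.allclose(Q @ lam, z))               # True: z = λ1*q1 + λ2*q2
```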

Linear Algebra: Basic Definitions


• In general, given a set of linearly independent vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_d \in \mathbb{R}^d$,
  – we can express any vector $\mathbf{z} \in \mathbb{R}^d$ as a linear combination of these vectors:

  $\mathbf{z} = \lambda_1 \mathbf{q}_1 + \lambda_2 \mathbf{q}_2 + \cdots + \lambda_d \mathbf{q}_d$

  $\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_d \end{bmatrix} = \lambda_1 \begin{bmatrix} q_{11} \\ q_{12} \\ \vdots \\ q_{1d} \end{bmatrix} + \lambda_2 \begin{bmatrix} q_{21} \\ q_{22} \\ \vdots \\ q_{2d} \end{bmatrix} + \cdots + \lambda_d \begin{bmatrix} q_{d1} \\ q_{d2} \\ \vdots \\ q_{dd} \end{bmatrix} = \begin{bmatrix} q_{11} & q_{21} & \cdots & q_{d1} \\ q_{12} & q_{22} & \cdots & q_{d2} \\ \vdots & \vdots & & \vdots \\ q_{1d} & q_{2d} & \cdots & q_{dd} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_d \end{bmatrix}$

  $\mathbf{z} = Q\,\boldsymbol{\lambda}$


Linear Algebra: Basic Definitions


• Let us see what happens if we have an orthonormal basis:

  $\mathbf{q}_i^T \mathbf{q}_i = 1 \quad \text{and} \quad \mathbf{q}_i^T \mathbf{q}_j = 0,\ i \neq j$

• We can express any vector $\mathbf{z} \in \mathbb{R}^d$ as a linear combination of these vectors:

  $\mathbf{z} = \lambda_1 \mathbf{q}_1 + \lambda_2 \mathbf{q}_2 + \cdots + \lambda_d \mathbf{q}_d$

  – Multiply both sides by $\mathbf{q}_1^T$:

  $\mathbf{q}_1^T \mathbf{z} = \lambda_1 \mathbf{q}_1^T \mathbf{q}_1 + \lambda_2 \mathbf{q}_1^T \mathbf{q}_2 + \cdots + \lambda_d \mathbf{q}_1^T \mathbf{q}_d = \lambda_1$

• Similarly, $\lambda_2 = \mathbf{q}_2^T \mathbf{z}$, ..., $\lambda_d = \mathbf{q}_d^T \mathbf{z}$
• An orthogonal basis is the most convenient basis that one can hope for

[Figure: orthonormal basis vectors q1 and q2 and a vector z in the x1–x2 plane]

Eigenvalues and Eigenvectors


• What does any of this have to do with eigenvectors?
• Eigenvectors can form a basis
• Theorem 1: The eigenvectors of a matrix $A \in \mathbb{R}^{d \times d}$ having distinct eigenvalues are linearly independent
• Theorem 2: The eigenvectors of a square symmetric matrix are orthogonal
• Definition 1: Let $\lambda_1, \lambda_2, \ldots, \lambda_d$ be the eigenvalues of a d × d matrix A. $\lambda_1$ is called the dominant (significant) eigenvalue of A if $|\lambda_1| \geq |\lambda_i|$, i = 1, 2, …, d
• We will put all of this to use for principal component
analysis



Principal Component Analysis (PCA)


• Each point (vector) here is represented using a linear combination of the x1 and x2 axes
• In other words, we are using p1 and p2 as the basis

[Figure: a cloud of 2-D points shown with the coordinate axes p1 and p2 as the basis]

Principal Component Analysis (PCA)


• Let us consider orthonormal vectors q1 and q2 as a basis instead of p1 and p2
• We observe that all the points have a very small component in the direction of q2 (almost noise)

[Figure: the same point cloud with the rotated orthonormal basis q1 (along the data spread) and q2 (nearly perpendicular to it)]


Principal Component Analysis (PCA)


• Let us consider orthonormal vectors q1 and q2 as a basis instead of p1 and p2
• We observe that all the points have a very small component in the direction of q2 (almost noise)
• Now the same data can be represented in 1 dimension, in the direction of q1, by making a smarter choice for the basis
• Why do we not care about q2?
  – The variance of the data in this direction is very small
  – All data points have almost the same value in the q2 direction

[Figure: the same point cloud represented along q1, discarding the small q2 components]

Principal Component Analysis (PCA)

• If we were to build a classifier on top of this data, then q2 would not contribute to the classifier
  – The points are not distinguishable along this direction
• In general, we are interested in representing the data using fewer dimensions such that
  – the data has high variance along these dimensions
  – the dimensions are linearly independent (uncorrelated)
• PCA preserves the geometrical locality of the transformed data with respect to the original data

[Figure: the same point cloud; along q2 the points are not separable]


PCA: Basic Procedure


• Given: data with N samples, $D = \{\mathbf{x}_n\}_{n=1}^{N},\ \mathbf{x}_n \in \mathbb{R}^d$
1. Remove the mean of each attribute (dimension) from the data samples (tuples)
2. Then construct a data matrix X using the mean-subtracted samples, $X \in \mathbb{R}^{N \times d}$
   – Each row of the matrix X corresponds to one sample (tuple)
3. Compute the correlation matrix $C = X^T X$
4. Perform the eigen analysis of the correlation matrix C:

   $C\mathbf{q}_i = \lambda_i \mathbf{q}_i, \quad i = 1, 2, \ldots, d$

   – As the correlation matrix is symmetric and positive semidefinite:
     • The eigenvalues $\lambda_i$ are real and non-negative
     • The eigenvectors $\mathbf{q}_i$ can be chosen to be orthonormal
     • The eigenvalues indicate the variance (strength) of the data along the corresponding eigenvectors

PCA for Dimension Reduction


• In general, we are interested in representing the data using fewer dimensions such that the data has high variance along these dimensions
5. Rank-order the eigenvalues ($\lambda_i$'s) in sorted order such that

   $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$

6. Consider the l ($l \ll d$) eigenvectors corresponding to the l significant eigenvalues
7. Project $\mathbf{x}_n$ onto each of the l directions (eigenvectors) to get the reduced-dimensional representation:

   $a_{ni} = \mathbf{q}_i^T \mathbf{x}_n, \quad i = 1, 2, \ldots, l$


PCA for Dimension Reduction


8. Thus, each training example $\mathbf{x}_n$ is transformed to a new reduced-dimensional representation $\mathbf{a}_n$ by projecting onto the l orthonormal basis vectors:

   $\mathbf{x}_n = [x_{n1}\ x_{n2}\ \cdots\ x_{nd}]^T \;\longmapsto\; \mathbf{a}_n = [a_{n1}\ a_{n2}\ \cdots\ a_{nl}]^T$

• The components of the new reduced representation $\mathbf{a}_n$ are uncorrelated
• The eigenvalue $\lambda_i$ corresponds to the variance of the projected data along $\mathbf{q}_i$

Illustration: PCA
• Handwritten digit images [1]:
  – Size of each image: 28 × 28
  – Dimension after linearizing: 784
  – Total number of training examples: 5000 (500 per class)

[1] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Intelligent Signal Processing, pp. 306-351, IEEE Press, 2001


Illustration: PCA
• Handwritten digit images:
  – [Plot: all 784 eigenvalues]

Illustration: PCA
• Handwritten digit images:
  – [Plot: the leading 100 eigenvalues]


Illustration: PCA-Reconstructed Images


[Figure: original digit images and their PCA reconstructions using l = 1, l = 20, and l = 100 principal components]
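For readers who want to reproduce this kind of figure, here is a hedged sketch that fetches MNIST via scikit-learn and reconstructs one digit at several values of l. The exact 5000-image subset used in the slides is not specified, so the sketch simply takes the first 5000 images; it assumes network access, scikit-learn, and matplotlib are available.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml

# Fetch MNIST (28 x 28 images flattened to 784); the slides use a 5000-image subset.
X = fetch_openml("mnist_784", version=1, as_frame=False).data[:5000].astype(float)

mean = X.mean(axis=0)
Xc = X - mean
# Eigenvectors of X^T X (784 x 784) via eigh, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
Q = eigvecs[:, np.argsort(eigvals)[::-1]]

img = Xc[0]                                   # first image, mean-subtracted
fig, axes = plt.subplots(1, 4, figsize=(8, 2))
axes[0].imshow((img + mean).reshape(28, 28), cmap="gray")
axes[0].set_title("Original")
for ax, l in zip(axes[1:], [1, 20, 100]):
    recon = Q[:, :l] @ (Q[:, :l].T @ img) + mean   # reconstruction with l components
    ax.imshow(recon.reshape(28, 28), cmap="gray")
    ax.set_title(f"l={l}")
plt.show()
```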

