
Deep Learning

Unit 2
ELECTIVE-VIII (BTCOE705 (B))
Feed-Forward Neural Networks
RMSPROP Optimizer for Gradient Descent
• RMSPROP (Root Mean Square Propagation)
• An extension of the gradient descent optimization algorithm
• Maintains a decaying average of squared gradients, so the per-parameter step size adapts and does not shrink to zero
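As a rough illustration, here is a minimal NumPy sketch of a single RMSProp update step (the function name, default learning rate, decay factor, and epsilon below are illustrative assumptions, not values given in the slides):

import numpy as np

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    # Keep an exponentially decaying average of squared gradients
    cache = decay * cache + (1 - decay) * grad ** 2
    # Scale each parameter's step by the root mean square of its recent gradients
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Usage: the cache starts at zero and is carried between steps
w = np.array([0.5, -0.3])
cache = np.zeros_like(w)
grad = np.array([0.1, -0.2])          # gradient from the current batch (illustrative)
w, cache = rmsprop_step(w, grad, cache)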
Classification based on Backpropagation
• Neural Networks

• Disadvantages of Neural Networks
• Long training times
• Low interpretability for humans

• Advantages of Neural Networks
• Highly tolerant of noisy data (unknown data, data with unmatched patterns, impure data, etc.)
• Can generalize to unseen patterns (patterns not present in the training data can still be classified)

• Backpropagation is an iterative training algorithm

[Diagram: I/P Layer → Hidden Layer → O/P Layer]

• Here, if the actual output is not the desired one, the error is propagated back to the hidden layer, the weights are updated, and the output is recomputed; this repeats until the desired output is produced.
Backpropagation algorithm contains
• D: Dataset
• T: Target attributes
• w: weights
• b: bias
Backpropagation algorithm works in two stages
• Two-layer network: I/p → Hidden → O/p
• Feed-forward network: I/p → Hidden → Hidden → O/p
• There can be cross connections between layers
• Fully Connected Network
Backpropagation Algorithm
• First is the input layer
• There should be some output from the input layer
• Let I represent the input layer and Ii represent an input unit of this layer
• Let Oi represent the output of each unit of the input layer
• Oi = Ii (input-layer units simply pass their inputs through)
Steps in the Backpropagation algorithm
A. Initialize all the weights and biases
B. Repeat until the terminating condition is reached:
1. First compute the outputs of all the units of the input layer.
2. Then compute the inputs of the second layer, i.e. the hidden layer, Ij:
   Ij = (∑ wij * Oi) + bj
3. Now compute the output of the hidden layer, Oj:
   Oj = 1 / (1 + e^(-Ij))
4. In this way we can compute outputs for as many hidden layers as we want.
5. In the same way, compute the input of the output layer k, Ik:
   Ik = (∑ wjk * Oj) + bk
6. Now compute the output of the output layer, Ok:
   Ok = 1 / (1 + e^(-Ik))
Now the first cycle of output is computed. We then go for the weight update based on the error, as follows.
C. Calculate the error for the kth layer, i.e. the output layer:
   Ek = Ok (1 - Ok)(Tk - Ok)
D. Similarly, calculate the error factor for the jth layer, i.e. the hidden layer:
   Ej = Oj (1 - Oj) ∑ wjk * Ek
E. Then we go for the weight update.
F. At last we go for the bias update. (A sketch of one full cycle is given below.)
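Putting steps 1-6 and C-F together, the following is a minimal NumPy sketch of one training cycle for a single hidden layer with sigmoid units. The learning rate l and the update rules (w += l * E * O, b += l * E) are the standard ones used with this style of backpropagation; they are assumptions here, since the update formulas themselves are not shown in these slides.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_cycle(x, t, w_ij, b_j, w_jk, b_k, l=0.1):
    # Forward pass (steps 1-6)
    o_i = x                                   # O_i = I_i for the input layer
    o_j = sigmoid(w_ij.T @ o_i + b_j)         # I_j = sum(w_ij * O_i) + b_j,  O_j = 1/(1+e^-I_j)
    o_k = sigmoid(w_jk.T @ o_j + b_k)         # I_k = sum(w_jk * O_j) + b_k,  O_k = 1/(1+e^-I_k)

    # Error terms (steps C and D)
    e_k = o_k * (1 - o_k) * (t - o_k)         # E_k = O_k (1 - O_k)(T_k - O_k)
    e_j = o_j * (1 - o_j) * (w_jk @ e_k)      # E_j = O_j (1 - O_j) * sum(w_jk * E_k)

    # Weight and bias updates (steps E and F), assumed standard rules
    w_jk += l * np.outer(o_j, e_k)
    w_ij += l * np.outer(o_i, e_j)
    b_k  += l * e_k
    b_j  += l * e_j
    return o_k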
Principal Components Analysis (PCA)
• An exploratory technique used to reduce the dimensionality of the data set to 2D or 3D
• Can be used to:
• Reduce number of dimensions in data
• Find patterns in high-dimensional data
• Visualize data of high dimensionality
• Example applications:
• Face recognition
• Image compression
• Gene expression analysis

Principal Components Analysis: Ideas (PCA)

• Does the data set ‘span’ the whole of the d-dimensional space?
• For a matrix of m samples x n genes, create a new covariance matrix of size n x n.
• Transform some large number of variables into a smaller number of uncorrelated variables called principal components (PCs).
• The PCs are constructed to capture as much of the variation in the data as possible.

Principal Component Analysis
• See online tutorials such as
  http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

[Scatter plot of data in the X1-X2 plane with the eigenvector directions Y1 and Y2 drawn through it.
Note: Y1 is the first eigenvector, Y2 is the second; Y2 is ignorable.
Key observation: the variance along Y1 is the largest.]
Principal Component Analysis: one attribute first
• Example attribute: Temperature = {42, 40, 24, 30, 15, 18, 15, 30, 15, 30, 35, 30, 40, 30}
• Question: how much spread is in the data along the axis (distance to the mean)?
• Variance = (standard deviation)^2

  s^2 = ∑_{i=1}^{n} (Xi - X̄)^2 / (n - 1)
Now consider two dimensions
Covariance measures the correlation between X and Y:
• cov(X,Y) = 0: X and Y are uncorrelated (vary independently)
• cov(X,Y) > 0: X and Y move in the same direction
• cov(X,Y) < 0: X and Y move in opposite directions

  cov(X, Y) = ∑_{i=1}^{n} (Xi - X̄)(Yi - Ȳ) / (n - 1)

  X=Temperature   Y=Humidity
       40             90
       40             90
       40             90
       30             90
       15             70
       15             70
       15             70
       30             90
       15             70
       30             70
       30             70
       30             90
       40             70
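As a quick numerical check (a small sketch, not part of the slides), the covariance of the two columns above can be computed directly from the formula and compared with NumPy's np.cov, which uses the same (n - 1) denominator:

import numpy as np

X = np.array([40, 40, 40, 30, 15, 15, 15, 30, 15, 30, 30, 30, 40])   # Temperature
Y = np.array([90, 90, 90, 90, 70, 70, 70, 90, 70, 70, 70, 90, 70])   # Humidity

cov_xy = np.sum((X - X.mean()) * (Y - Y.mean())) / (len(X) - 1)       # formula from the slide
print(cov_xy, np.cov(X, Y)[0, 1])                                     # both print the same value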
More than two attributes: covariance matrix
• Contains covariance values between all possible pairs of dimensions (= attributes):

  C_{n x n} = ( cij | cij = cov(Dimi, Dimj) )

• Example for three attributes (x, y, z):

      [ cov(x,x)  cov(x,y)  cov(x,z) ]
  C = [ cov(y,x)  cov(y,y)  cov(y,z) ]
      [ cov(z,x)  cov(z,y)  cov(z,z) ]
Eigenvalues & eigenvectors
• Vectors x having the same direction as Ax are called eigenvectors of A (A is an n by n matrix).
• In the equation Ax = λx, λ is called an eigenvalue of A.

  [ 2  3 ] [ 3 ]   [ 12 ]       [ 3 ]
  [ 2  1 ] [ 2 ] = [  8 ]  = 4  [ 2 ]
Eigenvalues & eigenvectors

• Ax = λx  ⇔  (A - λI)x = 0
• How to calculate x and λ:
• Calculate det(A - λI); this yields a polynomial of degree n
• Determine the roots of det(A - λI) = 0; the roots are the eigenvalues λ
• Solve (A - λI)x = 0 for each λ to obtain the eigenvectors x
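The example from the previous slide can be checked with NumPy (a small sketch; note that np.linalg.eig returns unit-length eigenvectors and does not guarantee their order):

import numpy as np

A = np.array([[2.0, 3.0], [2.0, 1.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                   # contains 4 and -1 (the roots of det(A - λI) = 0)
i = int(np.argmax(eigenvalues))      # pick the eigenvalue 4
print(eigenvectors[:, i])            # proportional to (3, 2), scaled to unit length
print(A @ np.array([3.0, 2.0]))      # [12. 8.] = 4 * [3, 2], as in the slide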
Principal components
• First principal component (PC1)
• The eigenvalue with the largest absolute value indicates that the data have the largest variance along its eigenvector, the direction of greatest variation
• Second principal component (PC2)
• The direction with the maximum variation left in the data, orthogonal to PC1
• In general, only a few directions capture most of the variability in the data.
Steps of PCA
• Let X̄ be the mean vector (taking the mean of all rows)
• Adjust the original data by the mean: X' = X - X̄
• Compute the covariance matrix C of the adjusted X'
• Find the eigenvectors and eigenvalues of C:
• For matrix C, the eigenvectors are the vectors e (column vectors) having the same direction as Ce, i.e. Ce = λe, where λ is called an eigenvalue of C
• Ce = λe  ⇔  (C - λI)e = 0
• Most data mining packages do this for you (see the sketch below).
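A minimal NumPy sketch of these steps (the function and variable names are illustrative; np.linalg.eigh is used because a covariance matrix is symmetric):

import numpy as np

def pca(X, p):
    # X: (samples x attributes) data matrix, p: number of components to keep
    x_mean = X.mean(axis=0)                  # mean vector (mean of all rows)
    X_adj = X - x_mean                       # adjust the original data by the mean
    C = np.cov(X_adj, rowvar=False)          # covariance matrix of the adjusted data
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues and eigenvectors of C
    order = np.argsort(eigvals)[::-1]        # sort by eigenvalue, largest first
    E_p = eigvecs[:, order[:p]]              # keep the top p eigenvectors (as columns)
    return X_adj @ E_p                       # project the adjusted data onto them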
Eigenvalues
• Calculate the eigenvalues λ and eigenvectors x of the covariance matrix.
• The eigenvalues λj are used to calculate the percentage of total variance (Vj) captured by each component j:

  Vj = 100 * λj / ∑_{x=1}^{n} λx
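A one-line check of this formula, using the two eigenvalues that appear in the worked example later in these slides (51.8 and 560.2):

import numpy as np

eigenvalues = np.array([560.2, 51.8])
V = 100 * eigenvalues / eigenvalues.sum()
print(V)                                  # roughly [91.5  8.5] percent of total variance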
Principal components - Variance

[Bar chart: percentage of total variance (%) captured by each principal component, PC1 through PC10]
Transformed Data
• Each eigenvalue λj corresponds to the variance along component j
• Thus, sort the eigenvectors by λj
• Take the first p eigenvectors ei, where p is the number of top eigenvalues
• These are the directions with the largest variances

  [ yi1 ]   [ e1  ]   [ xi1 - x̄1 ]
  [ yi2 ] = [ e2  ] * [ xi2 - x̄2 ]
  [ ... ]   [ ... ]   [   ...    ]
  [ yip ]   [ ep  ]   [ xin - x̄n ]
An Example
Mean1 = 24.1, Mean2 = 53.8 (X1' = X1 - Mean1, X2' = X2 - Mean2)

  X1   X2    X1'     X2'
  19   63   -5.1    9.25
  39   74   14.9   20.25
  30   87    5.9   33.25
  30   23    5.9  -30.75
  15   35   -9.1  -18.75
  15   43   -9.1  -10.75
  15   32   -9.1  -21.75

[Scatter plots of the original data (X1 vs X2) and of the mean-adjusted data (X1' vs X2')]
Covariance Matrix
      [  75  106 ]
• C = [ 106  482 ]

• Using MATLAB, we find out:
• Eigenvectors:
• e1 = (-0.98, -0.21), λ1 = 51.8
• e2 = (0.21, -0.98), λ2 = 560.2
• Thus the second eigenvector is more important!
If we only keep one dimension: e2
• We keep the dimension of e2 = (0.21, -0.98)
• We can obtain the final data as:

  yi = (0.21  -0.98) [ xi1 ]  = 0.21 * xi1 - 0.98 * xi2
                     [ xi2 ]

• The resulting one-dimensional values yi: -10.14, -16.72, -31.35, 31.374, 16.464, 8.624, 19.404, -17.63

[Plot: the projected values yi placed along the e2 axis]
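These values can be reproduced from the mean-adjusted columns (X1', X2') of the example table, using only the rows visible above (a small sketch, not part of the slides):

import numpy as np

X_adj = np.array([[-5.1,   9.25],
                  [14.9,  20.25],
                  [ 5.9,  33.25],
                  [ 5.9, -30.75],
                  [-9.1, -18.75],
                  [-9.1, -10.75],
                  [-9.1, -21.75]])
e2 = np.array([0.21, -0.98])
y = X_adj @ e2                             # yi = 0.21*xi1' - 0.98*xi2'
print(y.round(3))                          # [-10.136 -16.716 -31.346  31.374  16.464   8.624  19.404]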
Applications – Gene expression analysis
• Reference: Raychaudhuri et al. (2000)
• Purpose: Determine core set of conditions for useful
gene comparison
• Dimensions: conditions, observations: genes
• Yeast sporulation dataset (7 conditions, 6118 genes)
• Result: Two components capture most of variability (90%)
• Issues: uneven data intervals, data dependencies
• PCA is commonly applied prior to clustering
• Crisp clustering is questioned: genes may correlate with multiple clusters
• Alternative: determination of each gene's closest neighbours

Singular Value Decomposition
Underconstrained Least Squares

• What if you have fewer data points than parameters in your function?
– Intuitively, can't do standard least squares
– Recall that the solution takes the form A^T A x = A^T b
– When A has more columns than rows, A^T A is singular: can't take its inverse, etc.
Underconstrained Least Squares

• More subtle version: more data points than unknowns, but the data poorly constrain the function
• Example: fitting to y = ax^2 + bx + c
Underconstrained Least Squares

• Problem: if the problem is very close to singular, roundoff error can have a huge effect
– Even on "well-determined" values!
• Can detect this:
– Uncertainty proportional to covariance C = (A^T A)^-1
– In other words, unstable if A^T A has small values
– More precisely, care if x^T (A^T A) x is small for some x
• Idea: if part of the solution is unstable, set that part of the answer to 0
– Avoid corrupting good parts of the answer
Singular Value Decomposition (SVD)

• Handy mathematical technique that has application to many problems
• Given any m x n matrix A, an algorithm to find matrices U, V, and W such that
  A = U W V^T
  U is m x n and orthonormal
  W is n x n and diagonal
  V is n x n and orthonormal
SVD

  A = U W V^T,   where W = diag(w1, ..., wn)

• Treat as a black box: code is widely available
  In MATLAB: [U,W,V] = svd(A,0)
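The NumPy equivalent of the MATLAB call above is np.linalg.svd with full_matrices=False (a minimal sketch; note that NumPy returns the singular values as a vector and returns V already transposed):

import numpy as np

A = np.random.rand(6, 3)                              # any m x n matrix (illustrative)
U, w, Vt = np.linalg.svd(A, full_matrices=False)      # "thin" SVD, like svd(A, 0)
W = np.diag(w)                                        # rebuild the diagonal matrix W
print(np.allclose(A, U @ W @ Vt))                     # True: A = U W V^T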
SVD

• The wi are called the singular values of A


• If A is singular, some of the wi will be 0
• In general rank(A) = number of nonzero wi
• SVD is mostly unique (up to permutation of singular values, or if
some wi are equal)
SVD and Inverses

• Why is SVD so useful?


• Application #1: inverses
• A^-1 = (V^T)^-1 W^-1 U^-1 = V W^-1 U^T
– Using the fact that inverse = transpose for orthogonal matrices
– Since W is diagonal, W^-1 is also diagonal, with the reciprocals of the entries of W
SVD and Inverses

• A^-1 = (V^T)^-1 W^-1 U^-1 = V W^-1 U^T

• This fails when some wi are 0
– It's supposed to fail – singular matrix
• Pseudoinverse: if wi = 0, set 1/wi to 0 (!)
– "Closest" matrix to inverse
– Defined for all matrices (even non-square, singular, etc.)
– Equal to (A^T A)^-1 A^T if A^T A is invertible
SVD and Least Squares

• Solving Ax = b by least squares
• x = pseudoinverse(A) times b
• Compute the pseudoinverse using SVD
– Lets you see if the data are singular
– Even if not singular, the ratio of the max to min singular values (the condition number) tells you how stable the solution will be
– Set 1/wi to 0 if wi is small (even if not exactly 0)
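A minimal sketch of this recipe (the tolerance tol is an illustrative assumption for deciding when wi counts as "small"):

import numpy as np

def svd_least_squares(A, b, tol=1e-10):
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    cond = w.max() / w.min() if w.min() > 0 else np.inf   # condition number = max/min singular value
    w_inv = np.where(w > tol * w.max(), 1.0 / w, 0.0)     # set 1/wi to 0 if wi is small
    x = Vt.T @ (w_inv * (U.T @ b))                        # x = V W^-1 U^T b (pseudoinverse times b)
    return x, cond

# Usage on random data (illustrative)
A = np.random.rand(10, 3)
b = np.random.rand(10)
x, cond = svd_least_squares(A, b)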
SVD and Eigenvectors

• Let A = U W V^T, and let xi be the ith column of V
• Consider A^T A xi:

  A^T A xi = V W U^T U W V^T xi = V W^2 V^T xi = V W^2 ei = wi^2 xi

  (here ei is the ith standard basis column [0 ... 1 ... 0]^T, since V^T xi = ei)

• So the elements of W are the square roots of the eigenvalues of A^T A, and the columns of V are its eigenvectors
– What we wanted for robust least-squares fitting!
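A quick numerical check of this relationship (a small sketch on a random matrix):

import numpy as np

A = np.random.rand(8, 4)
U, w, Vt = np.linalg.svd(A, full_matrices=False)
eigvals = np.linalg.eigvalsh(A.T @ A)                 # eigenvalues of A^T A (ascending order)
print(np.allclose(np.sort(w ** 2), eigvals))          # True: squared singular values = eigenvalues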
Summary of Singular Value Decomposition
