Lec13: Neural Networks and Deep Learning
COURSE: CS60045
Pallab Dasgupta
Professor, Dept. of Computer Sc & Engg
INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR
Brain-inspired computing
• Simple units
• The power is in the network
Preliminaries
Neural Networks
A neural network consists of a set of nodes (neurons/units) connected by links.
[Figure: a network of units, each with a fixed bias input $a_0 = -1$.]
Perceptron
$a = g(in)$
[Figure: a single unit with bias input $x_0 = -1$ weighted by $W_0$ and input $x_1$ weighted by $W_1$, feeding the weighted sum $in$ through the activation function $g$ to produce the output $a$.]
Perceptron
[Figure: inputs $x_0 = -1$, $x_1$, $x_2$ with weights $W_0$, $W_1$, $W_2$ feed the input function $in$, which passes through the activation function $g$ to give the output $a = g(in)$.]

Linear input function:
$in = x_1 W_1 + x_2 W_2 - W_0$

Activation function:
$a = \begin{cases} 0 & \text{if } in \le 0 \\ 1 & \text{if } in > 0 \end{cases}$

AND: $W_1 = 1$, $W_2 = 1$, $W_0 = 1$, so $in = x_1 + x_2 - 1$
OR: $W_1 = 2$, $W_2 = 2$, $W_0 = 1$, so $in = 2x_1 + 2x_2 - 1$
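As a quick check of these weight choices, here is a minimal Python sketch of the threshold perceptron above. The function name and the truth-table loop are illustrative, not from the slides:

```python
def perceptron(x1, x2, W1, W2, W0):
    """Threshold unit: in = x1*W1 + x2*W2 - W0, output a = 1 if in > 0 else 0."""
    in_ = x1 * W1 + x2 * W2 - W0
    return 1 if in_ > 0 else 0

# AND: W1 = 1, W2 = 1, W0 = 1  ->  in = x1 + x2 - 1
# OR:  W1 = 2, W2 = 2, W0 = 1  ->  in = 2*x1 + 2*x2 - 1
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2),
          "AND:", perceptron(x1, x2, 1, 1, 1),
          "OR:", perceptron(x1, x2, 2, 2, 1))
```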
Multiple Layers Increase the Capacity
[Figure: a set of black and white dots in the $(x_1, x_2)$ plane.] The black and white dots are not linearly separable; that is, no linear function of the form
$in = x_1 W_1 + x_2 W_2 - W_0$
separates them.
Supervised Learning by back-propagating errors
The basic idea:
• We compute the output error as: Error = golden output ($y$) $-$ output of network ($a$)
• The training error function computed over all training data is:
$E = \frac{1}{2} \sum_i (y_i - a_i)^2$
[Figure: a training input feeds the neural network, whose output is compared against the golden output.]
Learning in Single Layered Networks
Idea: optimize the weights so as to minimize the error function:
$E = \frac{1}{2} Err^2 = \frac{1}{2} \left( y - g\!\left( \sum_{j=0}^{n} W_j x_j \right) \right)^2$

Differentiating with respect to each weight:
$\frac{\partial E}{\partial W_j} = Err \times \frac{\partial Err}{\partial W_j} = Err \times \frac{\partial}{\partial W_j}\left( y - g\!\left( \sum_{j=0}^{n} W_j x_j \right) \right) = -Err \times g'(in) \times x_j$

Weight update rule:
$W_j \leftarrow W_j + \alpha \times Err \times g'(in) \times x_j$
where $\alpha$ is the learning rate.

We purposefully eliminate a fraction of the error through the weight adjustment rule, but not the whole of it. Why?
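For concreteness, here is a hedged NumPy sketch of one application of this update rule. The slide's step activation is not differentiable, so the sketch assumes a sigmoid $g$, for which $g'(in) = g(in)(1 - g(in))$; the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W, x, y, alpha=0.1):
    """One step of W_j <- W_j + alpha * Err * g'(in) * x_j on a single example.
    x is the input vector including the bias component x_0 = -1."""
    in_ = W @ x                 # in = sum_j W_j x_j
    a = sigmoid(in_)            # a = g(in)
    err = y - a                 # Err = y - a
    g_prime = a * (1.0 - a)     # g'(in) for the sigmoid
    return W + alpha * err * g_prime * x
```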
Multi-Layer Feed-Forward Network
[Figure: input units $I_k$ connect through weights $W_{k,j}$ to hidden units $a_j$, which connect through weights $W_{j,i}$ to output units $O_i$.]

Weight update rule at the output layer:
$W_{j,i} \leftarrow W_{j,i} + \alpha \times Err_i \times g'(in_i) \times a_j$
(same as for a single layer, with the hidden-unit activations $a_j$ in place of the inputs $x_j$)

In multilayer networks, the hidden layers also contribute to the error at the output.
• So the important question is: how do we revise the weights of the hidden layers?
Back-Propagation Learning
The mathematics behind the update rule
The squared error on a single example is defined as:
$E = \frac{1}{2} \sum_i (y_i - a_i)^2$
where the sum is over the nodes in the output layer. To obtain the gradient with respect to a specific weight $W_{j,i}$ in the output layer, we need only expand out the activation $a_i$, as all other terms in the summation are unaffected by $W_{j,i}$. This yields the output-layer error term $\Delta_i = Err_i \times g'(in_i)$ and the update rule given earlier. Propagating back to a hidden-layer weight $W_{k,j}$:

$\frac{\partial E}{\partial W_{k,j}} = -\sum_i \Delta_i W_{j,i}\, g'(in_j)\, \frac{\partial}{\partial W_{k,j}} \left( \sum_k W_{k,j}\, a_k \right) = -\sum_i \Delta_i W_{j,i}\, g'(in_j)\, a_k = -a_k \Delta_j$

where $\Delta_j = g'(in_j) \sum_i W_{j,i} \Delta_i$. This gives the hidden-layer weight update rule:
$W_{k,j} \leftarrow W_{k,j} + \alpha \times a_k \times \Delta_j$

[Figure: the path from $a_k$ through $W_{k,j}$, hidden unit $a_j$, and $W_{j,i}$ to the output layer.]
Problems with this Learning
Convolutional and Recurrent Neural Networks
• Convolution is useful for learning artifacts that have a small locality of reference
• Recurrence is useful for learning sequences
The Convolution Operation
$s(t) = (x * w)(t) = \int x(a)\, w(t - a)\, da$
This operation is called convolution. The first argument, $x(\cdot)$, is called the input, and the second argument, $w(\cdot)$, is called the kernel.
Discrete Convolution
If we assume that $x$ and $w$ are defined only on integer $t$, we can define discrete convolution:
$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a)$

Convolution can also be defined over more than one axis at a time. For example, if we use a two-dimensional image $I$ as our input, we may want to use a two-dimensional kernel $K$:
$S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(m, n)\, K(i - m, j - n)$

Convolution is commutative; that is, we can also write (by replacing $m$ by $i - m$ and $n$ by $j - n$):
$S(i, j) = (K * I)(i, j) = \sum_m \sum_n I(i - m, j - n)\, K(m, n)$
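A minimal NumPy sketch of the two-dimensional case, evaluated at the "valid" offsets where the kernel fits entirely inside the image. Flipping the kernel implements true convolution; without the flip this would be the cross-correlation that most deep-learning libraries actually compute:

```python
import numpy as np

def conv2d(I, K):
    """S(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n), 'valid' region only."""
    kh, kw = K.shape
    K_flip = K[::-1, ::-1]          # kernel flip: convolution, not correlation
    H, W = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K_flip)
    return S
```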
Convolutional networks help us to learn image filters
If the kernel width is small, the network will be sparse
Convolution and Pooling
A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs.
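A matching sketch of a pooling function, with the maximum as the summary statistic; the 2x2 window and stride are illustrative defaults, and mean pooling would simply replace `.max()` with `.mean()`:

```python
import numpy as np

def max_pool(S, size=2, stride=2):
    """Replace each size-by-size neighbourhood of S with its maximum."""
    H = (S.shape[0] - size) // stride + 1
    W = (S.shape[1] - size) // stride + 1
    P = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            P[i, j] = S[i * stride:i * stride + size,
                        j * stride:j * stride + size].max()
    return P
```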
Sequence Modeling: Recurrent and Recursive Networks
• Recurrent Neural Networks (RNNs) are a family of neural networks for processing sequential data
• Recurrent networks can scale to much longer sequences than would be practical for networks without sequence-based specialization
• Most recurrent networks can also process sequences of variable length
• The key idea behind RNNs is parameter sharing
• For example, in a dynamical system, the parameters of the transfer function do not change with time
• Therefore we can use the same part of the neural network over and over again
Unfolding Computation
Consider a dynamical system:
$s^{(t)} = f(s^{(t-1)}; \theta)$
where $s^{(t)}$ is the state at time $t$ and $\theta$ is the set of parameters of $f$.
• The state after a finite number of steps can be obtained by applying the definition recursively. For example, after 3 steps:
$s^{(3)} = f(s^{(2)}; \theta) = f(f(s^{(1)}; \theta); \theta)$
Unfolding computation and Recurrent Network
$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$
• Regardless of the sequence length, the learned model always has the same input size, because it is specified in terms of the transition from one state to another, rather than in terms of a variable-length history of states
• It is possible to use the same transition function $f$ with the same parameters at each step
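A minimal sketch of unfolding this recurrence. Taking $f$ to be a tanh of an affine map is an assumption (the classic vanilla-RNN cell), and the parameter names are illustrative; note that the same parameters $\theta = (W_{hh}, W_{xh}, b)$ are reused at every time step, whatever the sequence length:

```python
import numpy as np

def rnn_forward(x_seq, h0, W_hh, W_xh, b):
    """Unfold h(t) = f(h(t-1), x(t); theta) over an input sequence."""
    h, states = h0, []
    for x_t in x_seq:                           # any sequence length works
        h = np.tanh(W_hh @ h + W_xh @ x_t + b)  # same theta at every step
        states.append(h)
    return states
```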
Useful topologies of RNNs
• RNNs that produce an output at each time step and have recurrent connections between hidden units
• RNNs that produce an output at each time step and have recurrent connections only from the output at one time step to the hidden units at the next time step
• RNNs with recurrent connections between hidden units, that read an entire sequence and then produce a single output
RNN with hidden-hidden feedback
[Figure omitted. Source: DARPA]
The ML problem in regression
What is the function $f(\cdot)$?
Classification
Given a training data set with:
• Input values: $\mathbf{x}_n = [x_1\ x_2\ \dots\ x_M]^T$ for $n = 1 \dots N$
• Output class labels, for example:
• 0/1 or $-1$/$+1$ for binary classification problems
• $1 \dots K$ for multi-class classification problems
• 1-of-K coding scheme:
$\mathbf{y} = [0 \dots 0\ 1\ 0 \dots 0]^T$
where, if $\mathbf{x}_n$ belongs to class $k$, then the $k$th bit is 1 and all others are 0.
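A small illustration of the 1-of-K coding scheme, with classes numbered $1 \dots K$ as above:

```python
import numpy as np

def one_hot(k, K):
    """Return the 1-of-K code for class k: the kth bit is 1, all others 0."""
    y = np.zeros(K, dtype=int)
    y[k - 1] = 1                  # classes are numbered 1..K
    return y

print(one_hot(3, 5))              # [0 0 1 0 0]
```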
Classification strategies
• Linear discriminants (2-class classifiers)
• K-class discriminant