
UNIT III Convolutional Networks and Sequence Modelling


Convolutional Networks
• Convolutional networks (also known as
convolutional neural networks or CNNs) are a
specialized kind of neural network for
processing data that has a known, grid-like
topology.
Convolutional Networks
• The name “convolutional neural network”
indicates that the network employs a
mathematical operation called convolution.
• Convolution is a specialized kind of linear
operation.
• CNNs are simply neural networks that use
convolution in place of general matrix
multiplication in at least one of their layers.
• CNN is a type of Deep Learning neural network
architecture commonly used in Computer Vision.
Convolution Operation
• The convolution operation is typically denoted as: s(t) = (x ∗ w)(t)
• The first argument (here, the function x) is often referred to as the input, and the second argument (here, the function w) as the kernel.
• The output is sometimes referred to as the feature map.
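As a concrete illustration, here is a minimal sketch of the discrete 1-D case in Python with NumPy (an assumption of this example; the array values are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # input signal x
w = np.array([0.25, 0.5, 0.25])          # kernel w

# np.convolve implements s(t) = sum_a x(a) * w(t - a), flipping the kernel
# as in the textbook definition; "valid" keeps only full-overlap positions.
s = np.convolve(x, w, mode="valid")
print(s)  # feature map: [2. 3. 4.]
```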
Convolution Operation
The convolution operates on the input with a kernel (weights) to produce an output map given by:
s(t) = ∫ x(a) w(t − a) da (continuous case)
s(t) = Σₐ x(a) w(t − a) (discrete case)
Discrete convolution
• The 2-D discrete convolution operation can be given by:
S(i, j) = (I ∗ K)(i, j) = Σₘ Σₙ I(m, n) K(i − m, j − n)
• Convolution can be applied over more than one axis at a time. For example, if we use a two-dimensional image I as our input, we probably also want to use a two-dimensional kernel K.
An example of 2-D convolution
Convolution without kernel flipping applied to a 2-D tensor:
• The output is restricted to the case where the kernel is situated entirely within the image.
• Arrows show how the upper-left of the input tensor is used to form the upper-left of the output tensor.
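Below is a minimal NumPy sketch of this operation, convolution without kernel flipping (i.e., cross-correlation) with the output restricted to positions where the kernel lies entirely within the image; the array values are illustrative only:

```python
import numpy as np

def conv2d_valid(I, K):
    """Convolution without kernel flipping (cross-correlation), 'valid' mode."""
    ih, iw = I.shape
    kh, kw = K.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # dot product between the kernel and the image patch beneath it
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return out

I = np.arange(16.0).reshape(4, 4)         # a 4x4 "image"
K = np.array([[1.0, 0.0], [0.0, -1.0]])   # a 2x2 kernel
print(conv2d_valid(I, K).shape)           # (3, 3): output is smaller than the input
```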
(Figure slides: 2-D convolution applied to an image, a working example of 2-D convolution, and an example of 3-D convolution.)
Effect of Strides
• Stride is the number of pixels by which the kernel shifts over the input matrix.
• Strided convolutions can be used to compute features at a coarser level.
• The effect of a strided convolution is the same as that of a convolution followed by a downsampling stage, as the sketch below shows.
• Strides can therefore be used to reduce the representation size.
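Here is a minimal NumPy sketch of that equivalence in 1-D (the signal and kernel values are made up; the kernel is symmetric, so flipping is immaterial):

```python
import numpy as np

x = np.arange(10.0)                      # 1-D input
w = np.array([1.0, 2.0, 1.0])            # symmetric 3-tap kernel

dense = np.convolve(x, w, mode="valid")  # stride-1 convolution (8 outputs)
downsampled = dense[::2]                 # then keep every second output

# the same result computed directly with stride 2
strided = np.array([np.dot(x[i:i + 3], w) for i in range(0, 8, 2)])

print(np.allclose(strided, downsampled))  # True
```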
Motivation for Using Convolutional Networks
1. Convolution leverages three important ideas that can help improve a machine learning system:
I. Sparse interactions
II. Parameter sharing
III. Equivariant representations
2. Convolution also allows for working with inputs of variable size.
Motivation-Sparse Interactions
• In traditional neural networks, every output unit interacts with every input unit.
• Convolutional networks, however, typically have sparse interactions, achieved by making the kernel smaller than the input. This reduces memory requirements and improves statistical efficiency.
• In a deep convolutional network, units in the deeper layers may indirectly interact with a larger portion of the input.
Motivation-Sparse Interactions
Highlighting one input unit, x3, and the output units s affected by it:
• Right-hand side: when s is formed by convolution with a kernel of width 3, only three outputs are affected by x3.
• Left-hand side: when s is formed by matrix multiplication, connectivity is no longer sparse, so all outputs are affected by x3.
Motivation-Sparse Interactions
Learning of Traditional vs Convolutional Networks
• Traditional neural network layers use multiplication by a matrix of parameters, with a separate parameter describing the interaction between each input unit and each output unit:
s = g(Wᵀx)
• With m inputs and n outputs, matrix multiplication requires m × n parameters and O(m × n) runtime per example. This means every output unit interacts with every input unit.
• Convolutional network layers have sparse interactions.
• If we limit the number of connections for each output to k, we need only k × n parameters and O(k × n) runtime, as the rough comparison below illustrates.
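A back-of-the-envelope comparison in Python; the sizes (a one-megapixel input and output and a 3 × 3 kernel, so k = 9) are illustrative assumptions, not values from the slides:

```python
m = n = 1_000_000   # e.g. a one-megapixel input and output
k = 9               # e.g. a 3x3 kernel

dense_params = m * n    # fully connected: 10^12 parameters
sparse_params = k * n   # sparse/convolutional: 9 * 10^6 parameters
print(dense_params // sparse_params)  # ~111,111x fewer parameters
```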
Motivation-Sparse Interactions
Keeping up performance with reduced connections
• It is possible to obtain good performance while keeping k several orders of magnitude smaller than m.
• In a deep neural network, units in deeper layers may indirectly interact with a larger portion of the input.
• The receptive field of units in deeper layers is larger than the receptive field of units in shallow layers.
• This allows the network to efficiently describe complicated interactions between many variables using simple building blocks that each describe only sparse interactions.
Motivation-Parameter Sharing
• Parameter sharing refers to using the same parameter for more than one function in a model.
• In a convolutional neural network, each member of the kernel is used at every position of the input, i.e., the parameters used to compute different output units are tied together (their values are always the same).
• Sparse interactions and parameter sharing combined can improve the efficiency of a linear function for detecting edges in an image.
How parameter sharing works
Black arrows indicate connections that use a particular parameter.
1. Convolutional model: the black arrows indicate uses of the central element of a 3-element kernel.
2. Fully connected model: the single black arrow indicates the use of the central element of the weight matrix. The model has no parameter sharing, so the parameter is used only once.
This illustrates how sparse connectivity and parameter sharing can dramatically improve the efficiency of image edge detection.
Efficiency of Parameter Sharing
• Parameter sharing through the convolution operation means that, rather than learning a separate set of parameters for every location, we learn only one set.
• This does not affect the runtime of forward propagation, which is still O(k × n).
• But it further reduces the storage requirements to k parameters.
• k is orders of magnitude smaller than m.
• Since m and n are roughly the same size, k is much smaller than m × n. The sketch below shows that the parameter count is independent of the input size.
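A minimal sketch (assuming PyTorch) showing that a convolutional layer stores only k parameters no matter how large the input is:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
print(sum(p.numel() for p in conv.parameters()))  # 10 = 9 kernel weights + 1 bias

# the same layer runs on inputs of any size; the parameter count never changes
print(conv(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 1, 30, 30])
print(conv(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 1, 62, 62])
```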
Motivation-Equivariance
• Parameter sharing in a convolutional network provides equivariance to translation: a translation of the image results in a corresponding translation of the output map, as the sketch below demonstrates.
• The convolution operation by itself is not equivariant to changes in scale or rotation.
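A minimal NumPy sketch of translation equivariance: shifting the input and then convolving gives the same result as convolving and then shifting. Circular (wrap-around) convolution is used so the property holds exactly at the edges; the values are made up:

```python
import numpy as np

def circ_conv(x, w):
    # circular 1-D convolution: s(i) = sum_j x((i - j) mod n) * w(j)
    n = len(x)
    return np.array([sum(x[(i - j) % n] * w[j] for j in range(len(w)))
                     for i in range(n)])

x = np.random.rand(16)
w = np.array([1.0, -1.0, 2.0])

shift_then_conv = circ_conv(np.roll(x, 3), w)
conv_then_shift = np.roll(circ_conv(x, w), 3)
print(np.allclose(shift_then_conv, conv_then_shift))  # True: equivariant
```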
(Figure slides: equivariance of convolution to translation; absence of equivariance under scale and rotation changes.)
CNN Architecture
• A CNN consists of multiple layers: an input layer, convolutional layers, pooling layers, and fully connected layers.
• The convolutional layers apply filters to the input image to extract features, the pooling layers downsample the image to reduce computation, and the fully connected layers make the final prediction. The network learns the optimal filters through backpropagation and gradient descent.
CNN Architecture
• Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, producing, say, K outputs, represented vertically.
• Now slide that neural network across the whole image; as a result, we get another "image" with a different width, height, and depth.
• Instead of just the R, G, and B channels, we now have more channels, but a smaller width and height. This operation is called convolution.
• If the patch size were the same as that of the image, it would be a regular neural network. Because of this small patch, we have fewer weights.
CNN Architecture
• Now, a bit of the mathematics involved in the whole convolution process.
• Convolution layers consist of a set of learnable filters (or kernels) with small widths and heights and the same depth as the input volume (3 if the input is an image).
• For example, suppose we run a convolution on an image with dimensions 34 × 34 × 3. The possible filter sizes are a × a × 3, where 'a' can be 3, 5, or 7, but small compared to the image dimension.
CNN Architecture
• During the forward pass, we slide each filter across the whole input volume step by step, where each step is called the stride (which can have a value of 2, 3, or even 4 for high-dimensional images), and compute the dot product between the kernel weights and the patch from the input volume.
• As we slide our filters, we get a 2-D output for each filter; stacking them together, we get an output volume with a depth equal to the number of filters, as the shape sketch below shows. The network will learn all the filters.
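The resulting shapes follow the standard output-size formula (W − F + 2P) / S + 1, where W is the input width/height, F the filter size, P the padding, and S the stride. A minimal Python sketch, reusing the 34 × 34 × 3 example above (the 10-filter count is an illustrative assumption):

```python
def conv_output_shape(w, h, f, num_filters, stride=1, padding=0):
    """Output volume of a conv layer: one 2-D map per filter, stacked depth-wise."""
    out_w = (w - f + 2 * padding) // stride + 1
    out_h = (h - f + 2 * padding) // stride + 1
    return out_w, out_h, num_filters

print(conv_output_shape(34, 34, 3, 10))            # (32, 32, 10) with stride 1
print(conv_output_shape(34, 34, 3, 10, stride=2))  # (16, 16, 10) with stride 2
```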
CNN Architecture
• A complete CNN architecture is also known as a covnet. A covnet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Types of layers: Let's take an example by running a covnet on an image of dimension 32 × 32 × 3.
• Input Layer: This is the layer in which we give input to our model. In a CNN, the input will generally be an image or a sequence of images. This layer holds the raw input image, with width 32, height 32, and depth 3.
CNN Architecture
• Convolutional Layers: This layer is used to extract features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images.
• The filters/kernels are small matrices, usually of shape 2 × 2, 3 × 3, or 5 × 5. Each filter slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch.
• The output of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; then we get an output volume of dimension 32 × 32 × 12.
CNN Architecture
• Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network.
• The activation layer applies an element-wise activation function to the output of the convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, and Leaky ReLU. The volume dimensions remain unchanged, so the output volume will have dimensions 32 × 32 × 12.
• Pooling Layer: This layer is periodically inserted in the covnet. Its main function is to reduce the size of the volume, which makes the computation faster, reduces memory, and also helps prevent overfitting.
• Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 × 2 filters and stride 2, the resultant volume will be of dimension 16 × 16 × 12, as the sketch below shows.
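A minimal NumPy sketch of 2 × 2 max pooling with stride 2 on the running 32 × 32 × 12 example (the input values are random placeholders):

```python
import numpy as np

volume = np.random.rand(32, 32, 12)

# regroup each 2x2 spatial block onto its own axes, then take the block maximum
pooled = volume.reshape(16, 2, 16, 2, 12).max(axis=(1, 3))
print(pooled.shape)  # (16, 16, 12)
```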
CNN Architecture
• Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
• Fully Connected Layers: These take the input from the previous layer and compute the final classification or regression output.
CNN Architecture
• Output Layer: The output from the fully connected layers is fed into a classification function, such as sigmoid or softmax, which converts the output for each class into a probability score.
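Putting the pieces together, here is a minimal PyTorch sketch of the whole pipeline on the running 32 × 32 × 3 example; the 10-class output is an illustrative assumption:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1),  # 32x32x3  -> 32x32x12
    nn.ReLU(),                                   # element-wise nonlinearity
    nn.MaxPool2d(kernel_size=2, stride=2),       # 32x32x12 -> 16x16x12
    nn.Flatten(),                                # -> vector of 16*16*12 = 3072
    nn.Linear(16 * 16 * 12, 10),                 # fully connected layer
    nn.Softmax(dim=1),                           # class probability scores
)

x = torch.randn(1, 3, 32, 32)  # one RGB image, channels-first
print(model(x).shape)          # torch.Size([1, 10])
```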
