
Convolutional Neural Network

A convolutional neural network (also known as ConvNet or CNN) is a type of feed-forward neural network used in tasks like image analysis, natural language processing, and other complex image classification problems.

Before GPUs came into use, computers could not process large amounts of image data within a reasonable time, so training was performed only on low-resolution images.

CNN Layers

Here's an overview of the layers used to build Convolutional Neural Network architectures.

Convolutional Layer

CNN works by comparing images piece by piece.

Filters are spatially small along width and height but extend through the full depth of the input
image. A filter is designed in such a manner that it detects a specific type of feature in the input image.

In the convolution layer, we move the filter/kernel to every possible position on the input matrix.
Element-wise multiplication between the filter-sized patch of the input image and the filter is done,
which is then summed.

Translating the filter to every possible position of the input matrix gives an
opportunity to discover the feature wherever it is present in the image.

The resulting matrix is called the feature map.
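As a sketch of this operation, here is a minimal pure-Python convolution (stride 1, no padding); the image and kernel values are made-up toy data, with the kernel chosen to respond to vertical edges:

```python
def convolve2d(image, kernel):
    """Slide the kernel over every valid position of the image (stride 1,
    no padding); at each position, multiply element-wise and sum."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    feature_map = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# A 3x3 vertical-edge kernel applied to a 4x4 image containing a vertical edge.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(convolve2d(image, kernel))  # [[3, 3], [3, 3]] -- the feature map
```

Every entry of the feature map is the summed element-wise product of the kernel with one patch of the image.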


Convolutional neural networks can learn multiple features in parallel. In the final stage, we
stack all the output feature maps along the depth dimension and produce the output.

Local connectivity: images are represented as a matrix of pixel values, and the dimension
increases with the size of the image. If all the neurons are connected to all previous
neurons, as in a fully connected layer, the number of parameters increases manifold.

To resolve this, we connect each neuron to only a patch of the input data. This spatial extent (also
known as the receptive field of the neuron) determines the size of the filter.

Suppose we have an input image of size 128*128*3. If the filter size is 5*5*3, then each
neuron in the convolution layer will have a total of 5*5*3 = 75 weights (plus 1 bias parameter).
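The arithmetic can be checked directly, using the 128*128*3 image and 5*5*3 filter from the example above, and contrasted with the fully connected case:

```python
# Locally connected neuron: sees only a 5x5x3 patch of the input.
conv_weights = 5 * 5 * 3         # 75 weights per neuron
conv_params = conv_weights + 1   # +1 bias -> 76 parameters

# Fully connected neuron: sees the entire 128x128x3 image.
fc_weights = 128 * 128 * 3       # 49152 weights per neuron

print(conv_weights, conv_params, fc_weights)  # 75 76 49152
```

The gap (75 vs. 49152 weights per neuron) is exactly the "manifold" parameter increase that local connectivity avoids.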

Spatial arrangement governs the size of the neurons in the output volume and how they are
arranged.

Three hyperparameters control the size of the output volume:

 The depth—The depth of the output volume is equal to the number of filters we use to
look for different features in the image. The output volume has the activation/feature
maps stacked along the depth, making it equal to the number of filters used.
 Stride—Stride refers to the number of pixels we slide while matching the filter with the
input image patch. If the stride is one, we move the filter one pixel at a time. The higher the
stride, the smaller the output volume produced spatially.
 Zero-padding—It allows us to control the spatial size of the output volume by padding
zeros around the border of the input data.
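These three hyperparameters combine into the standard output-size formula (W − F + 2P) / S + 1, where W is the input size, F the filter size, P the padding, and S the stride. A small helper illustrates it, reusing the 128-pixel input and 5-pixel filter from the earlier example:

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 128x128 input, 5x5 filter, stride 1, no padding -> 124x124 output.
print(conv_output_size(128, 5))             # 124
# Stride 2 shrinks the output roughly by half.
print(conv_output_size(128, 5, stride=2))   # 62
# Zero-padding of 2 preserves the 128-pixel input size at stride 1.
print(conv_output_size(128, 5, padding=2))  # 128
```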

Parameter Sharing means that the same weight matrix acts on all the neurons in a particular
feature map—the same filter is applied in different regions of the image. Natural images have
statistical properties, one being invariance to translation.

For example, an image of a cat remains an image of a cat even if it is translated one pixel to the
right—CNNs take this property into account by sharing parameters across multiple image
locations. Thus, we can find a cat with the same feature matrix whether the cat appears at column
i or column i+1 in the image.
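A toy 1D illustration of this: because the same (shared) filter weights are applied at every position, a pattern shifted by one position produces the same peak response, just shifted by one. The signal and filter values below are made-up:

```python
def correlate1d(signal, filt):
    """Apply the same shared filter at every position of the signal."""
    n = len(filt)
    return [sum(signal[i + k] * filt[k] for k in range(n))
            for i in range(len(signal) - n + 1)]

filt = [1, 2, 1]           # one shared weight vector
a = [0, 1, 2, 1, 0, 0, 0]  # pattern starting at index 1
b = [0, 0, 1, 2, 1, 0, 0]  # the same pattern shifted right by one
ra = correlate1d(a, filt)
rb = correlate1d(b, filt)
print(ra)  # [4, 6, 4, 1, 0] -- peak of 6 at index 1
print(rb)  # [1, 4, 6, 4, 1] -- same peak of 6, shifted to index 2
```

The feature is detected with the same strength in both cases; only the location of the response moves.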

ReLU Layer

In this layer, the ReLU activation function is applied: every negative value in the output volume
from the convolution layer is replaced with zero. This introduces non-linearity into the model.
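A one-line sketch of the ReLU operation on a feature map (toy values):

```python
def relu(feature_map):
    """Replace every negative value with zero; keep positives unchanged."""
    return [[max(0, v) for v in row] for row in feature_map]

fm = [[-3, 5], [2, -1]]
print(relu(fm))  # [[0, 5], [2, 0]]
```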

Pooling Layer

Pooling layers are added in between two convolution layers with the sole purpose of reducing the
spatial size of the image representation.

The pooling layer has two hyperparameters:

 window size
 stride

From each window, we take either the maximum value or the average of the values in the
window, depending on the type of pooling being performed.

The pooling layer operates independently on every depth slice of the input, resizes it
spatially, and later stacks the slices together.
Types of Pooling

Max Pooling selects the maximum element from each window of the feature map. Thus,
after the max-pooling layer, the output is a feature map containing the most dominant
features of the previous feature map.

Average Pooling computes the average of the elements present in the region of the feature map
covered by the filter. It simply averages the features from the feature map.
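Both pooling types can be sketched with one small function; the 2x2 window, stride of 2, and feature-map values below are illustrative choices:

```python
def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Slide a window over the feature map and keep either the maximum
    or the average of the values in each window."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for i in range(0, h - window + 1, stride):
        row = []
        for j in range(0, w - window + 1, stride):
            vals = [feature_map[i + di][j + dj]
                    for di in range(window) for dj in range(window)]
            row.append(max(vals) if mode == "max" else sum(vals) / len(vals))
        out.append(row)
    return out

fm = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 8, 5],
    [1, 1, 3, 4],
]
print(pool2d(fm, mode="max"))      # [[6, 2], [2, 8]]
print(pool2d(fm, mode="average"))  # [[3.5, 1.0], [1.0, 5.0]]
```

Note how the 4x4 feature map shrinks to 2x2: this is the spatial-size reduction that makes pooling cheap computation-wise.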
Normalization Layer

Normalization layers, as the name suggests, normalize the output of the previous layers. They are
added between the convolution and pooling layers, allowing every layer of the network to
learn more independently and helping to avoid overfitting the model.

However, normalization layers are not used in advanced architectures because they do not
contribute much towards effective training.

Fully-Connected Layer

The Convolutional Layer, along with the Pooling Layer, forms a block in the Convolutional
Neural Network. The number of such layers may be increased for capturing finer details
depending upon the complexity of the task at the cost of more computational power.
Having extracted the important features, we flatten the final feature
representation and feed it to a regular fully-connected neural network for image classification
purposes.
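A minimal sketch of this flatten-and-classify step, using made-up toy feature maps and integer weights:

```python
def flatten(feature_maps):
    """Unroll a stack of 2D feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def dense(x, weights, biases):
    """One fully-connected layer: each output neuron takes a weighted
    sum over the entire flattened input, plus a bias."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b
            for w, b in zip(weights, biases)]

# Two 2x2 feature maps (made-up values) flatten to a vector of length 8.
maps = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]
x = flatten(maps)
print(x)  # [1, 0, 0, 1, 0, 2, 2, 0]

# Two output neurons with made-up integer weights and biases.
scores = dense(x, [[1] * 8, [2] * 8], [0, 1])
print(scores)  # [6, 13]
```

In a real network the scores would go through a softmax to become class probabilities; the structure of the computation is the same.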

How do Convolutional Neural Networks work?

Now, let's get into the nitty-gritty of how CNNs work in practice.

A CNN has hidden convolutional layers that form the base of ConvNets. Like any other
layer, a convolutional layer receives an input volume, performs a mathematical scalar product with
the feature matrix (filter), and outputs the feature maps.

Features refer to minute details in the image data like edges, borders, shapes, textures, objects,
circles, etc.

At a higher level, convolutional layers detect these patterns in the image data with the help of
filters. The lower-level details are taken care of by the first few convolutional layers.

The deeper the network goes, the more sophisticated the pattern searching becomes.

For example, in later layers rather than edges and simple shapes, filters may detect specific
objects like eyes or ears, and eventually a cat, a dog, and whatnot.
The first hidden layer in a network dealing with images is usually a convolutional layer.

When adding a convolutional layer to a network, we need to specify the number of filters we
want the layer to have.

A filter can be thought of as a relatively small matrix for which we decide the number of rows
and columns. The values of this feature matrix are initialized with random numbers.
When this convolutional layer receives pixel values of input data, the filter will convolve over
each patch of the input matrix.

The output of the convolutional layer is usually passed through the ReLU activation function to
bring non-linearity to the model. It takes the feature map and replaces all the negative values
with zero.

But—

We haven't addressed the issue of too much computation that was a setback of using feedforward
neural networks, have we?

That's because there hasn't been a significant improvement yet.

The pooling layer is added in succession to the convolutional layer to reduce the dimensions.
We take a window of say 2x2 and select either the maximum pixel value or the average of all
pixels in the window and continue sliding the window. So, we take the feature map, perform a
pooling operation, and generate a new feature map reduced in size.

Pooling is a very important step in the ConvNet as it reduces the computation and makes the model
tolerant towards distortions and variations.

The convolutional layer was responsible for the feature extraction. But—

What about the final prediction?

A fully connected dense neural network would use the flattened feature matrix and predict
according to the use case.


Full Architectural Description of Convolutional Layer

Building blocks of CNN:

VGGNet:

VGGNet came with a solution to improve performance rather than simply keep adding
more dense layers to the model.

The key innovation came down to grouping layers into blocks that were
repetitively used in the architecture because more layers of narrow convolutions
were deemed more powerful than a smaller number of wider convolutions.

A VGG block had a bunch of 3x3 convolutions padded by 1 to keep the size of the
output the same as that of the input, followed by max pooling to halve the resolution.
The architecture had n VGG blocks followed by three fully connected
dense layers.
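Under those rules (3x3 convolutions with padding 1 preserve the spatial size; 2x2 max pooling halves it), the size after a stack of blocks can be traced with a small helper. The per-block convolution counts below follow a VGG-16-like layout with the 224x224 input size VGG commonly used:

```python
def vgg_block_output(size, n_convs):
    """One VGG block: n 3x3 convs with padding 1 (spatial size preserved),
    followed by 2x2 max pooling with stride 2 (spatial size halved)."""
    for _ in range(n_convs):
        size = (size - 3 + 2 * 1) // 1 + 1  # 3x3 conv, pad 1, stride 1
    return size // 2                        # 2x2 max pool, stride 2

# Trace a 224x224 input through five blocks.
size = 224
for n_convs in (2, 2, 3, 3, 3):  # conv counts per block, VGG-16-like
    size = vgg_block_output(size, n_convs)
print(size)  # 7 -- i.e., a 7x7 spatial map feeding the dense layers
```

Each block leaves the size untouched through its convolutions and then halves it: 224 → 112 → 56 → 28 → 14 → 7.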
