CNN
Before GPUs came into use, computers were not able to process large amounts of image data within a
reasonable time, and the training was performed on images with low resolutions only.
CNN Layers
Convolutional Layer
Filters are spatially small along width and height but extend through the full depth of the input
image. A filter is designed in such a manner that it detects a specific type of feature in the input image.
In the convolution layer, we move the filter/kernel to every possible position on the input matrix.
Element-wise multiplication between the filter-sized patch of the input image and the filter is done,
which is then summed.
Translating the filter to every possible position of the input matrix gives us the opportunity to
discover the feature wherever it is present in the image.
Local connectivity refers to images represented as a matrix of pixel values. The dimension
increases depending on the size of the image. If all the neurons are connected to all previous
neurons, as in a fully connected layer, the number of parameters increases manifold.
To resolve this, we connect each neuron to only a patch of the input data. This spatial extent (also
known as the receptive field of the neuron) determines the size of the filter.
Suppose the input image is of size 128*128*3. If the filter size is 5*5*3, then each
neuron in the convolution layer will have a total of 5*5*3 = 75 weights (plus 1 bias parameter).
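As a quick check of this arithmetic, here is a minimal sketch using the sizes from the example above:

```python
# Sizes taken from the example above: a 128*128*3 input, a 5*5*3 filter.
filter_height, filter_width, input_depth = 5, 5, 3

weights_per_neuron = filter_height * filter_width * input_depth
params_per_neuron = weights_per_neuron + 1  # +1 for the bias

print(weights_per_neuron)  # 75
print(params_per_neuron)   # 76
```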
Spatial arrangement governs the number of neurons in the output volume and how they are
arranged. Three settings control it: depth, stride, and zero-padding.
Depth - The depth of the output volume is equal to the number of filters we use to
look for different features in the image. The output volume has the activation/feature
maps stacked along the depth, one per filter used.
Stride - Stride refers to the number of pixels we slide while matching the filter with the
input image patch. If the stride is one, we move the filter one pixel at a time. The higher the
stride, the smaller the output volume produced spatially.
Zero-padding - It allows us to control the spatial size of the output volume by padding
zeros around the border of the input data.
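These three settings combine into the standard output-size formula, (W - F + 2P) / S + 1. The formula itself is standard convolution arithmetic rather than something stated explicitly above; the helper below is an illustrative sketch of it:

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# 128-wide input, 5-wide filter, stride 1, no padding:
print(conv_output_size(128, 5))                        # 124
# Zero-padding of 2 preserves the spatial size:
print(conv_output_size(128, 5, padding=2))             # 128
# A stride of 2 roughly halves the output:
print(conv_output_size(128, 5, stride=2, padding=2))   # 64
```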
Parameter sharing means that the same weight matrix acts on all the neurons in a particular
feature map: the same filter is applied in different regions of the image. Natural images have
statistical properties, one of which is invariance to translation.
For example, an image of a cat remains an image of a cat even if it is translated one pixel to the
right—CNNs take this property into account by sharing parameters across multiple image
locations. Thus, we can find a cat with the same feature matrix whether the cat appears at column
i or column i+1 in the image.
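A minimal sketch of this sliding-filter operation in NumPy. The function name and the tiny edge-detector kernel are illustrative choices, not from the text:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the same kernel over every position (parameter sharing):
    element-wise multiply each patch with the kernel, then sum."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny vertical-edge detector finds the edge wherever it appears:
image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
kernel = np.array([[-1., 1.]])  # responds to a left-to-right step
print(conv2d(image, kernel))    # same response at every row: [[0, 1, 0], [0, 1, 0]]
```

Because one kernel's weights are reused at every position, the edge produces the same activation no matter where it sits in the image.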
ReLU Layer
In this layer, the ReLU activation function is applied: every negative value in the output volume
from the convolution layer is replaced with zero. This introduces non-linearity into the model.
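In NumPy terms, the operation is just an element-wise maximum with zero (a minimal illustration, not a full layer implementation):

```python
import numpy as np

feature_map = np.array([[-2.0, 3.0],
                        [ 0.5, -1.0]])
activated = np.maximum(feature_map, 0)  # negatives become zero
print(activated)  # [[0., 3.], [0.5, 0.]]
```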
Pooling Layer
Pooling layers are added in between two convolution layers with the sole purpose of reducing the
spatial size of the image representation. A pooling operation is defined by its window size and stride.
From each window, we take either the maximum value or the average of the values in the
window, depending upon the type of pooling being performed.
The pooling layer operates independently on every depth slice of the input, resizes each slice
spatially, and later stacks them back together.
Types of Pooling
Max Pooling selects the maximum element from each window of the feature map. Thus,
after the max-pooling layer, the output is a feature map containing the most dominant
features of the previous feature map.
Average Pooling computes the average of the elements present in the region of the feature map
covered by the filter. It simply averages the features from the feature map.
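Both variants can be sketched in a few lines of NumPy; the helper name and the 2x2 window with stride 2 are illustrative assumptions:

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a single depth slice."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fmap = np.array([[ 1.,  2.,  5.,  6.],
                 [ 3.,  4.,  7.,  8.],
                 [ 9., 10., 13., 14.],
                 [11., 12., 15., 16.]])
print(pool2d(fmap, mode="max"))      # [[ 4.,  8.], [12., 16.]]
print(pool2d(fmap, mode="average"))  # [[ 2.5,  6.5], [10.5, 14.5]]
```

Note that the 4x4 feature map shrinks to 2x2, which is exactly the computational saving pooling is there to provide.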
Normalization Layer
Normalization layers, as the name suggests, normalize the output of the previous layers. They are
added in between the convolution and pooling layers, allowing every layer of the network to
learn more independently and helping avoid overfitting the model.
However, normalization layers are not used in advanced architectures because they do not
contribute much towards effective training.
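The core idea, shifting activations toward zero mean and unit variance, can be sketched as follows. This is a simplified per-map normalization for illustration only, not any specific layer type such as batch norm:

```python
import numpy as np

activations = np.array([2.0, 4.0, 6.0, 8.0])
# Shift to zero mean, scale to unit variance (epsilon avoids division by zero).
normalized = (activations - activations.mean()) / (activations.std() + 1e-8)

print(normalized.mean())  # ~0
```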
Fully-Connected Layer
The Convolutional Layer, along with the Pooling Layer, forms a block in the Convolutional
Neural Network. The number of such layers may be increased for capturing finer details
depending upon the complexity of the task at the cost of more computational power.
Having extracted the important features, we flatten the final feature
representation and feed it to a regular fully-connected neural network for image classification
purposes.
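Flattening is just a reshape, and the resulting vector feeds a dense layer. The sizes and random weights below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

fmap = rng.standard_normal((4, 4, 8))  # final feature representation
flat = fmap.reshape(-1)                # flatten to a vector of length 128

# One dense layer mapping the 128 features to 10 hypothetical classes.
W = rng.standard_normal((10, flat.size))
b = np.zeros(10)
logits = W @ flat + b

print(flat.shape, logits.shape)  # (128,) (10,)
```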
Features refer to minute details in the image data like edges, borders, shapes, textures, objects,
circles, etc.
At a high level, convolutional layers detect these patterns in the image data with the help of
filters. The low-level details are taken care of by the first few convolutional layers.
The deeper the network goes, the more sophisticated the pattern searching becomes.
For example, in later layers rather than edges and simple shapes, filters may detect specific
objects like eyes or ears, and eventually a cat, a dog, and whatnot.
The first hidden layer in the network dealing with images is usually a convolutional layer.
When adding a convolutional layer to a network, we need to specify the number of filters we
want the layer to have.
The output of the convolutional layer is usually passed through the ReLU activation function to
bring non-linearity to the model. It takes the feature map and replaces all the negative values
with zero.
But we haven't addressed the issue of too much computation that was a setback of using feedforward
neural networks, have we?
The pooling layer is added in succession to the convolutional layer to reduce the dimensions.
We take a window of say 2x2 and select either the maximum pixel value or the average of all
pixels in the window and continue sliding the window. So, we take the feature map, perform a
pooling operation, and generate a new feature map reduced in size.
Pooling is a very important step in the ConvNet as it reduces the computation and makes the model
tolerant towards distortions and variations.
The convolutional layer was responsible for the feature extraction. A fully-connected dense
neural network then uses the flattened feature matrix to predict according to the use case.
VGGNet came with a solution to improve performance without simply adding
more dense layers to the model.
The key innovation came down to grouping layers into blocks that were
repetitively used in the architecture because more layers of narrow convolutions
were deemed more powerful than a smaller number of wider convolutions.
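The parameter arithmetic behind that choice: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, but with fewer weights. The channel count C below is an illustrative assumption, and biases are ignored:

```python
C = 64  # channels in and out; an illustrative choice

params_one_5x5 = 5 * 5 * C * C        # one wider convolution
params_two_3x3 = 2 * (3 * 3 * C * C)  # two narrow convolutions, same receptive field

print(params_one_5x5, params_two_3x3)  # 102400 73728
```

The stacked narrow convolutions also pass through the ReLU twice, giving the block more non-linearity for fewer parameters.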