Lecture 6 - Convolutional Neural Networks (CNN)
Review of Fully Connected Neural Networks
• The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class.
• For example, an image of more respectable size, e.g. 200x200x3, would lead to
neurons that have 200*200*3 = 120,000 weights.
• This full connectivity is wasteful and the huge number of parameters would
quickly lead to overfitting.
• INPUT [32x32x3] will hold the raw pixel values of the image.
Width 32, Height 32, and 3 channels R,G,B.
• CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as [32x32x12] if we decide to use 12 filters.
• RELU layer will apply an elementwise activation function, such as the max(0,x)
thresholding at zero. This leaves the size of the volume unchanged
([32x32x12]).
• POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12].
• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10.
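To make this pipeline concrete, here is a minimal sketch in PyTorch (an assumed framework choice, not part of the lecture); the layer sizes follow the CIFAR-10 walkthrough above:

import torch
import torch.nn as nn

# INPUT [32x32x3] -> CONV [32x32x12] -> RELU -> POOL [16x16x12] -> FC [1x1x10]
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1),  # 12 filters; padding keeps 32x32
    nn.ReLU(),                        # elementwise max(0, x); volume size unchanged
    nn.MaxPool2d(kernel_size=2),      # downsample 32x32 -> 16x16
    nn.Flatten(),                     # 16 * 16 * 12 = 3072 values per image
    nn.Linear(12 * 16 * 16, 10),      # 10 class scores, one per CIFAR-10 category
)

x = torch.randn(1, 3, 32, 32)         # one dummy CIFAR-10-sized image (channels first)
print(model(x).shape)                 # torch.Size([1, 10])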
Convolution
Stride
Recall that we described a convolution operation as “sliding” a small matrix across a large matrix, stopping at each coordinate, computing an element-wise multiplication and sum, then storing the output. The stride S is the number of pixels the filter shifts between stops, so a larger stride yields a smaller output.
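A minimal NumPy sketch of this sliding-window operation (convolve2d is a hypothetical helper written for illustration, not a library routine):

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` across `image`; multiply element-wise and sum at each stop."""
    H, W = image.shape
    F = kernel.shape[0]                      # assume a square FxF kernel
    out_h = (H - F) // stride + 1            # output size with no padding (P = 0)
    out_w = (W - F) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0               # a simple 3x3 averaging filter
print(convolve2d(image, kernel).shape)       # (3, 3): a 5x5 input shrinks to 3x3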
Convolution
Zero-padding
• We need to “pad” the borders of an image to retain the original image size when
applying a convolution.
• Using zero-padding, we can “pad” our input along the borders such that our
output volume size matches our input volume size.
• The amount of padding we apply is controlled by the parameter P.
• The output volume is smaller (3x3) than the input volume (5x5).
• If we instead set P = 1, we can pad our input volume with zeros to create a 7x7 volume.
• The output volume size then matches the original input volume size of 5x5.
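A quick NumPy illustration of the padding step (np.pad fills with zeros by default):

import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)
padded = np.pad(image, pad_width=1)   # P = 1: a ring of zeros, 5x5 -> 7x7
print(padded.shape)                   # (7, 7)
# Convolving the padded 7x7 volume with a 3x3 filter at stride 1 gives
# (7 - 3)/1 + 1 = 5, so the output matches the original 5x5 input.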
Convolution
Output volume size
The spatial size of the output volume depends on four quantities:
• The input volume size (W)
• The receptive field size of the Conv Layer neurons (F).
• The stride with which they are applied (S)
• The amount of zero padding used (P) on the border.
• The output volume size is calculated by the formula:
(W − F + 2P)/S + 1
For example, for a 7x7 input and a 3x3 filter with stride 1 and pad 0, we would get a 5x5 output.
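The same arithmetic as a small Python helper (a hypothetical function for illustration):

def conv_output_size(W, F, S=1, P=0):
    """Spatial output size of a convolution: (W - F + 2P)/S + 1."""
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=7, F=3, S=1, P=0))   # 5 -> the 7x7 input / 3x3 filter example
print(conv_output_size(W=5, F=3, S=1, P=1))   # 5 -> P = 1 preserves a 5x5 input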
[Figure: element-wise multiply-and-sum of a 3x3 filter sliding over a 5x5 input]
Pooling Layer
Pooling is a down-sampling operation that reduces the spatial dimensionality of the feature map. The rectified feature map (the output of the RELU layer) goes through a pooling layer to generate a pooled feature map.
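A minimal NumPy sketch of 2x2 max pooling (max_pool2d is a hypothetical helper, not a library routine):

import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Keep the largest value in each (size x size) window of the feature map."""
    H, W = fmap.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = fmap[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

fmap = np.random.rand(32, 32)
print(max_pool2d(fmap).shape)   # (16, 16): width and height halved, as in [32x32x12] -> [16x16x12]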
Average pooling
Average pooling works the same way, but replaces each window with its mean value rather than its maximum.
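The only change from the max-pooling sketch above is reducing each window to its mean instead of its maximum:

import numpy as np

def avg_pool2d(fmap, size=2, stride=2):
    """Replace each (size x size) window of the feature map with its mean."""
    H, W = fmap.shape
    out = np.zeros(((H - size) // stride + 1, (W - size) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i*stride:i*stride+size, j*stride:j*stride+size].mean()
    return out

print(avg_pool2d(np.random.rand(32, 32)).shape)   # (16, 16)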
Activation Layer
The activation layer applies an elementwise non-linearity such as RELU, max(0, x), leaving the size of the volume unchanged.
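In NumPy, the RELU activation is a one-liner:

import numpy as np

fmap = np.random.randn(4, 4)   # a feature map with positive and negative values
print(np.maximum(0, fmap))     # RELU: negatives clamped to 0, shape unchanged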
Flattening
Flattening is used to convert all the resultant 2-dimensional arrays from pooled feature maps into a single long continuous linear vector.
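In NumPy, flattening is a single reshape:

import numpy as np

pooled = np.random.rand(16, 16, 12)   # a pooled feature volume, e.g. [16x16x12]
flat = pooled.reshape(-1)             # one long continuous vector
print(flat.shape)                     # (3072,) = 16 * 16 * 12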