Cnnbasics 171028092801
Convolutional Neural Networks
Anantharaman Palacode Narayana Iyer
narayana dot Anantharaman at gmail dot com
5 Aug 2017
References
“A dramatic moment in the meteoric rise of
deep learning came when a convolutional
network won this challenge for the first time
and by a wide margin, bringing down the
state-of-the-art top-5 error rate from 26.1% to
15.3% (Krizhevsky et al., 2012), meaning that
the convolutional network produces a ranked list
of possible categories for each image and the
correct category appeared in the first five entries
of this list for all but 15.3% of the test examples.
Since then, these competitions are consistently
won by deep convolutional nets, and as of this
writing, advances in deep learning have brought
the latest top-5 error rate in this contest down to
3.6%” – Ref: Deep Learning (book) by I. Goodfellow,
Y. Bengio and A. Courville
What is a convolutional neural network?
• Convolution is a mathematical
operation having a linear form
Types of inputs
• Inputs have a structure
• Color images are three-dimensional (height × width × color channels) and so form a volume
• Time-domain speech signals are 1-d, while frequency-domain representations (e.g. MFCC
vectors) take a 2-d form. They can also be viewed as a time sequence.
• Medical images (such as CT, MR, etc.) are multidimensional
• Videos add a temporal dimension compared to still images
• Speech signals can therefore be modelled as 2-dimensional inputs (time frames × feature coefficients)
• Variable-length sequences and time series data are likewise multidimensional
• The classifier then needs to accept a tensor as input and perform the necessary
machine learning task. In the case of an image, this tensor represents a volume.
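To make the tensor shapes above concrete, here is a minimal sketch in plain Python. Nested lists stand in for tensors, and every size (32×32 images, 16 kHz audio, 13 MFCC coefficients, 10 video frames) is an illustrative assumption, not a value from the slides. The helpers `zeros` and `shape` are hypothetical names introduced only for this example.

```python
def zeros(*dims):
    """Build a nested list of zeros with the given dimensions."""
    if len(dims) == 1:
        return [0.0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]


def shape(tensor):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


# All sizes below are assumptions chosen for illustration.
color_image = zeros(32, 32, 3)      # height x width x RGB channels: a volume
speech_wave = zeros(16000)          # 1-d: one second of samples at 16 kHz
mfcc_frames = zeros(100, 13)        # 2-d: time frames x MFCC coefficients
video_clip = zeros(10, 32, 32, 3)   # frames x height x width x channels

for name, tensor in [("color image", color_image),
                     ("speech waveform", speech_wave),
                     ("MFCC frames", mfcc_frames),
                     ("video clip", video_clip)]:
    print(name, shape(tensor))
```

In practice an array library would hold these tensors; the nested lists here only keep the sketch free of dependencies.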
CNNs are everywhere
• Image retrieval
• Detection
• Self driving cars
• Semantic segmentation
• Face recognition (FB tagging)
• Pose estimation
• Detect diseases
• Speech Recognition
• Text processing
• Analysing satellite data
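Convolution in 1 dimension:
\[ y[n] = \sum_{k=-\infty}^{\infty} x[k] \, h[n - k] \]
Convolution in 2 dimensions:
\[ y[n_1, n_2] = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} x[k_1, k_2] \, h[n_1 - k_1, n_2 - k_2] \]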
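For finite signals the infinite sums reduce to the terms where both indices are in range. The sketch below applies the 1-d and 2-d definitions directly, treating x and h as zero outside their stored samples. The function names `conv1d` and `conv2d` are assumptions made for this example, and the nested loops are meant to mirror the formulas rather than to be efficient.

```python
def conv1d(x, h):
    """Full 1-d convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):          # h is zero outside its support
                y[n] += x[k] * h[n - k]
    return y


def conv2d(x, h):
    """Full 2-d convolution of two rectangular nested lists."""
    rows = len(x) + len(h) - 1
    cols = len(x[0]) + len(h[0]) - 1
    y = [[0.0] * cols for _ in range(rows)]
    for n1 in range(rows):
        for n2 in range(cols):
            for k1 in range(len(x)):
                for k2 in range(len(x[0])):
                    m1, m2 = n1 - k1, n2 - k2
                    if 0 <= m1 < len(h) and 0 <= m2 < len(h[0]):
                        y[n1][n2] += x[k1][k2] * h[m1][m2]
    return y


y1 = conv1d([1, 2, 3], [1, 1])
print(y1)   # [1.0, 3.0, 5.0, 3.0]

y2 = conv2d([[1, 2], [3, 4]], [[1, 0], [0, 1]])
print(y2)   # [[1.0, 2.0, 0.0], [3.0, 5.0, 2.0], [0.0, 3.0, 4.0]]
```

Note that many deep learning libraries apply the filter without flipping it (cross-correlation); since the filter weights are learned, the distinction rarely matters in a CNN.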
CNNs
Types of layers in a CNN:
• Convolution Layer
• Pooling Layer
• Activation maps are the feature inputs to the next layer of the network
• Without padding, the 2-D extent (height × width) of the activation map is smaller
than that of the input for any filter larger than 1×1, at any stride >= 1
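The shrinkage without padding can be written as a short formula. The symbols are assumptions introduced here and do not appear in the slides: W is the input width (or height), F the filter size, S the stride, and O the resulting activation map width.

```latex
\[ O = \left\lfloor \frac{W - F}{S} \right\rfloor + 1 \]

% Worked examples with assumed values:
\[ W = 32,\; F = 5,\; S = 1: \quad O = \frac{32 - 5}{1} + 1 = 28 \]
\[ W = 32,\; F = 5,\; S = 3: \quad O = \left\lfloor \frac{27}{3} \right\rfloor + 1 = 10 \]
```

Whenever F > 1, the result O is strictly smaller than W, and a larger stride shrinks it further.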
Copyright 2016 JNResearch, All Rights Reserved
Stacking Convolution Layers
• As we stack more layers and use larger strides, the spatial dimensions of the output
keep shrinking, which may hurt accuracy.
• Often, we want to preserve the spatial extent in the initial layers and
downsample later in the network.
• Padding the input with suitable values (padding with zero is common) helps to
preserve the spatial size
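A minimal sketch of zero padding in plain Python follows. It also compares how three stacked 3×3, stride-1 layers shrink an input with and without a one-pixel zero border. The helper names `zero_pad` and `output_size`, the 32-pixel input width, and the three-layer stack are all assumptions made for illustration.

```python
def zero_pad(image, p):
    """Return a copy of a 2-d nested list with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom


def output_size(w, f, s=1):
    """Width of the activation map for input width w, filter size f, stride s."""
    return (w - f) // s + 1


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = zero_pad(image, 1)
for row in padded:
    print(row)

# Three stacked 3x3, stride-1 layers on an assumed 32-pixel-wide input.
size_no_pad = size_padded = 32
for _ in range(3):
    size_no_pad = output_size(size_no_pad, 3)          # shrinks by 2 per layer
    size_padded = output_size(size_padded + 2 * 1, 3)  # one-pixel border keeps the size
print(size_no_pad, size_padded)   # 26 32
```

For an odd filter size F at stride 1, a border of (F - 1) / 2 zeros is what keeps the output the same size as the input.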
Zero Padding the border
• Filter Size
• # Filters
• Stride
• Padding
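Taken together, the four settings listed above determine the shape of a convolution layer's output volume. The sketch below uses a hypothetical helper, `conv_output_shape`, whose name and argument names are assumptions for this example. It uses floor division; some conventions instead treat a non-integer result as an invalid configuration.

```python
def conv_output_shape(in_h, in_w, filter_size, num_filters, stride=1, padding=0):
    """Return (height, width, depth) of the output volume of a convolution layer.

    Each spatial side follows (W - F + 2P) // S + 1; the depth equals the
    number of filters, since each filter produces one activation map.
    """
    span_h = in_h + 2 * padding - filter_size
    span_w = in_w + 2 * padding - filter_size
    if span_h < 0 or span_w < 0:
        raise ValueError("filter is larger than the padded input")
    return (span_h // stride + 1, span_w // stride + 1, num_filters)


# Assumed example: 32x32 input, ten 5x5 filters, stride 1, padding 2.
same_size = conv_output_shape(32, 32, filter_size=5, num_filters=10,
                              stride=1, padding=2)
print(same_size)    # (32, 32, 10)

# Assumed example: 227x227 input, 96 filters of size 11x11, stride 4, no padding.
downsampled = conv_output_shape(227, 227, filter_size=11, num_filters=96,
                                stride=4, padding=0)
print(downsampled)  # (55, 55, 96)
```

The input depth does not appear in the output shape: each filter spans the full input depth and collapses it into a single activation map.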
Fig Credit: A Karpathy, CS231n
Pooling Layer
• Pooling is a downsampling
operation
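As an illustration of pooling as downsampling, the sketch below implements max pooling, one common choice assumed here since the slide does not name a specific pooling function. The name `max_pool2d`, the 2×2 window, and the stride of 2 are likewise assumptions for this example.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Downsample a 2-d activation map by keeping the maximum of each window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [fmap[i * stride + di][j * stride + dj]
                      for di in range(size)
                      for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


activation_map = [[1, 3, 2, 4],
                  [5, 6, 7, 8],
                  [3, 2, 1, 0],
                  [1, 2, 3, 4]]
pooled = max_pool2d(activation_map)
print(pooled)   # [[6, 8], [3, 4]]
```

A 2×2 window with stride 2 halves each spatial dimension, so the 4×4 map becomes 2×2 while the strongest response in each region is retained.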