
Convolutional Neural Networks
Anantharaman Palacode Narayana Iyer
narayana dot Anantharaman at gmail dot com
5 Aug 2017
References
"A dramatic moment in the meteoric rise of deep learning came when a convolutional network won this challenge for the first time and by a wide margin, bringing down the state-of-the-art top-5 error rate from 26.1% to 15.3% (Krizhevsky et al., 2012), meaning that the convolutional network produces a ranked list of possible categories for each image and the correct category appeared in the first five entries of this list for all but 15.3% of the test examples. Since then, these competitions are consistently won by deep convolutional nets, and as of this writing, advances in deep learning have brought the latest top-5 error rate in this contest down to 3.6%."
– Goodfellow, Bengio, and Courville, Deep Learning, MIT Press, 2016
What is a convolutional neural network?
• Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
• Convolution is a mathematical operation that is linear in its inputs.
Types of inputs
• Inputs have a structure:
• Color images are three dimensional and so have a volume.
• Time-domain speech signals are 1-d, while frequency-domain representations (e.g. MFCC vectors) take a 2-d form; they can also be viewed as a time sequence, so speech can be modelled as 2 dimensional.
• Medical images (such as CT/MR/etc.) are multidimensional.
• Videos have an additional temporal dimension compared to stationary images.
• Variable-length sequences and time-series data are again multidimensional.

• Hence it makes sense to model these inputs as tensors instead of vectors.

• The classifier then needs to accept a tensor as input and perform the necessary machine learning task. In the case of an image, this tensor represents a volume.
CNNs are everywhere
• Image retrieval
• Object detection
• Self-driving cars
• Semantic segmentation
• Face recognition (FB tagging)
• Pose estimation
• Disease detection
• Speech recognition
• Text processing
• Analysing satellite data



CNNs for applications that involve images
• Why are CNNs well suited to processing images?
• Pixels in an image correlate with each other, but nearby pixels correlate strongly while distant pixels have little influence.
• Local features are important: local receptive fields.
• Invariance to translation (an affine transformation): the class of an image doesn't change with translation. We can build a feature detector that looks for a particular feature (e.g. an edge) anywhere in the image plane by sliding it across. A convolutional layer may have several such filters, constituting the depth dimension of the layer.
Fully connected layers
• Fully connected layers (such as the hidden layers of a traditional neural network) are agnostic to the structure of the input.
• They take inputs as vectors and generate an output vector.
• There is no requirement to share parameters unless it is forced in specific architectures. This blows up the number of parameters as the input and/or output dimensions increase (see the sketch after this list).
• Suppose we are to perform classification on an image of 100 x 100 x 3 dimensions.
• If we implement this with a feed-forward neural network that has an input, a hidden and an output layer, where hidden units (nh) = 1000 and output classes = 10:
• Input layer = 10k pixels * 3 = 30k values; the input-to-hidden weight matrix = 1k * 30k = 30M entries; the output layer matrix = 10 * 1000 = 10k.
• We may handle this by extracting features in a preprocessing step and presenting a lower-dimensional input to the neural network. But this requires expert-engineered features and hence domain knowledge.
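A minimal sketch of this parameter count in plain Python, using the layer sizes assumed above:

```python
# Parameter count for a fully connected network on a 100 x 100 x 3 image.
n_input = 100 * 100 * 3       # 30,000 input values
n_hidden = 1000               # hidden units (nh)
n_classes = 10                # output classes

hidden_weights = n_hidden * n_input     # 30,000,000
output_weights = n_classes * n_hidden   # 10,000
total = hidden_weights + n_hidden + output_weights + n_classes  # weights + biases

print(f"input-to-hidden weights: {hidden_weights:,}")
print(f"hidden-to-output weights: {output_weights:,}")
print(f"total parameters (with biases): {total:,}")
```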
Convolution

Convolution in 1 dimension:

$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$

Convolution in 2 dimensions:

$$y[n_1, n_2] = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} x[k_1, k_2]\, h[n_1-k_1,\ n_2-k_2]$$
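The infinite sums above reduce to finite sums for finite-length signals. A minimal NumPy sketch of the 1-D case, with np.convolve used only as a cross-check (the 2-D case extends the same double loop over both indices):

```python
import numpy as np

def conv1d(x, h):
    """Directly evaluate y[n] = sum_k x[k] * h[n - k] ('full' convolution)."""
    y = np.zeros(len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

x = np.array([1.0, 2.0, 3.0])
h = np.array([0.5, 0.5])
print(conv1d(x, h))        # [0.5 1.5 2.5 1.5]
print(np.convolve(x, h))   # same result
```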
CNNs
Types of layers in a CNN:
• Convolution Layer
• Pooling Layer
• Fully Connected Layer
Convolution Layer
• A layer in a regular neural network takes a vector as input and outputs a vector.
• A convolution layer takes a tensor (a 3-d volume for RGB images) as input and generates a tensor as output.
Fig Credit: Lex Fridman, MIT, 6.S094
Slide Credit: Lex Fridman, MIT, 6.S094
Local Receptive Fields
• A filter (kernel) is applied to the input image like a moving window along the width and height.
• The depth of a filter matches that of the input.
• For each position of the filter, the dot product of the filter and the input is computed (an activation).
• The 2-d arrangement of these activations is called an activation map.
• The number of such filters constitutes the depth of the convolution layer.
Fig Credit: Lex Fridman, MIT, 6.S094

Convolution Operation between filter and image
• The convolution layer computes dot products between the filter and a patch of the image as the filter slides along the image.
• The step size of the slide is called the stride.
• Without any padding, the convolution process decreases the spatial dimensions of the output.
Fig Credit: A Karpathy, CS231n
Activation Maps
• Example:
• Consider a 32 x 32 x 3 image and a 5 x 5 x 3 filter.
• The convolution happens between a 5 x 5 x 3 chunk of the image and the filter: $w^T x + b$.
• The chunk flattens to a 75-dimensional vector, to which we add a bias term.
• With a stride of 1 and no padding, one filter gives a 28 x 28 x 1 activation.
• With 6 filters, we get a 28 x 28 x 6 output without padding.

• In the above example we have an activation map of 28 x 28 per filter.

• Activation maps are the feature inputs to the subsequent layer of the network.

• Without any padding, the 2D spatial extent of the activation map is smaller than that of the input for any stride >= 1 (except a 1 x 1 filter at stride 1).
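A minimal NumPy sketch of the example above, with random stand-ins for the learned filter weights and bias: one 5 x 5 x 3 filter slid over a 32 x 32 x 3 image at stride 1 without padding yields a 28 x 28 activation map:

```python
import numpy as np

img = np.random.randn(32, 32, 3)   # input volume
w = np.random.randn(5, 5, 3)       # one 5 x 5 x 3 filter (75 weights)
b = 0.1                            # bias term

out = np.zeros((28, 28))           # (32 - 5) / 1 + 1 = 28
for i in range(28):
    for j in range(28):
        patch = img[i:i+5, j:j+5, :]        # 5 x 5 x 3 chunk of the image
        out[i, j] = np.sum(patch * w) + b   # dot product w^T x + b

print(out.shape)  # (28, 28): one activation map
# Six such filters, stacked, give a 28 x 28 x 6 output volume.
```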
Stacking Convolution Layers

Fig Credit: A Karpathy, CS231n
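The shrinking spatial extent under stacking can be checked with the standard output-size formula; a small sketch, where the filter counts (6 and 10) are illustrative assumptions rather than values from the slide:

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

w = 32  # a 32 x 32 x 3 input
for n_filters, f in [(6, 5), (10, 5)]:
    w = conv_output_size(w, f)
    print(f"after a {f}x{f} conv with {n_filters} filters: {w} x {w} x {n_filters}")
# after a 5x5 conv with 6 filters: 28 x 28 x 6
# after a 5x5 conv with 10 filters: 24 x 24 x 10
```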


Feature Representation as a hierarchy
Padding
• The spatial (x, y) extent of the output produced by the convolutional layer is less than the respective dimensions of the input (except for the special case of a 1 x 1 filter with stride 1).

• As we add more layers and use larger strides, the output surface dimensions keep reducing, and this may impact accuracy.

• Often we want to preserve the spatial extent during the initial layers and downsample at a later stage.

• Padding the input with suitable values (padding with zeros is common) helps preserve the spatial size.
Zero Padding the border

Fig Credit: A Karpathy, CS231n
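A small NumPy sketch of zero padding: for a 5 x 5 filter at stride 1, a border of P = (F - 1)/2 = 2 zeros on each side keeps a 32 x 32 input at 32 x 32:

```python
import numpy as np

img = np.random.randn(32, 32, 3)
p = 2  # (F - 1) / 2 for a 5 x 5 filter
padded = np.pad(img, ((p, p), (p, p), (0, 0)), mode="constant")
print(padded.shape)         # (36, 36, 3)
print((36 - 5) // 1 + 1)    # 32: spatial size preserved after convolution
```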


Hyperparameters of the convolution layer
• Filter size
• # Filters
• Stride
• Padding
Fig Credit: A Karpathy, CS231n
Pooling Layer
• Pooling is a downsampling operation.

• The rationale is that the "meaning" embedded in a piece of an image can be captured using a small subset of "important" pixels.

• Max pooling and average pooling are the two most common operations.

• The pooling layer doesn't have any trainable parameters.
Fig Credit: A Karpathy, CS231n
Max Pooling Illustration
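A minimal NumPy sketch of max pooling, assuming the common 2 x 2 window with stride 2; note there is nothing to train, only the window size and stride:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over a 2-D activation map."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)
print(max_pool(x))  # [[6. 8.]
                    #  [3. 4.]]
```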
Popular Network Architectures
Current trend: Deeper Models
• CNNs consistently outperform other approaches for the core tasks of computer vision.
• Deeper models work better.
• Increasing the number of parameters in the layers of a CNN without increasing their depth is not effective at increasing test set performance.
• Shallow models overfit at around 20 million parameters while deep ones can benefit from having over 60 million.
• Key insight: a model performs better when it is architected to reflect a composition of simpler functions rather than a single complex function. This may also be explained by viewing the computation as a chain of dependencies.
VGG Net
ResNet
Core Tasks of Computer Vision
• Classification: Given an image, assign a label. Output: class label. Metric: accuracy.
• Localization: Determine the bounding box containing the object in the given image. Output: box given by (x1, y1, x2, y2). Metric: ratio of intersection to union (overlap) between the ground truth and the predicted box.
• Object Detection: Given an image, detect all the objects and their locations in the image. Output: (label, box) for each object. Metrics: Mean Average Best Overlap (MABO), mean Average Precision (mAP).
• Semantic Segmentation: Given an image, assign each pixel to a class label, so that we can view the image as a set of labelled segments. Output: a set of image segments. Metrics: classification metrics, intersection-over-union overlap.
• Instance Segmentation: Same as semantic segmentation, but each instance of a segment class is identified uniquely. Output: a set of image segments.
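Several of the metrics above rest on intersection over union. A minimal sketch for axis-aligned boxes in the (x1, y1, x2, y2) form used in the list above:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```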
Object Localization
• Given an image containing an object of interest, determine the bounding box for the object.
• Classify the object.
Slide Credit: A Karpathy, CS231n
Datasets for evaluation
• ImageNet challenges provide a platform for researchers to benchmark their novel algorithms.
• PASCAL VOC 2010 is great for small-scale experiments. About 1.3 GB download size.
• MS COCO datasets are available for tasks like image captioning. The download size is huge, but selective download is possible.
