ResNet


Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
Microsoft Research
Challenge Winner (1)
MS COCO - Detection
Object categories: 80
More than 300,000 images
Challenge Winner (2)
MS COCO - Segmentation
Challenge Winner (3)
ImageNet - Classification, Localization & Detection
Advantages of Depth
Very Deep Networks
• Two similar approaches:
• ResNet.
• Highway Nets (R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv:1505.00387, 2015).
Degradation Problem
Possible Causes?
• Vanishing/exploding gradients.
• Overfitting.
Vanishing/Exploding Gradients
Vanishing Gradients

$|w_k| < 1, \quad |\sigma' w_k| < \tfrac{1}{4}$

Gradients in the first layer become very small.

http://neuralnetworksanddeeplearning.com/
Exploding Gradients

$|w_k| = 100, \quad |\sigma' w_k| \approx 25$

Gradients in the first layer become very large.

http://neuralnetworksanddeeplearning.com/
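
The two regimes above can be illustrated numerically. The sketch below assumes a chain of single-neuron sigmoid layers, as in the cited book, where the gradient reaching the first layer is scaled by a product of terms w_k σ'(z_k). The function names, the shared weight, and the choice z = 0 are illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid_prime(z):
    """Derivative of the logistic sigmoid; its maximum is 1/4 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def gradient_scale(weight, depth, z=0.0):
    """Product of |w * sigma'(z)| over `depth` layers (same w and z in every layer)."""
    return abs(weight * sigmoid_prime(z)) ** depth

# |w| < 1: every factor is below 1/4, so the product shrinks with depth.
vanishing = gradient_scale(0.9, depth=10)
# |w| = 100 at z = 0: every factor is 25, so the product grows with depth.
exploding = gradient_scale(100.0, depth=10)
print(vanishing, exploding)
```

In both cases the scale changes geometrically with depth, which is why the first layers are the ones affected most.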
Batch Normalization (1)
• Addresses the problem of vanishing/exploding gradients.
• Speeds up training; the BN paper also reports tolerance of higher learning rates and less sensitivity to initialization.
• In every iteration, each activation in each layer is normalized to zero mean and unit variance over the minibatch.
• Integrated into the backpropagation algorithm.
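
The normalization step described above can be sketched for a single activation as follows. This is a minimal illustration, not the authors' implementation: `gamma` and `beta` stand for the learned scale and shift from the BN paper, `eps` is a small stability constant, and the minibatch values are made up.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation over a minibatch, then scale and shift it."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased minibatch variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

minibatch = [2.0, 4.0, 6.0, 8.0]  # one activation observed on four examples
normalized = batch_norm(minibatch)

out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
print(out_mean, out_var)  # close to 0 and 1 respectively
```

Because the mean and variance are computed from the minibatch itself, the operation is differentiable and gradients can flow through it during backpropagation.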
Batch Normalization (2)

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
BN Solves Vanishing Gradients (1)

$z_i = \sigma(\mathrm{BN}(W z_{i-1} + b_i))$

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
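
One way to read the formula above: normalizing the pre-activations keeps them near zero, where the sigmoid derivative is largest, instead of in the saturated tails. The sketch below illustrates this under stated assumptions (logistic sigmoid, a hand-picked minibatch of pre-activations, no learned scale and shift); it is not the experiment from the BN paper.

```python
import math

def sigmoid_prime(z):
    """Derivative of the logistic sigmoid."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization over a minibatch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def mean_sigmoid_prime(values):
    """Average local gradient of the sigmoid over the minibatch."""
    return sum(sigmoid_prime(v) for v in values) / len(values)

pre_activations = [8.0, 10.0, 12.0, 14.0]  # hypothetical W z_{i-1} + b_i values
raw = mean_sigmoid_prime(pre_activations)                # saturated regime
normed = mean_sigmoid_prime(normalize(pre_activations))  # after normalization
print(raw, normed)
```

The average local gradient is orders of magnitude larger after normalization, so the per-layer factor in the backpropagated product is no longer vanishingly small.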
Residual Learning

• Two weight layers fit the residual F(x) = H(x) - x instead of the desired mapping H(x).

• Plain networks might have “difficulties” fitting the identity mapping.
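
A minimal sketch of the idea, assuming fully connected layers in place of convolutions and omitting biases, BN, and the ReLU after the addition. All function names are illustrative. With the residual branch F driven to zero, the block reduces to the identity mapping, which a plain stack of layers would have to learn explicitly.

```python
def linear(weights, x):
    """Matrix-vector product for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(v):
    return [max(0.0, vi) for vi in v]

def residual_block(x, w1, w2):
    """y = F(x) + x, where F(x) = W2 * relu(W1 * x) is the residual branch."""
    f = linear(w2, relu(linear(w1, x)))
    return [fi + xi for fi, xi in zip(f, x)]

x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]

# With all weights at zero, F(x) = 0 and the block outputs x unchanged.
print(residual_block(x, zeros, zeros))
# With identity weights, F(x) = relu(x), so the output is relu(x) + x.
print(residual_block(x, identity_weights, identity_weights))
```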
Similar Architecture – Highway Net

[Figure: highway block diagram; recoverable labels: H(x, W_H), gate C, output y]

R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv:1505.00387, 2015.
Highway Net vs. ResNet
• The gates C and T are data dependent and have parameters.
• When a gated shortcut is “closed”, the layers in highway networks represent non-residual functions.
• Highway networks have not demonstrated accuracy gains with depths of over 100 layers.

R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv:1505.00387, 2015.
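
The contrast can be sketched with scalar units. The highway form y = H·T + x·C, with the simplification C = 1 - T, follows the cited Highway Networks paper. Passing the gate input as a plain number is an assumption made for illustration: in a real highway layer the gate is computed from x with its own parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_unit(x, h, gate_logit):
    """Scalar highway unit: y = h * T + x * C, with T = sigmoid(gate_logit), C = 1 - T."""
    t = sigmoid(gate_logit)  # transform gate
    c = 1.0 - t              # carry gate (the shortcut)
    return h * t + x * c

def residual_unit(x, f):
    """Scalar residual unit: the shortcut is always open and has no parameters."""
    return f + x

x, branch = 3.0, 5.0  # input and output of the transform branch (made-up values)

closed_shortcut = highway_unit(x, branch, gate_logit=20.0)   # C close to 0
open_shortcut = highway_unit(x, branch, gate_logit=-20.0)    # C close to 1
print(closed_shortcut, open_shortcut, residual_unit(x, branch))
```

With the shortcut closed the highway unit outputs only the transform branch, a non-residual function, whereas the residual unit always adds x back.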
ResNet Design Rules
• 3×3 filters.
• Same number of filters for the same feature map size.

Feature Map Size    Number of Filters
halved              doubled

• Almost no max-pooling in hidden layers.
• No hidden FC layers.
• No dropout.
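
The halved/doubled rule keeps the cost of each layer roughly constant, which is the motivation given in the ResNet paper. A quick check, counting multiply-adds of one 3×3 convolution with equal input and output channels; the feature map sizes and channel counts below are example values chosen for illustration.

```python
def conv_multiply_adds(height, width, in_channels, out_channels, kernel=3):
    """Multiply-adds of one convolution layer with 'same' spatial output size."""
    return height * width * in_channels * out_channels * kernel * kernel

# A layer on a 56x56 map with 64 filters ...
before = conv_multiply_adds(56, 56, 64, 64)
# ... versus a layer on the halved 28x28 map with the doubled 128 filters.
after = conv_multiply_adds(28, 28, 128, 128)
print(before, after)
```

Halving both spatial dimensions divides the cost by four, and doubling both channel counts multiplies it by four, so the two layers cost the same.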
ResNet Building Blocks
Bottleneck Building Block

• Building Blocks are stacked to create full networks.
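
To compare the two block types, the sketch below counts convolution weights only. It assumes the layouts described in the ResNet paper (two 3×3 layers for the basic block; 1×1 reduce, 3×3, 1×1 restore for the bottleneck) and ignores biases and BN parameters. The channel counts are example values.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in one convolution layer (no bias)."""
    return in_channels * out_channels * kernel * kernel

def basic_block_weights(channels):
    """Two stacked 3x3 convolutions at constant width."""
    return 2 * conv_weights(channels, channels, 3)

def bottleneck_block_weights(channels, reduced):
    """1x1 reduce, 3x3 at the reduced width, 1x1 restore."""
    return (conv_weights(channels, reduced, 1)
            + conv_weights(reduced, reduced, 3)
            + conv_weights(reduced, channels, 1))

basic = basic_block_weights(64)                # 64-d input and output
bottleneck = bottleneck_block_weights(256, 64)  # 256-d input and output
print(basic, bottleneck)
```

The bottleneck block handles a four times wider input with a comparable number of weights, which is what makes much deeper stacks affordable.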


ResNet and Plain Nets
ResNets vs. Plain Nets (1)

• Deeper ResNets have lower training and test errors!


ResNets vs. Plain Nets (2)

• Deeper ResNets have lower training and test errors!


ResNets vs. Plain Nets (3)

• At 34 layers, the ResNet has a lower error than the plain net.

• At 18 layers, the error is roughly the same, but the ResNet converges faster.
Layer Responses Magnitude

• Responses of the 3×3 layers, measured after BN and before the non-linearity.

• Residual functions are closer to zero than non-residual functions.
Dimensionality Change
• Shortcut connections assume dimension equality between input x and output F(x).
• If dimensions do not match:
  - Zero-pad x.
  - Projection mapping (extra parameters).
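
Both ways of matching dimensions can be sketched on plain vectors. This is an illustration only: spatial downsampling is left out, and `w_s` is a hypothetical projection matrix standing in for the learned shortcut weights.

```python
def zero_pad_shortcut(x, out_dim):
    """Parameter-free shortcut: append zeros until x has out_dim entries."""
    return list(x) + [0.0] * (out_dim - len(x))

def projection_shortcut(x, w_s):
    """Learned shortcut: multiply x by a projection matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_s]

def add_shortcut(f_x, shortcut):
    """Element-wise sum of the residual branch output and the shortcut."""
    return [a + b for a, b in zip(f_x, shortcut)]

x = [1.0, 2.0]                                 # 2-d input
f_x = [0.5, 0.5, 0.5, 0.5]                     # 4-d output of the residual branch
w_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # hypothetical 4x2 projection

padded = add_shortcut(f_x, zero_pad_shortcut(x, 4))
projected = add_shortcut(f_x, projection_shortcut(x, w_s))
extra_parameters = len(w_s) * len(w_s[0])      # only the projection adds weights
print(padded, projected, extra_parameters)
```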
Exploring Different Shortcut Types
• Three options:
• A – Zero padding for increasing dimensions.
• B – Projection shortcuts for increasing dimensions; others are identity.
• C – All shortcuts are projections.
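
The three options amount to a small decision rule per shortcut, sketched below. The function name and the string labels are illustrative, and the identity case for option A is taken from the ResNet paper, where option A shortcuts are all parameter-free.

```python
def shortcut_type(option, dimensions_increase):
    """Return the kind of shortcut used by options A, B, and C."""
    if option == "A":
        return "zero-pad" if dimensions_increase else "identity"
    if option == "B":
        return "projection" if dimensions_increase else "identity"
    if option == "C":
        return "projection"
    raise ValueError("option must be 'A', 'B', or 'C'")

for option in ("A", "B", "C"):
    print(option, shortcut_type(option, True), shortcut_type(option, False))
```

Only the "projection" cases add parameters, so A adds none, B adds a few, and C adds the most.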
Overall Results
Uses for Image Detection and Localization
• Based on the Faster R-CNN architecture.
• The ResNet-101 architecture is used.
• Obtained the best results on the MS COCO, ImageNet localization, and ImageNet detection datasets.
Conclusions
• Degradation problem is addressed for very deep NN.
• No additional parameter complexity.
• Faster convergence.
• Good for different types of tasks.
• Can be easily trained with existing solvers (Caffe,
MatConvNet, etc…).
• Sepp Hochreiter presumably described the phenomenon in 1991.
