ResNet
Vanishing Gradients
$$|w_k| < 1 \;\Rightarrow\; |w_k\,\sigma'(z_k)| < \tfrac{1}{4}$$
Source: http://neuralnetworksanddeeplearning.com
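To see why this bound matters, the cited source traces the gradient back through a toy four-layer sigmoid network; the gradient at the first layer is a product of such factors, so with $\sigma'(z) \le 1/4$ and $|w_k| < 1$ it shrinks roughly like $(1/4)^{\text{depth}}$:

$$\frac{\partial C}{\partial b_1} = \sigma'(z_1)\,w_2\,\sigma'(z_2)\,w_3\,\sigma'(z_3)\,w_4\,\sigma'(z_4)\,\frac{\partial C}{\partial a_4}$$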
Exploding Gradients
Source: http://neuralnetworksanddeeplearning.com
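The same product explains the opposite failure mode, following the argument in the cited source: if the weights are large enough that every factor exceeds one, the early-layer gradient grows exponentially with depth:

$$|w_k\,\sigma'(z_k)| > 1 \text{ for all } k \;\Rightarrow\; \left|\frac{\partial C}{\partial b_1}\right| \text{ grows exponentially with depth.}$$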
Batch Normalization (1)
• Addresses the problem of vanishing/exploding gradients.
• Speeds up learning and mitigates several other training problems.
• In every iteration, each layer's activations are normalized to zero mean and unit variance over a mini-batch (see the sketch below).
• Integrated into the backpropagation algorithm.
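A minimal NumPy sketch of the forward pass just described; the function name and shapes are illustrative, not from the slides. The learnable scale/shift (gamma, beta) is what makes the operation differentiable end to end, which is what "integrated into backpropagation" refers to.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x: (batch_size, features) activations
    gamma, beta: learnable scale/shift, shape (features,)
    """
    mu = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

# Hypothetical usage: a mini-batch of 32 examples with 64 features
x = np.random.randn(32, 64)
gamma, beta = np.ones(64), np.zeros(64)
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])  # approximately 0 and 1 per feature
```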
Batch Normalization (2)
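For reference, a compact statement of the transform from Ioffe & Szegedy (2015), matching the description on the previous slide, with $\gamma$ and $\beta$ learned during backpropagation:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$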
[Figure: underlying mapping $H(x, W_H)$; shortcut types — zero-padding x vs. projection mapping (extra parameters)]
Exploring Different Shortcut Types
• Three options (contrasted in the sketch below):
• A – Zero-padding shortcuts for increasing dimensions.
• B – Projection shortcuts for increasing dimensions; all others are identity.
• C – All shortcuts are projections.
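A minimal NumPy sketch contrasting option A with options B/C when the channel count doubles. Names and shapes are illustrative; the stride-2 spatial downsampling the paper applies at the same points is omitted for brevity.

```python
import numpy as np

def zero_pad_shortcut(x, out_channels):
    """Option A: identity shortcut with zero-padded channels (no extra parameters).
    x: (batch, channels, height, width)"""
    pad = out_channels - x.shape[1]
    return np.pad(x, ((0, 0), (0, pad), (0, 0), (0, 0)))

def projection_shortcut(x, w):
    """Options B/C: 1x1 convolution projection (adds parameters).
    w: (out_channels, in_channels) weights of the 1x1 conv."""
    # A 1x1 convolution is a per-pixel linear map over the channel dimension
    return np.einsum('oc,bchw->bohw', w, x)

# Hypothetical shapes: doubling channels from 64 to 128
x = np.random.randn(8, 64, 28, 28)
a = zero_pad_shortcut(x, 128)            # option A: parameter-free
w = np.random.randn(128, 64) * 0.01
b = projection_shortcut(x, w)            # options B/C: learned projection
print(a.shape, b.shape)                  # (8, 128, 28, 28) for both
```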
Overall Results
Uses for Image Detection and Localization
• Based on the Faster R-CNN architecture.
• Uses the ResNet-101 architecture.
• Obtained the best results on the MS COCO, ImageNet localization, and ImageNet detection datasets.
Conclusions
• Addresses the degradation problem in very deep neural networks.
• No additional parameter complexity.
• Faster convergence.
• Works well across different types of tasks.
• Can be easily trained with existing solvers (Caffe, MatConvNet, etc.).
• The vanishing-gradient phenomenon was first described by Sepp Hochreiter in 1991.