L3 - UCLxDeepMind DL2020
UCL x DeepMind
lecture series
In this lecture series, research scientists from the leading AI research lab DeepMind will give
12 lectures on an exciting selection of topics in Deep Learning, ranging from the fundamentals
of training neural networks, through advanced ideas around memory, attention, and generative
modelling, to the important topic of responsible innovation.
Please join us for a deep-dive lecture series on Deep Learning!
#UCLxDeepMind
General information
Exits: at the back, the way you came in
Wifi: UCL guest
TODAY’S SPEAKER
Sander Dieleman
Sander Dieleman is a Research Scientist at DeepMind
in London, UK, where he has worked on the development
of AlphaGo and WaveNet. He was previously a PhD student at
Ghent University, where he conducted research on feature
learning and deep learning techniques for learning hierarchical
representations of musical audio signals. During his PhD he also
developed the deep learning library Lasagne and won solo and
team gold medals respectively in Kaggle's "Galaxy Zoo"
competition and the first National Data Science Bowl. In the
summer of 2014, he interned at Spotify in New York, where he
worked on implementing audio-based music recommendation
using deep learning on an industrial scale.
TODAY’S LECTURE
In the past decade, convolutional neural networks have revolutionised computer vision. In this
lecture, we will take a closer look at convolutional network architectures through several case
studies, ranging from the early 90s to the current state of the art. We will review some of the
building blocks that are in common use today and discuss the challenges of training deep models.
Convolutional Neural Networks for Image Recognition
Sander Dieleman
01 Background
02 Building blocks
03 Convolutional neural networks
04 Going deeper: case studies
05 Advanced topics
06 Beyond image recognition
1 Background
Last week: neural networks
Diagram: Data → Linear → Sigmoid → Linear → Softmax → Cross-entropy loss, compared against the Target
How can we feed images to a neural network?
Neural networks for images
fully-connected unit
From fully connected to locally connected
locally-connected units
3✕3 receptive field
From locally connected to convolutional
convolutional units
3✕3 receptive field
Receptive field
Feature map
Implementation: the convolution operation
Inputs and outputs are tensors: channels ✕ height ✕ width
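As a concrete reference, here is a minimal NumPy sketch of the operation described above: a stride-1, unpadded convolution in which each output channel sums 2-D sliding-window dot products over all input channels. Shapes and names are illustrative, and (like most deep learning libraries) it computes a cross-correlation, i.e. the kernel is not flipped.

```python
import numpy as np

def conv2d(x, w):
    """Valid convolution, stride 1.
    x: input feature map, shape (C_in, H, W)
    w: kernel, shape (C_out, C_in, kH, kW)
    returns: output feature map, shape (C_out, H - kH + 1, W - kW + 1)"""
    c_in, h, w_in = x.shape
    c_out, _, kh, kw = w.shape
    out = np.zeros((c_out, h - kh + 1, w_in - kw + 1))
    for o in range(c_out):                       # each output channel...
        for i in range(out.shape[1]):            # ...slides its kernel over
            for j in range(out.shape[2]):        # every spatial position
                patch = x[:, i:i + kh, j:j + kw]      # receptive field
                out[o, i, j] = np.sum(patch * w[o])   # dot product over all input channels
    return out

x = np.random.randn(3, 32, 32)    # e.g. an RGB image: 3 channels, 32x32
w = np.random.randn(16, 3, 3, 3)  # 16 output channels, 3x3 receptive field
print(conv2d(x, w).shape)         # (16, 30, 30)
```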
Variants of the convolution operation
Strided convolution: kernel slides along the image with a step > 1
Dilated convolution: kernel is spread out, step > 1 between kernel elements
Depthwise convolution: each output channel is connected only to one input channel
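A quick way to see what stride and dilation do to the output resolution is the usual output-size arithmetic. The helper below is an illustrative sketch, not code from the lecture; note that a depthwise convolution changes which channels are connected, not the spatial arithmetic.

```python
def conv_output_size(in_size, kernel, stride=1, dilation=1, padding=0):
    """Spatial output size of a convolution along one dimension.
    A dilated kernel covers dilation * (kernel - 1) + 1 input positions."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1

print(conv_output_size(28, kernel=3))              # 26: plain 3x3 convolution
print(conv_output_size(28, kernel=3, stride=2))    # 13: strided, step 2
print(conv_output_size(28, kernel=3, dilation=2))  # 24: dilated, kernel spans 5 pixels
```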
Pooling
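A minimal sketch of 2✕2 max-pooling on a (channels, height, width) feature map, assuming non-overlapping windows; names and shapes are illustrative.

```python
import numpy as np

def max_pool2d(x, window=2, stride=2):
    """Max-pooling over windows (non-overlapping when stride == window).
    x: feature map of shape (C, H, W)."""
    c, h, w = x.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((c, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + window,
                         j * stride:j * stride + window]
            out[:, i, j] = patch.max(axis=(1, 2))   # max over each window, per channel
    return out

x = np.random.randn(96, 56, 56)
print(max_pool2d(x).shape)   # (96, 28, 28): spatial resolution halved
```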
CNNs or “convnets”
Up to 100s of layers
Alternate convolutions and
pooling to create a hierarchy
Recap: neural networks as computational graphs
Diagram: input → computation → loss, with parameters feeding into the computation
Simplified diagram: implicit parameters and loss
Diagram: input → computation
Computational building blocks of convnets
Diagram: input, convolution, nonlinearity, pooling, fully connected
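For completeness, here is an illustrative sketch of the remaining building blocks (convolution and pooling are sketched above): an elementwise ReLU nonlinearity, a fully connected layer applied to a flattened feature map, and a softmax output. All names and sizes are made up for illustration.

```python
import numpy as np

def relu(x):
    """Elementwise nonlinearity: max(0, x)."""
    return np.maximum(0.0, x)

def fully_connected(x, w, b):
    """Fully connected layer: flatten the feature map, then apply an affine map.
    x: (C, H, W) feature map; w: (num_units, C*H*W); b: (num_units,)"""
    return w @ x.reshape(-1) + b

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# e.g. a 7x7x256 feature map mapped to 1000 class probabilities
x = np.random.randn(256, 7, 7)
w = np.random.randn(1000, 256 * 7 * 7) * 0.01
b = np.zeros(1000)
probs = softmax(fully_connected(relu(x), w, b))
print(probs.shape, probs.sum())   # (1000,) 1.0
```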
4 Going deeper: case studies
Diagram of a classic convnet pipeline: convolution → nonlinearity → pooling → convolution → nonlinearity → pooling → fully connected → nonlinearity → fully connected → nonlinearity
AlexNet (2012)
Input image: 224✕224✕3
Layer 1 convolution: kernel 11✕11, 96 channels, stride 4 → 56✕56✕96
ReLU
Max-pooling: window 2✕2 → 28✕28✕96
Softmax
Diagram: the full AlexNet stack with output shapes 224✕224✕3 → 56✕56✕96 → 28✕28✕96 → 28✕28✕256 → 14✕14✕256 → 14✕14✕384 → 14✕14✕384 → 14✕14✕256 → 7✕7✕256 → 4096 → 4096 → 1000 → softmax, with ReLU nonlinearities in between
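The shapes in the diagram can be reproduced with simple bookkeeping. The sketch below assumes 'same'-style padding for the convolutions (so spatial size only changes with stride or pooling), which matches the numbers on the slide; it is an illustration, not the exact AlexNet configuration.

```python
# Shape bookkeeping for the AlexNet-style stack shown above.
def conv_shape(h, w, c_out, stride=1):
    return (-(-h // stride), -(-w // stride), c_out)   # ceil division for 'same' padding

def pool_shape(h, w, c, window=2):
    return (h // window, w // window, c)

shape = (224, 224, 3)                           # input image
shape = conv_shape(*shape[:2], 96, stride=4)    # 11x11 conv, stride 4 -> (56, 56, 96)
shape = pool_shape(*shape)                      # 2x2 max-pool         -> (28, 28, 96)
shape = conv_shape(*shape[:2], 256)             # conv                 -> (28, 28, 256)
shape = pool_shape(*shape)                      # 2x2 max-pool         -> (14, 14, 256)
shape = conv_shape(*shape[:2], 384)             # conv                 -> (14, 14, 384)
shape = conv_shape(*shape[:2], 384)             # conv                 -> (14, 14, 384)
shape = conv_shape(*shape[:2], 256)             # conv                 -> (14, 14, 256)
shape = pool_shape(*shape)                      # 2x2 max-pool         -> (7, 7, 256)
print(shape)                                    # then FC 4096 -> 4096 -> 1000 -> softmax
```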
Deeper is better
Computational complexity
Optimisation difficulties
Improving optimisation
Careful initialisation (see the sketch after this list)
Sophisticated optimisers
Normalisation layers
Network design
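As one example of what "careful initialisation" can mean, the sketch below uses He-style scaling, drawing weights with standard deviation sqrt(2 / fan_in) so that activation variance is roughly preserved through ReLU layers. The function name and numbers are illustrative, not the scheme used on the slides.

```python
import numpy as np

def he_init(fan_in, fan_out, seed=0):
    """He-style initialisation: weights scaled so that, with ReLU units,
    activation variance is roughly preserved from layer to layer."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

w = he_init(fan_in=3 * 3 * 256, fan_out=384)   # e.g. a 3x3 conv layer, 256 -> 384 channels
print(w.shape, w.std())                        # std close to sqrt(2 / 2304) ~= 0.029
```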
GoogLeNet (2014)
Diagram: Inception module combining 1✕1, 3✕3 and 5✕5 convolutions (and pooling) in parallel branches, with 1✕1 convolutions for dimensionality reduction
Want to learn more? Szegedy, C. et al. Going deeper with convolutions. IEEE Conference on Computer Vision and Pattern Recognition (2015)
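The key idea of the Inception module is that several branches process the same input in parallel and their outputs are concatenated along the channel axis. The sketch below only tracks channel counts, with made-up branch sizes; it does not implement the convolutions themselves.

```python
import numpy as np

# Channel bookkeeping for an Inception-style module: every branch sees the
# same 28x28 input and the branch outputs are stacked along the channel axis.
h, w = 28, 28
branch_1x1  = np.random.randn(64,  h, w)   # 1x1 conv branch
branch_3x3  = np.random.randn(128, h, w)   # 1x1 conv -> 3x3 conv branch
branch_5x5  = np.random.randn(32,  h, w)   # 1x1 conv -> 5x5 conv branch
branch_pool = np.random.randn(32,  h, w)   # pooling -> 1x1 conv branch
out = np.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=0)
print(out.shape)   # (256, 28, 28): the branches' channels are concatenated
```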
Batch normalisation
Diagram: repeated Convolution → Batch norm → ReLU blocks
Residual connection
Diagram: a stack of 3✕3 and 1✕1 convolutions whose output is added (+) back to the block's input
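A minimal sketch of these two ideas on (batch, channels, height, width) feature maps: batch normalisation normalises each channel using batch statistics and then applies a learned per-channel scale and shift, and a residual connection adds a block's input back to its output. The residual function here is just batch normalisation as a stand-in; in a real network it would be a stack of convolutions, batch norms and ReLUs.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalisation for feature maps of shape (N, C, H, W): normalise
    each channel over the batch and spatial axes, then scale and shift."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

def residual_block(x, f):
    """Residual connection: add the block's input back to its output.
    f is the residual function (here a stand-in for conv -> batch norm -> ReLU)."""
    return x + f(x)

x = np.random.randn(8, 64, 14, 14)     # batch of 8 feature maps, 64 channels
gamma, beta = np.ones(64), np.zeros(64)
y = residual_block(x, lambda h: batch_norm(h, gamma, beta))
print(y.shape)   # (8, 64, 14, 14): shapes must match for the addition
```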
Depthwise convolutions
Separable convolutions (see the sketch after this list)
Inverted bottlenecks
(MobileNetV2, MNasNet,
EfficientNet)
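The appeal of depthwise-separable convolutions is easiest to see in the parameter counts: a depthwise k✕k convolution followed by a pointwise 1✕1 convolution replaces a full k✕k convolution. The numbers below are illustrative.

```python
# Parameter counts (ignoring biases): why depthwise-separable convolutions
# are cheaper than standard convolutions.
c_in, c_out, k = 256, 256, 3

standard  = c_out * c_in * k * k   # every output channel sees every input channel
depthwise = c_in * k * k           # one kxk filter per input channel
pointwise = c_out * c_in           # 1x1 convolution mixes the channels
separable = depthwise + pointwise

print(standard, separable, standard / separable)   # 589824 67840 ~8.7x fewer parameters
```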
5 Advanced topics