02 CNN Slides

The lecture covers convolutional neural networks (CNNs) with a focus on image classification tasks, specifically using the MNIST dataset. Key topics include the limitations of multilayer perceptrons (MLPs) in handling spatial structures, the introduction of convolutional layers to address these issues, and the mechanics of 1D and 2D convolutions. The session also discusses parameter sharing and local connectivity as methods to reduce the number of parameters in the network.


CS-E4890 Deep Learning

Lecture #2 Convolutional neural networks

14.1.2025

Jorma Laaksonen, Juho Kannala, Alexander Ilin


Today’s topics

1. Image classification task
2. 1D and 2D convolutions
3. Convolutional layer
4. Pooling layer
5. Convolutional networks
6. Modern convolutional neural networks
7. Applications of convolutional networks
8. Home assignment

1
Image classification task

• Example classification problem: Classify images of handwritten digits from the MNIST dataset.
• Inputs x(n): 28 × 28 pixel images with scalar greyscale values.
• Targets y(n): one of the 10 classes in one-hot coding.

2
Spatial structure matters

• If we change the order of the pixels (in the same way for all images), the classification task
becomes much harder for humans.
• This suggests that our model can and should benefit from using the spatial information.

3
Image classification with a multilayer perceptron

• We can solve the classification task using a multilayer perceptron (MLP) model, as considered in the first lecture.
• We can flatten the images (for example, stack the columns of each image into one vector) and feed them to the MLP model.

4
Problem 1: MLP ignores the spatial structure

• If we shuffle the pixels, we simply feed the pixels into different inputs of the MLP.
• This means that the MLP ignores the spatial structure and essentially solves a more difficult problem.

5
Problem 1: MLP ignores the spatial structure

• Small translations of the input image (for example, shifting the image one pixel to the left/right/top/bottom) result in significant changes of the MLP inputs; therefore the outputs of the MLP will change in an unpredictable way.
• The MLP has to learn to be invariant to such transformations, which may require a considerable amount of training.

6
Problem 2: Number of parameters

• Let us use an MLP with the following structure to solve the MNIST classification task:

    input x: 784 pixels
    h1 = relu(W1 x + b1): 225 units
    h2 = relu(W2 h1 + b2): 144 units
    f = softmax(W3 h2 + b3): 10 outputs

• Let us count the number of parameters in the network (ignoring the bias terms b):

    28 × 28 × 225 + 225 × 144 + 144 × 10 = 210240

• If we want to process images that contain millions of pixels, the number of parameters would be several orders of magnitude larger.
7
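A minimal PyTorch sketch of this MLP that reproduces the weight count above (a sketch only; the layer sizes are from the slide and the softmax is assumed to be folded into the cross-entropy loss, as is usual in PyTorch):

```python
import torch.nn as nn

# The MLP on this slide: 784 -> 225 -> 144 -> 10.
mlp = nn.Sequential(
    nn.Flatten(),                        # 28 x 28 image -> 784-dimensional vector
    nn.Linear(784, 225), nn.ReLU(),
    nn.Linear(225, 144), nn.ReLU(),
    nn.Linear(144, 10),
)

# Count only the weight matrices, ignoring the bias terms as on the slide:
n_weights = sum(p.numel() for name, p in mlp.named_parameters()
                if name.endswith("weight"))
print(n_weights)  # 784*225 + 225*144 + 144*10 = 210240
```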
Motivation for a layer of a new type

• We want to design an alternative to the fully-connected layer that would address these problems:
• Take into account the order of the inputs
• Change the outputs in a predictable way for simple transformations such as translation
• Reduce the number of parameters in the network

8
Convolutional layer
Fully-connected layer as a starting point

• Let us consider an input with one-dimensional structure. For example, we want to process time
series and the order of the inputs is determined by the time of the measurements.
• Let us start with a fully-connected layer that has 5 inputs and 5 outputs:

[Figure: a fully-connected layer with 5 inputs x1 ... x5 and 5 outputs]

• The layer has 5 × 5 = 25 parameters (ignoring the bias terms).

10
Local connectivity

• We can reduce the number of parameters by using only “local” connections.


• Now the outputs also have an order because each output corresponds to a particular location in
the inputs.

• The layer has now 13 parameters.

11
Parameter sharing

• We can further reduce the number of parameters by using weight sharing (arrows with the same
color red/black/blue represent shared weights).
• Now the layer has only 3 parameters.

• Why parameter sharing is useful: patterns that appear in different parts of the input sequence will
activate neurons in a similar way in the corresponding location of the output layer.
=⇒ Position/translation/shift equivariance in the input-output mapping.

12
1D convolutional layer

• The computations performed in such a layer:

    y_i = \sum_{\Delta i = -1, 0, 1} w_{\Delta i} \, x_{i + \Delta i} + b

• The layer is called a (one-dimensional) convolutional layer because the computations are closely related to the (one-dimensional) discrete convolution familiar from signal processing:

    (w * x)[t] = \sum_a w[a] \, x[t - a]

13
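A small sketch checking the layer formula against PyTorch's conv1d (the input values and weights below are made up for illustration; note that, like the formula above, conv1d slides the kernel without flipping it):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 0.0, -1.0, 3.0])
w = torch.tensor([0.5, 1.0, -0.5])   # w_{-1}, w_0, w_{+1}
b = 0.1

# Manual evaluation of y_i = sum_{Δi=-1,0,1} w_Δi x_{i+Δi} + b at the
# interior positions i = 1, 2, 3 (no padding):
y_manual = torch.stack([
    sum(w[k + 1] * x[i + k] for k in (-1, 0, 1)) + b
    for i in (1, 2, 3)
])

# conv1d expects shapes (batch, channels, length):
y_torch = F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1), bias=torch.tensor([b]))
print(torch.allclose(y_manual, y_torch.flatten()))  # True
```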
1D convolutional layer

• The inputs and outputs of such a layer usually contain multiple elements (usually called channels):

    y_{i,o} = \sum_{\Delta i} \sum_c w_{\Delta i, o, c} \, x_{i + \Delta i, c} + b_o

• The weights w_{\Delta i, o, c} are usually called the kernel.
• There are two ways to process the inputs at the borders: no padding (the output has a different size than the input) or padding the input at the borders, usually with zeros.

14
Inputs with 2D structure

• The same ideas can be used for inputs with a 2D spatial structure, such as images.
• Local connectivity: each output is affected only by inputs in its neighborhood.
• Shared parameters: the same colors in the connections represent shared weights.

15
2D convolutional layer: Forward computations

• Simplified example: inputs with one channel (black-and-white images), outputs with one channel.
• Slide the filter across the entire input and compute dot products between the input entries and the filter weights.
• Computations can be parallelized.

[Figure: a 5 × 5 single-channel input x, a 2 × 2 filter w, and the resulting 4 × 4 output y]

    x:              w:       y:
    0 0 0 0 0       1 1      0 1 1 1
    0 1 1 1 0       1 0      1 3 2 2
    0 1 0 1 0                1 2 2 2
    0 1 1 1 0                1 2 2 1
    0 0 0 0 0

16
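A quick PyTorch check of the example above, using torch.nn.functional.conv2d (which, like the layer defined here, slides the kernel without flipping it):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0]], dtype=torch.float32)
w = torch.tensor([[1, 1],
                  [1, 0]], dtype=torch.float32)

# conv2d expects (batch, channels, H, W) inputs and (out, in, kH, kW) weights.
y = F.conv2d(x.view(1, 1, 5, 5), w.view(1, 1, 2, 2))
print(y.squeeze())
# tensor([[0., 1., 1., 1.],
#         [1., 3., 2., 2.],
#         [1., 2., 2., 2.],
#         [1., 2., 2., 1.]])
```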
2D convolutional layer as feature detector

• We can view the filter that we used in this example as a simple feature detector.
• Note that the filter has the shape of a corner. And the output is maximum at the position where
this corner is present in the input image.
• The local image structure or image feature “correlates” with the values of the convolution
mask/template/window/kernel.

[Figure: the same input x, filter w and output y as above; the corner-shaped filter produces its maximum response (3) at the position where the corner appears in the input image]

17
2D convolutional layer with multiple channels

• Inputs may contain multiple channels (RGB images).


• We need to detect multiple features using multiple filters. Therefore, a convolutional layer
contains multiple channels both in its inputs and outputs:
    y_{i,j,o} = \sum_{\Delta i} \sum_{\Delta j} \sum_c w_{\Delta i, \Delta j, o, c} \, x_{i + \Delta i, j + \Delta j, c} + b_o

• The 1D position i and offset ∆i have been replaced by their 2D counterparts i, j and ∆i, ∆j.
• Just like in multilayer perceptrons, the output of a convolutional layer is usually run through a
nonlinear activation function, such as ReLU:
    y'_{i,j,o} = relu(y_{i,j,o})

18
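A small sketch (the channel counts are chosen arbitrarily) showing how the indices of w_{Δi,Δj,o,c} map onto the weight tensor of torch.nn.Conv2d:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5, padding=2)
print(conv.weight.shape)  # torch.Size([8, 3, 5, 5]): indices (o, c, Δi, Δj)
print(conv.bias.shape)    # torch.Size([8]): one bias b_o per output channel

x = torch.randn(1, 3, 28, 28)   # (batch, channels, height, width)
y = torch.relu(conv(x))         # elementwise nonlinearity after the convolution
print(y.shape)                  # torch.Size([1, 8, 28, 28]) thanks to padding=2
```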
Convolution ∗

• One filter learns to detect a feature in the input:

• The features can be e.g. oriented edges.

19
2D convolutional layer in PyTorch

• torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)

[Figure: a zero-padded input x and the kernel w sliding over it with a given stride to produce the output y; the padding, stride and kernel size are indicated in the figure]

• Convolution visualization
• The size of the output will be different:

    H_o = \left\lfloor \frac{H_i + 2p - k - (k - 1)(d - 1)}{s} \right\rfloor + 1

  where p is the padding, k the kernel size, s the stride and d the dilation.

20
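A quick sanity check of the output-size formula (the concrete values of H_i, k, s, p and d below are arbitrary):

```python
import torch
import torch.nn as nn

H_i, k, s, p, d = 28, 5, 2, 1, 1

# H_o = floor((H_i + 2p - k - (k - 1)(d - 1)) / s) + 1
H_o = (H_i + 2 * p - k - (k - 1) * (d - 1)) // s + 1
print(H_o)  # 13

conv = nn.Conv2d(1, 1, kernel_size=k, stride=s, padding=p, dilation=d)
y = conv(torch.zeros(1, 1, H_i, H_i))
print(y.shape[-1])  # 13, matching the formula
```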
Why do we need padding?

• With padding, the output of a convolutional layer can have the same height and width as the
input.
• It is easier to design networks when the height and width is preserved.
• To use skip connections x + conv(x), like in ResNet, we need the dimensions to match.
• With padding, we can use deeper networks. Without padding, the feature maps would shrink quickly as new layers are added.
• Padding improves the performance by keeping information at the borders.

21
Convolutional layer is equivariant to translation

• Shifting the input image by one pixel to the right changes the output in the same way: it is
shifted by one pixel to the right.
[Figure: the earlier 5 × 5 input and its one-pixel right-shifted version, convolved with the same 2 × 2 filter; the output of the shifted input is the original output shifted one pixel to the right]

• A function f is equivariant with respect to a transformation T if

    f(T(x)) = T(f(x)).

• A function f is invariant with respect to a transformation T if

    f(T(x)) = f(x).

  The result of f does not change when the transformation is applied to the input.

22
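A minimal sketch of translation equivariance with a random filter: shifting the input one pixel to the right and convolving gives the same result as convolving and then shifting, apart from the image borders (which is why the comparison below excludes the border columns):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
w = torch.randn(1, 1, 3, 3)

def shift_right(t):
    # Shift one pixel to the right, filling the leftmost column with zeros.
    return F.pad(t, (1, 0))[..., :-1]

a = F.conv2d(shift_right(x), w, padding=1)    # T then f
b = shift_right(F.conv2d(x, w, padding=1))    # f then T
print(torch.allclose(a[..., 1:-1], b[..., 1:-1]))  # True away from the borders
```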
Convolutional networks
Example: MNIST classification

• Let us build a convolutional neural network (a network with convolutional layers) to solve the
MNIST classification task.
• The input is 28 × 28 pixels and 1 channel.
• First convolutional layer:
• 9 filters with 5 × 5 kernel and padding.
• First hidden layer: 28 × 28 pixels and 9 channels
• The number of parameters in the first layer (ignoring
biases):
5 × 5 × 9 = 225
• Compare with the fully connected layer:
28 × 28 × 225 = 176400

24
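A minimal sketch of the first convolutional layer described above, confirming the parameter count (padding=2 keeps the 28 × 28 resolution with a 5 × 5 kernel):

```python
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=9, kernel_size=5, padding=2)
print(conv1.weight.numel())  # 5 * 5 * 1 * 9 = 225 weights (plus 9 bias terms)
```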
Example: MNIST classification

• Let us count the number of signals in the first layer:

28 × 28 × 9 = 7056

• Compare with the fully connected layer: 225


• The number of intermediate signals is much larger in
the convolutional layer. To process such a
high-dimensional signal, we need a significant amount
of computations in the rest of the network.
• In order to decrease the amount of computations, it makes sense to reduce the number of
intermediate signals.
• We can do so by a pooling layer.

25
Pooling layer

• A common way is to take the maximum value in a small window (max pooling).
• For instance, if we use max pooling with a 2 × 2 window, we discard 75 percent of the values.
• An alternative to max pooling is mean/average pooling.
• Pooling aims to reduce the representation size and the number of network weights while removing only unnecessary or redundant information.
• Pooling helps to make the representation approximately invariant to small translations of the input.

[Figure: 2 × 2 max pooling applied to the 4 × 4 output of the previous example]

    0 1 1 1                     3 2
    1 3 2 2    max pooling →    2 2
    1 2 2 2
    1 2 2 1

26
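A quick check of the 2 × 2 max-pooling example above with torch.nn.MaxPool2d:

```python
import torch
import torch.nn as nn

y = torch.tensor([[0., 1., 1., 1.],
                  [1., 3., 2., 2.],
                  [1., 2., 2., 2.],
                  [1., 2., 2., 1.]])
pool = nn.MaxPool2d(kernel_size=2)
print(pool(y.view(1, 1, 4, 4)).squeeze())
# tensor([[3., 2.],
#         [2., 2.]])
```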
Example: MNIST classification

• After adding a 2 × 2 pooling layer.

27
Stack more layers

• Note: each unit looks at all the channels of the previous layer.

28
Full network

• Finally, we flatten the outputs of the last convolutional layer and feed
them to a fully-connected layer with 10 outputs.
• We apply the softmax nonlinearity to the outputs and use the
cross-entropy loss.
• The network can be trained by any gradient-based optimization
procedure, for example, Adam.
• The gradients are computed by backpropagation as in the multilayer
perceptron. The biggest difference is that we need to take into
account parameter sharing inside the convolutional layers.

29
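A minimal sketch of such a network (the first layer follows the slides: 9 filters of size 5 × 5; the 16 channels in the second convolutional layer are an arbitrary choice, not from the slides). It would be trained with nn.CrossEntropyLoss, which folds in the softmax, and an optimizer such as Adam:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 9, kernel_size=5, padding=2), nn.ReLU(),    # 9 x 28 x 28
    nn.MaxPool2d(2),                                         # 9 x 14 x 14
    nn.Conv2d(9, 16, kernel_size=5, padding=2), nn.ReLU(),   # 16 x 14 x 14
    nn.MaxPool2d(2),                                         # 16 x 7 x 7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),   # 10 class scores; softmax is in the loss
)
```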
Backpropagation through a convolutional layer

• Forward computations in a convolutional layer:

    y_{i,j,o} = \sum_{\Delta i} \sum_{\Delta j} \sum_c w_{\Delta i, \Delta j, o, c} \, x_{i + \Delta i, j + \Delta j, c} + b_o

• Backward computations in a convolutional layer:

    \frac{\partial L}{\partial w_{\Delta i, \Delta j, o, c}} = \sum_i \sum_j \frac{\partial L}{\partial y_{i,j,o}} \, x_{i + \Delta i, j + \Delta j, c}

    \frac{\partial L}{\partial x_{i,j,c}} = \sum_{\Delta i} \sum_{\Delta j} \sum_o \frac{\partial L}{\partial y_{i - \Delta i, j - \Delta j, o}} \, w_{\Delta i, \Delta j, o, c}

• The latter operation is called transposed convolution.
30
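A small sketch (random data, arbitrary shapes) checking that the gradient with respect to the input of a convolution is indeed a transposed convolution of the upstream gradient:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3)

y = F.conv2d(x, w, padding=1)
grad_y = torch.randn_like(y)     # stands in for dL/dy coming from upstream
y.backward(grad_y)

# The same input gradient via an explicit transposed convolution:
grad_x = F.conv_transpose2d(grad_y, w, padding=1)
print(torch.allclose(x.grad, grad_x, atol=1e-5))  # True
```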
Modern convolutional neural networks
Historical note: First convolutional networks

• Fukushima (1980) proposed the Neocognitron, a neural network architecture with:
• Multiple layers of local feature detectors
• Weight sharing
• (LeCun et al., 1989): A convolutional network applied to handwritten character recognition. The method became the basis of a nationally deployed check-reading system.

[Figure: Fukushima's Neocognitron]

• (Waibel et al., 1989): Time-delay neural networks, which were similar to conv nets but applied to audio (in a moving window).
• (LeCun et al., 1998): LeNet-5, a classical architecture of convolutional neural networks.

32
ImageNet progress

[Figure: ImageNet top-5 classification error (%) from 2011 to 2017 on a logarithmic scale, marking AlexNet (2012), VGG (2014) and ResNet (2015) relative to human-level performance]
33
AlexNet (Krizhevsky, 2012)

Image source: oreilly.com

• Five convolutional layers and three fully-connected/dense layers


• ReLU nonlinearities after convolutional layers, dropout (regularization, wait for the next lecture)

34
VGG-19 (Simonyan & Zisserman, 2015)

• Suppose we have c input and c output channels.


• One convolutional layer with 7 × 7 filters:
  • 49c² parameters
• If we instead stack three 3 × 3 conv layers:
  • The effective receptive field is 7 × 7
  • 27c² parameters (45% less)

36
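A quick check of the comparison above (c = 64 is an arbitrary choice; biases are omitted to match the counting on the slide):

```python
import torch.nn as nn

c = 64
one_7x7 = nn.Conv2d(c, c, kernel_size=7, padding=3, bias=False)
three_3x3 = nn.Sequential(*[nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False)
                            for _ in range(3)])

n1 = sum(p.numel() for p in one_7x7.parameters())    # 49 * c**2 = 200704
n3 = sum(p.numel() for p in three_3x3.parameters())  # 27 * c**2 = 110592
print(n1, n3, 1 - n3 / n1)  # the stack uses about 45% fewer parameters
```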
VGG-19 (Simonyan & Zisserman, 2015)

• Compared to AlexNet:
• Smaller (3 × 3) filters
• Deeper network (more layers)

37
ImageNet progress

Classification error % (top-5)

32

16 AlexNet

8 VGG
Human

4 ResNet

2
2011 2012 2013 2014 2015 2016 2017

38
ResNet (He et al., 2016)

• Training deeper networks is a more difficult optimization problem:

• Experiments: deeper networks tend to have higher training error.
• Deeper networks should not produce higher training error than shallower networks (the extra layers can learn simple identity mappings if needed).

39
ResNet (He et al., 2016)

• ResNet:
• Instead of learning f(x), the layers learn x + h(x).
• He et al. (2016): if an identity mapping is optimal, it might be easier to push the residual h(x) to zero than to learn an identity mapping with f(x).

• Compared to VGG:
• Skip connections
• More layers

40
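A minimal sketch of a residual block in this spirit (simplified: the channel count is preserved and batch normalization is omitted here):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.h = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.h(x))   # skip connection: x + h(x)

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 64, 16, 16])
```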
Why residual connections help training

• Balduzzi et al. (2017): an experiment with a randomly initialized MLP f : R → R in which each hidden layer contains 200 neurons with ReLU activations.
• The gradient ∂f/∂x as a function of the input x:

[Figure: ∂f/∂x plotted against x for a 1-layer feedforward network, a 24-layer feedforward network and a 50-layer ResNet]

• Gradients are shattered for deep networks without skip connections: small changes of the input have a significant effect on the gradient. Thus the optimization becomes more difficult.

41
Batch normalization in convolutional networks

• Batch normalization facilitates faster convergence of the optimization procedure.


• BatchNorm2d: the batch statistics are computed across all examples in a mini-batch and all pixel positions, separately for each channel c:

    \mu_c = \frac{1}{NWH} \sum_{n=1}^{N} \sum_{i=1}^{W} \sum_{j=1}^{H} z_{ijc}^{(n)}, \qquad \sigma_c^2 = \frac{1}{NWH} \sum_{n=1}^{N} \sum_{i=1}^{W} \sum_{j=1}^{H} (z_{ijc}^{(n)} - \mu_c)^2

  so that µ and σ² have as many elements as there are channels in z.

• Each channel c of the input map is transformed using the batch statistics and the BatchNorm parameters γ_c and β_c:

    y_{ijc}^{(n)} = \gamma_c \frac{z_{ijc}^{(n)} - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}} + \beta_c

42
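A small sketch (random data) checking the per-channel statistics against torch.nn.BatchNorm2d in training mode, where γ = 1 and β = 0 at initialization:

```python
import torch
import torch.nn as nn

z = torch.randn(8, 3, 5, 5)                     # (N, C, H, W)
bn = nn.BatchNorm2d(3, eps=1e-5)

y = bn(z)                                       # normalizes with batch statistics
mu = z.mean(dim=(0, 2, 3), keepdim=True)        # one mean per channel
var = z.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
y_manual = (z - mu) / torch.sqrt(var + 1e-5)
print(torch.allclose(y, y_manual, atol=1e-5))   # True
```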
Applications of convolutional networks
Advantages of convolutional networks

• Advantages of convolutional networks


• Take into account the order or spatial structure of the inputs.
• Can process input sequences with varying lengths (due to parameter sharing).
• The computations can be effectively parallelized.
• For these reasons, convolutional networks have been used for processing images, text data,
speech, analyzing game positions and even for predicting protein folding.

44
Temporal convolutions

• WaveNet (van den Oord et al., 2016): an autoregressive model of speech:


    p(x) = \prod_{t=1}^{T} p(x_t | x_1, \ldots, x_{t-1})

• The conditional distribution p(xt |x1 , . . . , xt−1 ) is modeled with a 1D convolutional network:

[Figure: a 1D convolutional network over the waveform; one needs lots of layers to model long-term dependencies]

45
WaveNet: Dilated convolutions

• Stack of dilated causal convolutional layers:

• Dilated convolutions allow fast growth of the receptive field which is good for modeling long-term
dependencies.
• WaveNet (van den Oord, 2016) by Google, based on dilated convolutions, used to be the
state-of-the-art model for speech generation.

46
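A minimal sketch of a stack of dilated causal 1D convolutions in the WaveNet style (the channel count and number of layers are illustrative): with kernel size 2 and dilations 1, 2, 4, 8, the receptive field grows to 16 samples while the sequence length is preserved:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.pad = dilation   # pad only on the left so no output looks into the future
        self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x):
        return self.conv(F.pad(x, (self.pad, 0)))

layers = nn.Sequential(*[CausalConv1d(16, d) for d in (1, 2, 4, 8)])
x = torch.randn(1, 16, 100)          # (batch, channels, time)
print(layers(x).shape)               # torch.Size([1, 16, 100])
```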
Semantic segmentation

• Segmentation: generating pixel-wise segmentations that give the class of the object visible at each pixel, or "background" otherwise.

[Figure: an input image and the corresponding output segmentation map]

• We need to classify each pixel of the input image.

47
Semantic segmentation with U-Net (Ronneberger et al., 2015)

• The contracting path (left side) is needed to extract high-level features.


• The expansive path (right side) is needed to make the classification decisions on the pixel level
(transposed convolutions are used here). The expansive path uses representations from the
contracting path (via skip connections and concatenation).

48
Convolutional model for neural machine translation (Gehring et al., 2017)

• Translation task: we need to translate a sentence in the source language into a sentence in the target language.
• Convolutional networks are used to encode the source sentence (a sequence of words), and this representation is used to compute the probabilities of the words in the output sequence.

49
Convolutional networks in reinforcement learning

• In RL, convolutional networks are used to process sensory inputs with a two-dimensional structure.
• Example: AlphaZero (Silver et al, 2017), an RL algorithm
that achieves superhuman performance in the games of Go,
chess and shogi.
• Convolutional networks are used to compute the probability of
the next move pt and the probability vt that the player wins
the game from the current position (to build a search tree).

50
Protein folding (DeepMind blog)

• Proteins are large, complex molecules essential to all of life. What any given protein can do
depends on its unique 3D structure.

• Proteins are composed of chains of amino acids. The information about the sequence of amino
acids is contained in DNA.
• Protein folding problem: Predicting how these chains will fold into the 3D structure of a protein.

51
CASP competition

• The Critical Assessment of protein Structure Prediction (CASP), a biennial global competition established in 1994, is the standard for assessing predictive techniques.
• In 2018, DeepMind’s AlphaFold won the
competition by a significant margin. The model
was based on a convolutional neural network.
• AlphaFold 2 (Jumper et al., 2021) was based on an attention-based neural network that attempts to interpret the structure of the “spatial graph” of a folded protein.
• AlphaFold 3 (Abramson et al., 2024) was a diffusion-based model.

52
AlphaFold (Senior et al., 2020)

• The input of the model is a sequence of amino


acids. Each sequence is represented as a 2D matrix
in which each element corresponds to one pair of
amino acids. The features (channels) of each pixel
are produced using an external model.
• The output is the distances between the Cβ atoms
of pairs of amino acid residues of a protein. The
output can also be represented as a 2D matrix.
• A convolutional neural network is used to predict
the outputs from the inputs.

53
Recommended reading

• Chapter 9 of the Deep Learning book
• References in the slides

54
Recap
Summary of Lecture #2

1. Convolutional neural networks (CNNs) reduce the number of network weights.


2. CNN layers introduce translation equivariance in the network.
3. Padding, pooling and dilation are techniques combined with CNNs.
4. LeNet, AlexNet, VGG and ResNet are classical CNN models.
5. CNNs can be applied in many image and signal processing tasks.

56
Home assignment
Assignment 02 cnn

• Implement and train three convolutional networks:

1. A CNN inspired by the classical LeNet-5

2. VGG-style network

3. ResNet

58
