02 CNN Slides
14.1.2025
1
Image classification task
• Example classification problem: Classify images of handwritten digits from the MNIST dataset.
• Inputs x(n): images of 28 × 28 pixels with scalar greyscale values.
• Targets y(n): one of the 10 classes in one-hot encoding.
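A minimal sketch of how such data can be loaded in PyTorch (assuming torchvision is available; the root path ./data and the use of F.one_hot are illustrative choices, not part of the slides):

```python
# Minimal sketch: load one MNIST image and a one-hot coded target.
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())
x, y = mnist[0]                                         # x: 1 x 28 x 28 greyscale image in [0, 1]
y_onehot = F.one_hot(torch.tensor(y), num_classes=10)   # one-hot coded target
print(x.shape, y_onehot)
```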
2
Spatial structure matters
• If we change the order of the pixels (in the same way for all images), the classification task
becomes much harder for humans.
• This suggests that our model can and should benefit from using the spatial information.
3
Image classification with a multilayer perceptron
4
Problem 1: MLP ignores the spatial structure
5
Problem 1: MLP ignores the spatial structure
6
Problem 2: Number of parameters
• Let us use an MLP with the following structure to solve the MNIST classification task (input x: 784 pixels, hidden layers of 144 units, 10 outputs):
  h_1 = relu(W_1 x + b_1)
  h_2 = relu(W_2 h_1 + b_2)
  f = softmax(W_3 h_2 + b_3)
• Let us count the number of parameters in the network (ignoring the bias terms b).
• If we want to process images that contain millions of pixels, the number of parameters would be several orders of magnitude larger.
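A minimal PyTorch sketch of this MLP for counting the weights; the assumption that both hidden layers have 144 units is read off the figure and may differ from the original network:

```python
# Sketch of the MLP from the slide; assuming both hidden layers have 144 units.
import torch.nn as nn

mlp = nn.Sequential(
    nn.Flatten(),                     # 28 x 28 image -> 784-dimensional vector
    nn.Linear(784, 144), nn.ReLU(),   # h_1 = relu(W_1 x + b_1)
    nn.Linear(144, 144), nn.ReLU(),   # h_2 = relu(W_2 h_1 + b_2)
    nn.Linear(144, 10),               # f = softmax(W_3 h_2 + b_3); the softmax is
)                                     # folded into the cross-entropy loss in practice

n_weights = sum(p.numel() for p in mlp.parameters() if p.dim() > 1)  # ignore biases
print(n_weights)   # 784*144 + 144*144 + 144*10 = 135072
```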
7
Motivation for a layer of a new type
• We want to design an alternative to the fully-connected layer that would address these problems:
• Take into account the order of the inputs
• Change the outputs in a predictable way for simple transformations such as translation
• Reduce the number of parameters in the network
8
Convolutional layer
Fully-connected layer as a starting point
• Let us consider an input with one-dimensional structure. For example, we want to process time
series and the order of the inputs is determined by the time of the measurements.
• Let us start with a fully-connected layer that has 5 inputs and 5 outputs:
[Figure: a fully-connected layer connecting inputs x1 … x5 to 5 outputs.]
10
Local connectivity
11
Parameter sharing
• We can further reduce the number of parameters by using weight sharing (arrows with the same
color red/black/blue represent shared weights).
• Now the layer has only 3 parameters.
• Why parameter sharing is useful: patterns that appear in different parts of the input sequence will
activate neurons in a similar way in the corresponding location of the output layer.
⇒ Position/translation/shift equivariance in the input-output mapping.
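A small check of this claim with PyTorch (the use of Conv1d and of padding to keep 5 outputs for 5 inputs are illustrative assumptions):

```python
# A 1D convolution with kernel size 3 and no bias has exactly 3 shared weights,
# no matter how long the input sequence is.
import torch
import torch.nn as nn

layer = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3,
                  padding=1, bias=False)           # padding=1 keeps 5 outputs for 5 inputs
x = torch.randn(1, 1, 5)                           # (batch, channels, x1 ... x5)
y = layer(x)
print(sum(p.numel() for p in layer.parameters()))  # 3 parameters
print(y.shape)                                     # torch.Size([1, 1, 5])
```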
12
1D convolutional layer
• The layer is called a (one-dimensional) convolutional layer because the computations are closely
related to (one-dimensional) discrete convolution familiar from signal processing:
(w ∗ x)[t] = ∑_a w[a] x[t − a]
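For illustration, the same formula can be evaluated with NumPy (np.convolve uses this signal-processing definition, i.e. it flips the kernel; the numbers below are arbitrary):

```python
# Discrete convolution (w * x)[t] = sum_a w[a] x[t - a], evaluated with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, 0.3, 0.2])
y = np.convolve(x, w)        # "full" convolution; note that the kernel is flipped
print(y)
```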
13
1D convolutional layer
• Inputs and outputs of such a layer usually contain multiple elements (usually called channels):
y_{i,o} = ∑_{∆i} ∑_c w_{∆i,o,c} x_{i+∆i,c} + b_o
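This formula corresponds to PyTorch's Conv1d, which computes a cross-correlation (no kernel flip); the channel counts and sequence length below are arbitrary:

```python
# Multi-channel 1D convolutional layer: 4 input channels -> 8 output channels.
import torch
import torch.nn as nn

layer = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3)
x = torch.randn(16, 4, 100)    # (batch, channels c, positions i)
y = layer(x)                   # y[n, o, i] = sum over ∆i, c of w[o, c, ∆i] * x[n, c, i+∆i] + b[o]
print(y.shape)                 # torch.Size([16, 8, 98])  (no padding)
```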
14
Inputs with 2D structure
15
2D convolutional layer: Forward computations
16
2D convolutional layer as feature detector
• We can view the filter that we used in this example as a simple feature detector.
• Note that the filter has the shape of a corner, and the output is maximal at the position where this corner is present in the input image.
• The local image structure or image feature “correlates” with the values of the convolution
mask/template/window/kernel.
Input (5 × 5)            Kernel (2 × 2)       Output (4 × 4)
0 0 0 0 0                                     0 1 1 1
0 1 1 1 0      dot       1 1           =      1 3 2 2
0 1 0 1 0                1 0                  1 2 2 2
0 1 1 1 0                                     1 2 2 1
0 0 0 0 0
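The numbers in this example can be reproduced with F.conv2d (a sketch; PyTorch's convolution is a cross-correlation, which matches the sliding dot product shown here):

```python
# Reproducing the corner-detector example with F.conv2d.
import torch
import torch.nn.functional as F

image = torch.tensor([[0., 0., 0., 0., 0.],
                      [0., 1., 1., 1., 0.],
                      [0., 1., 0., 1., 0.],
                      [0., 1., 1., 1., 0.],
                      [0., 0., 0., 0., 0.]]).reshape(1, 1, 5, 5)
kernel = torch.tensor([[1., 1.],
                       [1., 0.]]).reshape(1, 1, 2, 2)   # corner-shaped filter
out = F.conv2d(image, kernel)   # 4 x 4 output, maximum 3 where the corner matches
print(out.squeeze())
```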
17
2D convolutional layer with multiple channels
• The 1D position i and offset ∆i are replaced by their 2D counterparts i, j and ∆i, ∆j:
  y⁰_{i,j,o} = ∑_{∆i} ∑_{∆j} ∑_c w_{∆i,∆j,o,c} x_{i+∆i, j+∆j, c} + b_o
• Just like in multilayer perceptrons, the output of a convolutional layer is usually run through a nonlinear activation function, such as ReLU:
  y_{i,j,o} = relu(y⁰_{i,j,o})
18
Convolution ∗
19
2D convolutional layer in PyTorch
• Padding: rows and columns of zeros are added around the borders of the input.
• Convolution visualization
• The size of the output will in general be different (H_i: input height, k: kernel size, p: padding, s: stride, d: dilation):
  H_o = ⌊(H_i + 2p − k − (k − 1)(d − 1)) / s⌋ + 1
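A quick check of this formula against nn.Conv2d (the particular values of H_i, k, p, s, d below are arbitrary):

```python
# Checking the output-size formula H_o = floor((H_i + 2p - k - (k-1)(d-1)) / s) + 1.
import torch
import torch.nn as nn

H_i, k, p, s, d = 28, 5, 2, 2, 1
conv = nn.Conv2d(in_channels=1, out_channels=9, kernel_size=k,
                 padding=p, stride=s, dilation=d)
H_o = (H_i + 2 * p - k - (k - 1) * (d - 1)) // s + 1
print(H_o)                                      # 14
print(conv(torch.randn(1, 1, H_i, H_i)).shape)  # torch.Size([1, 9, 14, 14])
```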
20
Why do we need padding?
• With padding, the output of a convolutional layer can have the same height and width as the input.
• It is easier to design networks when the height and width are preserved.
• To use skip connections x + conv(x), as in ResNet, we need the dimensions to match (see the sketch below).
• With padding, we can use deeper networks. Without padding, the size would shrink quickly as new layers are added.
• Padding improves performance by preserving information at the borders.
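A minimal sketch of the skip-connection point: with padding = k // 2 for an odd kernel size, the spatial size is preserved and x + conv(x) is well defined (channel counts below are arbitrary; note that the number of channels must also match):

```python
# With padding = k // 2 (for odd k), height and width are preserved,
# so a ResNet-style skip connection x + conv(x) is well defined.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(8, 16, 28, 28)
y = x + conv(x)                 # shapes match: (8, 16, 28, 28)
print(y.shape)
```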
21
Convolutional layer is equivariant to translation
• Shifting the input image by one pixel to the right changes the output in the same way: it is
shifted by one pixel to the right.
Original input (5 × 5), kernel (2 × 2) and output (4 × 4):
0 0 0 0 0                       0 1 1 1
0 1 1 1 0      dot   1 1        1 3 2 2
0 1 0 1 0            1 0        1 2 2 2
0 1 1 1 0                       1 2 2 1
0 0 0 0 0

Input shifted one pixel to the right, same kernel, output (also shifted one pixel to the right):
0 0 0 0 0                       0 0 1 1
0 0 1 1 1      dot   1 1        0 1 3 2
0 0 1 0 1            1 0        0 1 2 2
0 0 1 1 1                       0 1 2 2
0 0 0 0 0
• Equivariance: f(T(x)) = T(f(x)). Applying the transformation T to the input transforms the output of f in the same way.
• Invariance: f(T(x)) = f(x). The result of f does not change when you apply the transformation to the input.
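A numeric sketch of the equivariance property using the example above; because the image has a zero border, shifting zeros in from the left makes the equality exact here (in general it holds up to border effects):

```python
# Translation equivariance: convolving a shifted image gives a shifted output.
import torch
import torch.nn.functional as F

image = torch.tensor([[0., 0., 0., 0., 0.],
                      [0., 1., 1., 1., 0.],
                      [0., 1., 0., 1., 0.],
                      [0., 1., 1., 1., 0.],
                      [0., 0., 0., 0., 0.]]).reshape(1, 1, 5, 5)
kernel = torch.tensor([[1., 1.], [1., 0.]]).reshape(1, 1, 2, 2)

shifted = torch.roll(image, shifts=1, dims=-1)   # shift one pixel to the right
shifted[..., 0] = 0                              # zeros enter from the left border

out_then_shift = torch.roll(F.conv2d(image, kernel), shifts=1, dims=-1)
out_then_shift[..., 0] = 0
shift_then_out = F.conv2d(shifted, kernel)
print(torch.equal(out_then_shift, shift_then_out))   # True
```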
22
Convolutional networks
Example: MNIST classification
• Let us build a convolutional neural network (a network with convolutional layers) to solve the
MNIST classification task.
• The input is 28 × 28 pixels and 1 channel.
• First convolutional layer: 9 filters with a 5 × 5 kernel and padding.
• First hidden layer: 28 × 28 pixels and 9 channels.
• The number of parameters in the first layer (ignoring biases): 5 × 5 × 9 = 225
• Compare with the fully-connected layer: 28 × 28 × 225 = 176400
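A sketch of this first layer in PyTorch; padding=2 is an assumption chosen so that the 28 × 28 size is preserved, as stated above:

```python
# First convolutional layer of the network: 1 input channel, 9 filters of size 5 x 5.
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=9, kernel_size=5, padding=2)
x = torch.randn(1, 1, 28, 28)
print(conv1(x).shape)                                              # torch.Size([1, 9, 28, 28])
print(sum(p.numel() for p in conv1.parameters() if p.dim() > 1))   # 5*5*9 = 225 weights
```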
24
Example: MNIST classification
• The first hidden layer contains 28 × 28 × 9 = 7056 units.
25
Pooling layer
26
Example: MNIST classification
27
Stack more layers
28
Full network
• Finally, we flatten the outputs of the last convolutional layer and feed
them to a fully-connected layer with 10 outputs.
• We apply the softmax nonlinearity to the outputs and use the
cross-entropy loss.
• The network can be trained by any gradient-based optimization
procedure, for example, Adam.
• The gradients are computed by backpropagation as in the multilayer
perceptron. The biggest difference is that we need to take into
account parameter sharing inside the convolutional layers.
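A minimal sketch of such a network and a single training step with Adam; the layer sizes are illustrative and not necessarily the ones used on the slides (CrossEntropyLoss applies the softmax internally):

```python
# Minimal sketch of a convolutional classifier for MNIST and one training step.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 9, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),                      # 28 x 28 -> 14 x 14
    nn.Conv2d(9, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),                      # 14 x 14 -> 7 x 7
    nn.Flatten(),                         # 16 * 7 * 7 = 784 features
    nn.Linear(16 * 7 * 7, 10),            # 10 class scores
)

loss_fn = nn.CrossEntropyLoss()           # softmax + cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(32, 1, 28, 28)       # a dummy mini-batch
labels = torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()                           # backpropagation (handles parameter sharing)
optimizer.step()
```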
29
Backpropagation through a convolutional layer
∂L/∂x_{i,j,c} = ∑_{∆i} ∑_{∆j} ∑_o w_{∆i,∆j,o,c} · ∂L/∂y_{i−∆i, j−∆j, o}
• The computation of ∂L/∂x is called transposed convolution.
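A sketch verifying this identity numerically: the gradient that autograd computes for the input of F.conv2d equals F.conv_transpose2d applied to the upstream gradient with the same weights (stride 1, no padding):

```python
# The gradient of a convolution with respect to its input is a transposed convolution.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(5, 3, 3, 3)                  # 5 output channels, 3 input channels, 3x3 kernel

y = F.conv2d(x, w)                           # forward pass (stride 1, no padding)
g = torch.randn_like(y)                      # some upstream gradient dL/dy
y.backward(g)                                # autograd gradient dL/dx

grad_manual = F.conv_transpose2d(g, w)       # same gradient via transposed convolution
print(torch.allclose(x.grad, grad_manual, atol=1e-5))   # True
```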
30
Modern convolutional neural networks
Historical note: First convolutional networks
• (Waibel et al., 1989): Time-delay neural networks, which were similar to convolutional networks but applied to audio (in a moving window).
• (LeCun et al., 1998): LeNet-5, a classical convolutional neural network architecture.
32
ImageNet progress
[Chart: ImageNet classification error (log scale, 32 down to 2) over 2011–2017, with AlexNet, VGG, ResNet, and human-level performance marked.]
33
AlexNet (Krizhevsky et al., 2012)
Image source: oreilly.com
34
ImageNet progress
35
VGG-19 (Simonyan & Zisserman, 2015)
36
VGG-19 (Simonyan & Zisserman, 2015)
• Compared to AlexNet:
• Smaller (3 × 3) filters
• Deeper network (more layers)
37
ImageNet progress
38
ResNet (He et al., 2016)
39
ResNet (He et al., 2016)
• ResNet:
  • Instead of learning f(x), layers learn x + h(x) (see the sketch below).
  • He et al. (2016): if an identity mapping is optimal, it might be easier to push the residual h(x) to zero than to learn an identity mapping with f(x).
• Compared to VGG:
  • Skip connections
  • More layers
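A simplified sketch of a basic residual block in PyTorch; the real ResNet blocks also handle changes in the number of channels and spatial size with a projection on the skip path:

```python
# Sketch of a basic residual block: the layers learn the residual h(x),
# and the skip connection adds the input back: output = relu(x + h(x)).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.h = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.h(x))   # skip connection

block = ResidualBlock(64)
print(block(torch.randn(2, 64, 28, 28)).shape)   # torch.Size([2, 64, 28, 28])
```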
40
Why residual connections help training
• Balduzzi et al. (2017): experiment with a randomly initialized MLP f : R → R, where each hidden layer contains 200 neurons with ReLU activations.
• Gradients ∂f/∂x as a function of the input x:
• Gradients are shattered for deep networks without skip connections: small changes of the input have a significant effect on the gradient, which makes optimization more difficult.
41
Batch normalization in convolutional networks
42
Applications of convolutional networks
Advantages of convolutional networks
44
Temporal convolutions
• The conditional distribution p(x_t | x_1, …, x_{t−1}) is modeled with a 1D convolutional network:
45
WaveNet: Dilated convolutions
• Dilated convolutions allow fast growth of the receptive field, which is good for modeling long-term dependencies (see the sketch below).
• WaveNet (van den Oord et al., 2016) by Google, based on dilated convolutions, used to be the state-of-the-art model for speech generation.
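A sketch of the receptive-field argument with stacked dilated causal 1D convolutions (this is only in the spirit of WaveNet, not the actual architecture; kernel size 3 and dilations 1, 2, 4, 8 are arbitrary choices):

```python
# Stacked 1D convolutions with dilations 1, 2, 4, 8: the receptive field grows
# exponentially with depth (1 + 2*(1 + 2 + 4 + 8) = 31 time steps for kernel size 3).
import torch
import torch.nn as nn

layers = []
for d in [1, 2, 4, 8]:
    # left-only padding makes the convolution causal: the output at time t
    # depends only on inputs at times <= t
    layers += [nn.ConstantPad1d((2 * d, 0), 0.0),
               nn.Conv1d(1, 1, kernel_size=3, dilation=d),
               nn.ReLU()]
net = nn.Sequential(*layers)

x = torch.randn(1, 1, 100)
print(net(x).shape)        # torch.Size([1, 1, 100]) -- length preserved, causal
```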
46
Semantic segmentation
• Segmentation: generating pixel-wise segmentations that give the class of the object visible at each pixel, or "background" otherwise.
47
Semantic segmentation with U-Net (Ronneberger et al., 2015)
48
Convolutional model for neural machine translation (Gehring et al., 2017)
49
Convolutional networks in reinforcement learning
50
Protein folding (DeepMind blog)
• Proteins are large, complex molecules essential to all of life. What any given protein can do
depends on its unique 3D structure.
• Proteins are composed of chains of amino acids. The information about the sequence of amino acids is contained in DNA.
• Protein folding problem: Predicting how these chains will fold into the 3D structure of a protein.
51
CASP competition
52
AlphaFold (Senior et al., 2020)
53
Recommended reading
54
Recap
Summary of Lecture #2
56
Home assignment
Assignment 02 cnn
2. VGG-style network
3. ResNet
58