Unit 4a - Convolutional Neural Networks

Convolutional neural networks emerged from studies of the visual cortex and have achieved superhuman performance on complex visual tasks. CNNs use techniques like convolutional layers and pooling layers, which apply concepts like sparse interactions, parameter sharing, and equivariant representations to process spatial information in images. CNNs have become very successful in applications like image recognition, natural language processing, and more.

Convolutional Neural Networks

(Chapter 9 from the DL book)
(Chapter 14 from the Hands-on ML book)
(Chapter 8 from Chollet's book, 2e)
(Chapter 5 from Weidman's book)
1
Convolutional Neural Networks
 CNNs emerged from the study of the
brain’s visual cortex.
 CNNs have managed to achieve superhuman
performance on complex visual tasks.
 They power image search services, self-driving
cars, and automatic video classification systems.
 CNNs are also successful at many other tasks,
such as voice recognition and natural
language processing.

2
Convolutional Neural Networks
 Hubel and Wiesel discovered that the neurons
that receive visual input from the eye are in
general most responsive to simple, straight
edges at particular orientations.
 Fittingly, they named these cells simple cells.
 A large group of simple cells together is able
to represent all 360 degrees of orientation.
 These edge-orientation-detecting simple cells
then pass along information to a large number of
so-called complex cells.
 Complex cells are capable of detecting more
complex shapes, like a corner or a curve.
3
Convolutional Neural Networks

4
Convolutional Neural Networks
 The studies of the visual cortex inspired the
neocognitron, introduced in 1980, which
gradually evolved into what we now call
convolutional neural networks.
 In 1998, Yann LeCun et al. introduced the
famous LeNet-5 architecture, widely used
by banks to recognize handwritten check
numbers.
 Introduced two new building blocks:
convolutional layers and pooling layers.
5
Convolutional Neural Networks
 Convolution is an operation on two
functions.
 In CNN convolutions:
 The first function is the network input x, the
second is the kernel w.
 The convolution kernel corresponds to a sparse
weight matrix, in contrast to the usual
fully connected weight matrix.

6
Convolution operation
Input (3 x 4):            Kernel (2 x 2):
a b c d                   w x
e f g h                   y z
i j k l

Output (2 x 3):
aw + bx + ey + fz    bw + cx + fy + gz    cw + dx + gy + hz
ew + fx + iy + jz    fw + gx + jy + kz    gw + hx + ky + lz

7
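A minimal NumPy sketch of the operation in this figure (no zero padding; what deep learning libraries call "convolution" is really this sliding cross-correlation). The function name and example values are illustrative, not from the slides:

import numpy as np

def conv2d_valid(inp, kernel):
    """Slide the kernel over the input with no padding and sum the
    elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = inp.shape[0] - kh + 1
    out_w = inp.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(inp[i:i + kh, j:j + kw] * kernel)
    return out

inp = np.arange(12, dtype=float).reshape(3, 4)  # stands in for a..l
kernel = np.array([[1., 2.], [3., 4.]])         # stands in for w, x, y, z
print(conv2d_valid(inp, kernel))                # shape (2, 3), as in the figure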
Convolutional Neural Networks
 Convolution leverages three important
ideas that help improve machine learning
systems:
1. Sparse interactions
2. Parameter sharing
3. Equivariant representations
 CNNs take advantage of spatial
information:
 Local patterns that are translation-invariant.
 Spatial hierarchies of these patterns.
8
Sparse Interactions
 In fully connected traditional networks
 with m neurons in a layer and n neurons in the
next layer,
 matrix multiplication requires O(m x n) runtime (per example).
 Sparse interactions
 Also called sparse connectivity or sparse weights.
 Accomplished by making the kernel smaller than the
input.
 With k << m, it requires only O(k x n) runtime (per example).
 k is typically several orders of magnitude smaller
than m.
9
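To make the O(m x n) versus O(k x n) difference concrete (illustrative numbers, not from the slides): with a 28 x 28 input flattened to m = 784 and a layer of n = 100 units, a fully connected layer needs 784 x 100 = 78,400 weights, while a convolutional layer with a 3 x 3 kernel slides the same k = 9 weights over the input, regardless of the input size.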
Sparse Connectivity

[Figure: top, the sparse connections from inputs x1-x5 to units s1-s5 that result
from a small convolution kernel, viewed from below; bottom, the dense connections
of a fully connected layer.]
10
Sparse Connectivity

[Figure: the same comparison viewed from above, showing receptive fields: each
unit s1-s5 of the convolutional layer sees only a few inputs, while each unit of
the fully connected layer sees all of x1-x5.]
11
Growing Receptive Fields

[Figure: stacking layers x -> h -> g; even though direct connections are sparse,
units in the deeper layer g are indirectly connected to most or all of the input
x, so the effective receptive field grows with depth.]
Parameter Sharing
 In traditional neural networks
 each element of the weight matrix is unique.
 Parameter sharing means using the same
value for more than one parameter.
 The network has tied weights.
 Sharing reduces storage requirements to k
parameters.
 Forward propagation runtime stays O(k x n).

13
Equivariant Representations
 For an equivariant function, if the input
changes, the output changes in the same way.
 For convolution, the particular form of
parameter sharing causes equivariance to
translation.
 For example, as a dog moves in the input
image, the detected edges move in the same way.
 In image processing, detecting edges is useful
in the first layer, and edges appear more or
less everywhere in the image.

14
Problem of Equivariance

 Convolution is not naturally equivariant to some other
transformations, such as changes in the scale or
rotation of an image.
 Solution: Capsule networks!

15
Receptive Field
 A neuron located in row i, column j of a given layer is
connected to the outputs of the neurons in the previous
layer located in rows i to i + fh – 1, columns j to j + fw – 1.
 Zero padding: In order for a layer to have the same height
and width as the previous layer, it is common to add
zeros around the inputs.

16
Stride greater than 1
 It is also possible to connect a large input
layer to a much smaller layer by spacing
out the receptive fields.

17
Padding valid/same

18
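A small Keras sketch (illustrative shapes and filter counts) showing how padding and stride change a Conv2D layer's output shape:

import numpy as np
from tensorflow.keras import layers

images = np.random.rand(1, 28, 28, 1).astype("float32")  # one 28 x 28 grayscale image

valid = layers.Conv2D(8, kernel_size=3, padding="valid")(images)   # no zero padding
same = layers.Conv2D(8, kernel_size=3, padding="same")(images)     # zero padding
strided = layers.Conv2D(8, kernel_size=3, strides=2, padding="same")(images)

print(valid.shape)    # (1, 26, 26, 8): the output shrinks
print(same.shape)     # (1, 28, 28, 8): same height and width as the input
print(strided.shape)  # (1, 14, 14, 8): stride 2 halves the spatial dimensions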
Stacking Multiple Feature Maps

 Typically, a convolutional
layer has multiple filters
and outputs one feature
map per filter.
 All neurons in a feature map
share the same parameters.
 Neurons in different feature
maps use different parameters.

19
Pooling Layers
 The pooling function replaces the output of
the net at a certain location with a
summary statistic of the nearby outputs
 Max pooling reports the maximum output
within a rectangular neighborhood
 Average pooling reports the average output
 Pooling helps make the representation
approximately invariant to small input
translations.
 The max pooling layer is the most commonly
used and generally performs better.
20
Pooling Layers
 People mostly use max pooling layers
instead of average pooling layers because:
 Max pooling generally performs better.
 Max pooling preserves only the strongest
features, getting rid of all the meaningless
ones, so the next layers get a cleaner signal to
work with.
 Max pooling offers stronger translation
invariance than average pooling, and it
requires slightly less computing.

21
Pooling Layers
 Pooling layers subsample (i.e., shrink) the
input image in order to reduce the
computational load, the memory usage,
and the number of parameters.
 Thereby limiting the risk of overfitting.

22
Pooling Layers
 Other than reducing computations, memory
usage, and the number of parameters, a max
pooling layer also introduces some level of
invariance to small translations.

23
Pooling Layers - Depthwise
 Max pooling and average pooling can be
performed along the depth dimension rather than
the spatial dimensions.
 This can allow the CNN to learn to be invariant to
various features.
 For example, learn multiple filters, each detecting
a different rotation of the same pattern.
 The depthwise max pooling layer would ensure that
the output is the same regardless of the rotation.
 The CNN could similarly learn to be invariant to
anything else: thickness, brightness, skew, color,
and so on.
24
Pooling Layers - Depthwise

25
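Keras has no built-in depthwise max pooling layer; a sketch along the lines of the Hands-on ML book's approach wraps tf.nn.max_pool in a Lambda layer (the pool size of 3 along the channel axis is illustrative, and the number of channels must be divisible by it):

import tensorflow as tf

depth_pool = tf.keras.layers.Lambda(
    lambda X: tf.nn.max_pool(X, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3),
                             padding="VALID"))
# Input shape is (batch, height, width, channels); only the channel axis is pooled.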
Pooling Layers
 One last type of pooling layer that you will
often see in modern architectures is the
global average pooling layer.
 Computes the mean of each entire feature map (it’s
like an average pooling layer using a pooling kernel
with the same spatial dimensions as the inputs).
 This means that it just outputs a single number per
feature map and per instance.
 Used as the output layer in many well-known CNN
architectures (e.g., GoogLeNet, Xception, SENet).

26
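A minimal sketch of global average pooling in Keras (the shapes are illustrative):

import numpy as np
from tensorflow.keras import layers

feature_maps = np.random.rand(2, 5, 5, 512).astype("float32")  # (batch, h, w, channels)
pooled = layers.GlobalAveragePooling2D()(feature_maps)
print(pooled.shape)  # (2, 512): one number per feature map and per instance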
Convolutional Filter Hyperparameters

 Kernel size
 Padding
 Stride length

27
Convolutional Neural Networks

(Chapter 8 from Chollet's book, 2e)

28
CNN for mnist
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

29
CNN for mnist
Calculate the number of parameters:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0
conv2d (Conv2D) (None, 26, 26, 32) 320
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
conv2d_2 (Conv2D) (None, 3, 3, 128) 73856
flatten (Flatten) (None, 1152) 0
dense (Dense) (None, 10) 11530
=================================================================
Total params: 104,202
30
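As a check on the table: each Conv2D filter has (kernel height x kernel width x input channels) weights plus one bias, and the Dense layer has (inputs + 1) x units parameters:

conv2d:   (3 x 3 x 1 + 1) x 32 = 320
conv2d_1: (3 x 3 x 32 + 1) x 64 = 18,496
conv2d_2: (3 x 3 x 64 + 1) x 128 = 73,856
dense:    (1,152 + 1) x 10 = 11,530
Total:    320 + 18,496 + 73,856 + 11,530 = 104,202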
CNN for mnist
 Convolutions operate over rank-3 tensors called
feature maps, with two spatial axes (height and
width) as well as a depth axis (or channels axis).
 The convolution operation extracts patches from
its input feature map and applies the same
transformation to all of these patches, producing
an output feature map.
 Each of the 32 output channels contains a 26 ×
26 grid of values, which is a response map of
the filter over the input, indicating the response of
that filter pattern at different locations in the input.
31
CNN for mnist
>>> test_loss, test_acc = model.evaluate(test_images, test_labels)
>>> print(f"Test accuracy: {test_acc:.3f}")
Test accuracy: 0.991

 Whereas the densely connected model


from Lab Experiment 1 had a test accuracy
of 97.8%, the basic convnet has a test
accuracy of 99.1%.
 We decreased the error rate by about 60%
(from 2.2% to 0.9%).
32
Training a CNN on small dataset
 Classification of 5000 dogs and cats
 Training set: 1000 dogs and 1000 cats
 Validation set: 500 dogs and 500 cats
 Test set: 1000 dogs and 1000 cats
 4 tools in our deep learning toolbox
 Training from scratch on a small dataset
 Data augmentation to increase dataset size
 Feature extraction using a pretrained model
 Fine-tuning a pretrained model
33
Training a CNN on small dataset

34
Training a CNN on small dataset
inputs = keras.Input(shape=(180, 180, 3))
x = layers.Rescaling(1./255)(inputs)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

A total of 991,041 parameters.
35
Training a CNN on small dataset
Data preprocessing

1. Read the picture files.
2. Decode the JPEG content to RGB grids of pixels.
3. Convert these into floating-point tensors.
4. Resize them to a shared size (we’ll use 180 × 180).
5. Pack them into batches (we’ll use batches of 32 images).

Keras' utility function image_dataset_from_directory()
handles all of these preprocessing steps, as sketched below.

36
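A sketch of how this utility might be called for the three splits (the directory names are assumptions; any folder layout with one subfolder per class works):

from tensorflow.keras.utils import image_dataset_from_directory

train_dataset = image_dataset_from_directory(
    "cats_vs_dogs_small/train",   # one subfolder per class
    image_size=(180, 180),        # resize to a shared size
    batch_size=32)                # pack into batches of 32 images
validation_dataset = image_dataset_from_directory(
    "cats_vs_dogs_small/validation", image_size=(180, 180), batch_size=32)
test_dataset = image_dataset_from_directory(
    "cats_vs_dogs_small/test", image_size=(180, 180), batch_size=32)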
Training a CNN on small dataset
callbacks = [
keras.callbacks.ModelCheckpoint(
filepath="convnet_from_scratch.keras",
save_best_only=True, monitor="val_loss")
]

history = model.fit(
train_dataset, epochs=30,
validation_data=validation_dataset,
callbacks=callbacks)

37
Training a CNN on small dataset
 Overfitting starts within 10 epochs.
 Validation accuracy peaks at 75%.
 We get a test accuracy of 69.5%.
 Expected due to random sampling on a small
dataset.
 We can try many techniques to mitigate
overfitting, such as dropout and weight
decay (L2 regularization).
 We try the data augmentation technique next.
38
Training a CNN on small dataset

data_augmentation = keras.Sequential(
[
layers.RandomFlip("horizontal"),
layers.RandomRotation(0.1),
layers.RandomZoom(0.2),
]
)

39
Training a CNN on small dataset
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
40
Training a CNN on small dataset
 After training for 100 epochs, we found:
 Overfitting occurring around the 60th epoch (much
better than the 10th epoch).
 Validation accuracy in the 80–85% range
(again a big improvement over our first try).
 We got a test accuracy of 83.5% (pretty decent
compared to 69.5%).

41
Feature extraction - pretrained model
 A common and highly effective approach to deep
learning on small image datasets is to use a
pretrained model that was previously trained on a
large dataset, typically on a large-scale
image-classification task.
 If this original dataset is large enough and
general enough, the spatial hierarchy of features
learned by the pretrained model can effectively
act as a generic model of the visual world, and
can prove useful for many different computer
vision problems.
42
Feature extraction - pretrained model

 Let’s consider a large convnet trained on
the ImageNet dataset (1.4 million labeled
images and 1,000 different classes).
 ImageNet contains many animal classes,
including different species of cats and
dogs, and you can thus expect it to perform
well on the dogs-versus-cats classification
problem.
 Two ways to use a pretrained model:
feature extraction and fine-tuning.
43
Feature extraction - pretrained model
 CNNs used for image classification comprise two
parts:
 a series of pooling and convolution layers,
 and a densely connected classifier.
 The first part is called the convolutional base of
the model.
 Feature extraction consists of taking the
convolutional base of a previously trained
network, running the new data through it, and
training a new classifier on top of the output.

44
Feature extraction - pretrained model
[Figure: keeping the same convolutional base while swapping classifiers. Left and
middle: a trained convolutional base with its trained classifier. Right: the same
trained convolutional base (frozen) with a new classifier trained on top. Each
column maps an input to a prediction.]

45
Feature extraction - pretrained model

 Let’s put this into practice by using the
convolutional base of the VGG16 network,
trained on ImageNet, to extract interesting
features from cat and dog images, and
then train a dogs-versus-cats classifier on
top of these features.
 The VGG16 model, among others, comes
prepackaged with Keras.
 Import it from the keras.applications module.

46
Feature extraction - pretrained model
There are two ways we could proceed:
 Run the convolutional base over our dataset, record its
output to a NumPy array on disk, and then use this data
as input to a standalone, densely connected classifier.
 This solution is fast and cheap to run, because it only
requires running the convolutional base once for every
input image. But for the same reason, this technique
won’t allow us to use data augmentation.
 Extend the conv_base by adding Dense layers on top,
and run the whole thing from end to end on the input
data. This will allow us to use data augmentation,
because every input image goes through the
convolutional base every time it’s seen by the model.
47
Feature extraction - pretrained model
Extracting the VGG16 features and labels

conv_base = keras.applications.vgg16.VGG16(weights="imagenet",
    include_top=False, input_shape=(180, 180, 3))

def get_features_and_labels(dataset):
    all_features = []
    all_labels = []
    for images, labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(images)
        features = conv_base.predict(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)
    return np.concatenate(all_features), np.concatenate(all_labels)

train_features, train_labels = get_features_and_labels(train_dataset)


val_features, val_labels = get_features_and_labels(validation_dataset)
test_features, test_labels = get_features_and_labels(test_dataset)

48
Feature extraction - pretrained model
Defining and training the output layer

>>> train_features.shape
(2000, 5, 5, 512)

inputs = keras.Input(shape=(5, 5, 512))


x = layers.Flatten()(inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

history = model.fit(
train_features, train_labels, epochs=20,
validation_data=(val_features, val_labels),
callbacks=callbacks)

49
Feature extraction - pretrained model

 We reach a validation accuracy of about
97% — much better than what we
achieved in the previous section with the
small model trained from scratch.

 Now, let us look at feature extraction
together with data augmentation:
 creating a model that chains the conv_base
with a new dense classifier, and training it end
to end on the inputs.
50
Feature extraction - pretrained model
conv_base = keras.applications.vgg16.VGG16(weights="imagenet",
include_top=False)
conv_base.trainable = False
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = keras.applications.vgg16.preprocess_input(x)
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
51
Feature extraction - pretrained model

 Now, we reach a validation accuracy of
over 98%.
 This is a strong improvement over the
previous model.
 We got a test accuracy of 97.5%.

52
Fine-tuning a pretrained model

1. Add our custom network on top
of an already-trained base network.
2. Freeze the base network.
3. Train the part we added.
4. Unfreeze some layers in the base
network.
5. Jointly train both these layers and
the part we added.
53
Fine-tuning a pretrained model
conv_base.trainable = True
for layer in conv_base.layers[:-4]:
    layer.trainable = False
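Before jointly training, the model is recompiled with a very low learning rate so that the newly unfrozen VGG16 layers are only nudged rather than overwritten; a sketch along the lines of the book's setup (treat the exact optimizer and learning rate as an assumption):

model.compile(loss="binary_crossentropy",
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),  # very low learning rate
    metrics=["accuracy"])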

history = model.fit(train_dataset, epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)

 Here, we got a test accuracy of 98.5%.

54
Chapter Summary
 Convnets are the best type of machine learning
models for computer vision tasks.
 Convnets work by learning a hierarchy of
modular patterns and concepts to represent the
visual world.
 It’s easy to reuse an existing convnet on a new
dataset via feature extraction.
 A valuable technique for small image datasets.
 As a complement to feature extraction, we can
use fine-tuning.
 This pushes performance a bit further.
55
Convolutional Neural Networks

(Chapter 5 from Weidman's book)

56
Representation Learning
 Learning process in ANNs starts by creating
initially random combinations of the original
features via multiplication by a random weight
matrix;
 Through training, the neural network learns to
refine combinations that are helpful and discard
those that aren’t.
 e.g., x1 being higher than average, x139 being lower
than average, and x237 also being lower than average
strongly predicts that an image will be of digit 9.

57
Representation Learning
 This process of learning which
combinations of features are important is
known as representation learning, and
it’s the main reason why neural networks
are successful across different domains.

58
Spatial Patterns in images
 In images, the interesting “combinations of
features” (pixels) tend to come from pixels
that are close together in the image.
 In an image, it is simply much less likely that
an interesting feature will result from a
combination of 9 randomly selected pixels
throughout the image than from a 3 × 3 patch
of adjacent pixels.
 We want to exploit this fundamental fact
about image data.
59
Spatial Patterns in images
 How to exploit spatial patterns in machine
learning for computer vision?
 A solution, at a high level, is to create an
order of magnitude more combinations of
features, and have each one be only a
combination of the pixels from a small
rectangular patch in the input image.

60
Spatial Patterns in images

61
Convolution operation
Input (3 x 4):            Kernel (2 x 2):
a b c d                   w x
e f g h                   y z
i j k l

Output (2 x 3):
aw + bx + ey + fz    bw + cx + fy + gz    cw + dx + gy + hz
ew + fx + iy + jz    fw + gx + jy + kz    gw + hx + ky + lz

62
Convolution operation
 It turns out that features computed in this way
have a special interpretation: they represent
whether a visual pattern defined by the weights is
present at that location of the image.
 Kernels are essentially “pattern detectors.”

 The same set of weights W is used to detect
whether the visual pattern defined by the kernel
W exists at each location in the input image.
 The result is a “feature map” showing the
locations in the input image where the pattern
defined by W is present.
63
Multichannel Convolution Operation
 CNNs create an order of magnitude more features,
and each feature is a function of just a small patch
from the input image.

[Figure: an f x f convolutional filter applied across an n x n input image.]
64
Multichannel Convolution Operation
 The first hidden layer with m1 convolutional filters
transforms an input image into m1 feature maps.
 m1 feature maps represent presence/absence of m1
visual patterns at each location in the input image.
 Output of next layer with m2 filters represents
presence/absence of pattern of patterns at each
location in the input image.
 m2 feature maps of the second layer represent a
combination of the m1 visual features already learned
in the prior convolutional layer.

65
Multichannel Convolution Operation
 Each convolutional layer has
1. Input shape (batch size x input channels x
image height x image width)
2. Output shape (batch size x output channels x
image height x image width)
3. The convolutional filters have shape (input
channels x output channels x filter height x
filter width)
 We’ll keep all of this in mind when we
implement this convolution operation.
66
Convolutional vs Dense layers

67
Convolutional vs Dense layers
 One last difference between the two kinds of
layers is the way in which the individual neurons
themselves are interpreted:
 The interpretation of each neuron of a fully connected
layer is that it detects whether or not a particular
combination of the features learned by the prior layer
is present in the current observation.
 The interpretation of each neuron of a convolutional
layer is that it detects whether or not a particular
combination of visual patterns learned by the prior
layer is present at the given location of the input
image.

68
The Flatten Layer
 The last convolutional layer outputs a 3D
array of shape (channels × image height ×
image width) for each input image.
 This needs to be converted into 1D array to
be fed to the output layer to make a final
prediction.
 We do this with a flatten layer.

69
Pooling Layers
 Pooling layers simply downsample each of the
feature maps created by a convolution operation;
 for the most typically used pooling size of 2, a 2n × 2n
image would be downsampled to size n × n.

70
Pooling Layers
 The main advantage of pooling is
computational: by down-sampling the
image to contain one-fourth as many pixels
as the prior layer, pooling decreases both
the number of weights and the number of
computations needed to train the network
by a factor of 4;
 This can be further compounded if multiple
pooling layers are used in the network, as they
were in many architectures in the early days.
71
Pooling Layers
 The downside of pooling, of course, is that only one fourth
as much information can be extracted from the down-
sampled image.
 However, the strong performance in CV proved the trade-offs in
terms of increased computational speed were worth it.
 Nevertheless, pooling was considered by many to be a
trick that just happened to work but should probably be
done away with.
“The pooling operation used in convolutional neural networks is a
big mistake and the fact that it works so well is a disaster.”
---Geoffrey Hinton 2014.
 Most recent CNN architectures (such as “ResNets”) use
pooling minimally or not at all.
72
Pooling Layers
 A much more widely accepted way to do down-sampling
is to modify the stride of the convolution operation.
 With a stride of 2, the filter would be convolved with every
other element of the input image, so that the output would
be half the size of the input.
 This means that, using a stride of 2 would result in the
same output size and thus much the same reduction in
computation we would get from pooling with size 2, but
without as much loss of information:
 with pooling of size 2, only one-fourth of the elements
in the input have any effect on the output, whereas
with a stride of 2, every element of the input has some
effect on the output.
73
Applying CNNs beyond images
 Organizing data into “channels” and then processing
that data using a CNN goes beyond just images.
 For example, this data representation was a key to
DeepMind’s series of AlphaGo programs showing
that neural networks could learn to play Go.
 The input to the neural network is a 19 × 19 × 17
image stack comprising 17 binary feature planes.
 8 planes for white stones in the 8 prior moves
 8 planes for black stones in the 8 prior moves
 1 plane to represent the color to play (player turn)

74
Board of GO

75
Implementing the MCO – 1D case
Implementing the Multichannel Convolution Operation
 The convolution in one dimension is conceptually
identical to the convolution in 2D: we take in a
one-dimensional input and a one-dimensional
convolutional filter as inputs and then create the
output by sliding the filter along the input.
 Building up to the full operation from that starting
point will turn out mostly to be a matter of adding
a bunch of for loops.

76
Implementing the MCO – 1D case
 Padding in 1D
import numpy as np
from numpy import ndarray

def _pad_1d(inp: ndarray, num: int) -> ndarray:
    z = np.array([0])
    z = np.repeat(z, num)
    return np.concatenate([z, inp, z])

input_1d = np.array([1,2,3,4,5])
param_1d = np.array([1,1,1])
_pad_1d(input_1d, 1)
>>> array([0, 1, 2, 3, 4, 5, 0])

77
Implementing the MCO – 1D case

Convolutions: The Forward Pass


def conv_1d(inp: ndarray, param: ndarray) -> ndarray:
    # assert correct dimensions (assert_dim and assert_same_shape are the
    # book's helper functions)
    assert_dim(inp, 1)
    assert_dim(param, 1)
    # pad the input
    param_len = param.shape[0]
    param_mid = param_len // 2
    input_pad = _pad_1d(inp, param_mid)
    # initialize the output
    out = np.zeros(inp.shape)
    # perform the 1d convolution
    for o in range(out.shape[0]):
        for p in range(param_len):
            out[o] += param[p] * input_pad[o+p]
    # ensure shapes didn't change
    assert_same_shape(inp, out)
    return out
78
Implementing the MCO – 1D case
input_1d = np.array([1,2,3,4,5])
param_1d = np.array([1,1,1])
conv_1d(input_1d, param_1d)
>>> array([ 3., 6., 9., 12., 9.])

Convolutions: The Forward Pass


79
Implementing the MCO – 1D case
Convolutions: The Backward Pass
 We want to compute:
 The partial derivative of the loss with respect
to each element of the input to the convolution
operation.
 The partial derivative of the loss with respect

to each element of the filter.


 We need to write a function that takes in an
output_grad with the same shape as the input and
produces an input_grad and a param_grad.
80
Implementing the MCO – 1D case
 Computing an input_grad

For illustration purposes only!


def conv_1d_sum(inp: ndarray, param: ndarray) -> ndarray:
    out = conv_1d(inp, param)
    return np.sum(out)

# randomly choose to increase the 5th element by 1
input_1d = np.array([1,2,3,4,5])
input_1d_2 = np.array([1,2,3,4,6])
param_1d = np.array([1,1,1])
print(conv_1d_sum(input_1d, param_1d))
print(conv_1d_sum(input_1d_2, param_1d))
>>> 39.0
>>> 41.0

 So, the gradient of the fifth element of the
input should be 41 – 39 = 2.
81
Implementing the MCO – 1D case
 Intuitively, the gradient of the fifth element
of the input is 2 as t5 appears twice in the
output of the convolution operation:
 O1 = t0w1 + t1w2 + t2w3
 O2 = t1w1 + t2w2 + t3w3

 O3 = t2w1 + t3w2 + t4w3

 O4 = t3w1 + t4w2 + t5w3

 O5 = t4w1 + t5w2 + t6w3

 What is the gradient of the fourth element?

82
Implementing the MCO – 1D case
 Notice the pattern in:
∂L/∂t5 = o4_grad * w3 + o5_grad * w2 + o6_grad * w1
∂L/∂t4 = o3_grad * w3 + o4_grad * w2 + o5_grad * w1
∂L/∂t3 = o2_grad * w3 + o3_grad * w2 + o4_grad * w1
 The indices on the output increase at the same
time the indices on the weights decrease.
83
Implementing the MCO – 1D case
 The indices on the output increase at the same
time the indices on the weights decrease.

input_grad = np.zeros_like(inp)
for o in range(inp.shape[0]):
    for p in range(param.shape[0]):
        input_grad[o] += output_pad[o+param_len-p-1] * param[p]

84
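Putting the padding and the loop together, a self-contained sketch of the 1D input gradient (the function name is ours; _pad_1d and conv_1d_sum are the helpers defined earlier):

def conv_1d_input_grad(inp: ndarray, param: ndarray,
                       output_grad: ndarray) -> ndarray:
    param_len = param.shape[0]
    # pad the output gradient the same way the input was padded
    output_pad = _pad_1d(output_grad, param_len // 2)
    input_grad = np.zeros_like(inp)
    for o in range(inp.shape[0]):
        for p in range(param_len):
            input_grad[o] += output_pad[o + param_len - p - 1] * param[p]
    return input_grad

# With an output_grad of all ones (the "sum" loss above) this gives [2, 3, 3, 3, 2]:
# 2 for the fifth element, matching the perturbation check, and 3 for the fourth.
print(conv_1d_input_grad(input_1d, param_1d, np.ones_like(input_1d)))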
Implementing the MCO – 1D case
 Computing the parameter gradient

For illustration purposes only!


input_1d = np.array([1,2,3,4,5])
# randomly choose to increase first element by 1
param_1d = np.array([1,1,1])
param_1d_2 = np.array([2,1,1])
print(conv_1d_sum(input_1d, param_1d))
print(conv_1d_sum(input_1d, param_1d_2))
>>> 39.0
>>> 49.0
 So, the gradient of the first parameter
should be 49 – 39 = 10.

85
Implementing the MCO – 1D case
 Just as we did for the input, by closely examining
the output and seeing which elements of the filter
affect it, we can clearly see the pattern:
w1_grad = t0 * o1_grad + t1 * o2_grad + t2 * o3_grad + t3 * o4_grad + t4 * o5_grad

 And since, for the sum, all of the o_grad elements
are just 1, and t0 is 0 (it is padding), we have:

w1_grad = t1 + t2 + t3 + t4 = 1 + 2 + 3 + 4 = 10

86
Implementing the MCO – 1D case
 Coding this is easier, since “the indices are
moving in the same direction.”
param_grad = np.zeros_like(param)
for o in range(inp.shape[0]):
    for p in range(param.shape[0]):
        param_grad[p] += input_pad[o+p] * output_grad[o]

87
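A matching self-contained sketch for the parameter gradient (again, the function name is ours):

def conv_1d_param_grad(inp: ndarray, param: ndarray,
                       output_grad: ndarray) -> ndarray:
    param_len = param.shape[0]
    input_pad = _pad_1d(inp, param_len // 2)
    param_grad = np.zeros_like(param)
    for o in range(inp.shape[0]):
        for p in range(param_len):
            param_grad[p] += input_pad[o + p] * output_grad[o]
    return param_grad

# With an output_grad of all ones this gives [10, 15, 14];
# the 10 for the first weight matches the perturbation check above.
print(conv_1d_param_grad(input_1d, param_1d, np.ones_like(input_1d)))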
Implementing the Multichannel Convolution Operation

Batches of inputs

88
Implementing the MCO – batches
 Let’s now add the capability for these
convolution functions to work with batches
of inputs — 2D inputs whose first
dimension represents the batch size of the
input and whose second dimension
represents the length of the 1D sequence:

input_1d_batch = np.array([[0,1,2,3,4,5,6],
[1,2,3,4,5,6,7]])

89
Implementing the MCO – batches
 The only difference in implementing the
forward pass with batches is that we have
to pad and compute the output for each
observation individually and then stack
the results to get a batch of outputs.

def conv_1d_batch(inp: ndarray, param: ndarray) -> ndarray:
    outs = [conv_1d(obs, param) for obs in inp]
    return np.stack(outs)

90
Implementing the MCO – batches
 The backward pass is similar for computing
the input gradients.

# "input_grad" is the function containing the for loop from earlier:


# it takes in a 1d input, a 1d filter, and a 1d output_gradient and
# computes the input grad
grads = [input_grad(inp[i], param, out_grad[i]) for i in range(batch_size)]
return np.stack(grads)

91
Implementing the MCO – batches
 The backward pass involves adding an outer for loop
to the code to compute the parameter gradients.

param_grad = np.zeros_like(param)
for i in range(inp.shape[0]):  # inp.shape[0] = 2
    for o in range(inp.shape[1]):  # inp.shape[1] = 5
        for p in range(param.shape[0]):  # param.shape[0] = 3
            param_grad[p] += input_pad[i][o+p] * output_grad[i][o]
return param_grad

92
Implementing the Multichannel Convolution Operation

2D Convolutions

93
Implementing the MCO – 2D case
1. On the forward pass, we:
 Appropriately pad the input.
 Use the padded input and the parameters to compute the output.
2. On the backward pass, to compute the input gradient
and the parameter gradient we:
 Appropriately pad the output gradient.
 Use this padded output gradient, along with the input and the
parameters, to compute both the input gradient and the
parameter gradient.

94
Implementing the MCO – 2D case
 Coding the forward pass
out = np.zeros_like(inp)
for o_w in range(img_size):            # loop through the image width
    for o_h in range(img_size):        # loop through the image height
        for p_w in range(param_size):  # loop through the param width
            for p_h in range(param_size):  # loop through the param height
                out[o_w][o_h] += param[p_w][p_h] * input_pad[o_w+p_w][o_h+p_h]

Replacing the 1D loops:

for o in range(out.shape[0]):
    for p in range(param_len):
        out[o] += param[p] * input_pad[o+p]
95
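For reference, a self-contained version of the fragment above (the names _pad_2d and conv_2d are ours, mirroring the 1D helpers):

def _pad_2d(inp: ndarray, num: int) -> ndarray:
    # zero-pad a 2D array by `num` rows/columns on every side
    return np.pad(inp, num, mode="constant")

def conv_2d(inp: ndarray, param: ndarray) -> ndarray:
    param_size = param.shape[0]
    input_pad = _pad_2d(inp, param_size // 2)
    out = np.zeros_like(inp)
    for o_w in range(inp.shape[0]):
        for o_h in range(inp.shape[1]):
            for p_w in range(param_size):
                for p_h in range(param_size):
                    out[o_w][o_h] += param[p_w][p_h] * input_pad[o_w + p_w][o_h + p_h]
    return out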
Implementing the MCO – 2D case
 Coding the backward pass (input)

input_grad = np.zeros_like(inp)
for i_w in range(img_width):
    for i_h in range(img_height):
        for p_w in range(param_size):
            for p_h in range(param_size):
                input_grad[i_w][i_h] += (
                    output_pad[i_w+param_size-p_w-1][i_h+param_size-p_h-1] *
                    param[p_w][p_h])

96
Implementing the MCO – 2D case
 Coding the backward pass (parameter)

param_grad = np.zeros_like(param)
for i in range(batch_size):  # equal to inp.shape[0]
    for o_w in range(img_size):
        for o_h in range(img_size):
            for p_w in range(param_size):
                for p_h in range(param_size):
                    param_grad[p_w][p_h] += (
                        input_pad[i][o_w+p_w][o_h+p_h] *
                        output_grad[i][o_w][o_h])

97
Implementing the MCO – channels
 So far, our code convolves filters over a
two-dimensional input and produces a two-
dimensional output.
 Now we modify it to account for cases
where both the input and the output are
multichannel.
 All we need to do is to add two outer for
loops to the code we’ve already seen —
one loop for the input channels and
another for the output channels.
98
Implementing the MCO – channels
 Forward pass
def _compute_output_obs(obs: ndarray, param: ndarray) -> ndarray:
    out = np.zeros((out_channels,) + obs.shape[1:])
    for c_in in range(in_channels):
        for c_out in range(out_channels):
            for o_w in range(img_size):
                for o_h in range(img_size):
                    for p_w in range(param_size):
                        for p_h in range(param_size):
                            out[c_out][o_w][o_h] += (
                                param[c_in][c_out][p_w][p_h] *
                                obs_pad[c_in][o_w+p_w][o_h+p_h])
    return out

99
Implementing the MCO – channels
 Forward pass

def _output(inp: ndarray, param: ndarray) -> ndarray:
    '''
    obs: [batch_size, channels, img_width, img_height]
    param: [in_channels, out_channels, param_width, param_height]
    '''
    outs = [_compute_output_obs(obs, param) for obs in inp]
    return np.stack(outs)

100
Implementing the MCO – channels
Backward pass
 The backward pass is similar and follows the
same conceptual principles as the backward
pass in the simple 2D case:
1. For the input gradients, we compute the gradients of
each observation individually—padding the output
gradient to do so—and then stack the gradients.
2. We also use the padded output gradient for the
parameter gradient, but we loop through the
observations as well and use the appropriate values
from each one to update the parameter gradient.

101
Implementing the MCO – channels
 Backward pass
def _compute_grads_obs(input_obs: ndarray, output_grad_obs: ndarray,
                       param: ndarray) -> ndarray:
    for c_in in range(in_channels):
        for c_out in range(out_channels):
            for i_w in range(input_obs.shape[1]):
                for i_h in range(input_obs.shape[2]):
                    for p_w in range(param_size):
                        for p_h in range(param_size):
                            input_grad[c_in][i_w][i_h] += (
                                output_obs_pad[c_out][i_w+param_size-p_w-1]
                                              [i_h+param_size-p_h-1] *
                                param[c_in][c_out][p_w][p_h])
    return input_grad

102
Implementing the MCO – channels
 Backward pass

def _input_grad(inp: ndarray, output_grad: ndarray,
                param: ndarray) -> ndarray:
    grads = [_compute_grads_obs(inp[i], output_grad[i], param)
             for i in range(output_grad.shape[0])]
    return np.stack(grads)

103
Implementing the MCO – channels
 Backward pass
def _param_grad(inp: ndarray, output_grad: ndarray,
                param: ndarray) -> ndarray:
    for i in range(inp.shape[0]):
        for c_in in range(in_channels):
            for c_out in range(out_channels):
                for o_w in range(img_shape[0]):
                    for o_h in range(img_shape[1]):
                        for p_w in range(param_size):
                            for p_h in range(param_size):
                                param_grad[c_in][c_out][p_w][p_h] += (
                                    inp_pad[i][c_in][o_w+p_w][o_h+p_h] *
                                    output_grad[i][c_out][o_w][o_h])
    return param_grad

104
The Flatten Operation
 The output of a convolution operation is a 3D
ndarray for each observation, of dimension
(channels, img_height, img_width).
 We flatten this 3D ndarray into a 1D vector.
class Flatten(Operation):
    def __init__(self):
        super().__init__()

    def _output(self) -> ndarray:
        return self.input.reshape(self.input.shape[0], -1)

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        return output_grad.reshape(self.input.shape)
105
The Full Conv2D Layer

106
Summary
 We learnt
 What CNNs are
 Their similarities and differences from fully
connected neural networks
 How they work at the lowest level
 How to implement the core multichannel
convolution operation from scratch in Python.
 Forward pass: output method
 Backward pass: input_grad and param_grad methods

107
