
DLCV Ch2 Example Exercise

This document provides examples and exercises for using deep learning for computer vision tasks. It begins with examples of setting up neural network architectures with different activation functions and training a multi-layer perceptron model on the MNIST dataset. It then demonstrates how to implement a convolution operation by generating a random filter, convolving portions of an input image with the filter, and visualizing the results. The goal is to introduce common deep learning and computer vision techniques like neural network design, activation functions, training models, and performing basic image operations like convolution.

Uploaded by

Mario Parot
Copyright © All Rights Reserved

Deep Learning for Computer Vision

Chapter 2 Exercises

Prof. G.S. Jison Hsu 徐繼聖


• Artificial Vision Laboratory
• National Taiwan University of Science and Technology

Deep Learning for Computer Vision


Example 2.1
Please download 2-1_Activation_Function.zip and unzip it.
• Given an architecture with input size=2, hidden layer=2 and output size=1.
• Each layer is followed by the Leaky ReLU activation function.
• For more activation functions, please refer to the following link:
https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity
• This example shows how to build a neural network in PyTorch and how to apply an activation function.
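As a framework-free sketch of what such a network computes, the forward pass can be written in NumPy. It assumes that “hidden layer = 2” means one hidden layer with two units, uses random placeholder weights rather than trained ones, and uses a negative slope of 0.01 (the default of torch.nn.LeakyReLU). None of these details are taken from the course code.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # Identity for positive inputs, a small linear slope for negative ones
    # (0.01 is also the default negative_slope of torch.nn.LeakyReLU).
    return np.where(x > 0, x, negative_slope * x)

def forward(x, params, act=leaky_relu):
    # Assumed layout: 2 inputs -> 2 hidden units -> 1 output,
    # with the activation applied after every layer.
    h = act(x @ params["W1"] + params["b1"])
    return act(h @ params["W2"] + params["b2"])

rng = np.random.default_rng(0)
params = {  # random placeholder weights, not trained values
    "W1": rng.standard_normal((2, 2)), "b1": np.zeros(2),
    "W2": rng.standard_normal((2, 1)), "b2": np.zeros(1),
}
x = np.array([[1.0, -2.0], [0.5, 0.5]])  # a batch of two 2-D inputs
print(forward(x, params))  # one output per input, shape (2, 1)
```

Swapping `act` for another function is all it takes to try a different activation, which mirrors changing one line in the PyTorch model.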



Example 2.1

Change this to use the Leaky ReLU activation function.



Exercise 2.1
Please download 2-1_Activation_Function.zip and unzip it.
• Given an architecture with input size=2, hidden layer=2 and output size=1.
• Each layer is followed by the ELU activation function.
• Compare the results made by the ReLU and the ELU.
• Write up your results, code, and observations in a Word document and upload it to Moodle.
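ReLU and ELU differ only for negative inputs, which a quick NumPy comparison makes visible. This is a sketch for building intuition, not the course's PyTorch code; alpha = 1.0 is assumed (the default of torch.nn.ELU).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    # ELU(x) = x for x > 0, alpha * (exp(x) - 1) otherwise.
    # np.minimum keeps exp() from overflowing on the branch np.where discards.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

x = np.linspace(-3.0, 3.0, 7)
print("x   ", x)
print("ReLU", relu(x))
print("ELU ", np.round(elu(x), 3))
```

For x > 0 both return x. For negative x, ReLU outputs exactly 0 (so its gradient is 0 there), whereas ELU decays smoothly toward −alpha and keeps a nonzero gradient; this is one difference to look for when comparing the two trained models.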



Example 2.2 Train A MLP Model
Please download 2-2_MLP_MNIST.zip and unzip it.
• Given an architecture with input size = 28×28, 3 hidden layers, and output size = 10.
• The 1st hidden layer has 500 neurons, the 2nd has 250 neurons, and the 3rd has 125 neurons.
• Each layer is followed by the ReLU activation function.
• Use the MNIST dataset as the input data: 90% for training and 10% for testing.
• Follow the instructions to train an MLP-based handwritten digit classifier.



Example 2.2 Train A MLP Model

Normalize the image

Download the MNIST dataset for training and testing
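Normalization typically scales pixels from [0, 255] to [0, 1] and then standardizes them. The NumPy sketch below mirrors what torchvision's ToTensor() followed by Normalize() would do. The mean and standard deviation (0.1307 and 0.3081, values commonly quoted for MNIST) are assumptions, since the slide's exact constants are not given in the text.

```python
import numpy as np

# Commonly quoted MNIST pixel statistics (assumed; the course code may differ).
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

def normalize(images_uint8):
    # Step 1: scale [0, 255] -> [0, 1], as torchvision's ToTensor() does.
    x = images_uint8.astype(np.float32) / 255.0
    # Step 2: standardize, as Normalize((0.1307,), (0.3081,)) would.
    return (x - MNIST_MEAN) / MNIST_STD

rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 28, 28)).astype(np.uint8)  # fake 28x28 digits
out = normalize(batch)
print(out.shape, round(float(out.min()), 3), round(float(out.max()), 3))
```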



Example 2.2 Train A MLP Model

Define the MLP network

Use the ReLU activation function
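A NumPy sketch of the forward pass of the 784-500-250-125-10 MLP follows. The He-style initialization is an assumption, and the output layer is left linear (returning logits), which is the usual choice when the scores feed a softmax cross-entropy loss; the course code may differ on both points.

```python
import numpy as np

LAYER_SIZES = [28 * 28, 500, 250, 125, 10]  # input, 3 hidden layers, output

def init_mlp(sizes, seed=0):
    # He-style random init (an assumption; nn.Linear uses a different default).
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(images, layers):
    x = images.reshape(len(images), -1)   # flatten each 28x28 image to 784 values
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:           # ReLU after every hidden layer;
            x = np.maximum(x, 0.0)        # the output layer returns raw logits
    return x

layers = init_mlp(LAYER_SIZES)
logits = mlp_forward(np.zeros((8, 28, 28)), layers)
print(logits.shape)  # (8, 10)
```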



Example 2.2 Train A MLP Model

We can observe the loss for each epoch and evaluate the performance on the test set.
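Two small helpers illustrate how the per-epoch loss and the test accuracy are typically computed from mini-batch results. The function names and the (logits, labels) batch format are hypothetical, chosen for this sketch.

```python
import numpy as np

def epoch_mean_loss(batch_losses, batch_sizes):
    # Weight each mini-batch's mean loss by its size, so a smaller final
    # batch does not skew the epoch average.
    losses = np.asarray(batch_losses, dtype=float)
    sizes = np.asarray(batch_sizes, dtype=float)
    return float((losses * sizes).sum() / sizes.sum())

def evaluate_accuracy(batches):
    # batches: iterable of (logits, labels) pairs computed on the test set.
    correct = total = 0
    for logits, labels in batches:
        correct += int((np.argmax(logits, axis=1) == labels).sum())
        total += len(labels)
    return correct / total

print(epoch_mean_loss([0.9, 0.5], [64, 16]))
batches = [(np.array([[0.1, 0.9], [0.8, 0.2]]), np.array([1, 1]))]
print(evaluate_accuracy(batches))  # 0.5
```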



Exercise 2.2 Train A MLP Model
Please download 2-2_MLP_MNIST.zip and unzip it.
1. Given an architecture with input size = 28×28, 4 hidden layers, and output size = 10.
2. The 1st hidden layer has 256 neurons, the 2nd has 512 neurons, the 3rd has 125 neurons, and the 4th has 64 neurons.
3. Use the MNIST dataset to train the model and evaluate its performance. (MNIST is a handwritten-digit dataset with 60,000 training images and 10,000 test images.)
4. Compare your results with those of Example 2.2.
5. Write up your results, code, and observations in a Word document and upload it to Moodle.
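Besides accuracy, the two architectures can be compared by size. A fully connected layer from m to n units has m·n weights plus n biases; the plain-Python sketch below applies that count to both layer lists.

```python
def count_params(sizes):
    # Each fully connected layer m -> n has m*n weights plus n biases.
    return sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))

example_2_2 = [28 * 28, 500, 250, 125, 10]
exercise_2_2 = [28 * 28, 256, 512, 125, 64, 10]
print(count_params(example_2_2))   # 550385
print(count_params(exercise_2_2))  # 405383
```

Most of the parameters sit in the first layer (784 inputs), so narrowing it from 500 to 256 units shrinks the model even though the exercise network is one layer deeper.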
Example 2.3 Convolution Filter
Please download 2-3_Convolution_Example.zip and unzip it.
1. Given a face image I_img, generate F_conv, a 5×5 convolution filter with random coefficients drawn from a uniform distribution over [0, 1].
2. Use “same” zero padding with unit stride to convolve the input face image.
3. Compute the output image by convolving I_img with F_conv, and calculate the L2 loss between the original image and the convolved image.
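Figure: the input image I_img and the convolution kernel F_conv.
import cv2
import numpy as np

KERNAL_SIZE = 5
STRIDE = 1
PADDING = (KERNAL_SIZE - 1) // 2   # "same" zero padding for unit stride (2 for a 5x5 kernel)
img = cv2.imread('./006_01_01_051_08.png')    # OpenCV loads color images in BGR order
img = cv2.resize(img, (28, 32))               # dsize is (width, height): 32 rows x 28 columns
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Conv_Filter = np.random.rand(KERNAL_SIZE, KERNAL_SIZE)   # uniform over [0, 1)
Conv_Filter = Conv_Filter / np.sum(Conv_Filter)          # coefficients sum to 1
img_F = np.pad(img, PADDING, mode='constant', constant_values=0)   # apply the zero padding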
Example 2.3 Convolution Filter

• cv2.cvtColor(): converts an image from one color space to another.
• np.random.rand(): returns samples drawn from a uniform distribution over [0, 1).
• To draw the coefficients from a normal distribution instead, replace the np.random.rand() line with:
Conv_Filter = np.random.normal(mean, std, (KERNAL_SIZE, KERNAL_SIZE))   # normal distribution
• Dividing Conv_Filter by np.sum(Conv_Filter) makes the coefficients sum to 1; with non-negative coefficients, such as those from np.random.rand(), each output pixel is then a weighted average of its neighborhood and stays within the input's intensity range.



Example 2.3 Convolution Filter
H, W = img_F.shape   # size of the zero-padded image
new_feature = np.zeros((int((H-KERNAL_SIZE)/STRIDE)+1, int((W-KERNAL_SIZE)/STRIDE)+1))
for h in range(int((H-KERNAL_SIZE)/STRIDE)+1):
    for w in range(int((W-KERNAL_SIZE)/STRIDE)+1):
        # Element-wise multiplication of the image patch with the filter ...
        aa = img_F[h*STRIDE:h*STRIDE + KERNAL_SIZE, w*STRIDE:w*STRIDE + KERNAL_SIZE] * Conv_Filter
        # ... summed to give one output feature value
        new_feature[h, w] = np.sum(aa)

        # Visualization: mark the current window on a fresh copy of the input
        img_S = img_F.astype(np.uint8)
        img_new = new_feature.astype(np.uint8)
        # pt2 is inclusive in cv2.rectangle, hence the -1
        cv2.rectangle(img_S, (w*STRIDE, h*STRIDE), (w*STRIDE + KERNAL_SIZE - 1, h*STRIDE + KERNAL_SIZE - 1), (255, 0, 0), 1)
        cv2.namedWindow('Conv_process', cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Conv_process", 300, 300)
        cv2.imshow('Conv_process', img_S)
        cv2.namedWindow('Conv_result', cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Conv_result", 300, 300)
        cv2.imshow('Conv_result', img_new)
        cv2.waitKey(100)   # show each step for 100 ms

• Each output value is the element-wise product of an image patch and Conv_Filter, summed with np.sum().
• ndarray.astype(): returns a copy of the array cast to the specified type.
• cv2.rectangle(): draws a rectangle on an image (here, the current kernel window).
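Without the visualization, the same loop can be wrapped into a reusable function. The sketch below (the name conv2d_same is hypothetical) zero-pads by (k − 1)/2 so that a unit-stride output keeps the input size. Like the loop above and like deep-learning convolution layers, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_same(img, kernel, stride=1):
    """Slide `kernel` over a zero-padded copy of `img` (no kernel flip)."""
    k = kernel.shape[0]                      # assumes a square, odd-sized kernel
    pad = (k - 1) // 2                       # "same" padding for unit stride
    padded = np.pad(img.astype(np.float64), pad, mode="constant", constant_values=0)
    H, W = padded.shape
    out = np.zeros(((H - k) // stride + 1, (W - k) // stride + 1))
    for h in range(out.shape[0]):
        for w in range(out.shape[1]):
            patch = padded[h * stride:h * stride + k, w * stride:w * stride + k]
            out[h, w] = np.sum(patch * kernel)   # element-wise product, then sum
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 28))   # stand-in for the 32x28 grayscale face
kernel = rng.random((5, 5))
kernel /= kernel.sum()                      # coefficients sum to 1, as in the example
print(conv2d_same(img, kernel).shape)       # (32, 28)
```

A quick sanity check: a kernel that is 1 at the center and 0 elsewhere must return the input unchanged.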
Example 2.3 Convolution Filter
The kernel should bring about a reasonable change in the image, so the image is first resized to an appropriate size (cv2.resize() to 28 columns × 32 rows in the setup code).
• np.sum(): sums array elements, over all elements by default or along a given axis.



Example 2.3 Convolution Filter
cv2.namedWindow('Conv_process', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Conv_process", 300, 300)
cv2.imshow('Conv_process',img_S)

cv2.namedWindow('Conv_result', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Conv_result", 300, 300)
cv2.imshow('Conv_result',img_new)

cv2.waitKey(100)

• cv2.namedWindow(): creates a window that can be used as a placeholder for images and trackbars; with cv2.WINDOW_NORMAL, the window can be resized.
• cv2.resizeWindow(): resizes the window (not the image) to the given width and height in pixels.
• cv2.imshow(): displays an image in the named window.
• cv2.waitKey(): waits up to the given number of milliseconds for a key press (0 waits indefinitely); it also lets OpenCV process GUI events so the windows are redrawn.
Example 2.3 Convolution Filter
%% Percentage of information loss
imgd = reshape(double(img), numel(img), 1);
convimgd = reshape(double(Conv_Img), numel(Conv_Img), 1);

% L1 distance between the two images after scaling each to unit L2 norm
Diff = sum(abs(imgd ./ norm(imgd) - convimgd ./ norm(convimgd)));

% Express the difference as a percentage of the normalized input's sum
PercnetageDiff = Diff / sum(imgd ./ norm(imgd)) * 100;

fprintf('The information loss using %dx%d convolution kernel is %.6f%%\n\n', KERNAL_SIZE, KERNAL_SIZE, PercnetageDiff);

Figure: the original image, the convolution process, and the resulting feature map.
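For readers working in Python, the MATLAB computation above translates to NumPy as follows, alongside the MSE loss asked for in Exercise 2.3. The function names are hypothetical. The percentage measure compares the images after scaling each to unit L2 norm, so it ignores a global change in brightness, whereas the MSE does not.

```python
import numpy as np

def information_loss_percent(img, conv_img):
    # Flatten both images and scale each to unit L2 norm, then express the
    # summed absolute difference as a percentage of the normalized input's sum.
    a = img.astype(np.float64).ravel()
    b = conv_img.astype(np.float64).ravel()
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    return float(np.abs(a_n - b_n).sum() / a_n.sum() * 100.0)

def mse_loss(img, conv_img):
    # Mean squared error; "same" padding keeps both arrays the same shape.
    diff = img.astype(np.float64) - conv_img.astype(np.float64)
    return float(np.mean(diff ** 2))

img = np.arange(1, 17, dtype=np.float64).reshape(4, 4)
print(information_loss_percent(img, 2 * img))  # brightness scaling only -> ~0
print(mse_loss(img, 2 * img))                  # large, since MSE sees the scaling
```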



Exercise 2.3 Convolution Filter
1. Download 2-3_Convolution_Example.zip and design convolution filters F_conv with kernel sizes 3, 5, and 7, filled with random coefficients drawn from a normal distribution (mean = 5, std = 1).
2. Compute the output image I_out by convolving the input image I_img with each designed filter F_conv.
3. Calculate the information loss (MSE loss) between the input image I_img and the convolved output I_out.
4. Show your code, results, and observations in a Word file and upload it to Moodle.
import cv2
import numpy as np

KERNAL_SIZE = 3   # repeat with 5 and 7
STRIDE = 1
PADDING = (KERNAL_SIZE - 1) // 2   # "same" zero padding for unit stride
img = cv2.imread('./006_01_01_051_08.png')   # BGR order
img = cv2.resize(img, (28, 32))
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Convert the filter coefficients to a normal distribution (modify this line)
Conv_Filter = np.random.rand(KERNAL_SIZE, KERNAL_SIZE)
Conv_Filter = Conv_Filter / np.sum(Conv_Filter)
img_F = np.pad(img, PADDING, mode='constant', constant_values=0)
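One possible way to run all three kernel sizes is sketched below. It replaces the face image with a random 32×28 stand-in (an assumption, so the snippet runs without the image file), draws the coefficients with a NumPy Generator's normal() method, and computes the convolution with sliding windows instead of the explicit loop. The MSE values depend on the random seed and are not results from the course.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(img, kernel):
    # Unit-stride "same" convolution (no kernel flip) via sliding windows.
    pad = (kernel.shape[0] - 1) // 2
    padded = np.pad(img.astype(np.float64), pad)         # zero padding by default
    windows = sliding_window_view(padded, kernel.shape)  # shape (H, W, k, k)
    return np.einsum("hwij,ij->hw", windows, kernel)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 28)).astype(np.float64)  # stand-in image
for k in (3, 5, 7):
    f = rng.normal(loc=5.0, scale=1.0, size=(k, k))  # mean = 5, std = 1
    f /= f.sum()                                     # coefficients sum to 1
    out = conv2d_same(img, f)
    mse = np.mean((img - out) ** 2)
    print(f"k={k}: output {out.shape}, MSE = {mse:.2f}")
```

Larger kernels average over wider neighborhoods, so the output is smoother and the MSE against the input tends to grow with k.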
Example 2.4 Max Pooling
• Given a face image I_img, generate F_conv, a 3×3 convolution filter with random coefficients drawn from a normal distribution (std = 0, mean = 2), and use “same” zero padding with unit stride. Use max pooling with a 3×3 kernel and unit stride. Compute the output by convolving I_img with F_conv, and calculate the information loss between the original image and the convolved image.
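Max pooling with unit stride slides a 3×3 window over the feature map and keeps the largest value in each window. The sketch assumes the same zero padding as the convolution, so the output keeps the input size (Exercise 2.4 states this padding explicitly); the function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_same(x, k=3):
    # k x k max pooling with unit stride; zero padding keeps the input size.
    # Zero padding is harmless for non-negative feature maps; for maps with
    # negative values, padding with -inf would be the safer choice.
    pad = (k - 1) // 2
    padded = np.pad(x.astype(np.float64), pad)
    return sliding_window_view(padded, (k, k)).max(axis=(-2, -1))

fmap = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)
print(max_pool_same(fmap))
# [[5. 6. 6.]
#  [8. 9. 9.]
#  [8. 9. 9.]]
```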

Figure: input image → convolution and max pooling → output image.



Exercise 2.4 Average Pooling
Please download 2-4_Pooling_Example.zip from Moodle.
• Given a face image I_img, generate F_conv, a 3×3 convolution filter with random coefficients drawn from a normal distribution (std = 0, mean = 2).
• Use “same” zero padding with unit stride to pool the feature maps.
• Use average pooling with a 3×3 kernel and unit stride.
• Compute the output by convolving I_img with F_conv.
• Write up your results, code, and observations in a Word document and upload it to Moodle.
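Average pooling replaces the maximum with the mean. In this sketch (hypothetical function name, zero padding as stated in the exercise), the padded zeros are included in each average, so values near the border shrink; PyTorch's AvgPool2d behaves the same way by default (count_include_pad=True).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def avg_pool_same(x, k=3):
    # k x k average pooling, unit stride, zero padding to keep the input size.
    # Padded zeros count toward each mean, which darkens the borders.
    pad = (k - 1) // 2
    padded = np.pad(x.astype(np.float64), pad)
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))

fmap = np.ones((4, 4))
print(np.round(avg_pool_same(fmap), 3))
# corners 4/9, edges 6/9, interior 1.0
```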



Example 2.5 Train the LeNet Network
Please download 2-5_CNN_MNIST.zip and unzip it.
• Use the LeNet network shown in the figure below.
• Use the “Softmax with Cross-Entropy” loss as the objective function.
• Train a handwritten digit (MNIST) classifier using the LeNet network, compute the confusion matrix, and report the precision and recall for each class.
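The “Softmax with Cross-Entropy” objective turns the network's scores into probabilities with a softmax and penalizes the negative log-probability of the correct class. A NumPy sketch follows; in PyTorch, nn.CrossEntropyLoss combines these two steps and expects raw logits.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum first for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_cross_entropy(logits, labels):
    # Mean of -log p(correct class); labels are integer class indices.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

logits = np.zeros((2, 10))   # uniform scores over 10 classes
print(softmax_cross_entropy(logits, np.array([3, 7])))  # ln(10) ≈ 2.3026
```

With uniform scores every class gets probability 1/10, so the loss equals ln 10; a perfect classifier drives it toward 0.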

Figure: LeNet (1998)
Example 2.5 Train the LeNet Network

Define the LeNet network
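Whatever the exact LeNet variant, its layer sizes can be checked with the standard output-size formula floor((n + 2p − k)/s) + 1, the same expression the loop bound int((H-KERNAL_SIZE)/STRIDE)+1 in Example 2.3 uses with p = 0. The layer list below follows the classic LeNet-5 on a 32×32 input and is an assumption for illustration, not necessarily the course's network.

```python
def conv_out(n, k, stride=1, pad=0):
    # Output size of a convolution or pooling layer along one dimension.
    return (n + 2 * pad - k) // stride + 1

# Classic LeNet-5 on a 32x32 input (assumed layer sizes, for illustration only):
n = 32
n = conv_out(n, 5)             # conv 5x5      -> 28
n = conv_out(n, 2, stride=2)   # pool 2x2, s=2 -> 14
n = conv_out(n, 5)             # conv 5x5      -> 10
n = conv_out(n, 2, stride=2)   # pool 2x2, s=2 -> 5
print(n, 16 * n * n)           # 5 400 (16 maps of 5x5 feed the first FC layer)
```

For 28×28 MNIST images, padding the first convolution by 2 gives conv_out(28, 5, pad=2) = 28, which reproduces the 32×32 layer sizes from there on.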



Example 2.5 Train the LeNet Network
The MNIST test set has 10,000 images; the ten ground-truth classes are numbered 1 to 10 (corresponding to the digits 0 to 9). GT denotes the ground-truth class of each image, and Pred denotes the predicted class.
GT_1 GT_2 GT_3 GT_4 GT_5 GT_6 GT_7 GT_8 GT_9 GT_10
Pred_1 938 0 15 5 1 19 18 5 7 11
Pred_2 0 1099 27 4 5 6 4 27 16 7
Pred_3 5 3 855 24 5 9 16 23 21 7
Pred_4 2 5 25 874 1 76 1 2 38 9
Pred_5 0 0 21 2 882 12 22 8 15 99
Pred_6 25 1 8 49 2 686 26 2 47 16
Pred_7 8 3 28 1 14 26 869 0 16 1
Pred_8 1 2 14 20 1 13 0 906 10 37
Pred_9 1 22 34 27 6 36 2 8 779 9
Pred_10 0 0 5 4 65 9 0 47 25 813

Each entry is a number of test images (10,000 in total). GT: ground truth; Pred: prediction.
Example 2.5 Train the LeNet Network
To compute the per-class counts from the confusion matrix, take class 1 as an example.
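TP = 938 (true positives: the diagonal entry for class 1)
FP = 0 + 15 + 5 + … + 11 = 81 (the other entries in row Pred_1)
TN = 1099 + 27 + 4 + … + 25 + 813 = 8939 (all entries outside row Pred_1 and column GT_1)
FN = 0 + 5 + 2 + … + 1 + 0 = 42 (the other entries in column GT_1)
$\text{Precision} = \frac{TP}{TP + FP} = \frac{938}{938 + 81} \approx 0.921$
$\text{Recall} = \frac{TP}{TP + FN} = \frac{938}{938 + 42} \approx 0.957$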
Example 2.5 Train the LeNet Network
Now take class 4 as an example.
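TP = 874 (true positives: the diagonal entry for class 4)
FP = 2 + 5 + 25 + … + 38 + 9 = 159 (the other entries in row Pred_4)
TN = 938 + 0 + 15 + … + 25 + 813 = 8831 (all entries outside row Pred_4 and column GT_4)
FN = 5 + 4 + 24 + … + 27 + 4 = 136 (the other entries in column GT_4)
$\text{Precision} = \frac{TP}{TP + FP} = \frac{874}{874 + 159} \approx 0.846$
$\text{Recall} = \frac{TP}{TP + FN} = \frac{874}{874 + 136} \approx 0.865$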
Example 2.5 Train the LeNet Network
Number of TP, FP, TN, and FN, with the precision and recall, for each class.
Class TP FP TN FN Precision Recall
Class 1 938 81 8939 42 0.921 0.957
Class 2 1099 96 8769 36 0.920 0.968
Class 3 855 113 8855 177 0.883 0.828
Class 4 874 159 8831 136 0.846 0.865
Class 5 882 179 8839 100 0.831 0.898
Class 6 686 176 8932 206 0.796 0.769
Class 7 869 97 8945 89 0.900 0.907
Class 8 906 98 8874 122 0.902 0.881
Class 9 779 145 8881 195 0.843 0.800
Class 10 813 155 8836 196 0.840 0.806
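The table can be reproduced directly from the confusion matrix: TP is the diagonal, FP is the rest of each row (predicted as the class but belonging to another), FN is the rest of each column, and TN is everything else. A NumPy sketch using the matrix from Example 2.5 (the function name is hypothetical):

```python
import numpy as np

# Rows: predicted class (Pred_1..Pred_10); columns: ground truth (GT_1..GT_10).
CM = np.array([
    [938, 0, 15, 5, 1, 19, 18, 5, 7, 11],
    [0, 1099, 27, 4, 5, 6, 4, 27, 16, 7],
    [5, 3, 855, 24, 5, 9, 16, 23, 21, 7],
    [2, 5, 25, 874, 1, 76, 1, 2, 38, 9],
    [0, 0, 21, 2, 882, 12, 22, 8, 15, 99],
    [25, 1, 8, 49, 2, 686, 26, 2, 47, 16],
    [8, 3, 28, 1, 14, 26, 869, 0, 16, 1],
    [1, 2, 14, 20, 1, 13, 0, 906, 10, 37],
    [1, 22, 34, 27, 6, 36, 2, 8, 779, 9],
    [0, 0, 5, 4, 65, 9, 0, 47, 25, 813],
])

def per_class_stats(cm):
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp    # rest of each row
    fn = cm.sum(axis=0) - tp    # rest of each column
    tn = cm.sum() - tp - fp - fn
    return tp, fp, tn, fn, tp / (tp + fp), tp / (tp + fn)

tp, fp, tn, fn, precision, recall = per_class_stats(CM)
for c in range(len(CM)):
    print(f"Class {c + 1}: TP={tp[c]} FP={fp[c]} TN={tn[c]} FN={fn[c]} "
          f"precision={precision[c]:.3f} recall={recall[c]:.3f}")
```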



Exercise 2.5 Train the LeNet Network
Please download 2-5_CNN_MNIST.zip and unzip it.
1. Using LeNet as the base model, add a new fully connected layer, FC2-1 (output dimension = 27), to build the “Modified LeNet” shown in the figure below.
2. Set the hyperparameters as follows: epochs = 20 and learning rate = 0.001; use the “Softmax with Cross-Entropy” loss as the objective function.
3. Train the “Modified LeNet” on the MNIST dataset, compute the confusion matrix, and report the precision and recall for each class.

Figure: LeNet (1998) and Modified LeNet. Layers shown: Conv11, Conv12, Pool1, FC1, FC2, and FC3 (output layer); the Modified LeNet adds FC2-1 (new layer).
