DLCV Ch2 Example Exercise
Computer Vision
Example 2.3 Convolution Filter

import cv2
import numpy as np

KERNAL_SIZE = 5
STRIDE = 1
PADDING = (KERNAL_SIZE - STRIDE)/2   # "same" padding size (not applied in the code shown here)
PADDING = int(PADDING)

img = cv2.imread('./006_01_01_051_08.png')
img = cv2.resize(img, (28, 32))
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

Conv_Filter = np.random.rand(KERNAL_SIZE, KERNAL_SIZE)
#Conv_Filter = np.random.normal(mean, std, (KERNAL_SIZE, KERNAL_SIZE))  # normal distribution
Conv_Filter = Conv_Filter / np.sum(Conv_Filter)   # normalize the kernel so it sums to 1

img_F = img

Inside the sliding-window loop (shown on the next slide), each output value is computed, the current window is drawn on the input, and both images are displayed:

new_feature[h, w] = np.sum(aa)
img_S = img_F.astype(np.uint8)
img_new = new_feature.astype(np.uint8)
cv2.rectangle(img_S, (int(w*STRIDE), int(h*STRIDE)), (int(w*STRIDE + KERNAL_SIZE), int(h*STRIDE + KERNAL_SIZE)), (255, 0, 0), 1)
cv2.namedWindow('Conv_process', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Conv_process", 300, 300)
cv2.imshow('Conv_process', img_S)
cv2.namedWindow('Conv_result', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Conv_result", 300, 300)
cv2.imshow('Conv_result', img_new)
cv2.waitKey(100)

• cv2.cvtColor(): converts an image from one color space to another.
• np.random.rand(): returns samples from a uniform distribution over [0, 1); the commented-out np.random.normal() instead draws samples from a normal (Gaussian) distribution.
• Element-wise multiply the current image window with the conv filter, then sum the products to generate the output feature value.
• astype(): returns a copy of the array, cast to the specified type.
• cv2.rectangle(): draws a rectangle on an image.
Example 2.3 Convolution Filter

The kernel can only bring reasonable changes to the image if the image is resized appropriately first.

H, W = img_F.shape   # image height and width (assumed; not shown on the slide)
new_feature = np.zeros((int((H - KERNAL_SIZE)/STRIDE) + 1, int((W - KERNAL_SIZE)/STRIDE) + 1))   # output buffer (assumed; not shown on the slide)

for h in range(int((H - KERNAL_SIZE)/STRIDE) + 1):
    for w in range(int((W - KERNAL_SIZE)/STRIDE) + 1):
        aa = img_F[h*STRIDE:h*STRIDE + KERNAL_SIZE, w*STRIDE:w*STRIDE + KERNAL_SIZE] * Conv_Filter
        new_feature[h, w] = np.sum(aa)

img_S = img_F.astype(np.uint8)
img_new = new_feature.astype(np.uint8)
cv2.namedWindow('Conv_result', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Conv_result", 300, 300)
cv2.imshow('Conv_result', img_new)
cv2.waitKey(100)

• np.sum(): sum of array elements over a given axis.
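For comparison, the same filtering can also be done with OpenCV's built-in cv2.filter2D. A minimal sketch, assuming the img_F and Conv_Filter defined above; note that filter2D pads the borders, so its output keeps the input size, unlike the "valid" sliding-window loop above:

# Built-in 2-D filtering with the same normalized random kernel.
# ddepth=-1 keeps the input depth; convert to float32 first to avoid clipping while filtering.
img_builtin = cv2.filter2D(img_F.astype(np.float32), -1, Conv_Filter)
cv2.imshow('Conv_result_builtin', img_builtin.astype(np.uint8))
cv2.waitKey(100)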
[Figure: LeNet (1998) architecture, built from convolution and max-pooling layers]
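To make the figure concrete, here is a minimal sketch of a LeNet-style network in PyTorch. The framework choice, layer sizes, and the class name LeNet are assumptions for illustration; the slides do not show the exact definition used in Example 2.5.

import torch
import torch.nn as nn

class LeNet(nn.Module):
    # Assumed LeNet-style layout: two conv + pool stages followed by fully connected layers,
    # for 32x32 single-channel inputs and 10 output classes.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet()(torch.zeros(1, 1, 32, 32)).shape)   # torch.Size([1, 10])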
Example 2.5 Train the LeNet Network
The numbers in the confusion matrix denote counts of test samples (10,000 in total).
GT: ground truth; Pred: prediction.
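The confusion matrix on the following slides can be accumulated from the predictions on the test set. A minimal NumPy sketch; y_true, y_pred, and the helper name are assumptions, and rows are predictions while columns are ground truth, matching the tables below:

import numpy as np

def build_confusion_matrix(y_true, y_pred, num_classes=10):
    # y_true, y_pred: arrays of class indices (0..9) for the 10,000 test samples (assumed names).
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[p, t] += 1   # row = predicted class, column = ground-truth class
    return cm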
Example 2.5 Train the LeNet Network
To compute precision and recall from the confusion matrix, we take class “1” as the example.
         GT_1  GT_2  GT_3  GT_4  GT_5  GT_6  GT_7  GT_8  GT_9  GT_10
Pred_1    938     0    15     5     1    19    18     5     7     11
Pred_2      0  1099    27     4     5     6     4    27    16      7
Pred_3      5     3   855    24     5     9    16    23    21      7
Pred_4      2     5    25   874     1    76     1     2    38      9
Pred_5      0     0    21     2   882    12    22     8    15     99
Pred_6     25     1     8    49     2   686    26     2    47     16
Pred_7      8     3    28     1    14    26   869     0    16      1
Pred_8      1     2    14    20     1    13     0   906    10     37
Pred_9      1    22    34    27     6    36     2     8   779      9
Pred_10     0     0     5     4    65     9     0    47    25    813
TP = 938 (number of true positives)
FP = 0 + 15 + 5 + … + 11 = 81
TN = 1099 + 27 + 4 + … + 25 + 813 = 8939
FN = 0 + 5 + 2 + … + 1 + 0 = 42

Precision = TP / (TP + FP) = 938 / (938 + 81) = 0.921
Recall = TP / (TP + FN) = 938 / (938 + 42) = 0.957
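These numbers can be checked programmatically. A minimal NumPy sketch, with the matrix copied from the table above (rows = predictions, columns = ground truth):

import numpy as np

cm = np.array([
    [938,    0,  15,   5,   1,  19,  18,   5,   7,  11],
    [  0, 1099,  27,   4,   5,   6,   4,  27,  16,   7],
    [  5,    3, 855,  24,   5,   9,  16,  23,  21,   7],
    [  2,    5,  25, 874,   1,  76,   1,   2,  38,   9],
    [  0,    0,  21,   2, 882,  12,  22,   8,  15,  99],
    [ 25,    1,   8,  49,   2, 686,  26,   2,  47,  16],
    [  8,    3,  28,   1,  14,  26, 869,   0,  16,   1],
    [  1,    2,  14,  20,   1,  13,   0, 906,  10,  37],
    [  1,   22,  34,  27,   6,  36,   2,   8, 779,   9],
    [  0,    0,   5,   4,  65,   9,   0,  47,  25, 813],
])

k = 0                                   # class "1" (index 0)
TP = cm[k, k]                           # 938
FP = cm[k, :].sum() - TP                # rest of the Pred_1 row = 81
FN = cm[:, k].sum() - TP                # rest of the GT_1 column = 42
TN = cm.sum() - TP - FP - FN            # everything else = 8939
print(TP / (TP + FP), TP / (TP + FN))   # precision ≈ 0.921, recall ≈ 0.957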
Example 2.5 Train the LeNet Network
Let’s use class “4” as the example.
         GT_1  GT_2  GT_3  GT_4  GT_5  GT_6  GT_7  GT_8  GT_9  GT_10
Pred_1    938     0    15     5     1    19    18     5     7     11
Pred_2      0  1099    27     4     5     6     4    27    16      7
Pred_3      5     3   855    24     5     9    16    23    21      7
Pred_4      2     5    25   874     1    76     1     2    38      9
Pred_5      0     0    21     2   882    12    22     8    15     99
Pred_6     25     1     8    49     2   686    26     2    47     16
Pred_7      8     3    28     1    14    26   869     0    16      1
Pred_8      1     2    14    20     1    13     0   906    10     37
Pred_9      1    22    34    27     6    36     2     8   779      9
Pred_10     0     0     5     4    65     9     0    47    25    813
TP = 874 (number of true positives)
FP = 2 + 5 + 25 + … + 38 + 9 = 159
TN = 938 + 0 + 15 + … + 25 + 813 = 8831
FN = 5 + 4 + 24 + … + 27 + 4 = 136

Precision = TP / (TP + FP) = 874 / (874 + 159) = 0.846
Recall = TP / (TP + FN) = 874 / (874 + 136) = 0.865
Example 2.5 Train the LeNet Network
TP, FP, TN, FN, precision, and recall for each class:
           No. TP  No. FP  No. TN  No. FN  Precision  Recall
Class 1       938      81    8939      42      0.921   0.957
Class 2      1099      96    8769      36      0.920   0.968
Class 3       855     113    8855     177      0.883   0.828
Class 4       874     159    8831     136      0.846   0.865
Class 5       882     179    8839     100      0.831   0.898
Class 6       686     176    8932     206      0.796   0.769
Class 7       869      97    8945      89      0.900   0.907
Class 8       906      98    8874     122      0.902   0.881
Class 9       779     145    8881     195      0.843   0.800
Class 10      813     155    8836     196      0.840   0.806
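All ten rows of this table follow the same pattern. A minimal sketch that reproduces them, reusing the cm array from the class "1" sketch above:

for k in range(cm.shape[0]):
    TP = cm[k, k]
    FP = cm[k, :].sum() - TP
    FN = cm[:, k].sum() - TP
    TN = cm.sum() - TP - FP - FN
    print(f"Class {k + 1}: TP={TP} FP={FP} TN={TN} FN={FN} "
          f"Precision={TP / (TP + FP):.3f} Recall={TP / (TP + FN):.3f}")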
[Figure: LeNet (1998) and a Modified LeNet, each a stack of Conv, Pool, and FC layers (Conv11, Conv12, Pool1, FC1, FC2)]