How CNNs Work
[Figure: a CNN takes an image of an X and answers "X"; it takes an image of an O and answers "O".]
Trickier cases
[Figure: the CNN should still answer "X" or "O" after translation, scaling, rotation, or a change in line weight.]
Deciding is hard
[Figure: a clean X next to a distorted X. Are they the same?]
What computers see
[Figure: the two images as 9x9 grids of numbers, 1 for a line pixel and -1 for a background pixel.]
Compared pixel by pixel, the two images disagree at every position marked X below:
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 X -1 -1 -1 -1 X X -1
-1 X X -1 -1 X X -1 -1
-1 -1 X 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 X -1 -1
-1 -1 X X -1 -1 X X -1
-1 X X -1 -1 -1 -1 X -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Computers are literal
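[Figure: a clean X (left) and a distorted X (right). Compared literally, pixel by pixel, they are not the same image.]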
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 1 1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1
-1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
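The literal comparison can be sketched in a few lines of Python. This is only an illustration: the two grids are copied from the slide, and the function names `count_mismatches` and `literal_match` are made up for this example.

```python
# The two 9x9 images from the slide: 1 = line pixel, -1 = background.
CLEAN_X = [
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
    [-1,  1, -1, -1, -1, -1, -1,  1, -1],
    [-1, -1,  1, -1, -1, -1,  1, -1, -1],
    [-1, -1, -1,  1, -1,  1, -1, -1, -1],
    [-1, -1, -1, -1,  1, -1, -1, -1, -1],
    [-1, -1, -1,  1, -1,  1, -1, -1, -1],
    [-1, -1,  1, -1, -1, -1,  1, -1, -1],
    [-1,  1, -1, -1, -1, -1, -1,  1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
]
DISTORTED_X = [
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1,  1, -1, -1],
    [-1,  1, -1, -1, -1,  1, -1, -1, -1],
    [-1, -1,  1,  1, -1,  1, -1, -1, -1],
    [-1, -1, -1, -1,  1, -1, -1, -1, -1],
    [-1, -1, -1,  1, -1,  1,  1, -1, -1],
    [-1, -1, -1,  1, -1, -1, -1,  1, -1],
    [-1, -1,  1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
]


def count_mismatches(a, b):
    """Count the positions where two equally sized grids differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def literal_match(a, b):
    """A literal computer calls two images equal only if every pixel agrees."""
    return count_mismatches(a, b) == 0


print(count_mismatches(CLEAN_X, DISTORTED_X))
print(literal_match(CLEAN_X, DISTORTED_X))
```

A whole-image comparison like this fails on the distorted X, which is the motivation for matching small pieces instead.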
ConvNets match pieces of the image
[Figure: small pieces of the clean X are matched to the corresponding pieces of the distorted X.]
Features match pieces of the image
Three 3x3 features (a diagonal, a small X, and the opposite diagonal):
 1 -1 -1      1 -1  1     -1 -1  1
-1  1 -1     -1  1 -1     -1  1 -1
-1 -1  1      1 -1  1      1 -1 -1
Filtering: The math behind the match
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
1. Line up the feature and the image patch.
2. Multiply each image pixel by the corresponding
feature pixel.
3. Add them up.
4. Divide by the total number of pixels in the feature.
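The four steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation; the function name `filter_patch` is made up here, and the feature and patches are taken from the slides.

```python
def filter_patch(feature, patch):
    """Score how well a feature matches an image patch of the same size."""
    total = 0
    count = 0
    # Step 1: zip() lines up the feature and the patch, row by row.
    for feature_row, patch_row in zip(feature, patch):
        for feature_pixel, patch_pixel in zip(feature_row, patch_row):
            # Step 2: multiply each image pixel by the matching feature pixel.
            # Step 3: add the products up.
            total += feature_pixel * patch_pixel
            count += 1
    # Step 4: divide by the number of pixels in the feature.
    return total / count


DIAGONAL = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]

# A patch that is identical to the feature.
MATCHING_PATCH = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]

# The 3x3 patch at the very center of the X image.
CENTER_PATCH = [
    [ 1, -1,  1],
    [-1,  1, -1],
    [ 1, -1,  1],
]

print(filter_patch(DIAGONAL, MATCHING_PATCH))  # perfect match
print(filter_patch(DIAGONAL, CENTER_PATCH))    # partial match, 5/9
```

A score of 1 means every pixel agrees, and lower scores mean a weaker match.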
Worked example 1: the feature lines up with a patch that matches it exactly.
Every product is 1 (1 x 1 = 1 and -1 x -1 = 1):
1 1 1
1 1 1
1 1 1
(1+1+1+1+1+1+1+1+1) / 9 = 1

Worked example 2: the feature lines up with the patch at the center of the X.
Two pixels disagree, so two of the products are -1 (-1 x 1 = -1):
 1 1 -1
 1 1  1
-1 1  1
(1+1-1+1+1+1-1+1+1) / 9 = 0.55
Convolution: Trying every possible match
Filtering the 9x9 image with the diagonal feature at every possible position gives a 7x7 map of match scores (values shown to two decimal places):

 0.77 -0.11  0.11  0.33  0.55 -0.11  0.33
-0.11  1.00 -0.11  0.33 -0.11  0.11 -0.11
 0.11 -0.11  1.00 -0.33  0.11 -0.11  0.55
 0.33  0.33 -0.33  0.55 -0.33  0.33  0.33
 0.55 -0.11  0.11 -0.33  1.00 -0.11  0.11
-0.11  0.11 -0.11  0.33 -0.11  1.00 -0.11
 0.33 -0.11  0.55  0.33  0.11 -0.11  0.77

[Figure: repeating this with each of the three features turns the one image into a stack of three filtered images.]
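Trying every possible match amounts to two nested loops around the filtering step. The sketch below assumes no padding and a step of one pixel, which is what the 9x9 to 7x7 shrinkage in the slides implies; the name `convolve` is an illustrative choice. Strictly speaking this sliding match is a cross-correlation, which is the operation deep learning libraries commonly call convolution.

```python
IMAGE = [
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
    [-1,  1, -1, -1, -1, -1, -1,  1, -1],
    [-1, -1,  1, -1, -1, -1,  1, -1, -1],
    [-1, -1, -1,  1, -1,  1, -1, -1, -1],
    [-1, -1, -1, -1,  1, -1, -1, -1, -1],
    [-1, -1, -1,  1, -1,  1, -1, -1, -1],
    [-1, -1,  1, -1, -1, -1,  1, -1, -1],
    [-1,  1, -1, -1, -1, -1, -1,  1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1, -1],
]

DIAGONAL = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]


def convolve(image, feature):
    """Slide the feature over every position and record the match score."""
    feature_h, feature_w = len(feature), len(feature[0])
    out_h = len(image) - feature_h + 1
    out_w = len(image[0]) - feature_w + 1
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply, add up, and divide, exactly as in the filtering steps.
            total = sum(
                feature[i][j] * image[r + i][c + j]
                for i in range(feature_h)
                for j in range(feature_w)
            )
            row.append(total / (feature_h * feature_w))
        result.append(row)
    return result


filtered = convolve(IMAGE, DIAGONAL)
for row in filtered:
    print(" ".join(f"{value:5.2f}" for value in row))
```

The printed values are rounded, so 7/9 prints as 0.78 where the slides show 0.77.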
[Figure: pooling shrinks each 7x7 filtered image to 4x4. The values shown are consistent with keeping the largest value in each 2x2 window and moving the window two pixels at a time.]
The first filtered image after pooling:
1.00 0.33 0.55 0.33
0.33 1.00 0.33 0.55
0.55 0.33 1.00 0.11
0.33 0.55 0.11 0.77
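A minimal pooling sketch follows. It assumes max pooling with a 2x2 window and a stride of 2, and it keeps partial windows at the right and bottom edges; those choices are inferred from the 7x7 to 4x4 numbers on the slides rather than stated in the text. The name `max_pool` is illustrative.

```python
# The 7x7 filtered image produced by the diagonal feature (from the slides).
FILTERED = [
    [ 0.77, -0.11,  0.11,  0.33,  0.55, -0.11,  0.33],
    [-0.11,  1.00, -0.11,  0.33, -0.11,  0.11, -0.11],
    [ 0.11, -0.11,  1.00, -0.33,  0.11, -0.11,  0.55],
    [ 0.33,  0.33, -0.33,  0.55, -0.33,  0.33,  0.33],
    [ 0.55, -0.11,  0.11, -0.33,  1.00, -0.11,  0.11],
    [-0.11,  0.11, -0.11,  0.33, -0.11,  1.00, -0.11],
    [ 0.33, -0.11,  0.55,  0.33,  0.11, -0.11,  0.77],
]


def max_pool(image, window=2, stride=2):
    """Keep the largest value in each window; edge windows may be partial."""
    height, width = len(image), len(image[0])
    pooled = []
    for r in range(0, height, stride):
        row = []
        for c in range(0, width, stride):
            values = [
                image[i][j]
                for i in range(r, min(r + window, height))
                for j in range(c, min(c + window, width))
            ]
            row.append(max(values))
        pooled.append(row)
    return pooled


pooled = max_pool(FILTERED)
for row in pooled:
    print(row)
```

The strong matches (the 1.00 values) survive pooling even though the image is now about a quarter of its original size.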
ReLU layer
A stack of images becomes a stack of images with no negative values: every negative value is replaced with 0.

Filtered image 1 (before | after):
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33 | 0.77 0 0.11 0.33 0.55 0 0.33
-0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11 | 0 1.00 0 0.33 0 0.11 0
0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55 | 0.11 0 1.00 0 0.11 0 0.55
0.33 0.33 -0.33 0.55 -0.33 0.33 0.33 | 0.33 0.33 0 0.55 0 0.33 0.33
0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11 | 0.55 0 0.11 0 1.00 0 0.11
-0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11 | 0 0.11 0 0.33 0 1.00 0
0.33 -0.11 0.55 0.33 0.11 -0.11 0.77 | 0.33 0 0.55 0.33 0.11 0 0.77

Filtered image 2 (before | after):
0.33 -0.55 0.11 -0.11 0.11 -0.55 0.33 | 0.33 0 0.11 0 0.11 0 0.33
-0.55 0.55 -0.55 0.33 -0.55 0.55 -0.55 | 0 0.55 0 0.33 0 0.55 0
0.11 -0.55 0.55 -0.77 0.55 -0.55 0.11 | 0.11 0 0.55 0 0.55 0 0.11
-0.11 0.33 -0.77 1.00 -0.77 0.33 -0.11 | 0 0.33 0 1.00 0 0.33 0
0.11 -0.55 0.55 -0.77 0.55 -0.55 0.11 | 0.11 0 0.55 0 0.55 0 0.11
-0.55 0.55 -0.55 0.33 -0.55 0.55 -0.55 | 0 0.55 0 0.33 0 0.55 0
0.33 -0.55 0.11 -0.11 0.11 -0.55 0.33 | 0.33 0 0.11 0 0.11 0 0.33

Filtered image 3 (before | after):
0.33 -0.11 0.55 0.33 0.11 -0.11 0.77 | 0.33 0 0.55 0.33 0.11 0 0.77
-0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11 | 0 0.11 0 0.33 0 1.00 0
0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11 | 0.55 0 0.11 0 1.00 0 0.11
0.33 0.33 -0.33 0.55 -0.33 0.33 0.33 | 0.33 0.33 0 0.55 0 0.33 0.33
0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55 | 0.11 0 1.00 0 0.11 0 0.55
-0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11 | 0 1.00 0 0.33 0 0.11 0
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33 | 0.77 0 0.11 0.33 0.55 0 0.33
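The ReLU step is small enough to write in one line. The sketch below applies it to a single image; the name `relu` is the conventional one, and the sample row is the first row of the first filtered image from the slides.

```python
def relu(image):
    """Replace every negative value with 0 and leave the rest unchanged."""
    return [[max(0, value) for value in row] for row in image]


sample = [[0.77, -0.11, 0.11, 0.33, 0.55, -0.11, 0.33]]
print(relu(sample))
```

Applying the same function to each image in turn handles a whole stack.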
Layers get stacked
The output of one becomes the input of the next.
[Figure: the 9x9 image passes through Convolution, ReLU, and Pooling layers to become a stack of three 4x4 images. With the layers repeated (three Convolution, three ReLU, and two Pooling layers in all), the stack shrinks to three 2x2 images:]
1.00 0.55     1.00 0.55     0.55 1.00
0.55 1.00     0.55 0.55     1.00 0.55
Fully connected layer
Every value gets a vote
[Figure: the values from the final stack of images are rearranged into a single list of 12 values.]
Fully connected layer
Vote depends on how strongly a value predicts X or O
[Figure: two example lists of 12 values, each connected to the X and O outputs.]
First list: 1.00 0.55 0.55 1.00 1.00 0.55 0.55 0.55 0.55 1.00 1.00 0.55
Second list: 0.55 1.00 1.00 0.55 0.55 0.55 0.55 0.55 1.00 0.55 0.55 1.00
Fully connected layer
Future values vote on X or O
New input values:
0.9 0.65 0.45 0.87 0.96 0.73 0.23 0.63 0.44 0.89 0.94 0.53
Votes:
X: 0.92
O: 0.51
The stronger vote wins, so this input is classified as an X.
Fully connected layer
A list of feature values becomes a list of votes.
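One way to picture the voting is as a weighted sum per answer. The sketch below is a toy under stated assumptions: the weights are invented for illustration (the slides do not give any), the list is shortened to four values, and the names `vote` and `winner` are hypothetical.

```python
def vote(values, weights):
    """Turn a list of feature values into one weighted vote per label.

    `weights` maps each label to a list with one weight per input value.
    """
    return {
        label: sum(w * v for w, v in zip(label_weights, values))
        for label, label_weights in weights.items()
    }


def winner(votes):
    """Return the label with the strongest vote."""
    return max(votes, key=votes.get)


# Toy input: four feature values.
values = [1.00, 0.55, 0.55, 1.00]

# Invented weights: X listens to the first and last value, O to the middle two.
weights = {
    "X": [0.5, 0.0, 0.0, 0.5],
    "O": [0.0, 0.5, 0.5, 0.0],
}

votes = vote(values, weights)
print(votes)
print(winner(votes))
```

Because the output is itself a list of numbers, it can be fed to another layer of the same kind.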
Fully connected layer
These can also be stacked.
Putting it all together
A set of pixels becomes a set of votes.
[Figure: the 9x9 image of an X passes through the Convolution, ReLU, and Pooling layers and then two Fully connected layers, producing the votes X = 0.92 and O = 0.51.]
Learning
Q: Where do all the magic numbers come from?
Features in convolutional layers
Voting weights in fully connected layers
A: Backpropagation
Backprop
Error = |right answer - actual answer|, added up over every output
         Right answer   Actual answer   Error
X        1              0.92            0.08
O        0              0.51            0.51
Total                                   0.59
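The error for this example can be checked with a short calculation. The sketch assumes the error is the absolute difference between the right answer and the actual answer, summed over both outputs; the name `total_error` is illustrative.

```python
def total_error(right, actual):
    """Sum the absolute difference between right and actual answers."""
    return sum(abs(right[label] - actual[label]) for label in right)


right_answers = {"X": 1.0, "O": 0.0}     # the image really is an X
actual_answers = {"X": 0.92, "O": 0.51}  # the votes produced by the network

error = total_error(right_answers, actual_answers)
print(round(error, 2))
```

Training aims to make this number as small as possible across many example images.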
Gradient descent
For each feature pixel and voting weight, adjust it up and down a bit and see how the error changes.
[Figure: error plotted against weight.]
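The "adjust it up and down a bit" idea can be sketched for a single weight. Everything here is a toy: the bowl-shaped `error` function, the step sizes, and the name `nudge_and_step` are assumptions made for illustration. In practice backpropagation computes these slopes analytically rather than by nudging each weight separately.

```python
def error(weight):
    """Hypothetical error curve with its lowest point at weight = 3."""
    return (weight - 3.0) ** 2


def nudge_and_step(weight, error_fn, delta=1e-4, learning_rate=0.1):
    """Nudge the weight up and down, then move it in the downhill direction."""
    # How much does the error change when the weight moves a little?
    slope = (error_fn(weight + delta) - error_fn(weight - delta)) / (2 * delta)
    # Step against the slope so the error shrinks.
    return weight - learning_rate * slope


weight = 0.0
for _ in range(100):
    weight = nudge_and_step(weight, error)

print(round(weight, 3))
```

The same loop, applied to every feature pixel and voting weight at once, is what gradually replaces the initial numbers with useful ones.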
Hyperparameters (knobs)
Convolution
  Number of features
  Size of features
Pooling
  Window size
  Window stride
Fully Connected
  Number of neurons
Architecture
  How many of each type of layer?
  In what order?
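Some of these knobs determine how large each layer's output is. The helper functions below are a rough sketch, assuming convolution without padding and pooling that keeps partial windows at the edges, which matches the 9x9 to 7x7 to 4x4 sizes in the worked example; the function names are hypothetical.

```python
import math


def conv_output_size(image_size, feature_size):
    """Width (or height) after sliding a feature over every full position."""
    return image_size - feature_size + 1


def pool_output_size(image_size, stride):
    """Width (or height) after pooling, counting partial edge windows."""
    return math.ceil(image_size / stride)


after_conv = conv_output_size(9, 3)
after_pool = pool_output_size(after_conv, 2)
print(after_conv, after_pool)
```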
Not just images
Any 2D (or 3D) data.
Things closer together are more closely related than
things far away.
Images: columns of pixels by rows of pixels
Sound: time steps by intensity in each frequency band
Text: position in sentence by words in dictionary
Limitations
ConvNets only capture local “spatial” patterns in data.
If the data can’t be made to look like an image,
ConvNets are less useful.
Customer data (one row per customer; columns: name, age, address, email, purchases, browsing activity, …)
A 22 1A a@a 1 aa a1.a 123 aa1
B 33 2B b@b 2 bb b2.b 234 bb2