DL 4
Overview
● Definition: Convolutional Neural Networks (CNNs) are a class of deep neural networks
designed for processing structured grid data, such as images. They are highly effective for
image recognition, object detection, and other visual tasks.
● Inspiration: CNNs were inspired by the organization of the animal visual cortex,
where individual neurons respond to specific regions of visual stimuli.
CNN Architecture
1. Convolutional Layers:
○ The core building blocks of a CNN are the convolutional layers. These layers
apply filters (kernels) that scan across the input image, detecting features
like edges, textures, and patterns.
○ Each filter creates a feature map by applying convolution across the image,
capturing different aspects of the input.
○ Mathematical Operation: Convolution applies the filter to local regions of the
input, computing a weighted sum (dot product) that becomes one value of the
feature map (see the sketch after this list).
2. Activation Function:
○ Typically, ReLU (Rectified Linear Unit) is applied after each convolutional
operation to introduce non-linearity.
○ This non-linearity allows the network to learn complex patterns and
hierarchies in the data.
3. Pooling Layers:
○ Pooling layers reduce the spatial size of feature maps, effectively
downsampling the data to reduce the number of parameters and
computations.
○ Max Pooling is the most common pooling method, where the maximum
value is selected from a window of values, capturing the most prominent
feature within that window.
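A minimal NumPy sketch of the three steps above, i.e. convolution as a sliding weighted sum, ReLU, and 2x2 max pooling. The function names and the toy edge-detecting filter are illustrative, not taken from any particular library.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries):
    slide the kernel over the image and take a weighted sum at each position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def relu(x):
    """Element-wise non-linearity: negative responses are clipped to zero."""
    return np.maximum(x, 0)

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling: keep the strongest response per window."""
    H, W = fmap.shape
    H, W = H - H % 2, W - W % 2          # drop an odd trailing row/column if present
    fmap = fmap[:H, :W]
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

# Toy 8x8 input and a vertical-edge filter, purely for illustration.
image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])

feature_map = max_pool2x2(relu(conv2d(image, kernel)))
print(feature_map.shape)   # (3, 3): 8x8 -> 6x6 after the 3x3 conv -> 3x3 after pooling
```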
Key Properties of CNNs
● Parameter Sharing: The same filters are reused across all spatial positions of the input,
greatly reducing the number of parameters and improving computational efficiency (a
parameter-count comparison follows this list).
● Translation Invariance: By learning spatial hierarchies of features, CNNs are
robust to minor translations or shifts in the input data.
● Hierarchical Feature Learning: Lower layers learn low-level features (e.g., edges),
while higher layers learn more complex, high-level features (e.g., shapes or objects).
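To make the parameter-sharing point concrete, here is a rough comparison of parameter counts, sketched with PyTorch layer shapes; the 3x224x224 input size and the 64-filter width are assumptions chosen purely for illustration.

```python
import torch.nn as nn

# A 3x3 convolution mapping 3 input channels to 64 output channels: the same
# 64 filters are reused at every spatial position (parameter sharing).
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
conv_params = sum(p.numel() for p in conv.parameters())   # 3*3*3*64 + 64 = 1,792

# A fully connected layer mapping a flattened 3x224x224 input to just 64 units
# already needs ~9.6 million weights; producing a full 64x224x224 feature map
# densely would need vastly more.
fc = nn.Linear(3 * 224 * 224, 64)
fc_params = sum(p.numel() for p in fc.parameters())

print(conv_params, fc_params)   # 1792 vs. 9633856
```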
Applications
2. LeNet
● Overview: LeNet-5, developed by Yann LeCun in 1998, is one of the earliest CNN
architectures, primarily designed for handwritten digit recognition (e.g., MNIST
dataset).
● Architecture:
○ Consists of 7 layers, including two convolutional layers (each followed by a
subsampling/pooling layer) and fully connected layers leading to the output.
○ Utilizes tanh as the activation function, which was common at the time (see the
sketch after this section).
● Significance: LeNet demonstrated the effectiveness of CNNs for image processing
and is considered foundational in the development of deep learning for computer
vision.
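Below is a minimal PyTorch-style sketch of a LeNet-5-like network matching the description above (tanh activations, two convolutional stages, fully connected layers). The layer widths follow the commonly cited LeNet-5 configuration, and average pooling stands in for the original subsampling layers; treat it as an illustration rather than a faithful reimplementation.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style network for 32x32 grayscale digits (e.g., padded MNIST)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

out = LeNet5()(torch.randn(1, 1, 32, 32))   # -> shape (1, 10)
```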
4. ZF-Net
● Overview: ZF-Net, created by Matthew Zeiler and Rob Fergus, won the ILSVRC 2013
competition, improving upon AlexNet.
● Architecture:
○ Similar structure to AlexNet, but with smaller filters and strides (and hence smaller
receptive fields) in the early layers; see the sketch after this section.
○ Introduced deconvolutional layers for visualizing the learned features and
understanding how CNNs process images.
● Significance: ZF-Net helped in understanding CNNs better and laid the groundwork
for techniques in visualizing network behavior.
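The best-known concrete change is in the first convolutional layer: ZF-Net replaced AlexNet's large 11x11, stride-4 filters with 7x7, stride-2 filters, preserving more fine detail in the early feature maps. The sketch below contrasts the two first layers (PyTorch; the padding values are illustrative assumptions).

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                      # a single 224x224 RGB image

# AlexNet-style first layer: large 11x11 filters with stride 4.
alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2)

# ZF-Net-style first layer: smaller 7x7 filters with stride 2, which keep
# more fine-grained spatial detail in the first feature maps.
zfnet_conv1 = nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=1)

print(alexnet_conv1(x).shape)   # torch.Size([1, 96, 55, 55])
print(zfnet_conv1(x).shape)     # torch.Size([1, 96, 110, 110])
```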
5. VGGNet
● Overview: VGGNet, developed by the Visual Geometry Group at Oxford, was known
for its simplicity and uniform architecture.
● Architecture:
○ Consists of very deep networks (16 or 19 layers) with a simple design of 3x3
convolutional filters stacked multiple times.
○ Achieved great performance in image classification by increasing network
depth.
● Impact: VGGNet showed that deep networks built from small, uniform filters could
achieve state-of-the-art results. It is still widely used for feature extraction and transfer
learning (a block-level sketch follows).
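A sketch of the VGG idea of stacking uniform 3x3 convolutions (PyTorch; the vgg_block helper name is illustrative, and the channel progression follows the published VGG-16 configuration).

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    """A VGG-style block: num_convs stacked 3x3 convs (padding 1 keeps the spatial
    size), each followed by ReLU, then one 2x2 max pool that halves the resolution."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# The convolutional part of a VGG-16-like network: 2+2+3+3+3 = 13 conv layers;
# three fully connected layers on top would bring the total to 16.
features = nn.Sequential(
    vgg_block(3, 64, 2),
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)

print(features(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 512, 7, 7])
```

For transfer learning in practice, pretrained VGG weights are commonly loaded from libraries such as torchvision (e.g., torchvision.models.vgg16) rather than trained from scratch.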