DL 4
Overview
● Definition: Convolutional Neural Networks (CNNs) are a class of deep neural networks
designed for processing structured grid data, such as images. They are highly effective for
image recognition, object detection, and other visual tasks.
● Inspiration: CNNs were inspired by the organization of the animal visual cortex,
where individual neurons respond to specific regions of visual stimuli.
CNN Architecture
1. Convolutional Layers:
○ The core building blocks of a CNN are the convolutional layers. These layers
apply filters (kernels) that scan across the input image, detecting features
like edges, textures, and patterns.
○ Each filter creates a feature map by applying convolution across the image,
capturing different aspects of the input.
○ Mathematical Operation: Convolution applies the filter to local regions of the
input, computing a weighted sum (dot product) that becomes one value of the
feature map (see the sketch after this list).
2. Activation Function:
○ Typically, ReLU (Rectified Linear Unit) is applied after each convolutional
operation to introduce non-linearity.
○ This non-linearity allows the network to learn complex patterns and
hierarchies in the data.
3. Pooling Layers:
○ Pooling layers reduce the spatial size of feature maps, effectively
downsampling the data to reduce the number of parameters and
computations.
○ Max Pooling is the most common pooling method, where the maximum
value is selected from a window of values, capturing the most prominent
feature within that window.
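A minimal NumPy sketch of the three steps above, i.e. convolution as a sliding weighted sum, ReLU, and 2x2 max pooling. The function names and the toy edge-detecting filter are illustrative, not taken from any particular library.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries):
    slide the kernel over the image and take a weighted sum at each position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def relu(x):
    """Element-wise non-linearity: negative responses are clipped to zero."""
    return np.maximum(x, 0)

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling: keep the strongest response per window."""
    H, W = fmap.shape
    H, W = H - H % 2, W - W % 2          # drop an odd trailing row/column if present
    fmap = fmap[:H, :W]
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

# Toy 8x8 input and a vertical-edge filter, purely for illustration.
image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])

feature_map = max_pool2x2(relu(conv2d(image, kernel)))
print(feature_map.shape)   # (3, 3): 8x8 -> 6x6 after the 3x3 conv -> 3x3 after pooling
```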
Key Properties of CNNs
● Parameter Sharing: The same filters are reused across all spatial positions of the input,
greatly reducing the number of parameters and improving computational efficiency (a
parameter-count comparison follows this list).
● Translation Invariance: By learning spatial hierarchies of features, CNNs are
robust to minor translations or shifts in the input data.
● Hierarchical Feature Learning: Lower layers learn low-level features (e.g., edges),
while higher layers learn more complex, high-level features (e.g., shapes or objects).
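To make the parameter-sharing point concrete, here is a rough comparison of parameter counts, sketched with PyTorch layer shapes; the 3x224x224 input size and the 64-filter width are assumptions chosen purely for illustration.

```python
import torch.nn as nn

# A 3x3 convolution mapping 3 input channels to 64 output channels: the same
# 64 filters are reused at every spatial position (parameter sharing).
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
conv_params = sum(p.numel() for p in conv.parameters())   # 3*3*3*64 + 64 = 1,792

# A fully connected layer mapping a flattened 3x224x224 input to just 64 units
# already needs ~9.6 million weights; producing a full 64x224x224 feature map
# densely would need vastly more.
fc = nn.Linear(3 * 224 * 224, 64)
fc_params = sum(p.numel() for p in fc.parameters())

print(conv_params, fc_params)   # 1792 vs. 9633856
```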
Applications
2. LeNet
● Overview: LeNet-5, developed by Yann LeCun in 1998, is one of the earliest CNN
architectures, primarily designed for handwritten digit recognition (e.g., MNIST
dataset).
● Architecture:
○ Consists of 7 layers, including two convolutional layers (each followed by a
subsampling/pooling layer) and fully connected layers leading to the output.
○ Utilizes tanh as the activation function, which was common at the time (see the
sketch after this section).
● Significance: LeNet demonstrated the effectiveness of CNNs for image processing
and is considered foundational in the development of deep learning for computer
vision.
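Below is a minimal PyTorch-style sketch of a LeNet-5-like network matching the description above (tanh activations, two convolutional stages, fully connected layers). The layer widths follow the commonly cited LeNet-5 configuration, and average pooling stands in for the original subsampling layers; treat it as an illustration rather than a faithful reimplementation.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style network for 32x32 grayscale digits (e.g., padded MNIST)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

out = LeNet5()(torch.randn(1, 1, 32, 32))   # -> shape (1, 10)
```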
4. ZF-Net
● Overview: ZF-Net, created by Matthew Zeiler and Rob Fergus, won the ILSVRC 2013
competition, improving upon AlexNet.
● Architecture:
○ Similar structure to AlexNet, but with smaller filters and strides (and hence smaller
receptive fields) in the early layers; see the sketch after this section.
○ Introduced deconvolutional layers for visualizing the learned features and
understanding how CNNs process images.
● Significance: ZF-Net helped in understanding CNNs better and laid the groundwork
for techniques in visualizing network behavior.
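The best-known concrete change is in the first convolutional layer: ZF-Net replaced AlexNet's large 11x11, stride-4 filters with 7x7, stride-2 filters, preserving more fine detail in the early feature maps. The sketch below contrasts the two first layers (PyTorch; the padding values are illustrative assumptions).

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                      # a single 224x224 RGB image

# AlexNet-style first layer: large 11x11 filters with stride 4.
alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2)

# ZF-Net-style first layer: smaller 7x7 filters with stride 2, which keep
# more fine-grained spatial detail in the first feature maps.
zfnet_conv1 = nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=1)

print(alexnet_conv1(x).shape)   # torch.Size([1, 96, 55, 55])
print(zfnet_conv1(x).shape)     # torch.Size([1, 96, 110, 110])
```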
5. VGGNet
● Overview: VGGNet, developed by the Visual Geometry Group at Oxford, was known
for its simplicity and uniform architecture.
● Architecture:
○ Consists of very deep networks (16 or 19 layers) with a simple design of 3x3
convolutional filters stacked multiple times.
○ Achieved great performance in image classification by increasing network
depth.
● Impact: VGGNet showed that deep networks built from small, uniform filters could
achieve state-of-the-art results. It is still widely used for feature extraction and transfer
learning (a block-level sketch follows).
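A sketch of the VGG idea of stacking uniform 3x3 convolutions (PyTorch; the vgg_block helper name is illustrative, and the channel progression follows the published VGG-16 configuration).

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    """A VGG-style block: num_convs stacked 3x3 convs (padding 1 keeps the spatial
    size), each followed by ReLU, then one 2x2 max pool that halves the resolution."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# The convolutional part of a VGG-16-like network: 2+2+3+3+3 = 13 conv layers;
# three fully connected layers on top would bring the total to 16.
features = nn.Sequential(
    vgg_block(3, 64, 2),
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)

print(features(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 512, 7, 7])
```

For transfer learning in practice, pretrained VGG weights are commonly loaded from libraries such as torchvision (e.g., torchvision.models.vgg16) rather than trained from scratch.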