Deep Learning: Unit 4 Notes

1. Convolutional Neural Networks (CNNs)

Overview

● Definition: CNNs are a class of deep neural networks specifically designed for
processing structured grid data, such as images. They are highly effective for image
recognition, object detection, and other visual tasks.
● Inspiration: CNNs were inspired by the organization of the animal visual cortex,
where individual neurons respond to specific regions of visual stimuli.

CNN Architecture

1. Convolutional Layers:
○ The core building blocks of a CNN are the convolutional layers. These layers
apply filters (kernels) that scan across the input image, detecting features
like edges, textures, and patterns.
○ Each filter creates a feature map by applying convolution across the image,
capturing different aspects of the input.
○ Mathematical Operation: The filter slides across local regions of the input; at each position, the element-wise products of filter weights and input values are summed (a dot product), producing one value in the feature map.
2. Activation Function:
○ Typically, ReLU (Rectified Linear Unit) is applied after each convolutional
operation to introduce non-linearity.
○ This non-linearity allows the network to learn complex patterns and
hierarchies in the data.
3. Pooling Layers:
○ Pooling layers reduce the spatial size of feature maps, effectively
downsampling the data to reduce the number of parameters and
computations.
○ Max Pooling is the most common pooling method, where the maximum
value is selected from a window of values, capturing the most prominent
feature within that window.


4. Fully Connected (FC) Layers:
○ After several convolutional and pooling layers, CNNs usually have one or
more fully connected layers at the end.
○ These layers take the high-level features extracted by the convolutional
layers and use them for classification or other tasks.
5. Output Layer:
○ The final layer typically uses softmax activation for classification tasks, outputting a probability for each class (a minimal sketch of the full pipeline follows this list).
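As a concrete illustration of the pipeline above, here is a minimal PyTorch sketch. The layer sizes are illustrative assumptions (chosen for 28x28 grayscale inputs such as MNIST), not a prescribed design:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN: conv -> ReLU -> pool, twice, then a fully connected classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 3x3 filters produce 16 feature maps
            nn.ReLU(),                                   # non-linearity
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)        # flatten feature maps for the FC layer
        return self.classifier(x)      # raw logits

logits = SimpleCNN()(torch.randn(1, 1, 28, 28))
probs = torch.softmax(logits, dim=1)   # softmax turns logits into class probabilities
```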

Key Features of CNNs

● Parameter Sharing: The same filter weights are reused across all spatial positions of the input, drastically reducing the number of parameters and enhancing computational efficiency (see the comparison below).
● Translation Invariance: By learning spatial hierarchies of features, CNNs are
robust to minor translations or shifts in the input data.
● Hierarchical Feature Learning: Lower layers learn low-level features (e.g., edges),
while higher layers learn more complex, high-level features (e.g., shapes or objects).
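To make the effect of parameter sharing concrete, compare the weight count of a convolutional layer against a fully connected layer on the same input (the sizes below are illustrative assumptions):

```python
# Parameter counts on a 32x32x3 input (illustrative sizes).
# Conv layer: 64 filters of size 3x3 over 3 input channels, one bias per filter.
conv_params = 3 * 3 * 3 * 64 + 64        # = 1,792, independent of image size
# FC layer producing 64 outputs: every output connects to every input value.
fc_params = (32 * 32 * 3) * 64 + 64      # = 196,672
print(conv_params, fc_params)
```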

Applications

● CNNs are widely used in image classification, object detection, image segmentation, and video processing. They are also applied to non-image tasks, such as text and audio processing, where structured grid data is involved.

2. LeNet

● Overview: LeNet-5, developed by Yann LeCun in 1998, is one of the earliest CNN
architectures, primarily designed for handwritten digit recognition (e.g., MNIST
dataset).
● Architecture:
○ Consists of 7 layers (excluding the input): two convolutional layers, two subsampling (average pooling) layers, and three fully connected layers (sketched below).
○ Utilizes tanh as the activation function, which was common at the time.
● Significance: LeNet demonstrated the effectiveness of CNNs for image processing
and is considered foundational in the development of deep learning for computer
vision.
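A sketch of the LeNet-5 layout in PyTorch, following the common 2-conv / 2-pool / 3-FC reading of the architecture (tanh activations and average pooling, as in the original):

```python
import torch.nn as nn

# LeNet-5-style network for 32x32 grayscale inputs (MNIST digits padded to 32x32).
lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # C1: 6 feature maps, 28x28
    nn.AvgPool2d(2),                             # S2: subsample to 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # C3: 16 feature maps, 10x10
    nn.AvgPool2d(2),                             # S4: subsample to 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # C5
    nn.Linear(120, 84), nn.Tanh(),               # F6
    nn.Linear(84, 10),                           # output: 10 digit classes
)
```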


3. AlexNet

● Overview: AlexNet, developed by Alex Krizhevsky (with Ilya Sutskever and Geoffrey Hinton) in 2012, marked a breakthrough in computer vision by winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) that year.
● Architecture:
○ 8 layers: five convolutional layers followed by three fully connected layers (see the snippet below).
○ Introduced ReLU activation instead of tanh or sigmoid, addressing the
vanishing gradient problem.
○ Used dropout as a regularization technique and data augmentation to
reduce overfitting.
● Impact: AlexNet’s success popularized deep CNNs and demonstrated that large
networks could achieve remarkable accuracy with sufficient data and computational
power.
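For reference, torchvision ships an AlexNet implementation, which makes the five-conv / three-FC structure (and the dropout layers) easy to inspect:

```python
from torchvision.models import alexnet

model = alexnet()        # untrained; pass weights=... for a pretrained variant
print(model.features)    # the five convolutional layers, with ReLU and max pooling
print(model.classifier)  # dropout plus the three fully connected layers
```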

4. ZF-Net (Zeiler and Fergus Network)

● Overview: ZF-Net, created by Matthew Zeiler and Rob Fergus, won the ILSVRC 2013
competition, improving upon AlexNet.
● Architecture:
○ Similar structure to AlexNet, but with smaller filters and stride in the early layers (a 7x7 first-layer filter with stride 2, versus AlexNet's 11x11 with stride 4), preserving more detail in early feature maps.
○ Introduced deconvolutional layers for visualizing the learned features and
understanding how CNNs process images.
● Significance: ZF-Net helped in understanding CNNs better and laid the groundwork
for techniques in visualizing network behavior.

5. VGGNet

● Overview: VGGNet, developed by Simonyan and Zisserman at Oxford's Visual Geometry Group (2014), was known for its simplicity and uniform architecture.
● Architecture:


○ Consists of very deep networks (16 or 19 layers) with a simple, uniform design: 3x3 convolutional filters stacked multiple times (see the block sketch below).
○ Achieved great performance in image classification by increasing network
depth.
● Impact: VGGNet showed that deep networks with small, uniform filters could
achieve state-of-the-art results. It’s still popular for feature extraction and transfer
learning.
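A sketch of one VGG-style block (channel counts are illustrative assumptions). Note that two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with fewer parameters and an extra non-linearity:

```python
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, num_convs: int) -> nn.Sequential:
    """A VGG-style block: repeated 3x3 conv + ReLU, then 2x2 max pooling."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(2))  # halve the spatial resolution
    return nn.Sequential(*layers)

block = vgg_block(64, 128, num_convs=2)  # e.g., the second stage of VGG-16
```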

6. GoogLeNet (Inception Network)

● Overview: GoogLeNet (Inception v1), developed by Google, introduced the Inception module, which allowed for efficient deep networks without increasing the parameter count excessively.
● Architecture:
○ Inception modules: Multiple convolution filters (1x1, 3x3, 5x5) and a pooling branch are applied in parallel, capturing multi-scale information (sketched below).
○ Auxiliary classifiers: Added at intermediate layers to ensure gradient flow
and improve network training.
● Significance: GoogLeNet achieved high accuracy with a relatively low number of
parameters, demonstrating an efficient design for deep learning.
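A minimal sketch of an Inception-style module. The branch channel counts are illustrative assumptions; as in the original design, 1x1 convolutions reduce channel depth before the more expensive 3x3 and 5x5 filters:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 conv branches plus a pooling branch, concatenated."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),          # 1x1 bottleneck cuts parameters
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 16, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        # Every branch preserves spatial size, so outputs concatenate along channels.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

out = InceptionModule(32)(torch.randn(1, 32, 28, 28))  # -> (1, 64, 28, 28)
```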

7. ResNet (Residual Network)

● Overview: ResNet, introduced by Kaiming He et al. in 2015, tackled the problem of vanishing gradients in very deep networks by using residual connections.
● Architecture:
○ Residual connections: These identity shortcuts add a block's input directly to its output, allowing the model to bypass the stacked layers and making very deep networks trainable (up to 152 layers in ResNet-152).
○ Each residual block allows gradients to flow more easily through the network (see the sketch below).
● Impact: ResNet achieved groundbreaking results in image classification and
inspired many variations. It remains a foundational architecture for deep learning
tasks.
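A sketch of a basic residual block, assuming the identity-shortcut case where input and output channel counts match:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # The shortcut lets gradients flow past the conv stack unchanged.
        return torch.relu(self.body(x) + x)

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))  # output shape matches the input
```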


8. Learning Vectorial Representations of Words (Word Embeddings)

Overview

● Concept: Learning vector representations of words, or word embeddings, involves mapping words into continuous vector spaces. Words with similar meanings are placed closer together in this vector space.
● Techniques:
○ Word2Vec:
■ Uses Skip-gram and CBOW (Continuous Bag of Words) architectures
to learn word representations.
■ Trains embeddings by predicting context words from a target word (Skip-gram) or a target word from its context (CBOW); see the example after this list.
○ GloVe (Global Vectors for Word Representation):
■ Trains word vectors by factorizing word co-occurrence matrices,
capturing global statistical information from large corpora.
○ FastText:
■ Builds upon Word2Vec but also considers subword information, which
helps in better handling rare and out-of-vocabulary words.
● Applications:
○ Natural Language Processing (NLP): Used in sentiment analysis, machine
translation, question answering, and other NLP tasks.
○ Transfer Learning: Pre-trained embeddings (like Word2Vec and GloVe) can
be used in downstream NLP models for better generalization and reduced
training time.
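A minimal Word2Vec example using the gensim library (the toy corpus and hyperparameters are illustrative; useful embeddings require far larger corpora):

```python
from gensim.models import Word2Vec

# Each sentence is a list of tokens; a real corpus would be millions of sentences.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["cats", "and", "dogs", "are", "animals"],
]

model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the embedding space
    window=2,        # context window around each target word
    min_count=1,     # keep every word (toy corpus)
    sg=1,            # 1 = Skip-gram, 0 = CBOW
)

vec = model.wv["king"]                # 50-dimensional vector for "king"
print(model.wv.most_similar("king"))  # nearest neighbours by cosine similarity
```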

Key Takeaways

● Word embeddings enable NLP models to capture semantic relationships between words, helping in tasks that rely on understanding language context.
● Embedding techniques like Word2Vec, GloVe, and FastText have become standard
for creating effective word representations.
