Slide 1
These models are fundamental in deep learning and are widely used in image recognition, speech
processing, and time-series forecasting. Let’s begin!
What is a CNN?
A Convolutional Neural Network (CNN) is a deep learning model specifically designed for working with
images and videos.
Unlike traditional neural networks, CNNs automatically detect patterns and features in an image,
making them ideal for tasks like image classification, object detection, and video analysis.
CNNs process an image through several layers to extract meaningful information. The key layers are:
Input Layer
This layer receives the raw image as a grid of pixel values and passes it to the network.
Convolution Layer
This layer detects features like edges, textures, and shapes using small filters (kernels).
These filters slide over the image, creating a feature map that highlights important
patterns.
Activation Function (ReLU - Rectified Linear Unit)
ReLU introduces non-linearity by replacing negative values in the feature map with zero.
Fully Connected Layer
The extracted features are passed to a fully connected layer (like a traditional neural
network). This layer makes the final classification of the image. A minimal code sketch of
these stages follows this list.
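
To make the stack concrete, here is a minimal sketch of these stages in PyTorch. The filter count, the 28x28 grayscale input, and the 10-class output are illustrative assumptions, not from the slides.

import torch
import torch.nn as nn

# Minimal CNN mirroring the layers above: input -> convolution -> ReLU -> fully connected.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3),  # 8 filters of size 3x3 slide over the image
    nn.ReLU(),                                                # activation: negative values become zero
    nn.Flatten(),                                             # unroll the feature maps into one vector
    nn.Linear(8 * 26 * 26, 10),                               # fully connected layer: scores for 10 classes
)

x = torch.randn(1, 1, 28, 28)  # one dummy 28x28 grayscale image (input layer)
logits = model(x)              # shape (1, 10): one score per class
print(logits.shape)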
LeNet: A pioneering CNN architecture by Yann LeCun, designed for handwritten digit recognition
(MNIST).
AlexNet: A deep CNN that won ImageNet 2012, introducing ReLU activation and dropout for better
training.
ResNet: A deep network using residual connections (skip connections) to tackle vanishing gradients in
training.
GoogLeNet: A CNN built around the Inception module, optimizing computational efficiency and accuracy.
VGG: A deep CNN with uniform 3x3 convolutional layers, known for simplicity and strong feature
extraction.
Each model improves accuracy and efficiency in image recognition.
CNNs offer:
✅ High accuracy in image-related tasks.
✅ Efficiency by reducing manual feature extraction.
✅ Robustness in detecting objects in different conditions.
A Recurrent Neural Network (RNN) is a deep learning model designed to process sequential data, such
as text, speech, and time-series information.
Unlike CNNs, which work on images, RNNs remember past information while processing new inputs.
This makes RNNs great for tasks like speech recognition and text prediction.
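
To see where this "memory" comes from, here is a minimal sketch of the recurrence inside a vanilla RNN. The sizes and random weights are illustrative assumptions.

import torch

input_size, hidden_size = 3, 4
W_x = torch.randn(hidden_size, input_size)   # weights for the current input
W_h = torch.randn(hidden_size, hidden_size)  # weights for the previous hidden state
b = torch.zeros(hidden_size)

h = torch.zeros(hidden_size)            # hidden state: the network's memory, initially empty
for x_t in torch.randn(5, input_size):  # a toy sequence of 5 input vectors
    # Each new state mixes the current input with everything remembered so far
    h = torch.tanh(W_x @ x_t + W_h @ h + b)
print(h)  # the final state summarizes the whole sequence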
Long Short-Term Memory (LSTM) is a special type of RNN designed to handle long sequences of data
without losing important information.
1. Cell State – The first component of the LSTM, which runs through the entire LSTM unit. It can
be thought of as a conveyor belt.
The cell state is responsible for remembering and forgetting, based on the context of the
input: some of the previous information is kept, some is forgotten, and some new
information is added to the memory.
The first operation (x) is a pointwise multiplication of the cell state by a vector of values
between 0 and 1. Information multiplied by a value near 0 is forgotten by the LSTM. The other
operation (+) adds new information to the state.
That vector comes from the forget gate, which combines h(t-1) and x(t) and, with the help of a
sigmoid layer, outputs a number between 0 and 1 for each number in the cell state C(t-1). An
output of 1 means keep it completely; 0 means forget it completely.
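
Here is a minimal sketch of the forget-gate step described above, in PyTorch. The weight shapes and random values are illustrative assumptions.

import torch

input_size, hidden_size = 3, 4
W_f = torch.randn(hidden_size, hidden_size + input_size)  # forget-gate weights
b_f = torch.zeros(hidden_size)

h_prev = torch.randn(hidden_size)  # h(t-1): previous hidden state
x_t = torch.randn(input_size)      # x(t): current input
C_prev = torch.randn(hidden_size)  # C(t-1): previous cell state

# Sigmoid squashes each entry into (0, 1): near 1 means keep, near 0 means forget
f_t = torch.sigmoid(W_f @ torch.cat([h_prev, x_t]) + b_f)

# The pointwise (x) operation: scale each memory entry by its keep/forget score
C_t = f_t * C_prev  # the (+) operation would then add new candidate information
print(f_t, C_t)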
These models power AI applications like Google Translate, self-driving cars, and virtual assistants.
Thank you for your time! I hope you found this presentation informative. Do you have any questions?