DLQ Eyelashes
Summary
Questions and Answers
General Deep Learning
Image Classification
Object Detection
Image Segmentation
GANs
Miscellaneous
CVDL Master Program
Question 3 [EASY]: What is the role of activation functions in neural networks, and
can you name a few common ones?
● Sigmoid
● Leaky ReLU
CNNs consist of convolutional layers, pooling layers, and fully connected layers.
The convolutional layers apply filters to the input data, capturing local features,
while pooling layers reduce spatial dimensions, helping to control overfitting. As a
result, CNNs have been highly successful in image classification, object detection,
and semantic segmentation tasks.
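To make the pooling step concrete, here is a minimal Python sketch of 2x2 max pooling on a small feature map. The function name max_pool_2d and the sample values are illustrative assumptions, not taken from any particular framework.

```python
def max_pool_2d(feature_map, size=2):
    """Max-pool a 2D list over non-overlapping size x size windows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values inside the current window and keep the largest
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 5],
]
pooled = max_pool_2d(feature_map)  # [[6, 2], [7, 9]]
```

The 4x4 input shrinks to 2x2, which is the reduction in spatial dimensions described above.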
Question 5 [EASY]: What is the purpose of loss functions, and can you name a few
common ones?
Question 6 [EASY]: What is the role of optimizers in training neural networks, and
can you name a few common ones?
Answer: Optimizers are algorithms used to update the weights and biases of a
neural network to minimize the loss function. They play a critical role in
determining the speed and effectiveness of model training. Some standard
optimizers include:
● Gradient Descent
● Stochastic Gradient Descent (SGD)
● Momentum
● AdaGrad
● RMSprop
● Adam
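As a rough illustration of how some of these update rules differ, the sketch below minimizes a toy one-dimensional loss f(w) = (w - 3)^2 with plain gradient descent, momentum, and Adam. The loss, function names, and hyperparameter values are assumptions chosen for the example. SGD uses the same update as plain gradient descent but with a gradient estimated from a mini-batch, which this noise-free toy problem does not model.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=300):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def momentum(w, lr=0.1, beta=0.9, steps=300):
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)  # accumulate past gradients
        w -= lr * velocity
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_gd = gradient_descent(0.0)
w_momentum = momentum(0.0)
w_adam = adam(0.0)
```

All three runs start at w = 0 and should end close to the minimum at w = 3; they differ in the path they take to get there.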
Question 7 [EASY]: What are some popular pre-trained deep learning models and
their applications?
Answer: Pre-trained models are deep learning models already trained on large
datasets and can be fine-tuned for specific tasks, leveraging transfer learning.
Some popular pre-trained models and their applications include:
● VGG, ResNet, Inception, and DenseNet for image classification and object
detection
● U-Net for image segmentation
Question 9 [EASY]: How can you apply data augmentation techniques to improve
the performance of a deep learning model?
Answer: Data augmentation is a technique used to increase the diversity and size
of the training dataset by applying random transformations to the input images.
Some common types of data augmentation used in computer vision tasks
include:
Question 12 [EASY]: What are some popular deep learning frameworks used for
computer vision tasks?
● Caffe: A deep learning framework developed by the Berkeley Vision and Learning
Center, focusing on image classification and convolutional networks.
● MXNet: A flexible and efficient deep learning library that supports multiple
programming languages and platforms for various machine learning
tasks.
Question 13 [EASY]: What are some challenges and limitations of deep learning for
computer vision tasks?
Answer: Overfitting occurs when a model learns the noise in the training data
instead of the underlying pattern, resulting in poor performance on unseen data.
● Zero initialization
● Random initialization (uniform or normal distribution)
● Xavier/Glorot initialization
● He initialization
● LeCun initialization
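A minimal sketch of three of these schemes, using only the Python standard library, might look as follows. The formulas are the commonly cited ones (uniform limit sqrt(6 / (fan_in + fan_out)) for Xavier/Glorot, variance 2 / fan_in for He, variance 1 / fan_in for LeCun). The function names and layer sizes are assumptions made for illustration.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot: uniform in [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def scaled_normal(fan_in, fan_out, gain, rng):
    """Zero-mean normal with variance gain / fan_in (gain=2: He, gain=1: LeCun)."""
    std = math.sqrt(gain / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

def sample_std(matrix):
    """Empirical standard deviation of all entries, used here as a sanity check."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

rng = random.Random(0)
fan_in, fan_out = 256, 128  # hypothetical layer sizes
w_xavier = xavier_uniform(fan_in, fan_out, rng)
w_he = scaled_normal(fan_in, fan_out, gain=2.0, rng=rng)
w_lecun = scaled_normal(fan_in, fan_out, gain=1.0, rng=rng)
```

Zero initialization is omitted because every unit in a layer would then receive the same gradient and stay identical to its neighbors.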
Question 19 [MODERATE]: What is early stopping, and how does it help prevent
overfitting in deep learning models?
Answer: Pooling layers are used in CNNs to reduce the spatial dimensions of
feature maps while retaining important information. They apply a downsampling
operation, such as max pooling or average pooling, on non-overlapping regions
of the feature maps. Pooling layers provide several benefits, including:
Answer: Handling noisy or missing data is crucial to ensure deep learning models
can generalize well and provide reliable predictions. Some strategies to deal with
such data include:
● Mean Squared Error (MSE) and Mean Absolute Error (MAE): These metrics
measure the difference between the model's predictions and target values
in regression tasks.
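For reference, both metrics can be computed in a few lines of Python. The sample values below are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; less sensitive to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4 + 1) / 4 = 1.5
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2 + 1) / 4 = 1.0
```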
Answer: Convolutional layers are the primary building blocks of CNNs and are
responsible for learning local patterns and features in images. They perform
convolution operations by sliding a set of filters or kernels across the input image,
resulting in feature maps that capture spatial and hierarchical information.
Convolutional layers enable the model to learn features at different scales,
orientations, and positions, providing a robust and translation-invariant
representation for various computer vision tasks.
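The sliding-filter operation can be sketched in plain Python as below. This is a simplified single-channel version with no padding and a stride of 1. Like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped). The image and kernel values are assumptions picked so that the filter responds to a vertical edge.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the resulting feature map as a 2D list."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1]]  # responds where intensity rises from left to right
feature_map = conv2d_valid(image, kernel)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The feature map is non-zero only in the column where the dark region meets the bright one, showing how a single filter captures one local pattern wherever it occurs.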
Question 27 [MODERATE]: What are some methods for reducing overfitting in deep
learning models for computer vision?
Answer: Overfitting occurs when a deep learning model learns to perform very
well on the training data but fails to generalize to new, unseen data. Some
methods for reducing overfitting in computer vision models include:
Question 30 [MODERATE]: How does active learning help reduce the labeling effort
in deep learning for computer vision tasks?
Question 31 [MODERATE]: What are some methods for improving the efficiency of
deep learning models for computer vision tasks, especially for deployment on
resource-constrained devices?
Answer: Several methods can be used to improve the efficiency of deep learning
models for computer vision tasks, particularly when deploying on
resource-constrained devices like mobile phones or embedded systems. Some
popular methods include:
Question 32 [MODERATE]: What are some methods for handling class imbalance
in deep learning for computer vision tasks?
Question 33 [MODERATE]: What is the role of synthetic data in deep learning for
computer vision tasks?
In deep learning for computer vision, synthetic data can be used for various
purposes, such as:
While synthetic data can be a valuable resource in deep learning for computer
vision tasks, it is essential to ensure that the generated data is representative of
real-world data and that the model does not overfit to characteristics specific to
the synthetic data. Techniques like domain randomization or mixing synthetic and real
Answer: The development and deployment of deep learning models for computer
vision tasks come with several ethical considerations, including:
● Bias and fairness: Models can learn biases in the training data, leading to
unfair treatment of certain groups or individuals. Ensuring that the training
data is representative and diverse and that the model's performance is
evaluated across different demographic groups is crucial.
● Weight sharing: Using the same weights for multiple connections in the
model can reduce the number of unique parameters and memory
requirements.
Image Classification
Question 38 [MODERATE]: How do you handle class imbalance in deep learning
problems?
Answer: Class imbalance occurs when certain classes in the dataset have
significantly fewer samples than others, leading to biased model predictions.
Some techniques to handle class imbalance include:
● Using data augmentation to generate more samples for the minority class
Object Detection
Question 39 [EASY]: What is the difference between object detection and object
recognition in computer vision?
Question 40 [EASY]: What are some popular object detection algorithms used in
computer vision?
● YOLO (You Only Look Once) and its variants (YOLOv2, YOLOv3, YOLOv4,
YOLOv5)
● RetinaNet
● EfficientDet
Question 41 [EASY]: What are anchor boxes in object detection, and what is their
purpose?
Answer: Anchor boxes, also called priors or default boxes, are pre-defined bounding
boxes of different shapes, sizes, and aspect ratios used in object detection
algorithms like YOLO and SSD. They provide a set of initial reference boxes that the
model learns to adjust during training to match the ground truth bounding boxes more
closely. Anchor
boxes help the model learn to predict bounding boxes more efficiently by
Image Segmentation
Question 43 [EASY]: What is the difference between semantic segmentation and
instance segmentation in computer vision?
Question 44 [MODERATE]: What are dilated convolutions, and how can they help
capture larger contextual information in deep learning models?
Answer: Region Proposal Networks (RPNs) are a key component of the Faster R-CNN
object detection algorithm. They generate a set of candidate object bounding boxes
or region proposals, which are then passed to a classifier and regressor to predict
the object class and refine the bounding box coordinates. RPNs are fully
convolutional networks that leverage the feature maps of a CNN backbone to
efficiently generate region proposals in a sliding window fashion, significantly
speeding up the object detection process compared to earlier methods like
R-CNN and Fast R-CNN.
Question 47 [MODERATE]: What are U-Net and Mask R-CNN, and how do their
semantic and instance segmentation approaches differ?
On the other hand, Mask R-CNN is an extension of the Faster R-CNN object
detection algorithm, designed for instance segmentation. It adds a branch to the
Faster R-CNN architecture that predicts a binary mask for each object instance, in
addition to the class labels and bounding box coordinates that Faster R-CNN already
predicts. Mask R-CNN is used in instance segmentation tasks, where the goal is to
label each pixel with its corresponding class and distinguish between different
instances of the same object class.
Question 48 [DIFFICULT]: What are skip connections in deep learning models, and
how do they improve performance in segmentation tasks?
Answer: Spatial pyramid pooling (SPP) is a technique used in CNNs to enable the
processing of input images of varying sizes and aspect ratios. SPP divides the
feature map generated by the convolutional layers into a fixed number of
sub-regions at different scales, pooling the features within each sub-region. The
pooled features are then concatenated to form a fixed-length representation,
which can be fed into the subsequent fully connected layers or classifiers. SPP
allows CNNs to handle images of different sizes without resizing or cropping,
improving performance in object detection and recognition tasks.
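The fixed-length property can be illustrated with a small Python sketch. It max-pools a single-channel feature map into 1x1 and 2x2 grids of bins and concatenates the results. The bin boundaries (floor for the start, ceiling for the end) and the choice of pyramid levels are assumptions for this example, not necessarily the exact scheme of the original SPP paper.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D feature map into n x n bins for each n in levels
    and return all pooled values as one flat list."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0 = (i * rows) // n                # floor of the bin's start row
            r1 = ((i + 1) * rows + n - 1) // n  # ceiling of the bin's end row
            for j in range(n):
                c0 = (j * cols) // n
                c1 = ((j + 1) * cols + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled

map_4x4 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
map_3x5 = [[1, 0, 2, 0, 3], [0, 4, 0, 5, 0], [6, 0, 7, 0, 8]]
vec_a = spatial_pyramid_pool(map_4x4)  # 1 + 4 = 5 values
vec_b = spatial_pyramid_pool(map_3x5)  # also 5 values despite the different input size
```

Both inputs produce a vector of length 5 (one value from the 1x1 level plus four from the 2x2 level), which is what lets a fully connected layer follow feature maps of any size.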
Miscellaneous
Question 51 [EASY]: What is optical flow, and how is it used in computer vision?
Answer: Optical flow is the apparent motion of objects, surfaces, and edges in a
sequence of images or video frames caused by the relative motion between the
camera and the scene. It is used in computer vision tasks such as motion
estimation, object tracking, and video stabilization. Optical flow algorithms
estimate the motion vectors for each pixel in the image, which can be used to
understand the dynamics of the scene and predict the future positions of objects
or points of interest.
Question 52 [MODERATE]: What are Siamese networks, and how are they used in
computer vision tasks?
Question 53 [EASY]: Explain the concept of Recurrent Neural Networks (RNNs) and
their applications.
RNNs suffer from vanishing and exploding gradient problems, which have led to
the development of more advanced architectures like LSTM and GRU.
Question 54 [EASY]: What are Long Short-Term Memory (LSTM) networks, and how
do they differ from regular RNNs?
Answer: Long Short-Term Memory (LSTM) networks are a particular type of RNN
designed to address the vanishing and exploding gradient problems that hinder
the training of standard RNNs. LSTMs introduce a memory cell and three gating
mechanisms (input, output, and forget gates) that regulate the flow of
information within the cell. This design allows LSTMs to learn and remember
long-range dependencies more effectively than regular RNNs, making them
suitable for tasks like machine translation, text generation, and sentiment
analysis.
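The gating mechanism can be sketched for a single time step as follows. To keep the example short, the input, hidden state, and cell state are single floats rather than vectors, and the parameter values are hand-picked assumptions (not learned) chosen to show a cell that holds on to its memory.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.
    params maps each gate name to (input weight, hidden weight, bias)."""
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    i = gate("input", sigmoid)         # how much new information to write
    f = gate("forget", sigmoid)        # how much old memory to keep
    o = gate("output", sigmoid)        # how much memory to expose
    g = gate("candidate", math.tanh)   # proposed new content
    c = f * c_prev + i * g             # updated memory cell
    h = o * math.tanh(c)               # updated hidden state
    return h, c

# Hypothetical parameters: forget gate almost fully open, input gate almost closed
params = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 20.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 0.5
for x in [1.0] * 50:
    h, c = lstm_step(x, h, c, params)
```

With these settings the cell state stays at roughly 0.5 across all 50 steps, illustrating how the forget gate lets information persist over long ranges.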
Answer: In supervised learning, models are trained using labeled data, where
each input sample is associated with a corresponding output label or target.
Supervised learning is the most common approach in deep learning for computer
vision tasks, such as image classification, object detection, and semantic
segmentation.
On the other hand, unsupervised learning involves training models using only the
input data without any corresponding output labels. The goal is to learn the
underlying structure or patterns in the data. Unsupervised learning methods, such
as clustering and dimensionality reduction, can be applied in computer vision
tasks like image segmentation, feature learning, and data compression.
Answer: One-shot and few-shot learning are both learning problems that deal with
very small amounts of labeled data.
In one-shot learning, the model must learn to recognize new objects or classes
from a single example per class, while few-shot learning involves learning from a
slightly larger but still limited number of examples, usually fewer than ten per
class. Both paradigms require models to generalize effectively from scarce data
and often rely on techniques like transfer learning, meta-learning, or
memory-augmented neural networks.
Question 60 [MODERATE]: What are word embeddings, and how are they used in
natural language processing (NLP) tasks?
Answer: The Transformer architecture, introduced in the paper "Attention is All You
Need" by Vaswani et al., is a deep learning model that relies solely on
self-attention mechanisms instead of traditional recurrent or convolutional layers.
It has significantly impacted NLP due to its ability to capture long-range
dependencies more effectively and its highly parallelizable structure, which
enables faster training. As a result, the Transformer has become the foundation
for many state-of-the-art models like BERT, GPT, and T5, which have advanced
the performance of various NLP tasks, including machine translation, text
classification, and question-answering.
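The core self-attention computation, softmax(QK^T / sqrt(d_k))V, can be sketched in plain Python as below. As a simplifying assumption, the token vectors are used directly as queries, keys, and values, whereas a real Transformer first applies learned linear projections and uses multiple heads. The token values are made up for illustration.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each position is processed independently of the others inside the loop, which reflects why the computation parallelizes well, and every position can attend to every other position regardless of distance.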
Question 62 [MODERATE]: What is a seq2seq model, and what are its applications?
Question 63 [MODERATE]: What is an image captioning task, and what are some
deep learning approaches to solve it?
In deep learning for computer vision, RL can be applied using a deep neural
network, such as a CNN, as a function approximator for the agent's policy or value
function. The network can be trained using RL algorithms like Q-learning or policy
gradients to learn the optimal actions based on the visual input from the
environment. This combination of deep learning and reinforcement learning,
known as deep reinforcement learning (DRL), has been applied successfully to
several computer vision-related tasks, including self-driving cars, robot control,
and video game playing.
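The Q-learning update itself can be shown on a toy problem. The sketch below is a deliberate simplification: it replaces the neural network with a lookup table and the visual input with a five-cell corridor, both of which are assumptions made to keep the example self-contained. In DRL, the table would be replaced by a network such as a CNN that maps images to action values.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Toy environment: reward 1 for reaching the last cell, 0 otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Purely random exploration; Q-learning is off-policy, so it can
            # still learn the values of the greedy policy
            a = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = q_learning()
# Greedy action for each non-terminal cell
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
```

After training, the greedy policy moves right in every cell, which is the optimal behavior in this corridor.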
Question 69 [MODERATE]: What is the role of capsule networks in deep learning for
computer vision tasks?
In deep learning for computer vision tasks, capsule networks aim to address
some of the limitations of CNNs, such as the lack of explicit spatial relationships
between features and the difficulty in modeling viewpoint invariance. Capsule
networks have shown promising results in tasks like object recognition and
segmentation, but their performance and scalability are still active research
areas. Nevertheless, further development of capsule networks may lead to more
robust and interpretable models for computer vision tasks.
Answer: Graph neural networks (GNNs) are a type of neural network architecture
designed to process graph-structured data, which consists of nodes connected
by edges representing relationships between them. While GNNs are primarily used
in domains like social network analysis, recommendation systems, and molecular
chemistry, they can also be applied to specific computer vision tasks that involve
non-grid structured data or complex relationships between objects.
In deep learning for computer vision, GNNs can be used to model the
relationships between different regions or objects in an image or a video. For
example, in scene understanding tasks, GNNs can capture the relationships
In deep learning for computer vision tasks, adversarial training can be applied by
generating adversarial examples using the Fast Gradient Sign Method (FGSM) or
Projected Gradient Descent (PGD), which perturb the input images in a way that
maximizes the model's prediction error. These adversarial examples are then
mixed with the original training data and used to update the model's weights.
Adversarial training has been shown to improve the model's robustness against
adversarial attacks and can also lead to better performance on clean,
non-adversarial data in some cases.
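As a minimal sketch of FGSM, the example below attacks a tiny logistic regression model instead of an image classifier. The weights, input, and epsilon are hypothetical values chosen for illustration. For this model the gradient of the cross-entropy loss with respect to the input has the closed form (p - y) * w, so no automatic differentiation is needed.

```python
import math

def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(p, y):
    """Binary cross-entropy for a single example."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fgsm(weights, bias, x, y, epsilon):
    """Move each input feature by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    grad_x = [(p - y) * w for w in weights]  # d(loss)/d(x) for logistic regression
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]

weights, bias = [2.0, -1.0, 0.5], 0.1  # hypothetical trained parameters
x, y = [0.5, 0.2, 0.9], 1              # a correctly classified example
x_adv = fgsm(weights, bias, x, y, epsilon=0.1)

clean_loss = bce_loss(predict(weights, bias, x), y)
adv_loss = bce_loss(predict(weights, bias, x_adv), y)
```

The adversarial loss is higher than the clean loss even though no feature changed by more than epsilon. In adversarial training, examples like x_adv would be added to the training batches.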
● Autoencoders: Neural networks that learn to encode and decode the input
data, forcing the model to learn a compact and useful representation of
the data in the hidden layers.
Question 73 [DIFFICULT]: What are some techniques for incorporating spatial and
temporal information in deep learning models for computer vision tasks?
Answer: Spatial and temporal information is essential for many computer vision
tasks, particularly those involving video analysis or sequences of images. Some
techniques for incorporating spatial and temporal information in deep learning
models for computer vision tasks include:
By taking the program, you become part of CareerX, our Career Accelerator
Program curated to help you advance your career as an AI Professional.