
FACE MASK DETECTION USING CNN AND GABOR FILTER
A PROJECT REPORT
Submitted by
Alok Kumar Mishra (20010163)

Jyoti Ranjan Behera (20010152)
Abinash Barada (20010132)
Aditya Prasad Tripathy (20010129)
In partial fulfilment for the award of the degree

Of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
C.V. RAMAN GLOBAL UNIVERSITY

BHUBANESWAR-ODISHA-752054
MAY 2024
C.V. RAMAN GLOBAL UNIVERSITY
BHUBANESWAR-ODISHA-752054
CERTIFICATE OF APPROVAL
This is to certify that we have examined the project entitled "Face mask
detection and recognition using CNN and Gabor filter" submitted by Alok
Kumar Mishra, Registration No.-20010163, Jyoti Ranjan Behera,
Registration No.-20010152, Abhinash Barada, Registration No.-20010132,
Aditya Prasad Tripathy, Registration No.-20010129, CGU-Odisha,
Bhubaneswar. We hereby accord our approval of it as a major project work
carried out and presented in a manner required for its acceptance towards
completion of major project stage-I (8th Semester) of the Bachelor's Degree in
Computer Science & Engineering for which it has been submitted. This
approval does not necessarily endorse or accept every statement made, opinion
expressed or conclusions drawn as recorded in this major project; it only
signifies the acceptance of the major project for the purpose for which it has
been submitted.
SUPERVISOR: Prof. MamtaRani Das
HEAD OF THE DEPARTMENT: Prof. Rojalina Priyadarshini
ACKNOWLEDGEMENT
We would like to express our deep gratitude to our project guide,
Prof. MamtaRani Das, Professor, Department of Computer
Science and Engineering, who has always been a source
of motivation and firm support in carrying out the project.
We would also like to convey our sincerest gratitude and indebtedness
to all other faculty members and staff of the Department of Computer
Science and Engineering, who bestowed their great effort and guidance
at appropriate times, without which it would have been very difficult to
complete our project work.
An assemblage of this nature could never have been attempted without
our reference to and inspiration from the works of others whose details
are mentioned in the references section. We acknowledge our
indebtedness to all of them. Further, we would like to express our
gratitude towards our parents and God, who directly or indirectly
encouraged and motivated us during this project.
ABSTRACT
The global impact of the COVID-19 pandemic has been profound,
prompting urgent measures to mitigate its effects. Prior to the
discovery of a vaccine, it's imperative that we take proactive steps to
curtail the spread of the virus and prevent similar outbreaks in the
future. Among these measures, the use of face masks has emerged as a
crucial tool in limiting viral transmission. Research underscores the
effectiveness of mask-wearing in reducing the risk of contagion and
fostering a sense of security among the populace. However, manual
enforcement of mask mandates presents logistical challenges.
Enter the face mask detector, a cutting-edge solution leveraging Deep
Learning technology to ascertain whether individuals are adhering to
mask protocols. This innovative system operates seamlessly with both
static images and live video feeds. Its backbone is a sophisticated
Convolutional Neural Network (CNN) architecture meticulously
engineered to discern between masked and unmasked faces with
precision. Employing intricate algorithms grounded in convolutional
architectures, it excels in capturing nuanced pixel details.
Training the system involves the utilization of Fully Convolutional
Networks, enabling the semantic segmentation of face masks within
images. Notably, this detector boasts a streamlined structure,
facilitating rapid deployment and yielding swift outcomes. Its
applicability extends to closed-circuit television (CCTV) surveillance,
where it serves as a vigilant sentinel, flagging instances of non-
compliance with mask mandates in public spaces. This capability
enables mass screening in high-traffic areas such as transportation
hubs, marketplaces, educational institutions, and commercial centers,
bolstering efforts to uphold safety protocols and foster a secure
environment.
Furthermore, a standout feature lies in its capacity to capture snapshots
of individuals, both masked and unmasked, from live camera feeds.
This functionality proves invaluable in the identification of individuals
sought by law enforcement and other authorities, aiding in the
apprehension of wanted criminals attempting to evade capture.
KEYWORDS: Deep Learning, Convolutional Neural Network, Face Mask
TABLE OF CONTENTS
CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Problem Statement & Solution
1.3 How Does the System Work?
1.4 Features of the Face Mask Detection System
1.5 Literature Review
CHAPTER 2: TECHNOLOGY USED
2.1 Deep Learning
2.1.1 Advantages of Deep Learning
2.1.2 What Are Artificial Neural Networks?
2.1.3 Convolutional Neural Network
2.1.4 MobileNetV2
2.2 Object Detection
2.2.1 Single Shot Detector (SSD)
CHAPTER 3: METHODOLOGY
3.1 Outline
3.2 Phase 1
3.2.1 Dataset Collection & pre-processing
3.2.2 Training face mask model
3.3 Phase 2
3.3.1 Image Pre-processing
3.3.2 ROI Extraction
3.3.3 Prediction
CHAPTER 4: SOFTWARE & PACKAGES REQUIRED
4.1 Jupyter Notebook (Python IDE)
4.2 TensorFlow
4.3 Keras
4.4 OpenCV
4.5 Numpy
4.6 Matplotlib
4.7 Scikit-learn
CHAPTER 5: CRIMINAL DETECTION FROM LIVE VIDEO FEED
5.1 Goal and Scope Definition
5.1.1 Automation of Surveillance Process
5.1.2 Detection of known criminals
5.1.3 Integration with command centers
CHAPTER 6: OBSERVATION & RESULTS
6.1 Observation on Image
6.2 Observation on Video
6.3 Results
CHAPTER 7: APPLICATIONS & LIMITATION
7.1 Applications
7.2 Limitation
CHAPTER 8: CONCLUSION & FUTURE SCOPE
8.1 Conclusion
8.2 Future Scope
REFERENCES
LIST OF TABLES
Chapter No. Table No. Table Caption
2 2.1 Architecture of Bottleneck Layer
2 2.2 Architecture of MobileNetV2
LIST OF FIGURES
Chapter No. Figure No. Figure Caption
2 2.1 Performance vs Data
2 2.2 Representation of a Neural Network
2 2.3 Operations performed in a neuron
2 2.4 Input Image
2 2.5 Operation in Convolution Layer
2 2.6 Fully Connected Layer
2 2.7 MobileNetV2 Architecture
2 2.8 Single Shot Detector Architecture
3 3.1 Methodology
3 3.2 Data Samples
3 3.3 Region of Interest Extraction
5 5.1 Implementation
6 6.1 Observation on Images: (a) Fully-Covered Mask (b) No Mask (c) Partially-Covered Mask
6 6.2 Multiple Detection on Image
6 6.3 Video Frames
6 6.4 Accuracy & Loss graph
LIST OF ABBREVIATIONS
CNN Convolutional Neural Network

CCTV Closed-Circuit Television
AI Artificial Intelligence
ML Machine Learning
API Application programming interface
USD United States dollar
ANN Artificial Neural Networks
ConvNet Convolutional Neural Network
RGB Red Green Blue
HSV Hue, Saturation and Value
CMYK Cyan Magenta Yellow and Black
FC Fully Connected
ReLU Rectified Linear Unit
GPU Graphics processing unit
MAdds Multiply-Adds
SSD Single Shot Detector
ROI Region of Interest

IDE Integrated Development Environment
TPU Tensor Processing Unit
OS Operating System
BSD Berkeley Software Distribution
Sklearn Scikit-learn
AUC Area Under Curve
FPS Frames Per Second
Chapter 1
INTRODUCTION
1.1 Background
Research has highlighted the effectiveness of face masks in reducing viral
transmission and instilling a sense of security. However, enforcing mask-wearing
policies manually is impractical. Enter Face Mask Detection Technology, a
solution powered by Artificial Neural Networks, which automatically identifies
individuals wearing or not wearing masks. This technology seamlessly integrates
with IP cameras, enabling monitoring of mask compliance.
The system, driven by Deep Learning and employing a Convolutional Neural
Network (CNN) architecture, accurately detects instances of improper mask usage.
It can be easily integrated with existing CCTV cameras, facilitating adherence to
safety regulations and fostering a safe work environment.
Face detection has become a crucial aspect of Image Processing and Computer
Vision. Advanced algorithms, leveraging convolutional architectures, aim to
enhance accuracy by extracting intricate pixel details. Our objective is to develop
a binary face classifier capable of detecting any face within the frame, irrespective
of its orientation. Modern Computer Vision algorithms are rapidly approaching
human-level performance in visual perception tasks.
From classification to video analytics, Computer Vision has revolutionized modern technology.
In the battle against the COVID-19 pandemic, technology has played a vital role.
While 'work from home' has become the new norm for many, certain sectors face
challenges in adapting to this model. As the pandemic wanes, and sectors seek to
return to in-person work, concerns persist among employees. Multiple studies have
highlighted the effectiveness of face masks in reducing viral transmission and
providing a sense of security. However, manually enforcing mask policies and
tracking violations is impractical. Computer Vision offers a viable solution. By
leveraging image classification, object detection, object tracking, and video
analysis, we've developed a robust system capable of detecting face masks in
images and videos.
In summary, Face Mask Detection technology is poised to become indispensable
in both retail and corporate sectors. Major corporations across various industries
are embracing AI and ML to navigate the challenges posed by the pandemic.
Digital product development companies are rolling out mask detection API
services, enabling quick deployment of face mask detection systems. This
technology offers reliable and real-time detection of mask-wearing individuals,
with easy integration into existing business systems while prioritizing user data
safety and privacy.
The widespread adoption of face masks amid the pandemic has become the new
normal, with many countries mandating their use in public spaces. However, this
presents challenges for face detection algorithms and touchless access control
systems in buildings. As millions learn to make their own masks due to market
shortages, face detection algorithms must adapt to recognize individuals wearing
masks accurately.
1.2 Problem Statement & Solution

As the world seeks to transition back to pre-pandemic norms, there is a palpable
sense of unease among individuals, particularly those contemplating a return to in-
person activities. Research underscores the importance of wearing face masks in
reducing the risk of viral transmission and instilling a sense of security. While
mask-wearing is strongly advised, some individuals find it uncomfortable or
simply neglect to wear masks, contributing to a rise in cases.
Harnessing the power of AI, we can detect individuals wearing or not wearing
masks in public spaces, bolstering our safety measures. A mask detection system
could play a pivotal role in safeguarding public health. This system employs
predictive models to discern mask usage in images or videos. Implementation of
this technology at crowded venues such as colleges, airports, hospitals, and offices,
where the risk of COVID-19 transmission is heightened, can help mitigate
contagion.
Upon entry, the system captures the facial data of individuals such as students,
travelers, employees, and workers. If someone is identified without a mask, their
image is promptly relayed to authorities for swift intervention, while the individual
receives a notification prompting them to wear a mask. Additionally, the Face
Mask Detection System monitors employees' compliance with mask mandates,
issuing reminders to those not wearing masks.
By leveraging AI-driven mask detection technology, we can bolster safety
measures in public spaces, helping to curb the spread of COVID-19 and
safeguarding the well-being of communities.
1.3 How Does the System Work?

The facial recognition system for face masks harnesses AI technology to identify
individuals wearing or not wearing masks. Compatible with various surveillance
systems, it seamlessly integrates into your premises' existing infrastructure.
Authorized personnel or administrators can utilize the system to verify individuals'
identities, providing an additional layer of security.
In the event of someone entering the premises without a face mask, the system
promptly triggers an alert message to designated personnel. With a high accuracy
rate ranging from 95% to 97%, depending on digital capabilities, the system
effectively identifies individuals wearing face masks.
Furthermore, data transmission and storage are automated within the system,
facilitating convenient access to reports as needed. This ensures efficient
monitoring and management of mask compliance within the premises.
1.4 Features of the Face Mask Detection System
• Implementation of the system is seamless within any existing organizational
framework.
• Personalized alerts can be dispatched to individuals wearing or not wearing face
masks, or those whose faces are unrecognizable to the administrative system.
• Hardware installation is unnecessary, as the system seamlessly integrates with
your pre-existing surveillance infrastructure.
• Compatibility extends to various cameras and hardware, including surveillance
cameras.
• Access is restricted for individuals not wearing masks, with immediate
notifications sent to authorities.
• The face mask detection system is customizable to meet specific business needs.
• Comprehensive analytics can be accessed based on reports generated by the
system.
• Accessibility and control are facilitated from any device via face mask detection
applications.
• Partially occluded faces, whether by masks, hair, or hands, are easily detected.
1.5 Literature Review

Face recognition technology enables the identification of individuals by analyzing
facial features, typically captured by hardware like video cameras. Utilizing
biometrics, face recognition apps or software map facial characteristics from
images or videos and compare them against a database of known faces. The facial
recognition market is projected to reach a value of USD 9.06 billion by 2024,
driven by government initiatives and the rising demand for surveillance systems to
bolster security measures.
In response to the global crisis, there's a burgeoning market demand for face mask
detection technology. This technology can detect faces even when they are masked
and verify the identity of individuals. It employs an AI-powered pattern
recognition system that analyzes biometric data to extract facial features and
classify them accordingly. Additionally, it can identify individuals not wearing
masks and trigger alarms or notifications to alert security personnel or officials.
These alerts can be viewed through software, mobile apps, devices, or websites.
Given the current landscape, both government and private organizations are keen
to ensure compliance with mask-wearing mandates in public or private spaces. The
face mask detection platform swiftly identifies individuals wearing masks using
cameras and analytics. Moreover, the system is adaptable to incorporate the latest
technology and tools. For instance, contact numbers or email addresses can be
added to the system to send alerts to individuals not wearing masks. Furthermore,
alerts can be sent to individuals whose faces are not recognizable in the system.
Sl. No. | Name of the Paper | Authors | Publication | Content Derived
1 | An Automated System to Limit COVID-19 Using Facial Mask Detection in Smart City Network | M. M. Rahman, M. M. H. Manik, M. M. Islam, S. Mahmud and J.-H. Kim | IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), 2020 | Methodology, Data Collection
2 | Facial detection using deep learning | Sharma, Manik & Anuradha, J. & Manne, H & Kashyap | IOP Conference Series: Materials Science and Engineering, 2017 | Face detection with SSD architecture, Image pre-processing
3 | MobileNetV2: Inverted Residuals and Linear Bottlenecks | Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen | The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 | Face Mask Detector Architecture
4 | SSD: Single Shot MultiBox Detector | Liu, Wei & Anguelov, Dragomir & Erhan, Dumitru & Szegedy, Christian & Reed, Scott & Fu, Cheng-Yang & Berg, Alexander | European Conference on Computer Vision, 2016 | SSD architecture
Chapter 2
TECHNOLOGY USED
2.1 Deep Learning
Deep learning is a branch of machine learning that uses neural networks with
many layers to learn useful representations directly from input data.
2.1.1 Advantages of Deep Learning
• Deep learning's capacity to detect patterns and anomalies within vast datasets
enables it to deliver precise and dependable analysis results efficiently. Take, for
instance, Amazon's extensive inventory of over 560 million items and its user base
exceeding 300 million. Managing such a magnitude of transactions would be
impractical for human accountants or even a large team without the aid of AI
technology.
• Unlike traditional machine learning, deep learning diminishes the reliance on
human expertise. It empowers data analysis even when developers are uncertain
about the specific insights they seek. For instance, consider a scenario where
algorithms aim to forecast customer retention, yet the precise characteristics
determining this prediction are ambiguous. Deep learning excels in uncovering
such insights autonomously.
Fig 2.1: Performance vs Data
2.1.2 What Are Artificial Neural Networks?
The human brain possesses a unique ability to interpret real-world context and
situations, a skill that computers struggle to replicate. Artificial Neural Networks
(ANN) serve as an attempt to emulate the brain's functionality, enabling computers
to learn and make decisions akin to humans.
An ANN functions as a machine learning algorithm designed for classification,
regression, and clustering tasks. It serves as the foundational component of deep
neural networks and excels in learning complex non-linear hypotheses, particularly
in datasets with numerous features.
Comprising multiple layers of mathematical processing, an ANN consists of
numerous units organized across these layers. Each unit, also known as a neuron,
processes information received from the outside world through input units in the
input layer. The data then undergoes transformation within hidden units before
being utilized by output units.
Illustrated below is an example with 1 input layer featuring 4 input units, followed
by 2 hidden layers. The first hidden layer consists of 4 neurons, while the second
contains 3 neurons. Lastly, there is 1 output layer housing 2 output units.
Fig 2.2: Representation of a Neural Network
At the outset, a neuron consolidates inputs from all neurons in the preceding layer
to which it's linked. In the depicted scenario, the neuron accepts 3 inputs. These
inputs are individually multiplied by corresponding weights (w1, w2, w3) and then
summed. Weights denote the connection strengths between neurons and are fine-
tuned during learning. Furthermore, a bias value is incorporated into this
summation. Subsequently, the neuron applies an activation function to the resultant
value.
Fig 2.3: Operations performed in a neuron
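The operations performed in a single neuron can be sketched in a few lines of Python. This is only an illustration: the input values, weights, bias, and the choice of a sigmoid activation are assumptions made for the example and are not taken from the project.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Illustrative values only: three inputs with weights w1, w2, w3.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1
y = neuron_output(x, w, b)  # activation of the weighted sum (about -0.4 here)
```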
Essentially, a neuron receives input values from connected neurons, multiplies
them by respective weights, adds them together, and applies an activation function
before passing the result to other neurons. This process repeats throughout the
network, with each layer performing its computations until the final output is
obtained, ideally predicting the desired output. Initially, the network's predictions
may be random, but as it undergoes training epochs, the predictions gradually
converge towards the correct values.
This process, known as the forward pass or forward propagation, begins with the activations
of input units, which are then forwarded to the hidden layer to compute the
activations of the hidden layer. This process continues until the activations of the
output layer are computed.
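A forward pass through a network with the layer sizes described earlier (4 input units, hidden layers of 4 and 3 neurons, and 2 output units) might look like the sketch below. The random weights, the zero biases, and the use of ReLU in every layer are assumptions chosen to keep the example short.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: negative values become zero."""
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Propagate x through a list of (weights, bias) pairs, layer by layer."""
    a = x
    for W, b in layers:
        a = relu(W @ a + b)  # weighted sum plus bias, then activation
    return a

rng = np.random.default_rng(0)
sizes = [4, 4, 3, 2]  # input, hidden 1, hidden 2, output
layers = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
output = forward(np.array([1.0, 0.5, -0.5, 2.0]), layers)
```

Each weight matrix has one row per neuron in the layer and one column per neuron in the preceding layer, which is why the shapes here are 4×4, 3×4, and 2×3.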
To optimize the network's weights and biases, a process called
backpropagation is employed. During backpropagation, the error between the
predicted and desired outputs is propagated backwards through the network, and
the weights are adjusted to reduce it, so that the network learns to map
arbitrary inputs to the correct outputs. This iterative process improves the
network's performance over successive training epochs.
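The idea of adjusting weights to reduce the error can be illustrated with the smallest possible case: a single sigmoid neuron trained by gradient descent. The toy data (the logical OR function), the learning rate, and the number of epochs are assumptions for this sketch; a full network applies the same update layer by layer using the chain rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.5, epochs=200):
    """Fit one sigmoid neuron by gradient descent; return weights, bias, loss history."""
    w = np.zeros(X.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        a = sigmoid(X @ w + b)                      # forward pass
        losses.append(float(-np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))))
        grad_z = (a - y) / len(y)                   # gradient of the loss w.r.t. the weighted sum
        w -= lr * (X.T @ grad_z)                    # backward pass: update the weights
        b -= lr * grad_z.sum()                      # and the bias
    return w, b, losses

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)             # logical OR
w, b, losses = train(X, y)
```

The recorded loss falls from its starting value of ln 2 as training proceeds, which is the behaviour the text describes as predictions converging towards the correct values.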
2.1.3 Convolutional Neural Network
A Convolutional Neural Network (ConvNet/CNN) serves as a sophisticated Deep
Learning architecture specialized in processing visual data, particularly images. Its
primary function revolves around analyzing input images, assigning significance
(via learnable weights and biases) to diverse components or entities within the
image, thereby facilitating their discrimination. Unlike conventional classification
algorithms, ConvNets necessitate minimal preprocessing efforts. Instead of relying
on manually crafted filters, ConvNets possess the capacity to autonomously learn
these filters or features during the training phase.
ConvNets excel in capturing both spatial and temporal relationships inherent in
images by employing relevant filters. This architectural design contributes to a
more precise fitting to image datasets by virtue of reducing parameter complexity
and enhancing weight reusability. In essence, ConvNets exhibit an advanced
capacity to comprehend the intricacies of images, thereby enabling more nuanced
analysis and classification.
Fig 2.4: Input Image
In the illustration, we observe an RGB image dissected into its three color channels
— Red, Green, and Blue. Various color spaces exist in which images are
represented, such as Grayscale, RGB, HSV, CMYK, among others.
The function of the ConvNet is to condense the images into a format that is more
manageable for processing, while retaining crucial features essential for accurate
predictions. This becomes crucial when designing an architecture that excels not
only in feature learning but also in scalability to handle large datasets.
Types of Layers in a CNN:
(a) Convolution Layer

The primary aim of the Convolution Operation is to extract features,
such as edges, from the input image. ConvNets can employ multiple Convolutional
Layers for this purpose.
Traditionally, the initial Convolutional Layer is tasked with capturing low-level
features like edges, color, and gradient orientation. As additional layers are added,
the architecture evolves to encompass high-level features as well. This results in a
network that possesses a comprehensive understanding of images in the dataset,
akin to human perception.
There are two possible outcomes of the convolution operation: one where the
convolved feature is dimensionally reduced compared to the input, and the other
where the dimensionality either increases or remains unchanged. This is achieved
through the application of Valid Padding for the former scenario, and Same
Padding for the latter.
Fig 2.5: Operation in Convolution Layer
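The effect of Valid and Same Padding on the output size can be checked with a small NumPy sketch. The 5×5 input and the 3×3 vertical-edge kernel are illustrative assumptions, and the loop-based implementation is written for clarity rather than speed.

```python
import numpy as np

def conv2d(image, kernel, padding="valid"):
    """Slide `kernel` over a 2-D `image` with stride 1 (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    if padding == "same":
        # Zero-pad so the output keeps the input's height and width
        # (this simple split assumes odd kernel sizes).
        image = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)  # vertical-edge filter
valid = conv2d(image, kernel, "valid")  # 3x3 output: smaller than the input
same = conv2d(image, kernel, "same")    # 5x5 output: same size as the input
```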
(b) Pooling Layer

Similar to the Convolutional Layer, the Pooling layer plays a crucial role in
reducing the spatial size of the Convolved Feature. This reduction helps decrease
the computational power required for data processing by reducing dimensionality.
Additionally, Pooling is effective in extracting dominant features that are invariant
to rotation and position, thereby aiding in the effective training of the model.
There are two main types of Pooling: Max Pooling and Average Pooling. Max
Pooling selects the maximum value from the portion of the image covered by the
Kernel, while Average Pooling calculates the average of all values within the
Kernel's area.
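The two pooling types can be compared on a small feature map. The 4×4 values and the 2×2 non-overlapping window below are assumptions chosen so the results are easy to verify by hand.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride equal to size)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size           # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 7, 6, 8],
                 [9, 2, 1, 0],
                 [3, 4, 5, 6]], dtype=float)
max_pooled = pool2d(fmap, mode="max")       # [[7, 8], [9, 6]]
avg_pooled = pool2d(fmap, mode="average")   # [[4, 5], [4.5, 3]]
```

Both outputs are 2×2, a quarter of the original number of values, which is the dimensionality reduction described above.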
Max Pooling also acts as a noise suppressant: it discards noisy activations
altogether, performing de-noising along with dimensionality reduction. Average
Pooling, by contrast, only performs dimensionality reduction as its noise
suppression mechanism. For this reason, Max Pooling often performs better than
Average Pooling.
The Convolutional Layer and the Pooling Layer collectively constitute the i-th
layer of a Convolutional Neural Network. Depending on the complexity of the
images, the number of these layers may be increased to capture low-level details
even further, albeit at the expense of increased computational power.
Upon completing the aforementioned processes, the model gains an understanding
of the features. Moving forward, the final output is flattened and fed into a regular
Neural Network for classification purposes.
(c) Classification — Fully Connected Layer (FC Layer)
Fig 2.6: Fully Connected Layer
Incorporating a Fully-Connected layer provides a cost-effective means of learning
non-linear combinations of the high-level features extracted by the convolutional
layer. This layer is tasked with learning a potentially non-linear function within
that feature space.
After converting our input image into a suitable format for our Multi-Layer
Perceptron, we flatten it into a column vector. This flattened
output is then fed into a feed-forward neural network, with backpropagation
applied during each iteration of training. Through numerous epochs, the model
becomes adept at distinguishing dominant and certain low-level features within images,
ultimately classifying them using the Softmax classification technique.
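The flatten, fully connected, and Softmax steps can be sketched as follows. The feature-map shape, the random weights, and the two-class output are hypothetical values chosen for illustration only.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)    # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

feature_maps = np.ones((2, 2, 3))        # hypothetical pooled output: 2x2 spatial, 3 channels
flat = feature_maps.reshape(-1)          # flatten into a vector of length 12
rng = np.random.default_rng(0)
W = rng.normal(size=(2, flat.size))      # hypothetical fully connected weights for 2 classes
probs = softmax(W @ flat)                # one probability per class
```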
2.1.4 MobileNetV2
Fig 2.7: MobileNetV2 Architecture
In MobileNetV2, two types of blocks are utilized: residual blocks with a stride of
1 and blocks with a stride of 2 for downsizing.
Each type of block consists of three layers:
1. The first layer is a 1×1 convolution with ReLU6 activation.
2. The second layer is a depth-wise convolution.
3. The third layer is another 1×1 convolution without any non-linearity. This
decision is based on the assertion that applying ReLU again would limit the
network's capabilities to that of a linear classifier within the non-zero volume part
of the output domain.
Table 2.1: Architecture of Bottleneck Layer
Additionally, MobileNetV2 introduces an expansion factor denoted as "t," which
remains constant at 6 for all primary experiments.
For instance, if the input consists of 64 channels, the internal output will contain
64×t=64×6=384 channels.
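The channel arithmetic of a bottleneck block, together with the ReLU6 activation, can be written out as a short sketch. The helper names are hypothetical, and the weight count deliberately ignores biases and batch normalisation, so it is an approximation rather than an exact parameter count.

```python
def relu6(x: float) -> float:
    """ReLU6 activation: clip the input to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def bottleneck_channels(c_in: int, c_out: int, t: int = 6):
    """Channel counts after each of the three layers of a bottleneck block."""
    expanded = c_in * t
    return {
        "1x1 expansion": expanded,
        "3x3 depth-wise": expanded,
        "1x1 linear projection": c_out,
    }

def bottleneck_weights(c_in: int, c_out: int, t: int = 6, k: int = 3) -> int:
    """Convolution weights in one block, ignoring biases and batch normalisation."""
    expanded = c_in * t
    return c_in * expanded + k * k * expanded + expanded * c_out

channels = bottleneck_channels(64, 64)   # the 1x1 expansion yields 64 * 6 = 384 channels
weights = bottleneck_weights(64, 64)     # 24576 + 3456 + 24576 = 52608
```

The depth-wise layer contributes only 3456 of the 52608 weights here, which illustrates why the block stays cheap despite the sixfold expansion.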
Overall Architecture
Table 2.2: Architecture of MobileNetV2
In MobileNetV2, several parameters influence the architecture and computational
cost:
- "t": Expansion factor
- "c": Number of output channels
- "n": Repeating number
- "s": Stride
- Spatial convolutions utilize 3×3 kernels.
For the primary network (width multiplier 1, input size 224×224), it incurs a
computational cost of 300 million multiply-adds and utilizes 3.4 million
parameters. (The width multiplier was introduced in MobileNetV1.)
Further exploration involves performance trade-offs across input resolutions
ranging from 96 to 224 and width multipliers varying from 0.35 to 1.4.
The network's computational cost can reach up to 585M MAdds, while the model
size varies between 1.7M and 6.9M parameters.
In the original MobileNetV2 experiments, training used 16 GPUs with a batch size of 96.
2.2 Object Detection
- Image Classification: Predicts the class or type of a single object in an image.
  - Input: An image containing a single object, like a photograph.
  - Output: A class label (e.g., integers mapped to class labels).
- Object Localization: Identifies the presence of objects in an image and delineates
  their location using bounding boxes.
  - Input: An image with one or more objects, such as a photograph.
  - Output: One or more bounding boxes (defined by point, width, and height).
- Object Detection: Locates objects using bounding boxes and predicts their types
  or classes within an image.
  - Input: An image with one or more objects, like a photograph.
  - Output: One or more bounding boxes (defined by point, width, and height) and
    a class label assigned to each bounding box.
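The output of an object detector, as described above, can be represented with a small data structure. The class name, the field names, and the assumption that the "point" is the top-left corner of the box are illustrative choices, not part of any particular library.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a box given by its top-left point, width and height, plus a label."""
    x: int
    y: int
    width: int
    height: int
    label: str

    def corners(self):
        """Return the box as (x_min, y_min, x_max, y_max)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)

# Hypothetical detector output for one image containing two faces.
detections = [
    Detection(x=40, y=30, width=80, height=100, label="mask"),
    Detection(x=200, y=50, width=70, height=90, label="no mask"),
]
boxes = [d.corners() for d in detections]
```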
2.2.1 Single Shot Detector (SSD)
Fig 2.8: Single Shot Detector Architecture
The Single Shot Detector (SSD) is a technique for object detection in images that
relies on a single deep neural network. SSD discretizes the output space of
bounding boxes into a predefined set of default boxes across different aspect ratios.
These default boxes are then scaled per feature map location. By combining
predictions from multiple feature maps with varying resolutions, the SSD network
can effectively handle objects of different sizes. This approach enables efficient
and accurate detection of objects in images using a unified framework.
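How default boxes are laid out over feature maps of different resolutions can be sketched as below. The scales, aspect ratios, and feature-map sizes are illustrative assumptions; the sketch also omits the extra box that SSD adds for aspect ratio 1, so it is a simplification of the published method.

```python
import math

def default_boxes(fmap_size: int, scale: float, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) in relative [0, 1] coordinates for a square feature map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size       # box centre sits in the middle of the cell
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Width and height vary with the aspect ratio; the area stays scale**2.
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

coarse = default_boxes(fmap_size=2, scale=0.6)   # few large boxes, suited to large objects
fine = default_boxes(fmap_size=8, scale=0.2)     # many small boxes, suited to small objects
```

A coarse feature map produces a handful of large boxes and a fine one produces many small boxes, which is how predictions from several feature maps cover objects of different sizes.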
Advantages of SSD:
1. Elimination of Proposal Generation: SSD eliminates the need for separate
proposal generation and subsequent pixel or feature resampling stages. Instead, all
computation is encapsulated within a single network, streamlining the detection
process.
2. Ease of Training: SSD is relatively easy to train compared to other methods,
making it accessible for practitioners. Its simplicity facilitates straightforward
integration into systems that require a detection component.
3. Competitive Accuracy: Despite forgoing an additional object proposal step, SSD
achieves competitive accuracy compared to methods that utilize such steps. It
delivers comparable performance while being significantly faster, enhancing
efficiency in both training and inference tasks.
Chapter 3
METHODOLOGY
3.1 Outline
Fig 3.1: Methodology
The project comprises two main phases:
1. Training:
- In this phase, a Convolutional Neural Network (CNN) model is trained using
an image dataset.
- Hyperparameters are fine-tuned to enhance the accuracy of the model.
- During training, the model's parameters are iteratively updated based on the
dataset.
- Once the model achieves the desired accuracy, it is saved for future use.
2. Deployment:
- During the deployment phase, images are collected from various sources.
- A face detector locates faces within the images, defining a Region of Interest
(ROI).
- The ROI, containing the detected face, is extracted from the image and provided
as input to the face mask detector.
- The face mask detector then analyzes the ROI to detect whether a mask is
present or absent.
- The output of the face mask detector indicates whether the individual in the
ROI is wearing a mask or not.
3.2 Phase 1
3.2.1 Dataset Collection & pre-processing
Data was sourced from Kaggle.com, comprising a total of 690 images depicting
people wearing masks and 686 images of individuals without masks. To train the
model effectively, 80% of images from each class were allocated for training,
while the remaining 20% were reserved for testing. This partitioning ensured a
balanced representation of both classes in both the training and testing datasets.
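A per-class 80/20 split of this kind can be sketched in plain Python. The file names are placeholders, and the random seed and the rounding down of the training count are assumptions, since the report does not state how the split was implemented.

```python
import random

def split_per_class(files_by_class, train_fraction=0.8, seed=42):
    """Shuffle each class separately and split it, so both sets keep the class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, files in files_by_class.items():
        files = list(files)
        rng.shuffle(files)
        cut = int(len(files) * train_fraction)   # number of training images for this class
        train += [(f, label) for f in files[:cut]]
        test += [(f, label) for f in files[cut:]]
    return train, test

# Placeholder file names standing in for the 690 masked and 686 unmasked images.
dataset = {
    "with_mask": [f"with_mask_{i}.jpg" for i in range(690)],
    "without_mask": [f"without_mask_{i}.jpg" for i in range(686)],
}
train_set, test_set = split_per_class(dataset)
```

Under these assumptions the split gives 552 + 548 = 1100 training images and 138 + 138 = 276 test images.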
Following the data preparation phase, the model was trained using the designated
training dataset. Subsequently, it was evaluated against a separate testing dataset
to assess its performance on unseen data. This approach helped validate the
model's ability to generalize well to new instances and produce accurate
predictions.
Fig 3.2: Data Samples
3.2.2 Training face mask model

The MobileNetV2 pre-trained model was utilized to classify images as either
masked or unmasked. Training was conducted using the Keras framework, and
data augmentation was applied using the ImageDataGenerator class within Keras.
To ensure consistency, the pixel values of the images were normalized to fall
within the range 0 to 1.
Because the final layer of the neural network has two outputs, one for each
class (mask present or mask absent), the data labels were converted to
categorical (one-hot) format to match.
Upon completion of training, the trained model was saved to disk for future use.
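The two data preparation steps mentioned in this section, pixel normalization and conversion of labels to categorical format, can be illustrated in plain Python. This is a sketch under assumptions: the class order in `CLASS_NAMES` and both helper names are hypothetical, and in a Keras workflow the label conversion would typically be done with `to_categorical` from `tensorflow.keras.utils`.

```python
CLASS_NAMES = ["mask", "no_mask"]  # assumed class order, for illustration only

def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0-255) to floats in the range 0 to 1."""
    return [value / 255.0 for value in pixels]

def to_one_hot(label, class_names=CLASS_NAMES):
    """Return a vector with 1.0 at the label's index and 0.0 elsewhere."""
    index = class_names.index(label)
    return [1.0 if i == index else 0.0 for i in range(len(class_names))]

print(normalize_pixels([0, 128, 255]))
print(to_one_hot("no_mask"))
```

Each one-hot vector has the same length as the network's output layer, so the loss can compare the two element by element.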
3.3 Phase 2
To integrate the Single Shot Detector (SSD) model for face detection, the model
along with its trained weights (parameters) is loaded into the application.
Subsequently, images or videos are captured from various sources such as CCTV
cameras or drone cameras.
Frames are extracted from the video using OpenCV, a popular computer vision
library with Python bindings. These frames serve as the input to the SSD model for face
detection. The SSD model analyzes each frame to identify and localize faces within
the image.
The integration of the SSD model with the OpenCV library enables real-time face
detection from video streams or static images captured from different sources,
facilitating applications such as surveillance, monitoring, and security systems.
3.3.1 Image Pre-processing

Images captured by CCTV cameras are pre-processed before further
analysis. First, they are converted to grayscale to reduce redundant information.
Next, they are resized to a uniform 64×64 pixels for consistency. Finally,
pixel values are normalized to the range 0 to 1. This
speeds up learning and improves feature extraction for accurate face mask
detection.
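The three pre-processing steps can be sketched without any external library. This is only an illustration under assumptions: the function names are hypothetical, the grayscale weights are the common ITU-R BT.601 luminance coefficients, and nearest-neighbour sampling stands in for whatever interpolation the project used. In practice these steps are usually performed with OpenCV's `cv2.cvtColor` and `cv2.resize`.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to luminance (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def resize_nearest(image, size=(64, 64)):
    """Resize a 2-D image to (height, width) by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = size
    return [[image[y * src_h // dst_h][x * src_w // dst_w]
             for x in range(dst_w)]
            for y in range(dst_h)]

def preprocess(rgb_image, size=(64, 64)):
    """Grayscale, resize, then scale pixel values to the range 0 to 1."""
    gray = to_grayscale(rgb_image)
    resized = resize_nearest(gray, size)
    # min() guards against tiny floating-point overshoot above 1.0.
    return [[min(1.0, pixel / 255.0) for pixel in row] for row in resized]

# Synthetic 128x96 RGB frame standing in for a CCTV image.
frame = [[(x % 256, y % 256, 128) for x in range(128)] for y in range(96)]
processed = preprocess(frame)
print(len(processed), len(processed[0]))
```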
3.3.2 ROI Extraction

The face detector is utilized to analyze the image or frame, identifying faces by
forming bounding boxes around them. These bounding boxes delineate the regions
containing detected faces, known as the Region of Interest (ROI). The ROI
represents the output of the face detector, providing precise localization of faces
within the image or frame.
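A minimal sketch of turning a detector's bounding box into an ROI follows. It assumes the detector reports box corners as fractions of the frame size, which is common for SSD-style face detectors but is not stated in this report; the function names `box_to_roi` and `crop` are hypothetical, and the frame is modeled as a list of pixel rows.

```python
def box_to_roi(box, frame_width, frame_height):
    """Convert a normalized (x1, y1, x2, y2) box to clamped pixel coordinates."""
    x1, y1, x2, y2 = box
    left = max(0, int(x1 * frame_width))
    top = max(0, int(y1 * frame_height))
    right = min(frame_width - 1, int(x2 * frame_width))
    bottom = min(frame_height - 1, int(y2 * frame_height))
    return left, top, right, bottom

def crop(frame, roi):
    """Extract the ROI (inclusive pixel bounds) from a frame stored as rows."""
    left, top, right, bottom = roi
    return [row[left:right + 1] for row in frame[top:bottom + 1]]

# Blank 100x80 frame and a box that extends past the right edge.
frame = [[0] * 100 for _ in range(80)]
roi = box_to_roi((0.25, 0.5, 1.2, 0.75), frame_width=100, frame_height=80)
face = crop(frame, roi)
print(roi, len(face), len(face[0]))
```

Clamping matters because a detector can return boxes that extend slightly beyond the frame, and an unclamped crop would then be empty or misaligned.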
Fig 3.3: Region of Interest Extraction
3.3.3 Prediction
The extracted Region of Interest (ROI), containing the detected face, serves as
input to the face mask detector model. This model conducts classification on the
ROI, predicting whether the face is wearing a mask or not from the two classes:
Mask and Non-Mask. The results of the prediction are then displayed on the screen,
providing immediate feedback. Additionally, the results can be saved locally for
future reference or analysis.
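The mapping from the classifier's two outputs to a displayed result can be sketched as follows. The output order (mask probability first) and the function name `interpret_prediction` are assumptions; the green and red box colors follow the convention described in Chapter 6 and are written in BGR order, as expected by OpenCV drawing functions.

```python
GREEN = (0, 255, 0)  # BGR: box color for a detected mask
RED = (0, 0, 255)    # BGR: box color for no mask

def interpret_prediction(probabilities):
    """Map [p_mask, p_no_mask] to a label, its confidence, and a box color."""
    p_mask, p_no_mask = probabilities
    if p_mask > p_no_mask:
        return "Mask", p_mask, GREEN
    return "Non-Mask", p_no_mask, RED

label, confidence, color = interpret_prediction([0.93, 0.07])
print(f"{label}: {confidence:.0%}", color)
```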
Chapter 4
SOFTWARE & PACKAGES REQUIRED
4.1 Jupyter Notebook (Python IDE)
Jupyter Notebook is a versatile open-source web application designed for creating
and sharing code and documents. It offers an integrated environment where users
can document their code, execute it, observe the outcomes, visualize data, and
analyze results without leaving the platform.
The flexibility of Jupyter Notebooks makes it ideal for various data science
workflows, including data cleaning, statistical modeling, machine learning model
development and training, data visualization, and more. Its compartmentalized
structure, with code written in separate cells, facilitates the prototyping phase by
allowing users to execute individual code blocks independently. This enables
efficient testing and debugging without rerunning the entire script from the
beginning.
4.2 TensorFlow
TensorFlow 2.0 is a powerful, open-source platform for end-to-end machine
learning. It's renowned for its deep learning capabilities and ease of use, offering
efficient execution of tensor operations, automatic gradient computation,
scalability across devices, and the ability to export programs to various runtimes.
With support for multiple programming languages and environments, TensorFlow
ensures accessibility and versatility for developers.
4.3 Keras
Keras, the high-level API for TensorFlow 2.0, simplifies machine learning tasks
with its user-friendly interface. Designed for productivity, it offers clear APIs,
minimal user actions, and comprehensive documentation, making it a top choice
for deep learning projects. Its focus on ease of use and rapid experimentation
empowers developers to iterate quickly and stay ahead in innovation.
4.4 OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source software
library with a comprehensive suite of over 2500 optimized algorithms. It was
BSD-licensed up to version 4.4 and has used the Apache 2.0 license since
version 4.5.0.
These algorithms cover a broad spectrum of computer vision and machine learning
tasks, including face detection, object recognition, motion tracking, 3D modeling,
image stitching, and more. With a large and active user community exceeding 47
thousand individuals, OpenCV is widely utilized across industries, research
groups, and governmental organizations worldwide, making it a cornerstone in the
field of computer vision.
4.5 NumPy
NumPy is a fundamental open-source Python library for numerical computing,
featuring multi-dimensional array and matrix data structures. It supports various
mathematical operations on arrays, including trigonometric, statistical, and
algebraic routines. As the successor to Numeric and Numarray, NumPy offers a
comprehensive suite of mathematical, algebraic, and transformation functions.
It also includes random number generators, and its core is implemented in C
for efficiency and performance. Pandas, another popular library, relies heavily
on NumPy arrays and extends their capabilities.
4.6 Matplotlib
Matplotlib is a Python plotting library that interfaces seamlessly with NumPy, enabling
users to create visualizations for numerical data. It offers an object-oriented API for
integrating plots into applications via GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
Additionally, it features a procedural "pylab" interface resembling MATLAB's state
machine, although its usage is discouraged. Matplotlib is commonly utilized in
conjunction with SciPy for scientific computing tasks.
4.7 Sklearn
Scikit-learn (Sklearn) is a powerful Python library for machine learning tasks, offering a
wide range of tools for classification, regression, clustering, and dimensionality reduction.
It provides a consistent interface and efficient algorithms, making it a go-to choice for
both beginners and experts in the field. Leveraging the capabilities of NumPy, SciPy, and
Matplotlib, scikit-learn offers a comprehensive suite of machine learning and statistical
modeling capabilities.
Chapter 5
CRIMINAL DETECTION FROM LIVE VIDEO FEED
5.1 Goal and Scope Definition
The primary objective of this research initiative is to transform crime detection and
suspect recognition methodologies through an innovative system. By combining
traditional surveillance methods with cutting-edge technologies, such as cloud
computing, machine learning, and deep learning, this project aims to improve the
effectiveness and efficiency of crime prevention and response strategies. The
scope of the project includes developing a comprehensive surveillance framework
that integrates advanced techniques to address multifaceted challenges in law
enforcement and security.
5.1.1 Automation of Surveillance Process
The system aims to automate surveillance processes by leveraging cutting-edge
technologies, reducing manual intervention and human error. Continuous
monitoring of video feeds enables real-time analysis and detection of suspicious
activities.
Through the integration of machine learning and deep learning algorithms, the
system seeks to enhance the accuracy of crime detection and suspect recognition.
By analyzing vast datasets and employing advanced pattern recognition
techniques, subtle cues indicative of criminal behavior can be identified with
greater precision. Additionally, the system aims to expedite response times by
promptly alerting relevant authorities.
Furthermore, the system aims to generate actionable insights to support law
enforcement agencies in crime prevention and response efforts. Real-time analysis
of surveillance data enables the identification of trends, patterns, and anomalies,
facilitating proactive intervention and strategic resource allocation.
5.1.2 Detection of Known Criminals
The module for detecting known criminals is a critical component of the
overarching surveillance system. It relies on existing datasets containing
information about known offenders, including photographs, biometric data,
and criminal records. The module comprises the following key elements:
Facial Recognition Algorithms: The system uses Haar cascade classifiers to
detect faces and TensorFlow-based models to match faces in live CCTV footage
against known criminal profiles. These algorithms analyze facial features,
including unique identifiers such as facial structure, scars, tattoos, and
distinctive markings, to identify potential matches.
Continuous Monitoring: Using live CCTV feeds from various surveillance
cameras, the system continuously scans for individuals matching the profiles
of known criminals. Advanced image processing techniques allow the system to
analyze video frames in real time, ensuring timely detection of suspects.
Immediate Alerting: Upon identifying a match with a known criminal profile,
the system generates immediate alerts to security personnel and law
enforcement agencies. These alerts include pertinent information, such as the
individual's identity, location, and associated risk level, enabling swift
response and intervention.
5.1.3 Integration with Command Centres
The system integrates with command centers and police dispatch systems,
enabling communication and coordination between surveillance personnel and
law enforcement authorities. This integration streamlines the response
process, enabling authorities to deploy resources effectively and apprehend
suspects quickly.
By incorporating these elements, the module for detecting known criminals
enhances the proactive surveillance capabilities of the overarching system.
Through continuous monitoring, rapid identification, and immediate alerting,
the system helps law enforcement agencies mitigate potential threats and
safeguard communities.
Chapter 6
OBSERVATION & RESULTS
6.1 Observation on Image

The implemented model is capable of processing images containing one or more
faces. Upon receiving an image as input, the model conducts mask detection. If a
mask is detected, the model draws a green box around the face, indicating that it is
covered with a mask. Conversely, if the model does not detect a mask, it draws a
red box around the face. Even if the face is only partially covered, it will still be
indicated with a red box, signifying that the face is not adequately covered.
Fig 6.1: Observation on Images: (a) Fully-Covered Mask, (b) No Mask,
(c) Partially-Covered Mask

The model is designed to detect multiple faces simultaneously, displaying multiple
outputs within a single image or frame. In the provided image, green boxes indicate
individuals wearing masks, while red boxes denote those not wearing masks. This
capability allows for efficient detection and visualization of mask compliance
across multiple individuals within a given scene.
Fig 6.2: Multiple Detection on Image
6.2 Observation on Video
The model was also run on live video streams to demonstrate its real-time
functionality. It detects masks accurately in each frame of the stream, which
illustrates its practical applicability.
Fig 6.3: Video Frames
6.3 Results
Training must balance accuracy on the training data against overfitting, and
the model achieves this balance. With an accuracy of 98.7% and an AUC of 0.985
on unseen test data, it generalizes well. The testing loss fluctuates within
an acceptable range, which indicates that the model handles unseen data
robustly. This evaluation supports the reliability and effectiveness of the
developed architecture.
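For reference, accuracy, precision, and recall follow directly from the counts of a binary confusion matrix, as the sketch below shows. The counts used here are hypothetical values chosen only to exercise the formulas; they are not the confusion matrix of this project, and the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical counts for illustration only ("mask" taken as the positive class).
metrics = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision penalizes faces wrongly flagged as masked, while recall penalizes masked faces that were missed, so the two are reported alongside accuracy.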
Fig 6.4: Accuracy & Loss graph
Chapter 7
APPLICATION & LIMITATION
7.1 Applications:
Face mask detection has several applications, some specific to COVID-19 and
others unrelated to it. The main applications are described below.
1. Hospitals
The Face Mask Detection System proves invaluable in hospital settings, where it can
ensure that healthcare workers adhere to safety protocols by wearing masks consistently.
Upon detecting any staff member without a mask, the system issues a prompt reminder to
wear one. Similarly, for quarantined individuals required to wear masks, the system
monitors compliance and automatically alerts or reports to authorities in cases of non-
compliance.
2. Airports
Deploying the Face Mask Detection System at airports enables the identification of
individuals without masks. Tourists' facial data can be captured upon entry, and if any
individual is detected without a mask, their image is promptly forwarded to airport
authorities for immediate action. Additionally, if the person's facial data is stored in the
database, such as airport employees, alerts can be directly sent to their phones for
immediate notification and compliance.
3. Offices
Implementing the Face Mask Detection System in office areas ensures compliance
with safety standards among employees. It detects individuals without masks and
sends them reminders to wear one. Additionally, reports can be generated and
downloaded or sent via email at the end of the day to identify individuals who are
not adhering to the guidelines or requirements, facilitating enforcement of safety
protocols.
4. Educational Institutes
Implementing face mask detection in classrooms enhances safety by ensuring consistent
mask usage among children, reducing the risk of viral infections.
7.2 Limitation
One limitation of the system is that it cannot reliably distinguish masks from other
face coverings, such as hands or cloth. This can lead to misclassifications
where the system predicts that a mask is present when there is none. As a
consequence, the accuracy and precision of the system may be reduced in such cases.
Chapter 8
CONCLUSION
8.1 Conclusion
To combat the spread of COVID-19, we developed a face mask detector using SSD
architecture and transfer learning with convolutional neural networks. Our dataset
comprised 690 masked and 686 unmasked faces sourced from Kaggle. We evaluated
various metrics and selected MobileNetV2 for its superior performance, with 100%
precision and 99% recall. Its computational efficiency makes it well suited to deployment in
embedded systems. Our detector can be deployed in malls, airports, and crowded areas to
enforce safety measures. It also recognizes blurred and side-face images, which
traditional models handle poorly. Future applications include integration into home
security systems.
8.2 Future Scope

It's clear that the need for face mask detection technology is growing rapidly, especially
with the recent surge in mask mandates worldwide. Retail companies, in particular, can
benefit from integrating this technology into their stores to ensure compliance with safety
measures. By releasing our Face Mask Detection tool as an open-source project, we can
provide a flexible solution that can be easily integrated with existing cameras and
surveillance systems. This tool can be deployed in various settings, including
supermarkets, public transport, offices, and stores, to detect individuals without masks in
real-time. Additionally, features such as notice messages, image capture, and alarm
systems can enhance the effectiveness of the tool in enforcing mask-wearing policies. By
connecting the software to entrance gates, businesses can further restrict access to only
those wearing face masks, thereby contributing to the overall safety of their premises.
Further work is planned in the following areas:
1. Faster Inference: Speeding up the model to 15 FPS on CPUs for real-time
monitoring without GPUs.
2. Mobile Integration: Porting the models to TensorFlow Lite for mobile
deployment, making them accessible via mobile apps.
3. TFRT Compatibility: Making the architecture compatible with TensorFlow
Runtime (TFRT) for better performance on edge devices.
4. Upgradable Models: Allowing models to be replaced easily for better accuracy
and lower latency, keeping the solution up to date.
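Progress toward the 15 FPS target could be tracked with a simple timing harness such as the sketch below. Everything here is an assumption for illustration: `measure_fps` is a hypothetical helper, and `dummy_inference` is a placeholder that stands in for the real detection and classification step.

```python
import time

def measure_fps(process_frame, frames):
    """Return the average number of frames processed per second."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def dummy_inference(frame):
    """Placeholder for the real model call; sums the pixel values."""
    return sum(frame)

frames = [[1, 2, 3]] * 100  # synthetic frames for illustration
fps = measure_fps(dummy_inference, frames)
print(f"{fps:.1f} frames per second")
```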