Major Project Report (1) 91789 (1) 32
A PROJECT REPORT
Submitted by
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING
C.V. RAMAN GLOBAL UNIVERSITY
BHUBANESWAR-ODISHA-752054
CERTIFICATE OF APPROVAL
This is to certify that we have examined the project entitled "Face mask
detection and recognition using CNN and Gabor filter" submitted by Alok
Kumar Mishra, Registration No. 20010163, Jyoti Ranjan Behera,
Registration No. 20010152, Abhinash Barada, Registration No. 20010132,
Aditya Prasad Tripathy, Registration No. 20010129, CGU-Odisha,
Bhubaneswar. We hereby accord our approval of it as a major project work
carried out and presented in a manner required for its acceptance towards
completion of major project stage-I (8th Semester) of the Bachelor's Degree in
Computer Science & Engineering for which it has been submitted. This
approval does not necessarily endorse or accept every statement made, opinion expressed or conclusion
drawn as recorded in this major project; it only signifies the acceptance of the
major project for the purpose for which it has been submitted.
ACKNOWLEDGEMENT
We would like to articulate our deep gratitude to our project guide
Prof. MamtaRani Das, Professor, Department of Computer
Science and Engineering, who has always been source
of motivation and firm support for carrying out the project.
We would also like to convey our sincerest gratitude and indebtedness
to all other faculty members and staff of Department of Computer
Science and Engineering, who bestowed their great effort and guidance
at appropriate times without it would have been very difficult on our
project work
An assemblage of this nature could never have been attempted with
our reference to and inspiration from the works of others whose details
are mentioned in the references section. We acknowledge our
indebtedness to all of them. Further, we would like to express our
feeling towards our parents and God who directly or indirectly
encouraged and motivated us during Assertion
ABSTRACT
TABLE OF CONTENTS
CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Problem Statement & Solution
1.3 How Does the System Work?
1.4 Features of the Face Mask Detection System
1.5 Literature Review
CHAPTER 2: TECHNOLOGY USED
2.1 Deep Learning
2.1.1 Advantages of Deep Learning
2.1.2 What Are Artificial Neural Networks?
2.1.3 Convolutional Neural Network
2.1.4 MobileNetV2
2.2 Object Detection
2.2.1 Single Shot Detector (SSD)
CHAPTER 3: METHODOLOGY
3.1 Outline
3.2 Phase 1
3.2.1 Dataset Collection & pre-processing
3.2.2 Training face mask model
3.3 Phase 2
3.3.1 Image Pre-processing
3.3.2 ROI Extraction
3.3.3 Prediction
CHAPTER 4: SOFTWARE & PACKAGES REQUIRED
4.1 Jupyter Notebook (Python IDE)
4.2 TensorFlow
4.3 Keras
4.4 OpenCV
4.5 NumPy
4.6 Matplotlib
4.7 Scikit-learn
CHAPTER 5: CRIMINAL DETECTION FROM LIVE VIDEO FEED
5.1 Goal and Scope Definition
5.1.1 Automation of Surveillance Process
5.1.2 Detection of known criminals
5.1.3 Integration with command centers
CHAPTER 6: OBSERVATION & RESULTS
6.1 Observation on Image
6.2 Observation on Video
6.3 Results
CHAPTER 7: APPLICATIONS & LIMITATION
7.1 Applications
7.2 Limitation
CHAPTER 8: CONCLUSION & FUTURE SCOPE
8.1 Conclusion
8.2 Future Scope
REFERENCES
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
FC Fully Connected
Chapter 1
INTRODUCTION
1.1 Background
Face detection has become a crucial aspect of Image Processing and Computer
Vision. Advanced algorithms, leveraging convolutional architectures, aim to
enhance accuracy by extracting intricate pixel details. Our objective is to develop
a binary face classifier capable of detecting any face within the frame, irrespective
of its orientation. Modern Computer Vision algorithms are rapidly approaching
human-level performance in visual perception tasks.
In the battle against the COVID-19 pandemic, technology has played a vital role.
While 'work from home' has become the new norm for many, certain sectors face
challenges in adapting to this model. As the pandemic wanes and sectors seek to
return to in-person work, concerns persist among employees. Multiple studies have
highlighted the effectiveness of face masks in reducing viral transmission and
providing a sense of security. However, manually enforcing mask policies and
tracking violations is impractical, and Computer Vision offers a viable solution. By
combining image classification, object detection, object tracking, and video
analysis, we have developed a system capable of detecting face masks in
images and videos.
The widespread adoption of face masks amid the pandemic has become the new
normal, with many countries mandating their use in public spaces. However, this
presents challenges for face detection algorithms and touchless access control
systems in buildings. As millions learn to make their own masks due to market
shortages, face detection algorithms must adapt to recognize individuals wearing
masks accurately.
Harnessing the power of AI, we can detect individuals wearing or not wearing
masks in public spaces, bolstering our safety measures. A mask detection system
could play a pivotal role in safeguarding public health. This system employs
predictive models to discern mask usage in images or videos. Implementation of
this technology at crowded venues such as colleges, airports, hospitals, and offices,
where the risk of COVID-19 transmission is heightened, can help mitigate
contagion.
Upon entry, the system captures the facial data of individuals such as students,
travelers, employees, and workers. If someone is identified without a mask, their
image is promptly relayed to the authorities for swift intervention, while the individual
receives a notification prompting them to wear a mask. Additionally, the Face
Mask Detection System monitors employees' compliance with mask mandates,
issuing reminders to those not wearing masks.
In the event of someone entering the premises without a face mask, the system
promptly triggers an alert message to designated personnel. With a high accuracy
rate ranging from 95% to 97%, depending on digital capabilities, the system
effectively identifies individuals wearing face masks.
Furthermore, data transmission and storage are automated within the system,
facilitating convenient access to reports as needed. This ensures efficient
monitoring and management of mask compliance within the premises.
1.4 Features of the Face Mask Detection System
In response to the global crisis, there is a burgeoning market demand for face mask
detection technology. This technology can detect faces even when they are partly covered
by masks and verify the identity of individuals. It employs an AI-powered pattern
recognition system that analyzes biometric data to extract facial features and
classify them accordingly. Additionally, it can identify individuals not wearing
masks and trigger alarms or notifications to alert security personnel or officials.
These alerts can be viewed through software, mobile apps, devices, or websites.
Given the current landscape, both government and private organizations are keen
to ensure compliance with mask-wearing mandates in public or private spaces. The
face mask detection platform swiftly identifies individuals wearing masks using
cameras and analytics. Moreover, the system can be adapted to incorporate the latest
technology and tools. For instance, contact numbers or email addresses can be
added to the system to send alerts to individuals not wearing masks. Furthermore,
alerts can be raised for individuals whose faces are not recognized by the system.
No. | Title | Authors | Venue, Year | Used For
3 | MobileNetV2: Inverted Residuals and Linear Bottlenecks | Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen | The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 | Face Mask Detector Architecture
4 | SSD: Single Shot MultiBox Detector | Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander Berg | European Conference on Computer Vision (ECCV), 2016 | SSD architecture
Chapter 2
TECHNOLOGY USED
2.1 Deep Learning
• Deep learning's capacity to detect patterns and anomalies within vast datasets
enables it to deliver precise and dependable analysis results efficiently. Take, for
instance, Amazon's extensive inventory of over 560 million items and its user base
exceeding 300 million. Managing such a magnitude of transactions would be
impractical for human accountants or even a large team without the aid of AI
technology.
2.1.2 What Are Artificial Neural Networks?
The human brain possesses a unique ability to interpret real-world context and
situations, a skill that computers struggle to replicate. Artificial Neural Networks
(ANN) serve as an attempt to emulate the brain's functionality, enabling computers
to learn and make decisions akin to humans.
Illustrated below is an example with 1 input layer featuring 4 input units, followed
by 2 hidden layers. The first hidden layer consists of 4 neurons, while the second
contains 3 neurons. Lastly, there is 1 output layer housing 2 output units.
At the outset, a neuron consolidates inputs from all neurons in the preceding layer
to which it is connected. In the depicted scenario, the neuron accepts 3 inputs. These
inputs are individually multiplied by corresponding weights (w1, w2, w3) and then
summed. Weights denote the connection strengths between neurons and are fine-tuned
during learning. A bias value is also added to this summation. Finally, the neuron
applies an activation function to the resulting value.
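The single-neuron computation described above can be written compactly. Here x1, x2, x3 denote the three inputs (our notation), the numeric values in the worked example are purely illustrative, and ReLU is assumed as the activation function f:

```latex
z = w_1 x_1 + w_2 x_2 + w_3 x_3 + b, \qquad a = f(z)

% Illustrative values, with f(z) = \mathrm{ReLU}(z) = \max(0, z)
x = (1,\ 2,\ 3), \quad w = (0.5,\ 0.25,\ 0.1), \quad b = 0.2
z = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.1 \cdot 3 + 0.2 = 0.5 + 0.5 + 0.3 + 0.2 = 1.5
a = \max(0,\ 1.5) = 1.5
```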
This computation, repeated layer by layer, is known as the forward pass or forward
propagation. It begins with the activations of the input units, which are forwarded
to the first hidden layer to compute its activations; the process continues until
the activations of the output layer are computed.
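The forward pass through the 4-4-3-2 network described above can be sketched in NumPy. This is a minimal illustration, not the project's code: the weights are random rather than trained, ReLU is assumed for the hidden layers, and the output layer is left linear (in a classifier it would usually feed a softmax).

```python
import numpy as np

def relu(z):
    # Assumed activation function for the hidden layers
    return np.maximum(0.0, z)

# Layer sizes from the example: 4 inputs, hidden layers of 4 and 3 neurons, 2 outputs
LAYER_SIZES = [4, 4, 3, 2]

rng = np.random.default_rng(seed=0)
# One weight matrix (n_in x n_out) and one bias vector per pair of adjacent layers
weights = [rng.standard_normal((n_in, n_out))
           for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
biases = [np.zeros(n_out) for n_out in LAYER_SIZES[1:]]

def forward(x):
    """Forward pass: return the activations of every layer, input layer first."""
    activations = [x]
    a = x
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b                     # weighted sum of inputs plus bias, for every neuron
        a = z if i == last else relu(z)   # output layer left linear in this sketch
        activations.append(a)
    return activations

acts = forward(np.array([0.5, -1.0, 2.0, 0.1]))
print([a.shape for a in acts])  # [(4,), (4,), (3,), (2,)]
```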
2.1.3 Convolutional Neural Network
In the illustration, we observe an RGB image separated into its three color channels:
Red, Green, and Blue. Images can be represented in various color spaces, such as
Grayscale, RGB, HSV, and CMYK.
The function of the ConvNet is to condense the images into a format that is more
manageable for processing, while retaining the features essential for accurate
predictions. This is crucial when designing an architecture that excels not
only in feature learning but also in scaling to large datasets.
There are two possible outcomes of the convolution operation: one where the
convolved feature is dimensionally reduced compared to the input, and the other
where the dimensionality either increases or remains unchanged. This is achieved
through the application of Valid Padding for the former scenario, and Same
Padding for the latter.
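The difference between Valid and Same padding can be seen in a small NumPy sketch. It assumes a single channel, stride 1 and an odd-sized kernel, and, like CNN layers in practice, it computes a cross-correlation (the kernel is not flipped). The function name conv2d is our own.

```python
import numpy as np

def conv2d(image, kernel, padding="valid"):
    """Naive single-channel 2-D convolution (cross-correlation), stride 1."""
    kh, kw = kernel.shape
    if padding == "same":
        # Zero-pad so the output keeps the input's height and width (odd kernel sizes)
        ph, pw = kh // 2, kw // 2
        image = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the window it covers, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # 3x3 averaging kernel
print(conv2d(image, kernel, "valid").shape)  # (3, 3)
print(conv2d(image, kernel, "same").shape)   # (5, 5)
```

With a 5×5 input and a 3×3 kernel, Valid padding shrinks the output to 3×3, while Same padding adds one zero pixel on each side so the output stays 5×5.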
There are two main types of Pooling: Max Pooling and Average Pooling. Max
Pooling selects the maximum value from the portion of the image covered by the
Kernel, while Average Pooling calculates the average of all values within the
Kernel's area.
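Max and Average Pooling over non-overlapping 2×2 windows (stride 2, an assumed but common choice) can be sketched as follows; pool2d is a hypothetical helper name.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Slide a size x size window over x and keep its max or its mean."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [3, 4, 1, 8]], dtype=float)
print(pool2d(x, mode="max"))
print(pool2d(x, mode="average"))
```

For this 4×4 input, max pooling gives [[6, 4], [7, 9]] and average pooling gives [[3.75, 2.25], [4.0, 4.5]], halving each spatial dimension.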
The Convolutional Layer and the Pooling Layer collectively constitute the i-th
layer of a Convolutional Neural Network. Depending on the complexity of the
images, the number of these layers may be increased to capture low-level details
even further, albeit at the expense of increased computational power.
Incorporating a Fully-Connected layer provides a cost-effective means of learning
non-linear combinations of the high-level features extracted by the convolutional
layer. This layer is tasked with learning a potentially non-linear function within
that feature space.
After converting our input image into a suitable format for our Multi-Layer
Perceptron, we flatten the image into a column vector. This flattened
output is fed into a feed-forward neural network, with backpropagation
applied during each iteration of training. Over numerous epochs, the model
learns to distinguish dominant and subtle low-level features within images,
and ultimately classifies them using the Softmax classification technique.
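The flatten, fully connected and softmax steps described above can be sketched in NumPy as follows. The 7×7×32 feature-map shape and the random weights are assumptions chosen only for illustration; the two outputs correspond to the mask and no-mask classes.

```python
import numpy as np

def softmax(z):
    """Convert raw class scores into probabilities that sum to 1."""
    z = z - np.max(z)              # subtracting the max keeps exp() numerically stable
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((7, 7, 32))   # assumed output of the last conv/pool stage
flat = feature_maps.reshape(-1)                  # flatten: 7 * 7 * 32 = 1568 values
W = rng.standard_normal((flat.size, 2)) * 0.01   # fully connected layer, 2 classes (mask / no mask)
b = np.zeros(2)
probs = softmax(flat @ W + b)
print(flat.shape, probs.round(3), probs.sum())
```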
2.1.4 MobileNetV2
In MobileNetV2, two types of blocks are used: residual blocks with a stride of
1, and blocks with a stride of 2 for downsampling.
Table 2.1: Architecture of Bottleneck Layer
For instance, with an expansion factor t = 6, an input with 64 channels is expanded
to an internal representation of 64 × t = 64 × 6 = 384 channels.
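The expansion step in the bottleneck layer can be illustrated with a NumPy sketch of a stride-1 inverted residual block: a 1×1 expansion convolution with ReLU6, a 3×3 depthwise convolution with ReLU6, and a linear 1×1 projection, with a residual connection when input and output channels match. This follows our reading of the MobileNetV2 design; the weights are random, batch normalization is omitted, and the stride-2 variant is not shown.

```python
import numpy as np

def relu6(x):
    """ReLU capped at 6, the non-linearity used inside MobileNetV2 blocks."""
    return np.clip(x, 0.0, 6.0)

def depthwise_conv3x3(x, kernels):
    """3x3 depthwise convolution, stride 1, zero 'same' padding.

    x: (H, W, C) feature map; kernels: (3, 3, C), one 3x3 filter per channel.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            # Each channel is filtered only by its own kernel (no mixing across channels)
            out[i, j, :] = np.sum(padded[i:i + 3, j:j + 3, :] * kernels, axis=(0, 1))
    return out

def inverted_residual(x, t=6, out_channels=None, rng=None):
    """Stride-1 bottleneck block with random (untrained) weights; batch norm omitted."""
    if rng is None:
        rng = np.random.default_rng(0)
    in_channels = x.shape[-1]
    if out_channels is None:
        out_channels = in_channels
    hidden = in_channels * t                          # e.g. 64 * 6 = 384 channels
    w_expand = rng.standard_normal((in_channels, hidden)) * 0.05
    w_depthwise = rng.standard_normal((3, 3, hidden)) * 0.05
    w_project = rng.standard_normal((hidden, out_channels)) * 0.05

    expanded = relu6(x @ w_expand)                               # 1x1 expansion + ReLU6
    filtered = relu6(depthwise_conv3x3(expanded, w_depthwise))   # 3x3 depthwise + ReLU6
    out = filtered @ w_project                                   # 1x1 linear projection
    if out_channels == in_channels:
        out = out + x                                            # residual connection
    return expanded, out

x_in = np.random.default_rng(1).standard_normal((8, 8, 64))
expanded, y = inverted_residual(x_in)
print(expanded.shape, y.shape)  # (8, 8, 384) (8, 8, 64)
```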
Overall Architecture
For the primary network (width multiplier 1, input size 224×224), the computational
cost is 300 million multiply-adds (MAdds) and the model uses 3.4 million
parameters. (The width multiplier was introduced in MobileNetV1.)
Across the range of width multipliers and input resolutions, the network's computational
cost reaches up to 585M MAdds, while the model size varies between 1.7M and 6.9M parameters.
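The cost of a single stride-1 bottleneck block can be worked out by adding up its three convolutions. For an h×w feature map with d_in input channels, expansion factor t, kernel size k and d_out output channels, this gives h·w·d_in·t·(d_in + k² + d_out) multiply-adds, which to our understanding matches the expression given in the MobileNetV2 paper. The 14×14×64 example below is an arbitrary choice.

```python
def bottleneck_madds(h, w, d_in, d_out, t=6, k=3):
    """Multiply-adds of one stride-1 MobileNetV2 bottleneck block on an h x w feature map."""
    hidden = t * d_in
    expand = h * w * d_in * hidden        # 1x1 expansion convolution
    depthwise = h * w * hidden * k * k    # k x k depthwise convolution
    project = h * w * hidden * d_out      # 1x1 linear projection
    # Sum equals the closed form h * w * d_in * t * (d_in + k^2 + d_out)
    return expand + depthwise + project

print(bottleneck_madds(14, 14, 64, 64))  # 10311168
```

About 10.3M MAdds for one such block shows how quickly the roughly 300M total for the full network accumulates across its layers.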
- Object Detection: Locates objects using bounding boxes and predicts their types
or classes within an image.
- Input: An image with one or more objects, like a photograph.
- Output: One or more bounding boxes (each defined by a point, a width, and a height) and
a class label assigned to each bounding box.
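The input/output contract of object detection can be captured in a small data structure. The Detection class below is a hypothetical illustration that stores a box as a top-left point plus width and height, with a class label; the corner form (x_min, y_min, x_max, y_max) is what cropping and drawing code typically needs.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: top-left point, width and height, plus a class label."""
    x: float       # top-left corner, x (pixels)
    y: float       # top-left corner, y (pixels)
    width: float
    height: float
    label: str

    def corners(self):
        """Return (x_min, y_min, x_max, y_max)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)

    def area(self):
        return self.width * self.height

detections = [Detection(40, 60, 120, 150, "mask"), Detection(300, 80, 90, 110, "no_mask")]
for det in detections:
    print(det.label, det.corners(), det.area())
# mask (40, 60, 160, 210) 18000
# no_mask (300, 80, 390, 190) 9900
```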
2.2.1 Single Shot Detector (SSD)
The Single Shot Detector (SSD) is a technique for object detection in images that
relies on a single deep neural network. SSD discretizes the output space of
bounding boxes into a predefined set of default boxes across different aspect ratios.
These default boxes are then scaled per feature map location. By combining
predictions from multiple feature maps with varying resolutions, the SSD network
can effectively handle objects of different sizes. This approach enables efficient
and accurate detection of objects in images using a unified framework.
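The default boxes described above can be sketched in plain Python. The scale rule s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9, the width/height rule w = s·√a, h = s/√a for aspect ratio a, and the cell-centred box positions follow our reading of the original SSD paper. The extra ratio-1 box that the paper adds at scale √(s_k·s_(k+1)) is omitted for brevity, and the function names are our own.

```python
import math

def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Box scale for each of num_maps feature maps, linearly spaced from s_min to s_max."""
    if num_maps < 2:
        raise ValueError("need at least two feature maps")
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h), normalised to [0, 1], for a square feature map."""
    boxes = []
    for i in range(fmap_size):          # row index
        for j in range(fmap_size):      # column index
            cx = (j + 0.5) / fmap_size  # box centred on the feature-map cell
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

scales = ssd_scales(6)
boxes = default_boxes(3, scales[-1])
print([round(s, 2) for s in scales])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(len(boxes))                      # 27 = 3 * 3 cells * 3 aspect ratios
```

Coarse feature maps (few cells, large scale) end up responsible for large faces, and fine maps for small ones, which is how SSD handles objects of different sizes in a single pass.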
Advantages of SSD:
Chapter 3
METHODOLOGY
3.1 Outline
1. Training:
- In this phase, a Convolutional Neural Network (CNN) model is trained using
an image dataset.
- Hyperparameters are fine-tuned to enhance the accuracy of the model.
- During training, the model's parameters are iteratively updated based on the
dataset.
- Once the model achieves the desired accuracy, it is saved for future use.
2. Deployment:
- During the deployment phase, images are collected from various sources.
- A face detector locates faces within the images, defining a Region of Interest
(ROI).
- The ROI, containing the detected face, is extracted from the image and provided
as input to the face mask detector.
- The face mask detector then analyzes the ROI to detect whether a mask is
present or absent.
- The output of the face mask detector indicates whether the individual in the
ROI is wearing a mask or not.
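The deployment steps listed above can be sketched as a two-stage pipeline. The detector and classifier are passed in as plain functions; run_mask_pipeline, the toy stand-in models and the 0.5 threshold are all hypothetical, intended only to show how the ROI flows from the face detector to the face mask detector.

```python
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

def run_mask_pipeline(frame: np.ndarray,
                      detect_faces: Callable[[np.ndarray], List[Box]],
                      classify_mask: Callable[[np.ndarray], float],
                      threshold: float = 0.5) -> List[Tuple[Box, str, float]]:
    """Face detector -> ROI extraction -> face mask detector, for one frame."""
    results = []
    for (x1, y1, x2, y2) in detect_faces(frame):
        roi = frame[y1:y2, x1:x2]                  # Region of Interest containing one face
        p_mask = classify_mask(roi)                # probability that this face wears a mask
        label = "Mask" if p_mask >= threshold else "No Mask"
        results.append(((x1, y1, x2, y2), label, p_mask))
    return results

# Hypothetical stand-ins for the trained models, for illustration only
def fake_detector(img: np.ndarray) -> List[Box]:
    return [(0, 0, 2, 2), (2, 2, 4, 4)]

def fake_classifier(roi: np.ndarray) -> float:
    return float(roi.mean()) / 255.0   # bright ROI -> "mask", dark ROI -> "no mask"

frame = np.zeros((4, 4), dtype=np.uint8)
frame[2:4, 2:4] = 255
for box, label, p in run_mask_pipeline(frame, fake_detector, fake_classifier):
    print(box, label, p)
# (0, 0, 2, 2) No Mask 0.0
# (2, 2, 4, 4) Mask 1.0
```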
3.2 Phase 1
3.2.1 Dataset Collection & pre-processing
Data was sourced from Kaggle.com, comprising a total of 690 images depicting
people wearing masks and 686 images of individuals without masks. To train the
model effectively, 80% of images from each class were allocated for training,
while the remaining 20% were reserved for testing. This partitioning ensured a
balanced representation of both classes in both the training and testing datasets.
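A stratified 80/20 split like the one described above can be sketched with NumPy alone. In practice scikit-learn's train_test_split with test_size=0.2 and stratify=labels is the usual tool; the helper below is our own illustration, and because it rounds per class its totals may differ by one image from scikit-learn's. The label encoding (0 = with mask, 1 = without mask) is an assumption.

```python
import numpy as np

def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx), holding out test_frac of every class separately."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)    # positions of this class
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test].tolist())
        train_idx.extend(idx[n_test:].tolist())
    return np.array(train_idx), np.array(test_idx)

labels = np.array([0] * 690 + [1] * 686)   # assumed encoding: 0 = with mask, 1 = without mask
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))       # 1101 275
```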
Following the data preparation phase, the model was trained using the designated
training dataset. Subsequently, it was evaluated against a separate testing dataset
to assess its performance on unseen data. This approach helped validate the
model's ability to generalize well to new instances and produce accurate
predictions.
Given that the final layer of the neural network had two outputs corresponding to
the presence or absence of a mask (yielding a categorical representation), the data
labels were converted to categorical format accordingly.
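Converting the two class labels to categorical (one-hot) form can be sketched as below. The class names and their order are assumptions; in a Keras workflow, tf.keras.utils.to_categorical performs the same conversion on integer labels.

```python
import numpy as np

CLASS_NAMES = ["with_mask", "without_mask"]      # assumed class names and order

labels = ["with_mask", "without_mask", "without_mask", "with_mask"]
indices = np.array([CLASS_NAMES.index(name) for name in labels])  # text label -> integer index
one_hot = np.eye(len(CLASS_NAMES))[indices]      # row i of the identity matrix encodes class i
print(one_hot.tolist())  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

Each row has a single 1 in the column of its class, matching the two outputs of the network's final layer.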
Upon completion of training, the trained model was saved to disk for future use.
3.3 Phase 2
To integrate the Single Shot Detector (SSD) model for face detection, the model
along with its trained weights (parameters) is loaded into the application.
Subsequently, images or videos are captured from various sources such as CCTV
cameras or drone cameras.
Frames are extracted from the video using the OpenCV library, a popular computer
vision library in Python. These frames serve as the input to the SSD model for face
detection. The SSD model analyzes each frame to identify and localize faces within
the image.
The integration of the SSD model with the OpenCV library enables real-time face
detection from video streams or static images captured from different sources,
facilitating applications such as surveillance, monitoring, and security systems.
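Assuming the SSD face detector is run through OpenCV's dnn module (an image is turned into a blob with cv2.dnn.blobFromImage and passed through net.forward()), the raw output of a Caffe-style SSD model is typically an array of shape (1, 1, N, 7), where each row holds [image_id, class_id, confidence, x1, y1, x2, y2] with normalized corners. The sketch below, with a hypothetical helper name and a 0.5 confidence threshold of our choosing, turns such an array into pixel boxes that can be used to crop ROIs.

```python
import numpy as np

def extract_face_boxes(detections, frame_w, frame_h, min_confidence=0.5):
    """Convert SSD output of shape (1, 1, N, 7) into pixel boxes above a confidence threshold.

    Each row is assumed to be [image_id, class_id, confidence, x1, y1, x2, y2],
    with corner coordinates normalised to [0, 1].
    """
    scale = np.array([frame_w, frame_h, frame_w, frame_h], dtype=float)
    boxes = []
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence < min_confidence:
            continue                                   # discard weak detections
        x1, y1, x2, y2 = (int(v) for v in det[3:7] * scale)
        # Clip to the frame so the ROI slice is always valid
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(frame_w - 1, x2), min(frame_h - 1, y2)
        if x2 > x1 and y2 > y1:                        # skip degenerate boxes
            boxes.append((x1, y1, x2, y2, confidence))
    return boxes

dets = np.array([[[[0, 1, 0.90, 0.25, 0.25, 0.50, 0.75],
                   [0, 1, 0.30, 0.00, 0.00, 1.00, 1.00],
                   [0, 1, 0.80, 0.50, 0.50, 1.00, 1.00]]]])
print(extract_face_boxes(dets, frame_w=200, frame_h=100))
# [(50, 25, 100, 75, 0.9), (100, 50, 199, 99, 0.8)]
```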
Pre-processing the captured frames aids faster learning and enhances feature extraction for
accurate face mask detection.
3.3.3 Prediction
The extracted Region of Interest (ROI), containing the detected face, serves as
input to the face mask detector model. This model conducts classification on the
ROI, predicting whether the face is wearing a mask or not from the two classes:
Mask and Non-Mask. The results of the prediction are then displayed on the screen,
providing immediate feedback. Additionally, the results can be saved locally for
future reference or analysis.
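Turning the classifier's two outputs into the on-screen result can be sketched as follows. The class order (Mask first), the BGR colours (OpenCV's channel order) and the label format are assumptions; the returned text and colour could then be drawn on the frame, for example with OpenCV's cv2.rectangle and cv2.putText.

```python
import numpy as np

CLASS_NAMES = ("Mask", "No Mask")                          # assumed output order of the classifier
COLORS = {"Mask": (0, 255, 0), "No Mask": (0, 0, 255)}     # green / red in OpenCV's BGR order

def interpret_prediction(probs):
    """Map the classifier's two softmax outputs to a label, display text and box colour."""
    probs = np.asarray(probs, dtype=float)
    idx = int(np.argmax(probs))                # class with the highest probability
    label = CLASS_NAMES[idx]
    text = f"{label}: {probs[idx] * 100:.2f}%"
    return label, text, COLORS[label]

print(interpret_prediction([0.08, 0.92]))  # ('No Mask', 'No Mask: 92.00%', (0, 0, 255))
```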
Chapter 4
SOFTWARE & PACKAGES REQUIRED
4.1 Jupyter Notebook (Python IDE)
The flexibility of Jupyter Notebooks makes them ideal for various data science
workflows, including data cleaning, statistical modeling, machine learning model
development and training, data visualization, and more. Its compartmentalized
structure, with code written in separate cells, facilitates the prototyping phase by
allowing users to execute individual code blocks independently. This enables
efficient testing and debugging without rerunning the entire script from the
beginning.
4.2 TensorFlow
TensorFlow 2.0 is a powerful, open-source platform for end-to-end machine
learning. It's renowned for its deep learning capabilities and ease of use, offering
efficient execution of tensor operations, automatic gradient computation,
scalability across devices, and the ability to export programs to various runtimes.
With support for multiple programming languages and environments, TensorFlow
ensures accessibility and versatility for developers.
4.3 Keras
Keras, the high-level API for TensorFlow 2.0, simplifies machine learning tasks
with its user-friendly interface. Designed for productivity, it offers clear APIs,
minimal user actions, and comprehensive documentation, making it a top choice
for deep learning projects. Its focus on ease of use and rapid experimentation
empowers developers to iterate quickly and stay ahead in innovation.
4.4 OpenCV
OpenCV, or Open-Source Computer Vision Library, is a BSD-licensed software
library renowned for its comprehensive suite of over 2500 optimized algorithms.
These algorithms cover a broad spectrum of computer vision and machine learning
tasks, including face detection, object recognition, motion tracking, 3D modeling,
image stitching, and more. With a large and active user community exceeding 47
thousand individuals, OpenCV is widely utilized across industries, research
groups, and governmental organizations worldwide, making it a cornerstone in the
field of computer vision.
4.5 NumPy
4.6 Matplotlib
Matplotlib is a Python plotting library that interfaces seamlessly with NumPy, enabling
users to create visualizations for numerical data. It offers an object-oriented API for
integrating plots into applications via GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
Additionally, it features a procedural "pylab" interface resembling MATLAB's state
machine, although its usage is discouraged. Matplotlib is commonly utilized in
conjunction with SciPy for scientific computing tasks.
4.7 Scikit-learn
Scikit-learn (Sklearn) is a powerful Python library for machine learning tasks, offering a
wide range of tools for classification, regression, clustering, and dimensionality reduction.
It provides a consistent interface and efficient algorithms, making it a go-to choice for
both beginners and experts in the field. Leveraging the capabilities of NumPy, SciPy, and
Matplotlib, scikit-learn offers a comprehensive suite of machine learning and statistical
modeling capabilities.
Chapter 5
CRIMINAL DETECTION FROM LIVE VIDEO FEED
5.1 Goal and Scope Definition
The primary objective of this research initiative is to transform crime detection and
suspect recognition methodologies through an innovative system. By combining
traditional surveillance methods with cutting-edge technologies, such as cloud
computing, machine learning, and deep learning, this project aims to improve the
effectiveness and efficiency of crime prevention and response strategies. The
scope of the project includes developing a comprehensive surveillance framework
that integrates advanced techniques to address multifaceted challenges in law
enforcement and security.
Through the integration of machine learning and deep learning algorithms, the
system seeks to enhance the accuracy of crime detection and suspect recognition.
By analyzing vast datasets and employing advanced pattern recognition
techniques, subtle cues indicative of criminal behavior can be identified with
greater precision. Additionally, the system aims to expedite response times by
promptly alerting relevant authorities.
Chapter 6
OBSERVATION & RESULTS
6.2 Observation on Video
The model was also run on live video streams to demonstrate its real-time
functionality. It detected masks accurately in these streams, which indicates
its suitability for practical applications.
6.3 Results
It is crucial to strike a balance between training accuracy and overfitting, and the
model achieves this balance. With an accuracy of 98.7% and an AUC of 0.985 on
unseen test data, it demonstrates strong generalization. The testing loss
fluctuates within an acceptable range, indicating that the model handles unseen
data robustly. This evaluation underscores the reliability and effectiveness of the
developed architecture.
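For reference, the metrics reported here and in the conclusion (accuracy, precision, recall, AUC) can be computed from test-set labels and predicted scores as sketched below. The toy labels and scores are invented purely to exercise the code and are not the project's results; treating "mask" as the positive class (label 1) and using a 0.5 threshold are assumptions. AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, which is equivalent to the area under the ROC curve.

```python
import numpy as np

def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall and ROC AUC for a binary classifier (positive class = 1)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    y_pred = (scores >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # AUC: fraction of (positive, negative) pairs ranked correctly, ties counting one half
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return accuracy, precision, recall, auc

# Toy data (not project results): 1 = mask, 0 = no mask
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
print([round(m, 3) for m in binary_metrics(y_true, scores)])  # [0.667, 0.667, 0.667, 0.889]
```

scikit-learn's accuracy_score, precision_score, recall_score and roc_auc_score should give the same values on the same inputs.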
Chapter 7
APPLICATIONS & LIMITATION
7.1 Applications
Face mask detection has several applications, some specific to COVID-19 and
others unrelated to it. The main applications are described below.
1. Hospitals
The Face Mask Detection System proves invaluable in hospital settings, where it can
ensure that healthcare workers adhere to safety protocols by wearing masks consistently.
Upon detecting any staff member without a mask, the system issues a prompt reminder to
wear one. Similarly, for quarantined individuals required to wear masks, the system
monitors compliance and automatically alerts or reports to authorities in cases of non-
compliance.
2. Airports
Deploying the Face Mask Detection System at airports enables the identification of
individuals without masks. Tourists' facial data can be captured upon entry, and if any
individual is detected without a mask, their image is promptly forwarded to airport
authorities for immediate action. Additionally, if the person's facial data is stored in the
database, such as airport employees, alerts can be directly sent to their phones for
immediate notification and compliance.
3. Offices:
Implementing the Face Mask Detection System in office areas ensures compliance
with safety standards among employees. It detects individuals without masks and
sends them reminders to wear one. Additionally, reports can be generated and
downloaded or sent via email at the end of the day to identify individuals who are
not adhering to the guidelines or requirements, facilitating enforcement of safety
protocols.
4. Educational Institutes:
Implementing face mask detection in classrooms enhances safety by ensuring consistent
mask usage among children, reducing the risk of viral infections.
7.2 Limitation
One of the challenges faced by the system is that it cannot reliably distinguish masks from
other face coverings, such as hands or cloth. This can lead to misclassifications
where the system erroneously predicts the presence of a mask when there isn't one. As a
consequence, the accuracy and precision of the system may be compromised in such cases.
Chapter 8
CONCLUSION & FUTURE SCOPE
8.1 Conclusion
To combat the spread of COVID-19, we developed a face mask detector using the SSD
architecture and transfer learning with convolutional neural networks. Our dataset
comprised 690 masked and 686 unmasked faces sourced from Kaggle. We evaluated
various metrics and selected MobileNetV2 for its superior performance, with 100%
precision and 99% recall. Its computational efficiency makes it well suited for deployment in
embedded systems. Our detector can be deployed in malls, airports, and other crowded areas to
enforce safety measures. It also recognizes blurred and side-face images, which
traditional models handle poorly. Future applications include integration into home
security systems.
8.2 Future Scope
1. Faster Inference: We aim to speed up the model to 15 FPS on CPUs, enabling real-time
monitoring without GPUs.
4. Upgradable Models: Easy replacement of models for better accuracy and lower
latency, keeping our solution up-to-date.
References
1. M. M. Rahman, M. M. H. Manik, M. M. Islam, S. Mahmud, and J.-H. Kim, "An
Automated System to Limit COVID-19 Using Facial Mask Detection in Smart City
Network," in 2020 IEEE International IOT, Electronics and Mechatronics Conference
(IEMTRONICS), Vancouver, BC, Canada, 2020, pp. 1-5. doi:
10.1109/IEMTRONICS51293.2020.9216386.
3. H. C. Kim, D. Kim, and S. Y. Bang, "Face recognition using LDA mixture model," in
Proceedings of the 16th International Conference on Pattern Recognition (ICPR), IEEE,
2002, pp. 486-489.
4. L. Liu et al., "Deep Learning for Generic Object Detection: A Survey," Int. J. Comput.
Vis., vol. 128, no. 2, pp. 261-318, 2020.
5. S. Ali, S. A. Alvi, and A. Ur Rehman, "The Usual Suspects: Machine Learning Based
Predictive Policing for Criminal Identification," in 2019 13th International Conference on
Open Source Systems and Technologies (ICOSST), 2019. ISBN: 978-1-7281-4613-3. doi:
10.1109/ICOSST48232.2019.9043925.