Smart Social Distancing Technique
A PROJECT REPORT
Submitted by
VINOTH D
412719405001
MASTER OF ENGINEERING
in
DECEMBER 2020
ANNA UNIVERSITY :: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE
Dr. S. SURENDRAN, M.E., Ph.D.,
Department of CSE,
Tagore Engineering College,
Rathinamangalam, Vandalur,
Chennai – 600127

SIGNATURE
Mr. SUDHEER REDDY BANDI, M.E.,
Department of CSE,
Tagore Engineering College,
Rathinamangalam, Vandalur,
Chennai – 600127
ACKNOWLEDGMENT
My heartfelt thanks go to Prof. Dr. M. Mala, M.A., M.Phil., Chairperson of Tagore Engineering College, Rathinamangalam, for providing us with all the necessary infrastructure and other facilities and for her support in completing this project successfully.
I extend my sincere gratitude to Dr. L. Raja, M.E., Ph.D., Principal, Tagore Engineering College, for his encouragement and moral support during the course of this project.
I am extremely happy to express my heartfelt gratitude to the Head of our Department, Dr. S. SURENDRAN, M.E., Ph.D., for his valuable suggestions, which helped us to complete the project successfully.
I express my sincere gratitude to my supervisor, Mr. SUDHEER REDDY BANDI, M.E., for extending all possible help for this project work.
My sincere thanks to all teaching and non-teaching staff who have rendered help during the various stages of my work.
ABSTRACT
Social distancing, also called “physical distancing,” means keeping a safe space between yourself and
other people who are not from your household. To practice social or physical distancing, stay at least
6 feet (about 2 arms’ length) from other people who are not from your household in both indoor and
outdoor spaces. Social distancing should be practiced in combination with other everyday preventive
actions to reduce the spread of COVID-19.
Social distancing is crucial in our fight against COVID-19; it may remain an essential part of our
lives for months and reshape how we interact with the outside world forever. The current way of
social distancing at stores is not a viable long term solution, and we need to come up with a better
way to restore the casual life while ensuring the safety of everyone. The Smart Social Distancing
application uses AI to detect social distancing violations in real-time. It works with the existing
cameras installed at any workplace, hospital, school, or elsewhere.
To ensure users' privacy, the Smart Social Distancing application avoids transmitting videos over the internet to the cloud. Instead, it is designed to be deployed on small, low-power edge devices that process the data locally. The application uses edge IoT devices such as the NVIDIA Jetson Nano to monitor social distancing in real time in a privacy-preserving manner.
LIST OF ABBREVIATIONS
AI – Artificial Intelligence
IoT – Internet of Things
COCO – Common Objects in Context
COVID-19 – Coronavirus Disease 2019
UI – User Interface
TABLE OF CONTENTS
ABSTRACT ......................................................................................................................................... 4
LIST OF ABBREVIATIONS ............................................................................................................... 5
1 CHAPTER .................................................................................................................................. 8
INTRODUCTION ........................................................................................................................... 8
1.1 WHY SOCIAL DISTANCING ........................................................................................ 8
1.2 OBJECTIVE .................................................................................................................... 8
2 CHAPTER ................................................................................................................................ 10
LITERATURE STUDY ................................................................................................................ 10
2.1 INTRODUCTION ......................................................................................................... 10
3 CHAPTER ................................................................................................................................. 12
PROBLEM DEFINITION ............................................................................................................ 12
3.1 Existing System.............................................................................................................. 12
3.2 Proposed System ............................................................................................................ 13
3.2.1 Advantages of Jetson Nano ................................................................................ 13
3.2.2 Disadvantages of Jetson Nano ........................................................................... 13
4 CHAPTER ................................................................................................................................ 14
SYSTEM REQUIREMENTS SPECIFICATION ......................................................................... 14
4.1 Hardware Requirements ................................................................................................. 14
4.2 Software Requirements .................................................................................................. 14
4.3 DOMAIN DESCRIPTION ............................................................................................ 14
4.3.1 Machine Learning .............................................................................................. 14
4.3.2 Applications of Machine Learning .................................................................... 15
4.3.3 Steps Involved in Machine Learning ................................................................. 15
4.3.4 Types of Learning .............................................................................................. 15
Supervised (inductive) learning ......................................................................... 15
Unsupervised learning........................................................................................................ 15
Reinforcement learning ...................................................................................................... 15
Supervised Learning .......................................................................................................... 15
Regression .......................................................................................................................... 15
Classification ...................................................................................................................... 15
Unsupervised Learning .................................................................................................. 16
Reinforcement Learning ................................................................................................ 16
4.4 LANGUAGE DESCRIPTION ...................................................................................... 16
4.4.1 About Python Language..................................................................................... 16
4.4.2 Python Programming Characteristics................................................................. 16
4.4.3 Applications of Python Programming Web Applications .................................. 17
Scientific and Numeric Computing.................................................................................... 17
Creating software Prototypes ............................................................................................. 17
Good Language to Teach Programming ............................................................................ 17
4.4.4 Data Set and Model ............................................................................................ 17
Common Object in Context ............................................................................................... 17
SSD Mobilenet V2 object detection on Jetson Nano at 20+ FPS ...................................... 22
4.4.5 Container ............................................................................................................ 22
Docker ................................................................................................................................ 23
5 CHAPTER ................................................................................................................................ 24
UML Representation ..................................................................................................................... 24
Sequence Diagram ......................................................................................................................... 25
Activity Diagram ........................................................................................................................... 26
6 CHAPTER ................................................................................................................................ 27
Core Architecture .......................................................................................................................... 27
6.1.1 Computer Vision Engine .................................................................................... 27
6.1.2 Data Pre-Processing ........................................................................................... 27
6.1.3 Model Inference ................................................................................................. 28
6.1.4 Bounding boxes post-processing ....................................................................... 28
6.1.5 Compute distances ............................................................................................. 28
6.1.6 User Interface (UI) and Web Application .......................................................... 29
7 CHAPTER ................................................................................................................................ 30
TESTING ...................................................................................................................................... 30
7.1.1 Software Testing................................................................................................. 30
General ............................................................................................................................... 30
7.1.2 Test Case ............................................................................................................ 30
7.1.3 Testing Techniques ............................................................................................. 30
8 CHAPTER ................................................................................................................................ 33
Smart Social Distancing roadmap ................................................................................................. 33
9 APPENDIX .............................................................................................................................. 34
Source Code: ................................................................................................................................. 34
SCREENSHOT ............................................................................................................................. 85
REFERENCES .............................................................................................................................. 86
CHAPTER 1
INTRODUCTION
Smart Social Distancing is an application that quantifies social distancing measures using edge computer vision systems. Since all computation runs on the device, it requires minimal setup and minimizes privacy
and security concerns. It can be used in retail, workplaces, schools, construction sites, healthcare
facilities, factories, etc.
We can run this application on edge devices such as NVIDIA's Jetson Nano. This application
measures social distancing rates and gives proper notifications each time someone ignores social
distancing rules. By generating and analyzing data, this solution outputs statistics about high-traffic
areas that are at high risk of exposure to COVID-19 or any other contagious virus.
1.1 WHY SOCIAL DISTANCING
COVID-19 spreads mainly among people who are in close contact (within about 6 feet) for a
prolonged period. Spread happens when an infected person coughs, sneezes, or talks, and droplets
from their mouth or nose are launched into the air and land in the mouths or noses of people nearby.
The droplets can also be inhaled into the lungs. Recent studies indicate that people who are infected
but do not have symptoms likely also play a role in the spread of COVID-19. Since people can spread
the virus before they know they are sick, it is important to stay at least 6 feet away from others when
possible, even if you—or they—do not have any symptoms. Social distancing is especially important
for people who are at higher risk for severe illness from COVID-19.
1.2 OBJECTIVE
Social distancing is crucial in our fight against COVID-19; it may remain an essential part of our
lives for months and reshape how we interact with the outside world forever. The current way of
social distancing at stores is not a viable long-term solution, and we need to come up with a better
way to restore the shopping experience while ensuring the safety of everyone.
We can always be smarter about distancing ourselves socially and make shopping more efficient and safer during COVID-19 and beyond. With the help of Artificial Intelligence (AI), the same technology that is the backbone of self-driving Teslas and Netflix recommendations, combined with edge computing, the technology that is reshaping the Internet of Things (IoT), we can practice social distancing with minimal disruption to our daily lives. Imagine a seamless integration of social distancing into our shopping experience powered by big data and AI. Using the available data, stores can better implement social distancing. For example, stores can change aisle traffic in real time, identify hotspots and redistribute products to eliminate them, and vary the number of cashiers to reduce long wait times while eliminating the risk of exposure to shoppers and workers.
CHAPTER 2
LITERATURE STUDY
2.1 INTRODUCTION
The purpose of a literature survey is to give complete information about the reference papers. The goal of the literature review is to identify the technical papers that form the foundation of this project. A literature survey is the documentation of a comprehensive review of published and unpublished work from secondary data sources in the areas of specific interest to the researcher. The library is a rich storage base for secondary data, and researchers used to spend several weeks and sometimes months going through books, journals, newspapers, magazines, conference proceedings, doctoral dissertations, master's theses, government publications, and financial reports to find information on their research topic. With computerized databases now readily available and accessible, the literature search is much speedier and easier. The researcher can start the literature survey even as the information from the unstructured and structured interviews is being gathered. Reviewing the literature on the topic area at this time helps the researcher to focus further interviews more meaningfully on aspects found to be important in the published studies, even if these had not surfaced during the earlier questioning. So, the literature survey is important for gathering the secondary data for the research, which might prove very helpful in the research. The literature survey can be conducted for several reasons, and the literature review can be in any area of the business.
Year: 2020
Description:
The smartphone app is able to identify people who have been in close proximity – within 2 m for at least 30 minutes – to coronavirus patients using wireless Bluetooth technology, said its developers, the Government Technology Agency (GovTech) and the Ministry of Health (MOH), on Friday (March 20). "This is especially useful in cases where the infected persons do not know everyone whom they had been in close proximity with for an extended duration," said its developers.
While use of the app is not compulsory, those who use it have to turn on the Bluetooth settings on their phones for tracing to be done. They also need to enable push notifications and location permissions in the app, which is available on the Apple App Store or the Google Play Store.
Title: Novel Economical Social Distancing Smart Device for COVID-19
Year: 2020
Description:
In the era of COVID-19, in which there is a panic-like situation everywhere and, according to the World Health Organization, social distancing will prove to be the only solution, this paper proposes an innovative sensor-based localization method to track a person's position in an outdoor environment. With the help of artificial intelligence, this novel smart device is handy for maintaining social distancing as well as detecting patients with COVID-19 symptoms, and thereby improving safety. In this COVID-19 environment, where everyone is conscious about their safety, the authors came up with the idea of this novel device. Most of the time, people on the roadside watch what is in front of them but cannot see what is going on behind them. The device alerts the person if someone comes within the critical range of six feet around them. The method is reasonably accurate and can be very useful in maintaining social distancing. The sensor model used is described, and the expected errors in distance estimates are analyzed and modeled. Finally, the experimental results are presented.
CHAPTER 3
PROBLEM DEFINITION
3.1 Existing System
Coronavirus Disease 2019, or COVID-19, has caused chaos and fear all over the world. All non-essential businesses and services have been shut down in 42 states due to the COVID-19 pandemic, and many businesses are struggling to survive. This pandemic will reshape how we live our lives for many years to come. Essential businesses such as grocery stores and pharmacies remain open, but they are hotspots for COVID-19. The workers are at a high risk of contracting the virus, and so are the shoppers. The White House's response coordinator Deborah Birx said in a statement that "This is the
moment not to be going to the grocery store, not going to the pharmacy, but doing everything you can
to keep your family and your friends safe.” People are afraid to go out shopping for their essential
needs, but with few alternatives, many have no choice but to make the trip. So how do we protect
shoppers and workers at the store?
We are experiencing a deadly pandemic, but we still need access to essential goods and services. We also know that, according to the Centers for Disease Control and Prevention (CDC), to stay safe during this pandemic we need to practice social distancing and keep a minimum distance of 6 feet from people outside our household. Essential businesses are allowed to stay open so long as they follow social distancing rules. There is a problem, however: stores have limited space, and people might break social distancing rules without even realizing it. So how do we work with the limited options we have to make shopping for essential goods safe?
In response to social distancing rules, stores were asked to limit the number of people who can be
inside at the same time. Shoppers are forced to keep their distance while waiting in line to get in or
during checkout. Some stores are marking where people should be standing, spacing everybody apart,
and some supermarkets are testing one-way aisles. Besides the entrance and checkout lines, there is
nowhere else in the store where these safety measures are enforced. The best we can hope for is that shoppers will honor the social distancing rules while they shop. So far, creating long lines and increasing wait times for getting groceries has been our best hope for minimizing the spread of COVID-19, but there are drawbacks to this approach. Longer wait times mean more time spent outside the house, and more time spent in public means a higher risk of exposure to COVID-19. Longer wait times have also led to overbuying, which stresses the supply chain, results in waste, and causes shortages for others. Many stores have brought in more staff to help manage the social distancing etiquette. This is not only resource-intensive but puts the health of personnel at risk. More importantly, what is the guarantee that all shoppers will always maintain a safe distance? Social distancing might work in theory but might not be practical without the appropriate measures and tools. Aisles are sometimes too narrow, high-demand items might be placed next to each other, and shoppers might simply forget or misjudge the safe distance. What if two customers want a product from the same shelf? It will quickly become apparent that social distancing at stores is not as simple as limiting the store's total capacity. Ad hoc solutions to social distancing in the absence of relevant data could lead to unexpected outcomes such as high-traffic areas and a higher risk of exposure.
3.2 Proposed System
Our approach uses artificial intelligence and edge AI devices such as the Jetson Nano or Edge TPU to track people in different environments and measure adherence to social distancing guidelines. It can give proper notifications each time social distancing rules are violated. Our solution can be modified to work in real time by processing a USB or Camera Serial Interface feed. The demo runs on a video that can be provided via the configuration file. The solution is designed to run on a small form-factor AI board, so we have considered using the NVIDIA Jetson Nano. NVIDIA® Jetson Nano™ lets you bring incredible new capabilities to millions of small, power-efficient AI systems. It opens new worlds of embedded IoT applications, including entry-level Network Video Recorders (NVRs), home robots, and intelligent gateways with full analytics capabilities. Jetson Nano is also the perfect tool to start learning about AI and robotics in real-world settings, with ready-to-try projects and the support of an active and passionate developer community.
3.2.1 Advantages of Jetson Nano
4 x USB 3.0 Type-A ports, for better connectivity to depth cameras and external accessories.
4K video processing capability, unlike the Raspberry Pi.
Multiple monitors can be hooked up.
Selectable power source.
3.2.2 Disadvantages of Jetson Nano
microSD as the main storage device limits disk performance; using the M.2 Key-E with PCIe x1 slot for an SSD, or a USB HDD/SSD, can solve this problem.
Less software support, as the architecture is AArch64; much software will not work out of the box.
CHAPTER 4
SYSTEM REQUIREMENTS SPECIFICATION
4.3.2 Applications of Machine Learning
Vision processing
Language processing
Pattern recognition
Games
Data mining
Expert systems
Robotics
Regression
Regression trains on and predicts a continuous-valued response, for example predicting real estate
prices.
Classification
It attempts to find the appropriate class label, for example analyzing positive/negative sentiment, or distinguishing male and female persons, benign and malignant tumors, and secure and unsecure loans.
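A toy scikit-learn sketch of these two supervised settings (the numbers below are purely illustrative):

from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[50.0], [80.0], [120.0]]        # e.g. house size in square metres
prices = [150000, 240000, 360000]    # regression: continuous target
reg = LinearRegression().fit(X, prices)

labels = [0, 1, 1]                   # classification: discrete target
clf = LogisticRegression().fit(X, labels)

print(reg.predict([[100.0]]), clf.predict([[100.0]]))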
Unsupervised Learning
Unsupervised learning is used to detect anomalies and outliers, such as fraud or defective equipment, or to group customers with similar behaviors for a sales campaign. It is the opposite of supervised learning: there is no labeled data here. Unsupervised learning algorithms are extremely powerful tools for analyzing data and for identifying patterns and trends. They are most commonly used for clustering similar input into logical groups. Unsupervised learning algorithms include k-means, hierarchical clustering, and so on.
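A minimal clustering sketch with scikit-learn (toy points, no labels):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)  # two groups, e.g. [0 0 1 1] (cluster ids may be swapped)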
Reinforcement Learning
Here learning data gives feedback so that the system adjusts to dynamic conditions in order to achieve
a certain objective. The system evaluates its performance based on the feedback responses and reacts
accordingly. The best-known instances include self-driving cars and the Go-playing program AlphaGo. A typical application is decision making (e.g., robots and game-playing engines).
4.4.2 Python Programming Characteristics
It provides rich data types and an easier-to-read syntax than many other programming languages.
It is a platform-independent scripting language with full access to operating system APIs.
Compared to other programming languages, it allows more run-time flexibility.
A module in Python may have one or more classes and free functions.
Libraries in Python are cross-platform compatible with Linux, Macintosh, and Windows.
For building large applications, Python can be compiled to byte-code.
Python supports functional and structured programming as well as OOP.
It supports an interactive mode that allows interactive testing and debugging of snippets of code.
In Python, since there is no compilation step, editing, debugging, and testing are fast.
4.4.3 Applications of Python Programming
Web Applications
You can create scalable web apps using frameworks and CMSs (Content Management Systems) that are built on Python. Some of the popular platforms for creating web apps are Django, Flask, Pyramid, Plone, and Django CMS. Sites like Mozilla, Reddit, Instagram, and PBS are written in Python.
4.4.4 Data Set and Model
Common Objects in Context (COCO)
COCO is a large-scale dataset with the following features:
Object segmentation
Recognition in context
Superpixel stuff segmentation
330K images (>200K labeled)
1.5 million object instances
80 object categories
91 stuff categories
5 captions per image
250,000 people with keypoints
In today’s world of deep learning if data is King, making sure it is in the right format might just be
Queen. Or at least Jack or 10. Anyway, it is important. After working hard to collect your images and
annotating all the objects, you must decide what format you are going to use to store all that info.
This may not seem like a big decision compared to all the other things you have to worry about, but
if you want to quickly see how different models perform on your data, it’s vital to get this step right.
Back in 2014 Microsoft created a dataset called COCO (Common Objects in COntext) to help
advance research in object recognition and scene understanding. COCO was one of the first large
scale datasets to annotate objects with more than just bounding boxes, and because of that it became
a popular benchmark to use when testing out new detection models. The format COCO uses to store
annotations has since become a de facto standard, and if you can convert your dataset to its style, a
whole world of state-of-the-art model implementations opens.
This is where pycococreator comes in. pycococreator takes care of all the annotation formatting
details and will help convert your data into the COCO format. Let’s see how to use it by working
with a toy dataset for detecting squares, triangles, and circles.
The shapes dataset has 500 128x128px jpeg images of random colored and sized circles, squares, and
triangles on a random colored background. It also has binary mask annotations encoded in png of
each of the shapes. This binary mask format is fairly easy to understand and create. That’s why it’s
the format your dataset needs to be in before you can use pycococreator to create your COCO-styled
version. You might be thinking, "why not just use the png binary mask format if it's so easy to understand?" Remember, the whole reason we're trying to make a COCO dataset isn't because it's
the best way of representing annotated images, but because everyone else is using it. The example
script we’ll use to create the COCO-style dataset expects your images and annotations to have the
following structure:
shapes
└───train
    ├───annotations
    │       <image_id>_<object_class_name>_<annotation_id>.png
    │       ...
    └───<subset><year>
            <image_id>.jpeg
            ...
In the shapes example, subset is “shapes_train”, year is “2018”, and object_class_name is “square”,
“triangle”, or “circle”. You would generally also have separate “validate” and “test” datasets.
{
"info": info,
"licenses": [license],
"categories": [category],
"images": [image],
"annotations": [annotation]
}
The “info”, “licenses”, “categories”, and “images” lists are straightforward to create, but the
“annotations” can be a bit tricky. Luckily we have pycococreator to handle that part for us. Let’s start
out by getting the easy stuff out of the way first. We’ll describe our dataset using python lists and
dictionaries and later export them to json.
INFO = {
"description": "Example Dataset",
"url": "https://github.com/waspinator/pycococreator",
"version": "0.1.0",
"year": 2018,
"contributor": "waspinator",
"date_created": datetime.datetime.utcnow().isoformat(' ')
}
LICENSES = [
{
"id": 1,
"name": "Attribution-NonCommercial-ShareAlike License",
"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/"
}
]
CATEGORIES = [
{
'id': 1,
'name': 'square',
'supercategory': 'shape',
},
{
'id': 2,
'name': 'circle',
'supercategory': 'shape',
},
{
'id': 3,
'name': 'triangle',
'supercategory': 'shape',
},
]
Okay, with the first three done we can continue with images and annotations. All we have to do is loop through each image jpeg and its corresponding annotation pngs and let pycococreator generate the correctly formatted items. In the example script, a couple of lines create our image entries, while a few more take care of the annotations.
# filter for jpeg images
for root, _, files in os.walk(IMAGE_DIR):
    image_files = filter_for_jpeg(root, files)

    # for each image an entry is appended to coco_output["images"]; then, for
    # each associated annotation png, the class id is read from the file name:
    if 'square' in annotation_filename:
        class_id = 1
    elif 'circle' in annotation_filename:
        class_id = 2
    else:
        class_id = 3

    # build a COCO-style annotation from the binary mask of this object
    annotation_info = pycococreatortools.create_annotation_info(
        segmentation_id, image_id, category_info, binary_mask,
        image.size, tolerance=2)

    if annotation_info is not None:
        coco_output["annotations"].append(annotation_info)
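Once the images and annotations have been collected, the assembled dictionary can be exported to JSON; a minimal sketch (the output file name here is only an example):

import json

coco_output = {
    "info": INFO,
    "licenses": LICENSES,
    "categories": CATEGORIES,
    "images": [],
    "annotations": []
}

# ... populate "images" and "annotations" as in the loop above, then:
with open("instances_shapes_train2018.json", "w") as output_json_file:
    json.dump(coco_output, output_json_file)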
There are two types of annotations COCO supports, and their format depends on whether the annotation is of a single object or a "crowd" of objects. Single objects are encoded using a list of points along their contours, while crowds are encoded using column-major RLE (Run-Length Encoding). RLE is a compression method that works by replacing runs of repeating values with the number of times they repeat. For example, 0 0 1 1 1 0 1 would become 2 3 1 1. Column-major just means that instead of reading a binary mask array left-to-right along rows, we read it top-to-bottom along columns. The tolerance option in pycococreatortools.create_annotation_info() changes how precisely contours will be recorded for individual objects. The higher the number, the lower the quality of the annotation, but it also means a lower file size. 2 is usually a good value to start with. COCO uses JSON (JavaScript Object Notation) to encode information about a dataset. There are several variations of COCO, depending on whether it is being used for object instances, object keypoints, or image captions. We're interested in the object instances format, whose top-level structure was shown earlier.
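As a quick illustration of the run-length idea (a toy sketch, not COCO's exact encoding API):

def rle_counts(values):
    # alternating run lengths, starting with the count of leading zeros
    counts = []
    current, run = 0, 0
    for v in values:
        if v == current:
            run += 1
        else:
            counts.append(run)
            current, run = v, 1
    counts.append(run)
    return counts

print(rle_counts([0, 0, 1, 1, 1, 0, 1]))  # [2, 3, 1, 1]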
SSD Mobilenet V2 object detection on Jetson Nano at 20+ FPS
SSD-MobileNet-v2 is built using the COCO dataset on the Jetson Nano, with Docker technology to make the model more portable.
The MobileNet V2 image detection container is built upon the latest JetPack 4.3 (L4T R32.3.1) base image. To make an inference with a TensorRT engine file, two important Python packages are required: TensorRT and PyCUDA. Building the PyCUDA Python package from source on the Jetson Nano might take some time, so it was decided to pack the pre-built package into a wheel file and make the Docker build process much smoother. Notice that PyCUDA prebuilt with JetPack 4.3 is not compatible with older versions of JetPack, and vice versa. As for the TensorRT Python package, it comes from the Jetson Nano at the directory /usr/lib/python3.6/dist-packages/tensorrt/. This approach avoids installing the TensorFlow GPU Python module, since inference runs directly on the TensorRT engine file on top of JetPack 4.3.
Now for the limitation of the TensorRT engine file approach: it simply will not work across different JetPack versions. The reason comes from how the engine file is built, by searching through CUDA kernels for the fastest implementation available; it is therefore necessary to build with the same GPU and software stack (CUDA, cuDNN, TensorRT, etc.) as the one on which the optimized engine will run. A TensorRT engine file is like a dress tailored exclusively for one setup, but its performance is amazing when fitted to the right person/dev board.
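Because the engine only runs on the software stack it was built with, it helps to record that stack and check it before deserializing an engine file. A minimal sketch, assuming TensorRT and PyCUDA are installed as on a flashed Jetson Nano:

import tensorrt as trt
import pycuda.driver as cuda

cuda.init()
# log the version and device the engine was built against so a mismatch on
# another JetPack release can be spotted before deserialization fails
print("TensorRT version:", trt.__version__)
print("GPU:", cuda.Device(0).name())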
Container
A container is a standard unit of software that packages up code and all its dependencies, so the
application runs quickly and reliably from one computing environment to another. A Docker container
image is a lightweight, standalone, executable package of software that includes everything needed
to run an application: code, runtime, system tools, system libraries and settings.
Docker
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.
Containers allow a developer to package up an application with all the parts it needs, such as libraries
and other dependencies, and deploy it as one package. By doing so, thanks to the container, the
developer can rest assured that the application will run on any other Linux machine regardless of any
customized settings that machine might have that could differ from the machine used for writing and
testing the code.
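As an illustration of how a containerized processor could be launched programmatically, here is a hedged sketch using the Docker SDK for Python; the image name and port are hypothetical, not the project's published image:

import docker

client = docker.from_env()
# run the (hypothetical) processor image in the background and expose its web UI
container = client.containers.run(
    "smart-social-distancing:latest",
    detach=True,
    ports={"8000/tcp": 8000},
)
print(container.short_id)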
CHAPTER 5
UML Representation
The use case diagram depicts the relationship between the actor and the procedures involved.
Sequence Diagram
The sequence diagram depicts a set of events in order; the activities are ordered in the way they occur. Here, they start from the multiple frames and end at the user interface.
[Figure: sequence diagram with lifelines Multiple Frames, One Frame, Pre-processing, COCO Dataset, Calibration Method, and User Interface]
Activity Diagram
[Figure: activity diagram with the flow COCO Dataset → Object Detection → User Interface]
CHAPTER 6
Core Architecture
[Figure: core architecture pipeline - Video Input → Video Frames → One Frame → Preprocessing → Model Inference (SSD-MobileNet Model) → Post-Processing → Calculate Distance → Visualization]
6.1.1 Computer Vision Engine
The computational core of this application lies in this module. The Distancing class defined in the core.py file takes video frames as input and returns the coordinates of each detected object as well as a matrix of distances measured between each pair. The coordinates are further used by the update method of the WebGUI class to draw bounding boxes around each person.
6.1.2 Data Pre-Processing
To prepare each frame for the object detection model, we apply some pre-processing, such as resizing and RGB transformation, to the frames.
resized_image = cv.resize(cv_image, tuple(self.image_size[:2]))
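A minimal sketch of this pre-processing step, assuming OpenCV frames arrive in BGR order and an SSD-style input size such as 300x300:

import cv2 as cv

def preprocess(cv_image, image_size=(300, 300, 3)):
    # resize to the detector's expected input resolution ...
    resized_image = cv.resize(cv_image, tuple(image_size[:2]))
    # ... and convert BGR (OpenCV's default) to RGB for the model
    rgb_resized_image = cv.cvtColor(resized_image, cv.COLOR_BGR2RGB)
    return rgb_resized_image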
6.1.3 Model Inference
In the constructor, we have a detector attribute that specifies the object detection model. We used
SSD-MobileNet-v2 as the default model for this application. The average inference time on Jetson
Nano was 44.7 milliseconds (22 frames per second).
The object detection model is built based on the config file and should have an inference method that
takes a proper image as input and returns a list of dictionaries. Each dictionary of this list contains
the information of a detected object, i.e., bounding box coordinates and object id.
tmp_objects_list = self.detector.inference(rgb_resized_image)
6.1.4 Bounding boxes post-processing
Since we used a general-purpose object detector (trained on COCO with 80 different classes) that has not been trained for our specific task, the output of this model needs some post-processing to be robust for our specific purpose.
We apply three post-processing filters to the raw bounding boxes: eliminating large boxes, collapsing duplicated boxes for a single object, and keeping track of moving objects across frames.
new_objects_list = self.ignore_large_boxes(objects_list)
new_objects_list = self.non_max_suppression_fast(new_objects_list,
    float(self.config.get_section_dict("PostProcessor")["NMSThreshold"]))
tracked_boxes = self.tracker.update(new_objects_list)
6.1.5 Compute distances
After post-processing, we need to calculate the distances between every pair of persons detected in each frame. We use Python's SciPy library to calculate the distances between each pair of bounding-box centroids. The resulting distance matrix is symmetric and has zeros on the diagonal.
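A minimal sketch of this step with SciPy (toy centroids; the 0.1 threshold is illustrative, not the project's configured DistThreshold):

import numpy as np
from scipy.spatial import distance

centroids = np.array([[0.20, 0.40], [0.25, 0.42], [0.80, 0.10]])
distances = distance.cdist(centroids, centroids)  # symmetric, zero diagonal
violating_pairs = np.argwhere((distances > 0) & (distances < 0.1))
print(violating_pairs)  # [[0 1] [1 0]]: only the two nearby centroids violate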
6.1.6 User Interface (UI) and Web Application
The UI section is responsible for drawing bounding boxes around the detected persons in each frame. It uses different colors to show the distance between each pair of people.
The WebGUI object implements a Flask application and serves as an interface for the user. The WebGUI constructor takes config and engine_instance parameters as inputs and acts as the central application for the output view.
Processing the video begins with the WebGUI start method. Within this method, the engine instance calls process_video to process the video frame by frame. This method returns a list of dictionaries for each detected object, a matrix of distances, and the image itself at the desired output resolution. These values are then passed to the update method of the WebGUI class, which draws bounding boxes with proper colors around each object, i.e., person.
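For context, a minimal sketch of the kind of Flask interface the WebGUI wraps; the route and payload here are illustrative, not the project's actual endpoints:

from flask import Flask, jsonify

app = Flask(__name__)
latest_stats = {"detected_objects": 0, "violating_objects": 0}

@app.route("/stats")
def stats():
    # in the real application these values would be refreshed per processed frame
    return jsonify(latest_stats)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)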
CHAPTER 7
TESTING
7.1 Software Testing
General
In a generalized way, we can say that system testing is a type of testing whose main aim is to make sure that the system performs efficiently and seamlessly. The process of testing is applied to a program with the main aim of discovering a previously undetected error, an error which otherwise could have damaged the future of the software. A test case which brings up a high possibility of discovering an error is considered successful. Such successful tests help to uncover the still-unknown errors.
Testing, as already explained, is the process of discovering all possible weak points in the finalized software product. Testing checks the working of sub-assemblies, components, assemblies, and the complete product. The software is taken through different exercises with the main aim of making sure that the software meets the business requirements and user expectations and does not fail abruptly. Several types of tests are used today; each test type addresses a specific testing requirement.
A test plan is a document which describes the approach, scope, resources, and schedule of the intended testing activities. It helps to identify, among other things, the test items, the features to be tested, the testing tasks, who will do each task, the degree of tester independence, the test environment, the test design techniques, the entry and exit criteria to be used and the rationale for their choice, and any risks requiring contingency planning. It can also be referred to as the record of the test planning process. Test plans are usually prepared with significant input from test engineers.
In unit testing, test cases are designed to validate the internal program logic. All decision branches and internal code are validated. Unit testing takes place after an individual unit is completed and before integration. The unit test thus performs a basic-level test at the component stage and exercises the business process, system configurations, etc. A unit test ensures that each unique path of a process performs precisely to the documented specification and contains clearly defined inputs with expected results.
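For example, a unit test for the distance computation described in Chapter 6 could look like the following hedged sketch (built around a hypothetical helper, runnable with pytest):

import numpy as np
from scipy.spatial import distance

def pairwise_distances(centroids):
    # helper under test: pairwise distances between bounding-box centroids
    return distance.cdist(centroids, centroids)

def test_distance_matrix_is_symmetric_with_zero_diagonal():
    centroids = np.array([[0.0, 0.0], [3.0, 4.0]])
    d = pairwise_distances(centroids)
    assert np.allclose(d, d.T)           # symmetric
    assert np.allclose(np.diag(d), 0.0)  # zeros on the diagonal
    assert np.isclose(d[0, 1], 5.0)      # 3-4-5 right triangle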
Functional tests provide systematic demonstrations that the functions tested are available as specified by the technical requirements, the system documentation, and the user manual.
System testing, as the name suggests, is the type of testing which ensures that the software system meets the business requirements and aims. Configurations are tested here to ensure predictable results that can then be analyzed. System testing relies on descriptions of processes and their flows, emphasizing pre-driven process links and integration points.
White box testing is a type of testing in which the internal components of the software are open and can be examined by the tester. All the data structures, components, etc. are tested by the tester to find possible bugs or errors. It is used in situations where black box testing is incapable of finding a bug. It is a complex type of testing which takes more time to apply.
Black box testing is a type of testing in which the internal components of the software are hidden and only the inputs and outputs of the system are the key for the tester to find a bug. It is therefore a simpler type of testing; a programmer with basic knowledge can also carry it out. It is less time-consuming than white box testing. It is very successful for software that is less complex and straightforward in nature. It is also less costly than white box testing.
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
CHAPTER 8
Smart Social Distancing roadmap
As the project is under substantial development, many ideas can help us improve application performance and add other exciting features. The Smart Social Distancing roadmap best explains our priorities for the future of this project:
Evaluate and benchmark different models.
Improve the distance calculation module by considering perspectives.
Provide a safety score to a given site and calculate useful statistics about a specific time
interval; for example, how many people entered the place, and how many times social
distancing rules were violated.
UI improvements: show statistical and historical data in the GUI.
Aid the program optimization by profiling different parts of the software, from the CV engine
to UI overheads.
Provide the ability for the user to customize and re-train models with task-specific datasets.
APPENDIX
Source Code:
Detector.py
import logging
logger = logging.getLogger(__name__)
class Detector:
"""
Detector class is a high level class for detecting object using NVIDIA jetson devices.
When an instance of the Detector is created you can call inference method and feed your
input image in order to get the detection results.
:param config: Is a ConfigEngine instance which provides necessary parameters.
"""
def __del__(self):
del self.net
def inference(self, resized_rgb_image):
"""
Run inference on an image and get Frames rate (fps)
Args:
resized_rgb_image: A numpy array with shape [height, width, channels]
Returns:
output: List of objects; each obj is a dict with the keys "id", "bbox", and "score",
e.g. [{"id": 0, "bbox": [x1, y1, x2, y2], "score": s}, {...}, {...}, ...]
"""
self.fps = self.net.fps
output = self.net.inference(resized_rgb_image)
return output
mobilenet_ssd_v2.py
import ctypes
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import time
from ..utils.fps_calculator import convert_infr_time_to_fps
#import pycuda.autoinit # Required for initializing CUDA driver
import logging
logger = logging.getLogger(__name__)
class Detector:
"""
Perform object detection with the given prebuilt tensorrt engine.
:param config: Is a ConfigEngine instance which provides necessary parameters.
:param output_layout:
"""
def _load_plugins(self):
""" Required as Flattenconcat is not natively supported in TensorRT. """
ctypes.CDLL("/opt/libflattenconcat.so")
trt.init_libnvinfer_plugins(self.trt_logger, '')
def _load_engine(self):
""" Load engine file as a trt Runtime. """
trt_bin_path = '/repo/data/jetson/TRT_%s.bin' % self.model
with open(trt_bin_path, 'rb') as f, trt.Runtime(self.trt_logger) as runtime:
return runtime.deserialize_cuda_engine(f.read())
def _allocate_buffers(self):
"""
Create some space to store intermediate activation values.
Since the engine holds the network definition and trained parameters, additional space is
necessary.
"""
for binding in self.engine:
size = trt.volume(self.engine.get_binding_shape(binding)) * \
self.engine.max_batch_size
host_mem = cuda.pagelocked_empty(size, np.float32)
cuda_mem = cuda.mem_alloc(host_mem.nbytes)
self.bindings.append(int(cuda_mem))
if self.engine.binding_is_input(binding):
self.host_inputs.append(host_mem)
self.cuda_inputs.append(cuda_mem)
else:
self.host_outputs.append(host_mem)
self.cuda_outputs.append(cuda_mem)
del host_mem
del cuda_mem
logger.info('allocated buffers')
return
self.host_inputs = []
self.cuda_inputs = []
self.host_outputs = []
self.cuda_outputs = []
self.bindings = []
self._init_cuda_stuff()
def _init_cuda_stuff(self):
cuda.init()
self.device = cuda.Device(0) # enter your Gpu id here
self.cuda_context = self.device.make_context()
self.engine = self._load_engine()
self._allocate_buffers()
self.engine_context = self.engine.create_execution_context()
self.stream = cuda.Stream() # create a CUDA stream to run inference
def __del__(self):
""" Free CUDA memories. """
for mem in self.cuda_inputs:
mem.free()
for mem in self.cuda_outputs:
mem.free()
del self.stream
del self.cuda_outputs
del self.cuda_inputs
self.cuda_context.pop()
del self.cuda_context
del self.engine_context
del self.engine
del self.bindings
del self.host_inputs
del self.host_outputs
@staticmethod
def _preprocess_trt(img):
""" Preprocess an image before TRT SSD inferencing. """
img = img.transpose((2, 0, 1)).astype(np.float32)
img = (2.0 / 255.0) * img - 1.0
return img
img_h, img_w, _ = img.shape
boxes, confs, clss = [], [], []
for prefix in range(0, len(output), self.output_layout):
# index = int(output[prefix+0])
conf = float(output[prefix + 2])
if conf < float(self.conf_threshold):
continue
x1 = (output[prefix + 3]) # * img_w)
y1 = (output[prefix + 4]) # * img_h)
x2 = (output[prefix + 5]) # * img_w)
y2 = (output[prefix + 6]) # * img_h)
cls = int(output[prefix + 1])
boxes.append((y1, x1, y2, x2))
confs.append(conf)
clss.append(cls)
return boxes, confs, clss
t_begin = time.perf_counter()
cuda.memcpy_htod_async(
self.cuda_inputs[0], self.host_inputs[0], self.stream)
self.engine_context.execute_async(
batch_size=1,
bindings=self.bindings,
stream_handle=self.stream.handle)
cuda.memcpy_dtoh_async(
self.host_outputs[1], self.cuda_outputs[1], self.stream)
cuda.memcpy_dtoh_async(
self.host_outputs[0], self.cuda_outputs[0], self.stream)
self.stream.synchronize()
inference_time = time.perf_counter() - t_begin # Seconds
return result
classifier.py
class Classifier:
"""
Classifier class is a high level class for classifying images using x86 devices.
When an instance of the Classifier is created you can call inference method and feed your
input image in order to get the classifier results.
:param config: Is a ConfigEngine instance which provides necessary parameters.
"""
def __init__(self, config):
self.config = config
self.name = self.config.get_section_dict('Classifier')['Name']
if self.name == 'OFMClassifier':
from libs.classifiers.x86 import face_mask
self.net = face_mask.Classifier(self.config)
else:
raise ValueError('Not supported network named: ', self.name)
loggers.py
import time
LOG_FORMAT_VERSION = "1.0"
class Logger:
"""logger layer to build a logger and pass data to it for logging
this class build a layer based on config specification and call update
method of it based on logging frequency
:param config: a ConfigEngine object which store all of the config parameters. Access to any
parameter
is possible by calling get_section_dict method.
"""
"""build the logger and initialize the frame number and set attributes"""
self.config = config
# Logger name, at this time only csv_logger is supported. You can implement your own logger
# by following csv_logger implementation as an example.
self.name = self.config.get_section_dict("Logger")["Name"]
if self.name == "csv_logger":
from . import csv_processed_logger
self.logger = csv_processed_logger.Logger(self.config, camera_id)
# Specifies how often the logger should log information. For example with time_interval of 0.5
# the logger log the information every 0.5 seconds.
self.time_interval = float(self.config.get_section_dict("Logger")["TimeInterval"]) # Seconds
self.submited_time = 0
# self.frame_number = 0 # For Logger instance from loggers/csv_logger
if time.time() - self.submited_time > self.time_interval:
objects = self.format_objects(objects_list)
self.logger.update(objects, distances, version=LOG_FORMAT_VERSION)
self.submited_time = time.time()
# For Logger instance from loggers/csv_logger
# region
# self.logger.update(self.frame_number, objects_list, distances)
# self.frame_number += 1
# end region
objects.append(obj)
return objects
csv_logger.py
import csv
import os
from datetime import date
from tools.objects_post_process import extract_violating_objects
import numpy as np
object_dict.update({str(key) + "_" + str(i): item})
else:
# TODO: Inspect why some items are float and some are np.float32
if isinstance(value, (float, np.float32)):
value = round(float(value), 4)
object_dict.update({key: value})
return object_dict
class Logger:
"""A CSV logger class that store objects information and violated distances information into csv
files.
This logger creates two csv file every day in two different directory, one for logging detected
objects
and one for logging violated social distancing incidents. The file names are the same as recording
date.
:param config: A ConfigEngine object which store all of the config parameters. Access to any
parameter
is possible by calling get_section_dict method.
"""
def update(self, frame_number, objects_list, distances):
"""Write the object and violated distances information of a frame into log files.
Args: frame_number: current frame number objects_list: A list of dictionary where each
dictionary stores
information of an object (person) in a frame. distances: A 2-d numpy array that stores distance
between each
pair of objects.
"""
file_name = str(date.today())
objects_log_file_path = os.path.join(self.objects_log_directory, file_name + ".csv")
distances_log_file_path = os.path.join(self.distances_log_directory, file_name + ".csv")
self.log_objects(objects_list, frame_number, objects_log_file_path)
self.log_distances(distances, frame_number, distances_log_file_path)
@staticmethod
def log_objects(objects_list, frame_number, file_path):
"""Write objects information of a frame into the object log file.
Each row of the object log file consist of a detected object (person) information such as
object (person) ids, bounding box coordinates and frame number.
Args: objects_list: A list of dictionary where each dictionary stores information of an object
(person) in a
frame. frame_number: current frame number file_path: log file path
"""
if len(objects_list) != 0:
object_dict = list(map(lambda x: prepare_object(x, frame_number), objects_list))
if not os.path.exists(file_path):
with open(file_path, "w", newline="") as csvfile:
field_names = list(object_dict[0].keys())
writer = csv.DictWriter(csvfile, fieldnames=field_names)
writer.writeheader()
with open(file_path, "a", newline="") as csvfile:
field_names = list(object_dict[0].keys())
writer = csv.DictWriter(csvfile, fieldnames=field_names)
writer.writerows(object_dict)
csv_processed_logger.py
import csv
import os
from datetime import date, datetime
from tools.environment_score import mx_environment_scoring_consider_crowd
from tools.objects_post_process import extract_violating_objects
import itertools
import numpy as np
class Logger:
"""A CSV logger class that store objects information and violated distances information into csv
files.
This logger creates two csv file every day in two different directory, one for logging detected
objects
and violated social distancing incidents. The file names are the same as recording date.
:param config: A ConfigEngine object which store all of the config parameters. Access to any
parameter
is possible by calling get_section_dict method.
"""
os.makedirs(self.objects_log_directory, exist_ok=True)
def update(self, objects_list, distances, version):
"""Write the object and violated distances information of a frame into log files.
Args:
objects_list: List of dictionary where each dictionary stores information of an object
(person) in a frame.
distances: A 2-d numpy array that stores distance between each pair of objects.
"""
file_name = str(date.today())
objects_log_file_path = os.path.join(self.objects_log_directory, file_name + ".csv")
self.log_objects(version, objects_list, distances, objects_log_file_path)
# Get environment score
environment_score = mx_environment_scoring_consider_crowd(no_detected_objects,
no_violating_objects)
# Get timeline which is used for as Timestamp
now = datetime.now()
current_time = now.strftime("%Y-%m-%d %H:%M:%S")
file_exists = os.path.isfile(file_path)
with open(file_path, "a") as csvfile:
headers = ["Version", "Timestamp", "DetectedObjects", "ViolatingObjects",
"EnvironmentScore", "Detections", 'ViolationsIndexes']
writer = csv.DictWriter(csvfile, fieldnames=headers)
if not file_exists:
writer.writeheader()
writer.writerow(
from tools.objects_post_process import extract_violating_objects
from libs.utils import visualization_utils
from libs.utils.camera_calibration import get_camera_calibration_path
from libs.uploaders.s3_uploader import S3Uploader
import logging
logger = logging.getLogger(__name__)
class Distancing:
self.classifier = None
self.classifier_img_size = None
self.face_mask_classifier = None
self.running_video = False
self.tracker = CentroidTracker(
max_disappeared=int(self.config.get_section_dict("PostProcessor")["MaxTrackFrame"]))
self.camera_id = self.config.get_section_dict(source)['Id']
self.logger = Logger(self.config, self.camera_id)
self.image_size = [int(i) for i in self.config.get_section_dict('Detector')['ImageSize'].split(',')]
self.default_dist_method = self.config.get_section_dict('PostProcessor')["DefaultDistMethod"]
if self.config.get_section_dict(source)["DistMethod"]:
self.dist_method = self.config.get_section_dict(source)["DistMethod"]
else:
self.dist_method = self.default_dist_method
self.dist_threshold = self.config.get_section_dict("PostProcessor")["DistThreshold"]
self.resolution = tuple([int(i) for i in self.config.get_section_dict('App')['Resolution'].split(',')])
self.birds_eye_resolution = (200, 300)
if self.dist_method == "CalibratedDistance":
calibration_file = get_camera_calibration_path(
self.config, self.config.get_section_dict(source)["Id"])
try:
with open(calibration_file, "r") as file:
self.h_inv = file.readlines()[0].split(" ")[1:]
self.h_inv = np.array(self.h_inv, dtype="float").reshape((3, 3))
except FileNotFoundError:
logger.error("The specified 'CalibrationFile' does not exist")
logger.info(f"Falling back using {self.default_dist_method}")
self.dist_method = self.default_dist_method
self.screenshot_period = float(
    self.config.get_section_dict("App")["ScreenshotPeriod"]) * 60  # config.ini uses minutes as unit
self.bucket_screenshots = config.get_section_dict("App")["ScreenshotS3Bucket"]
self.uploader = S3Uploader(self.config)
self.screenshot_path = os.path.join(
    self.config.get_section_dict("App")["ScreenshotsDirectory"], self.camera_id)
if not os.path.exists(self.screenshot_path):
os.makedirs(self.screenshot_path)
# Resize input image to resolution
cv_image = cv.resize(cv_image, self.resolution)
obj['face_label'] = face_mask_results[idx]
idx = idx + 1
else:
obj['face_label'] = -1
box = obj["bbox"]
x0 = box[1]
y0 = box[0]
x1 = box[3]
y1 = box[2]
obj["centroid"] = [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]
obj["bbox"] = [x0, y0, x1, y1]
obj["centroidReal"] = [(x0 + x1) * w / 2, (y0 + y1) * h / 2, (x1 - x0) * w, (y1 - y0) * h]
obj["bboxReal"] = [x0 * w, y0 * h, x1 * w, y1 * h]
playlist.m3u8 in the target directory. This file contains the list of generated video segments and is
updated automatically.
This instance does not serve these video segments to the client. It is expected that the target video
directory is served by a static file server and that the client-side HLS video library downloads
"playlist.m3u8". The client video player then reads the links to the video segments, according to the
HLS protocol, and downloads them from the static file server.
:param feed_name: The name of the video feed. We may have multiple cameras, each with multiple video
feeds (e.g. one feed for visualizing bounding boxes and one for the bird's eye view). Each video feed
should be written into a separate directory; the name of the target directory is defined by this variable.
:param fps: The HLS video player on the client side needs to know how many frames should be shown to
the user per second. This parameter is independent of the frame rate at which the video is processed.
For example, if we set fps=60 but produce only 30 frames per second (by calling `.write()`), the client
will see a loading indicator for 5*60/30 seconds and then 5 seconds of video played at 60 fps.
:param resolution: A tuple of size 2 which indicates the resolution of the output video.
"""
encoder = self.config.get_section_dict('App')['Encoder']
video_root = f'/repo/data/processor/static/gstreamer/{feed_name}'
shutil.rmtree(video_root, ignore_errors=True)
os.makedirs(video_root, exist_ok=True)
playlist_root = f'/static/gstreamer/{feed_name}'
if not playlist_root.endswith('/'):
playlist_root = f'{playlist_root}/'
# the entire encoding pipeline, as a string:
pipeline = f'appsrc is-live=true ! {encoder} ! mpegtsmux ! hlssink max-files=15 ' \
f'target-duration=5 ' \
f'playlist-root={playlist_root} ' \
f'location={video_root}/video_%05d.ts ' \
f'playlist-location={video_root}/playlist.m3u8 '
out = cv.VideoWriter(
pipeline,
cv.CAP_GSTREAMER,
0, fps, resolution
)
if not out.isOpened():
raise RuntimeError("Could not open gstreamer output for " + feed_name)
return out
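A hedged usage sketch of the writer above: the encoder element, feed name, output directory and frame source are placeholders, and the pipeline only opens if OpenCV was built with GStreamer support.

import os
import numpy as np
import cv2 as cv

# Hypothetical parameters; 'x264enc' is just one possible encoder element.
feed_name, fps, resolution = "camera0", 25, (640, 480)
video_root = f"/tmp/{feed_name}"
os.makedirs(video_root, exist_ok=True)
pipeline = (
    "appsrc is-live=true ! videoconvert ! x264enc ! mpegtsmux ! "
    "hlssink max-files=15 target-duration=5 "
    f"location={video_root}/video_%05d.ts "
    f"playlist-location={video_root}/playlist.m3u8"
)
out = cv.VideoWriter(pipeline, cv.CAP_GSTREAMER, 0, fps, resolution)
if out.isOpened():
    frame = np.zeros((resolution[1], resolution[0], 3), dtype=np.uint8)
    out.write(frame)   # each call appends one frame to the HLS stream
    out.release()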
elif self.device == 'x86':
from libs.detectors.x86.detector import Detector
from libs.classifiers.x86.classifier import Classifier
self.detector = Detector(self.config)
if 'Classifier' in self.config.get_sections():
self.classifier = Classifier(self.config)
self.classifier_img_size = [int(i) for i in
self.config.get_section_dict('Classifier')['ImageSize'].split(',')]
if self.device != 'Dummy':
print('Device is: ', self.device)
print('Detector is: ', self.detector.name)
print('image size: ', self.image_size)
input_cap = cv.VideoCapture(video_uri)
fps = max(25, input_cap.get(cv.CAP_PROP_FPS))
if (input_cap.isOpened()):
logger.info(f'opened video {video_uri}')
else:
logger.error(f'failed to load video {video_uri}')
return
self.running_video = True
(self.camera_id, self.resolution),
(self.camera_id + '-birdseye', self.birds_eye_resolution)
)
)
dist_threshold = float(self.config.get_section_dict("PostProcessor")["DistThreshold"])
class_id = int(self.config.get_section_dict('Detector')['ClassID'])
frame_num = 0
start_time = time.time()
while input_cap.isOpened() and self.running_video:
_, cv_image = input_cap.read()
birds_eye_window = np.zeros(self.birds_eye_resolution[::-1] + (3,), dtype="uint8")
if np.shape(cv_image) != ():
cv_image, objects, distancings = self.__process(cv_image)
output_dict = visualization_utils.visualization_preparation(objects, distancings,
dist_threshold)
category_index = {class_id: {
"id": class_id,
"name": "Pedestrian",
}} # TODO: json file for detector config
# Draw bounding boxes and other visualization factors on input_frame
visualization_utils.visualize_boxes_and_labels_on_image_array(
cv_image,
output_dict["detection_boxes"],
output_dict["detection_classes"],
output_dict["detection_scores"],
output_dict["detection_colors"],
category_index,
instance_masks=output_dict.get("detection_masks"),
use_normalized_coordinates=True,
line_thickness=3,
)
# TODO: Implement perspective view for objects
birds_eye_window = visualization_utils.birds_eye_view(birds_eye_window,
output_dict["detection_boxes"],
output_dict["violating_objects"])
try:
fps = self.detector.fps
except:
# fps is not implemented for the detector instance
fps = None
# -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_-
# endregion
out.write(cv_image)
out_birdseye.write(birds_eye_window)
frame_num += 1
if frame_num % 100 == 1:
logger.info(f'processed frame {frame_num} for {video_uri}')
# Save a screenshot only if the period is greater than 0, a violation is detected, and the minimum period has occurred
if (self.screenshot_period > 0) and (time.time() > start_time + self.screenshot_period) and (
len(violating_objects) > 0):
start_time = time.time()
self.capture_violation(f"{start_time}_violation.jpg", cv_image)
self.save_screenshot(cv_image)
else:
continue
self.logger.update(objects, distancings)
input_cap.release()
out.release()
out_birdseye.release()
del self.detector
self.running_video = False
def stop_process_video(self):
self.running_video = False
"""
this function post-process the raw boxes of object detector and calculate a distance matrix
for detected bounding boxes.
post processing is consist of:
1. omitting large boxes by filtering boxes which are bigger than the 1/4 of the size the image.
2. omitting duplicated boxes by applying an auxilary non-maximum-suppression.
3. apply a simple object tracker to make the detection more robust.
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such as
"id", "centroid" (a tuple of the normalized centroid coordinates (cx,cy,w,h) of the box) and
"bbox" (a tuple
of the normalized (xmin,ymin,xmax,ymax) coordinate of the box)
returns:
object_list: the post processed version of the input
distances: a NxN ndarray which i,j element is distance between i-th and l-th bounding box
"""
new_objects_list = self.ignore_large_boxes(objects_list)
new_objects_list = self.non_max_suppression_fast(new_objects_list,
float(self.config.get_section_dict("PostProcessor")[
"NMSThreshold"]))
tracked_boxes = self.tracker.update(new_objects_list)
new_objects_list = [tracked_boxes[i] for i in tracked_boxes.keys()]
for i, item in enumerate(new_objects_list):
item["id"] = item["id"].split("-")[0] + "-" + str(i)
@staticmethod
def ignore_large_boxes(object_list):
"""
filtering boxes which are biger than the 1/4 of the size the image
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such as
"id", "centroid" (a tuple of the normalized centroid coordinates (cx,cy,w,h) of the box) and
"bbox" (a tuple
of the normalized (xmin,ymin,xmax,ymax) coordinate of the box)
returns:
object_list: input object list without large boxes
"""
large_boxes = []
for i in range(len(object_list)):
if (object_list[i]["centroid"][2] * object_list[i]["centroid"][3]) > 0.25:
large_boxes.append(i)
updated_object_list = [j for i, j in enumerate(object_list) if i not in large_boxes]
return updated_object_list
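As a quick illustration of the filter above (calling the static method on the Distancing class shown earlier, with made-up values): a box whose normalized width x height exceeds 0.25 is dropped.

objects = [
    {"id": "0-0", "centroid": (0.50, 0.50, 0.10, 0.30), "bbox": (0.45, 0.35, 0.55, 0.65)},  # area 0.03 -> kept
    {"id": "0-1", "centroid": (0.50, 0.50, 0.60, 0.70), "bbox": (0.20, 0.15, 0.80, 0.85)},  # area 0.42 -> dropped
]
print(Distancing.ignore_large_boxes(objects))  # only the first object remains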
@staticmethod
def non_max_suppression_fast(object_list, overlapThresh):
"""
omitting duplicated boxes by applying an auxilary non-maximum-suppression.
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such
"id", "centroid" (a tuple of the normalized centroid coordinates (cx,cy,w,h) of the box) and
"bbox" (a tuple
of the normalized (xmin,ymin,xmax,ymax) coordinate of the box)
overlapThresh: threshold of minimum IoU of to detect two box as duplicated.
returns:
62
object_list: input object list without duplicated boxes
"""
# if there are no boxes, return an empty list
boxes = np.array([item["centroid"] for item in object_list])
corners = np.array([item["bbox"] for item in object_list])
if len(boxes) == 0:
return []
if boxes.dtype.kind == "i":
boxes = boxes.astype("float")
# initialize the list of picked indexes
pick = []
cy = boxes[:, 1]
cx = boxes[:, 0]
h = boxes[:, 3]
w = boxes[:, 2]
x1 = corners[:, 0]
x2 = corners[:, 2]
y1 = corners[:, 1]
y2 = corners[:, 3]
area = (h + 1) * (w + 1)
idxs = np.argsort(cy + (h / 2))
while len(idxs) > 0:
last = len(idxs) - 1
i = idxs[last]
pick.append(i)
xx1 = np.maximum(x1[i], x1[idxs[:last]])
yy1 = np.maximum(y1[i], y1[idxs[:last]])
xx2 = np.minimum(x2[i], x2[idxs[:last]])
yy2 = np.minimum(y2[i], y2[idxs[:last]])
w = np.maximum(0, xx2 - xx1 + 1)
h = np.maximum(0, yy2 - yy1 + 1)
# compute the ratio of overlap
overlap = (w * h) / area[idxs[:last]]
# delete all indexes from the index list that have an overlap above the threshold
idxs = np.delete(idxs, np.concatenate(([last],
np.where(overlap > overlapThresh)[0])))
updated_object_list = [j for i, j in enumerate(object_list) if i in pick]
return updated_object_list
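A small sanity check of the suppression routine above. Because the `+ 1` terms are applied to normalized coordinates, the computed overlap ratio is inflated, so a high threshold (0.98 here, purely illustrative; in the pipeline the value comes from the NMSThreshold config entry) separates a near-duplicate pair from a distinct box:

objects = [
    {"id": "0-0", "centroid": (0.30, 0.50, 0.10, 0.40), "bbox": (0.25, 0.30, 0.35, 0.70)},
    {"id": "0-1", "centroid": (0.31, 0.50, 0.10, 0.40), "bbox": (0.26, 0.30, 0.36, 0.70)},  # near-duplicate of 0-0
    {"id": "0-2", "centroid": (0.70, 0.50, 0.10, 0.40), "bbox": (0.65, 0.30, 0.75, 0.70)},
]
kept = Distancing.non_max_suppression_fast(objects, overlapThresh=0.98)
print(len(kept))  # 2 - the near-duplicate pair collapses to a single box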
"""
This function calculates a distance l for two input corresponding points of two detected
bounding boxes.
it is assumed that each person is H = 170 cm tall in real scene to map the distances in the
image (in pixels) to
physical distance measures (in meters).
params:
first_point: (x, y, h)-tuple, where x,y is the location of a point (center or each of 4 corners of a
bounding box)
and h is the height of the bounding box.
second_point: same tuple as first_point for the corresponding point of other box
returns:
l: Estimated physical distance (in centimeters) between first_point and second_point.
"""
dx = xc2 - xc1
dy = yc2 - yc1
lx = dx * 170 * (1 / h1 + 1 / h2) / 2
ly = dy * 170 * (1 / h1 + 1 / h2) / 2
l = math.sqrt(lx ** 2 + ly ** 2)
return l
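To make the 170 cm assumption concrete, here is a short standalone numeric check of the formula above, with made-up pixel values:

import math

# Two corresponding points (x, y, box_height), all in pixels; values are illustrative.
xc1, yc1, h1 = 400.0, 300.0, 200.0   # person 1: bounding box 200 px tall
xc2, yc2, h2 = 550.0, 310.0, 180.0   # person 2: bounding box 180 px tall

dx, dy = xc2 - xc1, yc2 - yc1
lx = dx * 170 * (1 / h1 + 1 / h2) / 2   # pixel offset scaled to cm via the 170 cm height assumption
ly = dy * 170 * (1 / h1 + 1 / h2) / 2
l = math.sqrt(lx ** 2 + ly ** 2)
print(round(l))  # ~135 cm, i.e. roughly 1.35 m apart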
"""
This function calculates a distance matrix for detected bounding boxes.
Three methods are implemented to calculate the distances, the first one estimates distance with
a calibration matrix
which transform the points to the 3-d world coordinate, the second one estimates distance of
center points of the
boxes and the third one uses minimum distance of each of 4 points of bounding boxes.
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such as
"id", "centroidReal" (a tuple of the centroid coordinates (cx,cy,w,h) of the box) and "bboxReal"
(a tuple
of the (xmin,ymin,xmax,ymax) coordinate of the box)
returns:
distances: a NxN ndarray which i,j element is estimated distance between i-th and j-th
bounding box in real scene (cm)
"""
if self.dist_method == "CalibratedDistance":
world_coordinate_points = np.array([self.transform_to_world_coordinate(bbox) for bbox in
nn_out])
if len(world_coordinate_points) == 0:
distances_asarray = np.array([])
else:
distances_asarray = cdist(world_coordinate_points, world_coordinate_points)
else:
distances = []
for i in range(len(nn_out)):
distance_row = []
for j in range(len(nn_out)):
if i == j:
l=0
else:
if (self.dist_method == 'FourCornerPointsDistance'):
lower_left_of_first_box = [nn_out[i]["bboxReal"][0], nn_out[i]["bboxReal"][1],
nn_out[i]["centroidReal"][3]]
lower_right_of_first_box = [nn_out[i]["bboxReal"][2], nn_out[i]["bboxReal"][1],
nn_out[i]["centroidReal"][3]]
upper_left_of_first_box = [nn_out[i]["bboxReal"][0], nn_out[i]["bboxReal"][3],
nn_out[i]["centroidReal"][3]]
upper_right_of_first_box = [nn_out[i]["bboxReal"][2], nn_out[i]["bboxReal"][3],
nn_out[i]["centroidReal"][3]]
lower_left_of_second_box = [nn_out[j]["bboxReal"][0],
nn_out[j]["bboxReal"][1],
nn_out[j]["centroidReal"][3]]
lower_right_of_second_box = [nn_out[j]["bboxReal"][2],
nn_out[j]["bboxReal"][1],
nn_out[j]["centroidReal"][3]]
upper_left_of_second_box = [nn_out[j]["bboxReal"][0],
nn_out[j]["bboxReal"][3],
nn_out[j]["centroidReal"][3]]
upper_right_of_second_box = [nn_out[j]["bboxReal"][2], nn_out[j]["bboxReal"][3],
nn_out[j]["centroidReal"][3]]
l1 = self.calculate_distance_of_two_points_of_boxes(lower_left_of_first_box,
lower_left_of_second_box)
l2 = self.calculate_distance_of_two_points_of_boxes(lower_right_of_first_box,
lower_right_of_second_box)
l3 = self.calculate_distance_of_two_points_of_boxes(upper_left_of_first_box,
upper_left_of_second_box)
l4 = self.calculate_distance_of_two_points_of_boxes(upper_right_of_first_box,
upper_right_of_second_box)
# use the minimum of the four corner-to-corner distances
l = min(l1, l2, l3, l4)
elif (self.dist_method == 'CenterPointsDistance'):
center_of_first_box = [nn_out[i]["centroidReal"][0], nn_out[i]["centroidReal"][1],
nn_out[i]["centroidReal"][3]]
center_of_second_box = [nn_out[j]["centroidReal"][0], nn_out[j]["centroidReal"][1],
nn_out[j]["centroidReal"][3]]
l = self.calculate_distance_of_two_points_of_boxes(center_of_first_box,
center_of_second_box)
distance_row.append(l)
distances.append(distance_row)
distances_asarray = np.asarray(distances, dtype=np.float32)
return distances_asarray
Args:
bbox: a dictionary of the coordinates of a detected instance, with "id",
"centroidReal" (a tuple of the centroid coordinates (cx, cy, w, h) of the box) and "bboxReal"
(a tuple of the (xmin, ymin, xmax, ymax) coordinates of the box) keys
Returns:
A numpy array of the (X, Y) coordinates of the transformed point
"""
floor_point = np.array([int((bbox["bboxReal"][0] + bbox["bboxReal"][2]) / 2),
bbox["bboxReal"][3], 1])
floor_world_point = np.matmul(self.h_inv, floor_point)
floor_world_point = floor_world_point[:-1] / floor_world_point[-1]
return floor_world_point
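A minimal sketch of the floor-point mapping above, using an arbitrary (made-up) inverse homography matrix:

import numpy as np

# Hypothetical 3x3 inverse homography (image plane -> bird's-eye floor plane).
h_inv = np.array([[0.02, 0.0, -5.0],
                  [0.0, 0.05, -9.0],
                  [0.0, 0.0001, 1.0]])

bbox_real = [300.0, 120.0, 380.0, 460.0]   # (xmin, ymin, xmax, ymax) in pixels
floor_point = np.array([(bbox_real[0] + bbox_real[2]) / 2, bbox_real[3], 1])  # bottom-center, homogeneous
world = np.matmul(h_inv, floor_point)
world = world[:-1] / world[-1]             # de-homogenize to (X, Y)
print(world)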
@staticmethod
def anonymize_face(image):
"""
Blur an image to anonymize the person's faces.
"""
(h, w) = image.shape[:2]
kernel_w = int(w / 3)
kernel_h = int(h / 3)
if kernel_w % 2 == 0:
kernel_w -= 1
if kernel_h % 2 == 0:
kernel_h -= 1
return cv.GaussianBlur(image, (kernel_w, kernel_h), 0)
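A short usage sketch: inside the processing loop, a detected face crop would be blurred in place before the frame is written out (the frame and crop coordinates below are made up):

import numpy as np
import cv2 as cv

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # stand-in for cv_image
xmin, ymin, xmax, ymax = 200, 80, 260, 160                        # hypothetical face box in pixels
frame[ymin:ymax, xmin:xmax] = Distancing.anonymize_face(frame[ymin:ymax, xmin:xmax])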
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
_TITLE_LEFT_MARGIN = 10
_TITLE_TOP_MARGIN = 10
STANDARD_COLORS = [
"Green",
"Blue"
]
def draw_bounding_box_on_image_array(
image,
ymin,
xmin,
ymax,
xmax,
color=(255, 0, 0), # RGB
thickness=4,
display_str_list=(),
use_normalized_coordinates=True,
):
"""Adds a bounding box to an image (numpy array).
Args:
image: a numpy array with shape [height, width, 3].
ymin: ymin of bounding box.
xmin: xmin of bounding box.
ymax: ymax of bounding box.
xmax: xmax of bounding box.
color: color to draw bounding box. Default is red.
thickness: line thickness. Default value is 4.
display_str_list: list of strings to display in box
(each to be shown on its own line).
use_normalized_coordinates: If True (default), treat coordinates
ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
coordinates as absolute.
"""
image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
draw_bounding_box_on_image(
image_pil,
ymin,
xmin,
ymax,
xmax,
color,
thickness,
display_str_list,
use_normalized_coordinates,
)
np.copyto(image, np.array(image_pil))
def draw_bounding_box_on_image(
image,
ymin,
xmin,
ymax,
xmax,
color=(255, 0, 0), # RGB
thickness=4,
display_str_list=(),
use_normalized_coordinates=True,
):
"""Adds a bounding box to an image.
Args:
image: a PIL.Image object.
ymin: ymin of bounding box.
xmin: xmin of bounding box.
ymax: ymax of bounding box.
xmax: xmax of bounding box.
color: color to draw bounding box. Default is red.
thickness: line thickness. Default value is 4.
display_str_list: list of strings to display in box
(each to be shown on its own line).
use_normalized_coordinates: If True (default), treat coordinates
ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
coordinates as absolute.
"""
draw = ImageDraw.Draw(image)
im_width, im_height = image.size
if use_normalized_coordinates:
(left, right, top, bottom) = (
xmin * im_width,
xmax * im_width,
ymin * im_height,
ymax * im_height,
)
else:
(left, right, top, bottom) = (xmin, xmax, ymin, ymax)
draw.line(
[(left, top), (left, bottom), (right, bottom), (right, top), (left, top)],
width=thickness,
fill=color,
)
try:
font = ImageFont.truetype("arial.ttf", 24)
except IOError:
font = ImageFont.load_default()
# If the total height of the display strings added to the top of the bounding
# box exceeds the top of the image, stack the strings below the bounding box
# instead of above.
display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
# Each display_str has a top and bottom margin of 0.05x.
total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)
font=font,
)
text_bottom -= text_height - 2 * margin
def draw_keypoints_on_image_array(
image, keypoints, color="red", radius=2, use_normalized_coordinates=True
):
"""Draws keypoints on an image (numpy array).
Args:
image: a numpy array with shape [height, width, 3].
keypoints: a numpy array with shape [num_keypoints, 2].
color: color to draw the keypoints with. Default is red.
radius: keypoint radius. Default value is 2.
use_normalized_coordinates: if True (default), treat keypoint values as
relative to the image. Otherwise treat them as absolute.
"""
image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
draw_keypoints_on_image(
image_pil, keypoints, color, radius, use_normalized_coordinates
)
np.copyto(image, np.array(image_pil))
def draw_keypoints_on_image(
image, keypoints, color="red", radius=2, use_normalized_coordinates=True
):
"""Draws keypoints on an image.
Args:
image: a PIL.Image object.
keypoints: a numpy array with shape [num_keypoints, 2].
color: color to draw the keypoints with. Default is red.
radius: keypoint radius. Default value is 2.
use_normalized_coordinates: if True (default), treat keypoint values as
relative to the image. Otherwise treat them as absolute.
"""
draw = ImageDraw.Draw(image)
im_width, im_height = image.size
keypoints_x = [k[1] for k in keypoints]
keypoints_y = [k[0] for k in keypoints]
if use_normalized_coordinates:
keypoints_x = tuple([im_width * x for x in keypoints_x])
keypoints_y = tuple([im_height * y for y in keypoints_y])
for keypoint_x, keypoint_y in zip(keypoints_x, keypoints_y):
draw.ellipse(
[
(keypoint_x - radius, keypoint_y - radius),
(keypoint_x + radius, keypoint_y + radius),
],
outline=color,
fill=color,
)
Args:
image: uint8 numpy array with shape (img_height, img_width, 3)
mask: a uint8 numpy array of shape (img_height, img_width) with
values of either 0 or 1.
color: color to draw the keypoints with. Default is red.
alpha: transparency value between 0 and 1. (default: 0.4)
Raises:
ValueError: On incorrect data type for image or masks.
"""
if image.dtype != np.uint8:
raise ValueError("`image` not of type np.uint8")
if mask.dtype != np.uint8:
raise ValueError("`mask` not of type np.uint8")
if np.any(np.logical_and(mask != 1, mask != 0)):
raise ValueError("`mask` elements should be in [0, 1]")
if image.shape[:2] != mask.shape:
raise ValueError(
"The image has spatial dimensions %s but the mask has "
"dimensions %s" % (image.shape[:2], mask.shape)
)
rgb = ImageColor.getrgb(color)
pil_image = Image.fromarray(image)
def visualize_boxes_and_labels_on_image_array(
image,
boxes,
classes,
scores,
colors,
category_index,
instance_masks=None,
instance_boundaries=None,
keypoints=None,
use_normalized_coordinates=True,
max_boxes_to_draw=20,
min_score_thresh=0.0,
agnostic_mode=False,
line_thickness=4,
groundtruth_box_visualization_color="black",
skip_scores=False,
skip_labels=False,
):
"""Overlay labeled boxes on an image with formatted scores and label names.
Args:
image: uint8 numpy array with shape (img_height, img_width, 3)
boxes: a numpy array of shape [N, 4]
classes: a numpy array of shape [N]. Note that class indices are 1-based,
and match the keys in the label map.
scores: a numpy array of shape [N] or None. If scores=None, then
this function assumes that the boxes to be plotted are groundtruth
boxes and plot all boxes as black with no classes or scores.
colors: BGR format colors for drawing the boxes
category_index: a dict containing category dictionaries (each holding
category index `id` and category name `name`) keyed by category indices.
instance_masks: a numpy array of shape [N, image_height, image_width] with
values ranging between 0 and 1, can be None.
instance_boundaries: a numpy array of shape [N, image_height, image_width]
with values ranging between 0 and 1, can be None.
keypoints: a numpy array of shape [N, num_keypoints, 2], can
be None
use_normalized_coordinates: whether boxes is to be interpreted as
normalized coordinates or not.
max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
all boxes.
min_score_thresh: minimum score threshold for a box to be visualized
agnostic_mode: boolean (default: False) controlling whether to evaluate in
class-agnostic mode or not. This mode will display scores but ignore
classes.
line_thickness: integer (default: 4) controlling line width of the boxes.
groundtruth_box_visualization_color: box color for visualizing groundtruth
boxes
skip_scores: whether to skip score when drawing a single detection
skip_labels: whether to skip label when drawing a single detection
Returns:
uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.
"""
# Create a display string (and color) for every box location, group any boxes
# that correspond to the same location.
box_to_display_str_map = collections.defaultdict(list)
box_to_color_map = collections.defaultdict(str)
box_to_instance_masks_map = {}
box_to_instance_boundaries_map = {}
box_to_keypoints_map = collections.defaultdict(list)
if not max_boxes_to_draw:
max_boxes_to_draw = boxes.shape[0]
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
if scores is None or scores[i] > min_score_thresh:
box = tuple(boxes[i].tolist())
if instance_masks is not None:
box_to_instance_masks_map[box] = instance_masks[i]
if instance_boundaries is not None:
box_to_instance_boundaries_map[box] = instance_boundaries[i]
if keypoints is not None:
box_to_keypoints_map[box].extend(keypoints[i])
if scores is None:
box_to_color_map[box] = groundtruth_box_visualization_color
else:
display_str = ""
if not skip_labels:
if not agnostic_mode:
if classes[i] in category_index.keys():
class_name = category_index[classes[i]]["name"]
else:
class_name = "N/A"
display_str = str(class_name)
if not skip_scores:
if not display_str:
display_str = "{}%".format(int(100 * scores[i]))
else:
display_str = "{}: {}%".format(
display_str, int(100 * scores[i])
)
box_to_display_str_map[box].append(display_str)
if agnostic_mode:
box_to_color_map[box] = "DarkOrange"
else:
box_to_color_map[box] = STANDARD_COLORS[
classes[i] % len(STANDARD_COLORS)
]
draw_keypoints_on_image_array(
image,
box_to_keypoints_map[box],
color=color,
radius=line_thickness / 2,
use_normalized_coordinates=use_normalized_coordinates,
)
return image
for i, obj in enumerate(nn_out):
# Colorizing bounding box based on the distances between them
# R = 255 when dist=0 and R = 0 when dist > dist_threshold
redness_factor = 1.5
r_channel = np.maximum(255 * (dist_threshold - distance[i]) / dist_threshold, 0) * redness_factor
g_channel = 255 - r_channel
b_channel = 0
# Create a tuple object of colors
color = (int(b_channel), int(g_channel), int(r_channel))
# Get the object id
obj_id = obj["id"]
# Split and get the first item of obj_id
obj_id = obj_id.split("-")[0]
box = obj["bbox"]
if "score" in obj:
score = obj["score"]
else:
score = 1.0
# Append all processed items
detection_classes.append(int(obj_id))
detection_scores.append(score)
detection_boxes.append(box)
colors.append(color)
is_violating.append(True) if distance[i] < dist_threshold else is_violating.append(False)
output_dict["detection_boxes"] = np.array(detection_boxes)
output_dict["detection_scores"] = detection_scores
output_dict["detection_classes"] = detection_classes
output_dict["violating_objects"] = is_violating
output_dict["detection_colors"] = colors
return output_dict
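For illustration, the red/green interpolation above maps a fully violating object (distance 0) to pure red and a safe object to green. A standalone check with an assumed 150 cm threshold (clamping added here for clarity):

import numpy as np

dist_threshold, redness_factor = 150.0, 1.5         # threshold in cm; factor as in the code above
for distance in (0.0, 75.0, 200.0):
    r = np.maximum(255 * (dist_threshold - distance) / dist_threshold, 0) * redness_factor
    r = min(int(r), 255)                             # clamp, since the factor can push r past 255
    g = 255 - r
    print(distance, (0, g, r))                       # BGR tuple: red when close, green when far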
def birds_eye_view(input_frame, boxes, is_violating):
"""
This function receives a black window and draws circles (based on boxes) on the frame.
Args:
input_frame: uint8 numpy array with shape (img_height, img_width, 3)
boxes: A numpy array of shape [N, 4]
is_violating: List of booleans (True/False) indicating whether the corresponding box in boxes is
a violating object or not
Returns:
input_frame: Frame with red and green circles
"""
h, w = input_frame.shape[0:2]
for i, box in enumerate(boxes):
center_x = int((box[0] * w + box[2] * w) / 2)
center_y = int((box[1] * h + box[3] * h) / 2)
center_coordinates = (center_x, center_y)
color = (0, 0, 255) if is_violating[i] else (0, 255, 0)
input_frame = cv.circle(input_frame, center_coordinates, 2, color, 2)
return input_frame
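A quick usage sketch of the bird's-eye rendering above, assuming birds_eye_view is in scope (the boxes and violation flags are made up):

import numpy as np

birds_eye = np.zeros((300, 200, 3), dtype=np.uint8)      # black (height, width, 3) window
boxes = np.array([[0.10, 0.20, 0.18, 0.45],               # two illustrative normalized boxes
                  [0.60, 0.55, 0.70, 0.90]])
violating = [True, False]
birds_eye = birds_eye_view(birds_eye, boxes, violating)   # red circle for the violator, green for the other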
origin: Top-left corner of the text string in the image. The coordinates should be normalized
between 0 and 1.
fontscale: Font scale factor that is multiplied by the font-specific base size.
color: Text color (BGR format).
thickness: Thickness of the lines used to draw the text.
"""
resolution = input_frame.shape
origin = int(resolution[1] * origin[0]), int(resolution[0] * origin[1])
font = cv.FONT_HERSHEY_SIMPLEX
cv.putText(input_frame, txt, origin, font, fontscale,
color, thickness, cv.LINE_AA)
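A standalone sketch of the text-overlay helper documented above, converting the normalized origin to pixel coordinates in the same way (the text, origin and color are illustrative):

import numpy as np
import cv2 as cv

frame = np.zeros((480, 640, 3), dtype=np.uint8)
txt, origin, fontscale, color, thickness = "Violations: 3", (0.02, 0.05), 0.6, (0, 0, 255), 2
resolution = frame.shape
origin_px = int(resolution[1] * origin[0]), int(resolution[0] * origin[1])  # normalized -> pixels
cv.putText(frame, txt, origin_px, cv.FONT_HERSHEY_SIMPLEX, fontscale, color, thickness, cv.LINE_AA)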
SCREENSHOT
REFERENCES
1. R. E. Park, (1924) "The concept of social distance: As applied to the study of racial relations,"
Journal of Applied Sociology, vol. 8, pp. 339-344.
2. N. Karakayali, (2009) "Social distance and affective orientations," Sociological Forum, vol. 24,
no. 3, pp. 538-562.
3. A. Rupani, P. Whig, G. Sujediya and P. Vyas, (2017) "A robust technique for image processing based
on interfacing of Raspberry-Pi and FPGA using IoT," International Conference on Computer,
Communications and Electronics (Comptelix), IEEE.
4. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, (2009) "ImageNet: A large-scale
hierarchical image database," in CVPR.
5. C. Lampert, H. Nickisch, and S. Harmeling, (2009) "Learning to detect unseen object classes by
between-class attribute transfer," in CVPR.
6. O. Russakovsky, J. Deng, Z. Huang, A. Berg, and L. Fei-Fei, (2013) "Detecting avocados to
zucchinis: what have we done, and where are we going?" in ICCV.
7. C. Fellbaum, (1998) WordNet: An Electronic Lexical Database. Blackwell Books.
8. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, (2018) "MobileNetV2: Inverted
residuals and linear bottlenecks," Google Inc., in CVPR.