VIRTUAL DRAG AND DROP CONTROL USING HAND TRACKING MODULE IN MACHINE LEARNING
at
SATHYABAMA INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University)
Submitted in partial fulfilment of the requirements for the
award of Bachelor of Technology in
Information Technology
By
SIVAKUMAR S
(REG NO 38120098)
DEPARTMENT OF INFORMATION TECHNOLOGY
SCHOOL OF COMPUTING
NOVEMBER 2021
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
(Established under Section 3 of UGC Act, 1956)
JEPPIAAR NAGAR, RAJIV GANDHI SALAI
CHENNAI – 600119
www.sathyabama.ac.in
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Sivakumar S (38120098), who carried out the project entitled “Virtual Drag and Drop control using Hand Tracking Module in Machine Learning” under my supervision from June 2021 to November 2021.
Internal Guide
Dr. Senduru Srinivasulu, M.E., Ph.D.,
Internal Examiner External Examiner
DECLARATION
DATE:
PLACE:
ACKNOWLEDGEMENT
I would like to express my sincere and deep sense of gratitude to my Project Guide Dr. Senduru Srinivasulu, M.E., Ph.D., whose valuable guidance, suggestions and constant encouragement paved the way for the successful completion of my project work.
I wish to express my thanks to all Teaching and Non-teaching staff members of the
Department of Information Technology who were helpful in many ways for the
completion of the project.
Training Certificate
ABSTRACT
TABLE OF CONTENTS
1. INTRODUCTION
2. IMPLEMENTATION
3. SOFTWARE AND HARDWARE REQUIREMENTS
4. RESULTS AND DISCUSSIONS
   4.1 Test and Results
5. CONCLUSION
   5.1 Conclusion
APPENDIX
   Screenshots
   Source code
REFERENCES
LIST OF ABBREVIATIONS
ACRONYM   EXPANSION
ML        Machine Learning
MP        MediaPipe
CV2       OpenCV
NP        NumPy
CHAPTER - 1
INTRODUCTION
Hand gesture recognition is very significant for human-computer interaction. In this work, we present a novel real-time method for hand gesture recognition for volume control on a personal computer. It is useful in many ways for various kinds of people: it can be used in a conference or a meeting to make it easier to control the audio, and it is very helpful for elderly people who want to control the audio without physical interaction.
1.4.1 Advantages
The aim of the project is to create an easier way to communicate with the system for controlling the volume of the media being played, and also to create a better HCI. We created this project to make it easier to control the audio of the system and to utilize hand gesture recognition in an efficient way. A further aim is to save people's time and energy and to avoid physical contact with hardware devices.
1.6.1 GOALS
CHAPTER – 2
IMPLEMENTATION
Fig.1
Fig.2
2.1.2 Inner Circle of the Maximal Radius.
Once the palm point is found, a circle can be drawn inside the palm with the palm point as its center. This circle is called the inner circle because it is contained inside the palm. The radius of the circle is gradually increased until it reaches the edge of the palm; that is, the radius stops increasing as soon as black pixels are included in the circle. This circle is the inner circle of the maximal radius, drawn in red in Figure 2.
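As a minimal sketch of this step (assuming a binary hand mask with white hand pixels on a black background and an already-located palm point; the function and argument names are illustrative), the radius can be grown until the circle first covers a background pixel:

import numpy as np

def maximal_inner_circle_radius(mask, palm_point):
    # Grow a circle centred on the palm point until it first covers a
    # black (background) pixel; the previous radius is the maximal one.
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Distance of every pixel from the palm point (x, y)
    dist = np.sqrt((xs - palm_point[0]) ** 2 + (ys - palm_point[1]) ** 2)
    radius = 1
    while radius < min(h, w):
        inside = dist <= radius
        if np.any(mask[inside] == 0):  # circle now touches the palm edge
            return radius - 1
        radius += 1
    return radius

A distance transform of the mask would yield the same radius in a single call, but the loop above mirrors the description in the text.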
Fig.3
Fig.4
The centers of the fingers are joined by lines to the palm point. Then, the angles between these lines and the wrist line are computed. If any angle is smaller than a given threshold, the thumb appears in the hand image, and the corresponding center is the center point of the thumb. The detected thumb is marked with the number 1. If all the angles are larger than the threshold, the thumb does not exist in the image.
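A hedged sketch of this angle test follows; the source does not state the threshold value, so thresh_deg below is only an assumed placeholder, and the wrist line is represented by a direction vector:

import math

def angle_to_wrist_line(center, palm_point, wrist_vec):
    # Angle between the palm-point-to-finger line and the wrist line
    v = (center[0] - palm_point[0], center[1] - palm_point[1])
    dot = v[0] * wrist_vec[0] + v[1] * wrist_vec[1]
    norm = math.hypot(v[0], v[1]) * math.hypot(wrist_vec[0], wrist_vec[1])
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def detect_thumb(finger_centers, palm_point, wrist_vec, thresh_deg=45.0):
    # The threshold is not given in the text; 45 degrees is assumed here
    for i, center in enumerate(finger_centers):
        if angle_to_wrist_line(center, palm_point, wrist_vec) < thresh_deg:
            return i  # this center is treated as the thumb (marked "1")
    return None       # all angles larger: no thumb in the image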
2.2.2 Detection and Recognition of Other Fingers.
In order to detect and recognize the other fingers, the palm line is first searched. The palm line is parallel to the wrist line and is searched as follows: starting from the row of the wrist line, for each row a line parallel to the wrist line crosses the hand. If there is only one connected set of white pixels in the intersection of the line and the hand, the line shifts upward. Once there is more than one connected set of white pixels in the intersection, the line is regarded as a candidate palm line. In the case where the thumb is not detected, the first line crossing the hand with more than one connected set of white pixels in the intersection is chosen as the palm line. In the case where the thumb exists, the line continues to move upward, with the edge points of the palm instead of the thumb used as the starting point of the line. Now, since the thumb is taken away, there is only one connected set of pixels in the intersection of the line and the hand. Once the number of connected sets of white pixels turns to 2 again, the palm line is found. The search for the palm line is shown in Figure 5.
Fig.5
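For the simpler case where no thumb is detected, the row-by-row search can be sketched as follows (assuming a binary mask and a known wrist-line row; names are illustrative):

import numpy as np

def count_white_runs(row):
    # Number of connected sets of white pixels in one scan line
    white = (row > 0).astype(np.int8)
    return int(white[0]) + int(np.sum((white[1:] == 1) & (white[:-1] == 0)))

def find_palm_line(mask, wrist_row):
    # Shift the line upward from the wrist row until it crosses the hand
    # in more than one connected set of white pixels
    for r in range(wrist_row, -1, -1):
        if count_white_runs(mask[r]) > 1:
            return r
    return None  # no palm line found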
After the palm line is obtained, it is divided into 4 parts. According to the horizontal coordinate of its center point, a finger falls into one of these parts. If the finger falls into the first part, it is the forefinger. If it belongs to the second part, it is the middle finger. The third part corresponds to the ring finger, and the fourth part to the little finger. The result of finger recognition for Figure 1 is demonstrated in Figure 6, where the yellow line is the palm line and the red line is parallel to the wrist line.
Fig.6
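A small sketch of this lookup, assuming the palm line spans from palm_left to palm_right in image coordinates and that the hand is oriented so the first quarter is the forefinger side (both endpoint names are illustrative):

def classify_finger(cx, palm_left, palm_right):
    # Divide the palm line into 4 equal parts and map the finger's
    # horizontal center coordinate onto one of them
    names = ["forefinger", "middle finger", "ring finger", "little finger"]
    quarter = (palm_right - palm_left) / 4.0
    idx = int((cx - palm_left) / quarter)
    return names[min(3, max(0, idx))]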
The sense of sight is arguably the most important of man's five senses. It provides a huge amount of information about the world that is rich in detail and delivered at the speed of light. However, human vision is not without its limitations, both physical and psychological. Through digital imaging technology and computers, man has transcended many visual limitations. He can see into far galaxies, the microscopic world, the sub-atomic world, and even “observe” infra-red, x-ray, ultraviolet and other spectra for medical diagnosis, meteorology, surveillance, and military uses, all with great success. While computers have been central to this success, for the most part man is the sole interpreter of all the digital data. For a long time, the central question has been whether computers can be designed to analyse and acquire information from images autonomously in the same natural way humans can. According to Gonzales and Woods [2], this is the province of computer vision, which is that branch of artificial intelligence that ultimately aims to “use computers to emulate human vision, including learning and being able to make inferences and tak[ing] actions based on visual inputs.” The main difficulty for computer vision as a relatively young discipline is the current lack of a final scientific paradigm or model for human intelligence and human vision itself on which to build an infrastructure for computer or machine learning [3].

The use of images has an obvious drawback. Humans perceive the world in 3D, but current visual sensors like cameras capture the world in 2D images. The result is the natural loss of a good deal of information in the captured images. Without a proper paradigm to explain the mystery of human vision and perception, the recovery of lost information (reconstruction of the world) from 2D images represents a difficult hurdle for machine vision [4]. However, despite this limitation, computer vision has progressed, riding mainly on the remarkable advancement of decades-old digital image processing techniques, using the science and methods contributed by other disciplines such as optics, neurobiology, psychology, physics, mathematics, electronics, computer science, artificial intelligence and others.

Computer vision techniques and digital image processing methods both draw the proverbial water from the same pool, which is the digital image, and therefore necessarily overlap. Image processing takes a digital image and subjects it to processes, such as noise reduction, detail enhancement, or filtering, for the purpose of producing another desired image as the end result. For example, the blurred image of a car registration plate might be enhanced by imaging techniques to produce a clear photo of the same so the police might identify the owner of the car. On the other hand, computer vision takes a digital image and subjects it to the same digital imaging techniques but for the purpose of analysing and understanding what the image depicts. For example, the image of a building can be fed to a computer and thereafter be identified by the computer as a residential house, a stadium, a high-rise office tower, a shopping mall, or a farm barn [5].

Russell and Norvig [6] identified three broad approaches used in computer vision to distil useful information from the raw data provided by images. The first is the feature extraction approach, which focuses on simple computations applied directly to digital images to measure some useable characteristic, such as size. This relies on generally known image processing algorithms for noise reduction, filtering, object detection, edge detection, texture analysis, computation of optical flow, and segmentation, techniques which are commonly used to pre-process images for subsequent image analysis. This is also considered an “uninformed” approach. The second is the recognition approach, where the focus is on distinguishing and labelling objects based on knowledge of characteristics that sets of similar objects have in common, such as shape, appearance or patterns of elements, sufficient to form classes. Here computer vision uses the techniques of artificial intelligence in knowledge representation to enable a “classifier” to match classes to objects based on the pattern of their features or structural descriptions. A classifier has to “learn” the patterns by being fed a training set of objects and their classes, achieving the goal of minimizing mistakes and maximizing successes through a step-by-step process of improvement. There are many techniques in artificial intelligence that can be used for object or pattern recognition, including statistical pattern recognition, neural nets, genetic algorithms and fuzzy systems. The third is the reconstruction approach, where the focus is on building a geometric model of the world suggested by the image or images, which is used as a basis for action. This corresponds to the stage of image understanding, which represents the highest and most complex level of computer vision processing. Here the emphasis is on enabling the computer vision system to construct internal models based on the data supplied by the images and to discard or update these internal models as they are verified against the real world or some other criteria. If the internal model is consistent with the real world, then image understanding takes place. Thus, image understanding requires the construction, manipulation and control of models and at the moment relies heavily upon the science and technology of artificial intelligence.
2.4 OpenCV
• widthStep – an integer showing the number of bytes per image row
• Integral image function for summing subregions (computing Haar wavelets)
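As a brief illustration (a toy array, not project data), cv2.integral builds the summed-area table, after which any rectangular subregion can be summed with four lookups:

import cv2
import numpy as np

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
ii = cv2.integral(img)  # (h+1) x (w+1) summed-area table

# Sum of the subregion covering rows y0..y1-1 and cols x0..x1-1
x0, y0, x1, y1 = 1, 1, 3, 3
region_sum = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
print(region_sum, img[y0:y1, x0:x1].sum())  # both print 30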
OpenCV also has an ML (machine learning) module containing well-known statistical classifiers and clustering tools. Each classifier is characterized by a particular discrimination function used for separating the subsets from each other.
The ability of a classifier to classify objects based on its decision rule may be understood as classifier learning, and the set of feature-vector (object) inputs and corresponding classification outputs (both positive and negative results) is called the training set. It is expected that a well-designed classifier should get 100% correct answers on its training set. A large training set is generally desirable to optimize the training of the classifier, so that it may be tested on objects it has not encountered before, which constitute its test set. If the classifier does not perform well on the test set, modifications to the design of the recognition system may be needed.
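As a hedged illustration of the training-set and test-set idea with OpenCV's ML module (toy data, not from this project), a k-nearest-neighbour classifier can be trained and then queried on unseen objects:

import cv2
import numpy as np

# Toy training set: feature vectors with their known class labels
train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.2]], np.float32)
labels = np.array([[0], [0], [1], [1]], np.int32)

knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, labels)

# Test set: objects the classifier has not encountered before
test = np.array([[1.1, 0.9], [7.8, 7.9]], np.float32)
ret, results, neighbours, dist = knn.findNearest(test, k=3)
print(results.ravel())  # predicted classes, e.g. [0. 1.]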
CHAPTER 3
SOFTWARE AND HARDWARE REQUIREMENTS
Extensions
The features that Visual Studio Code includes out-of-the-box are just the start.
VS Code extensions let you add languages, debuggers, and tools to your installation
to support your development workflow. VS Code's rich extensibility model lets
extension authors plug directly into the VS Code UI and contribute functionality
through the same APIs used by VS Code.
Version Control System
Visual Studio Code has integrated source control and includes Git support out of the box. Many other source control providers are available through extensions on the
VS Code Marketplace. VS Code has support for handling multiple Source Control
providers simultaneously. For example, you can open multiple Git repositories
alongside your TFS local workspace and seamlessly work across your projects. The
SOURCE CONTROL PROVIDERS list of the Source Control view (Ctrl+Shift+G)
shows the detected providers and repositories and you can scope the display of your
changes by selecting a specific provider. VS Code ships with a Git source control
manager (SCM) extension. Most of the source control UI and workflows are common across other SCM extensions, so reading about the Git support will help
you understand how to use another provider.
3.1.2 LANGUAGES
Python
Python is an interpreted high-level general-purpose programming language.
Its design philosophy emphasizes code readability with its use of significant
indentation. Its language constructs as well as its object-oriented approach aim to
help programmers write clear, logical code for small and large-scale projects. Python
is dynamically-typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly, procedural), object-oriented
and functional programming. It is often described as a "batteries included" language
due to its comprehensive standard library.
CHAPTER – 4
4.1 CONCLUSION
We developed a system for “Virtual Drag and Drop control using Hand Tracking Module in Machine Learning”. The project helps users control the volume by using hand gestures, which can be used in many situations and in any place.
It is a step towards saving people's time and energy and a next move towards virtual interaction.
Anyone can use this facility, and the system can run in the background as well.
The project is completed with the objective of positive intent only.
The future work of the project aims at successfully attaining a complete virtual environment with no hardware devices connected for any operation to be done: complete gesture control of the mouse, keyboard and other controls of a computer without any physical interaction.
CHAPTER – 5
Result snapshots:
[Screenshots of the running application]

Source code:
import os

import cv2
import cvzone
from cvzone.HandTrackingModule import HandDetector

# Webcam capture at 1280x720
cap = cv2.VideoCapture(0)
cap.set(3, 1280)  # width
cap.set(4, 720)   # height

detector = HandDetector(detectionCon=0.8)


class DragImg():
    def __init__(self, path, posOrigin, imgType):
        self.posOrigin = posOrigin
        self.imgType = imgType
        self.path = path

        if self.imgType == 'png':
            # Keep the alpha channel so PNGs can be overlaid transparently
            self.img = cv2.imread(self.path, cv2.IMREAD_UNCHANGED)
        else:
            self.img = cv2.imread(self.path)

        self.size = self.img.shape[:2]

    def update(self, cursor):
        ox, oy = self.posOrigin
        h, w = self.size

        # If the cursor is inside the image region, re-center the image on it
        if ox < cursor[0] < ox + w and oy < cursor[1] < oy + h:
            self.posOrigin = cursor[0] - w // 2, cursor[1] - h // 2


# Load every image in the folder, placed 300 px apart along the top row
path = "ImagesPNG"
myList = os.listdir(path)
print(myList)

listImg = []
for x, pathImg in enumerate(myList):
    if 'png' in pathImg:
        imgType = 'png'
    else:
        imgType = 'jpg'
    listImg.append(DragImg(f'{path}/{pathImg}', [50 + x * 300, 50], imgType))

while True:
    success, img = cap.read()
    img = cv2.flip(img, 1)
    hands, img = detector.findHands(img, flipType=False)

    if hands:
        lmList = hands[0]['lmList']

        # Pinch check: distance between index (8) and middle (12) fingertips
        # (newer cvzone releases return (x, y, z) landmarks; slice to (x, y)
        # if findDistance complains)
        length, info, img = detector.findDistance(lmList[8], lmList[12], img)
        print(length)
        if length < 60:
            cursor = lmList[8]
            for imgObject in listImg:
                imgObject.update(cursor)

    try:
        # Draw every image at its current position
        for imgObject in listImg:
            h, w = imgObject.size
            ox, oy = imgObject.posOrigin
            if imgObject.imgType == 'png':
                img = cvzone.overlayPNG(img, imgObject.img, [ox, oy])
            else:
                img[oy:oy + h, ox:ox + w] = imgObject.img
    except:
        pass

    cv2.imshow("Image", img)
    cv2.waitKey(1)
CHAPTER – 6
REFERENCES