Batch no_07
A PROJECT REPORT
Submitted by
ANAND G 312321106016
DHANUSU MAHARAJAN M 312321106044
in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF ENGINEERING
in
ELECTRONICS AND COMMUNICATION ENGINEERING
BONAFIDE CERTIFICATE
Certified that this project report “VISUAL TRACKPAD USING PYTHON” is the bonafide work of ANAND G (312321106016) and DHANUSU MAHARAJAN M (312321106044), who carried out the project work under my supervision.
SIGNATURE SIGNATURE
This project report, submitted by the above students in partial fulfilment of the requirements for the award of the degree of Bachelor of Engineering in Electronics and Communication Engineering of Anna University, was evaluated and confirmed to be a report of the work done by the above students.
ACKNOWLEDGEMENT
The contentment and elation that accompany the successful completion of any work
would be incomplete without mentioning the people who made it possible.
Words are inadequate in offering our sincere thanks and gratitude to our respected
Chairman Dr. B. Babu Manoharan, M.A., M.B.A., Ph.D. for his outstanding
leadership, unwavering dedication, and invaluable contributions to our organization.
His vision, guidance, and commitment have been instrumental in shaping our success
and driving us towards excellence.
We express our sincere gratitude to our beloved Managing Director Mr. B. Shashi
Sekar, M.Sc., (Intl. Business) and Executive Director Mrs. B. Jessie Priya,
M.Com., (Commerce) for their support and encouragement.
We also thank our beloved principal Dr. Vaddi Seshagiri Rao, M.E, MBA, Ph.D.,
for having encouraged us to do our under graduation in Electronics and
Communication Engineering in this esteemed institution.
We also express our sincere thanks and most heartfelt sense of gratitude to our eminent
Head of the Department, Dr. S. Rajesh Kannan, M.E., Ph.D., for having extended
his helping hand at all times.
We humbly express our profound gratitude for the invaluable guidance and expert
suggestions generously shared by our supervisor, Dr. Shirley Selvan, M.E., Ph.D.
We thank all staff members of our department, our family members and friends who
have been the greatest source of support to us.
ABSTRACT
There are different reasons for which people need an alternative means of interaction, such as a virtual keyboard; the main reason is the need for alternative and innovative ideas that make such work easier. The project's main objectives are to design an accurate and efficient eye-tracking system, develop software to process the captured data, and integrate the software with the computer's operating system to allow hands-free mouse control. The system comprises an eye-tracking device and a software module. The eye-tracking device captures the user's eye movements, while the software module analyses the captured data to determine the intended mouse movements. The software module then communicates the mouse movements to the computer's operating system, allowing the user to interact with the computer hands-free. The project involves the use of machine learning algorithms and computer vision techniques to improve the accuracy of the eye-tracking system. Many people need assistance because of an illness, and a control system that enables them to work without the help of another person is very helpful. The idea of eye control is of great use not only for the future of natural input but, more importantly, for handicapped and disabled users. The camera captures images of the eye movement; first, the pupil centre position of the eye is detected, and the different variations in pupil position then produce different command sets for the virtual keyboard. The signals pass through the motor driver to interface with the virtual keyboard itself. The motor driver controls both speed and direction to enable the virtual keyboard to move forward, left, right, and stop.
TABLE OF CONTENTS
CHAPTER NO TITLE PAGE NO
ABSTRACT IV
LIST OF ABBREVIATIONS VII
1 INTRODUCTION 1
1.1 MACHINE LEARNING 2
1.2 MACHINE LEARNING APPLICATIONS 3
1.3 SENSOR TECHNOLOGY IN AUTONOMOUS SYSTEMS 4
1.4 INTEGRATION WITH AI AND MACHINE LEARNING 5
1.5 PRACTICAL APPLICATION 6
2 LITERATURE REVIEW 8
3 METHODOLOGY 14
3.1 REQUIREMENT ANALYSIS 15
3.2 SOFTWARE REQUIREMENTS 17
4 RESULT AND DISCUSSIONS 21
5 CONCLUSION 24
SOURCE CODE 25
REFERENCES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
This chapter describes how eye movement is detected using the OpenCV methodology and how the cursor is moved using the eyeball, with examples and better solutions; the conclusion is also presented at the end of the report.
1.1 MACHINE LEARNING
Machine learning is a very popular topic for several key reasons: it provides the ability to automatically obtain deep insights, recognize unknown patterns, and create high-performing predictive models from data, all without requiring explicit programming instructions.
Imagine a dataset as a table, where each row is an observation (also called a measurement or data point) and the columns of each observation represent its features and their values.
At the outset of a machine learning project, a dataset is usually split into two
or three subsets. The minimum subsets are the training and test datasets, and often an
optional third validation dataset is created as well.
Once these data subsets are created from the primary dataset, a predictive
model or classifier is trained using the training data, and then the model’s predictive
accuracy is determined using the test data.
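As an illustration of this split-then-evaluate workflow, the short sketch below uses scikit-learn (assumed here purely for demonstration; it is not part of this project's software stack) to divide a sample dataset into training and test subsets, train a classifier, and measure its predictive accuracy on the unseen test data.

# Illustrative sketch only: scikit-learn and the iris dataset are assumptions
# used for demonstration, not part of the visual trackpad implementation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)            # rows = observations, columns = features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)    # 70% training data, 30% test data

model = KNeighborsClassifier()
model.fit(X_train, y_train)                  # train on the training subset
print("Test accuracy:", model.score(X_test, y_test))  # evaluate on the test subset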
As mentioned, machine learning leverages algorithms to automatically model
and find patterns in data, usually with the goal of predicting some target output or
response. These algorithms are heavily based on statistics and mathematical
optimization.
1.2 MACHINE LEARNING APPLICATIONS
As we move forward into the digital age, one of the modern innovations we have seen is the creation of machine learning. This form of artificial intelligence is already being used in various industries and professions, for example in image and speech recognition, medical diagnosis, prediction, classification, learning associations, statistical arbitrage, extraction, and regression. This section looks at these machine learning applications in today's modern world.
1.2.1 Image Recognition
It is one of the most common machine learning applications. There are many
situations where you can classify the object as a digital image. For digital images,
the measurements describe the outputs of each pixel in the image.
In the case of a black and white image, the intensity of each pixel serves as one measurement. So, if a black and white image has N×N pixels, the total number of pixels, and hence of measurements, is N².
In a coloured image, each pixel is considered as providing 3 measurements of the intensities of the 3 main colour components, i.e. RGB. So, for an N×N coloured image there are 3N² measurements.
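A small NumPy sketch (assumed here only for illustration) makes the counting concrete: flattening an N×N grayscale image yields N² measurements, while flattening an N×N RGB image yields 3N².

import numpy as np

N = 8
gray = np.random.randint(0, 256, size=(N, N))       # black-and-white image, one value per pixel
colour = np.random.randint(0, 256, size=(N, N, 3))  # coloured image, three values (RGB) per pixel

print(gray.reshape(-1).shape)    # (64,)  -> N*N = N^2 measurements
print(colour.reshape(-1).shape)  # (192,) -> 3 * N^2 measurements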
For face detection – The categories might be face versus no face present.
There might be a separate category for each person in a database of several
individuals.
For character recognition – It can segment a piece of writing into smaller
images, each containing a single character. The categories might consist of
the 26 letters of the English alphabet, the 10 digits, and some special
characters.
1.3.2 Advantages Of Ultrasonic Sensors
Ultrasonic sensors offer several advantages, including non-contact operation,
high accuracy, and reliability in various environmental conditions. These qualities
make them suitable for applications where precise distance measurement is required,
such as robotic navigation and object detection. While not inherently tied to visual
trackpads, these advantages highlight the potential for integrating ultrasonic sensors
into multi-modal input systems to enhance user interaction and expand functionality.
1.4 INTEGRATION WITH AI AND MACHINE LEARNING
Integrating the visual trackpad with AI and machine learning enables more sophisticated interaction capabilities, such as gesture-based commands, multi-touch gestures, and gesture recognition in diverse environments and lighting conditions. Machine learning algorithms can also adapt and improve over time based on user feedback, further enhancing the user experience.
1.5 PRACTICAL APPLICATION
A practical application for a visual trackpad using Python could be its integration into interactive kiosks or digital signage systems. Here is how it could be implemented:
1 Interactive Kiosks:
Wayfinding and Navigation: In a shopping mall or large public space, users can interact
with a visual trackpad to navigate through maps, find directions to specific stores or
facilities, and explore points of interest.
Product Catalog Browsing: In retail environments, users can browse through product
catalogs by swiping through images or categories using hand gestures on the visual
trackpad. They can select items for more information or to make a purchase.
Virtual Tour Guides: In tourist attractions or historical sites, users can use the
visual trackpad to access virtual tour guides, view augmented reality overlays, and
interact with interactive maps or exhibits.
Information Display: In corporate or educational environments, digital signage displays
can provide interactive information displays, allowing users to browse through event
schedules, campus maps, or company announcements using hand gestures on the visual
trackpad.
CHAPTER 2
LITERATURE REVIEW
2.1 Hand Gesture Recognition Based Virtual Mouse Events
2.2 Virtual Mouse Control Using Coloured Finger Tips And Hand Gesture
Recognition
this algorithm in real world scenarios. A method for on-screen cursor control without
any physical connection to a sensor is presented. Identification of coloured caps on
the fingertips and their tracking is involved in this work. Different hand gestures can
be used in place of coloured caps for the same purpose. The mouse operations controlled are single left click, double left click, right click, and scrolling.
Various combinations of the coloured caps are used for different operations. The range of skin colour can be varied in the program according to the person using it and the surrounding lighting conditions. An approximate area ratio that is not being used
by the hand in the convex hull is taken after analysing the program output at different
gestures of the hand. This work can be used in various real time applications like
cursor control in a computer, Android-based smart televisions, etc. Although devices such as the mouse and laser remotes exist for the same purpose, this work is simple and reduces the use of external hardware: the motion of fingers in front of a camera produces the necessary operation on the screen.
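A hedged OpenCV sketch of the coloured-cap idea summarised above is given below; the HSV bounds are assumptions that would need tuning to the actual cap colour and lighting, and the centroid of the largest detected blob is what could drive the on-screen cursor.

import cv2
import numpy as np

cap = cv2.VideoCapture(0)
lower, upper = np.array([40, 70, 70]), np.array([80, 255, 255])  # assumed HSV range for a green cap
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)                   # keep only cap-coloured pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        cap_contour = max(contours, key=cv2.contourArea)    # largest blob is taken as the cap
        hull = cv2.convexHull(cap_contour)
        cv2.drawContours(frame, [hull], -1, (0, 255, 0), 2)
        M = cv2.moments(cap_contour)
        if M["m00"] > 0:                                    # centroid that could drive the cursor
            cx, cy = int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])
            cv2.circle(frame, (cx, cy), 5, (0, 0, 255), -1)
    cv2.imshow("cap tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:                         # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()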
2.3 Mouse On A Ring: A Mouse Action Scheme Based On IMU And Multi-Level Decision Algorithm
Yuliang Zhao (2022) determined that the traditional mouse has been used as
a main tool for human-computer interaction for more than 50 years. However, it has
become unable to cater to people's need for mobile working and all-weather use due to its reliance on the support of a two-dimensional plane, its poor portability, its wearisomeness, and other problems. In this paper, they proposed a portable ring-type wireless mouse scheme based on IMU sensors and a multi-level decision algorithm.
The user only needs to operate in the air with a smart ring worn on the middle finger
of their right hand to realize the interactive function of a mouse. The smart ring first
captures changes in the finger’s attitude angle to reflect how the cursor position
changes. It also captures the rapid rotation of the user's palm to the left and right to
achieve mouse clicking. In addition, a multi-level decision algorithm is developed to
improve the response speed and recognition accuracy of the virtual mouse. The
experimental results show that the virtual mouse has a target selection accuracy of
over 96%, which proves its practicability in real-world applications. This virtual
mouse is expected to be used as a portable and reliable tool for multi-scenario human-
computer interaction applications in the future. In this paper, they proposed a ring-type wireless mouse scheme based on IMU sensors and a multi-level decision
algorithm. This scheme does not require the support of a 2D desktop. The user can
realize the interactive function of a traditional mouse in the air by simply wearing a
smart ring on the middle finger of their right hand. First, the smart ring captures
changes in the finger’s attitude angle to control how the cursor moves. This reduces
the dependence on the environment and allows for unrestricted interaction in multiple
scenarios. Second, the smart ring, which is integrated on the ring carrier without extra
auxiliary sensors, captures the rapid rotation of the user’s palm to the left and right
to achieve mouse clicking. This design greatly reduces the virtual mouse’s size and
improves its portability. Finally, considering that the increase in user motion types
affects the recognition performance, a multilevel decision algorithm is developed
based on the hierarchy in the data set and used to improve the response speed and
recognition accuracy of the virtual mouse. The experimental results show that the
target selection accuracy of the virtual mouse is over 96%, and its target selection
time and path efficiency are very close to those of a standard mouse. These findings
demonstrate the effectiveness and usability of the virtual mouse in real-world
applications. They believe that this virtual mouse will provide a portable, highly
accurate, and responsive solution to support the needs for multi-scenario human-
computer interaction in the future.
virtual keyboard. They compared different typing methods that use the dwell time of
the pointer as a character selection method. The proposed solution can be used in
different applications, such as: computer gaming, virtual reality, remote control and
to facilitate communication with people affected by certain disabilities and can easily
accommodate more types of users. The performances of different methods have been
analysed by using the speed of typing measured in words per minute, the task success
rate, as well as the feedback from the user regarding comfort and perceived ease-of-
use. In this paper, a robust method of typing based on a wireless gyro-mouse has
been presented. The proposed interface includes a specialized virtual keyboard with
dwell time as a selection method and a wireless gyro-mouse based on an IMU sensor
that combines an accelerometer and gyroscope in a single package. The proposed
device has been tested on 10 healthy subjects, students of “Gheorghe Asachi”
Technical University of Iasi, Romania. According to the obtained results, all subjects
succeeded in using the device as intended, and obtained good performances in the
typing test. All subjects rapidly and easily learned to use the device and succeeded in completing the required tasks. The best performance results were obtained for a dwell time of 750 ms, where the average typing speed for all subjects was 4.1 WPM.
2.5 Optical Mouse And Keyboard Using Hand Gestures
Nowadays computer vision has reached its pinnacle, where a computer can
identify its owner using a simple program of image processing. In this stage of
development, people are using this vision in many aspects of day to day life, like
Face Recognition, Colour detection, Automatic car, etc. In this project, computer
vision is used in creating an Optical mouse and keyboard using hand gestures. The
camera of the computer will read the image of different gestures performed by a
person's hand and according to the movement of the gestures the Mouse or the cursor
of the computer will move, even perform right and left clicks using different gestures.
Similarly, the keyboard functions may be used with some different gestures, like
using a one-finger gesture for alphabet selection and a four-finger gesture to swipe left and right.
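A minimal sketch of this gesture-to-cursor idea, using MediaPipe Hands and PyAutoGUI, is shown below; the landmark choice and screen mapping are assumptions for illustration, not the cited authors' implementation.

import cv2
import mediapipe as mp
import pyautogui

cam = cv2.VideoCapture(0)
hands = mp.solutions.hands.Hands(max_num_hands=1)
screen_w, screen_h = pyautogui.size()
while True:
    ok, frame = cam.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        tip = result.multi_hand_landmarks[0].landmark[8]      # index fingertip landmark
        pyautogui.moveTo(tip.x * screen_w, tip.y * screen_h)  # map fingertip to screen coordinates
    cv2.imshow("hand cursor", frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break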
Anjith George and Aurobinda Routray (2016) determined that the estimation of eye gaze direction is useful in various human-computer interaction tasks. Knowledge of gaze direction can give valuable information regarding the user's point of attention. Certain
patterns of eye movements known as eye accessing cues are reported to be related to
the cognitive processes in the human brain. They proposed a real-time framework for
the classification of eye gaze direction and estimation of eye accessing cues. In the
first stage, the algorithm detects faces using a modified version of the Viola-Jones
algorithm. A rough eye region is obtained using geometric relations and facial
landmarks. The eye region obtained is used in the subsequent stage to classify the
eye gaze direction. A convolutional neural network is employed in this work for the
classification of eye gaze direction. The proposed algorithm was tested on the Eye Chimera database and found to outperform state-of-the-art methods. The computational complexity of the algorithm in the testing phase is very low, and the algorithm achieved an average frame rate of 24 fps in a desktop environment. In
this work, a framework for real-time classification of eye gaze direction is presented.
The estimated eye gaze direction is used to infer eye accessing cues, giving
information about the cognitive states. The computational complexity is low; they achieved frame rates of around 24 Hz in a Python implementation on a 2.0 GHz Core i5 PC running 64-bit Ubuntu (4 GB RAM). The per-frame computation time is 42 ms, which is much less than that of other state-of-the-art methods (250 ms in
[20]). Off-the-shelf webcams can be used for computing the eye gaze direction, and the proposed algorithm works even with in-plane rotations of the face. The eye gaze direction obtained can also be used for human-computer interaction applications. The low computational complexity of the algorithm in the testing phase makes it suitable for smart devices with low-resolution cameras using pre-trained models.
CHAPTER 3
METHODOLOGY
BLOCK DIAGRAM
Image Capturing → Face Detection → Eye Detection
Fig. 3.0. Block diagram showing how the camera detects the user and carries out its functions.
The user sits in front of the display screen of a personal computer with a specialised video camera mounted above the screen to observe the user's eyes. The computer continuously analyses the video image of the eye and determines where the user is looking on the screen. Nothing is attached to the user's head or body. To "select" any key, the user looks at it for a specific amount of time, and to "press" any key, the user simply blinks. No calibration procedure is required for this system, and the input method is the easiest: no external hardware is connected or required. The camera receives its input from the eye; after receiving the streaming video from the camera, the system breaks it into frames. Each frame is then checked for lighting conditions, because the camera requires sufficient lighting from external sources; otherwise, an error is shown on the screen. The movement of the pupil is detected to determine where the cursor moves, and blinking of the left eye results in a left click on the virtual mouse pad.
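The lighting check described above could be realised, as a rough sketch under an assumed brightness threshold, by measuring the mean grey level of each frame before attempting detection.

import cv2

MIN_BRIGHTNESS = 60   # assumed threshold; would need tuning for the actual camera and room

def frame_is_bright_enough(frame):
    # Return True when the average grey level suggests adequate external lighting.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return gray.mean() >= MIN_BRIGHTNESS

cam = cv2.VideoCapture(0)
ok, frame = cam.read()
if ok and not frame_is_bright_enough(frame):
    print("Error: insufficient lighting, please add an external light source")
cam.release()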
- Evaluate and select appropriate computer vision libraries for gesture recognition and
image processing tasks.
- Consider factors such as compatibility with Python, support for real-time processing,
and availability of pre-trained models.
- Ensure proper calibration and alignment of the camera to capture clear and consistent
images for gesture recognition.
- Gather a diverse dataset of hand gestures to train and validate the gesture recognition
algorithm.
- Implement a real-time processing pipeline to capture video frames from the camera, apply the gesture recognition algorithm, and generate corresponding input commands (a minimal sketch of such a pipeline follows this list).
- Optimize the processing pipeline for efficiency and low latency to ensure responsive
interaction with the visual trackpad.
- Integrate the visual trackpad with a user interface (UI) framework or application to provide visual feedback to the user.
- Design intuitive user controls and feedback mechanisms to guide users in performing
gestures and interpreting system responses.
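A minimal skeleton of such a capture-process-command pipeline is sketched below; the function names are placeholders for illustration, not project code.

import cv2

def recognize_gesture(frame):
    # Placeholder for the gesture-recognition step; returns a command name or None.
    return None

def send_command(command):
    # Placeholder for dispatching the recognised command to the operating system.
    print("command:", command)

cam = cv2.VideoCapture(0)
while True:
    ok, frame = cam.read()              # 1. capture a video frame from the camera
    if not ok:
        break
    command = recognize_gesture(frame)  # 2. apply the gesture-recognition algorithm
    if command is not None:
        send_command(command)           # 3. generate the corresponding input command
    cv2.imshow("frames", frame)
    if cv2.waitKey(1) & 0xFF == 27:     # press Esc to exit the loop
        break
cam.release()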
3.1.9. Testing and Evaluation:
- Conduct thorough testing of the visual trackpad system under various conditions,
including different lighting, backgrounds, and user scenarios.
- Evaluate the accuracy, responsiveness, and usability of the visual trackpad through
user testing, feedback collection, and performance metrics analysis.
- Prepare the visual trackpad system for deployment, including packaging, installation,
and distribution to end-users or deployment platforms.
- Provide ongoing maintenance and support for the visual trackpad system, including
bug fixes, performance optimizations, and updates to accommodate changes in hardware
or software dependencies.
- Incorporate user feedback and feature requests to iteratively improve the visual trackpad's functionality and usability over time.
3.2 SOFTWARE REQUIREMENTS
Python
OpenCV
PyAutoGUI
3.2.1 Python
Python is a high-level, interpreted programming language known for its
simplicity, readability, and versatility. It was first created in the late 1980s by Guido
van Rossum and has since become one of the most popular languages for a wide
range of applications, including web development, data analysis, machine learning,
scientific computing, and more.
One of the key features of Python is its simplicity and ease of use. Its syntax is
easy to learn and read, making it an ideal language for beginners. The language
emphasizes code readability, which means that it is designed to be easily understood
and maintained by developers.
Python is also known for its vast collection of libraries and frameworks, which
provide a wealth of functionality for developers. Some of the most popular libraries
include NumPy, Pandas, Matplotlib, and SciPy, which provide powerful tools for
data analysis and scientific computing. Additionally, Python has a large and active
community of developers, who contribute to the language by creating new libraries
and tools, as well as supporting each other through online forums and communities.
Another advantage of Python is its cross-platform compatibility. Python code can
run on various operating systems, including Windows, Linux, and macOS, without
the need for modification. This makes it a popular choice for developing software
that needs to run on multiple platforms.
Finally, Python is an interpreted language, which means that it does not need
to be compiled before running. This allows for rapid prototyping and iteration,
making it an ideal choice for developing software in an agile environment.
3.2.2 OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source computer vision library, written primarily in C++, that also provides bindings for various programming languages, including Python.
Python OpenCV is a Python interface for the OpenCV library that allows
developers to use OpenCV functions and algorithms in Python code. It provides a
rich set of image processing and computer vision algorithms that can be used for
various tasks such as object detection, recognition, tracking, face detection, image
and video analysis, and more.
Python OpenCV provides various modules for image and video processing,
including image filtering, feature detection and description, object recognition,
motion analysis, and more. It also provides support for reading and writing various
image and video file formats, as well as for capturing live video streams from
cameras.
One of the main benefits of using Python OpenCV is its simplicity and ease of
use. It provides a Pythonic interface that makes it easy to write Python code for image
processing and computer vision tasks. Additionally, it has a large community of
developers who contribute to its development, provide support, and share their
knowledge and code through various forums and resources.
In summary, Python OpenCV is a powerful tool for image and video processing
and computer vision tasks that provides a simple and easy-to-use Python interface
for the OpenCV library.
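A brief sketch of the reading, writing, and live-capture facilities described above is given below; the file names are placeholders.

import cv2

img = cv2.imread("input.jpg")                      # read an image file (placeholder name)
if img is not None:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # a simple image-processing step
    cv2.imwrite("output_gray.jpg", gray)           # write the result back to disk

cap = cv2.VideoCapture(0)                          # capture a live video stream from the camera
ok, frame = cap.read()
if ok:
    cv2.imshow("live frame", frame)
    cv2.waitKey(0)
cap.release()
cv2.destroyAllWindows()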
3.2.3 PYAUTOGUI
PyAutoGUI is a Python library that allows you to automate mouse and
keyboard operations on a computer. It provides a simple and cross-platform interface
for controlling GUI applications, automating repetitive tasks, and testing software.
Some of the common tasks that can be automated using PyAutoGUI include clicking,
typing, dragging, scrolling, and taking screenshots.
PyAutoGUI works by using platform-specific modules and APIs, such as the Win32 API on Windows and Quartz on macOS, to simulate keyboard and mouse input events. It can
also retrieve information about the screen, such as the position of windows, mouse
cursor position, and colour values of pixels. This allows for a wide range of
possibilities when automating tasks.
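A brief sketch of these operations is shown below; the coordinates and file name are arbitrary examples.

import pyautogui

width, height = pyautogui.size()           # screen resolution
x, y = pyautogui.position()                # current mouse-cursor position
print(width, height, x, y)

pyautogui.moveTo(width // 2, height // 2, duration=0.5)  # move the cursor to the screen centre
pyautogui.click()                                        # left click
pyautogui.write("hello", interval=0.05)                  # type text
pyautogui.scroll(-200)                                   # scroll down
pyautogui.screenshot("screen.png")                       # take a screenshot (placeholder file name)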
One of the main advantages of PyAutoGUI is its simplicity. Its easy-to-use
functions and intuitive interface make it accessible to beginners who have little or no
experience in automation. Additionally, PyAutoGUI is cross-platform, meaning that
it can be used on Windows, Mac, and Linux operating systems.
However, it is important to note that PyAutoGUI may not work well in some
situations, especially in cases where there is a delay or lag in the system. It is also
important to use PyAutoGUI carefully, as it can potentially cause damage to the
system if not used properly.
3.2.4 ALGORITHM:
STEP 2: Open the file and navigate to its location in the terminal.
STEP 4: Initialise the code and start video capture from the webcam.
CHAPTER 4
RESULT AND DISCUSSIONS
Fig. 4.1. Detection of the eye (pupil) from the webcam.
Fig. 4.2. Result and working of the visual trackpad.
After implementing the visual trackpad using Python and conducting thorough testing, several key results and discussion points emerge:
Functional Visual Trackpad System: The primary outcome of the project would be the
development of a functional visual trackpad system implemented in Python. This system
would allow users to control a cursor or interact with digital content using hand gestures
captured by a camera.
Gesture Recognition Algorithm: A key outcome would be the implementation of a
robust gesture recognition algorithm capable of accurately detecting and interpreting a
variety of hand movements and gestures. This algorithm would form the core of the
visual trackpad system and enable intuitive interaction with digital devices.
Real-Time Processing Pipeline: The project would involve the design and
implementation of a real-time processing pipeline to capture video frames from the
camera, apply the gesture recognition algorithm, and generate corresponding input
commands. This pipeline would ensure responsive interaction with the visual trackpad
system.
User Interface Integration: Another outcome would be the integration of the visual
trackpad system with a user interface (UI) framework or application, allowing users to
visualize their hand gestures and receive feedback on their interactions. The UI would
provide an intuitive and user-friendly interface for controlling digital content.
Testing and Evaluation Results: The project would include testing and evaluation of
the visual trackpad system under various conditions, including different lighting
environments, backgrounds, and user scenarios. Results from testing would provide
insights into the accuracy, responsiveness, and usability of the system.
Documentation and User Guide: A comprehensive documentation and user guide
would be produced as an outcome of the project, detailing the implementation details,
usage instructions, and troubleshooting tips for the visual trackpad system. This
documentation would facilitate the deployment and use of the system by other
developers and end-users.
Demonstration and Presentation: Finally, the project outcomes would include a
demonstration of the visual trackpad system and a presentation showcasing its features,
functionality, and potential applications. This demonstration would highlight the
project's achievements and contributions to the field of human-computer interaction.
CHAPTER 5
CONCLUSION
In this research, the system first locates the centre of the eye's pupil. Then, various commands are established for the virtual keyboard depending on the variations in pupil position. The signals pass through the motor driver on their way to the virtual keyboard itself. The motor driver's ability to regulate direction and speed allows the virtual mouse to move forward, left, right, and stop. The system provides a control method based on eye tracking that enables people to easily and intuitively interact with computers using only their eyes. It combines keyboard and mouse functionality so that users may perform practically all computer inputs without the need for conventional input devices. The technology not only makes it possible for disabled people to use computers in the same way as able-bodied users, but it also gives sighted persons a fresh option for using computers. In a browsing trial, the suggested method increased browsing effectiveness and enjoyment, and also allowed users to easily engage with multimedia. The development and implementation of the visual trackpad using Python represent a significant advancement in human-computer interaction (HCI) technology. Through the integration of computer vision techniques, machine learning algorithms, and real-time processing, the visual trackpad offers users a novel and intuitive input method that enhances their interaction with digital devices and environments. Future improvements could focus on enhancing the robustness and adaptability of the gesture recognition algorithm through advanced machine learning techniques.
SOURCE CODE
import cv2
import mediapipe as mp
import pyautogui

cam = cv2.VideoCapture(0)
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
screen_w, screen_h = pyautogui.size()
while True:
    _, frame = cam.read()
    frame = cv2.flip(frame, 1)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = face_mesh.process(rgb_frame)
    landmark_points = output.multi_face_landmarks
    frame_h, frame_w, _ = frame.shape
    if landmark_points:
        landmarks = landmark_points[0].landmark
        # The original listing was truncated at this point; the lines below are a
        # reconstruction of the usual MediaPipe iris-tracking approach.
        iris = landmarks[475]                          # a right-iris landmark drives the cursor
        pyautogui.moveTo(screen_w * iris.x, screen_h * iris.y)
        upper, lower = landmarks[159], landmarks[145]  # left-eye eyelid landmarks
        if (lower.y - upper.y) < 0.004:                # eyelids close together -> blink
            pyautogui.click()
            pyautogui.sleep(1)
    cv2.imshow('Visual Trackpad', frame)
    cv2.waitKey(1)
REFERENCES
constructs: An empirical study," Commun. ACM, vol. 26, no. 11, pp. 853–860.
[6] Murphy, L. (2008), "Debugging: The good, the bad, and the quirky—
[8] Qiang Ji and Zhiwei Zhu (2007), "Novel Eye Gaze Tracking Techniques Under Natural Head Movement," IEEE, vol. 54, no. 12, December 2007.
[11] S. K. Kang, M. Y. Nam and P. K. Rhee, "Color Based Hand and Finger
[12] Z. Ma and E. Wu, “Real-time and robust hand tracking with a single
depth camera,” Vis. Comput., vol. 30, no. 10, pp. 1133–1144, Oct. 2014.