B09 SignLanguageDetection
BACHELOR OF TECHNOLOGY
in
Computer Science & Engineering
By
K. Sushma (18B01A0574), G. Chandrika (18B01A0590), P. Kusuma Latha (18B01A05A9), S. Vyshnavi (18B01A05B7), K. Rupa Sri (19B01A0508)
CERTIFICATE
This is to certify that the Mini Project-2 entitled “SIGN LANGUAGE DETECTION”, being
submitted by K. Sushma, G. Chandrika, P. Kusuma Latha, S. Vyshnavi, and K. Rupa Sri,
bearing the Regd. Nos. 18B01A0574, 18B01A0590, 18B01A05A9, 18B01A05B7, and
19B01A0508, in partial fulfilment of the requirements for the award of the degree of
“Bachelor of Technology in Computer Science & Engineering”, is a record of bonafide
work carried out by them under my guidance and supervision during the academic year
2021–2022, and it has been found worthy of acceptance according to the requirements of
the university.
External Examiner
ACKNOWLEDGEMENTS
It is natural and inevitable that the thoughts and ideas of other people drift into
one's subconscious, and so one feels obliged to acknowledge the help and guidance
derived from others. We acknowledge each one of those who have contributed to the
completion of this project report.
We wish to place on record our deep sense of gratitude to Sri K. V. Vishnu Raju, Chairman of SVES,
for providing us with all the facilities necessary to carry out this project successfully.
We express our heartfelt thanks to our Principal, Dr. G. Srinivasa Rao, for his constant
support at every stage of our work.
We wish to express our sincere thanks to our Vice Principal, Dr. P. Srinivasa Raju, for being a
source of inspiration and constant encouragement.
We are privileged to express our sincere gratitude to the honourable Head of the
Department Dr. P. Kiran Sree for giving his support and guidance in our endeavours.
We express our sincere thanks to the members of the Project Review Committee: Ms. P.
Sudha Rani, Associate Professor, and Dr. V. Purushothama Raju, Professor in CSE & Dean
(Academics), SVECW.
We express our deep sense of gratitude and sincere appreciation to our guide, Dr. P. Kiran
Sree, Head of the Department of CSE, SVECW, for his indispensable suggestions and unflinching,
esteemed guidance throughout the project.
It has been a great pleasure doing project work at Shri Vishnu Engineering College for Women
as a part of our curriculum.
PROJECT ASSOCIATES
KOVVURI SUSHMA (18B01A0574)
GADIDESI CHANDRIKA (18B01A0590)
PAPPULA KUSUMA LATHA (18B01A05A9)
SANKA SRI NAGA VARDHINI VYSHNAVI (18B01A05B7)
KASANI RUPA SRI (19B01A0508)
ABSTRACT
There have been several advancements in technology, and a great deal of research has
been done to help people who are deaf and mute. The purpose of this project is to
design a convenient system that detects the visual-gestural language used by deaf
and hard-of-hearing people for communication. One must learn sign language to
interact with them, but most of the existing tools for sign language learning rely on
external sensors, which are costly. Because of this, learning sign language is a very
difficult task.
Our project aims to take a step forward in this field using deep learning and
Python. In this approach, a dataset is collected and the useful information extracted
from it is fed into supervised learning techniques. Computer recognition of sign
language spans from sign gesture acquisition through text generation, followed by
conversion of that text into speech. First, images are collected for training using a
webcam and OpenCV. The images are then labelled for sign language detection using
labelImg. Next, we build a CNN model, train it with the training dataset, and finally
the signs are detected using OpenCV and the webcam.
Ideally, the result is a real-time object detection system that can detect different
sign language poses based on the American Sign Language (ASL) system.
Sign Language is the means of communication among the deaf and mute community.
Sign Language emerges and evolves naturally within the hearing-impaired
community. Sign Language communication involves manual and non-manual signals
where manual signs involve fingers, hands, arms and non-manual signs involve face,
head, eyes and body. Sign Language is a well-structured language with a phonology,
morphology, syntax and grammar. Sign language is a complete natural language
that uses different ways of expression for communication in everyday life. Sign
language recognition systems move communication from human-human to
human-computer interaction. The aim of a sign language recognition system is to
provide an efficient and accurate mechanism to transcribe signs into text or speech,
so that “dialog communication” between deaf and hearing persons is smooth.
1) Sensor Based Approach: This approach collects gesture data from sensors worn
on the hand, such as instrumented gloves.
2) Vision Based Approach: This approach takes images from a camera as gesture
data. The vision-based method mainly concentrates on capturing images of gestures,
extracting the main features, and recognizing them. This approach is more convenient
to use, as it does not require the user to wear any gadgets.
The goal of this project was to build a neural network able to classify which sign
language gesture is being signed, given an image of a signing hand. This project
is a first step towards building a possible sign language translator, which could take
communication in sign language and translate it into written language. Such a
translator would greatly lower the barrier for many deaf and mute individuals to
communicate better with others in day-to-day interactions.
Minimizing the communication gap between D&M (deaf and mute) and non-D&M people is
necessary to ensure effective communication for all. Sign language translation is
one of the fastest-growing lines of research, and it enables the most natural
manner of communication for those with hearing impairments. A hand gesture
recognition system offers deaf people a way to talk with vocal people
without the need for an interpreter. The system is built for the automated conversion
of ASL into text and speech.
This goal is further motivated by the isolation that is felt within the deaf community.
Loneliness and depression exist at higher rates among the deaf population, especially
when they are immersed in a hearing world. Large barriers that profoundly affect life
quality stem from the communication disconnect between the deaf and the hearing.
Some examples are information deprivation, limitation of social connections, and
difficulty in integrating into society.
Objectives:
● Collection of images for the dataset with a webcam, using OpenCV and Python (a minimal collection sketch follows this list).
● Building a CNN model and training it with the training dataset.
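As an illustration of the first objective, here is a minimal sketch (not the project's actual script) of collecting dataset images from the webcam with OpenCV; the class names, image count, and save paths are illustrative assumptions.

```python
# Minimal sketch: collect dataset images from the webcam with OpenCV.
# Class labels, image count, and paths are illustrative assumptions.
import os
import time
import cv2

labels = ["hello", "thanks", "yes", "no"]   # hypothetical sign classes
images_per_label = 15

cap = cv2.VideoCapture(0)                   # open the default webcam
for label in labels:
    os.makedirs(os.path.join("dataset", label), exist_ok=True)
    print(f"Collecting images for '{label}'...")
    time.sleep(3)                           # time to get the pose ready
    for i in range(images_per_label):
        ret, frame = cap.read()             # grab one frame
        if not ret:
            continue
        path = os.path.join("dataset", label, f"{label}_{i}.jpg")
        cv2.imwrite(path, frame)            # save the frame to disk
        cv2.imshow("frame", frame)
        if cv2.waitKey(500) & 0xFF == ord("q"):  # roughly 2 frames/second
            break
cap.release()
cv2.destroyAllWindows()
```

Images saved this way can then be annotated with labelImg before training.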
With the increasing growth of machine learning, computer vision techniques can be divided
into traditional methods and machine learning methods. This section describes related work
on converting sign language to text and explains how machine learning methods improve on
traditional methods. The existing method considered in this project is the sensor-based method.
Disadvantages:
⦁ Sensors must be carried on the hand every time.
⦁ High cost.
⦁ Low accuracy.
⦁ Complex gestures cannot be recognized.
⦁ No speech conversion is present.
In our project we primarily focus on producing a CNN model that can recognize
sign language gestures and convert them into text. The predicted text is then
converted to speech so that visually challenged people can also understand the
gesture. Both the text and the audio are shown in our application.
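The report does not name the text-to-speech library used; as an illustration, here is a minimal sketch of the speech step assuming the pyttsx3 package.

```python
# Minimal sketch of the text-to-speech step, assuming the pyttsx3
# library; the actual TTS engine used by the project is not specified.
import pyttsx3

def speak(text: str) -> None:
    engine = pyttsx3.init()      # initialize the offline TTS engine
    engine.say(text)             # queue the predicted text
    engine.runAndWait()          # block until speech finishes

speak("hello")                   # e.g., the gesture predicted by the CNN
```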
Block Diagram:
Advantages:
⦁ Good accuracy.
⦁ Low complexity.
⦁ Highly efficient.
⦁ Complex gestures can also be detected.
⦁ Speech conversion is present.
2.3 Feasibility Study
Technical Feasibility
Economic Feasibility
Behavioural Feasibility
• OPENCV
The library has more than 2500 optimized algorithms, which includes a
comprehensive set of both classic and state-of-the-art computer vision
and machine learning algorithms. These algorithms can be used to detect
and recognize faces, identify objects, classify human actions in videos,
track camera movements, track moving objects, extract 3D models of
objects, produce 3D point clouds from stereo cameras, stitch images
together to produce a high resolution image of an entire scene, find similar
images from an image database, remove red eyes from images taken
using flash, follow eye movements, recognize scenery and establish
markers to overlay it with augmented reality.
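As a small, hypothetical illustration of the library in use (not code from this project), the snippet below reads an image, converts it to grayscale, and applies one of OpenCV's smoothing filters; the file names are assumptions.

```python
# Tiny illustrative OpenCV example: read an image, convert it to
# grayscale, and apply a Gaussian blur. File names are hypothetical.
import cv2

img = cv2.imread("sample.jpg")                    # hypothetical input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # BGR -> grayscale
blurred = cv2.GaussianBlur(gray, (5, 5), 0)       # 5x5 Gaussian kernel
cv2.imwrite("blurred.jpg", blurred)               # save the result
```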
• CNN
Hardware Requirements:
RAM – 1GB
Processor – Intel Core i5
Hard Disk – 512GB
4 SYSTEM DESIGN
4.1 Introduction
The design activity is often divided into two separate phases: system design and
detailed design. System design, which is sometimes also called top-level design,
aims to identify the modules that should be in the system, the specifications of
these modules, and how they interact with each other to produce the desired results.
At the end of system design, all the major data structures, file formats, and output
formats, as well as the major modules in the system and their specifications, are decided.
• Sequence diagram
• Collaboration diagram
Use cases:
A use case describes a sequence of actions that provide something of measurable
value to an actor and is drawn as a horizontal ellipse.
Actors:
An actor is a person, organization, or external system that plays a role in one or
more interactions with the system.
Include:
In one form of interaction, a given use case may include another. “Include” is a
directed relationship between two use cases, implying that the behaviour of the
included use case is inserted into the behaviour of the including use case. The
first use case often depends on the outcome of the included use case. This is
useful for extracting truly common behaviours from multiple use cases into a single
description. The notation is a dashed arrow from the including to the included use
case, with the label “«include»”. There are no parameters or return values. To
specify the location in a flow of events in which the base use case includes the
behaviour of another, you simply write include followed by the name of the use case
you want to include, as in the following flow for track order.
Extend:
Generalization:
Associations:
Associations between actors and use cases are indicated in use case diagrams by
solid lines. An association exists whenever an actor is involved with an interaction
described by a use case. Associations are modelled as lines connecting use cases and
actors to one another, with an optional arrowhead on one end of the line. The
arrowhead is often used to indicate the direction of the initial invocation of the
relationship or to indicate the primary actor within the use case.
5.1 Introduction
In this module, the user can directly turn on their camera and start showing
gestures; these gestures are then given to the model, and the model translates
each gesture into text. The user can see the translated sign as text on the screen.
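As a hedged sketch of how such a live translation loop might look (the model file name, input size, and class list below are assumptions, not the project's actual values):

```python
# Sketch of the live translation loop: capture a frame, preprocess it,
# run the trained CNN, and overlay the predicted label on the video.
# Model path, input size, and class list are illustrative assumptions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("sign_model.h5")          # hypothetical trained model
classes = ["hello", "thanks", "yes", "no"]   # hypothetical sign classes

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    x = cv2.resize(frame, (128, 128))        # assumed model input size
    x = x.astype("float32") / 255.0          # scale pixels to [0, 1]
    probs = model.predict(x[np.newaxis, ...], verbose=0)[0]
    label = classes[int(np.argmax(probs))]   # most probable class
    cv2.putText(frame, label, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Sign Language Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```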
Generally, a Convolutional Neural Network has three main types of layers, which are
described below. We start with an input image to which we apply multiple feature
detectors, also called filters, to create the feature maps that make up a
convolution layer. On top of that layer, we apply the ReLU (Rectified Linear Unit)
activation to increase the non-linearity in our images. Next, we apply a pooling
layer to our convolutional layer, so that from every feature map we create a pooled
feature map; the main purpose of the pooling layer is to ensure spatial invariance
in our images. It also helps to reduce the size of our images and to avoid
overfitting our data. After that, we flatten all of our pooled feature maps into one
long vector, or column, of all of these values, and input these values into our
artificial neural network. Lastly, we feed it into the fully connected layer to
achieve the final output.
1. Convolution Layer:
In the convolution layer we take a small window [typically 5×5] that extends
through the depth of the input matrix. The layer consists of learnable filters of
that window size. During every iteration we slide the window by the stride size
[typically 1] and compute the dot product of the filter entries and the input
values at each position.
2. Pooling Layer:
We use a pooling layer to decrease the size of the activation matrix and ultimately
reduce the number of learnable parameters. There are two types of pooling:
a. Max Pooling: In max pooling we take a window [for example, of size 2×2] and keep
only the maximum of its 4 values. We slide this window across the matrix and
continue this process, so we finally get an activation matrix half of its original size.
b. Average Pooling: In average pooling, we take the average of all the values in a
window.
3. Fully Connected Layer:
In a convolution layer, neurons are connected only to a local region, while in a
fully connected layer we connect all the inputs to the neurons.
4. Final Output Layer:
After getting values from the fully connected layer, we connect them to the final
layer of neurons [with a count equal to the total number of classes], which predicts
the probability of each image belonging to each class.
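Putting these layers together, below is a minimal Keras sketch of the kind of CNN described above; the filter counts, input shape, and number of classes are illustrative assumptions rather than the project's exact architecture.

```python
# Minimal Keras sketch of the CNN architecture described above.
# Filter counts, input shape, and class count are assumptions.
from tensorflow.keras import layers, models

num_classes = 26  # e.g., one class per ASL alphabet letter

model = models.Sequential([
    layers.Conv2D(32, (5, 5), activation="relu",
                  input_shape=(128, 128, 3)),   # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                # max pooling halves H and W
    layers.Conv2D(64, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                           # pooled maps -> one vector
    layers.Dense(128, activation="relu"),       # fully connected layer
    layers.Dense(num_classes, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The final softmax layer has one neuron per class, matching the final output layer described above.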
5.4 SCREENS:
Images of Dataset for class P
CNN-Model
Home Screen Image
Live Translation of Gesture to Text
6. SYSTEM TESTING
6.1. Introduction:
TESTING OBJECTIVES
• A good test case is one that has a high probability of finding an undiscovered
error.
Test Levels
Unit Testing
Unit testing involves the design of test cases that validate that the
internal program logic is functioning properly, and that program inputs
produce valid outputs. All decision branches and internal code flow
should be validated. It is the testing of individual software units of the
application.
Integration Testing
Functional Testing
System Testing
Unit Testing
Integration Testing
Acceptance Testing
7. CONCLUSION
In this report, a functional real-time vision-based American Sign Language
recognition system for deaf and mute people has been developed for the ASL alphabet.
We achieved a final accuracy of 90.0% on our dataset. We improved our prediction
by implementing two layers of algorithms, in which we verify and predict symbols
that are very similar to each other. This gives us the ability to detect almost all
the symbols, provided that they are shown properly, there is no noise in the
background, and the lighting is adequate.
8. FUTURE SCOPE
INTRODUCTION TO PYTHON
Python
What is Python? Chances are you are asking yourself this. You may have found this book
because you want to learn to program but don't know anything about programming
languages. Or you may have heard of programming languages like C, C++, C#, or
Java and want to know what Python is and how it compares to those “big name” languages.
Hopefully I can explain it for you.
Python concepts
If you're not interested in the hows and whys of Python, feel free to skip to the next
chapter. In this chapter I will try to explain why I think Python is one of the best
languages available and why it's a great one to start programming with.
• Open-source general-purpose language.
• Object Oriented, Procedural, Functional
• Easy to interface with C/ObjC/Java/Fortran
• Easy to interface with C++ (via SWIG)
• Great interactive environment
Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It uses English keywords
frequently whereas other languages use punctuation, and it has fewer syntactic
constructions than other languages.
⦁ Python is Interpreted − Python is processed at runtime by the interpreter. You
do not need to compile your program before executing it. This is similar to PERL and
PHP.
⦁ Python is Interactive − You can actually sit at a Python prompt and interact
with the interpreter directly to write your programs.
⦁ Python is Object-Oriented − Python supports Object-Oriented style or
technique of programming that encapsulates code within objects.
⦁ Python is a Beginner's Language − Python is a great language for the
beginner-level programmers and supports the development of a wide range of
applications from simple text processing to WWW browsers to games.
Numpy
NumPy's main object is the homogeneous multidimensional array. It is a table of
elements (usually numbers), all of the same type, indexed by a tuple of positive
integers. In NumPy, dimensions are called axes. The number of axes is the rank.
• Offers Matlab-ish capabilities within Python
• Fast array operations
• 2D arrays, multi-D arrays, linear algebra etc.
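A small, illustrative example of these ideas:

```python
# Small illustrative NumPy example: a 2-D array (rank 2, two axes).
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2x3 array, all elements the same type
print(a.ndim)                  # 2  -> number of axes (the rank)
print(a.shape)                 # (2, 3)
print(a * 2)                   # fast elementwise array operation
```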
Matplotlib
• High quality plotting library.
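A minimal illustrative plot (hypothetical data and output file name):

```python
# Tiny Matplotlib example: plot y = x^2 and save the figure to disk.
import matplotlib.pyplot as plt

xs = range(10)
plt.plot(xs, [x * x for x in xs])   # simple line plot
plt.xlabel("x")
plt.ylabel("x squared")
plt.savefig("plot.png")             # write the figure as an image file
```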
Python class and objects
These are the building blocks of OOP. A class creates a new type of object. This
object can be anything, whether an abstract data concept or a model of a physical
object, e.g. a chair. Each class has individual characteristics unique to that
class, including variables and methods. Classes are very powerful and currently
“the big thing” in most programming languages. Hence, there are several chapters
dedicated to OOP later in the book.
The class is the most basic component of object-oriented programming. Previously,
you learned how to use functions to make your program do something.
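As a small illustration of these concepts, here is a hypothetical class modelling a physical object such as a chair:

```python
# Minimal illustrative class: a model of a physical object (a chair).
class Chair:
    def __init__(self, legs: int, colour: str):
        self.legs = legs          # instance variables unique to each object
        self.colour = colour

    def describe(self) -> str:    # a method: behaviour attached to the class
        return f"A {self.colour} chair with {self.legs} legs."

office_chair = Chair(5, "black")  # create an object (an instance of Chair)
print(office_chair.describe())
```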