
AUDIO TO SIGN LANGUAGE TRANSLATOR

A Project Report Submitted in


Partial Fulfillment of the Requirements
for the Award of the Degree of

BACHELOR OF TECHNOLOGY

IN

Information Technology

Submitted by

Gudipati Nihitha Reddy 19881A1218


Madige Laasya 19881A1232
Angothu Mahesh Naik 20885A1202

SUPERVISOR
Dr. Ganesh B Regulwar
Associate Professor

Department of Information Technology

April, 2023
Department of Information Technology

CERTIFICATE

This is to certify that the project titled AUDIO TO SIGN LANGUAGE

TRANSLATOR is carried out by

Gudipati Nihitha Reddy 19881A1218


Madige Laasya 19881A1232
Angothu Mahesh Naik 20885A1202

in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology in Information Technology during the year


2022-23.

Signature of the Supervisor Signature of the HOD


Dr. Ganesh B Regulwar Dr. Munisekar Velpuru
Associate Professor Professor and Head, IT

Kacharam (V), Shamshabad (M), Ranga Reddy (Dist.)–501218, Hyderabad, T.S.


Ph: 08413-253335, 253201, Fax: 08413-253482, www.vardhaman.org
Acknowledgement

The satisfaction that accompanies the successful completion of this task
would be incomplete without mentioning the people who made it possible,
whose constant guidance and encouragement crowned all the efforts with
success.

We wish to express our deep sense of gratitude to Dr. Ganesh B


Regulwar, Associate Professor and Project Supervisor, Department of
Information Technology, Vardhaman College of Engineering, for his able guid-
ance and useful suggestions, which helped us in completing the project in time.

We are particularly thankful to Dr. Munisekar Velpuru, Head of the
Department of Information Technology, for his guidance, intense support,
and encouragement, which helped us mould our project into a successful one.

We show gratitude to our honorable Principal Dr. J.V.R. Ravindra, for


providing all facilities and support.

We avail this opportunity to express our deep sense of gratitude and heartfelt
thanks to Dr. Teegala Vijender Reddy, Chairman, and Sri Teegala
Upender Reddy, Secretary of VCE, for providing a congenial atmosphere to
complete this project successfully.

We also thank all the staff members of the Electronics and Communication
Engineering department for their valuable support and generous advice. Finally,
thanks to all our friends and family members for their continuous support and
enthusiastic help.

Gudipati Nihitha Reddy


Madige Laasya
Angothu Mahesh Naik

Abstract

In this paper, an innovative technique for communicating with people who
have vocal and hearing impairments is presented. It discusses a more effective
technique for speech-to-sign translation and sign recognition. Millions of
individuals all over the world have hearing impairments. Deaf people rarely
have the opportunities that hearing people do, whether it be to play computer
games, communicate, go to seminars, or participate in video meetings. The
communication barrier faced by the deaf community due to the use of sign
language can make it difficult to interact with individuals who do not
understand the language. To address this issue, this paper aims to develop
a multi-module translating system capable of converting English audio to
English text. The system uses structured grammar representations generated
through English text parsing and applies Indian Sign Language grammar rules
to facilitate translation. Through the use of this technology, the communication
gap between the deaf and hearing communities can be bridged.

Keywords: Natural language processing, speech recognition, speech to


text, and machine translation.

Table of Contents

Title Page No.


Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Aim and Objectives . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.1 Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Speech to Sign Fundamentals . . . . . . . . . . . . . . . . . . . . . 5
1.6.1 Conversion of Speech to Text . . . . . . . . . . . . . . . . 6
1.6.2 Finding ISL in Datasets . . . . . . . . . . . . . . . . . . . . 7
CHAPTER 2 LITERATURE SURVEY . . . . . . . . . . . . . . . . 8
2.1 General Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Google Speech API . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Unsupervised Algorithm for Text Document . . . . . . . . . . . . 12
2.5 Natural Language Processing (NLP) . . . . . . . . . . . . . . . . . 14
2.5.1 NLP Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5.2 Natural Language Toolkit (NLTK) . . . . . . . . . . . . . . 16
2.5.3 Deep Learning, Machine Learning, and Statistical NLP . . 17
CHAPTER 3 SYSTEM ANALYSIS . . . . . . . . . . . . . . . . . . 18
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Existing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Proposed system . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Hardware and Software Requirements . . . . . . . . . . . . . . . . 21
3.4.1 Hardware requirements . . . . . . . . . . . . . . . . . . . . . 21

3.4.2 Software requirements . . . . . . . . . . . . . . . . . . . . . 22
3.5 Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . 22
3.5.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . 23
CHAPTER 4 SYSTEM DESIGN DETAILS . . . . . . . . . . . . . 24
4.1 DFD with Detailed Explanation . . . . . . . . . . . . . . . . . . . 24
4.1.1 UML diagrams . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.2 Flow Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
CHAPTER 5 IMPLEMENTATION . . . . . . . . . . . . . . . . . . 29
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.1.1 Forms of Input . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.1.2 Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . 30
5.1.3 Pre-processing of text . . . . . . . . . . . . . . . . . . . . 31
5.1.4 Porter Stemming Algorithm . . . . . . . . . . . . . . . . . . 32
5.1.5 Text to Sign Language . . . . . . . . . . . . . . . . . . . . 33
5.2 Technologies used . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2.1 HTML (Hyper Text Markup Language) . . . . . . . . . . 33
5.2.2 CSS (Cascading Style Sheets) . . . . . . . . . . . . . . . . 35
5.2.3 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.4 Django framework . . . . . . . . . . . . . . . . . . . . . . . 36
5.3 NLTK Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.3.1 word tokenizes . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.3.2 Elimination of Stop Words . . . . . . . . . . . . . . . . . 38
5.3.3 Lemmatization and Synonym replacement . . . . . . . . . 38
5.3.4 WordNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.3.5 Punkt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
CHAPTER 6 TESTING AND RESULT . . . . . . . . . . . . . . . 40
6.1 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.1.1 Testing Objectives . . . . . . . . . . . . . . . . . . . . . . . 40
6.1.2 Testing principles . . . . . . . . . . . . . . . . . . . . . . . . 41
6.2 System Testing Plan . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.3 Screenshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
CHAPTER 7 CONCLUSION AND FUTURE SCOPE . . . . . . 46
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.2 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
List of Figures

2.1 Speech to text conversion . . . . . . . . . . . . . . . . . . . . . . 12


2.2 Unsupervised algorithm for text mining . . . . . . . . . . . . . . . 13
2.3 Speaking tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Sentiment analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.1 Proposed System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.1 Level 0 DFD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25


4.2 Level 1 DFD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 UML diagram for system . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 Flow Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Block diagram for the Audio to Sign Language Translator . . . . 28

6.1 Testing Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42


6.2 Screenshot of the Home . . . . . . . . . . . . . . . . . . . . . . . . 43
6.3 Screenshot of the Sign up . . . . . . . . . . . . . . . . . . . . . . . 43
6.4 Screenshot of the Login . . . . . . . . . . . . . . . . . . . . . . . . 43
6.5 Screenshot of the Converter . . . . . . . . . . . . . . . . . . . . . . 44
6.6 Signifies “how are you” in sign language . . . . . . . . . . . . . . 44
6.7 Signifies “where are you” in sign language . . . . . . . . . . . . . 44
6.8 Signifies “where are you” in sign language . . . . . . . . . . . . . 45

Abbreviations

Abbreviation Description

VCE Vardhaman College of Engineering

CMOS Complementary Metal Oxide Semiconductor


CHAPTER 1

INTRODUCTION

1.1 Introduction
Sign language is the main means of communication for those with hearing
and voice impairments. Deaf persons are said to communicate primarily in
sign language, which combines hand motions, arm or body motions, and facial
expressions. By utilising this method, deaf individuals can take part in all the
activities that hearing people enjoy, from everyday communication to information
access. Sign language (SL) is a natural visual-spatial language that combines
facial expressions, hand shapes, arm orientation, and movement of the upper
body and its parts to produce utterances in three dimensions rather than one.
People in India who are hard of hearing, deaf, or both developed the language.
The sign languages of deaf communities may differ, because there are numerous
such communities all over the world. Many spoken languages, including Urdu,
French, and English, are widely used nowadays.
Similarly, deaf people around the world communicate in a variety of sign
languages and expressions. British Sign Language (BSL) is used in Britain,
Indian Sign Language (ISL) in India, and American Sign Language (ASL) in
the United States to express ideas and interact with one another. There are
already interactive systems available for various sign languages, including ASL,
BSL, etc. There are 5.07 million people with hearing disabilities in India. More
than 30% of them are under 20 years old, while over 50% are between the ages
of 20 and 60. These people use sign language to communicate with others
because they seldom speak clearly. Because many sign languages lack a definite
structure or syntax, it can be challenging for these persons to communicate
outside of their small communities using these signs. Other efforts, such as
those using Indian Sign Language, have been made to demonstrate the same
for various sign languages. ISL research, which started in 1978, has specifically
shown that it has its own grammar and syntax, making it a complete natural
language. At public places like banks, hospitals, railway stations, and bus
stops, it can be extremely difficult for hearing-impaired people to communicate
since a hearing person cannot understand the sign language a deaf person
employs. Furthermore, a hearing person cannot interact with a deaf person
since he or she might not be conversant with sign language. Communication
between the deaf and hearing communities must therefore be facilitated by
translation. This application accepts speech as input, transforms the speech
to text, and shows graphics in Indian Sign Language. The paper's major
objective is to employ the right technology to convert spoken English into
Indian Sign Language. In the first phase of operation, the interface uses a
speech-to-text API to transform audio into text; in the second, the input text
is tokenized into words, which are mapped to the corresponding gestures to
obtain the sign language sequence for the required text.

1.2 Problem Definition


The project's main goal is to translate user input into sign language. Natural
Language Processing (NLP) is used to break the text or speech into smaller
parts, which are then searched against the database; at the end, the appropriate
symbol or gesture is displayed to the user. We have taken into consideration
the following issues in this problem: 1. Voice recognition and text conversion.
2. The translation of the entire statement into sign language. 3. Handling
words that are absent from the database or data collection. To convey
information, sign language makes use of manual communication techniques
such as facial expressions, hand gestures, and physical movements. With the
help of videos for particular words, this project translates text into sign
language. Those who have trouble speaking communicate via hand gestures
and signs, a language that is tough for ordinary individuals to comprehend.
Hence, a system that can distinguish between various signs and gestures and
communicate information to the deaf from hearing persons is required. It
eliminates the divide between those with physical disabilities and average
people. Compared to other existing procedures, our method yields the desired
outcome in the shortest amount of time and with the highest degree of
precision and accuracy.

1.3 Motivation
For individuals with hearing and speech impairments, sign language is a
natural and preferred mode of communication. Although there have been many
attempts to translate, recognise, and translate sign language motions into text,
creating text-to-sign language conversion systems has proven to be difficult.
This is largely because there isn’t a significant corpus of sign language that
can be used as the basis for creating such systems.
Despite these challenges, the creation of a system for translating text into
Indian sign language has enormous promise for improving information access
and services for the deaf community. By utilizing modern technologies such as
machine learning and computer vision, such a system can accurately recognize
and interpret written text and convert it into sign language.
However, there are several challenges that must be addressed to develop an
accurate and effective text-to-sign-language conversion tool. These include
the need for a comprehensive Indian sign language corpus, the development
of advanced computer vision algorithms, and the need for high-quality data
to train the machine learning models.
The creation of a text-to-sign language conversion system can greatly
improve the deaf community’s access to information and services in Indian
sign language. While several challenges exist in developing such a system,
continued research and development in this area hold tremendous promise in
creating a more inclusive and equitable society.



1.4 Aim and Objectives

1.4.1 Aim
This project intends to:

• Develop a communication system for deaf people using ISL grammar
representation.

• Generate sign language animations or videos using the ISL grammar
representation.

• Incorporate speech recognition and synthesis technologies to enable the
system to communicate.

Overall, the goal is to create a system that allows deaf people to communicate
using ISL grammar, by converting English sentences into an appropriate ISL
representation and enabling communication through sign language animations,
videos, and speech recognition and synthesis technologies.

1.4.2 Objectives
The improvement of communication between people with hearing and speech
impairments and those who do not know sign language is one of the main
goals of this initiative. The deaf community can use this web programme,
which is open source and free to use, to translate text into sign language as
a solution. The chances of success in the areas of education, employment,
interpersonal connections, and public access venues could all be enhanced by
this technology.
The precision and efficiency of the system must be continually improved
in order for this project to be successful. The ISL dictionary needs to be
continuously updated with new words and phrases to increase the system’s
breadth and accuracy. Additionally, the technology used for speech recognition
and animation needs to be continuously refined to improve the system’s
performance and reliability.
This project has the potential to transform the lives of people with hearing
and speech impairments by improving their ability to communicate effectively.



By making this technology open source and freely available, it can benefit the
deaf community and promote inclusivity and accessibility in society. Continued
development and improvement of this technology are necessary to ensure its
effectiveness and accuracy, making it a crucial tool for creating a more equitable
and inclusive society.[1]

1.5 Scheme
A regular person can input speech using a computer's microphone. Voice-to-text
conversion takes place in the speech recognition module with the aid of a
trained voice database. By comparing the converted text against the database,
meanings and symbols are discovered. The sign symbols are then shown
alongside the text for hard-of-hearing people. A gesture is
a movement done with a bodily part, particularly the hands, arms, face,
or head, to convey important information or feelings. In applications that
include human-machine interaction, gesture recognition is useful. Speech to
sign translation is provided by the system. To convert spoken words into word
sequences, speech recognition software is employed.

1.6 Speech to Sign Fundamentals


To use our proposed system, a person with normal speech would use their
computer’s microphone to input speech. This speech input would be processed
using Natural Language Processing (NLP) to display the corresponding sign
language output. While there are various solutions and approaches to creating
a sign language translation system, we have chosen the most effective and
efficient approach for our system.
One of the major benefits of our system is that it can be used by both
hearing and hearing-impaired individuals, making it an inclusive communication
tool. Moreover, this system has the potential to improve access to information
for the hearing-impaired population in various sectors, including education,
healthcare, and employment.
To further enhance the system's accuracy and effectiveness, continuous
updates to the ISL dictionary and advancements in speech recognition and
animation technology are necessary. Additionally, training and education on
the use of the system for hearing individuals can further promote inclusivity
and accessibility.
In conclusion, our proposed system offers an efficient and accessible means of
communication for individuals with hearing and speech impairments, breaking
down communication barriers between hearing and hearing-impaired individuals.
With ongoing improvements and advancements, this system can become an
essential tool for promoting inclusivity and accessibility across various sectors,
creating a more equitable society.[2]

1.6.1 Conversion of Speech to Text


NLTK, or the Natural Language Toolkit, is a robust platform for creating
Python programs that can process human language data. It includes a variety
of text processing libraries that support operations such as categorization,
tokenization, stemming, tagging, parsing, and semantic reasoning. NLTK also
provides access to over 50 corpora and lexical resources, including WordNet,
through a range of interfaces. This makes NLTK a popular choice for linguists,
engineers, students, educators, researchers, and industry professionals alike. In
addition to these features, NLTK offers wrappers for powerful NLP (Natural
Language Processing) libraries and a discussion forum for users to engage with
each other.
One of the main advantages of NLTK is its usability due to its hands-on
approach, which covers both topics in computational linguistics and programming
fundamentals. Moreover, NLTK offers comprehensive API documentation,
making it straightforward for users to access the resources they need. Many
users can use the platform's tools and resources because it is compatible with
Windows, Mac OS X, and Linux.
Overall, NLTK’s collection of resources and features makes it an excellent
foundation for creating Python applications that interact with human language
data and enable users to do challenging tasks with ease. Its open-source nature
and active community ensure that it remains a leading choice for users seeking
to build powerful, effective, and accessible language processing programs.
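To make these operations concrete, the short Python sketch below exercises
the tokenization, stop-word removal, stemming, and tagging features named
above; it is a minimal illustration, and the sample sentence and printed output
are ours rather than part of the project code.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads of the required NLTK data packages.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("averaged_perceptron_tagger")

sentence = "The system converts spoken English into Indian Sign Language."
tokens = word_tokenize(sentence)                  # split into word tokens
stops = set(stopwords.words("english"))
content = [w for w in tokens if w.isalpha() and w.lower() not in stops]
stemmer = PorterStemmer()
print([stemmer.stem(w) for w in content])         # words reduced to roots
print(nltk.pos_tag(tokens))                       # part-of-speech tags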

1.6.2 Finding ISL in Datasets


To effectively convert audio input into sign language animations, the words
must be split into letters. One way to achieve this segmentation is by using
the Punkt sentence tokenizer. To break a text into a list of sentences, this
tokenizer builds a model for abbreviated words, collocations, and sentence
starters using an unsupervised method. Before being put to use, the tokenizer
needs to be trained on a sizable amount of plaintext in the target language.
The easiest approach is to learn parameters from the given text with
PunktSentenceTokenizer(text), as pre-packaged models might not always be
appropriate.[3]
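A minimal sketch of this training step follows, assuming NLTK is installed;
the short training string is only a placeholder for the sizable plaintext corpus
recommended above.

from nltk.tokenize.punkt import PunktSentenceTokenizer

# In practice the training text should be a large corpus in the target
# language; this two-sentence string is a stand-in.
text = "The input is recorded first. It is then mapped to sign clips."
tokenizer = PunktSentenceTokenizer(text)   # unsupervised parameter learning
print(tokenizer.tokenize(text))            # list of detected sentences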
Once the words are split into letters, the corresponding sign language
gestures can be displayed, which are predefined in the dataset. However,
the accuracy of the animation depends on the quality of the dataset and
the ability of the system to recognize and map the gestures to the correct
letters and words. Thus, ongoing improvements in the dataset and animation
technology are crucial to the system’s effectiveness and accuracy.
Finally, we demonstrate the efficient conversion of audio input into sign lan-
guage animations using the Punkt sentence tokenizer and predefined sign language
gestures. However, the system’s accuracy is reliant on the dataset’s quality
and ongoing animation technology advancements. Our suggested method has
the potential to develop into a vital resource for encouraging accessibility and
inclusivity for India’s hearing-impaired population with further development
and improvement.[4]



CHAPTER 2

LITERATURE SURVEY

2.1 General Review


According to Amit Kumar Shinde, sign language recognition is one of the
most important studies on converting Marathi sign language into text and
vice versa. Sign language is also the most common and natural mode of
communication for persons who have hearing loss. Without an interpreter, a
hand gesture recognition system can assist those who are deaf in communicating
with hearing individuals; the device is functional in both the online (web
camera) and offline modes. In their research, Neha Poddar, Shrushti Rao,
Shruti Sawant, Vrushali Somavanshi, and Prof. Sumita Chandak looked into
the prevalence of deafness in India, where it is the second most common cause
of disability. For the deaf, a portable translation device that converts sign
language into comparable text and speech might be immensely beneficial and
solve many issues. Anbarasi Rajamohan, Hemavathy R., and Dhanalakshmi's
glove-based deaf-mute communication translator is an outstanding piece of
research. Five flex sensors, touch sensors, and an accelerometer are included
in the glove, and the controller compares each gesture to outputs that are
already stored. The interpreter was evaluated on the letters A, B, C, D, F,
I, L, O, M, N, T, S, and W. According to Neha V. Tavari, Dr. P. N. Chatur,
and A. V. Deorankar, many physically disabled persons rely on sign language
interpreters to communicate with others and convey their opinions. Their
project begins by capturing an image of a hand with a web camera. After
processing the picture, features are extracted and fed into a classification
method for recognition, and speech or text is generated from the identified
gesture. In this system, the flex sensor produces an analogue output that is
unreliable and
costly since it needs several circuits. In their 2007 study, Purushottam Kar
et al. created the INGIT system, which translates Hindi strings into Indian
Sign Language. It was developed specifically for the railway investigation
sector. FCG was used to implement Hindi grammar. A thin semantic structure
is created from the user input by the generation module. By passing this
input through ellipsis resolution, extra words are eliminated. Depending on
the type of phrase, the ISL generator module subsequently constructed an
appropriate ISL-tag structure. A HamNoSys converter was then used to create
a graphical simulation.

Over 60 years ago, Ali and colleagues created a domain-specific system that
required English text as input. The text was changed into ISL text, and then
those letters were changed into symbols in ISL. The system's architecture
comprised the following components: 1) a text translation input module; 2) a
tokenizer to break up the phrase's words; 3) a list of ISL symbols specifically
for railway concerns, where, if a phrase didn't have a matching sign assigned
to it in the repository, the synonym's sign was utilised; 4) a specially crafted
translator that mapped each word to its associated symbol and also screened
the words to be translated, removing any derogatory or abusive terms as well
as any words without a stored sign; and 5) a word adder that added the
words in the order they were entered.

A two-phase process for creating sign language was created by Vij et al. The
first stage concentrated on preprocessing Hindi sentences and converting them
into ISL grammar.

WordNet and the Dependency Parser were combined in this phase to achieve this.
Using dependency graphs, the Dependency Parser represented words as well as
the connections between head words and words that modify those heads. The
second step was using HamNoSys to convert this grammar into a number of
associated Sign Language symbols. The generated symbols are converted into
XML tags using SIGML. The XML tags form might then be read by a 3D
rendering application. A two-way ISL translation system was created by MS
Anand and colleagues. The input voice was initially processed by the noise
removal submodule in the speech-to-sign module. The speech recognizer used
the output as an input to translate the spoken words into a written word
sequence. A natural language converted the word sequence into a set of signs
by employing a rule-based approach. Finally, using a sign animation module
with text annotation, the signs were displayed. The Dasgupta et al. system
illustrates this. After being accepted as input, the English text was changed
into the proper ISL structure that adhered to grammatical conventions. Their
system's essential elements comprised the following: a) syntax parsing and
text analysis; b) representing the LFG f-structure; c) applying grammatical
rules; and d) generating acceptable ISL sentences. To create a dependency
structure, the Minipar Parser and parse tree were used to parse the input text.
An f-structure that encoded the grammatical relationship of the input sentence
is produced. When we say that the grammar of the input sentence is correct,
what we really mean is the subject, object, and tense of the sentence. We
presented this information as a set of attribute-value pairs. Each attribute was
named after the corresponding grammatical sign. By adhering to the proper
grammar transfer rules, the established English f-structure was converted to an
f-structure in Indian Sign Language. Noteworthy is the fact that the absence
of a formal, standard written orthography for ISL makes this system particularly
difficult to evaluate.



2.2 Basic Concept
To implement the proposed communication aid system for individuals with
hearing and speech impairments, we rely on several technologies. The audio
or text input is first transformed into a more comprehensible format before
being translated into sign language. Our system employs advanced animation
techniques to convert the audio/text into sign language animations that users
can easily comprehend. Users receive videos/clips of sign language animations
for the given input.
Despite the promising potential of our proposed system, there are still
some challenges to overcome. As with any speech recognition and animation
system, the quality of the input audio and text can significantly affect the
accuracy and effectiveness of the system. Additionally, our system’s reliance
on the ISL dictionary means that we need to keep updating it with new words
and phrases to ensure accurate translations.
However, our system’s ability to break down communication barriers and
promote inclusivity and accessibility makes it a vital tool for the hearing-
impaired population in India. As advancements in speech recognition and
animation technology continue, we are optimistic that our system will continue
to improve, ultimately creating a more equitable and inclusive society for all.[5]

2.3 Google Speech API


The most straightforward approach for recognising speech audio data is
to submit a synchronous recognition request to the Speech-to-Text API. Up
to one minute of synchronously requested voice audio data can be processed
via speech-to-text. Speech-to-Text only responds once it has processed and
recognised all of the audio. When a request is blocking (synchronous), Speech-
to-Text must respond before handling the following request. An audio file of
30 seconds is generally processed using speech-to-text in 15 seconds, which is
quicker than real-time. Your recognition request may take much longer if the
audio is of poor quality. The practice of eliminating undesirable or nonsensical
noise from input data, such as speech, is known as noise reduction.
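As an illustration, the sketch below issues a synchronous request through the
SpeechRecognition Python package, which wraps the Google Web Speech API;
the package and the WAV file name are assumptions for this example.

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:        # placeholder file name
    recognizer.adjust_for_ambient_noise(source)   # basic noise handling
    audio = recognizer.record(source)             # read the whole clip

try:
    # Blocks until the entire clip has been processed and recognised.
    print(recognizer.recognize_google(audio, language="en-IN"))
except sr.UnknownValueError:
    print("Speech was unintelligible")
except sr.RequestError as err:
    print("API request failed:", err)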



Figure 2.1: Speech to text conversion

Filtering, spectrum restoration, and other methods are examples of different


noise reduction strategies. The two noise reduction methods are modulation
detection and synchrony detection. Because the clarity of the sound may
not be guaranteed when the user’s or the average person’s voice is captured
using a computer or mobile phone’s microphone, it is transferred to the noise
reduction system.

2.4 Unsupervised Algorithm for Text Document


Text categorization and text clustering are two important techniques in
natural language processing. Text categorization involves assigning a text to a
fixed set of classes or categories. On the other hand, text clustering involves
grouping a collection of unlabeled texts together based on their similarities,
with the goal of placing similar texts in the same cluster and dissimilar texts
in different clusters.[6]

Figure 2.2: Unsupervised algorithm for text mining

TF-IDF, which stands for term frequency-inverse document frequency, is a


widely-used technique in information retrieval and text mining that helps to
determine the importance of a word in a document. This approach can be
used to represent textual information as a vector space model, which is useful
in a variety of applications. Google has been using TF-IDF as a ranking
criterion for content for a long time, giving more weight to term frequency
than keyword density. The TF-IDF method considers both the frequency of a
term (TF) and its inverse document frequency (IDF) in information retrieval.
Each term has a corresponding TF and IDF score. The product of the TF and
IDF scores for a term is used to weigh the term in any given content. This
method assigns a value to a keyword based on how frequently it appears in
the text, as well as its relevance to the entire corpus of information available
on the web.
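A small sketch of TF-IDF weighting using scikit-learn (an assumed dependency;
the three documents are illustrative) is given below; terms that occur in many
documents receive a lower IDF and therefore a lower weight.

from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "sign language helps deaf people communicate",
    "speech recognition converts audio to text",
    "text is mapped to sign language clips",
]

vectorizer = TfidfVectorizer()
weights = vectorizer.fit_transform(docs)   # rows: documents, columns: terms

# Inspect the learned inverse document frequencies term by term.
for term, idx in sorted(vectorizer.vocabulary_.items()):
    print(f"{term}: idf={vectorizer.idf_[idx]:.2f}")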

2.5 Natural Language Processing (NLP)


Natural Language Processing (NLP) is a branch of computer science that
falls under the umbrella of Artificial Intelligence (AI). The ultimate goal of
NLP is to enable computers to comprehend and interpret both spoken and
written forms of human language. Computational linguistics is an important
part of this field, as it involves rule-based modelling of human language. To
fully understand the meaning of language, including the speaker or writer’s
intent and sentiment, computational linguistics is combined with statistical,
machine learning, and deep learning models. By integrating these various
techniques, NLP seeks to create computer systems that can understand and
interact with human language in a more natural and intuitive way. Language
translation, speech recognition, and text summarization are just a few of the
computer applications that use NLP. Digital assistants, chatbots for customer
support, and voice-activated navigation systems are a few examples of NLP
applications that we come across on a regular basis. Moreover, NLP is essential
to commercial solutions that improve business processes, increase worker
productivity, and streamline essential company procedures.
Convolutional neural networks (ConvNets) are a type of deep learning
architecture capable of detecting spatial and temporal correlations in images
using the appropriate filters. ConvNets achieve a better fit to an image dataset
by sharing weights and minimising the number of parameters involved, and
the architecture may be trained to better comprehend the complexity of
images.
In conclusion, NLP and ConvNets are important technologies that improve
our capacity to interact with computers and carry out challenging image
processing operations. As these technologies advance, they will be used
more and more frequently across a range of sectors, including manufacturing,
healthcare, and finance. Our general quality of life will increase as a result
of their continuing growth and use, which will result in more effective and
productive workflows.[7]

2.5.1 NLP Task


Numerous NLP tasks are aimed at breaking down human text and voice
data to help computers better comprehend it. Some examples of these tasks
include:
One of the most commonly known NLP tasks is speech recognition, also
referred to as speech-to-text. This process involves accurately converting
voice input into text. Any application that responds to voice commands or
queries relies on speech recognition. However, speech recognition is particularly
challenging due to the way people speak, with variations in speed, intonation,
emphasis, dialects, and frequently using incorrect grammar.

Figure 2.3: Speaking tags

Part of speech tagging, or grammatical tagging, is the process of determining


the part of speech of a word based on its context and usage. For example,
the word ”make” in the phrases ”What make of car do you own?” and ”I can
make a paper aeroplane” is categorized as a noun and a verb, respectively.
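The sketch below reproduces this distinction with NLTK's default tagger;
exact tags can vary between tagger versions, so the expected labels are noted
only in the comments.

import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

for sentence in ["What make of car do you own?",
                 "I can make a paper aeroplane."]:
    print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# "make" should be tagged as a noun (NN) in the first sentence and as a
# verb (VB) in the second, although tagger output can vary.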
Sentiment analysis is another NLP task that involves scanning texts for
subtle elements such as attitudes, emotions, sarcasm, confusion, and mistrust.
It seeks to understand the overall sentiment expressed in the text and can be
helpful in determining how people feel about a particular topic or product.[8]
Figure 2.4: Sentiment analysis

The process of converting structured data into human language, known as
natural language generation, is frequently referred to as the opposite of speech
recognition or speech-to-text.

2.5.2 Natural Language Toolkit (NLTK)


Python is a popular programming language that offers an extensive set
of tools and packages for Natural Language Processing (NLP). Among these
resources, the Natural Language Toolkit (NLTK) is a widely used open-source
collection of libraries, tools, and resources for developing NLP applications.
These libraries provide a range of functionalities for various NLP tasks, in-
cluding sentence parsing, word segmentation, stemming, and lemmatization,
which involve reducing words to their roots. Tokenization libraries are also
available to split texts into smaller units like phrases, sentences, paragraphs,
and chapters, helping computers understand the text better. Additionally,
the NLTK includes libraries for advanced capabilities like semantic reasoning,
which enables the deduction of logical conclusions from textual data. The
NLTK is a potent tool for creating a precise and effective system of commu-
nication aids.This toolbox can be used to increase the precision of animation

Department of Information Technology 16


and speech recognition in our suggested system. These libraries can also be
used to add new words and phrases to the ISL lexicon, making it easier for
the system to understand and translate them.
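As a hedged illustration of these capabilities, the sketch below performs
lemmatization and a WordNet synonym lookup of the kind used for words
missing from the ISL lexicon; it assumes the NLTK WordNet data is available.

import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))   # -> "run"
print(lemmatizer.lemmatize("better", pos="a"))    # -> "good"

# WordNet synsets can suggest replacements for words that have no entry
# in the sign clip dataset.
synonyms = {lemma.name() for syn in wordnet.synsets("happy")
            for lemma in syn.lemmas()}
print(sorted(synonyms))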
The system’s precision and efficiency, meanwhile, also rely on the calibre
of the audio input. The performance of the system might be significantly
impacted by background noise and speaker accents; thus, it is essential that
speech recognition technology continues to progress.
In conclusion, a wide range of tools and frameworks are available in the
Python programming language that can be utilised to create powerful NLP
systems, like our suggested communication help system. The NLTK toolkit
is a useful tool that can increase the system’s accuracy, and ongoing work
on the ISL lexicon and speech recognition technologies can contribute to the
development of a more inclusive and equal society.

2.5.3 Deep Learning, Machine Learning, and Statistical NLP
The first NLP applications were created as hand-coded, rules-based systems
that could perform some NLP tasks. However, they struggled to scale to
manage the increasing volumes of text and audio data, as well as the many
exceptions they encountered. Statistical natural language processing (NLP)
entered the scene, utilizing deep learning models and computer algorithms to
automatically extract, classify, and label elements of text and speech input.
These systems can learn and improve over time, thanks to learning techniques
based on convolutional neural networks (CNNs) and recurrent neural networks
(RNNs). With these models, NLP systems can now extract increasingly
accurate meaning from enormous amounts of unstructured, raw, and unlabeled
text and speech data sets.



CHAPTER 3

SYSTEM ANALYSIS

3.1 Overview
The objective of this research is to develop a system that can translate audio
messages into Indian Sign Language (ISL). The system receives audio input,
converts it to text, and then displays appropriate ISL clips or preset GIFs.
This approach makes it easier for normal and deaf people to communicate
with one another.
The demand for better communication tools that deliver precise outcomes is
rising as technology develops. This study has a significant impact on the field
of speech to sign language translation, potentially bridging the communication
gap between the hearing and hearing-impaired communities. The equipment is
not simply for those who have hearing loss. It also has larger implications in
public spaces like hospitals, banks, train stations, and bus stops where vocal
people might not comprehend sign language.
The system's intended audience is everyone who wants to learn sign language
translation and facilitate better communication. By using this system, individuals
can acquire basic knowledge of sign language, which can improve their ability
to communicate with the hearing-impaired population.
In conclusion, the development of this system can significantly enhance
the quality of life for the hearing-impaired population in India and promote
inclusivity and accessibility in various sectors. With ongoing advancements and
improvements in the technology used for speech recognition and animation,
this system has significant potential for adoption and can become an essential
tool for creating a more inclusive and equitable society.

3.2 Existing System
There are currently no reliable models for translating text into Indian Sign
Language (ISL), despite the fact that sign language is a universal language
used by people with hearing and speech disabilities to overcome communication
hurdles. A sufficient audio-visual accompaniment to oral communication is
also absent.
There has been little progress made in the computerization of ISL, despite
great advancements in identifying the sign languages used by different countries.
British Sign Language (BSL) or American Sign Language (ASL) have been
the main subjects of earlier research in this area. Just a small number of ISL
systems have been created so far, and this lack of progress emphasises the
need for further funding and innovation in this field.
Our proposed system aims to address this gap by providing an efficient
means of converting input audio into animation, which enables speech to be
translated into ISL. With continuous improvements and advancements, this
system can become an essential tool for creating a more inclusive and accessible
society for individuals with hearing and speech impairments.
It is imperative to recognize that the development of such a system requires
continuous efforts to update and expand the ISL dictionary with new words
and phrases. The system's accuracy is also influenced by the calibre of the
audio input, and background noise and speaker accents can negatively impact
the system's performance. Hence, ongoing improvements
in speech recognition and animation technology are essential to enhance the
system’s accuracy and effectiveness.
In conclusion, the lack of proper computerization of ISL underscores the
need for further innovation and investment in this area. Our approach is
suggested to close the communication gap faced by people with hearing and
speech difficulties by providing an efficient means of converting input audio
into animation. With continuous efforts towards improvement, this system can
become an essential tool for creating a more inclusive and equitable society.[9]



3.3 Proposed system
Despite the various approaches used to develop communication systems, an
efficient system that supports Indian Sign Language is still lacking. We suggest
using a transfer-based translation system that maintains lexical and syntactic
knowledge while translating English text into Indian Sign Language. This
system's main objective is to assist those with hearing difficulties.
Audio to sign language conversion systems are uncommon, despite the fact
that several initiatives have concentrated on turning sign language into text
or audio output. The goal of this project is to present cutting-edge Python
technology that can convert audio into sign language. The system receives
auditory input, displays text on the screen, and then outputs the supplied
input's sign language or sign code.

Figure 3.1: Proposed System

During the conversion process, every word in the phrase is compared against
a collection of videos and GIFs that depict the words. If a word cannot be
found, the system breaks it down into its component letters and displays the
matching predefined videos or clips. The system's four main processes are:
input of audio or text, tokenization of the input, search for the word or its
letters within the dataset, and presentation of the videos or clips.
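A hedged sketch of this lookup step is given below; the dataset folder, the
one-clip-per-word file naming, and the helper names are hypothetical rather
than the project's actual layout.

import os

DATASET_DIR = "assets/isl"   # hypothetical folder of word and letter clips

def clips_for_word(word):
    """Return the clip for a word, or per-letter clips as a fallback."""
    word_clip = os.path.join(DATASET_DIR, word.lower() + ".mp4")
    if os.path.exists(word_clip):
        return [word_clip]
    # Fall back to fingerspelling: one predefined clip per letter.
    return [os.path.join(DATASET_DIR, ch + ".mp4")
            for ch in word.lower() if ch.isalpha()]

def clips_for_sentence(sentence):
    return [clip for word in sentence.split() for clip in clips_for_word(word)]

print(clips_for_sentence("how are you"))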
Both deaf people and hearing people can benefit greatly from this method.
It can help break down barriers to communication between hearing and hearing-
impaired people and allow the latter a way to express themselves more clearly.
However, the dataset containing the films and GIFs that serve as the word
representations must be updated frequently if this system is to remain effective.
Background noise and the speaker’s accent can further compromise the system’s
accuracy.[10]
In conclusion, the suggested audio to sign language translation technology
could alter how people with hearing loss communicate, including the wider
adoption of sign language in various sectors. With continuous improvements
and advancements in technology, this system can become an essential tool for
promoting inclusivity and accessibility in our society.

3.4 Hardware and Software Requirements


The system will require the essential tools necessary for system design and
implementation. The main standards are as follows.

3.4.1 Hardware requirements


1. The system requires around 50 gigabytes of disc space to be installed.

2. The computer must have an Intel Core i3 or Intel Atom CPU to run
the system.

3. Although not mandatory, having a GPU with at least 1GB of memory


is recommended for faster training and inference.

4. The system needs a minimum of 4GB of RAM to operate.

5. A microphone is necessary for providing speech input to the system.

6. A keyboard is needed for inputting data and commands.



3.4.2 Software requirements
1. The software can be used on both Linux and Windows operating systems.

2. The system requires Python version 3.6 or later.

3. Internet browsers, such as Chrome, are necessary for accessing the sys-
tem’s web interface.

4. Access to the Internet is required for using the system’s online features.

3.5 Advantages and Disadvantages

3.5.1 Advantages
The suggested technique has a number of benefits for improving commu-
nication between people who have speech and hearing problems. Its potential
to be used in higher-level applications is a key benefit. With audio in-
put, the system can extract audio features that can then be used in other
applications.[11]
Another benefit is that the system performs sign language translation more
quickly. Instead of analysing each sign independently in the dataset, it clusters
by extracted audio features, which is quicker. This speed makes interpersonal
communication more effective.
The system also offers more accurate results because it uses speech features
instead of metadata for sign language comparison. As a result, we can achieve
higher precision in our results.
Finally, the proposed system offers an easy-to-use user interface, making
it accessible to all individuals. The interface has been designed to be simple
and user-friendly, allowing normal users to retrieve their features without any
hindrance.
In conclusion, the proposed system offers several benefits, including faster
sign language translation, higher accuracy, and an easy-to-use interface. With
these advantages, the system can significantly improve communication between
individuals with hearing and speech impairments, and promote inclusivity and
accessibility in various sectors. Ongoing advancements in the technology used
for speech recognition and animation are necessary to improve the system’s
performance further.

3.5.2 Disadvantages
1. A small training dataset is currently used with the project, and it is stored
on a personal computer or in a folder. Although it could be expanded, the
project has several storage-related limitations.

2. File size and format limitations: while feature extraction is easier with files
of this type, the project can only be used with .mp4 files. Also, longer video
clips that go beyond the limit are difficult to analyse, since they require more
storage and processing power.



CHAPTER 4

SYSTEM DESIGN DETAILS

4.1 DFD with Detailed Explanation


A data flow diagram shows in graphic form the many aspects of an
information system’s functionality as data flows through it. A DFD typically
serves as the initial stage to provide a high-level overview of the system
without going into great detail; afterwards, more detail can be included. Data
flow diagrams can be used to demonstrate data processing. A DFD describes
the input, output, system path, and storage places for the data. In contrast
to traditional structured flowcharts that concentrate on control flow, a DFD
does not include information about process time or whether processes will
operate sequentially or concurrently. The logic data flow diagram uses four
simple notations to depict a process and a data store.
In our proposed system, we have employed the Gane and Sarson notation
to create the DFD. The process is represented by curved boxes, while the
square boxes denote external entities. The data storage is represented by the
rectangular open boxes, and the data flow is shown by the arrows.
By utilizing a DFD, our system can provide a clear overview of the data flow
and its processing aspects. This can help identify potential inefficiencies or areas
for improvement, leading to more effective system design and implementation.
In conclusion, a data flow diagram is a valuable tool for modelling the
flow of data within an information system. Our proposed system employs
this technique to provide an overview of the data flow and processing aspects.
This can facilitate the identification of potential improvements, leading to more
effective system design and implementation. There are several layers in the
dataflow diagram. The context level, also known as the level 0 DFD, depicts
the whole piece of software as a single unit. Level 1, the level above, includes
additional process and information flow components. Any sophisticated Level 1
process will be further divided into subfunctions at Level 2, and so on.[12]

Figure 4.1: Level 0 DFD

Figure 4.2: Level 1 DFD

4.1.1 UML diagrams


The Unified Modelling Language (UML) is a strong and popular modelling
language used in software engineering that provides a uniform method to define
system design. Developers can view a system’s architectural blueprints in a
diagram using UML, which includes crucial information such particular system
components, how those components interact with one another, and how the
system performs overall.
UML was initially developed for object-oriented design documentation, but
it has now been broadened to accommodate a number of design documentation
use cases. The external user interface of the system as well as how entities
interact with other components and interfaces may all be shown in UML
diagrams. UML has become a powerful tool in many situations, outside of
software engineering, due to its flexibility and adaptability.
By providing a standardized language and visual representation, UML
can help teams communicate and collaborate more effectively. It can also
aid in identifying potential design issues and inconsistencies early on in the
development process, improving the overall quality of the final product.
In conclusion, UML is a powerful tool for software developers and engineers
that provides a standardized way to represent the design of a system. Its
flexibility and versatility have made it useful in a wide range of design docu-
mentation use cases. With its ability to aid in communication, collaboration,
and issue identification, UML is an essential component of modern software
engineering.

Figure 4.3: UML diagram for system

4.1.2 Flow Diagram

4.2 Proposed Algorithm


1. Register or log in

2. Request input from the user

3. Apply NLTK on the input

4. Display the appropriate sign language.

4.2.1 Algorithm
1. Begin by launching the web application.



Figure 4.4: Flow Diagram
2. Either register for a new account or log in to an existing one.

3. Input the desired text into the system or speak into the microphone for
voice input.

4. Select the Submit option to process the input.

5. The system will begin processing the input data.

6. Once completed, the animation display will show a start button.

7. The system will display the outcome in a format appropriate to the


input data.

8. Close the web application when finished.

[13]
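The Python sketch below strings these steps together; every name in it is
illustrative, the clip lookup mirrors the hypothetical layout from Section 3.3,
and microphone capture assumes the SpeechRecognition and PyAudio packages.

import os
import nltk
import speech_recognition as sr
from nltk.tokenize import word_tokenize

nltk.download("punkt")
DATASET_DIR = "assets/isl"   # hypothetical clip folder (see Section 3.3)

def audio_to_text():
    """Step 3 (voice input): capture one utterance from the microphone."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio, language="en-IN")

def clips_for_word(word):
    path = os.path.join(DATASET_DIR, word + ".mp4")
    if os.path.exists(path):
        return [path]
    return [os.path.join(DATASET_DIR, c + ".mp4") for c in word if c.isalpha()]

def translate(text):
    """Steps 5-7: tokenize the input and map each token to ISL clips."""
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    return [clip for token in tokens for clip in clips_for_word(token)]

if __name__ == "__main__":
    spoken = audio_to_text()        # or accept typed text instead
    print("Recognized:", spoken)
    print("Clips to play:", translate(spoken))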



Figure 4.5: Block diagram for the Audio to Sign Language Translator



CHAPTER 5

IMPLEMENTATION

5.1 Overview
Individuals with hearing impairments face challenges in communicating with
the majority of the population who use spoken language as their primary means
of communication. This communication gap can lead to social isolation, limited
employment opportunities, and reduced access to education and healthcare.
There is now a chance to close this communication gap and enhance the
lives of people with hearing loss thanks to the advancement of multimedia,
animation, and other computer technologies.
For those who have hearing loss, sign language—a visual and gesture
language—is their main form of communication. The bulk of the hearing
population could not, however, be conversant in sign language, which might
widen the communication gap even further.
We offer a solution to this issue that records audio as input using WebKit
Speech Recognition and uses the Google Speech API to transform audio into
text. The text is subsequently divided into smaller, more digestible parts
using Natural Language Processing (NLP), and then a dependency parser is
used to examine the sentence's grammatical structure and create word links.
The algorithm then transforms the text into sign language clips or videos.
By leveraging computer technology to close the communication gap between
those with hearing loss and the general public, our proposed system has
significant potential to promote inclusivity, accessibility, and equity across
various sectors. Improved access to education, healthcare, and employment
can lead to a more equitable and inclusive society.[14]

5.1.1 Forms of Input
Our suggested method is designed to accept inputs in a variety of formats: both live speech and typed text are acceptable forms of input. This gives users the freedom to select the input format that best meets their needs and preferences.
The system’s ability to accept input in different formats makes it versatile
and applicable in various settings. For instance, users who prefer to communi-
cate through written texts can conveniently input text into the system, while
those who prefer to communicate through speech can use the live speech input
feature.
Additionally, our system’s compatibility with multiple input formats en-
hances its usability in different settings, such as education and healthcare. For
example, teachers can use the text input feature to communicate with hearing-
impaired students during class sessions, while healthcare providers can use the
live speech input feature to communicate with hearing-impaired patients during
consultations.
Moreover, our proposed system’s flexibility in input formats aligns with
the goal of promoting inclusivity and accessibility in society. It eliminates
the barriers that limit communication for the hearing-impaired population and
enables them to interact and communicate seamlessly with the rest of the world.
In conclusion, a key component that improves the usability and accessibility
of our suggested system is its adaptability to accept a variety of input types.
Our method can assist in closing the communication gap between hearing-impaired and hearing people, thus fostering a more equal and inclusive society.

5.1.2 Speech Recognition


For live speech-to-text conversion, our system uses the PyAudio Python package, which is well suited to recording audio on a wide range of platforms. Once the live speech is captured, it is converted into text by passing it through the Google Speech API, which transcribes audio accurately using neural network models.



For lengthier audio files, the audio is broken into smaller segments based on the presence of silence, which enhances the accuracy and efficiency of the transcription process. The Google Speech API is then used to convert each segment into text.
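As an illustration of this step, the sketch below captures live speech through PyAudio (via the SpeechRecognition package, which wraps it for microphone access) and sends it to Google's recognizer. This is a plausible rendering of the approach described above under those assumptions, not the project's exact code.

    import speech_recognition as sr   # wraps PyAudio for microphone capture

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # calibrate to background noise
        audio = recognizer.listen(source)             # record until a pause is detected
    try:
        text = recognizer.recognize_google(audio)     # send the audio to the Google Speech API
        print("Transcription:", text)
    except sr.UnknownValueError:
        print("Speech was unintelligible")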
Thanks to this technology, our system can offer effective communication to those with hearing and speech difficulties. By animating sign language from live speech, our method can bridge the communication gap between hearing and hearing-impaired people, significantly enhancing the quality of life of the hearing-impaired population and promoting inclusivity and accessibility across various sectors.
However, continuous updates and advancements in the technology used
for speech recognition and animation are necessary to improve the system’s
accuracy and effectiveness. With ongoing improvements, our system can
become an essential tool for promoting inclusivity and accessibility across
various sectors and creating a more inclusive and equitable society.

5.1.3 Pre-processing of text


Filler words in the English language, such as "um," "ah," and "like," are commonly used to fill gaps in sentences but add little meaning or context to the overall message. With over 30 such filler words in English, they can detract from the clarity and coherence of a message.
Our proposed system aims to remove filler words from sentences and
enhance the overall meaning of the message. By eliminating these unnecessary
words, the system can improve the efficiency of communication and save time.
The system achieves this by using natural language processing techniques to
identify and remove filler words from the input text.
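A minimal sketch of this step is shown below. The filler list here is an illustrative subset, not the system's full set of thirty-plus words.

    from nltk.tokenize import word_tokenize

    FILLERS = {"um", "uh", "ah", "like", "basically", "actually"}   # illustrative subset

    def remove_fillers(sentence):
        # Keep alphabetic tokens that are not known filler words.
        tokens = word_tokenize(sentence)
        kept = [t for t in tokens if t.isalpha() and t.lower() not in FILLERS]
        return " ".join(kept)

    print(remove_fillers("Um, I would, like, want some water"))
    # -> "I would want some water"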
The elimination of filler words is especially critical in certain sectors, such
as public speaking and media, where clear and concise communication is
essential. By removing filler words, speakers can improve the overall quality
of their message, making it more compelling and impactful.
However, the identification and removal of filler words require continuous development and improvement in natural language processing techniques. These techniques must account for variations in speech patterns, accents, and languages to ensure accurate and effective removal of filler words.
In conclusion, our proposed system offers a means of improving the clarity
and coherence of messages by removing filler words from sentences. With fur-
ther development and refinement, this system can have significant applications
in public speaking, media, and other sectors where effective communication is
critical.

5.1.4 Porter Stemming Algorithm


The Porter Stemming algorithm is a popular practical method for resolving word conflation. It comes from Natural Language Processing (NLP), the discipline of computer science that enables machines to comprehend human language. Porter Stemming, a well-known NLP technique, was first presented in 1980. It is renowned for being quick and simple, which makes it well suited to information retrieval and data mining, and it produces more accurate results with a lower error rate than many other stemming algorithms.
To strip English words of their inflectional and derivational suffixes and prefixes and return them to their root or base form, our suggested system employs Porter's stemming algorithm. For instance, the word "agree" is the root of words like "agrees," "agreeable," and "agreement." By employing stemming, the technique can shorten the time it takes to locate the sign language equivalent of a given sentence.
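The NLTK implementation can be used directly, as in the short sketch below. Note that Porter stems are truncated roots rather than dictionary words (for example, "agrees" stems to "agre"), which is acceptable because the stems serve only as lookup keys.

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    for word in ["agrees", "agreed", "agreeable", "agreement"]:
        print(word, "->", stemmer.stem(word))   # print each word with its Porter stem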
The system uses the Porter Stemming method to analyse natural language input more quickly. By simplifying each word to its base form, the system can match input text with the appropriate sign language more quickly and accurately, which improves its responsiveness, accuracy, and usefulness for those with speech and hearing impairments.
In summary, the Porter Stemming method is a useful component of our suggested approach. It greatly enhances the effectiveness and efficiency of the system's language processing, making it a valuable communication tool for those with speech and hearing impairments. The system's potential applications also extend beyond communication aids to data mining, information retrieval, and other related domains.

5.1.5 Text to Sign Language


After receiving the processed text phrase from the prior stage, our system loops through each word and searches the local system for the associated sign language video sequence. If the word is found, the system plays the corresponding visual sequence. If the word is not found, the system divides it into letters and plays the relevant sign video clip for each letter.
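The lookup logic can be sketched as follows; the directory layout and file names (for example, signs/hello.mp4) are assumptions made for illustration.

    import os

    SIGN_DIR = "signs"   # hypothetical folder of ISL clips, one per word or letter

    def clips_for_word(word):
        # Play the word's own video if one exists; otherwise fall back to
        # fingerspelling with one letter clip per character.
        word_clip = os.path.join(SIGN_DIR, word + ".mp4")
        if os.path.exists(word_clip):
            return [word_clip]
        return [os.path.join(SIGN_DIR, letter + ".mp4") for letter in word]

    def clips_for_sentence(words):
        return [clip for w in words for clip in clips_for_word(w.lower())]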
The ISL lexicon has to be updated often to maintain the system’s correct-
ness and efficacy. As new words and phrases are introduced, they must be
added to the system’s database. Additionally, the quality of the input audio
significantly impacts the system’s performance. Noise reduction techniques and
accent recognition technology can be integrated to enhance the system’s ability
to accurately recognize and translate speech.
Our proposed system’s potential for wider adoption extends beyond the
hearing-impaired population in India. It can be utilized globally to promote
inclusive communication and break down language barriers. Moreover, the
system’s use can extend beyond real-time communication, as it can be used as
an educational tool to teach sign language and improve communication skills.
In conclusion, our system offers an efficient and accessible means of commu-
nication for those with hearing and speech impairments. By converting audio
into sign language video sequences, it can help break down communication
barriers and promote inclusivity in various sectors. However, continuous up-
dates and improvements to the system’s technology and database are necessary
to ensure its accuracy and effectiveness.

5.2 Technologies used

5.2.1 HTML (Hyper Text Markup Language)


HTML, or Hypertext Markup Language, is the standard markup language used to create web pages that can be viewed in web browsers. It can be supported by other technologies, such as Cascading Style Sheets (CSS) and scripting languages like JavaScript. When HTML documents arrive from a web server or local storage, web browsers render them into multimedia web pages. HTML provides a semantic structure for web pages and also contains cues for document presentation. HTML elements are the fundamental building blocks of web pages. They make it possible to include interactive forms, images, and other elements in the page being created. HTML is used to organise the semantics of text elements such as headings, paragraphs, lists, links, and quotations. HTML elements are delimited by tags, written using angle brackets. Some tags, such as <img> and <input>, insert content into the page directly; other tags, such as <p>, enclose and provide information about the document text and may contain other tags as sub-elements.
Browsers use HTML tags to interpret the content of a web page even though they do not display the tags themselves. With its semantic structure benefiting accessibility, search engine optimisation (SEO), and other web development tasks, HTML has become the standard language for building web pages. Owing to HTML's broad popularity and the development of web technologies, it is now an essential tool for building engaging and dynamic websites.
However, as web technologies continue to evolve, so does the need for
continuous updates and improvements to HTML. The development of new
standards, like HTML5, seeks to address the challenges faced by web developers
in creating modern, responsive, and accessible web pages. The evolution of
HTML is critical for creating better user experiences and advancing the web’s
capabilities.
In summary, HTML is a crucial component of modern web development,
providing a semantic structure for web pages that allows for accessibility, SEO,
and other web development tasks. As web technologies continue to evolve,
the development of new standards will play a significant role in creating more
engaging and interactive websites, ensuring that the web remains a dynamic
and innovative platform.



5.2.2 CSS (Cascading Style Sheets)
A document created in a markup language like HTML is presented using
a style sheet language called Cascading Style Sheets (CSS). CSS is one of
the founding technologies of the World Wide Web, along with HTML and
JavaScript. CSS is used to separate presentation, including layout, colour, and font, from content. This separation can improve the content's accessibility, give more freedom and control in the specification of presentation characteristics, and enhance the quality of the information. Different web pages can share formatting by specifying the necessary CSS in a separate .css file, which reduces complexity and duplication in the content's structure and allows the .css file to be cached to speed up page loading across the pages that share it. The separation of formatting and content also enables HTML pages to be rendered in a variety of styles for different rendering methods, such as on-screen, in print, via voice (using a screen reader or a browser with speech capabilities), and on Braille-based tactile devices. In addition, CSS offers formatting rules for alternative layouts when a user accesses the content on a mobile device.

5.2.3 Python
Python is a well-liked high-level programming language that prioritises the
code’s readability and usability. It was developed with the goal of assisting
programmers in writing logical and clear code for both small and large projects.
Python’s object-oriented approach and language structures support a variety of
programming paradigms, including structured, object-oriented, and functional
programming. The language’s dynamic typing and garbage collection make it
easier to work with and free developers from onerous memory management
responsibilities. Several people refer to Python’s vast standard library as a
”batteries included” feature. It provides developers with access to a variety
of modules and tools that can speed up and improve the efficiency of their
project completion. Python was made possible because Guido van Rossum
created the well-known high-level programming language Python in the late

Department of Information Technology 35


1980s to take the role of the ABC programming language. Version 0.9.0 of
Python, the first release, was made available in 1991. List comprehensions,
cycle-detecting garbage collection, reference counting, and support for Unicode
were among the new features added to Python 2.0, which was published in
2000.
Python 3.0, which was released in 2008, was a significant change that was
only partially backwards compatible with earlier iterations. It was challenging
for some programmers to move to Python 3, and it took the new version a
long time to gain popularity. However, Python 3's improved features, such as a larger standard library and more extensive Unicode support, have made it the more desirable option for developers.
Python 2 was discontinued in 2020 with version 2.7.18, and programmers were encouraged to move to Python 3. Python remains one of the most popular programming languages and is used in a variety of fields, such as web development, data science, machine learning, and artificial intelligence. Its community support, versatility, and simplicity make it a strong choice for both beginning and seasoned developers.

5.2.4 Django framework


Popular web development framework Django makes it easier to create
intricate, database-driven websites. By allowing developers to employ pre-built
components in a pluggable architecture, it promotes modularity, reusability, and
quick development. The ”Don’t Repeat Yourself” approach, which discourages
code duplication and encourages low coupling, lies at the heart of the Django
design philosophy.
Python is the primary programming language used in Django, which makes
it easy for developers to maintain and extend their projects. Even the settings,
files, and data models in Django are written in Python, providing consistency
throughout the project.
Django's built-in administrative interface is another feature that sets it apart from other web frameworks. This interface allows developers to create, read, update, and delete database records through a web-based interface, without needing to write any additional code. The interface is automatically generated using introspection and can be customized using admin models.
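For example, a model can be exposed in the admin with only a few lines. The Phrase model below is a hypothetical stand-in for whatever the project actually stores, not code from the project itself.

    # models.py -- a hypothetical model
    from django.db import models

    class Phrase(models.Model):
        text = models.CharField(max_length=200)
        created = models.DateTimeField(auto_now_add=True)

    # admin.py -- registering the model generates the CRUD interface automatically
    from django.contrib import admin
    # from .models import Phrase   (when the model lives in a separate file)

    @admin.register(Phrase)
    class PhraseAdmin(admin.ModelAdmin):
        list_display = ("text", "created")   # customise the change-list columns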
Django is widely adopted and used by many popular websites, including
Instagram, Mozilla, Disqus, Bitbucket, Nextdoor, and Clubhouse. These sites
showcase Django’s ability to handle large amounts of data and complex business
logic while maintaining scalability and reliability.
In conclusion, Django’s pluggable architecture, consistent use of Python,
and built-in administrative interface make it an ideal choice for developing
complex web applications. Its popularity and adoption by well-known websites
demonstrate its reliability and scalability.

5.3 NLTK Library


NLTK has been described both as "an exceptional library to play with natural language" and as "a wonderful tool for teaching, and working in, computational linguistics using Python".

5.3.1 Word Tokenization


In natural language processing, word tokenization is the process of splitting sentences into individual words. NLTK's word_tokenize() method is widely used for this task, and its output can be converted into a data frame for easier handling in machine learning applications. The output of word tokenization can also be used for stemming, removing punctuation, and removing numeric characters from text.
Machine learning models require numeric data for training and prediction, so word tokenization is an essential step in turning text (a string) into numeric data. This can also be accomplished with a Bag of Words or CountVectorizer strategy.
To better understand word tokenization in practice, we provide an example of NLTK's word_tokenize function below. By leveraging the power of word tokenization, we can enhance the effectiveness of our communication aid system by converting input audio into a format that machine learning models can process and analyse. This can lead to improved accuracy and performance in translating speech into Indian Sign Language.
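A minimal example, assuming the punkt model has been downloaded with nltk.download("punkt"):

    from nltk.tokenize import word_tokenize

    sentence = "How are you today?"
    print(word_tokenize(sentence))
    # ['How', 'are', 'you', 'today', '?']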

5.3.2 Elimination of Stop Words


In Information Retrieval, it is crucial to remove unnecessary words in order
to focus on the ones that convey meaning. These unnecessary words can
include various parts of speech such as coordinating conjunctions, possessive
endings, modals, foreign words, some determiners, comparative and superlative
adjectives, plural nouns and proper plural nouns, particles, symbols, interjec-
tions, and non-root verbs. The process of removing these words is known as
stopword removal, and it helps to reduce the dimensionality of the data and
improve computational efficiency in NLP tasks such as text classification and
clustering.
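With NLTK's built-in stopword list, this step looks roughly as follows (assuming nltk.download("stopwords") has been run); the example sentence is illustrative.

    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = set(stopwords.words("english"))
    tokens = word_tokenize("the weather is nice today")
    content = [t for t in tokens if t.lower() not in stop_words]
    print(content)   # ['weather', 'nice', 'today']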

5.3.3 Lemmatization and Synonym replacement


Phrases in Indian Sign Language use root words, so we apply Porter Stemmer rules to transform words into their root form. Each word is also looked up in a bilingual dictionary; if no match can be made, a synonym with the same part of speech is assigned.
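A sketch of the synonym step using WordNet is given below; in_isl_dictionary is a hypothetical placeholder for the system's bilingual ISL dictionary lookup.

    from nltk.corpus import wordnet as wn

    def replace_with_synonym(word, pos, in_isl_dictionary):
        # If the word is missing from the ISL dictionary, try a WordNet
        # synonym with the same part of speech.
        if in_isl_dictionary(word):
            return word
        for synset in wn.synsets(word, pos=pos):        # e.g. pos=wn.VERB
            for lemma in synset.lemma_names():
                if lemma != word and in_isl_dictionary(lemma):
                    return lemma
        return word   # fall back to the original word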

5.3.4 WordNet
WordNet is a lexical database that organizes words into synsets, or groups of cognitive synonyms, with lexical and semantic connections between them. It is available in over 200 languages and can be accessed through a freely available online browser, which displays word definitions, synsets, and the semantic relationships between words.
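For instance, synsets and their definitions can be inspected programmatically (assuming nltk.download("wordnet") has been run):

    from nltk.corpus import wordnet as wn

    for synset in wn.synsets("sign")[:3]:
        print(synset.name(), "-", synset.definition())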

5.3.5 Punkt
Punkt is designed to learn parameters, such as a collection of abbreviations, automatically from a corpus associated with the target domain. The pre-packaged models may therefore not be suitable for every domain; in such cases, PunktSentenceTokenizer can be used to learn the parameters directly from the given text.
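A sketch of training Punkt on in-domain text is shown below; domain_corpus.txt is a hypothetical file standing in for a corpus from the target domain.

    from nltk.tokenize.punkt import PunktSentenceTokenizer

    with open("domain_corpus.txt") as f:
        domain_text = f.read()                        # hypothetical in-domain corpus

    tokenizer = PunktSentenceTokenizer(domain_text)   # learns abbreviations, etc.
    print(tokenizer.tokenize("Dr. Smith arrived. He taught I.S.L. today."))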



CHAPTER 6

TESTING AND RESULT

6.1 Testing
The primary objective of testing, an essential part of software quality
assurance, is to execute a programme and discover bugs. It plays a crucial
part in ensuring software quality by offering the last word on the definition,
design, and code of the programme. System testing, which evaluates the
system as a whole and looks for any inconsistencies and deviations from the
original design, is an essential step in the testing process.
The proposed system must pass a variety of tests to ensure that it is
functional and efficient before it is ready for user acceptability testing. A
successful test will discover an error that hasn’t yet been discovered, and a
good test case will have a fair chance of doing so. Testing poses an intriguing
challenge for software engineers since it is still possible to overlook certain
errors or problems, even with thorough testing. To increase the precision
and efficiency of the testing process, it is essential to continually improve the
testing processes and instruments.
Testing is a crucial part of software quality assurance since it helps to
confirm that the software system works as planned. System testing evaluates
the programme as a whole, and the testing process helps identify flaws and
issues. Because the testing procedure is not error-proof, procedures and tools
must always be enhanced to produce the finest outcomes.

6.1.1 Testing Objectives


1. Running a piece of software in search of errors is known as testing.

2. A good test case has a high probability of identifying an error that has not yet been found.

3. A test is successful if it identifies a mistake that has gone undetected.

6.1.2 Testing principles


All tests in software testing must be able to be linked to end-user require-
ments. Testing should be planned thoroughly in advance and should begin on
a modest scale before expanding to a larger one. Exhaustive testing is not practical, and testing works best when carried out by an independent third party.
The primary objective of test case design is to provide a set of tests that
have the highest probability of detecting software bugs. Two different test
case design approaches, referred to as white box testing and black box testing,
are employed to achieve this goal.
To ensure that every programme statement and condition has been executed
at least once, test cases are constructed from the program’s control structure
for white box testing. Black box testing, on the other hand, is designed to
validate functional requirements without taking into account how a programme
is internally structured. In order to provide thorough test coverage, it primarily
concentrates on the software’s information domain and test cases are developed
by partitioning input and output.
Black box testing is designed to find problems such as missing functions, interface problems, data structure defects, and functional logic flaws. It is crucial to remember, though, that exhaustive testing is impossible, and the best testing is done by an impartial third party.
In summary, efficient software testing requires advance planning, starting with small-scale testing and scaling up. For greatest efficacy, testing should be carried out by independent third parties, and test cases should be traceable back to end-user needs. Moreover, using both white box and black box testing approaches makes it possible to find different kinds of software flaws and thereby raise the product's quality.



6.2 System Testing Plan

Figure 6.1: Testing Plan

6.3 Screenshots
The screenshot in Figure 6.2 shows the home page of the website, which provides the Home, Register, Login, and Converter options.



Figure 6.2: Screenshot of the Home
The screenshots below show the Signup and Login pages of the website, where users can register and log in to the system.

Figure 6.3: Screenshot of the Sign up

Figure 6.4: Screenshot of the Login

The screenshot below shows the converter tool, which translates audio or text into sign language. Some of the outputs are displayed in the screenshots that follow.



Figure 6.5: Screenshot of the Converter

Figure 6.6: Signifies “how are you” in sign language

Figure 6.7: Signifies “where are you” in sign language



Figure 6.8: Signifies “where are you” in sign language



CHAPTER 7

CONCLUSION AND FUTURE SCOPE

7.1 Conclusion
Many people in the country who are deaf or have speech or hearing impairments communicate primarily in Indian Sign Language (ISL). Given the difficulties associated with reading and understanding written messages, sign language presents a more favoured and practical means of communicating. In sign language, words, emotions, and sounds are expressed using hand gestures, lip movements, and facial expressions.
Little attention has been paid to the Python programming language's potential as a communication tool for people with speech and hearing impairments. By providing effective means of communication for those with hearing and speech impairments, our suggested solution seeks to close this gap. The technology helps Indians who have hearing loss access information more easily and also works as a teaching aid for ISL. With the help of our approach, people with disabilities can communicate with the rest of society and express themselves clearly.
Our method efficiently translates speech into sign language by transforming input audio into an animation. This can help remove barriers to communication between hearing and hearing-impaired people. This method also has a strong chance of adoption in fields such as employment, healthcare, and education, where the need for more inclusive communication tools is greatest.
However, continuous updates to the ISL dictionary are necessary to ensure
the system’s accuracy and effectiveness. Additionally, the quality of the input
audio, including background noise and speaker accents, can affect the system’s
performance. Thus, ongoing advancements in speech recognition and animation
technology are necessary.
In conclusion, our proposed system can significantly enhance the quality of life for the hearing-impaired population in India and promote inclusivity and accessibility across various sectors. With continuous development and improvement, our system can become an essential tool for creating a more inclusive and equitable society.

7.2 Future Scope


In the future, the suggested method will be tested on sentences it has never encountered before. The machine translation technique will also be evaluated using parallel corpora of texts in both ISL and English, and the system's performance will be assessed against a set of criteria, with the ISL corpus used in particular for assessing ISL phrases. By leveraging AI, automated sign language translation systems can help people who are deaf overcome communication difficulties: in circumstances where human interpreters are not available, they can provide automated real-time translation, text-based systems, personal assistants, and search over sign language video material. To boost accessibility and make the system cross-platform, it may be deployed using a variety of front-end solutions, such as an Android or .NET application. The system can also incorporate awareness of body language and facial expressions to ensure that the context and tone of the input speech are fully understood. The application will be accessible to more people if it is available on mobile devices and the web. To create a two-way communication system, a hand gesture detection system employing computer vision may also be added. By developing a comprehensive solution that caters to the needs of people who are deaf and hard of hearing, we can help bridge the communication gap.



REFERENCES

[1] Nasser H Dardas and Nicolas D Georganas. “Real-time hand gesture de-
tection and recognition using bag-of-features and support vector machine
techniques”. In: IEEE Transactions on Instrumentation and measurement
60.11 (2011), pp. 3592–3607.
[2] Anand Ballabh and Umesh Chandra Jaiswal. “A study of machine trans-
lation methods and their challenges”. In: Int. J. Adv. Res. Sci. Eng 4.1
(2015), pp. 423–429.
[3] Purva C Badhe and Vaishali Kulkarni. “Indian sign language translator
using gesture recognition algorithm”. In: 2015 IEEE international con-
ference on computer graphics, vision and information security (CGVIS).
IEEE. 2015, pp. 195–200.
[4] Taner Arsan and Oğuz Ülgen. “Sign language converter”. In: International
Journal of Computer Science & Engineering Survey (IJCSES) 6.4 (2015),
pp. 39–51.
[5] Mohammed Elmahgiubi, Mohamed Ennajar, Nabil Drawil, and Mohamed
Samir Elbuni. “Sign language translator and gesture recognition”. In: 2015
Global Summit on Computer & Information Technology (GSCIT). IEEE.
2015, pp. 1–6.
[6] M Mahesh, Arvind Jayaprakash, and M Geetha. “Sign language translator
for mobile platforms”. In: 2017 International Conference on Advances in
Computing, Communications and Informatics (ICACCI). IEEE. 2017,
pp. 1176–1181.
[7] Neha Poddar, Shrushti Rao, Shruti Sawant, Vrushali Somavanshi, and
Sumita Chandak. “Study of sign language translation using gesture recog-
nition”. In: International Journal of Advanced Research in Computer and
Communication Engineering 4.2 (2015).
[8] Anbarasi Rajamohan, R Hemavathy, and M Dhanalakshmi. “Deaf-mute
communication interpreter”. In: International Journal of Scientific Engi-
neering and Technology 2.5 (2013), pp. 336–341.
[9] Yogeshwar I Rokade and Prashant M Jadav. “Indian sign language recog-
nition system”. In: International Journal of engineering and Technology
9.3 (2017), pp. 189–196.
[10] Shreyashi Narayan Sawant and MS Kumbhar. “Real time sign lan-
guage recognition using pca”. In: 2014 IEEE International Conference on
Advanced Communications, Control and Computing Technologies. IEEE.
2014, pp. 1412–1415.
[11] Madhuri Sharma, Ranjna Pal, and Ashok Kumar Sahoo. “Indian sign
language recognition using neural networks and KNN classifiers”. In:
ARPN Journal of Engineering and Applied Sciences 9.8 (2014), pp. 1255–
1259.

[12] Amitkumar Shinde and Ramesh Kagalkar. “Sign language to text and
vice versa recognition using computer vision in Marathi”. In: International
Journal of Computer Applications 975 (2015), p. 8887.
[13] Stephanie Stoll, Necati Cihan Camgoz, Simon Hadfield, and Richard
Bowden. “Text2Sign: towards sign language production using neural ma-
chine translation and generative adversarial networks”. In: International
Journal of Computer Vision 128.4 (2020), pp. 891–908.
[14] Neha V Tavari, AV Deorankar, and P Chatur. “A review of literature
on hand gesture recognition for Indian Sign Language”. In: International
Journal 1.7 (2013).

