
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

JNANA SANGAMA, BELAGAVI - 590018, KARNATAKA

A
PROJECT PHASE 1
REPORT ON

“MULTILINGUAL CONVERSION OF SIGN


LANGUAGE TO TEXT”
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Engineering in Information Science and Engineering of Visvesvaraya
Technological University, Belagavi.

Submitted by
SHRADDHA S J 1AM22IS102
SINDHU M K 1AM22IS110
VIBHA GUNAGA 1AM22IS122
Y G SNEHA 1AM22IS124
Under the Guidance of
MANJULA DEVI P
Assistant Professor
Dept. of ISE, AMCEC

Department of Information Science and Engineering


AMC ENGINEERING COLLEGE
18th K.M, Bannerghatta Main Road, Bangalore-560 083

2024-2025
AMC ENGINEERING COLLEGE
DEPARTMENT OF INFORMATION SCIENCE AND
ENGINEERING
Accredited by NAAC A+ and NBA
18 Km, Bannerghatta Road, Bangalore-560083

CERTIFICATE
Certified that the project work entitled: “MULTILINGUAL CONVERSION OF
SIGN LANGUAGE TO TEXT” has been successfully completed by SHRADDHA S
J (1AM22IS102), SINDHU M K (1AM22IS110), VIBHA GUNAGA (1AM22IS122) and
Y G SNEHA (1AM22IS124), all bonafide students of AMC Engineering College,
Bengaluru in partial fulfilment of the requirements for the award of degree in Bachelor
of Engineering in Information Science and Engineering of Visvesvaraya
Technological University, Belagavi, during the academic year 2024-2025. The project
phase 1 report has been approved as it satisfies the academic requirements in respect of
project work for the said degree.

Manjula Devi P                 Dr. R. Amutha                 Dr. Yuvaraju.B.N
Assistant Professor            Professor and Head            Principal, AMCEC
Project Guide                  Department of ISE
DECLARATION
We, the students of 3rd year Information Science and Engineering, AMC Engineering
College, Bengaluru, hereby declare that the project entitled “Multilingual Conversion of Sign
Language to Text” has been independently carried out by us under the guidance of
Manjula Devi P, Assistant Professor, Information Science and Engineering, AMC
Engineering College, Bengaluru and submitted in partial fulfillment of the requirements
for the award of the degree in Bachelor of Engineering in Information Science and
Engineering of the Visvesvaraya Technological University, Belagavi during the academic
year 2024-2025.

We also declare that, to the best of our knowledge and belief, the work reported here
does not form part of any other dissertation on the basis of which a degree or award
was conferred on an earlier occasion on us or any other students.

Place:

Date:

NAME USN SIGNATURE

SHRADDHA S J 1AM22IS102

SINDHU M K 1AM22IS110

VIBHA GUNAGA 1AM22IS122

Y G SNEHA 1AM22IS124
ACKNOWLEDGEMENT
At the very onset, we would like to place our gratitude on all those people who helped
us in making this project work a successful one.

Completing this project was not easy. Apart from the sheer effort, the guidance
of our very experienced teachers also played a paramount role, as it is they who
steered us in the right direction.

First of all, we would like to thank the Management of AMC Engineering College
for providing such a healthy environment for the successful completion of the project work.
In this regard, we express our sincere gratitude to the Chairman Dr. K Paramahamsa
and the Principal Dr. Yuvaraju.B.N, for providing us all the facilities in this college.

We are extremely grateful to our Professor and Head of the Department of Information
Science and Engineering, Dr. R. Amutha, for having agreed to guide us in the
right direction with all her wisdom.

We place our heartfelt thanks to Manjula Devi P, Assistant Professor, Department of
Information Science and Engineering, for having guided us throughout the project, and all the
staff members of our department for helping us out at all times.

We thank Dr. Ashwin M and Dr. R. Senkamalavalli, Project Coordinators, Department
of Information Science and Engineering. We thank our beloved friends for having
supported us with all their strength and might. Last but not least, we thank our parents
for supporting and encouraging us throughout. We have made an honest effort in this
assignment.

SHRADDHA S J
[1AM22IS102]
SINDHU M K
[1AM22IS110]
VIBHA GUNAGA
[1AM22IS122]
Y G SNEHA
[1AM22IS124]

ABSTRACT

The Multilingual Conversion of Sign Language to Text project aims to bridge the communication
gap between individuals who use sign language and those who use spoken or written languages.
By leveraging advanced machine learning techniques and computer vision, this project seeks to
develop a real-time system capable of recognizing and translating sign language gestures into text
across multiple languages. This system utilizes computer vision and deep learning techniques to
recognize and interpret sign language gestures from video input, converting them into
grammatically accurate textual output. It then employs natural language processing and machine
translation models to render the recognized text into multiple spoken/written languages, enhancing
accessibility and inclusivity. By supporting various regional sign languages and spoken language
translations, this system offers a scalable, real-time communication tool that can be deployed
across platforms such as mobile devices, desktops, and smart cameras.

This system gathers a diverse dataset of sign language gestures from various sign languages such
as Indian Sign Language (ISL) and Kannada Sign Language (KSL). This project will employ techniques
like image processing and neural networks to accurately identify and interpret sign language
gestures from video input. By combining cutting-edge technologies in computer vision and natural
language processing, the project seeks to make communication more accessible, efficient, and
inclusive for people worldwide.
Keywords: Indian Sign Language, Kannada Sign Language, Image Processing, Neural
networks, Deep Learning, Computer Vision.

CONTENTS
TITLE PAGE NO.

ACKNOWLEDGEMENT i

ABSTRACT ii

CONTENTS iii
CHAPTERS
1: INTRODUCTION 1

1.1 Problem Statement 3

1.2 Objective of Project 4

1.3 Scope of Project 6

2: LITERATURE REVIEW 8

3: SYSTEM REQUIREMENTS SPECIFICATION 11

3.1 Software Requirements 11

4: SYSTEM ANALYSIS 12

4.1 Existing System 12

4.2 Proposed System 14

5: SYSTEM DESIGN 17

5.1 Block Diagram 17

5.2 Modules 20

5.2.1 Data Collection Module 20

5.2.2 Pre Processing Module 21

5.2.3 Hand Detection and Tracking Module 22

5.2.4 Feature Extraction Module 23

5.2.5 Model Training Module 24

5.2.6 Gesture Recognition Module 25

5.2.7 Language Translation Module 27

5.2.8 Output Generation Module 28

5.2.9 User Interface Module 29

REFERENCES 31


Chapter 1

INTRODUCTION
Communication has been defined as an act of conveying intended meanings from one entity or
group to another through the use of mutually understood signs and semiotic rules. It plays a
vital role in the existence and continuity of human society. For an individual to progress in life and
coexist with other individuals, there is a need for effective communication. Effective
communication is an essential skill that enables us to understand and connect with people
around us. It allows us to build respect and trust, resolve differences and maintain sustainable
development in our environment where problem solving, caring and creative ideas can thrive.
Poor communication, on the other hand, is a major contributor to conflict in relationships.

Sign language is often the primary form of communication for deaf and mute people. However,
it is still difficult for them to interact with hearing people without the aid of a human
interpreter, because most members of the general public are not familiar with sign language.
The deaf and hard of hearing become isolated as a result. Nevertheless, advances in technology
make it possible to overcome this obstacle and close the communication gap. There are around
300 different sign languages in use around the globe, because individuals from various
communities naturally developed their own sign languages. India itself does not have a single,
universally adopted sign language.

According to World Health Organization (WHO) statistics, over 430 million people have
disabling hearing loss (WHO 2023), which is about 5% of the world population, and it is
estimated that by 2050 over 700 million people – or 1 in every 10 people – will have disabling
hearing loss. According to Sign Solutions, there are more than 300 sign languages used around
the world.

Different regions of India have their own dialects and lexical differences in Indian Sign
Language. However, new initiatives have been made to standardize Indian Sign Language
(ISL). It is possible to train a machine to recognize gestures and translate them into text and
voice. To facilitate communication between deaf-mute and vocal persons, such an algorithm
must effectively and accurately categorize hand gestures, after which the identified sign's
gesture name can be displayed and spoken. The development of a sign language to text converter using
artificial intelligence (AI) and machine learning (ML) has the potential to revolutionize
communication for people who are deaf or hard of hearing. By using complex algorithms and
deep learning models, AI and ML can accurately recognize and interpret sign language gestures
in real-time. This technology can also improve over time as it is trained on more data and can
adapt to variations in signing styles and dialects. The result is a powerful tool that can convert
sign language into written or spoken language, making it easier for people who do not know
sign language to communicate with those who do.

We present a novel method using convolutional neural networks (CNN) for hand sign language
recognition and conversion into text to address this problem. The purpose of this project is to
determine whether it is feasible and accurate to translate hand sign gestures into text. The need
for effective and inclusive communication, where deep learning technology is essential, is what
spurs this research.

In our project, we focus on producing a model that can recognize fingerspelling-based hand
gestures and combine individual gestures to form complete words. This phase of the work also
reviews the development of real-time sign language recognition and translation systems across
various approaches and technologies. The following sections detail the unique contributions
of these works and highlight the major ideas and improvements presented. This survey of the
literature synthesizes a large body of research and sheds light on the transformational
potential and relevance of real-time sign language identification and translation systems in
encouraging inclusive communication for the deaf and persons with hearing impairments.


1.1 Problem Statement


Indian Sign Language (ISL), used by a significant portion of the deaf population in India, lacks
the same level of technological support as ASL. The existing communication tools, including
sign language converters and recognition systems, are mostly developed for ASL users. This
neglect leads to a digital divide that restricts ISL users' access to education, employment
opportunities, public services, and inclusive social participation. Moreover, the absence of
standardized and publicly accessible ISL datasets further hinders the development of reliable
solutions. This project aims to address this challenge by developing a robust ISL-
to-text conversion system. The proposed solution will focus on accurately recognizing and
translating Indian Sign Language gestures into textual content. Such a system will enable ISL
users to interact more effectively in educational settings, workplaces, and day-to-day
interactions, reducing their dependency on interpreters and promoting self-reliance.

Communication is a fundamental human right, yet millions of people with hearing and speech
impairments face daily challenges in expressing themselves to those who do not understand
sign language. This communication gap not only limits their ability to interact socially but
also restricts access to essential services such as education, healthcare, and employment.
Traditional means of bridging this gap—such as human interpreters—are often expensive,
unavailable, or impractical in real-time, everyday situations. As a result, there is an urgent
need for a cost-effective and scalable solution that can enable seamless communication
between sign language users and the general public. While several technologies have
attempted to address sign language translation, most existing systems are limited in
functionality. Many focus on converting gestures to text in a single language, often English,
and fail to accommodate multilingual needs or regional sign language variations.
Furthermore, these systems often lack real-time processing capabilities and struggle with
accurate gesture recognition due to environmental factors such as background noise, lighting,
or partial hand visibility. This reduces their effectiveness and limits their use in practical settings.


Another critical challenge lies in the complexity of sign languages themselves. Unlike spoken
languages, sign languages rely heavily on hand shapes, movement, orientation, and facial
expressions, which must all be interpreted together to derive accurate meaning. Additionally,
each country or region may have its own version of sign language—such as American Sign
Language (ASL), British Sign Language (BSL), or Indian Sign Language (ISL)—making it
essential for a robust system to support multilingual output and dynamic translation.
Addressing these intricacies is key to developing a truly inclusive solution.

The proposed system aims to bridge this communication gap by developing a Multilingual
Conversion of Sign Language to Text application. Using computer vision and machine
learning techniques, the system will detect and recognize hand gestures from video input,
extract relevant features, classify the gestures, and convert them into text in the user’s
preferred language. The system will be structured using a Finite State Machine (FSM) to
ensure reliable and logical state transitions during the recognition and translation process.
This approach guarantees modularity, error handling, and consistent user interaction.

Ultimately, this project seeks to empower the deaf and hard-of-hearing community by
providing them with a tool that translates their sign language gestures into readable,
multilingual text in real-time. Such a system would not only enhance inclusivity and
independence but also improve interactions in educational, professional, and public
environments. By addressing limitations in current technologies and incorporating
multilingual capabilities, this solution strives to make communication more accessible and
equitable for all.

1.2 Objective of the Project


• To develop a sign language-to-text conversion system that accurately recognizes
gestures from sign languages, starting with Indian Sign Language (ISL).
• To enable multilingual text output from recognized sign gestures in various
spoken/written languages such as English, Hindi, and regional languages.
• To bridge the communication gap between the deaf and hard-of-hearing community
and non-signers across linguistically diverse environments.
• To use computer vision, machine learning, and natural language processing
technologies for real-time and precise gesture recognition and text translation.


• To promote inclusivity and accessibility in education, workplaces, healthcare, and


public services by supporting multi-language communication.

• To develop a system that captures and interprets sign language gestures using
computer vision and machine learning techniques.

• To accurately detect and recognize hand gestures in real-time using webcam or
video input.

• To extract meaningful features from the gestures for reliable classification and
recognition.

• To implement multilingual support, enabling users to select and display text in
different spoken languages (e.g., English, Hindi, etc.).

• To utilize a Finite State Machine (FSM) for managing state transitions in the gesture
recognition and translation process, ensuring modular and error-resilient flow.


1.3 Scope of the Project


This project aims to address the communication barriers faced by the deaf and hard-of-hearing
community, particularly in linguistically diverse regions like India. The scope of this project
includes designing a system that can accurately recognize sign language gestures and convert
them into meaningful textual output in multiple languages, starting with Indian Sign Language
(ISL). A key part of the project's scope involves developing a gesture recognition system
capable of identifying hand signs, movements, and shapes specific to ISL. The system will be
trained on a dataset of labelled ISL gestures, using computer vision techniques to detect and
interpret them. Initially, the focus will be on static signs such as alphabets and commonly used
words, with future extensions to dynamic gestures and sentence-level recognition.

The project extends beyond simple gesture recognition by incorporating multilingual
capabilities. Once a gesture is recognized, the system will convert it into corresponding text in
a user-selected language. In the initial phase, the output languages will include English and
Hindi, with the potential to integrate more regional languages such as Tamil, Bengali, or
Marathi, making the system highly adaptable to various linguistic contexts. The scope includes
leveraging advanced technologies such as machine learning models for classification, computer
vision libraries for gesture detection (e.g., OpenCV, MediaPipe), and natural language
processing (NLP) tools for multilingual support. These technologies will work together to
ensure real-time and accurate translation of gestures to text, even in varied lighting and
background conditions.

To maximize impact, the system will be designed to run on commonly used platforms such as
web browsers and Android mobile devices. The scope includes developing a user-friendly
interface that enables deaf users, educators, and the general public to interact with the system
without requiring technical expertise. Accessibility features like voice output and adjustable
text size may also be included in later versions. The project is scoped to serve a wide range of
use cases, including communication in classrooms, workplaces, hospitals, and public service
environments. It can also act as a learning tool for those interested in sign language, promoting
inclusivity and awareness among the general population. By making sign language more
accessible and translatable, the project has the potential to empower deaf individuals and
integrate them more fully into society. Although the initial scope is limited to ISL and a few
output languages, the system is designed with scalability in mind. Future phases may involve
adding support for other sign languages like ASL or BSL, integrating speech synthesis for sign-
to-speech conversion, and recognizing facial expressions and body posture to interpret emotion
and context. This would make the system more comprehensive and closer to natural human
communication. The practical applications of this project are vast, with significant potential to
improve communication between deaf and hearing individuals. In education, healthcare,
customer service, and social settings, this technology can help bridge the communication gap.
Furthermore, real-time sign language translation could assist interpreters and provide a solution
in situations where interpreters are not available. The societal impact of this system is immense,
as it could improve the quality of life for deaf individuals, promote inclusivity, and reduce
communication barriers in diverse environments. This project will focus on both technical
innovation and real-world applicability, with the aim of developing a robust and scalable
solution for multilingual sign language recognition and translation.

Dept. of ISE, AMCEC 2024-25 7 |P a g e


MULTILINGUAL CONVERSION OF SIGN LANGUAGE TO TEXT

Chapter 2

LITERATURE REVIEW
Ramesh M. Kagalkar and Nagaraj H.N, “New Methodology for Translation of Static Sign
Symbol to Words in Kannada Language.” Proposed a Methodology for Translation of
Static Sign Symbol to Words in Kannada Language. The goal of this paper is to develop a
system for automatic translation of static gestures of alphabets in Kannada sign language. It
maps letters, words and expressions of a particular language to a collection of hand gestures,
enabling an individual to exchange information using hand gestures instead of speech. A
system capable of recognizing signing symbols may be used as a means of communication with
hard-of-hearing people. However, translating dynamic gestures into text output would require
further development. For example, systems based on
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been
widely used for gesture recognition, achieving impressive accuracy in detecting hand gestures.
Datasets such as the American Sign Language (ASL) dataset, RWTH-PHOENIX for German
Sign Language, and others have been instrumental in training these models. However,
challenges remain in ensuring the accuracy and robustness of such systems, especially when
dealing with diverse sign language variations and different signing styles across regions. [1].

Vishnu Sai Y and Rathna G N, “Indian Sign Language Gesture Recognition using Image
Processing and Deep Learning.” proposed Indian Sign Language Gesture Recognition using
Image Processing and Deep Learning. To bridge this communication gap, the authors propose a
real-time hand gesture recognition system based on data captured by the Microsoft Kinect RGB-D
camera. Since there is no one-to-one mapping between the pixels of the depth camera and the
RGB camera, computer vision techniques such as 3D reconstruction and transformation were used.
After achieving a one-to-one mapping, segmentation of the hand gestures was done from the
background. Convolutional Neural Networks (CNNs) were utilised for training 36 static gestures
representing Indian Sign Language (ISL) alphabets and
numbers. The model achieved an accuracy of 98.81% on training using 45,000 RGB images
and 45,000 depth images. Further Convolutional LSTMs were used for training 10 ISL
dynamic word gestures and an accuracy of 99.08% was obtained by training 1080 videos. The
model showed accurate real time performance on prediction of ISL static gestures, leaving a
scope for further research on sentence formation through gestures. The model also showed
competitive adaptability to American Sign Language (ASL) gestures: when the ISL model’s
weights were transfer-learned to ASL, it gave 97.71% accuracy [2].

Prof. Radha S. Shirbhate, Mr. Vedant D. Shinde, “Sign language Recognition Using
Machine Learning Algorithm.” proposed a sign language recognition system using machine
learning algorithms. Hand gestures differ from one person to another in shape and
orientation; therefore, a problem of non-linearity arises. Recent systems have come up with
various methods and algorithms to address this problem and build such a system. Algorithms like
K-Nearest Neighbors (KNN), multi-class Support Vector Machines (SVM), and experiments
using hand gloves were previously used to decode hand gesture movements. In this paper, a
comparison between the KNN, SVM, and CNN algorithms is carried out to see which algorithm
offers the best accuracy. Approximately 29,000 images were split into
test and train data and pre-processed to fit the KNN, SVM, and CNN models to obtain
maximum accuracy [3].

Kajal Dakhare and Vidhi Wankhede, “A Survey On Recognition And Translation System
Of Real-Time Sign Language” proposed a method for recognising sign language using depth
cameras and processing the depth camera data with computer vision methods. Depth cameras,
such as the Microsoft Kinect or Intel RealSense, capture three-dimensional data, which
provides more detailed information about the position and movement of the hands, arms, and
body compared to traditional RGB cameras. By processing depth data, their method improves
the accuracy and robustness of sign language recognition systems, especially in scenarios
where variations in lighting, background noise, or occlusions can affect the recognition
performance of conventional visual-based systems. The key advantage of using depth cameras
is the ability to detect the spatial relationships between the signer’s body parts in three
dimensions, which helps to distinguish subtle variations in gestures that may be challenging to
recognize using only RGB images.[4].

N.Tanibata and N.Shimada, “Extraction of Hand Features for Recognition of Sign Language Words”
proposed a system describing a novel method for detecting fingertips and palm centres in
dynamic hand gestures generated by either one or both hands without using any kind of sensor
or marker. The authors call it Natural Computing, as no sensor, marker or colour is used on the
hands to segment skin in the images, and hence the user can operate with natural, unencumbered hands.
This is done by segmenting and tracking the face and hands using skin colour. The tracking of
elbows is done by matching the template of an elbow shape. Hand features like the area of the
hand, the direction of hand motion, etc. are extracted and are then input to a
Hidden Markov Model (HMM) [5].

Aditya Das, Shantanu, “Sign Language Recognition Using Deep Learning On Static
Gesture Images.” proposed a system for Sign Language Recognition Using Deep Learning
On Static Gesture Images. The image dataset used consists of static sign gestures captured on
an RGB camera. Preprocessing was performed on the pictures, which then served as cleaned
input. The paper presents results obtained by retraining and testing this signing gestures dataset
on a convolutional neural network model using Inception v3. The model consists of multiple
convolution filter inputs that are processed on the same input. This paper also reviews the
various attempts made at sign language detection using machine learning and depth
data of images. It takes stock of the varied challenges posed in tackling such an issue, and
outlines the future scope as well [6].


Chapter 3

SYSTEM REQUIREMENTS SPECIFICATION


3.1 Software Requirements

➢ Python: The primary language for implementing machine learning, computer vision,
and NLP models. Python is highly suitable for this project due to its extensive libraries
and frameworks for data processing and model training.
➢ OpenCV: An open-source computer vision library for real-time image processing and
computer vision tasks. It will be used for hand gesture recognition, image processing,
and video frame manipulation.
➢ Media Pipe: A framework from Google that facilitates real-time hand tracking and pose
estimation. It's useful for detecting hand gestures and body movements from video
streams.
➢ OpenCV (DNN Module): Used for integrating pre-trained models or deep learning
algorithms in the computer vision pipeline.
➢ NumPy and Pandas: Essential for handling large datasets, data manipulation, and
numerical computing tasks.
➢ Matplotlib or Seaborn: For visualizing data and performance metrics during training
and evaluation.
➢ Google Translate API: For translating the output text (from the sign language
recognition) into various languages. It supports a wide range of languages and can be
easily integrated with the system.
➢ IDE/Code Editor: Tools like PyCharm, VS Code, or Jupyter Notebook for code
development and testing.
➢ Intel RealSense cameras: These are popular for depth sensing and 3D imaging. They
capture depth data, which allows for more accurate hand gesture recognition by
providing additional spatial context. These cameras also work well in various lighting
conditions and can detect hand and body movements with greater precision, making
them ideal for recognizing complex gestures.


Chapter 4

SYSTEM ANALYSIS
4.1 Existing System
The existing method of using single-handed signs for sign language recognition, particularly in
American Sign Language (ASL), poses certain challenges, both from a linguistic and physical
standpoint. ASL is used primarily in the United States and Canada, where many
sign language recognition systems have been developed to work with single-handed gestures.
However, this approach is not universally applicable to all sign languages, particularly those
used in other regions, such as the Indian Sign Language (ISL) or regional variants. In fact, the
Indian Deaf community typically uses two-handed signs for many of their gestures, which
differs significantly from the one-handed signs common in ASL.

One of the major issues with systems that rely heavily on single-handed gestures for sign
language recognition is the physical strain it places on users. Prolonged use of a single hand for
gesturing can result in hand sprains, muscle fatigue, and potentially lead to long-term injuries,
especially when used continuously for communication. Studies have highlighted the risks of
repetitive strain injuries (RSI) that arise from using one hand in unnatural or sustained positions.
Furthermore, the reliance on single-handed gestures limits the inclusivity of sign language
recognition systems. Sign languages across the world, including Indian Sign Language (ISL),
use a combination of one- and two-handed gestures, with some gestures requiring both hands
to convey meaning. For example, certain signs in ISL, such as alphabet gestures or complex
expressions, cannot be accurately conveyed using a single hand alone. Therefore, ASL-based
sign language recognition methods may not work well for other languages, especially in regions
where the Deaf community uses two-handed gestures as the norm.

From a health perspective, multi-hand gesture recognition systems can help reduce the risk of
repetitive strain injuries by promoting the use of both hands instead of relying on a single hand
for communication. The two-handed approach may allow for more natural and less strenuous
hand positions, reducing the likelihood of injuries associated with prolonged single-handed
gesturing.


The existing method of focusing on single-handed sign language recognition is not without its
limitations, both in terms of its applicability to a wider range of sign languages and its impact
on users' physical health. For a more inclusive, accessible, and ergonomically friendly sign
language recognition system, it is essential to incorporate multi-handed gesture recognition.
This approach would not only accommodate sign languages like ISL but also promote healthier
communication practices by reducing strain on a single hand, ultimately creating a more
effective and user-friendly system for the global Deaf community.

Fig. 4.1. Single Hand Sign Language Gesture

Ultimately, the shift from single-handed to multi-handed gesture recognition represents an
essential step in making sign language recognition systems more inclusive, ergonomic, and
accurate. By incorporating two-handed gestures, these systems will be better suited to serve a
wider array of sign languages, including ISL and other regional variations. This transition will
not only improve the quality of communication for those who are Deaf but also promote
healthier long-term use of sign language, ensuring that it remains an accessible and sustainable
mode of communication.


4.2 Proposed System


The proposed system for Multilingual Conversion of Sign Language to Text is designed to
recognize hand gestures from different sign languages and translate them into readable text in
multiple spoken languages. The system begins by capturing the hand gestures using a camera,
preferably a high-resolution RGB or depth-sensing device like Intel RealSense or a standard
HD webcam. These video inputs are processed frame-by-frame to detect and isolate the hand
region using techniques such as skin color filtering, contour analysis, or advanced landmark
detection using Media Pipe. This real-time gesture capture ensures that both static and dynamic
signs are accurately identified for further processing.

Fig. 4.2. Two hand Sign Language Gesture

Once the gesture is captured, the system performs preprocessing and feature extraction, where
the key characteristics of the hand—such as shape, orientation, finger positions, and
movement—are analyzed. These features are used as input for machine learning or deep
learning models like Convolutional Neural Networks (CNNs) or Recurrent Neural Networks
(RNNs), which are trained on large datasets containing labelled gestures from various sign
languages, including Indian Sign Language (ISL) and American Sign Language (ASL). The
classifier determines the most likely match for the gesture and outputs the corresponding word
or phrase. The recognized gesture is then mapped to a textual representation of the sign in a
base language (typically English). To achieve multilingual output, the system uses language
translation services such as the Google Translate API or pre-trained multilingual models to
convert the English text into the user's preferred language, such as Hindi, Kannada, or Tamil.
This adds flexibility to the system, making it suitable for users across different linguistic
regions. The translated text is displayed in real time on the screen, and optionally, it can also
be converted to speech using text-to-speech technology to aid communication with non-
signers.

Overall, this system provides a comprehensive solution to bridge the communication gap
between the Deaf or hard-of-hearing community and the hearing population. By supporting
multiple sign languages and translating the recognized gestures into different spoken
languages, the system becomes more inclusive and culturally adaptive. Additionally, it
addresses ergonomic concerns by supporting two-handed gestures and minimizing repetitive
single-hand usage, making it both health-conscious and user-friendly. The modular design of
the system also allows future enhancements such as emotion detection, facial expression
interpretation, and bidirectional communication through text-to-sign generation.

To ensure smooth and logical progression through each stage of the sign language recognition
process, the system incorporates a Finite State Machine (FSM) architecture. FSM divides
the workflow into distinct states—such as Idle, Gesture Detection, Feature Extraction,
Recognition, Translation, and Output Display. Each state has defined transitions based on
specific triggers, such as successful detection or classification failure. This modular flow
prevents system confusion, enables better error handling, and ensures that each process is
executed only when its preconditions are met. For instance, the system will not proceed to the
Translation state until a valid gesture is successfully recognized. This structured design
significantly improves reliability, especially in real-time scenarios.
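A minimal Python sketch of such an FSM is shown below. The state names follow the list above, while the event names (e.g. hand_detected, recognition_failed) and transition table are illustrative assumptions rather than a specification fixed by this report.

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    GESTURE_DETECTION = auto()
    FEATURE_EXTRACTION = auto()
    RECOGNITION = auto()
    TRANSLATION = auto()
    OUTPUT_DISPLAY = auto()

# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "hand_detected"): State.GESTURE_DETECTION,
    (State.GESTURE_DETECTION, "frame_captured"): State.FEATURE_EXTRACTION,
    (State.FEATURE_EXTRACTION, "features_ready"): State.RECOGNITION,
    (State.RECOGNITION, "gesture_recognized"): State.TRANSLATION,
    (State.RECOGNITION, "recognition_failed"): State.GESTURE_DETECTION,
    (State.TRANSLATION, "text_translated"): State.OUTPUT_DISPLAY,
    (State.OUTPUT_DISPLAY, "done"): State.IDLE,
}

def next_state(current: State, event: str) -> State:
    """Return the next state, or stay in the current one if the event is not valid here."""
    return TRANSITIONS.get((current, event), current)

# Example walk through one recognition cycle.
state = State.IDLE
for event in ["hand_detected", "frame_captured", "features_ready",
              "gesture_recognized", "text_translated", "done"]:
    state = next_state(state, event)
    print(event, "->", state.name)

Because invalid events leave the machine in its current state, the system cannot, for instance, reach the Translation state before a gesture has been successfully recognized, which mirrors the precondition checks described above.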

The user interface (UI) is designed to be intuitive and accessible for users with varying
technical skills. It features real-time video feedback, language selection dropdowns, gesture
capture indicators, and a text display panel for the final translated output. An optional voice
output toggle allows users to enable or disable text-to-speech functionality depending on the
context. The UI ensures responsive interactions and clear feedback at every stage, which is
essential for users who depend on visual cues. Additionally, the system can log gesture inputs
and translation results for performance monitoring or future training purposes, providing a
data-driven approach to improvement.

From a technical standpoint, the system is built with scalability and extensibility in mind.
The modular design allows new gestures, sign languages, and spoken languages to be added
with minimal changes to the core architecture. Gesture models can be updated or retrained as
new datasets become available, improving accuracy over time.


Cloud integration can be employed to offload heavy computations or translation tasks,
making the system suitable for deployment on lightweight devices such as tablets or mobile
phones. This adaptability ensures the system remains relevant and useful across different
regions, age groups, and technical infrastructures.

Lastly, the system also considers future enhancements to enrich user interaction and expand
its functional scope. Planned improvements include incorporating facial expression and
emotion detection to support signs that depend on non-manual cues—an important aspect of
natural sign language communication. Another future goal is to enable bidirectional
communication, where not only can sign language be translated into text, but typed or
spoken language can also be converted into animated sign representations using 3D avatars or
gesture synthesis models. These additions would make the system even more interactive,
inclusive, and supportive of broader communication needs in education, healthcare, and
public services.


Chapter 5

SYSTEM DESIGN
5.1 Block Diagram

Fig. 5.1. Systematic flow of Sign Language to text

The diagram represents a systematic flow of a sign language to text and voice conversion
system, typically focused on American Sign Language (ASL), but adaptable to other sign
languages as well. The process begins with image acquisition, where a camera captures live
video footage of a signer performing gestures. This input is critical as the foundation of the
system's operation. The captured frames are then forwarded to the hand detection and tracking
module, where advanced computer vision techniques identify and isolate the hand region in
each frame. This step ensures that only the relevant gesture parts are processed, discarding
unnecessary background elements and improving overall recognition accuracy.

After detecting and tracking the hand, the system proceeds to preprocessing, where the video
data is cleaned and standardized for analysis. This involves operations such as resizing images,
converting them to grayscale, removing noise, and segmenting the hand region. Preprocessing
ensures consistency in the input data, which is essential for accurate recognition. Once the data
is pre-processed, the system moves to feature extraction. In this stage, critical details about the
hand's shape, movement, orientation, and finger positions are extracted. These features
uniquely represent each gesture and act as the basis for classification.

The extracted features are then sent to a training module, where machine learning algorithms
learn to associate these features with specific meanings. A training database is used to store
labelled gesture data that the system uses to learn from. This database contains various
examples of sign gestures and their corresponding text representations. Once trained, the
system moves into the recognition phase, where it matches incoming gesture data against the
trained models to identify the most appropriate label, such as a letter, word, or phrase.

Finally, the recognized gesture is translated into text output, which is displayed to the user. In
addition, the system may also include a text-to-speech (TTS) component to convert the text
into spoken words, making it easier for hearing individuals to understand the message. This
two-fold output—visual and auditory—enhances accessibility and supports smoother
communication between Deaf users and the general public. This end-to-end system effectively
bridges communication gaps and can be enhanced further to support multiple languages and
sign systems for broader societal impact.

Expanding on the system’s capabilities, the modular nature of this design allows for future
enhancements such as multilingual support. Currently, many existing sign language recognition
systems output only in English. However, by integrating language translation services (such as
Google Translate API), this system can be adapted to convert recognized signs into regional or
native languages like Hindi, Kannada, or Tamil. This multilingual feature broadens the
system’s applicability, particularly in a diverse country like India, where users may prefer
outputs in their mother tongue for better understanding and usability.

Additionally, the inclusion of both static and dynamic sign recognition capabilities makes this
system more robust. While static signs (such as letters or numbers) can be recognized from
single frames, dynamic signs (like "thank you" or "good morning") require temporal tracking
across multiple frames. This is where advanced models like Recurrent Neural Networks
(RNNs) or Long Short-Term Memory (LSTM) networks come into play, as they are designed
to learn from sequences. With such enhancements, the system can handle complete sentence
recognition in the future, making it a viable tool for real-time conversations between Deaf and
hearing individuals.


In terms of real-world implementation, this system could be deployed in public service centers,
hospitals, schools, or customer service points to assist communication with Deaf individuals.
It can also be integrated into mobile applications or kiosks, making it portable and easily
accessible. By reducing reliance on human interpreters and enabling independent
communication, such a system supports inclusivity and empowers the Deaf community to
interact confidently in society.


5.2 Modules
The project "Multilingual Conversion of Sign Language to Text" aims to develop a complete
pipeline that captures sign language gestures through a camera, processes the visual data,
recognizes the sign, and converts it into corresponding multilingual text and voice output. To
achieve this, the system is divided into well-structured functional modules, each responsible for
a specific task—from data collection to translation and final output generation. These modules
work together in a sequential manner to ensure accurate gesture recognition and user-friendly
communication.
5.2.1 Data Collection Module

The data collection module is the foundational stage of the system. Its primary purpose is to
gather a comprehensive dataset of sign language gestures to train the machine learning models
effectively. This module can involve collecting images or videos of hand gestures using a live
webcam or importing data from publicly available sign language datasets such as the American
Sign Language (ASL) Alphabet dataset or Indian Sign Language (ISL) datasets. The quality and
diversity of the collected data directly affect the accuracy of gesture recognition.

In a practical implementation, this module may include a user-friendly interface for recording
gestures. Each gesture must be labeled accurately to associate it with the correct word, letter, or
phrase in the target language. Organizing the data in structured folders (e.g., one folder per
label) or maintaining a metadata file like a CSV or JSON file with annotations helps in the later
stages of training. Furthermore, collecting both static (single-frame) and dynamic (multi-frame
sequence) gesture data is important if the system aims to handle alphabet-level and sentence-
level translation. This module should also ensure ethical data usage and user consent if real user
recordings are involved.
Sources:
• Live gesture recordings via webcam.
• ASL/ISL datasets from open repositories

Tasks:
• Capture images or videos of individual signs.
• Label each sample accurately.
• Store them in a structured format (folders or CSVs).
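The following is a minimal Python/OpenCV sketch of how such webcam-based capture and folder-wise labelling might look. The label "A", the dataset directory layout, and the sample count of 200 are illustrative assumptions, not values fixed by this report.

import os
import cv2

LABEL = "A"                                  # hypothetical gesture label
OUT_DIR = os.path.join("dataset", LABEL)     # one folder per label
os.makedirs(OUT_DIR, exist_ok=True)

cap = cv2.VideoCapture(0)                    # default webcam
count = 0
while count < 200:                           # collect 200 samples for this label
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Collecting: press 'c' to capture, 'q' to quit", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('c'):                      # save the current frame as one sample
        cv2.imwrite(os.path.join(OUT_DIR, f"{LABEL}_{count:04d}.jpg"), frame)
        count += 1
    elif key == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()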


5.2.2 Pre Processing Module

Once data is collected, the preprocessing module prepares it for consistent and efficient model
training and recognition. Raw images and videos often contain noise, unnecessary background
objects, and inconsistencies in resolution or lighting, which can affect performance. This
module standardizes the input format, improves clarity, and highlights the region of interest—
the hand. Preprocessing begins with resizing all images to a fixed dimension (e.g., 224x224
pixels) and converting them to grayscale or HSV color space to simplify processing.

Segmentation techniques are applied to isolate the hand from the background. This can be
achieved using skin-color-based detection, background subtraction, or advanced methods like
Media Pipe segmentation. Noise reduction filters (e.g., Gaussian blur) are used to smooth the
images and remove irrelevant details. In dynamic gesture processing, this module may also
perform frame sampling to reduce redundancy. Normalization of pixel values (e.g., scaling to
0–1 range) ensures faster convergence during training. Overall, this step ensures uniformity in
the dataset, improving the learning and generalization capabilities of the model.

1. Background Subtraction using MOG2

Background subtraction is crucial for isolating the signer from a possibly cluttered
background. The MOG2 (Mixture of Gaussians) algorithm, available in OpenCV, models the
background using a series of Gaussian distributions for each pixel. It dynamically adapts to
lighting changes and detects moving objects (like hands and arms). This helps in focusing
only on relevant gestures by removing static backgrounds.

Algorithm: cv2.createBackgroundSubtractorMOG2()

2. Gaussian Blur (Noise Reduction)

To reduce noise in video frames and improve the accuracy of gesture detection, Gaussian Blur
is applied. It smoothens the image by averaging pixel values based on a Gaussian kernel. This
makes the image less sensitive to random noise or sensor inconsistencies, which is especially
helpful in low-light or variable-light conditions.

Algorithm: cv2.GaussianBlur(frame, (5,5), 0)
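A minimal sketch combining the two preprocessing steps above on live webcam frames is given below. The 224x224 resize and the MOG2 parameters shown are illustrative choices consistent with the description above, not values fixed by this report.

import cv2

cap = cv2.VideoCapture(0)
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                   varThreshold=16,
                                                   detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (224, 224))          # fixed input dimension
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)   # suppress sensor noise
    fg_mask = bg_subtractor.apply(blurred)         # foreground = moving hand/arm
    cv2.imshow("foreground mask", fg_mask)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()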


3. Media Pipe Hand Detection (for ROI Isolation)

Google’s Media Pipe Hands is a powerful framework for real-time hand and finger tracking. It
detects 21 key landmarks on each hand and provides their coordinates in every frame. This
allows precise region-of-interest (ROI) extraction, which is critical for sign language where
finger positioning matters. It significantly simplifies gesture recognition in the later stages.

Tool: Media Pipe Hands

5.2.3 Hand Detection and Tracking Module

The hand detection and tracking module plays a critical role in recognizing where the hand is
in real-time input frames. It ensures the system can isolate the hand regardless of the user's
position or environment. This module uses real-time object detection techniques to identify
hands within a frame and maintain focus on them as they move. For high accuracy and speed,
pre-trained models such as Media Pipe Hands (by Google) are commonly used. These models
not only detect hands but also return 21 key hand landmarks, including fingertips and joints,
which are essential for feature extraction.

Tracking ensures continuous gesture interpretation for dynamic signs and avoids jitter or signal
loss due to motion blur. If dual-hand recognition is required, this module can be extended to
identify and distinguish between left and right hands. Techniques such as bounding boxes,
landmark matching, or optical flow can be applied. Efficient hand detection and tracking
contribute significantly to the robustness of the entire system, ensuring that subsequent modules
work with clear, focused hand regions for analysis.
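The following is a minimal sketch of real-time hand detection and landmark tracking with MediaPipe Hands and OpenCV, assuming a standard webcam; the confidence thresholds and the two-hand limit are illustrative parameter choices.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input
        results = hands.process(rgb)
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # Draw the 21 landmarks and their connections on the frame.
                mp_draw.draw_landmarks(frame, hand_landmarks,
                                       mp_hands.HAND_CONNECTIONS)
        cv2.imshow("hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()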


5.2.4 Feature Extraction Module

In this module, meaningful numerical representations are extracted from the detected hand
regions to identify and classify gestures. Feature extraction converts visual information into a
machine-readable format, which the classification algorithm can interpret. The features may
include geometric attributes such as distances between fingers, angles between joints, hand
orientation, and relative positions of fingertips.

The extracted features are structured into vectors, which serve as input for the classifier. For
static gestures, a single feature vector per frame may be sufficient, while dynamic gestures may
require time-series data (a sequence of vectors). In such cases, temporal features like motion
flow, trajectory, and hand velocity can also be computed. The efficiency of this module depends
on its ability to produce distinct and consistent features for each gesture class, which directly
impacts recognition accuracy.

1. Media Pipe Landmarks for Pose and Hand Key points

One of the most effective ways to extract features from sign language videos is using Media
Pipe’s pose and hand landmark detection. Media Pipe provides 2D or 3D coordinates of
specific body parts (like elbows, wrists, shoulders) and fingers (21 landmarks per hand).
These coordinates serve as numerical feature vectors that represent hand shapes, positions,
and movements. These features are crucial for both static and dynamic sign recognition.

Tool: Media Pipe Hands + Pose
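As an illustration, the sketch below converts the 21 MediaPipe hand landmarks of one frame into a position- and scale-normalized feature vector. The normalization scheme (wrist-relative coordinates scaled by the wrist-to-middle-finger span) is an assumed design choice, not the report's fixed feature definition.

import numpy as np

def landmarks_to_feature_vector(hand_landmarks):
    """Convert one MediaPipe hand (21 landmarks) into a 63-dimensional feature vector."""
    pts = np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark],
                   dtype=np.float32)          # shape (21, 3)
    pts -= pts[0]                             # translate so the wrist is at the origin
    scale = np.linalg.norm(pts[9]) or 1.0     # wrist -> middle-finger knuckle distance
    pts /= scale                              # normalize away hand size / camera distance
    return pts.flatten()                      # shape (63,)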

2. Optical Flow (Motion-Based Feature Extraction)

Optical Flow estimates the direction and speed of motion between consecutive frames. In sign
language, hand and arm movements are dynamic, so motion-based features like Optical Flow
help track how pixels (or hand regions) move across frames. It’s especially useful for
detecting dynamic gestures or transitions in continuous signing.

Algorithm: Lucas-Kanade or Farneback method (cv2.calcOpticalFlowFarneback)
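A minimal sketch of Farneback dense optical flow over consecutive webcam frames is shown below; the parameter values and the use of mean flow magnitude/direction as simple motion features are illustrative assumptions.

import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
if not ok:
    raise RuntimeError("could not read from the webcam")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

for _ in range(100):                          # process a short clip of frames
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense flow: one (dx, dy) motion vector per pixel between the two frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print("mean motion magnitude:", float(mag.mean()),
          "mean direction (rad):", float(ang.mean()))
    prev_gray = gray

cap.release()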


5.2.5. Model Training Module

This module is responsible for training the gesture recognition model using the extracted
features. It is a critical component that determines the intelligence of the system. The model
learns from the labeled dataset to identify patterns and associate them with corresponding
gesture labels. Various machine learning and deep learning algorithms can be used here. For
static signs, Convolutional Neural Networks (CNNs) are commonly used due to their ability to
process spatial data. For dynamic gestures, Recurrent Neural Networks (RNNs) or Long Short-
Term Memory (LSTM) networks are more effective, as they can learn from sequences over time.

During training, the dataset is typically divided into training, validation, and testing sets. The
model’s performance is evaluated using metrics like accuracy, precision, recall, and F1-score.
Techniques such as data augmentation, dropout, and early stopping may be used to prevent
overfitting and improve generalization. Once trained, the model is saved and used by the
recognition module in real-time prediction. This module may also allow re-training or fine-
tuning if the system encounters new gestures or languages.

1. Convolutional Neural Network (CNN)

CNNs are widely used for static sign recognition, such as recognizing individual letters or
numbers in sign language. They excel at learning spatial features from images (e.g., hand
shapes, orientations). A CNN trained on preprocessed frames or hand images can accurately
classify signs.

Layers: Conv → ReLU → Pool → Fully connected
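A minimal Keras sketch of such a CNN is given below. The input size (224x224x3) and the class count of 36 are assumptions carried over from the preprocessing description and the ISL alphabet/number example earlier in this report, not final design values.

from tensorflow.keras import layers, models

NUM_CLASSES = 36                               # assumed: ISL alphabets + digits

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                       # regularization against overfitting
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()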

2. Long Short-Term Memory (LSTM)

LSTMs are a type of recurrent neural network (RNN) ideal for sequence modeling, such as
dynamic signs involving hand movement over time. LSTMs process sequences of features
(e.g., key points across frames) and capture long-term dependencies, like motion patterns or
sign transitions.

Input: sequences of features (e.g., landmark vectors)
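The sketch below shows a minimal Keras LSTM classifier over sequences of landmark features; the sequence length (30 frames), feature size (63, matching the landmark features sketched earlier), and class count (10) are illustrative assumptions.

from tensorflow.keras import layers, models

SEQ_LEN, NUM_FEATURES, NUM_CLASSES = 30, 63, 10   # assumed shapes

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, NUM_FEATURES)),
    layers.LSTM(64, return_sequences=True),   # keep the sequence for the next LSTM layer
    layers.LSTM(64),                          # final summary of the whole gesture
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])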


3. 3D Convolutional Neural Network (3D-CNN)

Unlike regular CNNs that operate on spatial data, 3D-CNNs extract features from both space
and time by using 3D kernels. They process short clips of video (stacked frames) and are
powerful for spatio-temporal gesture recognition.

Input: stacked frames (e.g., 16-frame video clips)

4. Transformer-Based Models (e.g., Video Transformer, TimeSformer)

Transformers are emerging as state-of-the-art in video understanding. Video transformers treat
each frame as a token and model attention across time and space. They’re highly effective for
both continuous sign language recognition and translation to text, especially in multilingual
contexts.

Examples: TimeSformer, VideoBERT, SignBERT

5. Hidden Markov Models (HMM) (classical approach)

HMMs were historically used in sign and speech recognition before deep learning. They
model the probabilistic transitions between gesture states. While not as powerful as modern
neural networks, they are lightweight and interpretable.
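A minimal sketch of this classical approach, assuming the third-party hmmlearn package: one Gaussian HMM is trained per gesture class, and a new sequence is assigned to the class whose model gives the highest log-likelihood.

import numpy as np
from hmmlearn import hmm

def train_gesture_hmm(sequences, n_states=4):
    """sequences: list of (num_frames, num_features) arrays for one gesture class."""
    lengths = [len(seq) for seq in sequences]
    X = np.concatenate(sequences)                     # hmmlearn expects stacked frames
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=100)
    model.fit(X, lengths)
    return model

def classify_gesture(sequence, models):
    """models: dict mapping gesture label -> trained GaussianHMM."""
    return max(models, key=lambda label: models[label].score(sequence))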

5.2.6. Gesture Recognition Module

The recognition module uses the trained model to identify hand gestures in real-time. When a
user performs a gesture in front of the camera, the system captures the input, applies
preprocessing, extracts features, and forwards them to the trained model for classification. The
model outputs the predicted label, which corresponds to a word or a letter of the alphabet. This module must
process input quickly and accurately to provide real-time feedback to users.

In the case of dynamic gesture recognition, the module handles temporal inputs by processing
gesture sequences. The system may also include a confidence score for each prediction to
inform the user about recognition certainty. The predicted label is then passed on to the next
stage for translation and output. The efficiency of this module is essential for making the system
responsive and interactive.


1. KNN (K Nearest Neighbour)

In the Gesture Recognition module of a sign language recognition system, K-Nearest Neighbors (KNN) can play a vital role in classifying gestures into corresponding categories, such as letters, words, or numbers in a specific sign language. The model works by analyzing the features extracted from hand gestures, which might include hand shape, joint positions, movement trajectory, and angles. KNN then uses a distance metric (e.g., Euclidean distance) to compare the newly extracted feature vector with those in a pre-constructed database of labeled gesture features. The gesture is classified based on the majority class of the k nearest neighbors to the new input gesture, making KNN a powerful yet simple tool for gesture recognition.

Example of KNN in Gesture Recognition:

1. Input: The user performs a hand gesture, and the system captures the gesture's feature
vector (e.g., hand position, shape, and joint angles).

2. Process: KNN calculates the Euclidean distance between the new feature vector and all the
feature vectors in the training set.

3. Output: The system identifies the k nearest neighbors (most similar gestures) and assigns
the gesture to the class that is most common among these neighbors (e.g., "A" for American
Sign Language or "1" for ISL).
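A minimal sketch of these steps with scikit-learn is shown below; the feature dimensions and labels are placeholders for whatever the feature extraction module actually produces.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: 200 stored gestures, 63 features each
# (e.g., 21 hand landmarks x 3 coordinates).
X_train = np.random.rand(200, 63)
y_train = np.random.choice(["A", "B", "1", "2"], size=200)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)                 # KNN simply stores the labeled vectors

new_gesture = np.random.rand(1, 63)       # feature vector of the live gesture
predicted_label = knn.predict(new_gesture)[0]
confidence = knn.predict_proba(new_gesture).max()   # fraction of agreeing neighbors
print(predicted_label, confidence)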

Advantages of Using KNN for Gesture Recognition:

Simplicity and Intuitiveness: KNN is simple to implement and understand, making it a practical choice for recognizing sign language gestures.

Adaptability: As new gestures are added to the system, KNN can easily adapt by storing
additional feature vectors and making classifications based on them.

No Need for Model Training: KNN does not require an expensive training phase and can
classify gestures as soon as the relevant features are extracted.


5.2.7. Language Translation Module

To fulfill the goal of multilingual communication, this module translates the recognized English
gesture label into a regional language like Hindi, Kannada, Tamil, or any preferred language. It
bridges the gap between gesture recognition and user comprehension. Translation can be
achieved using external APIs such as Google Translate API, or offline methods involving
language mapping dictionaries. This module receives the output from the gesture recognition
system and maps it to its equivalent in the target language.

In more advanced systems, this module can also support contextual or phrase-based translation,
where the recognized words are translated in groups rather than as isolated tokens. This ensures
grammatical correctness and fluency in the target language. It may also provide transliteration
options and support multi-language switching. This module adds significant value to the system
by making it more inclusive and adaptable across regions.
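A minimal sketch of the offline dictionary approach is shown below; the entries and ISO language codes are illustrative placeholders, and a complete system would load a much larger lexicon or call an online translation API instead.

# Recognized English labels mapped to pre-stored regional-language equivalents.
TRANSLATIONS = {
    "hello":     {"hi": "नमस्ते", "kn": "ನಮಸ್ಕಾರ", "ta": "வணக்கம்"},
    "thank you": {"hi": "धन्यवाद", "kn": "ಧನ್ಯವಾದ", "ta": "நன்றி"},
}

def translate_label(label, target_lang="hi"):
    """Return the stored translation, falling back to the English label."""
    return TRANSLATIONS.get(label.lower(), {}).get(target_lang, label)

print(translate_label("hello", "kn"))     # prints the Kannada equivalent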

1. Phrase-Based Statistical Machine Translation (PBSMT)

PBSMT models translation as a statistical process that finds the most probable output sentence
in the target language, given an input sentence. It uses a phrase table, where source phrases
(not just words) are mapped to likely target phrases based on probabilities learned from large
bilingual corpora. It also includes a language model (to ensure fluent output) and a decoder (to
find the best match). This approach was used by early systems like Google Translate before
they switched to neural methods.

Example:

Input (Hindi): "मैं स्कूल जाता हूँ"

Output (English): "I go to school"


5.2.8. Output Generation Module

The output generation module is responsible for presenting the translated gesture in both text
and speech formats. This ensures that the recognized sign language can be understood by
hearing individuals who rely on spoken or written language. The recognized and translated word
or sentence is displayed on the screen in a user-friendly format. For voice output, text-to-speech (TTS) engines like pyttsx3 (offline) or gTTS (online) can be used to vocalize the result.
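A minimal sketch of this step using pyttsx3 for offline speech is shown below; gTTS could be substituted when internet access is available (e.g., gTTS(text=text, lang="hi").save("output.mp3")).

import pyttsx3

def speak_and_display(text):
    print("Recognized:", text)            # text output (console stand-in for the UI)
    engine = pyttsx3.init()
    engine.say(text)                      # queue the utterance
    engine.runAndWait()                   # block until speech playback finishes

speak_and_display("Hello, how are you?")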

1. Transformer-Based Text Generation (e.g., GPT)


The Transformer architecture is widely used for generating high-quality natural language text.
It works by processing input tokens (like glosses or keywords from gestures) through multiple
layers of self-attention to capture long-range dependencies between words. GPT (Generative
Pretrained Transformer) and similar models are often fine-tuned on specific tasks like text
generation, making them ideal for converting glosses or gesture sequences into fluent, human-
like text.
Architecture: Multi-layered attention network that processes input as sequences.

2. Sequence-to-Sequence (Seq2Seq) with Attention for Text Generation


Seq2Seq models are another popular choice for text generation. The Seq2Seq architecture
includes an encoder-decoder structure where the encoder processes input sequences (like
glosses), and the decoder generates the output sequence (text). When combined with attention
mechanisms, the model can focus on the most relevant parts of the input at each step,
improving the fluency and context of the generated text.
Architecture: Encoder (RNN/LSTM) processes the input, and decoder generates the output.
Attention: Allows the decoder to focus on relevant parts of the input at each generation step.
Output: Translates gloss sequences into grammatically correct text in the target language.
Usage: Used for translating sequences like gloss to text in real-time or batch processing.


5.2.9. User Interface Module

The user interface (UI) is the layer through which users interact with the system. It integrates
all other modules and provides a seamless and intuitive experience. The UI can be built as a
desktop application using Tkinter or PyQt, or as a web application using React, HTML/CSS,
and JavaScript. The main features include live video feed display, real-time text output,
language selection dropdown, and start/stop recognition buttons.

The UI ensures that users with no technical background can easily use the system. It may also
include tooltips, progress indicators, and error alerts to enhance usability. Accessibility features
like large text, high contrast modes, and multilingual support ensure that the UI can be used by
people of all age groups and abilities. This module is critical in determining how well the system
is received by end-users.

1. Event-Driven Programming

Event-driven programming is the foundation of interactive UIs. In this approach, the application reacts to user actions (events) such as clicks, key presses, or gestures. The program runs in a loop, waiting for events and executing specific code in response to those events. This approach is fundamental in graphical user interface (GUI) design, enabling applications to be interactive and responsive.

Algorithm Type: Event loop, listener mode
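A minimal Tkinter sketch of this pattern is shown below; the start/stop callbacks are placeholders for the real capture and recognition logic.

import tkinter as tk

def start_recognition():
    status_var.set("Recognition running...")   # camera capture would start here

def stop_recognition():
    status_var.set("Recognition stopped.")

root = tk.Tk()
root.title("Sign Language to Text")

status_var = tk.StringVar(value="Idle")
tk.Label(root, textvariable=status_var).pack(pady=5)
tk.Button(root, text="Start", command=start_recognition).pack(side=tk.LEFT, padx=10)
tk.Button(root, text="Stop", command=stop_recognition).pack(side=tk.LEFT, padx=10)

root.mainloop()    # event loop: waits for events and dispatches the callbacks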

2. Model-View-Controller (MVC) Architecture

The Model-View-Controller (MVC) design pattern is a traditional software architectural pattern used to structure UI code. It separates the application into three components:

Model: Represents the application data (e.g., the sign language gloss or translated text)

View: Presents the data to the user (e.g., the live video feed and the on-screen text output)

Controller: Handles user input (e.g., start/stop buttons, language selection) and updates the model and view accordingly

Frameworks: Django (Python), ASP.NET MVC (C#), Spring MVC (Java)

3. State Machine (Finite State Machine)

A Finite State Machine (FSM) is a mathematical model used for designing the UI's state
transitions. It is especially useful for applications with distinct modes or states, such as
interactive forms, tutorial steps, or multi-step sign language translation. The FSM ensures that
the UI behaves predictably by defining a set of states and possible transitions based on user
actions.

A Finite State Machine (FSM) provides a structured way to manage the flow of an
application's user interface by defining distinct states and the transitions between them. This
approach is especially beneficial for applications with clear sequential or conditional steps,
such as sign language translation systems. For instance, in such systems, the UI can be broken
into states like Idle, Capturing Gesture, Recognizing Gesture, Translating Text, and
Displaying Output. Transitions between these states are triggered by user interactions or
system events, such as starting the camera, detecting a hand gesture, or completing a
translation. By modeling these interactions explicitly, FSMs help ensure the system behaves
predictably and logically in response to inputs, reducing errors and making the interface easier
to understand and navigate.

FSMs also improve the maintainability and scalability of UI systems. Because each state and
its corresponding transitions are clearly defined, developers can easily isolate and troubleshoot
issues or extend the application with new functionality. In a multilingual sign language
translation app, for example, adding support for a new language might involve inserting new
states or transitions without disrupting the existing flow. FSMs can be implemented manually
or with state management libraries that visualize state transitions, enabling better debugging
and design validation. This structured approach ensures a seamless user experience, especially
in complex, multi-step applications where consistency and correctness are critical.

Algorithm Type: State transitions (states and events)

Frameworks: XState (JavaScript), Statecharts, SMACH (Python)
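A minimal plain-Python sketch of such a state machine, using the example states described above (the event names are illustrative):

# Allowed transitions: (current state, event) -> next state
TRANSITIONS = {
    ("Idle",               "start_camera"):       "CapturingGesture",
    ("CapturingGesture",   "hand_detected"):      "RecognizingGesture",
    ("RecognizingGesture", "gesture_classified"): "TranslatingText",
    ("TranslatingText",    "translation_done"):   "DisplayingOutput",
    ("DisplayingOutput",   "reset"):              "Idle",
}

class SignLanguageUI:
    def __init__(self):
        self.state = "Idle"

    def handle(self, event):
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            print(f"Ignored event '{event}' in state '{self.state}'")
        else:
            print(f"{self.state} --{event}--> {next_state}")
            self.state = next_state

ui = SignLanguageUI()
for event in ["start_camera", "hand_detected", "gesture_classified",
              "translation_done", "reset"]:
    ui.handle(event)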



