Final Khan2020 Article ANovelNaturalLanguageProcessin
https://doi.org/10.1007/s12559-020-09731-7
Received: 31 May 2019 / Accepted: 14 May 2020 / Published online: 31 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Background/Introduction Deaf communities around the world use gesture-based languages, generally known as sign
languages. Every country has a different sign language; for instance, the USA has American Sign Language (ASL) and
the UK has British Sign Language (BSL). The deaf community in Pakistan uses Pakistan Sign Language (PSL), which, like
other natural languages, has a vocabulary, sentence structure, and word order. The majority of the hearing community is not
aware of PSL, which creates a huge communication gap between the two groups. Similarly, deaf persons are
unable to read text written in English and Urdu. Hence, the provision of an effective translation model can support the
cognitive capability of the deaf community to interpret natural language materials available on the Internet and in other
useful resources.
Methods This research involves exploiting natural language processing (NLP) techniques to support the deaf community by
proposing a novel machine translation model that translates English sentences into equivalent Pakistan Sign Language (PSL).
Though a large number of machine translation systems have been successfully implemented for natural to natural language
translations, natural to sign language machine translation is a relatively new area of research. State-of-the-art works in natural to
sign language translation are mostly domain specific and suffer from low accuracy scores. Major reasons are specialised language
structures for sign languages, and lack of annotated corpora to facilitate development of more generalisable machine translation
systems. To this end, a grammar-based machine translation model is proposed to translate sentences written in English language
into equivalent PSL sentences. To the best of our knowledge, this is the first effort to translate any natural language to PSL using
core NLP techniques. The proposed approach involves a structured process to investigate the linguistic structure of PSL and
formulate the grammatical structure of PSL sentences. These rules are then formalised into a context-free grammar, which, in
turn, can be efficiently implemented as a parsing module for translation and validation of target PSL sentences. The whole
concept is implemented as a software system, comprising the NLP pipeline and an external service to render the avatar-based
video of translated words, in order to compensate for the cognitive hearing deficit of deaf people.
Results and Conclusion The accuracy of the proposed translation model has been evaluated manually and automatically.
Quantitative results reveal a very promising Bilingual Evaluation Understudy (BLEU) score of 0.78. Subjective evaluations
demonstrate that the system can compensate for the cognitive hearing deficit of end users through the system output expressed as
a readily interpretable avatar. Comparative analysis shows that our proposed system works well for simple sentences but
struggles to translate compound and compound-complex sentences correctly, which warrants further research.
Keywords Machine translation · Natural language processing · Deaf people communication · Pakistan Sign Language ·
Cognition · Rule-based translation
Introduction
which is called sign language. A gesture is a body language that comprises hand shapes, location, movements, and some non-manual features. The non-manual features mostly include facial expressions such as lifting the eyebrows, smiling, eye blinks, and mouth and tongue movements [2]. The sentence structures of sign languages are different from those of written languages; therefore, the deaf community is unable to read and understand text written in natural languages. This challenges the cognitive abilities of the deaf community to benefit from any written material.

No global sign language exists; every country has its own sign language [3]. In addition, a variety of regional sign languages exist within a country. The developed countries have been working to create inclusive communities by developing new and improving existing assistive technologies for their people with disabilities, including the deaf community. A significant amount of research has been performed on the development of communication tools and strategies for American Sign Language (ASL), British Sign Language (BSL), and many other languages. However, it is important to consider that the majority of the deaf population lives in developing countries. A number of developing countries have also recently begun focusing on improving assistive technology for their deaf populations. Some excellent research work has been published that focuses on Arabic Sign Language (ArSL), Indian Sign Language (ISL), and several other sign languages. However, no significant work on the standardisation and translation of Pakistan Sign Language (PSL) has been accomplished.

In order to improve the cognitive abilities of the deaf community of Pakistan, this work aims to propose a machine translation model that translates natural language text into Pakistan Sign Language. The following main challenges have been identified in this regard:

• Absence of a substantial sentence-level corpus that may help in translating several types of English sentences into PSL
• Lack of availability of linguistic information for PSL
• No written grammar rules of PSL

This research chooses English as the source natural language, as it is linguistically well studied compared with other scripted natural languages. It is important to mention here that Urdu, Pakistan’s national language, has not been used as the source language because the current state of the art does not offer adequate linguistic resources to properly process Urdu sentences. Another important reason is that the end product of the proposed translation pipeline is a sign language avatar which plays PSL gestures for the words of the translated sentence in the given sequence. Thus, it does not matter whether the intermediate translated output is in Urdu or in English. Such translation of natural language text into PSL gestures will be a key step towards the integration of Pakistan’s deaf community into society, as the translation system may be used in developing several useful applications, including the translation of study material, news articles, cell phone messages, and other useful information resources available in English into equivalent PSL.

To the best of our knowledge, this is the first machine translation model which translates English language sentences into equivalent PSL sentences. To this end, we provide the following: (i) a systematic approach to design and develop such a system, which involves people from the deaf community and PSL experts; (ii) a dataset generated with the help of deaf subjects and PSL experts; (iii) a grammar, defined on the basis of rigorous analysis of the dataset, to translate and verify PSL sentences; (iv) a translation model and a software system built on it that takes an English language sentence as input and converts it into an equivalent PSL sentence using the defined grammar; (v) subjective and automatic evaluation of the proposed translation system, carried out using appropriate evaluation measures and involving both deaf subjects and sign language experts.

The rest of the article is organised as follows. “Related Works” provides the relevant literature review. “Translation Process” discusses high-level details of the proposed translation process. The analysis of collected data is presented in “Tense-Based Analysis”, while “Machine Translation” discusses the details of the established grammar rules for the translation system. Implementation details of a system that translates English language sentences into equivalent PSL sentences using the defined grammar are discussed in “Machine Translation System Implementation”. The evaluation of the effectiveness and correctness of the developed translation system is presented in “Evaluation and Results”. Lastly, the conclusion and future directions are presented in “Conclusion and Future Directions”.

Related Works

Machine translation has been an important subject in computer science for almost sixty years. During most of this time, however, machine translation from natural to sign languages has been largely overlooked. Only within the last decade has there been renewed interest in automated translation for sign languages. This is linked both to progress in linguistic studies of sign language [4, 5] and to an increase in computing resources. Machine translation from natural to sign languages is pertinent for seamless integration of the deaf community into society [6]. Several such domain-specific translation systems have been developed for different languages, including American Sign Language [7–10], British Sign Language [11],
750 Cogn Comput (2020) 12:748–765
Spanish Sign Language [12, 13], South African Sign Language [14], Italian Sign Language [15], and German Sign Language [16].

A direct translation approach has been used in the TESSA system to translate English text into BSL gestures [17], while most translation systems are rule-based or grammar-based; for instance, ZARDOZ [18], ASL Workbench [19], and ViSiCAST [20] are a few systems for English-based sign languages. Similarly, there exists a grammar-based translation system for South African Sign Language [14] and for Spanish Sign Language [21]. Another variant of machine translation systems is statistical machine translation [22]. However, not many sign language translation systems have been developed using statistical machine translation models, as they build upon sequences of words in large data sets, which are not available for most sign languages. Translation systems for ASL and BSL have been designed and are maturing. In recent years, however, tools and technologies have also been developed for the translation of specific European and Asian sign languages, for instance, Greek Sign Language [23], Spanish Sign Language [13], Arabic Sign Language [24], Vietnamese Sign Language [25], Indian Sign Language [26], Bangla Sign Language [27], and Thai Sign Language [28].

Creation of Pakistan Sign Language resources and technologies is in its infancy [2]. The work began in early 2000 [29, 30] with a study that attempted to collect data for words and phrases in PSL. Following that, however, no significant effort has been made to automate the translation of natural language into PSL. Some recent efforts have been made to lay out an architectural platform to convert natural language text into PSL, and vice versa [2, 31]. However, this initiative only illustrates the key components of such a structure and does not entail any implementations.

Another orthogonal research direction is that of recognising sign language gestures and converting them into text. In this regard, a significant amount of research work has been done on American Sign Language [9, 10]. Some fundamental work in this direction has also been done on PSL [32, 33]. Similarly, another relevant yet different dimension of this research is to create a standard corpus for sign language translation and gesture recognition purposes. Some basic collections of PSL gestures exist, but there is no corpus to facilitate natural to sign language translation for PSL [5].

To the best of our knowledge, the translation model proposed in this research is the first detailed and technically rich instance of machine translation from natural language to Pakistan Sign Language. It is pertinent to mention here that rule-based translation approaches are the most suitable starting points for developing machine translation systems in the absence of any sizeable translation corpus. Therefore, this research proposes a rule-based translation model for translating natural language into PSL. It also compiles the first parallel corpus comprising PSL gestures, and sentences of the English language and their equivalent PSL sentences. Lastly, it integrates the proposed translation model and compiled corpus into an information system that translates English sentences into PSL gestures, and evaluates the results.

Translation Process

The proposed machine translation model is based on a systematic, step-by-step process. Figure 1 shows the high-level steps involved in this translation activity. The approach starts with data collection of natural language sentences, which in this case are English. English has been selected because it is linguistically well investigated and offers a pool of NLP resources for various forms of natural language analysis. Data collection involves gathering a complete variety of English language sentences, which are then translated into PSL by involving different deaf people, interpreters, and PSL experts. A thorough study of the collected data is carried out to determine the similarities and variations between the English and PSL language structures. It is evident from the translated data that the English and PSL sentence structures are different. The PSL grammar rules are eventually developed and formalised using a context-free grammar. Finally, the effectiveness of the proposed grammar is evaluated by translating a different set of English sentences into PSL with the help of an automated translation process. The details of all these steps are presented in the following sections and subsections.

Fig. 1 Translation and evaluation

Data Collection of Source Language

For any machine translation model to work, it is necessary to have a clear understanding of the syntactic and semantic structures of both source and target languages. The data collection process for this research has been driven by the taxonomy of English language sentences as shown in Fig. 2, which shows that the sentences are precisely categorised in terms of
Sentence structure

Most English language sentences use the subject, verb, and object (S+V+O) word order in sentence formation, whereas the data obtained from the deaf subjects shows that the deaf use different subject, verb, and object permutations. An example PSL sentence, with different subject, verb, and object variations, is presented in Table 2. The frequency of each permutation of subject, verb, and object is computed to figure out which variant is most frequently used by the deaf community. The results reveal that more than 80% of the sentences collected from the deaf community followed the (S+O+V) word order. Therefore, (S+O+V) is considered to be the default word order for PSL sentences.

Table 2 PSL word order permutations
Sentence               Word order
Like badminton she.    V+O+S
She badminton like.    S+O+V
She badminton like.    S+O+V
Badminton like she.    O+V+S
Badminton she like.    O+S+V
Like she badminton.    V+S+O

Grammatical Differences Between English and PSL It is pertinent to discuss the grammatical differences between English and PSL. The data shown in Table 1 helps in identifying several differences between the sentence structures of English and PSL, which are discussed below:

Removal of Articles and Prepositions: Written languages like English use articles and prepositions in their written scripts, while gesture-based languages do not include such constituents. This can be observed from the sentences shown in Table 1.

Adjectives and Adverbs: The sample sentences in Table 1 show that in PSL, adjectives are always placed after the noun in the sentence, whereas the position of an adverb remains the same as it was in the source language sentence.

Negations in PSL: In all types of PSL sentences, negation is represented by adding the word “not” at the end of the sentence.

Lemmatised Form of Words: Pakistan Sign Language sentences are formed using only lemmatised words. PSL does not use linking verbs and suffixes.

Time Words: Words like tomorrow, yesterday, or soon are used in English to indicate the time of an event. If any such time word exists in the English sentence, it must be placed at the beginning of the sentence in PSL.

Interrogative: Sentences that contain questions are called interrogative sentences. They are further divided into two subcategories:

1. Wh-question: if the source sentence is an interrogative, then its wh-question part is moved to the end of the sentence, except if it starts with “when”; in that case, the word “time” is placed at the end of the sentence.
2. Auxiliary question: if the source sentence is an interrogative, then its auxiliary question part is replaced by “yes no” and is moved to the end of the sentence.

Part of Speech (POS) Tagging

The Penn Treebank defines about 48 different part of speech (POS) tags. The analysis of translated PSL sentences reveals that PSL sentence formation employs only basic forms of POS tags, unlike the richer POS tags used for English sentences. Figure 4 shows POS tags for an example English sentence along with the simple POS tags of the mapped PSL sentence. It can be observed from the figure that “is”, which is tagged as VBZ, has been eliminated; similarly, “chasing” is assigned the VBG tag, which has been simplified to the verb’s root form in the corresponding PSL sentence. Table 3 shows the mapping of Penn Treebank POS tags for English onto the POS tags used in PSL.

been collected from deaf subjects and validated by sign language experts and interpreters. This section provides a detailed overview of these various types of tenses while highlighting differences for each tense between English and PSL sentences.

Simple Present and Simple Past Tense
The translated sentences show that regardless of a singular or plural subject, PSL uses only the root form of the verb. For articles, prepositions, negation, and interrogation, the same rules hold that were drawn earlier for the simple present and past tense cases; the words “does” and “do” are also eliminated in the translated PSL sentences.

In English, past indefinite affirmative sentences consist of the simple-past form of the verb, as shown in the examples: “Ali played a piano”; “They sang the national song”. For past indefinite negative sentences, “did not” is used along with the base form of the verb, e.g. “Ali did not play a piano” and “They did not sing the national song”. Similarly, in the case of past indefinite interrogative sentences, “did” comes at the beginning of the sentence along with the base form of the verb, e.g. “Did Ali play a piano?” and “Did they sing the national song?”.

The translation of these different variants of past indefinite sentences into PSL reveals one commonality: the word “was” is added at the start of the sentences, while the simple-past form of the verb changes to its base form. No further change has been observed for negative and interrogative sentences. Lastly, the word “did” is also eliminated from the translated sentences. The translation for different variants of the present, past, and future indefinite sentences is presented in Table 5. It can be observed that the word “after” is added at the end of each sentence to reflect that the task performed by the subject may complete later in time. Furthermore, the words “will” and “shall”, which are used to represent the future tense in English sentences, have been eliminated in the translated PSL sentences. It is quite interesting to note that in PSL, the sentences belonging to past and future tenses follow the same grammatical structure that has been observed in the present tense, while the additional words “was” and “after” are added for past and future tenses, respectively.

Present, Past, and Future Continuous Tense

The present continuous tense formation rules in English involve the usage of helping verbs like “is”, “am”, “be”, and “are” along with applying the suffix “ing” to the verb. This is also known as the present participle form of the verb. The data for different variants of the continuous tense is presented in Table 6, and it shows that for the continuous tense the word “now” is included at the end of the PSL sentences, while helping verbs along with articles and prepositions are omitted from the translated PSL sentence. Likewise, the present participle form of the verb is converted back to the simple-present form of the verb. Lastly, the words “was” and “after” are used for past and future tenses, respectively.

Present, Past, and Future Perfect Tense

Perfect tense sentences are considered the next category. In the English language the sentences belonging to this group use “has” or “have” for the present, “had” for past, and “will have”
or “shall have” for future tenses, whereas these sentences use past-participle forms of the verbs. Some sample translated sentences for different variants of the perfect tense are presented in Table 7. It can be observed from the translated sentences that for all tenses the word “full” has been added at the end of the sentence in PSL, whereas the use of “was” and “after” for past and future tenses remains consistent with previous observations.

Present, Past, and Future Perfect Continuous Tense

In perfect continuous tenses, the phrases “has been”, “have been”, “had been”, “will have been”, or “shall have been” are used along with the present participle form of the verb. Examples of translation of the perfect continuous tense are presented in Table 8. It can be observed that even in these sentences, the PSL sentences use the basic S+O+V structure, while the translated sentences follow the same rules that have been observed in the continuous and the perfect tense categories, i.e. the addition of “full” and “now” in the sentences.

Machine Translation

Machine translation (MT) is also known as computerised translation. During the machine translation process, automated software techniques are used to translate text written in one natural language, such as English, to another, such as Urdu. Machine translation of natural language to sign language is a relatively new area of research which has been gaining importance in recent years. For any kind of translation, the essential requirement is to preserve the meaning of the source language sentences after converting them into the target language. Machine translation is mainly classified into semantic, statistical, and neural translation models [8, 33].

Statistical and Neural Machine Translation Models These techniques need a bilingual dictionary consisting of sentences and their translations. Specific statistical methods manipulating the sequence of words are applied to the bilingual corpus during sentence translation [34, 35]. These models generate quick and accurate translations, but at the same time, the translation accuracy is heavily dependent on the availability of a significant corpus that covers all the grammatical categories of the source and target languages. These approaches cannot be applied to our system because, to the best of our knowledge, no such corpus is available for PSL. As a matter of fact, no such corpora exist for almost any other sign language either [36].

Semantic Translation Models These models are based on rules or grammars to translate from one language to another. The different types of semantic translation models are as follows.

Direct Translation Models Word to word substitution is applied in this kind of translation; no grammatical and semantic
Table 8 PSL translation of present, past, and future perfect continuous tenses
Sentence type                                English sentence                          PSL sentence
Present perfect continuous affirmative       He has been going to school.              He school go full now.
Present perfect continuous negative          He has not been going to school.          He school go full now not.
Present perfect continuous interrogative     Has he been going to school?              He school go full now Yes-No.?
Past perfect continuous affirmative          He had been going to school.              Was he school go full now?
Past perfect continuous negative             He had not been going to school.          Was he school go full now not?
Past perfect continuous interrogative        Had he been going to school?              Was he school go full now Yes-No?
Future perfect continuous affirmative        He will have been going to school.        He school go full after now.
Future perfect continuous negative           He will have not been going to school.    He school go full after now not.
Future perfect continuous interrogative      Will he have been going to school?        He school go full after now Yes-No?
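
The marker pattern visible in the rows of Table 8 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `psl_tokens` and its parameters are hypothetical, punctuation is ignored, and the relative order of the markers (“full”, then “after”, then “now”, then “not”, then “Yes-No”) is read off Table 8; applying the same order to tense combinations that the table does not show is an assumption.

```python
from typing import List


def psl_tokens(subject: str, obj: str, verb_lemma: str, *,
               tense: str = "present", perfect: bool = False,
               continuous: bool = False, negative: bool = False,
               question: bool = False) -> List[str]:
    """Assemble PSL tokens in S+O+V order and attach tense markers.

    Hypothetical helper: marker order follows the rows of Table 8.
    """
    if tense not in {"present", "past", "future"}:
        raise ValueError(f"unknown tense: {tense!r}")

    tokens: List[str] = []
    if tense == "past":
        tokens.append("was")          # past: "was" at the start of the sentence
    tokens += [subject, obj, verb_lemma]  # default S+O+V order, verb in root form
    if perfect:
        tokens.append("full")         # perfect aspect marker
    if tense == "future":
        tokens.append("after")        # future marker
    if continuous:
        tokens.append("now")          # continuous aspect marker
    if negative:
        tokens.append("not")          # negation goes at the end
    if question:
        tokens.append("Yes-No")       # auxiliary question marker
    return tokens


# Example: "He had not been going to school."
example = psl_tokens("he", "school", "go", tense="past",
                     perfect=True, continuous=True, negative=True)
print(" ".join(example))
```

Keeping the markers as independent flags mirrors the observation in the text that each tense category adds its own word while the underlying S+O+V structure stays the same.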
understanding of the target language is considered. It is evident that such translations consider only the source language. It has been learned from the literature survey that sign languages are natural languages with their own sentence structure, grammar, and semantic rules. Therefore, this direct translation approach of word by word substitution will not be able to preserve the semantics of the source sentence being translated.

Rule-Based Translation Models A rule-based translation system requires a collection of rules, and a software system to process those rules. An important requirement in a rule-based machine translation model is the existence of a grammar for both languages, so that all specific sentence categories of the source language can be translated into the target language keeping the syntax intact. Different sign languages like ASL, BSL, and ArSL are well studied, and their grammatical structures are well defined. For this reason, there exist a few rule-based MT systems for these languages [37, 38]. However, PSL is not linguistically well investigated and no rules are available in the literature for the formation of PSL sentences. The main challenge for English to PSL machine translation using a rule-based approach is the generation of a PSL grammar. An in-depth analysis has been performed to determine the differences between the source and target languages, as discussed in the previous section. Furthermore, the transformation rules for translating English sentences into PSL have been identified.

It is pertinent to highlight again that English is chosen as the source natural language in this research mainly because not only is the grammar of English well documented, but various computational tools are also available for the pre-processing of source language sentences, including POS tagging and dependency illustrations of these POS. Furthermore, to translate English text into an equivalent PSL sentence, there is a strong need for a PSL grammar which can be used to transform an English language sentence into an equivalent PSL sentence.

Need for Grammar and Its Formalism A grammar comprises the building blocks of any language’s sentence structure. Every spoken language has a grammatical structure for its sentence formation. The differences and similarities between English and PSL sentences have been identified during the analysis of the sentence-level corpus. Based on this analysis, generalised rules have been formulated which are helpful in proposing an appropriate grammar-based translation model. To this end, instead of implementing the rules using basic if-else statements, a much better idea is to express these rules in a formal notation that can be quickly processed by an algorithmic process. In this research work, context-free grammar (CFG) has been used as the formal notation to express these translation rules. The main advantages of using a grammar formalism are as follows: (a) it is extendable, and new rules can easily be added as and when required; (b) rules can be modified and fixed if they require amendments; (c) state-of-the-art natural language parsing techniques can be applied easily.

Grammar Generations

A grammar is the structure of a language, also referred to as the rules of a language. Better familiarity with the grammar of a language results in clear and understandable communication in that language. Thus, the understanding of grammar increases the effectiveness of communication. Context-free grammar has been used to represent the rules for PSL.

A context-free grammar (CFG) is a notation for describing languages. It is a set of recursive rewriting rules or productions used to generate the strings of a language. Formally, a CFG is a four-tuple G = (V, Σ, R, S), and each component of the tuple is defined as follows:

1. V is a finite set; each element v ∈ V is called a non-terminal or variable. Each variable represents a different type of phrase or clause in the sentence.
2. Σ is a finite set of terminals, disjoint from V, which make up the actual contents of the sentence. The set of terminals is the alphabet of the language defined by the grammar G.
3. R is a finite relation from V to (V ∪ Σ)*, where the asterisk represents the Kleene star operation. The members of R are called the (rewrite) rules or productions of the grammar.
4. S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.

An excerpt from the CFG productions which are applied to an English sentence to produce an equivalent PSL tree is presented in Table 9. The part of the CFG presented in Table 9 only handles affirmative sentences in the present, past, and future indefinite tenses of the English language, along with different productions of subject, noun, verb, and adjectives.

Machine Translation System Implementation

This section presents the implementation details of the English to PSL translation system using the grammar generated in the previous section. The system comprises two main components: a pre-processing component and a translator. The pre-processing component processes the input English language sentence by performing POS tagging and dependency analysis, and finally identifies the sentence type, whereas the second component uses the grammar proposed in the previous section to convert the input sentence into an equivalent PSL sentence. This component involves the renaming of POS tags, PSL tree formation using appropriate grammar productions, and PSL sentence generation, as illustrated in Fig. 4.

An English sentence is the input to the translation module, and an equivalent sentence in PSL is the output. The whole process is shown in Fig. 5. In the first stage, each word in the sentence is tagged with its POS using the Stanford parser and the Penn Treebank [39, 40]. The output is in string format and is converted into a parse tree using the tree generation algorithm. In the next step, a dependency tool is used to identify the relationships between the different words that constitute the sentence [41]. This dependency information is added to the parse tree and results in the construction of the annotated parse tree. As mentioned in the analysis section, deaf people have very little knowledge of English grammar. Furthermore, the dataset shows that they use only a few fundamental POS tags, which are the ones used in the grammar generated for PSL. Therefore, as the next step, the Penn Treebank tags are mapped onto PSL tags using the mappings shown in Table 3, and the annotated parse tree nodes are renamed using PSL POS tags. The next step in the translation pipeline is to classify the sentence based on tense and meaning so that the appropriate rule from the PSL grammar can be selected to translate the given source sentence. In order to generate the PSL tree, the English parse tree is converted into its equivalent PSL tree by applying the CFG productions.

During the transformation of the English parse tree into the PSL tree, specific tree nodes of the English language are deleted, some new nodes are added, and due to the differences in the word order of both languages the position of some nodes is
Dependency Analysis
appropriate rules of PSL grammar that have been applied, and also the functions applied on the source tree.

PSL Sentence Generation

The PSL tree shown in Fig. 11 is traversed using leaf-order traversal, and the PSL sentence is generated as the outcome of the translation activity. The pseudo-code for this traversal is given in Fig. 12.

only the proof of concept application, the proposed system only plays the gestures of words that are already present in the gesture databank, while for the other words it uses finger spelling. However, the system can be seamlessly enriched by extending the databank with more GIF images representing more words.

Evaluation and Results
Manual or User-Based Evaluation In such techniques, human experts are involved in the evaluation. The accuracy of the translation output is assessed from the responses of these experts, who rate the output on different scales based on their syntactic and semantic understanding of the output produced by the automatic machine translation system.

Testing Corpus Generation According to English taxonomy, sentences are categorised on the basis of tense, meaning, and structure. From contextual inquiry, it was observed that there are fourteen different types of sentences on the basis of tense, three with respect to meaning, and four based on structural variants. The testing corpus developed in this research consists of a total of 2000 sentences covering all types of English sentences, as shown in Table 11.

Results of Experiments

Manual Evaluation In the first stage, experts assessed the system output manually to determine the system's accuracy. It was challenging to evaluate all 2000 sentences manually. Therefore, a subset of nearly 500 sentences covering all sentence types was manually assessed by two deaf persons and two bilingual experts in PSL and English. The evaluation results are divided into two groups: valid and invalid. Sentences that preserve the syntactic and semantic structure of the source language after translation are classified as valid, whereas those that do not preserve the meaning of the source sentence are listed as invalid. Out of 500 sentences, 476 were listed as valid by both the evaluators, which makes the overall accuracy of the translation system about 95%. The 24 invalid sentences belong to the complex and complex compound categories. The consistency of evaluations among the different evaluators was very high, with a Kappa statistic of about 0.96.

Apart from different categories of sentences, we also tested the general applicability of our system on the story of a “thirsty crow”, which is a famous story in early school education. We enriched the databank to incorporate the GIF images for all the words included in this story. The deaf students were
Present continuous affirmative

English rule                               PSL rule                                          Function
Present Continuous Affirmative ➔ SUBJ VP   Present Continuous Affirmative ➔ SUBJ VP “now”    Add a child node in the tree having value “now”
SUBJ ➔ PRPS NOUN                           SUBJ ➔ PRPS NOUN                                  No change
VP ➔ AUX VP                                VP ➔ VP                                           Remove AUX node from parent VP
VP ➔ VERB OBJ                              VP ➔ OBJ VERB
OBJ ➔ DT ADJ NOUN                          OBJ ➔ ADJ NOUN                                    Remove DT child node
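The tree operations listed above (adding a “now” node, removing the AUX node, placing the verb after the object, and removing the DT child), followed by the leaf-order traversal used for PSL sentence generation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the function names, the root label `S`, the `TIME` label for the added node, and the example sentence are all hypothetical and chosen only for this sketch. Lexical steps such as verb lemmatisation are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A parse-tree node; `word` is set only on leaf nodes (assumed layout)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None


def transform(node: Node) -> Node:
    """Apply the table's structural rules bottom-up, returning a new tree."""
    kids = [transform(child) for child in node.children]
    labels = [kid.label for kid in kids]
    if node.label == "VP" and labels == ["AUX", "VP"]:
        kids = kids[1:]                      # VP -> AUX VP becomes VP -> VP
    elif node.label == "VP" and labels == ["VERB", "OBJ"]:
        kids = [kids[1], kids[0]]            # VP -> VERB OBJ becomes VP -> OBJ VERB
    elif node.label == "OBJ" and labels == ["DT", "ADJ", "NOUN"]:
        kids = kids[1:]                      # OBJ -> DT ADJ NOUN becomes OBJ -> ADJ NOUN
    return Node(node.label, kids, node.word)


def to_psl_tree(root: Node) -> Node:
    """Present continuous affirmative only: transform, then append "now"."""
    psl_root = transform(root)
    psl_root.children.append(Node("TIME", word="now"))  # hypothetical label
    return psl_root


def leaves(node: Node) -> List[str]:
    """Leaf-order traversal: collect the words of the leaves, left to right."""
    if not node.children:
        return [node.word] if node.word is not None else []
    words: List[str] = []
    for child in node.children:
        words.extend(leaves(child))
    return words


# Hypothetical input: "Ali is eating a red apple"
english_tree = Node("S", [
    Node("SUBJ", [Node("NOUN", word="Ali")]),
    Node("VP", [
        Node("AUX", word="is"),
        Node("VP", [
            Node("VERB", word="eating"),
            Node("OBJ", [
                Node("DT", word="a"),
                Node("ADJ", word="red"),
                Node("NOUN", word="apple"),
            ]),
        ]),
    ]),
])

print(" ".join(leaves(to_psl_tree(english_tree))))  # prints: Ali red apple eating now
```

Because `transform` builds new nodes instead of editing the input, the English parse tree stays available for inspection alongside the PSL tree.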
762 Cogn Comput (2020) 12:748–765
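The manual evaluation figures reported in the results (476 of 500 sentences judged valid, and a Kappa statistic of about 0.96) rest on two simple computations, sketched below. The sketch assumes Cohen's kappa for two raters with valid/invalid labels; the paper does not state which kappa variant was used, and the rating lists here are hypothetical placeholders, not the study's data.

```python
from collections import Counter
from typing import Sequence


def accuracy(valid: int, total: int) -> float:
    """Share of sentences judged valid."""
    return valid / total


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n)
                   for label in set(freq_a) | set(freq_b))
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Accuracy as reported in the manual evaluation: 476 valid out of 500.
print(f"accuracy = {accuracy(476, 500):.3f}")

# Hypothetical ratings (not the study's data): the raters disagree on one item.
rater_1 = ["valid"] * 476 + ["invalid"] * 24
rater_2 = ["valid"] * 475 + ["invalid"] * 25
print(f"kappa = {cohen_kappa(rater_1, rater_2):.3f}")
```

With more than two raters, as in the evaluation described in the text, a multi-rater statistic such as Fleiss' kappa would be the usual alternative.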
Conclusion and Future Directions

This research presents a grammar-based machine translation model for translating English language sentences into equivalent PSL sentences. PSL, which is used by the deaf community of Pakistan, has a different sentence structure, word order, and lexicon than English or Urdu. The contribution of this work is threefold. (1) A sentence level corpus is developed, which consists of 2000 sentences covering all English language sentence variations. Every English language sentence in this corpus is manually translated into PSL by involving native signers. This database has been made freely available for researchers. (2) The grammatical structure of PSL is formally transcribed. To this end, a systematic approach has been adopted involving data collection, analysis, and grammar rules generation, to define the grammar for PSL using CFG representation, which helps to textually represent PSL sentences. (3) A rule-based machine translation model is proposed to translate English text into PSL. The input English sentence to be translated into PSL is morphologically, syntactically, and semantically analysed. The outputs of the machine translation system are sentences that satisfy the structure and grammar of PSL. Finally, the translation system has been evaluated by two domain experts and deaf subjects. The evaluation reveals that the system translates with a 0.78 BLEU score. A qualitative evaluation reveals that the developed translation system can help compensate for the cognitive hearing deficit of deaf people in understanding the English language.

In the future, we intend to improve the accuracy of the translation system by covering compound and compound-complex sentences, which the current system is unable to translate correctly. Furthermore, such a system could generate more data to enable development of a natural to sign language translation system with increased generalisation capability, using state-of-the-art deep learning approaches (e.g. [46]). We also intend to design a parallel corpus for PSL gestures, since current corpora only contain words and their gestures. In addition, we aim to use SignWriting notations along with videos, which are a textual representation of a gesture that can be read by machine and can further be converted into sign language gestures. This will help to extend our system so that it can convert the translated PSL sentence into an animated avatar that expresses word-by-word gestures of the translated sentence. Lastly, there is a need to extend the translation system to incorporate the pointing of pronouns in sign space [29, 30] by enriching sentence meta-information to ensure more accurate translation.

Compliance with Ethical Standards

Conflict of Interest The authors declare that they have no conflict of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

References

1. Sofiane Boucenna, Antonio Narzisi, Elodie Tilmont, Filippo Muratori, Giovanni Pioggia, David Cohen, Mohamed Chetouani (2014) Interactive technologies for autistic children: a review. Cognitive Computation, Volume 6, Number 4, Page 722.
2. Khan NS, et al. Speak Pakistan: challenges in developing Pakistan Sign Language using information technology. South Asian Studies. 2015;30(2):367.
3. Luqman H, Mahmoud SA. Transform-based Arabic sign language recognition. Procedia Computer Science. 2017;117:2–9.
4. Nießen, S., Och, F.J., Leusch, G., Ney, H., Informatik, L.F.: An evaluation tool for machine translation: fast evaluation for MT research. In: Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2000) (2000).
5. Abid, K., Khan, N. S., Farooq, U., Farooq, M. S., Naeem, M. A., & Abid, A. (2018). A roadmap to elevate Pakistan Sign Language among regional sign languages. South Asian Studies (1026-678X), 33(2).
6. Mohandes M, Deriche M, Liu J. Image-based and sensor-based approaches to Arabic sign language recognition. IEEE transactions on human-machine systems. 2014;44(4):551–7.
7. Zhao L, et al. A machine translation system from English to American Sign Language. Conference of the Association for
Machine Translation in the Americas. Berlin, Heidelberg: Springer; 2000.
8. Bonham, M.E.: English to ASL Gloss Machine Translation. M. Art thesis, Brigham Young University (2015).
9. Wang P, Song Q, Han H, Cheng J. Sequentially supervised long short-term memory for gesture recognition. Cogn Comput. 2016;8(5):982.
10. Kröger BJ, Birkholz P, Kannampuzha J, Kaufmann E, Mittelberg I. Movements and holds in fluent sentence production of American Sign Language: the action-based approach. Cogn Comput. 2011;3(3):449.
11. Marshall, Ian, and Éva Sáfár. A prototype text to British Sign Language (BSL) translation system. Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 2. Association for Computational Linguistics, 2003.
12. San Segundo Hernández, R., Lopez Ludeña, V., Martin Maganto, R., Sánchez, D., & García, A. (2010). Language resources for Spanish-Spanish Sign Language (LSE) translation.
13. Porta J, López-Colino F, Tejedor J, Colás J. A rule-based translation from written Spanish to Spanish Sign Language glosses. Comput Speech Lang. 2014;28(3):788–811.
14. Van Zijl, Lynette, and Andries Combrink. The South African sign language machine translation project: issues on non-manual sign generation. Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries. South African Institute for Computer Scientists and Information Technologists, 2006.
15. Shoaib, Umar, et al. Integrating multiwordnet with Italian sign language lexical resources. Expert Systems with Applications 41.5 (2014): 2300-2308.
16. Bungeroth, J., & Ney, H. (2004). Statistical sign language translation. In Workshop on representation and processing of sign languages, LREC (Vol. 4, pp. 105-108).
17. Cox S, Lincoln M, Tryggvason J, Nakisa M, Wells M, Tutt M, et al. The development and evaluation of a speech-to-sign translation system to assist transactions. Int J Hum Comput Interact. 2003;16(2):141–61.
18. Veale T, Conway A, Collins B. The challenges of cross-modal translation: English-to-sign-language translation in the Zardoz system. Mach Transl. 1998;13(1):81–106.
19. d’Armond, L.S.: Representation of American sign language for machine translation, Ph.D. thesis, Georgetown University (2002).
20. Marshall, I., Sáfár, É.: Extraction of semantic representations from syntactic SMU link grammar linkages. In: Proceedings of Recent Advances in Natural Language Processing, pp. 154–159 (2001).
21. San-Segundo R, Montero JM, Macías-Guarasa J, Córdoba R, Ferreiros J, Pardo JM. Proposing a speech to gesture translation architecture for Spanish deaf people. J Vis Lang Comput. 2008;19(5):523–38.
22. Othman A, Jemni M. Statistical sign language machine translation: from English written text to American sign language gloss. Int J Comput Sci Issues. 2011;8(5):65–73.
23. Kouremenos D, Ntalianis K, Kollias S. A novel rule based machine translation scheme from Greek to Greek Sign Language: production of different types of large corpora and Language Models evaluation. Comput Speech Lang. 2018;51:110–35.
24. Luqman H, Mahmoud SA. Automatic translation of Arabic text-to-Arabic sign language. Univ Access Inf Soc. 2018:1–13.
25. Nguyen TBD, Phung TN, Vu TT. A rule-based method for text shortening in Vietnamese Sign Language translation. In Information systems design and intelligent applications. Singapore: Springer; 2018. p. 655–62.
26. Verma, V. K., & Srivastava, S. (2018). Toward machine translation linguistic issues of Indian Sign Language. In Speech and language processing for human-machine communications (pp. 129-135). Springer, Singapore.
27. Yasir, F., Prasad, P. W. C., Alsadoon, A., Elchouemi, A., & Sreedharan, S. (2017). Bangla Sign Language recognition using convolutional neural network. In Intelligent computing, instrumentation and control technologies (ICICICT), 2017 International Conference on (pp. 49-53). IEEE.
28. Tumsri, J., & Kimpan, W. (2017). Thai sign language translation using leap motion controller. In Proceedings of The International MultiConference of Engineers and Computer Scientists 2017 (pp. 46-51).
29. Zeshan, U. (2000). Sign language in Indo-Pakistan: a description of a signed language. John Benjamins Publishing.
30. Zeshan U. Indo-Pakistani Sign Language grammar: a typological outline. Sign Language Studies. 2003;3:157–212.
31. Abbas, A., & Sarfraz, S. (2018). Developing a prototype to translate text and speech to Pakistan Sign Language with bilingual subtitles: a framework. Journal of Educational Technology Systems.
32. Khan N, Shahzada A, Ata S, Abid A, Khan Y, ShoaibFarooq M. A vision based approach for Pakistan Sign Language alphabets recognition. Pensee. 2014;76(3).
33. Hassan B, Farooq MS, Abid A, Sabir N. Pakistan Sign Language: computer vision analysis & recommendations. VFAST Transactions on Software Engineering. 2015;9(1):1–6.
34. Othman A, Jemni M. Designing high accuracy statistical machine translation for sign language using parallel corpus: case study English and American Sign Language. Journal of Information Technology Research (JITR). 2019;12(2):134–58.
35. Stoll, Stephanie, Necati Cihan Camgoz, Simon Hadfield, and Richard Bowden. Text2Sign: towards sign language production using neural machine translation and generative adversarial networks. International Journal of Computer Vision (2020): 1-18.
36. Bragg, D., Koller, O., Bellard, M., Berke, L., Boudreault, P., Braffort, A., ... & Vogler, C. (2019). Sign language recognition, generation, and translation: an interdisciplinary perspective. In The 21st International ACM SIGACCESS Conference on Computers and Accessibility (pp. 16-31).
37. Shaalan K. Rule-based approach in Arabic natural language processing. The International Journal on Information and Communication Technologies (IJICT). 2010;3(3):11–9.
38. Filhol M, Hadjadj MN, Testu B. A rule triggering system for automatic text-to-sign translation. Univ Access Inf Soc. 2016;15(4):487–98.
39. Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology-volume 1 (pp. 173-180). Association for Computational Linguistics.
40. Santorini, B. (1990). Part-of-speech tagging guidelines for the Penn Treebank Project.
41. De Marneffe, M. C., & Manning, C. D. (2008). Stanford typed dependencies manual (pp. 338-345). Technical report, Stanford University.
42. Hadla LS, Hailat TM, Al-Kabi MN. Evaluating Arabic to English machine translation. Int J Adv Comput Sci Appl (IJACSA). 2014;5(11):68–73.
43. Gonzàlez, M., Giménez, J., Màrquez, L.: A graphical interface for MT evaluation and error analysis. In: The 50th Annual Meeting of the Association for Computational Linguistics (2012).
44. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, pp. 311–318 (2002).
45. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of association for machine translation in the Americas, vol. 200 (2006).
46. Mahmud M, Kaiser MS, Hussain A, Vassanelli S. Applications of deep learning and reinforcement learning to biological data. IEEE Transactions in Neural Networks and Learning Systems. 2018;29(6):2063–79.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.