
ANALYSIS OF SANSKRIT TEXT: PARSING AND SEMANTIC
RELATIONS
Pawan Goyal, Vipul Arora, Laxmidhar Behera
Electrical Engineering, IIT Kanpur, 208016, UP, India
pawangee@iitk.ac.in, vipular@iitk.ac.in, lbehera@iitk.ac.in
Abstract

In this paper, we present our work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the 'utsarga apavaada' approach for relation analysis. A computational grammar based on the framework of Panini is being developed. A linguistic generalization for the verbal and nominal databases has been made, and the declensions are given the form of DFA. The verbal database for all the classes of verbs has been completed for this part. Given a Sanskrit text, the parser identifies the root words and gives the dependency relations based on semantic constraints. The proposed Sanskrit parser is able to create semantic nets for many classes of Sanskrit paragraphs. The parser takes care of both external and internal sandhi in the Sanskrit words.

1 INTRODUCTION

Parsing is the "de-linearization" of linguistic input; that is, the use of grammatical rules and other knowledge sources to determine the functions of words in the input sentence. Getting an efficient and unambiguous parse of natural languages has been a subject of wide interest in the field of artificial intelligence over the past 50 years. Instead of providing a substantial amount of information manually, there has been a shift towards using machine learning algorithms in every possible NLP task. Among the most important elements in this toolkit are state machines, formal rule systems, logic, as well as probability theory and other machine learning tools. These models, in turn, lend themselves to a small number of algorithms from well-known computational paradigms. Among the most important of these are state space search algorithms (Bonet, 2001) and dynamic programming algorithms (Ferro, 1998). The need for unambiguous representation has led to a great effort in stochastic parsing (Ivanov, 2000).

Most of the research work has been done for English sentences, but to transmit ideas with great precision and mathematical rigor, we need a language that incorporates the features of artificial intelligence. Briggs (Briggs, 1985) demonstrated in his article the salient features of the Sanskrit language that can make it serve as an artificial language. Although computational processing of the Sanskrit language has been reported in the literature (Huet, 2005) with some computational toolkits (Huet, 2002), and there is work going on towards developing a mathematical model and dependency grammar of Sanskrit (Huet, 2006), the proposed Sanskrit parser is being developed for using the Sanskrit language as an Indian networking language (INL). The utility of advanced techniques such as stochastic parsing and machine learning in designing a Sanskrit parser needs to be verified.

We have used deterministic finite automata for morphological analysis. We have identified the basic linguistic framework which shall facilitate the effective emergence of Sanskrit as INL. To achieve this goal, a computational grammar has been developed for the processing of the Sanskrit language. Sanskrit has a rich system of inflectional endings (vibhakti). The computational grammar described here takes the concept of vibhakti and
karaka relations from the Panini framework and uses them to get an efficient parse for Sanskrit text. The grammar is written in the 'utsarga apavaada' approach, i.e. rules are arranged in several layers, each layer forming the exception of the previous one. We are working towards encoding Paninian grammar to get a robust analysis of a Sanskrit sentence. The Paninian framework has been successfully applied to Indian languages for dependency grammars (Sangal, 1993), where constraint based parsing is used and the mapping between karaka and vibhakti is via a TAM (tense, aspect, modality) table. We have made rules from Panini grammar for the mapping. Also, finite state automata are used for the analysis instead of finite state transducers. The problem is that the Paninian grammar is generative and it is not straightforward to invert the grammar to get a Sanskrit analyzer, i.e. it is difficult to rely just on Panini sutras to build the analyzer. There will be a lot of ambiguities (due to options given in Panini sutras, as well as a single word having multiple analyses). We therefore need a hybrid scheme which should take some statistical methods for the analysis of a sentence. A probabilistic approach is currently not integrated within the parser since we don't have a Sanskrit corpus to work with, but we hope that in the very near future we will be able to apply the statistical methods.

The paper is arranged as follows. Section 2 explains in a nutshell the computational processing of any Sanskrit corpus. We have codified the nominal and verb forms in Sanskrit in a directly computable form. Our algorithm for processing these texts and preparing Sanskrit lexicon databases is presented in Section 3. The complete parser is described in Section 4. We have discussed there how we are going to do morphological analysis and hence relation analysis. Results are enumerated in Section 5. Discussion, conclusions and future work follow in Section 6.

2 A STANDARD METHOD FOR ANALYZING SANSKRIT TEXT

The basic framework for analyzing the Sanskrit corpus is discussed in this section. For every word in a given sentence, the machine is supposed to identify the word in the following structure: <Word> <Base> <Form> <Relation>. The structure contains the root word (<Base>), its form (<attributes of word>) and its relation with the verb/action or subject of that sentence. This analysis is done so as to completely disambiguate the meaning of the word in the context.

2.1 <Word>

Given a sentence, the parser identifies a single word and processes it using the guidelines laid out in this section. If it is a compound word, then the compound word with […] has to be undone. For example: […] = […] + […].

2.2 <Base>

The base is the original, uninflected form of the word. Finite verb forms, other simple words and compound words are each indicated differently. For simple words: the computer activates the DFA on the ISCII code (ISCII, 1999) of the Sanskrit text. For compound words: the computer shows the nesting of internal and external sandhi using nested parentheses. Undo sandhi changes between the component words.

2.3 <Form>

The <Form> of a word contains the information regarding declensions for nominals and state for verbs.

• For undeclined words, just write u in this column.

• For nouns, write first m, f or n to indicate the gender, followed by a number for the case (1 through 7, or 8 for vocative), and s, d or p to indicate singular, dual or plural.

• For adjectives and pronouns, write first a, followed by the indications, as for nouns, of gender (skipping this for pronouns unmarked for gender), case and number.

• For verbs, in one column indicate the class ([…]) and voice. Show the class by a number from 1 to 11. Follow this (in the same column) by '1' for parasmaipada, '2' for ātmanepada and '3' for ubhayapada. For finite verb forms, give the root. Then (in the same column) show the tense as given in Table 3. Then show the inflection in the same column, if there is one. For finite forms, show
Table 1: Codes for <Form>

  Code  Meaning
  pa/   passive
  ca/   causative
  de/   desiderative
  fr/   frequentative

Table 2: Codes for Finite Forms, showing the Person and the Number

  Code  Meaning
  1     […] (person)
  2     […] (person)
  3     […] (person)
  s     singular
  d     dual
  p     plural

Table 3: Codes for Finite Verb Forms, showing the Tense

  Code  Meaning
  pr    present
  if    imperfect
  iv    imperative
  op    optative
  ao    aorist
  pe    perfect
  fu    future
  f2    second future
  be    benedictive
  co    conditional

Table 4: Codes for <Relation>

  Code  Meaning
  v     main verb
  vs    subordinate verb
  s     subject (of the sentence or a subordinate clause)
  o     object (of a verb or preposition)
  g     destination (gati) of a verb of motion
  a     adjective
  n     noun modifying another in apposition
  d     predicate nominative
  m     other modifier
  p     preposition
  c     conjunction
  u     vocative, with no syntactic connection
  q     quoted sentence or phrase
  r     definition of a word or phrase (in a commentary)
the person and number with the codes given in Table 2. For participles, show the case and number as for nouns.

2.4 <Relation>

The relation between the different words in a sentence is worked out using the information obtained from the analysis done using the guidelines laid out in the previous subsections. First write down a period in this column followed by a number indicating the order of the word in the sentence. The words in each sentence should be numbered sequentially, even when a sentence ends before the end of a text or extends over more than one text. Then, in the same column, indicate the kind of connection the word has to the sentence, using the codes given in Table 4.

Then, in the same column, give the number of the other word in the sentence to which this word is connected as modifier or otherwise. The relation set given above is not exhaustive. All the 6 karakas are defined in relation to the verb.

3 ALGORITHM FOR SANSKRIT RULEBASE

In the sections to follow, we shall explain two of the procedures/algorithms that we have developed for the computational analysis of Sanskrit. Combined with these algorithms, we have arrived at the skeletal base upon which many different modules for Sanskrit linguistic analysis, such as relations, […], sandhi, can be worked out.

3.1 Sanskrit Rule Database

Every natural language must have a representation which is directly computable. To achieve this, we have encoded the grammatical rules and designed the syntactic structure for both the nominal and verbal words in Sanskrit. Let us illustrate this structure for both the nouns and the verbs with an example each.

Noun:- Any noun has three genders: Masculine, Feminine and Neuter. So also the noun has three numbers: Singular, Dual and Plural. Again there exist eight cases in each number: Nominative, Accusative, Instrumental, Dative, Ablative, Genitive, Locative and Vocative. Interestingly, these express nearly all the relations between words in a sentence.

In the Sanskrit language, every noun is inflected following a general rule based on the ending letter, such as […]. For example, […] is in class […], which ends with […] (a). Such classifications are given in Table 5. Each of these has different inflections depending upon which gender it corresponds to. Thus […] has different masculine and neuter declensions, […] has masculine and feminine declensions, […] has
masculine, feminine and neuter declensions. We have then encoded each of the declensions into ISCII code, so that it can be easily computed using the algorithm that we have developed for the linguistic analysis of any word.

Table 5: attributes of the declension for noun

  Attribute  Symbol  Codes    Labels
  Class      ∗       1 to 25  […]
  Case       η       1 to 8   […]
  Gender     ζ       1 to 3   […]
  Number     @       1 to 3   […]

Let us illustrate this structure for the noun […] with an example. Take […], masculine, nominative, singular declension. This is encoded in the following syntax:

(163{1∗, 1η, 1ζ, 1@}).

Where 163 is the ISCII code of the declension (Table 6). The four 1's in the curly brackets represent Class, Case, Gender and Number respectively (Table 5).

Table 6: Noun example

  Singular ([…]), Masculine
  Case        Endings       ISCII Code
  Nominative  ः (visarga)   163

Pronouns:- According to Paninian grammar and Kale (Kale), Sanskrit has 35 pronouns, which are: […]. We have classified each of these pronouns into 9 classes: Personal, Demonstrative, Relative, Indefinite, Correlative, Reciprocal and Possessive. Each of these pronouns has different inflectional forms arising from different declensions of the masculine and feminine form. We have codified the pronouns in a form similar to that of nouns.

Adjectives:- Adjectives are dealt with in the same manner as nouns. The repetition of the linguistic morphology is avoided.

Verbs:- A verb in a Sanskrit sentence "expresses an action that is enhanced by a set of auxiliaries"; these auxiliaries being the nominals that have been discussed previously.

The meaning of the verb is said to be both vyapara (action, activity, cause) and phala (fruit, result, effect). Syntactically, its meaning is invariably linked with the meaning of the verb "to do". In our analysis of verbs, we have found that they are classified into 11 classes ([…], Table 7). While coding the endings, each class is subdivided according to "it" knowledge into […], […] and […], each of which is again sub-classified into 3 sub-classes, […], […] and […], which we have denoted as pada. Each verb sub-class again has 10 lakaaras, which are used to express the tense of the action. Again, depending upon the form of the sentence, a division of form into […], […] and […] has been done. This classification has been referred to as voice. This structure has been explained in Table 7.

Table 7: attributes of the declension for verb

  Attribute  Symbol  Codes    Labels
  Class      ∗       1 to 11  […]
  "it"       γ       1 to 3   […]
  pada       η       1 to 3   […]
  Tense      ζ       1 to 10  […]
  Voice      λ       1 to 3   […]
  Person     @       1 to 3   […]
  Number     δ       1 to 3   […]
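The rule-entry syntax described in this section, an ISCII-coded ending followed by numeric attribute codes in curly brackets, can be mirrored in a few lines of Python. This is only a minimal sketch under stated assumptions: the names `RuleEntry`, `split_iscii`, `make_entry` and `render` are hypothetical and are not taken from the "Sanskrit Database Maker"; only the attribute order (Table 5 for nouns, Table 7 for verbs) and the noun example come from the text.

```python
from dataclasses import dataclass

# Attribute order taken from the text: Table 5 for nouns, Table 7 for verbs.
NOUN_FIELDS = ("class", "case", "gender", "number")
VERB_FIELDS = ("class", "it", "pada", "tense", "voice", "person", "number")


@dataclass(frozen=True)
class RuleEntry:
    """One rule-base entry (hypothetical container, not the authors' format)."""
    ending: tuple      # ISCII codes of the ending, e.g. (163,)
    attributes: tuple  # (name, code) pairs in table order


def split_iscii(digits):
    """Split a digit string such as '219194' into three-digit ISCII codes."""
    if len(digits) % 3 != 0:
        raise ValueError("ISCII codes 161..234 are three digits each")
    codes = tuple(int(digits[i:i + 3]) for i in range(0, len(digits), 3))
    if any(not 161 <= c <= 234 for c in codes):
        raise ValueError("code outside the ISCII range 161..234")
    return codes


def make_entry(digits, codes, fields):
    """Pair each numeric attribute code with its field name."""
    if len(codes) != len(fields):
        raise ValueError("expected %d attribute codes" % len(fields))
    return RuleEntry(split_iscii(digits), tuple(zip(fields, codes)))


def render(entry):
    """Write the entry back in the '(ending{codes})' style used in the text."""
    ending = "".join(str(code) for code in entry.ending)
    codes = ", ".join(str(code) for _, code in entry.attributes)
    return "(" + ending + "{" + codes + "})"


# The noun example from the text: ending 163, all four attributes coded 1.
noun = make_entry("163", (1, 1, 1, 1), NOUN_FIELDS)
print(render(noun))
print(dict(noun.attributes))
```

Keeping the attribute names next to the numeric codes makes an entry self-describing, at the cost of a slightly larger record than the bare tuple of numbers used in the paper's notation.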

Let us express the structure via an example for […], […], Present Tense, First person, Singular. This is encoded in the following syntax:

(219(194{1∗, 1γ, 2η, 1ζ, 1λ, 1@, 1δ})).

Where 219194 is the ISCII code of the endings (Table 8). The numbers in curly brackets represent class, "it", pada, tense, voice, person and number respectively (Table 7).

Table 8: Verb example

  PRESENT, Singular ([…])
  Person  Endings  ISCII Code
  First   […]      219194

Separate database files for nominals and verbs have been maintained, which can be populated as more and more Sanskrit corpora are mined for data. The Sanskrit rule base is prepared using the "Sanskrit Database Maker" developed during this work.

3.2 Deterministic Finite Automata: Sanskrit Rule Base

We have used deterministic finite automata (DFA) (Hopcroft, 2002) to compute the Sanskrit rule base, which we developed as described in Section 3.1. Before we explain the DFA, let us define it. A deterministic finite automaton consists of:

1. A finite set of states, often denoted Q.

2. A finite set of input symbols, often denoted Σ.

3. A transition function that takes as arguments a state and an input symbol and returns a state, commonly denoted δ.

4. A start state, one of the states in Q, denoted q0.

5. A set of final or accepting states F. The set F is a subset of Q.

Thus, we can define a DFA in this "five-tuple" notation: A = (Q, Σ, δ, q0, F). With this short discussion of the DFA, we shall proceed to the DFA structure for our Sanskrit Rule Base. Since we are representing any word by ISCII codes that range from 161 to 234, we have effectively 74 input symbols. In the notation given below, we represent the character set by {C0, C1, ..., C73}, where Ci is the character corresponding to the ISCII code 161 + i. Thus, if we define a DFA M = (Q, Σ, δ, q0, F) for our Sanskrit Rule Database, the DFA entities are as follows:

• Q = {q0, qC0, qC1, ..., qC73} × {0, 1}. 0 represents that the state is not a final state and 1 that the state is a final state.

• Σ = {C0, C1, ..., C73}

• δ((qx, a), Y) = δ(qY, a) or δ(qY, b), where a, b ∈ {0, 1}

• q0 = <q0, 0>

• F ⊂ {qC0, qC1, ..., qC73} × {1}

In this work, we have made our DFA in a matrix form, with each row representing the behavior of a particular state. In a given row, there are 74 columns, and the entry in a particular column of the row stores the state we will move to on receiving the input corresponding to that column. In addition, each row carries the information whether or not it is a final state.

For example, D[32][5] = 36 conveys that in the DFA matrix D[i][j], in the 32nd state, if the input is C5, we will move to state no. 36. (Note that C5 is the character corresponding to the ISCII code 166.)

In the graph below, we give an example of how the DFA will look as a tree structure. The particular graph is constructed for the verb declensions for the class […]. The pada is […] and the tense is present tense. The search in this DFA will be as follows: if the first ending of the input corresponds to one of the states 163, 195 or 219, we will move ahead in the DFA; otherwise the input is not found in this tree. On getting a match, the search will continue in the matched branch.

In general, the search in the DFA is done as follows (we take the example of searching for […] in the DFA tree constructed above):

• Firstly, an input word is given as the input to the user interface in Devanagari format.

• The word is changed to its equivalent ISCII code (203212195163 in this case).

• The automaton reads the forms in the reverse order to lemmatize them. In our DFA, we
give one by one the last three digits of the ISCII code till the matching is there.

  – Start state: 000
  – Input to DFA: 163, i.e. character C2.
  – In the DFA matrix we check the entry D[0][2]. If it is zero, there is no match for this entry and hence no match for the word either. Else we move to the state specified by the entry.
  – In this case, we get the entry corresponding to state "163". That means it is either an intermediate state or a final state. From the graph, it is visible that the tree accepts 163 just after the start state. Also, it is not a final state. Now we have 195 (i.e. C34) as the next input, and the 34th column of the row corresponding to state 232 will be checked, and the search continues till there is no match.
  – Final match will be 163195.

• The final match is checked for being eligible as a final state, which is true in this case. We can verify it from the graph given.

• The remaining part of the word is sent to the database engine of the program to verify it and to get the attributes. The word corresponding to the stem, the Devanagari equivalent of 203212 (that is, […]), will be sent to the database.

• If both criteria are fulfilled (final state match and stem match through the database), we get the root word and its category (verb in this case). The attributes such as tense, form, voice, class, pada, person and number are coded in the final state itself according to the notations given in Tables 5 and 7. All the possible attributes are stored, and it is left to the final algorithm to come up with the most appropriate solution.

[Figure 1: DFA tree obtained for […] present tense. The tree runs from START to FINAL; the states recoverable from the drawing include 163, 195, 219, 163204, 163204218, 163212, 163212218, 219194, 219194232, 219194232198, 219204, 219204218 and 219215.]

Let us explain how we have obtained the deterministic finite automata. Clearly, the states are obtained via input symbols. Ambiguity remains in {0, 1}. If the state is not a final state at all, it is declared as an intermediate state without any ambiguity to be considered for non-determinism. When the state is a final state, for example consider […] and […]. When we encounter […] in […], we get the root as […]. (Of course, we have to add the […] and go through […], getting […] as root verb.) But in […], […] is not a final state. It seems at this point that we could have obtained a non-deterministic finite automaton. We have resolved the problem by accepting the following facts:

1. A final state can be an intermediate state too, but not the other way round.

2. Our algorithm doesn't stop just as it gets to a final state; it goes to the highest possible match, checks it for being a final state and, in case it isn't, backtracks and stops at the optimal match which satisfies the two criteria told in the algorithm (final state match and stem match through the database).

There might be another ambiguity too; for example, in […], […] is a final state but it refers to […], […] and […] karaka. This seems to be non-deterministic. We have avoided this problem by suitably defining the states. A final state represents all possibilities merged in a single state. It is up to the algorithm to come up with the unique solution.
There could be situations where the longest match is not the right assignment. To deal with this, all other possible solutions are also stacked and are substituted (if needed) when we go for relation analysis. For example, let us take the word […]. We assume that […], […] and […] are valid root words. Our algorithm will choose […] as root word along with the attributes (3 possibilities here). But the other solutions are also stacked in decreasing order of the match found. How we deal with this situation is discussed in the relation analysis.

4 ALGORITHM FOR SANSKRIT PARSER

The parser takes as input a Sanskrit sentence and, using the Sanskrit rule base from the DFA analyzer, analyzes each word of the sentence and returns the base form of each word along with its attributes. This information is analyzed to get relations among the words in the sentence using If-Then rules, and a complete dependency parse is then output. The parser incorporates the Panini framework of dependency structure. Due to the rich case endings of Sanskrit words, we are using a morphological analyzer. To demonstrate the morphological analyzer that we have designed for subsequent Sanskrit sentence parsing, the following resources are built:

• Nominals rule database (contains entries for noun and pronoun declensions)

• Verb rule database (contains entries for 10 classes of verbs)

• Particle database (contains word entries)

Now, using these resources, the morphological analyzer, which parses the complete sentences of the text, is designed.

4.1 Morphological Analysis

In this step, the Sanskrit sentence is taken as input in Devanagari format and converted into ISCII format. Each word is then analyzed using the DFA tree that is returned by the above block. Following any path from start to final of this DFA tree returns the root word of the word that we wish to analyze, along with its attributes. While evaluating the Sanskrit words in the sentence, we have followed these steps for computation:

1. First, a left-right parsing to separate out the words in the sentence is done.

2. Second, each word is checked against the Sanskrit rule base represented by the DFA trees in the following precedence order: each word is checked first against the avyaya database, next in the pronoun tree, then the verb tree and lastly in the noun tree.

The reason for such a precedence ordering is primarily that avyaya and pronouns are limited in number compared to the verbs, and verbs are in turn limited compared to the infinite number of nouns that exist in Sanskrit.

4.1.1 Sandhi Module

In the analysis we have done, the main problem was with words having external sandhi. Unless we are able to decompose the word into its constituents, we are unable to get the morph of the word. So, a rule-based sandhi analyzer is developed which works on the following principles.

• Given an input word, it checks at each junction for the possibility of sandhi.

• If it finds the junction, it breaks the word into possible parts and sends the first part to the DFA.

  – If it finds a match, it sends the second part to the DFA.
    ∗ If no match, it recursively calls the sandhi module (for the possibility of multiple sandhi in a single word).
    ∗ If a match is found, it terminates and returns the words.
  – If no match, it goes to the next junction.

The rules for decomposing the words are taken from Panini grammar. The search proceeds entirely backwards on the syllabic string. Emphasis is given to the minimum possible breaks of the string, avoiding overgeneration.

Panini grammar has separate sections for vowel sandhi as well as consonant sandhi. Also, there is a specification of visarga sandhi. Below, we describe the simplified rules for undoing sandhi.

Vowel Sandhi:- We have considered […], […], […], […] and […] in vowels. ([…], […] and […] are not taken into account yet.)

1. […]:- If the junction corresponds to […], […], […] or […], it is a candidate

¦E| §ª . The algorithm for an example Consonant Sandhi:- For dealing with consonant
word ~ !
8. is explained. sandhi, we have defined some groups taking clue
4 z )
from panini grammar such as , , , ,
• We assume that we don’t get any match
at the junction after .
# ~ each of which have 5 consonants which are similar
• The junction ²
is a candidate for

¦E| in the sense of place of pronunciation. Also, there
is a specific significance of first, second, third etc.
§ª 8
.
. So the following breaks are made:

1. +~ 1, 2. + ~ , 3. ²
8. letter of a specific string. The following ruleset is
~ :Z 1
.
+ , 4. ~ : Z ²
.
+ . For each
made:

break, the left hand word is first sent to • Define string s1, with first five entries of
DFA and only if it is a valid word, right
,
and 6th entry as . Also, define s2, with first
word will be sent. In this case, first so-

five entries of and 6th entry as . The rule
lution comes to be the correct one. says,
2.
y!¨ ©´§
:- In this case, the junction is x , µ .
The junction is a + halanta + c, and the
breakup will be b + halanta and c, where
The corresponding break-ups are: a, c s1, b s2 and the position of a and b are
• x x x
:- ( or ) + ( or ). same in the respective strings.
68`,?º {
0µ ¶ µ For example, in the word , the junc-
• :- ( or ) + ( or ). ,
tion is + halanta + . The break-up will
,
, 68
, º {
The algorithm remains the same as told in be, +halanta and . Hence we get +
previous case. .
3.
& §ª :- In this case, the junction is z»
x , 0¶ , #6o , £ . The corresponding break-
• Define string s1, with first five entries of
¤
and 6th entry as . Also, define s2, with first
ups are:
five entries of and 6th entry as . The rule

• x :- ( or ) + (9 or ± ). says,
•
0¶ :- ( or ) + (1 or ² ). The junction is a + halanta + c, and the
•
6o :- ( or ) + (· or ³ ). breakup will be b + halanta and c, where

• £ :- ( or ) + (¸ or ¹ ).
a, c s1, b s2 and the position of a and b are
same in the respective strings.
The algorithm follows the same guidelines. For example, in the word ¼¾½
, the junction
.& ¯§ ½
is + halanta + . ½ ½
is the third character

4. :- In this case, the junction is a ha-

lanta followed by , , , . The corre-
. y 6 £ of string s1. The break-up will be, +halanta
½
and . Hence we get +

½ .
sponding break-ups are: ¦¶?, and #¦¶?,
•
.
halanta + :- ( or ) + . 9 ± • We have defined strings as
#¦¶, containing first two characters of
y 1
halanta + :- ( or ) + . ²
with
4 z , , , ) as well
• all the five strings ,
•
6 ¸
halanta + :- ( or ) + . ¹ ¤ ,
as , , .
¦¶?, contains all other conso-
• £
halanta + :- ( or ) + . · ³ nants and all the vowels. The rule says, if we
get a junction with a + halanta + c, where
The algorithm follows the same guidelines.
a,c
¦¶?,
, a will be changed to corresponding
#.
§ :- In this case, the junction is #¦¶?, while undoing the sandhi. Similarly,
5.
#. , #. , #yj , #yj followed by any other rules are made.
vowel. The corresponding break-ups are:
• The vowels are categorized into ¿ y and
¦E|
. y
• + vowel:-
is retained.)
x + vowel. (same vowel
and

¦E| ¿
categories.
contains ,
± ² 9 ³ , .1 If, the
contains ,
, ,
·
." where a
• + vowel:-
yj x0¶ + vowel. junction is a + + halanta + ,
¿ y
+ halanta
, the break-up will be: a +
• + vowel:-
yj
+ vowel.
0µ + vowel. and φ, where φ denotes null, i.e.

other is
+ vowel:- À67Á. breaks up
68Á.
•
removed. For example,
The algorithm follows the same guidelines. into and .
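The control flow of the sandhi module of Section 4.1.1 (try each junction from the end of the string, validate the left part first, then the right part, and recurse on the right part when a single break is not enough) can be sketched as follows. This is a toy illustration under loud assumptions: it works on Latin transliteration rather than ISCII, the rule table `RULES` holds only the two classical vowel rules a + i → e and a + u → o rather than the paper's rule base, and `is_word` is a hypothetical stand-in for the DFA and database lookup.

```python
# Toy reverse-sandhi rules: surface junction -> (left ending, right beginning).
# Only two classical vowel rules are listed; the real rule base is far larger.
RULES = {
    "e": [("a", "i")],
    "o": [("a", "u")],
}


def split_sandhi(word, is_word):
    """Return a list of valid parts for `word`, or None if no split is found.

    `is_word` stands in for the DFA-plus-database check of the paper.
    """
    if is_word(word):
        return [word]
    # The search proceeds backwards over the string, junction by junction.
    for pos in range(len(word) - 1, 0, -1):
        for left_end, right_start in RULES.get(word[pos], []):
            left = word[:pos] + left_end
            right = right_start + word[pos + 1:]
            if not is_word(left):
                continue                      # first part failed: next junction
            if is_word(right):
                return [left, right]          # both parts valid: terminate
            rest = split_sandhi(right, is_word)   # multiple sandhi in one word
            if rest is not None:
                return [left] + rest
    return None


lexicon = {"deva", "indra", "surya", "udaya"}  # hypothetical word list
print(split_sandhi("devendra", lexicon.__contains__))
print(split_sandhi("devendrodaya", lexicon.__contains__))
```

The sketch returns the first decomposition it finds; a fuller version would collect all candidates and prefer the one with the fewest breaks, in line with the emphasis on minimum possible breaks.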
Visarga Sandhi:- We have looked at visarga sandhi in a single word. The rules made are as follows:

• The junction is […] + halanta + a […]. The break-up will be visarga and a.
• The junction is […] + halanta + a […]. The break-up will be visarga and a.
• The junction is […] + halanta + a […]. The break-up will be visarga and a.
• The junction is […] + halanta + a consonant. The break-up will be visarga and a.
• The junction is […] + halanta + a vowel. The break-up will be visarga and a.

4.2 Relation Analysis

With the root words and the attributes for each word in hand from the previous step, we shall now endeavor to compute the relations among the words in the sentence. Using these relation values we can determine the structure of each of the sentences and thus derive the semantic net, which is the ultimate representation of the meaning of the sentence.

For computing the relations, we have employed a case-based approach, i.e. nominals were classified as subject, object, instrument, recipient (beneficiary), point of separation (apaadaana) and location with respect to the verb, based on the value of the case attribute of the word, as explained under the noun example in Section 3.1.

The Sanskrit language has a dependency grammar. Hence the karaka based approach is used to obtain a dependency parse tree. There are reasons for going for a dependency parse:

1. Sanskrit is a free phrase order language. Hence, we need the same parse for a sentence irrespective of phrase order.

2. Once the karaka relations are obtained, it is very easy to get the actual thematic roles of the words in the sentence.

The problem comes when we have many possible karakas for a given word. We need to disambiguate between them. We have developed some If-Then rules for classifying the nouns, pronouns, verbs, sub-verbs and adjectives in the sentence. The rules are as follows. First we look at the sentences having at least one main verb. Nominal sentences are to be dealt with in a similar manner, but the description will be given later.

1. If there is a single verb in the sentence, declare it as the main verb.

2. If there is more than one verb,
   (a) The verbs having suffix […], […] or […] are declared subverbs of the nearest verb in the sentence having no such affix.
   (b) All other verbs are main verbs of the sentence, and relations for all other words are given with regard to the first main verb.

3. For the nouns and pronouns, one state may have many possibilities of the cases. These ambiguities are to be resolved. The hand written rules for resolving these ambiguities are as follows. (Rules are written for nouns. Adjectives precede nouns, though not always, due to the free word order, and hence get the same case as nouns. For pronouns, the rules are the same as those for nouns.)
   (a) Nominative case: The assumption is that there is only one main subject in an active voice sentence. We proceed as follows:
       • All the nouns having nominative case as one of the attributes are listed. (For example, […] has both possibilities of being nominative or accusative case.)
       • All those connected by […] are grouped together and others are kept separate. We now match each group along the following lines:
         – The number matches with that of the verb (singular/dual/plural).
         – The root word matches with the person of the verb (i.e. root word "[…]" for 3rd person, "[…]" for 2nd person).
         If ambiguity still remains, the one having masculine/feminine as gender is preferred for being in […] karaka and declared as subject of the main verb.
       In passive voice,
• Nominative case is related to main as ablative (Noun with genitive case
verb as an object. After grouping marker is not followed by a verb.)
and going through the match, the • If suffixes
H68)j )j
, are used in the
noun is declared as object of main sentence, the noun is declared as ab-
verb. lative.
(b) Accusative case: Assuming that the dis- • If
Ã .
,
® ¤
are following the
ambiguation for nominative case works noun, it is declared as genitive.
well, there is no disambiguation left for • Finally, we look for the disam-
this case. All those left with accusative biguating verbs as done in previous
case indeed belong to that. The noun is case.
declared as object to nearest sub-verb or (f) Genitive case: The ambiguity is there in
main verb. dual with respec to locative case. We
(c) Instrumental case: If the sentence is in have used that by default, it will be
passive voice, the noun is declared as genitive since we have not encountered
subject of the main verb. any noun with locative case and dual in
For active voice, ambiguity remains if number.
the number is dual. The folowing rules
(g) Locative case: The ambiguities are al-
are used:
ready resolved.
• We seek if the indeclinable such as
¢ 45 |`"
, , , follow the Only problematic case will be the situation
noun. In that case, noun is declared discussed in section 3.2 with the example of
as instrument. H:.
. If the algorithm is able to generate a
• If the noun is preceded by time or parse taking the longest possible match, we will
distance measure or is itself one of not go into stacked possibilities, but if the subject
these, it is declared as instrument. disagrres with the verb (blocking), or some other
For example ."
8. mismatch is found, we will have to go for stacked
6A¶ Å{ ½ {

, here ."
is the possibilities.
disambiguating feature. Thus, we have got the case markings. Relation for
• If Æ? 6 { 4$:& {
, are following noun, nominative and accusative case markings have al-
the noun is declared as instrumental. ready been defined. For other case markings,
(d) Dative case: For dative case, disam-
biguity is with respect to ablative case • Instrumental: related as an instrument to
main verb in certain cases (taken from
in terms of dual and plural nnumbers #ÊË- ..`
).
The disambiguating feature used here is
main verb. That is, there are certain • Dative: related as recipient to main verb in
verbs which prefer dative case and cer- certain cases, but also denotes the purpose.
tain verbs prefer ablative. For example:
• The verbs preferring dative case are • Ablative: related as separation point.
Ç
È ¢ ± j.| . )!® ¢
, , , , etc.
• The verbs preferring ablative case • Genitive: this is not considered as karaka
are ½ 0É y67 '
?y 6
, , , etc. since karaka has been defined as one which
Initially, we have populated the list us- takes role in getting the action done. Hence it
ing
#Ê- ..`
knowledge as well as is related to the word following it.
some grammar books but this has to be
• Locative: related as location to the main verb.
done statistically using corpus analysis.
(e) Ablative case: The ambiguity here is for Still, we have not given any relation to adjectives
certain nouns with the genitive case in and adverbs. For each adjective, we track the
singular person. The ambiguity resolu- noun it belongs to and give it the same attributes.
tion proceeds along the following lines: It is defined as adjective to the noun. The adverbs
• If the noun having ambiguity has are related to the verb it belongs as adverb.
a verb next to it, it will be taken
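The nominative rule (a) and the case-to-relation assignment can be illustrated with a minimal sketch. It is not our implementation: the class names, field names and the transliterated example (raamah vanam gacchati, "Rama goes to the forest") are assumptions made only for illustration. Grouping by conjunction, the genitive, and the dual-number ambiguities of rules (c) to (f) are left out.

```python
from dataclasses import dataclass

# Hypothetical feature bundles; all names here are illustrative assumptions.
@dataclass(frozen=True)
class Noun:
    form: str
    cases: frozenset   # case readings left open by morphological analysis
    number: str        # "singular", "dual" or "plural"
    person: int        # 3 for ordinary nouns; 1 or 2 for some pronouns
    gender: str        # "masculine", "feminine" or "neuter"

@dataclass(frozen=True)
class Verb:
    root: str
    number: str
    person: int
    voice: str = "active"

# Relation to the main verb for the remaining case markings.
CASE_TO_RELATION = {
    "instrumental": "instrument",
    "dative": "recipient",
    "ablative": "separation point",
    "locative": "location",
}

def pick_nominative(nouns, verb):
    """Rule (a): choose the single noun read as nominative, or None."""
    agreeing = [n for n in nouns
                if "nominative" in n.cases
                and n.number == verb.number
                and n.person == verb.person]
    if len(agreeing) > 1:
        # Tie-break: prefer masculine/feminine over neuter.
        agreeing = [n for n in agreeing if n.gender != "neuter"] or agreeing
    return agreeing[0] if agreeing else None

def assign_relations(nouns, verb):
    """Return (noun form, relation, verb root) triples for one clause."""
    chosen = pick_nominative(nouns, verb)
    passive = verb.voice == "passive"
    triples = []
    for noun in nouns:
        if noun is chosen:
            # Nominative: subject in active voice, object in passive voice.
            relation = "object" if passive else "subject"
        elif "accusative" in noun.cases:
            relation = "object"
        elif passive and "instrumental" in noun.cases:
            relation = "subject"
        else:
            relation = next((CASE_TO_RELATION[c] for c in sorted(noun.cases)
                             if c in CASE_TO_RELATION), "unresolved")
        triples.append((noun.form, relation, verb.root))
    return triples

raamah = Noun("raamah", frozenset({"nominative"}), "singular", 3, "masculine")
vanam = Noun("vanam", frozenset({"nominative", "accusative"}), "singular", 3, "neuter")
gacchati = Verb("gam", "singular", 3)
print(assign_relations([raamah, vanam], gacchati))
```

In this example both nouns agree with the verb in number and person, so the gender preference decides: raamah becomes the subject and vanam, left with its accusative reading, becomes the object.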
Based on these relations, we can obtain a semantic net for the sentence with the verb as the root node; the links between the nodes correspond to the relations with the verb and the interrelations obtained.

Sanskrit has a large number of sentences which are said to be nominal sentences, i.e. they do not take a verb. In Sanskrit, every simple sentence has a subject and a predicate. If the predicate is not a finite verb form, but a substantive agreeing with the subject, the sentence is a nominal sentence. In that case, the analysis that we have done above does not seem to be usable as it is. But in Sanskrit, there is a notion called [Devanagari], that is, if one of the verb or subject is present, the other is obtained to a certain degree of definiteness. Take for example the sentence [Devanagari]. If instead of saying the full sentence, I say [Devanagari], then [Devanagari] is determined as verb. Similarly, if I say [Devanagari], the subject [Devanagari] is determined. [Devanagari] is a kind of appositive expression to the inflectional ending of the verb [Devanagari]. We have used this concept for analyzing the nominal sentences. That is, the verb is determined from the subject. Mostly, only the forms of [Devanagari] are used and relations are defined with respect to that. Although the analysis done is not exhaustive, a ruleset has been built to deal with them. Most of the time, relations in a nominal sentence are indicated by pronouns, adjectives and genitives. For example, in the sentence [Devanagari], there is [Devanagari] of the verb [Devanagari] in the sentence by the subject [Devanagari]. Hence [Devanagari] is related to the verb as subject. [Devanagari] is a pronoun referring to [Devanagari] and [Devanagari] is an adjective referring to [Devanagari]. Similarly, [Devanagari]. In this sentence, [Devanagari] is a pronoun referring to [Devanagari] and [Devanagari] is a genitive to [Devanagari]. Here again, there will be [Devanagari] of the verb and [Devanagari] will be related to the verb as subject.

5 RESULTS

5.1 Databases Developed

The following Sanskrit Rule Databases have been developed during the project:

• Nominals ([Devanagari]) rule database, which contains entries for noun and pronoun declensions along with their attributes.
• Verb ([Devanagari]) rule database, which contains entries for the 10 classes of verbs along with their tenses.
• Particle ([Devanagari]) database.

Along with these databases, we have developed some user interfaces (GUI) to extract information from them. For example, if we want to get the forms of a particular verb in a particular tense, we can just open this GUI and give the root word and tense information.

5.2 Parser Outputs

Currently, our parser gives an efficient and accurate parse of Sanskrit text. Samples of four of the paragraphs which have been correctly parsed are given below, along with a snapshot of the parser output for one sentence per paragraph.

[Sample paragraph 1, Devanagari text]

Figure 2: Parser output for [Devanagari sentence]

[Sample paragraph 2, Devanagari text]

Figure 3: Parser output for [Devanagari sentence]

[Sample paragraph 3, Devanagari text]

Figure 4: Parser output for [Devanagari sentence]

[Sample paragraph 4, Devanagari text]

Figure 5: Parser output for [Devanagari sentence]
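A semantic net of the kind described earlier, with the verb as the root node and one labelled link per relation, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the (dependent, relation, head) triple format and the transliterated words are hypothetical and are not taken from our parser.

```python
from collections import defaultdict

def build_semantic_net(triples):
    """Map each head word to its outgoing (relation, dependent) links."""
    net = defaultdict(list)
    for dependent, relation, head in triples:
        net[head].append((relation, dependent))
    return dict(net)

def render(node, net, label=None, depth=0):
    """Return the net as indented lines, starting from the given node.

    Assumes the relations form a tree (no cycles).
    """
    text = node if label is None else f"{label}: {node}"
    lines = ["  " * depth + text]
    for relation, dependent in net.get(node, []):
        lines.extend(render(dependent, net, relation, depth + 1))
    return lines

# Hypothetical relation triples (dependent, relation, head) for a
# transliterated example; "gam" is the verb root and acts as the root node.
triples = [
    ("raamah", "subject", "gam"),
    ("vanam", "object", "gam"),
    ("sundaram", "adjective", "vanam"),
]
net = build_semantic_net(triples)
print("\n".join(render("gam", net)))
```

Nouns hang directly off the verb through their karaka relation, while an adjective hangs off the noun it qualifies, so the same structure also carries the interrelations between non-verb words.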
The parse results pave the way for representing the sentence in the form of a Semantic Net. We give here the semantic net for the parse output given in Figure 2. The Semantic Net is shown in Figure 6.

[Semantic net diagram: Devanagari word nodes joined by links labelled subject, object, location, adverb and adjective]

Figure 6: Semantic net representation of the sentence [Devanagari sentence]

6 CONCLUSIONS AND FUTURE WORK

Our parser has three parts. The first part takes care of the morphology. For each word in the input sentence, a dictionary or a lexicon is looked up, and the associated grammatical information is retrieved. One of the criteria for judging a morphological analyzer is its speed. We have made a linguistic generalization and declensions are given the form of DFA, thereby increasing the speed of the parser.

The second part of the parser deals with making "Local Word Groups". As noted by Patanjali, any practical and comprehensive grammar should be written in the 'utsarga apavaada' approach. In this approach rules are arranged in several layers, each forming an exception to the previous layer. We have used the 'utsarga apavaada' approach such that conflicts are potentially taken care of by declaring exceptions. Finally, words are grouped together, yielding a complete parse. The significant aspect of our approach is that we do not try to get the full semantics immediately; rather, it is extracted in stages depending on when it is most appropriate to do so. The results we have got are quite encouraging and we hope to analyze any Sanskrit text unambiguously.

To this end, we have successfully demonstrated the parsing of a Sanskrit corpus employing the techniques designed and developed in Sections 2 and 3. Our analysis of Sanskrit sentences, in the form of morphological analysis and relation analysis, is based on sentences such as those shown in the four paragraphs in the previous section. The algorithm for analyzing compound words is tested separately. Hence future work in this direction includes parsing of compound sentences and incorporating stochastic parsing. We need to take into account the [Devanagari] as well. We are trying to come up with a good enough lexicon so that we can work in the direction of [Devanagari] in Sanskrit sentences. Also, we are working on giving all the rules of Panini the shape of multiple layers. In fact, many of the rules are unimplementable because they deal with intentions, desires etc. For that, we need to build an ontology schema.

The Sandhi analysis is not complete and some exceptional rules are not coded. Also, not all the derivational morphology is taken care of. We have left out many [Devanagari]. The reason for not incorporating the [Devanagari] is that it is difficult to come up with a general DFA tree for any of the [Devanagari] because of the large number of rules applicable. For that, we need to encode the Panini grammar first.

Acknowledgment

We humbly acknowledge our gratitude to revered Aacharya Sanskritananda Hari, founder and director of Kaushalya pitham Gurukulam, Vadodara for educating us in all aspects of Sanskrit language.

References

Blai Bonet and Héctor Geffner. 2001. Planning as heuristic search. Artificial Intelligence 129.

Ferro, M.V., Souto, D.C., Pardo, M.A.A. 1998. Dynamic programming as frame for efficient parsing. Computer science, 1998.

Ivanov, Y.A., Bobick, A.F. 2000. Recognition of visual activities and interactions by stochastic parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, Issue 8, Aug. 2000, pp. 852-872.

Briggs, Rick. 1985. Knowledge Representation in Sanskrit and Artificial Intelligence. The AI Magazine, pp. 33-39.

G. Huet. 2002. The Zen Computational Linguistics Toolkit. ESSLLI 2002 Lectures, Trento, Italy.

G. Huet. 2005. A Functional Toolkit for Morphological and Phonological Processing, Application to a Sanskrit Tagger. Journal of Functional Programming 15 (4), pp. 573-614.

G. Huet. 2006. Shallow syntax analysis in Sanskrit guided by semantic nets constraints. International Workshop on Research Issues in Digital Libraries, Kolkata. Proceedings to appear as Springer-Verlag LNCS, 2007.

Bureau of Indian Standards. 1999. ISCII: Indian Script Code for Information Interchange. ISCII-91.

Akshar Bharati and Rajeev Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework. ACL93: Proc. of Annual Meeting of Association for Computational Linguistics. Association for Computational Linguistics, New Jersey, 1993a, pp. 105-111.

Kale, M.R. A Higher Sanskrit Grammar. 4th Ed, Motilal Banarasidass Publishers Pvt. Ltd.

Hopcroft, John E., Motwani, Rajeev, Ullman, Jeffrey D. 2002. Introduction to Automata Theory, Languages and Computation. 2nd Ed, Pearson Education Pvt. Ltd., 2002.
