A Grammar Writer's Cookbook
Copyright
© 1999
CSLI Publications
Center for the Study of Language and Information
Leland Stanford Junior University
Printed in the United States
17 16 15 14 13    2 3 4 5
ISBN 1-57586-171-2
ISBN 1-57586-170-4 (pbk.)
∞ The acid-free paper used in this book meets the minimum requirements
of the American National Standard for Information Sciences–Permanence of
Paper for Printed Library Materials, ANSI Z39.48-1984.
Contents
Abbreviations xiii
1 Introduction 1
1.1 Parallel Grammars 1
1.2 Overview of LFG 3
1.3 Levels of Representation 6
1.4 Implementation and Environment 13
2 The Clause 19
2.1 Root Clauses 19
2.1.1 Declaratives 19
2.1.2 Interrogatives 25
2.1.3 Imperatives 29
2.2 Embedded Clauses 31
2.2.1 Subcategorized Declaratives 32
2.2.2 Subcategorized Interrogatives 36
2.3 Clausal Adjuncts 38
2.3.1 Infinitival Adjuncts 38
2.3.2 Participial 39
2.3.3 Finite 40
2.4 What about X′ Theory? 42
3 Verbal Elements 45
3.1 Subcategorization 45
3.2 Nonverbal Subcategorization 47
3.3 Types of Grammatical Functions 48
3.3.1 Subjects 48
3.3.2 Objects 50
3.3.3 Secondary Objects (OBJ2) 51
3.3.4 Obliques 52
3.3.5 XCOMP and COMP 53
3.3.6 Adjuncts 58
3.4 Altering Subcategorization Frames 61
3.5 Auxiliaries 63
3.5.1 Brief Introduction to the Auxiliary Systems 64
3.5.2 Previous Analyses 65
3.5.3 Flat F-structure Analysis 66
3.5.4 Morphosyntactic Structure 68
3.5.5 The Treatment of Tense/Aspect 69
3.6 Modals 69
3.7 Particle Verbs 71
3.8 Predicatives 73
3.8.1 Controlled Subject Analysis 74
3.8.2 Predlink Analysis 74
3.9 Bridge Verbs 75
3.10 Verbal Complexes 76
3.10.1 German Coherent Verbs 76
3.10.2 French Causatives 76
3.10.3 Noun-Verb Constructions 77
4 Nominal Elements 79
4.1 Pronouns 79
4.1.1 Personal and Demonstrative Pronouns 79
4.1.2 Interrogative and Relative Pronouns 80
4.1.3 Expletive Pronouns 81
4.1.4 Reflexives 82
4.1.5 Clitics 85
4.2 Full Noun Phrases 87
4.2.1 English 88
4.2.2 German 88
4.2.3 French 90
4.2.4 F-structure 91
4.3 Compounds and N-N Sequences 94
4.4 Relative Clauses 97
4.4.1 Bound Relatives 98
4.4.2 Free Relatives 99
4.5 NPs without a Head Noun 102
4.5.1 Nominalized Adjectives 102
4.5.2 Headless NPs 103
4.6 The NP Squish 104
4.6.1 Gerunds 104
4.6.2 Sentential Subjects 105
8 Coordination 145
8.1 Basic Approach 145
8.2 Same Category Coordination 148
8.2.1 General Schema 148
8.2.2 Special Rules for Clauses 148
8.3 NP Coordination 149
8.3.1 Basic Structure 150
8.3.2 Agreement 151
8.4 Problems 152
14 Performance 203
14.1 Robustness 203
14.1.1 Extraction of Subcategorization Frames 205
14.1.2 Statistical Methods and Chunk Parsing 205
14.1.3 Optimality Theory 207
14.2 Testing 212
14.2.1 Types of Testsuites 212
14.2.2 Further Tools and Databases 215
14.3 Measuring Performance 217
14.3.1 Rule Interactions 217
14.3.2 Grammar Internal Performance 221
14.3.3 Cross-grammar Performance 224
References 231
Abbreviations
1
Introduction
and Moshi 1990), the explicit listing of grammatical functions in the subcategoriza-
tion frame of a predicate became inapplicable, as the explicit mapping from argu-
ment structure (lexical semantics) to grammatical functions takes place in terms of
linking principles. The f-structure in (1) does not reflect this, but would make it
appear that drink subcategorizes directly for a subj and an obj rather than, for
example, an agent and a patient. As the f-structures are much easier to read with the
subcategorization frame spelled out in terms of grammatical functions, and since
we have not implemented a version of linking theory, we have retained the earlier
convention. Note that in the original version of lfg, semantic forms encoded the
relation between grammatical functions and thematic roles.
b. c-structure:
   [s [np Peter] [vp [v drinks] [np coffee]]]
c. f-structure:
   [ pred  'drink<subj,obj>'
     subj  [ pred 'Peter' ]
     obj   [ pred 'coffee' ] ]
s  −→   np              vp
        (↑subj)=↓       ↑=↓
vp −→   v               np
        ↑=↓             (↑obj)=↓
In each rule or lexical entry constraint, the ↑ metavariable refers to the
φ-image of the mother c-structure node, and the ↓ metavariable refers
to the φ-image of the nonterminal labeled by the constraint (Kaplan
and Bresnan 1982:183). The annotations on the rules indicate that the
f-structure for the s has a subj attribute (↑ in the annotation on the
np node) whose value is the f-structure for the np daughter (↓ in the
annotation on the np node), and that the s node corresponds to an
f-structure which is the same as the f-structure for the vp daughter.
The functional projection of a c-structure node is the solution of the constraints
instantiated from these annotations.
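For concreteness, the pred value in (1) could be supplied by a lexical entry along
the following lines; this is only a sketch, and the agreement equations are illustrative
rather than a reproduction of the actual ParGram entries.

   drinks   v   (↑pred) = 'drink<subj,obj>'
                (↑tns-asp tense) = pres
                (↑subj pers) = 3
                (↑subj num) = sg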
preds, which by definition cannot unify. The second are instantiated forms, which
are defined by the user as attributes whose values cannot unify.
x′ principles, so that nps dominate nouns or pronouns, vps dominate verbs, aps
dominate adjectives, etc. (see Bresnan 1982b for x′ Theory within lfg and Sells
1985 for a summary and comparison across theories). However, we avoided using
explicit x′ terminology such as n′ or v′ in order to avoid any potential confusions
as to what the phrase structure trees are expressing: they are not always binary
(15) [Figure: the xle architecture. Tokenization (character fsm) and morphological
     analysis (fsm), together with other fsm transducers, feed lexical look-up and a
     chart parser; the initial chart is decorated with constraints, and unification
     yields the complete chart analysis. The lfg lexicon and rules drive parsing,
     and a graphical user interface gives access to the results.]
As shown in (15), xle parses and generates sentences on the basis of
grammar rules, one or more lfg lexicons (see section 11.4), a tokenizer
(see 11.3.1), and a finite state morphological analyzer (see 11.3.2), as
well as other finite state modules such as a guesser and a normalizer.
A more complete discussion of the various parts of the xle architec-
ture in (15) as pertaining to grammar development within ParGram can
be found in Part II. In addition, an xle “user manual” documenting
ism. As such, it is highly recommended for teaching purposes, and smaller (perhaps
experimental) grammars. It is available at
http://www.parc.xerox.com/istl/groups/nltt/medley/.
2
The Clause
French, and English. When dealing with punctuated input, we parse the punctuation
as part of the input (punctuation differs widely crosslinguistically, but not in our
sample set — see Nunberg 1990 for a discussion of the linguistics of punctuation).
However, when dealing with text derived from natural speech recordings, in which
punctuation is not part of the original input, punctuation is not parsed.
in which the finite v must precede all other arguments and adjuncts.
On the face of it, this does not appear to be so different from what
is done for the English and French grammars. The crucial difference,
however, lies not in the expansion of s, but in the functional annotations
associated with these expansions, as shown in (5) (leaving aside pps for
ease of exposition).
(5) s −→ np vp
(↑xcomp* gf)=↓ ↑=↓
vp −→ v np*
↑=↓ (↑xcomp* gf)=↓
The functional annotations make use of regular expressions to allow
for an infinite disjunction of possibilities. The Kleene star ‘*’ on the
xcomp indicates that the np in question may be embedded under any
number of verbal complements, while the gf is shorthand notation for
a disjunction of governed (subcategorized for) grammatical functions
such as subject, object, and oblique.3 Thus, (6a) will be instantiated as
in (7a) and (6b) as in (7b). For purposes of illustration, the c-structure
and f-structure of (6b) are shown in (8). Note that the f-structure of
the German (6b) and of the English equivalent are essentially identical,
despite the difference in word order. It is only the c-structures which
differ.
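Schematically, and abstracting away from the exact inventory assumed in each
grammar, the abbreviation can be pictured as follows (obj2 and obl are included
here for illustration only):

   gf  ≡  { subj | obj | obj2 | obl }
   (↑xcomp* gf) = ↓   ≡   (↑gf)=↓  ∨  (↑xcomp gf)=↓  ∨  (↑xcomp xcomp gf)=↓  ∨  ...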
(6) a. Der Fahrer startet den Traktor.
the.Nom driver starts the.Acc tractor
'The driver starts the tractor.' (German)
b. Den Traktor startet der Fahrer.
the.Acc tractor starts the.Nom driver
'The driver starts the tractor.' (German)
3 The implementation realized within xle is a nonconstructive one. Layers of
complementation (xcomps here) are only instantiated if there is evidence for them
elsewhere. That is, the expansion of the Kleene star is very constrained in practice.
For a detailed discussion on why the power introduced by functional uncertainty
does not render the formalism of lfg undecidable see Kaplan and Maxwell 1988a.
(7) a. s −→ np vp
(↑subj)=↓ ↑=↓
vp −→ v np
↑=↓ (↑obj)=↓
b. s −→ np vp
(↑obj)=↓ ↑=↓
vp −→ v np
↑=↓ (↑subj)=↓
(8) b. [c-structure tree for (6b): root dominates s and the sentence-final period;
       within s the fronted object np den Traktor precedes vp2, which contains the
       finite verb and the subject np der Fahrer (d der, n Fahrer)]
    c. [ pred      'starten<subj,obj>'
         stmt-type declarative
         tns-asp   [ tense pres,  mood indicative ]
         subj      [ pred 'Fahrer',  pers 3,  gend masc,  case nom,  num sg,
                     ntype count,  spec [ spec-form der,  spec-type def ] ]
         obj       [ pred 'Traktor',  pers 3,  gend masc,  case acc,  num sg,
                     ntype count,  spec [ spec-form der,  spec-type def ] ] ]
2.1.2 Interrogatives
Interrogatives often have substantially different c-structures from their
declarative counterparts. The syntactic encoding of questions also dif-
fers widely crosslinguistically. For example, some languages form yes-no
questions solely by a change in intonation, whereas others insert a spe-
cial morpheme indicative of the type of question. Some languages (such
as English) place interrogative words (wh-words) in a certain position
(clause initial in English), while other languages leave the interrogative
words in situ.
This wide variation in syntactic encoding is reflected by the three
ParGram languages. In French, for example, yes-no questions involve
the appearance of the special phrase est-ce que, as in (9b). In all three
languages, yes-no questions may also be formed by subject-auxiliary
inversion, as in (9c), (10b) and (11b).
In keeping with the aim of parallel grammar development and the un-
derlying tenets of lfg, the c-structure analysis of interrogatives differs
from language to language as it interacts with other syntactic properties
of the language (i.e., auxiliary inversion, English do-support, German
scrambling, the position of subjects, etc.). However, at the level of f-
structure, the analysis aims to encode a more universal representation
of the constructions. As such, the c-structures for the German interrog-
ative in (14) and its English counterpart in (15) differ, but the resulting
f-structure analyses are essentially identical.5
(14) a. Was hat er gesehen?
what has he seen
‘What did he see?’ (German)
    b. [c-structure: root dominates sint and the int-mark '?'; the fronted npint was
       appears clause-initially in sint, followed by the finite auxiliary hat; the subject
       pronoun er and the verb complex vc containing the participle gesehen appear
       within vp2]
language particular morphosyntactic forms. As these differ in (14a) and (15a), this
difference is preserved. The fact that the German perfect tense is often interpreted
as a simple past tense must be handled in the semantics. The vsem encodes the
properties of unaccusativity vs. unergativity and is instrumental in the treatment
of auxiliary selection in French and German.
    c. [ pred      'sehen<subj,obj>'
         stmt-type interrogative
         vsem      unerg
         tns-asp   [ tense perf,  mood indicative ]
         subj      [ pred 'pro',  pers 3,  gend masc,  case nom,  num sg,
                     pron-type pers,  pron-form er ]
         obj       [ pred 'pro',  pers 3,  gend neut,  case acc,  num sg,  int +,
                     pron-type int,  pron-form was ] ]
(15) a. What did he see?
    b. [root [cpint [npint [pronint what]] [auxdo did]
              [s [np [pron he]] [vp [vpv [v see]]]]]
             [int-mark ?]]
    c. [ pred      'see<subj,obj>'
         stmt-type interrogative
         tns-asp   [ tense past,  mood indicative ]
         subj      [ pred 'pro',  pers 3,  gend masc,  anim +,  case nom,  num sg,
                     pron-type pers,  pron-form he ]
         obj       [ pred 'pro',  pers 3,  int +,  gend neut,  case acc,  num sg,
                     pron-type int,  pron-form what ] ]
2.1.3 Imperatives
Imperatives have a number of distinctive features which separate them
from declaratives and interrogatives. In many languages, they have dis-
tinct morphology on the verb and are not tensed, but instead show a
different mood. For example, English imperatives use the base form of
the verb, while in French and German imperatives can either be bare
infinitives or a special imperative form. Some sample imperatives are
shown in (16).
(16) a. Push the button.
b. Tourne-le doucement.
turn.Imp-it gently
‘Turn it gently.’ (French)
c. Den Hebel vorsichtig drehen.
the.Acc lever carefully turn
‘Turn the lever carefully.’ (German)
given in (19), where the verb think subcategorizes for two arguments:
a subject the driver and an embedded that-clause.
(19) The driver thinks [(that) she has started the tractor].
Following standard lfg analyses as formulated in Bresnan 1982a, we
encode the argument status of these embedded clauses by treating them
either as a comp(lement) or xcomp(lement) argument of the matrix
verb. Both these complements are in turn headed by verbs. In (19)
above the verb start heads the complement clause. The two types of
complements differ from one another in terms of how the embedded
subject (she in (19)) is bound. An xcomp can be thought of as an
“open” function in the sense that the embedded subject must be con-
trolled by an argument in the root (matrix) clause. This is the case
in sentences such as The driver wants to start the tractor, where the
driver and the starter of the tractor must be one and the same person.
Such occurrences of control are referred to as functional control and
are contrasted with instances of anaphoric control, which are argued
to occur with the “closed” complement comp. A comp either displays
an overt subject of its own, as in (19), or it can have an anaphorically
controlled pro subject. In other words, the subject of the comp is not
identical with an argument of the matrix clause, but instead must be re-
constructed (anaphorically) from the larger context. For details on the
notion of control within lfg, and in particular the distinction between
functional and anaphoric control, see Bresnan 1982a.
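As a rough sketch of the two lexical entry types (the entries are illustrative and
simplified), a functionally controlling verb like want supplies a control equation,
whereas a comp-taking verb like think does not:

   wants    v   (↑pred) = 'want<subj,xcomp>'
                (↑subj) = (↑xcomp subj)

   thinks   v   (↑pred) = 'think<subj,comp>'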
Finally, xcomps are generally (but not always) associated with non-
finite complements and comps with finite complements. As with root
clauses, embedded clauses can be declarative or interrogative (but not
imperative). A given verb will require a given type and form of embed-
ded clause; these requirements are stated as part of the verb’s lexical
entry. For a more detailed exposition on these grammatical functions
and the encoding of verbal lexical entries, see Chapter 3. In the remain-
der of this section we go through various types of embedded clauses and
present the analyses implemented in ParGram.
2.2.1 Subcategorized Declaratives
The c-structure of embedded declaratives6 crosslinguistically usually
differs from that of root declaratives. In German, French and English,
embedded declaratives must generally be introduced by an overt com-
plementizer, as in (20). In German the position of the finite verb is
clause final, as opposed to in a matrix clause, where it appears in second position.
6 Here the term declarative is used to encompass those embedded clauses which
ing English, French and German. For example, an embedded clause indicating a
past event will appear in the pluperfect if the root verb is also past tense.
He said that he had started the tractor.
(=He said, “I started the tractor.”)
There are also often constraints on the mood of the embedded clause, e.g., sub-
junctive or indicative. The rules governing this phenomenon are semantic and are
as yet not well understood (Kamp and Reyle 1993, Abusch 1994). As such, the Par-
Gram grammars simply parse and record the morphosyntactic tense for potential
semantic processing, but do not try to establish semantically based wellformedness
constraints.
[ pred      'fragen<subj,comp>'
  stmt-type declarative
  tns-asp   [ tense past,  mood indicative ]
  subj      [ pred 'pro',  pers 3,  case nom,  num sg,  gend fem,
              pron-type pers,  pron-form sie ]
  comp      [ pred      'laufen<subj>'
              stmt-type interrogative
              comp-form ob
              tns-asp   [ tense past,  mood indicative ]
              subj      [ pred 'traktor',  pers 3,  gend masc,  case nom,  num sg,
                          ntype count,  spec [ spec-form der,  spec-type def ] ] ] ]
The fact that an embedded clause is interrogative is usually sig-
nalled by the presence of particular lexical items or with a distinctive
c-structure. Some languages use distinctive interrogative pronouns and
particles for embedded clauses, e.g., English whether and French ce que
‘what’. As with root interrogatives, embedded interrogatives are as-
signed stmt-type interrogative in the f-structure; this feature can
be used to satisfy the subcategorization requirements of verbs which
take interrogative complements.
Note that while the matrix clause (headed by the verb fragte ‘asked’) is
declarative, the embedded comp clause is marked as interrogative. As
mentioned before, an embedded declarative clause would receive essen-
tially the same f-structure analysis, but the value for the stmt-type
and the comp-form would differ. Also note that while the features
comp-form and stmt-type are used to help in the formulation of
wellformedness conditions in minor ways in the grammar, their primary
reason for existence is to register information that is presumably useful
for subsequent semantic analysis.
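For instance, a verb selecting an embedded question can check these features via
constraining equations; the following entry is only a sketch and does not reproduce
the actual German lexicon:

   fragte   v   (↑pred) = 'fragen<subj,comp>'
                (↑comp stmt-type) =c interrogative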
3
Verbal Elements
3.1 Subcategorization
Subcategorized arguments are those arguments which are required by a
verb or other predicate, i.e., if they do not appear the clause will either
be ungrammatical or have a different meaning. There are a number of
issues to be considered concerning the subcategorization of grammatical
functions.
First, what are the possible grammatical functions which a predicate
can subcategorize for? This in part depends on the linguistic theory.
All versions of lfg assume the following functions: subj, obj, comp,
xcomp, obl. Some versions subdivide obl into different types, depend-
ing on their thematic role, usually indicated as oblθ , e.g., oblloc for
locative obliques. Similarly, xcomp can be divided into types depend-
ing on the head of the xcomp, e.g., ones with verbal heads are vcomps,
while ones with adjectival heads are acomps (cf. Kaplan and Bresnan
1982). Most versions of the theory assume either obj2 (e.g., Kaplan and
Bresnan 1982) or objθ for double object constructions in languages like
type and number of arguments. Verbs which subcategorize for one ar-
gument are generally referred to as intransitives; ones with two argu-
ments are transitives; and ones with three arguments are ditransitives.
Additionally, many verbs have more than one subcategorization frame,
as in (5), each of which must be part of the lexical entry of the verb.
(5) a. Il laisse jouer les enfants.
he lets play the children
‘He lets the children play.’ (French)
b. Elle laisse ses clefs dans son sac.
she leaves her keys in her bag
‘She leaves her keys in her bag.’ (French)
c. Il laisse sa clef à la gardienne.
he entrusts his key to the door keeper
‘He entrusts his key to the door keeper.’ (French)
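Schematically, such a lexical entry contains a disjunction of frames, one per reading;
the frames shown below are purely illustrative and do not necessarily correspond to
the analyses adopted in the French grammar.

   laisse   v   { (↑pred) = 'laisser<subj,obj,obl>'
                | (↑pred) = 'laisser<subj,obj,obj2>' }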
Furthermore, lexical rules such as passive may affect whole classes of
subcategorization frames (see section 3.4 for a general discussion and
section 13.2.3 for a more formal description of lexical rules), system-
atically creating new ones. These two factors can make determining
subcategorization frames less straightforward than it initially seems.
A more detailed discussion on the ParGram efforts at building large
lexicons is provided in Part II, section 14.1.1, which includes the semi-
automatic extraction of subcategorization frames from corpora and the
use of previously existing resources such as machine-readable dictionar-
ies.
Note that this is not the approach taken here. In our approach the
subordinating conjunctions are treated as functional elements which
serve to mark a subordinated clause (see section 2.2 in the previous
chapter).
There is much debate in the linguistic literature as to the extent
to which nominal elements subcategorize for arguments and, if they
do, what type of grammatical functions these represent (e.g., Chomsky
1970, Grimshaw 1990). The issue arises largely due to pairs like those
in (8) in which the modifiers of the noun correspond to arguments in
the verbal construction.
(8) a. np: the Romans’ destruction of the city
b. s: the Romans destroyed the city
In many cases, it is possible simply to treat these as adjuncts (section
3.3.6), in the case of of the city, or as a specifier (spec, see section
5.1.1 for a discussion of specifiers), in the case of the Romans', and to
leave the task of connecting the semantic relatedness of these modifiers
to the noun to a semantic module.
However, it is also possible to develop an analysis whereby nouns de-
rived from verbs, such as destruction in (8) above, would retain the
verb’s subcategorization frame. The syntactic rules for nps would then
have to ensure that modificatory nps such as of the city and The Ro-
mans’ fill the right grammatical function slots. Within our approach,
we have chosen to represent these modificatory nps as adjuncts to the
main noun because we believe that the proper level of analysis for the
dependency relations between the main noun and its modifiers in this
case is at the level of thematic argument structure, not at the level
of grammatical functions. That is, the generalization that unifies the
common dependencies between (8a) and (8b) is that in both cases the
Romans and the city function as agent and patient, respectively, of
the main predicate (cf. Grimshaw 1990). In the verbal manifestation
in (8b) these thematic arguments are realized in terms of the gram-
matical functions subj and obj, while in the nominal case in (8a) the
arguments are realized as adjuncts or specifiers. Sample analyses show-
ing how nps are analyzed within our approach can be found in Chapter
4.
analysis of obj2s in the French grammar after one of our co-authors had already
left the project.
(31) a. The driver thinks that the tractor will start.
     b. [ pred      'think<subj,comp>'
          stmt-type declarative
          tns-asp   [ mood indicative,  tense pres ]
          subj      [ pred 'driver',  spec [ spec-form the,  spec-type def ],
                      ntype count,  anim +,  pers 3,  case nom,  num sg ]
          comp      [ pred      'start<subj>'
                      tns-asp   [ mood indicative,  tense fut ]
                      subj      [ pred 'tractor',  spec [ spec-form the,  spec-type def ],
                                  ntype count,  anim −,  pers 3,  case nom,  num sg ] ] ]
In the example with the comp in (31), on the other hand, the matrix
subject does not control the embedded subject: each of the subjects is
independent of each other. Also note that the finite comp in (31) has
its own tns-asp specification.
In many languages the difference between open complements (xcomp)
and closed complements (comp) correlates with a difference between
finite and nonfinite clauses. However, this seeming correlation is mis-
leading. Although it is rare, finite clauses can appear with controlled
subjects and must thus be analyzed as xcomps. More frequently, non-
finite clauses may have a nonovert subject which is not functionally
controlled, as in (32). These clausal arguments are analyzed as closed
categories (in (32) as a subj) and are assumed to be anaphorically
controlled (see Bresnan 1982a): that is, the subject of the clausal com-
plement must be determined from the context of the utterance, as in
(32), where it is not clear who pinched the elephants when the sentence
is uttered in isolation. Note that these discourse considerations cannot
be part of a syntactic analysis. As shown in (32b), all the f-structure
analysis encodes is the fact that the subject is a noncontrolled pro
whose referent is yet to be determined.
(32) a. Pinching those elephants was foolish.
     b. [ pred      'be<subj,predlink>'
          stmt-type declarative
          tns-asp   [ mood indicative,  tense past ]
          subj      [ pred    'pinch<subj,obj>'
                      tns-asp [ prog + ]
                      ntype   gerund
                      pers 3,  case nom,  num sg
                      subj    [ pred 'pro',  pron-type null ]
                      obj     [ pred 'elephant',
                                spec [ spec-form those,  spec-type demon,  deixis distal ],
                                ntype count,  pers 3,  case acc,  num pl ] ]
          predlink  [ pred 'foolish',  atype predicative ] ]
The example in (32) also illustrates another fact that may not be
obvious: clausal complements are not restricted to the object role. In
(32) the gerundive clause functions as the subject of the sentence, as is
the case for the German that-clause in (33).
(33) Daß die Erde rund ist, hat ihn gewundert.
that the.Nom earth round is has him surprised
‘That the world is round surprised him.’ (German)
A good discussion of the properties of German comps and further
examples of usage in which the clausal complement stands in alterna-
tion with genitive nps and differing types of pps can be found in
Berman 1996.
3.3.6 Adjuncts
Adjuncts are grammatical functions which are not subcategorized for by
the verb. Adjuncts include a large number of disparate items, including
adverbs (Chapter 7), prepositional phrases (Chapter 6), and certain
embedded clauses (section 2.3). Examples of all three kinds are found
in (34a). Adjuncts are analyzed as belonging to a set, which can occur
with any pred, i.e., they are not subcategorized for. So, (34a) would
have the simplified f-structure in (34b).
(34) a. When the light flashes, quickly push the lever towards the
seat.
     b. [ pred    'push<subj,obj>'
          subj    [ pred 'pro' ]
          obj     [ pred 'lever' ]
          adjunct { [ pred 'quickly' ]
                    [ pred 'towards<obj>',  obj [ pred 'seat' ] ]
                    [ pred      'flash<subj>'
                      comp-form when
                      subj      [ pred 'light' ] ] } ]
Another problem with collecting all adjuncts into one and the same
set is that adjuncts may differ systematically with regard to their
syntactic properties; it would therefore be more convenient to identify
the type of adjunct one is dealing with directly, rather than to check
each member of the set of all adjuncts for a particular feature. For
example, if we wanted to check
whether a given adjunct was a relative clause, the grammar could check
for the presence of the feature pron-rel.
Another way to go about this is to define different types of adjuncts
according to systematic syntactic criteria. In our grammars, for exam-
ple, relative clauses are encoded as adjunct-rel, as illustrated by (36),
parentheticals are adjunct-paren, and comparatives are adjunct-
comp.
a restricted form, e.g., with the preposition by in English. This analysis has the
advantage that it captures the fact that the obl plays a special role with respect
to the verb, i.e., that it is the logical subject. However, this analysis also has the
disadvantage that the obl is not obligatory since the demoted subject need not be
overtly expressed. This means that every passive occurring with an argument of the
appropriate form will have two analyses, one in which the argument is the obl and
one in which it is an adjunct. In our grammars passives do not subcategorize for an
obl, meaning that the demoted agent is always an adjunct; the semantics can then
determine whether a member of the adjunct set is the agent of the verb.
This lexical rule is called by verbs which can passivize, e.g., by tran-
sitive verbs. The lexical rule has the obj become the subj and the old
subj become null. In addition, it requires a passive form of the verb
to be used. schemata refers to the subcategorization frame of the verb
which is provided by verbs calling pass.
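By analogy with the rule in (40) below, the basic passive rule can be pictured
roughly as follows; this is a sketch of the general pattern rather than the exact
ParGram formulation.

   pass(schemata) = schemata
                    (↑obj) −→ (↑subj)
                    (↑subj) −→ null
                    (↑passive) =c +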
Different languages may require variations of the passive lexical rule.
For example, German allows passivization of certain intransitive verbs,
as in (39), and hence needs a variant of passivization which does not
require the obj to become the subj.
(39) Gestern wurde viel gelacht.
yesterday was much laughed
‘There was much laughter yesterday.’ (German)
The lexical rule for these cases looks as in (40) and is called by the
set of verbs which undergo this type of passivization.5
(40) nosubj-pass(schemata) = schemata
                             (↑subj) −→ null
                             (↑passive) =c +
The ability to passivize and the type of passivization must thus be
specified in the lexical entry of each verb. The lexical rules not only
serve as a useful tool for encoding the same generalization over a subset
of the verbs in a given language, but also encode precisely the fact that
a linguistically relevant generalization can and must be made over the
lexicon of a given language.
A second example of a productive alternation is the formation of
medial passives in French. These are signalled by the reflexive pronoun
se and are illustrated in (41).
(41) a. Cette chemise se porte avec une cravate rouge.
this shirt refl wears with a tie red
‘This shirt is worn with a red tie.’ (French)
b. Ces livres se vendent bien.
these books refl sell well
‘These books sell well.’ (French)
The lexical rule for this construction is similar to that of the regular
passive with the additional restriction that refl have the value +. This
value is provided by the reflexive pronoun, which provides no other
information to the f-structure (see section 4.1.4 on reflexives).
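A sketch of such a rule, modeled on pass and nosubj-pass (the name se-pass and
the exact formulation are illustrative only):

   se-pass(schemata) = schemata
                       (↑obj) −→ (↑subj)
                       (↑subj) −→ null
                       (↑refl) =c +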
5 This lexical rule also applies to transitive verbs in German which have a dative
object, as in Ihm wurde geholfen ‘Him was helped.’ In these cases the object remains
a dative rather than becoming a nominative subject.
3.5 Auxiliaries
Auxiliaries crosslinguistically are a closed class of verbal elements. They
generally have developed from main verbs such as be, stay, have or
    b. [root [s [np [pron he]]
              [vp [vpaux [aux will]
                         [vpaux [aux have]
                                [vpv [v driven] [np the tractor]]]]]]
             [period .]]
    c. [ pred      'drive<subj,obj>'
         stmt-type declarative
         tns-asp   [ tense fut,  perf +,  mood indicative ]
         subj      [ pred 'pro',  pers 3,  anim +,  gend masc,
                     pron-type pers,  pron-form he,  num sg,  case nom ]
         obj       [ pred 'tractor',  spec [ spec-form the,  spec-type def ],
                     pers 3,  anim −,  num sg,  case acc ] ]
Under this analysis the parallelism between the examples in (48) can
be captured at a functional level as each of the sentences receives the
same f-structure analysis in each of the grammars.
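On this analysis an auxiliary contributes features rather than a pred of its own;
schematic entries for English will and have (sketches only, with feature names
taken from the f-structure above) might look as follows.

   will   aux   (↑tns-asp tense) = fut
   have   aux   (↑tns-asp perf) = +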
3.6 Modals
Unlike auxiliaries which are analyzed as having a flat f-structure and
no pred, modals have a pred and subcategorize for xcomp comple-
ments. Some languages, such as English, are restricted to a single modal
per clause; others, like German, allow modals to take other modals as
their complements and also allow scrambling that does not respect the
hierarchy of embedding. An example is shown in (52).
Again, we do not attempt to provide a semantic analysis. On the other
hand, the scrambling phenomena of German can be treated very simply
under the application of functional uncertainty whereby arguments and
adjuncts can be base generated in any of the positions they might be
found in and connected to the clause they belong in via a functional
uncertainty path such as (↑xcomp xcomp obj). For an introduction
and discussion of functional uncertainty, see Johnson 1986, Kaplan and
Zaenen 1989, Zaenen and Kaplan, and Kaplan and Maxwell 1996.
(52) a. Einen raschen Erfolg müßte er erzielen können.
a.Acc rapid success should.Subj he achieve can
‘He should be able to achieve a rapid success.’ (German)
b. [ pred      'müssen<xcomp>subj'
     stmt-type declarative
     tns-asp   [ tense pres,  mood subjunctive ]
     subj      [ pred 'pro',  pers 3,  num sg,  case nom,  gend masc,
                 pron-type pers,  pron-form he ]
     xcomp     [ pred  'können<xcomp>subj'
                 subj  [ ]
                 xcomp [ pred 'erzielen<subj,obj>'
                         subj [ ]
                         obj  [ pred 'Erfolg'
                                spec [ spec-form ein,  spec-type indef ]
                                pers 3,  num sg,  case acc,  gend masc
                                adjunct { [ pred 'raschen',  case acc,  gend masc,
                                            num sg,  atype attributive ] } ] ] ] ]
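This f-structure follows if each modal has a pred taking an xcomp and shares its
subj with that complement through a control equation; a schematic entry for the
finite form in (52a) (a sketch only) is:

   müßte   v   (↑pred) = 'müssen<xcomp>subj'
               (↑subj) = (↑xcomp subj)
               (↑tns-asp mood) = subjunctive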
3.8 Predicatives
Predicative constructions involve a linking or copular verb which has
a subject and another argument, as in (56). The postverbal argument
can be of a number of categories, e.g., np, pp, ap.
(56) a. John is a professor.
b. Le gyrophare est sur le toit.
the beacon is on the roof
‘The beacon is on the roof.’ (French)
c. Der Traktor ist rot.
the.Nom tractor is red
‘The tractor is red.’ (German)
Due to the semantic relationship between the subject and the phrase
after the linking verb, these verbs are given special subcategorization
frames. Traditionally, this has been done by having the postverbal
phrase be an xcomp whose subject is controlled by the linking verb’s
subject. However, a new analysis, termed the predlink analysis, is
used by the ParGram grammars.
Under both approaches, linking verbs may have their own c-structure
category and their own vp rule which allows the postverbal np, ap, and
pp to be assigned the appropriate grammatical function; note that most
verbs do not allow these c-structure categories to have such grammat-
ical functions.
3.8.1 Controlled Subject Analysis
Under the controlled subject analysis, the subcategorization frame of a
verb like linking be is as in (57).
(57) (↑pred) = 'be<(↑xcomp)>(↑subj)'
     (↑subj) = (↑xcomp subj)
The main drawback of this approach is that the postverbal con-
stituent must have a subj to be filled by the control equation of the
verb. However, as nps, aps, and especially pps do not generally have an
overt subject, we believe the representation of the relationship between
the noun and thing predicated of it should either be encoded at the
level of argument structure, or of semantic structure. Note that if one
does implement the controlled subject analysis, it becomes necessary
to provide two subcategorization frames for each of these categories:
one without a subj argument for simple nps such as a cat in A cat
ate my food, and one for predicatively used nps such as a cat in Harry
is a cat.
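That is, under the controlled subject analysis something like the following pair of
entries would be needed (a sketch for illustration only):

   cat   n   { (↑pred) = 'cat'
             | (↑pred) = 'cat<(↑subj)>' }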
3.8.2 Predlink Analysis
The predlink analysis avoids these difficulties by positing a grammati-
cal function predlink. Under this analysis the subcategorization frame
of a verb like the copula be is shown in (58). This representation for
predicative constructions models the fact that a particular property is
predicated of the subject in a syntactically reasonable way and provides
enough information for subsequent semantic analysis. As predlink is
a closed category, there is no control equation between the subj and
the predlink and hence no need for nps, aps, and pps to have subject
arguments.
(58) (↑pred) = 'be<(↑subj)(↑predlink)>'
the fact that the two verbs ‘make’ and ‘repair’ appear to act as a
single predicate with respect to phenomena such as clitic climbing. As
illustrated in (64), the clitic lui may appear in the domain of the verb
fait ‘make’ even though it is actually an argument of the verb réparer
‘repair’.
(64) Marie lui a fait réparer la voiture.
Marie him has made repair the.F car
‘Marie made him repair the car.’ (French)
Such clitic climbing is not possible in the usual verb-complement con-
structions and as such a different analysis, involving neither comp
nor xcomp, must be posited. Recent argumentation about two differ-
ent approaches within lfg may be found in Frank 1996 and Alsina
1996. Pointers to previous work and to differing analyses within vari-
ous frameworks may be found in the references therein.
3.10.3 Noun-Verb Constructions
Finally, there are a number of noun-verb constructions which also ap-
pear to act as a single predicate in German. These constructions, illus-
trated here by (65), are referred to as Funktionsverbgefüge (see Helbig
1984 for a description).
(65) Der Fahrer traf eine Entscheidung.
the driver hit a.Acc decision
‘The driver made a decision.’ (German)
In these constructions the main predicational force comes from the
noun, while the verb serves to encode aspectual, Aktionsart, or other
semantic information that has only been vaguely defined in the litera-
ture. As can be seen from the translation in (65), English has a similar
construction, as in make a decision, make a claim, take a shower , etc.
These constructions, along with the French complex predicates and
the German coherent constructions, currently receive treatments in our
grammars; however, the current state of the implementation represents
work in progress.
4
Nominal Elements
Nominal elements include nps, free relative clauses, pronouns and cli-
tics, and other elements which can sometimes act as nominals, such as
gerunds. Relative clauses are also discussed here due to their similarities
with free relatives.
A full np may include a determiner and modifiers such as adjectives,
prepositional phrases, relative clauses, and pre- and post-nominal ver-
bal modifiers (see section 4.2). All nps have case, but the manner in
which case is assigned varies in the three grammars. In English, for
example, nps are not case marked morphologically. Case is assigned by
the c-structure rules, according to the position in which the np appears.
This is in contrast to German, where the verbs specify the case of the
arguments they subcategorize for (i.e., they assign case), and any overt
morphological case marking on a noun or a determiner plus noun (see
Chapter 5) must be compatible with the case assigned by the verb.
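For English, this positional case assignment can be pictured as an additional
annotation on the relevant c-structure rules, roughly as below; the sketch is
illustrative and omits much of what the actual rules contain.

   vp −→   v         np
           ↑=↓       (↑obj)=↓
                     (↓case)=acc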
4.1 Pronouns
There are a number of different types of pronouns. Some of these
share c-structure categories, such as English expletives and personal
pronouns, and have differing f-structures, while others have specific
c-structure categories as well, such as the interrogative and relative
pronouns. The French clitic pronouns are discussed in section 4.1.5.
4.1.1 Personal and Demonstrative Pronouns
Personal pronouns supply information about person, number, gender,
and case. In English, they also supply information about animacy. Pro-
nouns, like proper nouns, generally cannot appear with determiners or
prenominal modification.1 They are thus instantiated directly under
np. However, pronouns do allow relative clauses, as in (1). In English,
1 Exceptions are fixed expressions like lucky me, poor you, or a certain someone.
these pronouns form a restricted set and are treated as full nouns within
the c-structure. At f-structure they are given an analysis which is in
accord with other pronouns. In German, the ability to take relative
clauses is part and parcel of being a pronoun, so no special rules are
needed.
(1) a. someone that I know
b. Ihn, den ich kenne
him whom I know
‘him, who I know’ (German)
As can be seen in (2), pronouns are analyzed as having a pred
value of pro, indicating that these are anaphors awaiting resolution
within the semantic component. In order to provide such a component
with as much information as possible, the surface form of the pronoun
is encoded in the pron-form feature. Also encoded are the gender,
number, person and animacy features which are needed for a semantic
evaluation of the pronoun.
(2) a. [np [pron he]]
    b. [ pred      'pro'
         pers 3,  num sg,  gend masc,  case nom
         pron-type pers
         pron-form he
         anim +     ]
[c-structure fragment: pronrel dominating which]
They are assigned a pron-type of int or rel, respectively. In addi-
tion, they also contribute a pron-form attribute, whose value registers
the actual form of the pronoun.
(4) [ pred      'pro'
      pron-type rel
      pron-form which
      anim −     ]
4.1.3 Expletive Pronouns
Expletive pronouns are distinct in that they do not refer to an actual
entity. That is, they are not anaphoric and as such need not undergo
semantic resolution. These form a restrictive class, and in many lan-
guages only appear in subject position. English and French are exam-
ples of such languages. In German, however, the expletive es ‘it’ may
also appear in object position.2 Expletives are encoded by not being
assigned a pred value: they are predicationally empty (Kaplan and
Bresnan 1982). They are encoded with a pron-type expletive, and
their surface form is registered in pron-form.
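A sketch of an entry for an English expletive (note the absence of a pred):

   there   pron   (↑pron-type) = expletive
                  (↑pron-form) = there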
(5) a. There is a light in the tractor.
b. Il est possible de démonter le tracteur.
it is possible of disassemble the tractor
‘It is possible to disassemble the tractor.’ (French)
c. Es regnet.
it rains
‘It is raining.’ (German)
d. Er hat es nicht gewußt, daß die Lampe kaputt ist.
he has it not known that the.Nom lamp broken is
‘He didn’t know that the lamp is broken.’ (German)
2 English may also have object expletive pronouns in constructions like I prefer
it when it is cold.
4.1.4 Reflexives
Many languages have two types of reflexives: one with a pred value
and one without. The type with a pred value is predicational, as in
(8) (for a comprehensive discussion on anaphoric relations, including
reflexives, see Dalrymple 1993). The second type is found with inherent
reflexive verbs, such as the French and German verbs in (9). Here, the
reflexive has been incorporated into the lexical meaning of the verb so
that the reflexive itself does not perform a predicational function any-
more; instead, it merely functions as a morphosyntactic requirement of
the verb. This second type also occurs as the result of certain argu-
ment changing processes, such as the medio-passive in (10a) and the
intransitivization of certain causatives, as in (10b).
(8) a. She saw herself.
b. *She saw myself.
perjure oneself; whether the reflexive in these verbs is an argument remains an open
question.
[sadj [s [npsubj [proncl il]]
         [vp [vpverb [cl [cl1 [cldat nous]] [cl2 [clacc les]] [cly y]]
                     [v donne]]]]
      [period .]]
(18) [ pred      'donner<subj,obj,obj2>'
       stmt-type declarative
       tns-asp   [ mood indicative,  tense pres ]
       subj      [ pred 'pro',  pron-form il,  gend masc,  case nom,
                   pron-type pers,  pers 3,  num sg ]
       obj       [ pred 'pro',  pron-form le,  case acc,
                   pron-type pers,  pers 3,  num pl ]
       obj2      [ pred 'pro',  pron-form nous,  case dat,
                   pron-type pers,  pers 1,  num pl ]
       adjunct   { [ pred 'à<obj>',  psem loc,
                     obj [ pred 'pro',  pron-form y,  pcase à,  pron-type pers ] ] } ]
count. This distinguishes them from mass nouns such as water, which
are given ntype mass. This distinction is syntactically necessary as
mass nouns can appear without a determiner and may occur in con-
structions such as a liter of water.
Proper names also display a syntactic behavior which differs from
that of count and mass nouns. They are therefore distinguished by
an ntype proper. Proper names normally cannot be modified with
determiners, adjectives, prepositional phrases, etc. A further distinction
made through the ntype feature is to set titles like Professor, Ms., or
Herr off from other nouns.
The c-structure treatment of nps differs considerably in the three
grammars as the internal structure of nps in German, English and
French is characterized by different syntactic properties.
4.2.1 English
The relative order of determiners, adjectives, pps and relative clauses
is fixed in English; so, the English grammar provides a heavily struc-
tured analysis. As can be seen in (19), an explicit level of attachment
is provided for pps and aps, compounding (nmod), determiners, and
postnominal modifiers (i.e., the relative clause). This strict hierarchical
structure in terms of np constituents like npap is motivated by coor-
dination, as each of these levels of attachments may be coordinated as
constituents (see Chapter 8).
(19) [np [d the]
         [npap [numberp three] [apattr heavy]
               [npzero [nmod forestry] [n tractors]]
               [pp on the highway]]
         [cprel which are red]]
4.2.2 German
The internal structure of the German np is as complex as that of its En-
glish counterpart; however, in keeping with the general flexible word or-
der of German, the prenominal elements within the np are not amenable
to the rigid analysis presented for English. The basic structure of the
np is similar to that of English: a determiner followed by modifiers,
[c-structure fragments for the German np, involving the categories npdet, npcore,
detp, nppp, npap and cprel]
4.2.4 F-structure
Despite the differences in the analyses of nps at c-structure, the analy-
ses in terms of f-structures are very similar across the languages. This is
because the c-structures shown in the previous sections just represent
different ways of putting together essentially the same information, as
demonstrated by the f-structures in (24) and (25). These correspond to
the nps in (18) and (19) above. Not all the pieces of nps can be treated
in parallel. Structures which have no analog in another language, such
as certain types of appositions or participial prenominal modifiers in
German, are given an f-structure analysis which does not have a parallel
counterpart in English or French.
Note that despite the fact that the German Forsttraktoren ‘forestry
tractor’ appears as a single lexical item at c-structure, it is decomposed
into its components at f-structure and thus corresponds almost exactly
to its English counterpart (see section 4.3 on compounds). This decom-
position is done as part of the morphological analysis. Not all lexical
vs. syntactic compounds will have exact correspondences. In the struc-
tures below, for example, the English highway is not a compound, while
its German counterpart Auto-bahn ‘car track’ is.
(24) a. the three heavy forestry tractors on the highway which are red
     b. [ pred     'tractor'
          pers 3,  num pl,  anim −,  ntype count
          spec     [ spec-form the,  spec-type def ]
          compound [ pred 'forestry',  pers 3,  num sg,  anim −,  ntype mass,
                     spec [ spec-type def ] ]
          adjunct  { [ pred  'on<obj>'
                       psem  locative
                       ptype sem
                       obj   [ pred 'highway',  pcase on,  pers 3,  case acc,  num sg,
                               anim −,  ntype count,
                               spec [ spec-form the,  spec-type def ] ] ]
                     [ pred 'heavy',  atype npmod ]
                     [ pred 'three',  atype cardinal,  num pl ] }
          adjunct-rel { [ pred      'be<subj predlink>'
                          stmt-type declarative
                          tns-asp   [ tense pres,  mood indicative ]
                          subj      [ pred 'pro',  pers 3,  num pl,  case nom,  anim −,
                                      pron-type rel,  pron-form which ]
                          topic-rel [ ]
                          predlink  [ pred 'red',  atype predicative ] ] } ]
(25) a. die drei schweren Forsttraktoren auf der Autobahn, die rot
sind.
‘the three heavy forestry tractors on the highway which are
red’
     b. [ pred     'Traktor'
          pers 3,  gend masc,  num pl,  ntype count
          spec     [ spec-form die,  spec-type def ]
          compound [ pred 'Forst' ]
          adjunct  { [ pred  'auf<obj>'
                       psem  locative
                       ptype sem
                       obj   [ pred 'bahn',  pcase auf,  pers 3,  gend fem,  case dat,
                               num sg,  anim −,  ntype count,
                               spec     [ spec-form die,  spec-type def ],
                               compound [ pred 'Auto' ] ] ]
                     [ pred 'schwer',  atype attributive,  num pl,  gend masc ]
                     [ pred 'drei',  atype cardinal,  num pl ] }
          adjunct-rel { [ pred      'sein<subj predlink>'
                          stmt-type declarative
                          tns-asp   [ tense pres,  mood indicative ]
                          subj      [ pred 'pro',  pers 3,  num pl,  case nom,  gend masc,
                                      pron-type rel,  pron-form die ]
                          topic-rel [ ]
                          predlink  [ pred 'red',  atype predicative,
                                      adegree positive ] ] } ]
    a. [npap [npzero [nmod [apattr [a hydraulic]] [n oil]] [n filter]]]
    b. [ pred  'filter'
         ntype count,  anim −,  pers 3,  num sg
         compound [ pred  'oil'
                    ntype mass
                    spec  [ spec-type def ]
                    anim −,  pers 3,  num sg
                    adjunct { [ pred 'hydraulic',  atype attributive ] } ] ]
(28) a. French:
        [np [npdet [detp [d le]]
                   [nppp [npap [n filtre]]
                         [pp [p à]
                             [np [npdet [nppp [npap [n huile]
                                                    [ap [a hydraulique]]]]]]]]]]
     b. [ pred  'filtre'
          ntype count,  gender masc,  pers 3,  num sg
          spec  [ spec-form le,  spec-type def ]
          adjunct { [ pred 'à<obj>'
                      obj  [ pred 'huile',  pcase à,  ntype count,  gender fem,
                             pers 3,  num sg,
                             adjunct { [ pred 'hydraulique',  atype attributive,
                                         gend fem,  num sg ] } ] ] } ]
    a. [npcore [npap [n Hydraulikölfilter]]]
    b. [ pred     'filter'
         compound [ pred     'Hydraulik'
                    compound [ pred 'öl' ] ]
         ntype count,  case acc,  gend masc,  pers 3,  num sg ]
While the rules necessary to treat most n-n sequences found in the
languages are in place, a problem remains. Almost any noun can be
turned into a “title”, as in grammar writer John, or a name, as in
the German Tankstelle Greifenberg ‘gas station Greifenberg’.5 Hand
coding each nominal lexical entry for precise information is unfeasible,
and loosening the c-structure rules to allow for new creations of n-n
sequences can lead to overgeneration. Thus, while the grammars can
parse most n-n sequences, the issue has not as yet been completely
resolved.
stituent as a topic.
[npdet [detp [d un]] [nppp [npap [ap [a petit]]]]]
not as gerunds. That is, these are deverbal nouns, which are provided as nouns by
the morphology and as such receive no special treatment either morphologically or
syntactically. Whether their semantics warrants special treatment is not examined
here.
[c-structure fragment: an npgerund containing vp (v driving, np the tractor),
followed by vpcop (vcop is, ap good)]
5
Determiners and Adjectives
5.1 Determiners
5.1.1 Types of Specifiers
Articles (determiners), quantifiers and prenominal genitives pattern
similarly in English in that they appear in the first position in an np:
they precede any modifiers, as in (1), and cannot be preceded by an-
other article or quantifier which modifies the noun, as in (2).
(1) The/a/every/Kim’s small dog barks.
(2) *The a/every/Kim’s dog barks.
The intuition that has guided most of the modern syntactic ap-
proaches to these constructions is that they serve to “specify” the head
noun rather than simply “modify” it.1
Although most specifiers are introduced through a category d or
detp (determiner phrase), the analysis of specifiers differs slightly at
the level of c-structure.2 Articles, quantifiers and prenominal genitives
are treated uniformly in all three grammars in that they are represented
under a spec feature in the f-structure, as in (3).
1 The special properties of these specifiers have prompted a reanalysis of the
traditional notion of nps (the one implemented here) in which the n is considered
to be the head of the np. The alternative dp-hypothesis (Abney 1987) maintains
that the specifiers are the heads of the constituents we used to think of as nps,
and that in keeping with the requirements of x′ theory, a dp should be posited.
We have not implemented the dp hypothesis within ParGram, though we do use a
determiner phrase (detp).
2 For example, the English grammar introduces genitive names such as John’s
through a specialized rule within the np, while the German grammar treats genitives
on a par with other determiners and generates them within a detp.
gloss for determiners and adjectives. For the sake of greater perspicuity, we left
out information as to the strong/weak (s/w) declension in previous chapters, and
only gradually introduced information about case, number and gender, as it became
relevant in each chapter.
ment. With the prepositions de and à, the plural and masculine singular definite
determiners have special forms (des, du, aux, au).
5.2 Adjectives
Adjectives are characterized by the fact that they modify nouns and,
in some languages including French and German, inflect for agreement.
English adjectives do not inflect to show gender or number agreement
with the head noun. However, it is possible to form comparative and
superlative forms of English adjectives, as in (9); so, in a sense adjectives
inflect even in the morphologically impoverished English.
(9) green/greener/greenest
Furthermore adjectives may be modified by a small set of adverbs
such as very. Some adjectives subcategorize for arguments (section
5.2.4), but the majority do not.
At c-structure, adjectives form aps (adjective phrases) which contain
the adjective and any adverbial modifiers, e.g., very red, très rouge
(French), sehr rot (German). The ap also contains any np or clausal
arguments, and pp modifiers or arguments. In German, all of these
constituents may appear in a prenominal ap, as in (10). Note that
(10b) is an example of a deverbal adjective.
(10) a. Die ihrer Firma treue
the.F.Sg.Nom.S her.F.Sg.Dat.S company loyal.F.Sg.Nom.W
Frau lacht.
woman laughs
‘The woman loyal to her company laughs.’ (German)
b. Die im Garten den
the.F.Sg.Nom.S in.the.M.Sg.Dat.W garden the.M.Sg.Acc.S
Tee schnell trinkende Frau lacht.
tea quickly drinking.F.Sg.Nom.W woman laughs
‘The woman drinking tea quickly in the garden laughs.’ (Ger-
man)
These German constructions must be analyzed as instances of aps,
and not, for example, the equivalent of English reduced relative clauses
because the adjectives treue ‘loyal’ and trinkende ‘drinking’ inflect to
agree with the head noun and otherwise pattern like adjectives.
At f-structure, adjectives are analyzed as adjuncts when encoun-
tered prenominally (section 5.2.1), and as predlinks when used pred-
icatively (section 5.2.3). All adjectives are marked with an atype at
f-structure in order to distinguish the various uses and types of adjec-
tives.
Cardinals, on the other hand, do not inflect, but do require the noun
to be plural (unless the cardinal is one), and exhibit a slightly different
syntactic pattern at c-structure, as shown by the contrast given in (19).
(19) a. The three brown dogs bark.
b. The *brown three dogs bark.
Cardinals are thus also introduced by a special rule at c-structure
(numberp), and are distinguished at f-structure by being assigned
atype cardinal, as in (20).
(20) a. the three dogs
     b. [ pred 'dog'
          pers 3,  num pl,  case nom,  anim +,  ntype count
          spec    [ spec-form the,  spec-type def ]
          adjunct { [ pred 'three',  atype cardinal,  num pl ] } ]
icative (uninflected) form can serve as adverbs in German. See Chapter 7 for discussion.
5.2.2.2 French
French postnominal adjectives are basically equivalent to their prenom-
inal counterparts, though some adjectives must be interpreted slightly
differently in accordance with the position they appear in (see Nolke
[npdet [detp [d le]] [nppp [npap [ap [a grand]] [n témoin] [ap [a rouge]]]]]
    b. [ pred 'témoin'
         gend masc,  pers 3,  num sg
         spec [ spec-type def,  spec-form le ]
         adjunct { [ pred 'grand',  atype attributive,  gend masc,  num sg ]
                   [ pred 'rouge',  atype attributive,  gend masc,  num sg ] } ]
    c. [ apos pre ]   non-dep   [ apos post ]
lowing for the fact that an adjective can require an obj, obl, xcomp
or comp at the level of f-structure, as in (30).7
(30) a. The driver is proud of the tractor. (obl argument)
     b. [ pred      'be<subj,predlink>'
          stmt-type declarative
          tns-asp   [ tense pres,  mood indicative ]
          subj      [ pred 'driver',  pers 3,  num sg,  case nom,  anim +,  ntype count,
                      spec [ spec-form the,  spec-type def ] ]
          predlink  [ pred  'proud<obl>'
                      atype predicative
                      obl   [ pred  'of<obj>'
                              ptype sem
                              obj   [ pred 'tractor',  pcase of,  pers 3,  num sg,
                                      case acc,  anim −,  ntype count,
                                      spec [ spec-form the,  spec-type def ] ] ] ] ]
based on data extraction from very large corpora. Similarly, the subcategorization
frames of deverbal adjectives are derived from the subcategorization frames of the
verbs, which were also produced semi-automatically from data extraction over large
corpora and various other resources (see section 14.1.1 for some more discussion).
7 The grammatical function predlink in (31) indicates that the construction
is predicational and that the material in the predlink is being predicated of the
subject of the sentence (section 3.8). Other lfg analyses of adjectives (e.g., Bresnan
1982b) assume that adjectives also subcategorize for a subject (i.e., tractor in red
tractor).
The c-structure adjective rules are augmented to allow for the possi-
bility of an np (German only), pp, vp or cp either following or preceding
(German only) an adjective as its argument.
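For English, the augmentation can be pictured roughly as follows; the choice of obl
for the pp argument mirrors the f-structure in (30), and the sketch leaves out the
np, vp and cp options as well as the German prenominal order.

   ap −→   a        pp
           ↑=↓      (↑obl)=↓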
5.2.4.2 Extraction
As mentioned previously, one of the possible arguments of an adjective
is a comp. As also discussed previously in section 3.3.5, a comp can
be either a nonfinite vp as in (31), or a that-clause as in (32). As an
example, the f-structure for (31b) is shown in (32).
(31) a. It is important to laugh.
b. It is important that the dogs bark.
(32) a. It is important that the dogs bark.
     b. [ pred      'be<predlink>subj'
          stmt-type declarative
          tns-asp   [ tense pres,  mood indicative ]
          subj      [ pron-type expletive,  pron-form it,  pers 3,  num sg,
                      case nom,  gend neut,  anim − ]
          predlink  [ pred  'important<comp>'
                      atype predicative
                      comp  [ pred      'bark<subj>'
                              stmt-type declarative
                              tns-asp   [ tense pres,  mood indicative ]
                              comp-form that
                              subj      [ pred 'dog',  pers 3,  num pl,  case nom,
                                          anim +,  ntype count,
                                          spec [ spec-form the,  spec-type def ] ] ] ] ]
In all three languages, these comps can be preposed and thus ex-
tracted from the local subcategorization domain of the adjective, as
shown for English in (33).
(33) a. To laugh is important.
b. That the dogs bark is important.
The adjectives which subcategorize for comps and allow the extrac-
tion in (34) are specially marked in the lexicon. In the nonextraction
cases, the expletive it (es in German, il in French) is treated as a sub-
ject subj of the copula be (see section 3.8 for a more detailed discussion
of copula constructions).
(34) a. That the dog barks is important.
     b. [ pred      'be<subj,predlink>'
          stmt-type declarative
          tns-asp   [ tense pres,  mood indicative ]
          subj      [ pred      'bark<subj>'
                      pers 3,  num sg,  case nom
                      stmt-type declarative
                      tns-asp   [ tense pres,  mood indicative ]
                      comp-form that
                      subj      [ pred 'dog',  pers 3,  num sg,  case nom,  anim +,
                                  ntype count,  spec [ spec-form the,  spec-type def ] ] ]
          predlink  [ pred  'important<comp>'
                      atype predicative
                      comp  [ ] ] ]
analysis as well. Note that the pers and num features of the extraposed
clause must be specified, since that-clauses are not inherently marked for
person and number; if these features were not provided, improper verb
forms such as in *That the dog barks are important could not be ruled
out.
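Schematically, the rule or lexical entry introducing the sentential subject simply
supplies these values, matching the subj in (34b); where exactly the equations are
stated is an implementation choice.

   (↑subj pers) = 3
   (↑subj num)  = sg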
5.2.5 Degrees of Comparison
Adjectives in all three of these languages have morphological markers
which allow the formation of comparatives or superlatives from a base
form.
(35) a. a heavier truck (comparative)
b. the heaviest truck (superlative)
In addition, the base form of the adjective can occur in equatives and
with periphrastic comparative constructions.
(36) a. That dog is as heavy as a truck. (equative)
b. more pleasant (periphrastic comparative)
c. le plus rouge (periphrastic superlative)
the more red
‘reddest’ (French)
Although these differ in terms of how they are realized at c-structure,
the f-structures corresponding to both the periphrastic and the mor-
phological comparatives receive the same analysis (see section 5.2.5.1
for sample f-structures).
The degree of comparison and the positive or negative force (e.g., the
difference between more pleasant and less pleasant) are represented at
f-structure in terms of the features adegree and adeg-type, respec-
tively.
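For example, more pleasant and less pleasant would receive f-structure fragments along the following lines; the value negative for the latter is our inference from the positive value shown in (43b) below, not a value cited in this chapter:

    more pleasant:  [ pred      'pleasant'
                      adegree   comparative
                      adeg-type positive ]

    less pleasant:  [ pred      'pleasant'
                      adegree   comparative
                      adeg-type negative ]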
In terms of grammatical functions at f-structure, we analyze com-
paratives as consisting of two main parts: the degree (comparative,
equative, or superlative) of the adjective (e.g., heavier) and the further
(often optional) comparative phrase it can license (e.g., than a truck).
The degree adjective is treated as an adjunct, like all other adjectives.
However, it is treated as a special kind of adjunct which has special
syntactic properties, and is therefore encoded as an adjunct-comp at
f-structure. This indicates that it is an adjunct, but that it specifies a
degree of comparison. The comparative phrase (e.g., than a truck) is
analyzed as an oblique argument of the adjective. Again, in order to
signal that a degree of comparison is involved, a special grammatical
function obl-comp is posited for comparative phrases. The intuition
behind this is that the than-phrases appear to function somewhat like
pps which are subcategorized for by a predicate, and which have tra-
ditionally been analyzed as obliques (obl) at f-structure (cf. Bresnan
1982a). The lexical item than (que in French, als in German) is treated
as a special type of conjunction since it may head complete clauses, as
in (38).
In addition to prenominal constructions as in (35), comparatives can
also appear postnominally, as in (37), and predicatively, as in (38).
Postnominal comparatives are analyzed at f-structure strictly in paral-
lel to prenominal comparatives, i.e., an adjunct-comp is introduced
as a modifier of the head noun. Predicatives, on the other hand, receive
a differing analysis, discussed in section 5.2.5.2.
(37) A dog heavier than a tractor barked.
(38) She is taller than I am.
Note that at c-structure the comparative adjectives occupy exactly
the same position as simple adjectives. The than-phrase, however, is
introduced through a special c-structure node conjpcomp in all three
grammars. Additionally, in some degree constructions, as in (39), the
adjective phrase forms a constituent with the than-phrase. This con-
stituent is introduced at c-structure via a specialized rule called apcomp.
(39) a. It is [more comfortable than a tractor].
b. [apcomp [ap [advcomp more] [a comfortable]]
           [conjpcomp [conjcomp than] [np a tractor]]]
5.2.5.1 Comparatives
A representative f-structure for the German np in (43a) is given in
(43b). There is no than-phrase and the comparative adjective is encoded
under the adjunct-comp within the subj.
(43) a. eine schnellere Katze
a.F.Sg.Nom.S quick.Comp.F.Sg.Nom.W cat
‘a quicker cat’ (German)
b. [ pred         'katze'
     pers         3
     num          sg
     case         nom
     gend         fem
     ntype        count
     spec         [ spec-form ein
                    spec-type indef ]
     adjunct-comp [ pred      'schnell'
                    adegree   comparative
                    adeg-type positive
                    num       sg
                    case      nom
                    gend      fem ] ]
When a than-phrase is present, as in (44), it is analyzed as an obl-comp
of the adjective, with als registered as the comparative conjunction.8
(44) a. Eine schnellere Katze als
a.F.Sg.Nom.S quick.Comp.F.Sg.Nom.W cat than
der Hund erscheint.
the.M.Sg.Nom.S dog appear.3.Sg.Pres
‘A quicker cat than the dog appears.’ (German)
b. [ pred       'erscheinen<subj>'
     stmt-type  declarative
     tns-asp    [ tense pres
                  mood  indicative ]
     vsem       unacc
     subj       [ pred      'katze'
                  pers      3
                  num       sg
                  case      nom
                  gend      fem
                  ntype     count
                  spec      [ spec-form ein
                              spec-type indef ]
                  adj-comp  [ pred      'schnell<obl-comp>'
                              adegree   comparative
                              adeg-type positive
                              num       sg
                              case      nom
                              gend      fem
                              obl-comp  [ conj-form-comp als
                                          pred      'hund'
                                          pers      3
                                          num       sg
                                          case      nom
                                          gend      masc
                                          ntype     count
                                          spec      [ spec-form der
                                                      spec-type def ] ] ] ] ]
5.2.5.2 Predicatives
In predicative constructions as in (45) the degree adjective is not treated
as an adjective which modifies a head noun. Rather, it is seen as an
argument of the copula be, a predlink (section 3.8). Other than this
difference, the analysis of degree adjectives parallels that of the prenom-
inal cases above: the degree adjective may introduce an argument (obl-
comp), which corresponds to the than-clause. An example, which also
incidentally illustrates a periphrastic comparative, is given below.
8 In the interest of space, adjunct-comp has been abbreviated to adj-comp in
(44b).
5.2.5.3 Equatives
Equatives compare two entities and indicate that they are equivalent
in terms of one of their properties. Unlike comparatives, equatives do
not appear without a comparative phrase (e.g., as big as a tractor).9
This requirement is ensured by a constraint in the lexical entry of the
equative that requires the existence of a comparative phrase. Equative
adjectival constructions also do not appear prenominally, but surface
either postnominally as in (46), or predicatively as in (47).
(46) A dog as heavy as a truck barked.
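The shape of the constraint mentioned above can be sketched as follows. This is not the actual ParGram entry: the category advcomp for the degree word as is borrowed from the tree in (39b), and the placement of the equations is our assumption; what matters is the bare existential constraint (↑obl-comp), which fails unless a comparative phrase is present:

    "sketch only: the equative degree word demands an as-phrase"
    as   advcomp * (↑adegree) = equative
                   (↑obl-comp).    "existential constraint: an obl-comp must be present"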
Other than a difference in the adegree and the adeg-type (which is
not specified for equatives), the f-structures for comparatives and equa-
tives do not differ in these constructions. An example of a predicative
construction is illustrated below.
9 In both French and German one may find colloquial examples such as Moi,
je ne suis pas aussi libre ‘Me, I’m not as free’, which have an equative but no
comparative (thanks to Anette Frank for pointing these out). However, in these
cases, the equative may also be analyzed as an adverb without comparative function.
5.2.5.4 Superlatives
Superlatives indicate that a given entity is the extreme and unique
instantiation of that kind. In these constructions it is thus not the case
that two entities are being compared, but rather that one entity is being
singled out as special. As such, superlatives cannot occur with than-
clauses. This generalization is ensured by a combination of constraints
in the c-structure rules, and in the lexical entries of the superlatives.
(48) a. The best driver owns the heaviest tractor.
b. a better/*best driver than the owner
In the f-structure, superlatives are treated just as comparatives and
equatives, with the exception that they never subcategorize for an obl-
comp (i.e., a than- or as-phrase), and that the values for adegree
differ. The f-structure for (48a) is given in (49b).
6
Prepositional Phrases
The psem value is additionally used for tasks such as determining which
phrases can occur in English locative inversion constructions, what type
of adjunct a pp is, and whether the subcategorization requirements of
verbs like put have been met.
such, these pps are treated as arguments of the verb and are encoded
with a ptype nosem to mark the difference from the semantic adjunct
pps discussed in the previous section. An alternation from German,
which illustrates the contrast between the semantic and nonsemantic
prepositional use of one and the same preposition, is shown in (5).
(5) a. Der Fahrer wartet auf das Buch.
the.M.Sg.Nom.S driver waits on the.N.Sg.Acc.W book
‘The driver is waiting for the book.’ (German)
b. Der Fahrer wartet auf dem Traktor.
the.M.Sg.Nom.S driver waits on the.M.Sg.Dat.S tractor
‘The driver is waiting on (top of) the tractor.’ (German)
Note that again a difference in object casemarking accompanies the
difference in meaning. In (5b) the preposition has a clear semantic force
and indicates a locative adjunct to the verb. In (5a), on the other hand,
this particular usage of warten requires a nonsemantic (i.e., nonlocational
and nondirectional) usage of auf. In this case, the pp auf das Buch
is analyzed as an argument of the verb.
One reason for positing this analysis is that the np can be passivized
out of the pp, as is illustrated for English in (6) and (7), and for German
in (8).
(6) a. The driver must comply with these regulations.
b. These regulations must be complied with.
(7) a. He relies on this book.
b. This book is (often) relied on.
(9) a. He relies on this book.
    b. [ pred       'rely<subj,obj>'
         stmt-type  declarative
         tns-asp    [ mood  indicative
                      tense pres ]
         subj       [ pred      'pro'
                      pron-form he
                      pron-type pers
                      pers      3
                      num       sg
                      gend      masc
                      case      nom
                      anim      + ]
         obj        [ pred      'book'
                      pers      3
                      num       sg
                      case      acc
                      pcase     on
                      anim      −
                      ntype     count
                      spec      [ spec-type def
                                  deixis    proximal
                                  spec-form this ] ] ]
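A subcategorization entry for the corresponding German use of warten in (5a) can be sketched along the following lines. The template name trans and the constraining equation on the object's pcase mirror conventions shown elsewhere in this book (compare the pcase on value inside the obj in (9b)), but the entry itself is our illustration, not the actual German ParGram entry:

    warten   v xle @(trans warten)
                   (↑obj pcase) =c auf.   "the object must be realized as an auf-pp"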
tions. As such, the treatment of these constructions does not differ from
that of the pps described above, except that the prepositions take an
interrogative or relative pronoun as a complement, rather than a full np.
[pp [p vom] [np [n Fahrersitz]] [p aus]]
The second p daughter is restricted to co-occur with the first p daugh-
ter, reflecting that German does not allow postpositions, but does allow
circumpositions. The first p daughter is always a semantic preposition
and is analyzed in the same way as the semantic prepositions discussed
above. The second preposition may only occur when appropriately li-
censed. The herum in (13c), for example, may only co-occur with um
in German. This restriction is enforced by checking for um via the
pcase feature. The second preposition (i.e., herum) is not encoded in
terms of a pcase, but its form and type are encoded in the f-structure
within separate features, leaving the precise semantic analysis of these
circumpositions to a further semantic module.
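The licensing check can be sketched as a constraining equation in the lexical entry of the second preposition. The entry below is illustrative only; in the actual grammar herum additionally contributes the features recording its form and type, which are omitted here:

    herum   p * (↑pcase) =c um.   "herum may only appear when the first preposition has contributed pcase um"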
7
Adverbial Elements
7.1 Adverbs
Adverbs can be loosely divided into three types: adverbs which modify
adjectives and other adverbs, as in (1a), adverbs which modify vps, as
in (1b), and adverbs which modify clauses, as in (1c).
(1) a. der [sehr] graue Hund
the.M.Sg.Nom.S very grey.M.Sg.Nom.W dog
‘the very grey dog’ (German)
b. Elle l’ a fait [doucement].
she it has done gently
‘She did it gently.’ (French)
c. She [usually] drives home.
[apbase [a schnell]]
The rule for advp requires that the adjectival form be uninflected,
and that it be assigned an adv-type feature so that it can be identified
as an adverb at the level of the f-structure analysis.
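A rough sketch of such a rule is given below. The feature name infl and the adv-type value vp-adv are our own illustrative names rather than the actual German grammar's inventory, but the sketch shows the two ingredients just described: a constraint that only the uninflected form qualifies, and an equation assigning an adv-type:

    advp −→ apbase: ↑=↓
                    (↓infl) =c base        "only the uninflected adjectival form"
                    (↓adv-type) = vp-adv.  "identify the result as an adverb at f-structure"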
Note that as in English and French there is a closed class of items
such as sehr ‘very’ which do not lead such a double life. They are
simply treated as being of the c-structure type adv, as shown in (3),
and receive the appropriate adv-type feature in the lexicon.
(3) [advp [adv sehr]]
7.4 Negation
Negation particles such as the English not, German nicht, or French ne
. . . pas are usually analyzed as a subtype of adverb because negation
8
Coordination
proposal on how to deal with nonconstituent coordination within lfg, see Maxwell
and Manning 1996. For a more general discussion see Sag, Gazdar, Wasow and
Weisler 1985.
2 See Kehler, Dalrymple, Lamping and Saraswat 1995 on the topic of information
flow.
sure that the conjunct as a whole will be marked for case. But as each
of the elements of the conjunct has a different value for case, the generalization
of the case attribute is empty: the case attribute cannot be generalized over
the whole conjunct since each of the elements has a different value for it.
However, if (↑case) is distributed across the set elements, then it succeeds
for each element. As languages do allow the coordination of elements whose
feature values differ, the result of distribution is linguistically preferred to
that of generalization.
Another situation which illustrates the difference is when a negative
constraint such as (↑passive)6=+ is asserted of a set where some ele-
ments have passive + and others do not. Since some elements satisfy
the condition that there be no passive +, generalization over the nega-
tive constraint succeeds and the coordination is judged to be successful.
However, under distribution, the requirement that (↑passive)6=+ will
be distributed over each element of the set. In this case, the coordi-
nation is illformed, since it fails for any element that has passive +.
Once again, the result of distribution is linguistically preferred to that
of generalization.
However, while the distribution mechanism captures many of the co-
ordination facts, it cannot account for all of them. In some cases, the
conjunct as a whole will be characterized by a certain feature like num
pl, while each of its elements is actually singular. Following Dalrymple
and Kaplan 1997, attributes may therefore be specified as nondistribu-
tive. A nondistributive attribute can be asserted of a coordination set
as a whole, without having it distribute across the individual conjuncts.
This is particularly useful for np coordination, where the number and
person attributes of each conjunct usually differ from those of the set as
a whole (see 8.3 below). Another example is with the conj-form pro-
vided by the conjunction. There is a potential conflict in conj-form
in cases of same category coordination where different conjunctions are
involved, as in (3).
(3) The light flashes and the beacon either turns to the right or flashes
repeatedly.
The conjuncts and the conjunctions in (3) jointly head the whole
coordination. Since each conjunction provides a conj-form, there is a
clash between the values provided by or and and, and the coordination
fails. This problem is solved by defining conj-form as a nondistribu-
tive attribute, thus avoiding the clash.
unlike periods, exclamation marks, or question marks which are also parsed within
the ParGram grammars and are used to check for the stmt-type attribute at f-
structure (declarative vs. interrogative).
not allow a comma before the conjunction unless there are more than
two conjuncts, as in (7).
(7) a. in the tractor, on the trailer, and next to the barn
b. *in the tractor, and on the trailer
However, commas are often placed between full clauses even when there
are only two conjuncts, as in (8).
(8) The tractor started immediately, and the farmer drove off.
A simple way to deal with this situation is to have clauses call a variant
of the usual same category coordination rule, which allows for a comma
with only two conjuncts.
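Modeled on the general same category coordination macro discussed in Chapter 13, such a variant might look roughly as follows (the macro name scoord is ours, not the actual ParGram name); the only difference from the general schema is that the comma before the conjunction is available even when there are just two conjuncts:

    scoord(cat) = cat: ↓∈ ↑;
                  ([comma
                    cat: ↓∈ ↑]+)
                  (comma)             "comma permitted even with only two conjuncts"
                  conj: ↑=↓
                  cat: ↓∈ ↑.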
The second issue concerning coordination of clauses has to do with
cases where only punctuation, usually a semicolon, separates two com-
plete sentences, as in (9).
(9) a. The tractor started immediately; the engine was running
smoothly.
b. Lorsque l’on tourne le commutateur, les voyants
when one turns the switch the warning lights
s’allument; ils s’éteignent lorsque le moteur démarre.
light up they turn off when the motor starts
‘When the switch is turned, the signals light up; they turn off
when the motor starts.’ (French)
There are a couple of approaches to this problem. One is to have a
special coordination rule for the highest category under root, which
allows certain types of punctuation in place of a conjunction. Another
is to consider each of the two sentences as a separate root clause, one
of which ends in nonstandard punctuation.
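Under the first approach, a sketch of such a rule, again in the style of the coordination macro in Chapter 13, might be the following; the category name semicolon and the macro name are our assumptions:

    rootcoord(cat) = cat: ↓∈ ↑;
                     semicolon         "punctuation in place of a conjunction"
                     cat: ↓∈ ↑.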
8.3 NP Coordination
np coordination often involves number, person, and gender mismatches
between the individual conjuncts and the entire coordinated np. In
(10a), each conjunct is singular, but the result is a plural np, as is
evident from the plural verb agreement. In (10b), one conjunct is fem-
inine and the other masculine, but the coordinated np is masculine, as
is clear from the masculine morphology on the adjective.
(10) a. The tractor and the trailer are parked outside.
b. Jean et Marie sont gentils.
Jean.M.Sg and Marie.F.Sg are.Pl nice.M.Pl
‘Jean and Marie are nice.’ (French)
8.4 Problems
Although this approach works well, there are some remaining problems.
Sometimes the conjunction and does not produce a plural coordination,
as in (16).
(16) This writer and artist has produced many important works.
In (16), writer and artist refer to the same person, so that the con-
junct remains singular, as shown by the verb agreement. At present we
do not deal with this type of np coordination.
Since conjuncts map into a set, the linear order in which they appear
is lost in the f-structure. Order information can be important for verb
agreement and for the temporal sequencing of events. For example, in
English, the lights or the beacon can be either singular or plural since
the second conjunct is singular; however, the beacon or the lights can
only be plural. The ordering information is preserved in the c-structure,
from where it could be recovered. An alternative solution, and the one
adopted in later versions of the grammar, is to map conjuncts into an
ordered list instead of a set.
9
Special Constructions
9.1 Parentheticals
Typical parentheticals are illustrated in (1). They are set off from the
main part of the clause by parentheses or other types of punctuation.
(1) a. This button (2) is for the oil filter.
b. Contrôler la tension (voir page 2).
check the tension see page 2
‘Check the tension (see page 2).’ (French)
c. Kühler auf Blockierungen überprüfen (siehe Seite 42).
radiator core for blockage check see page 42
‘Check the radiator core for blockage (see page 42).’ (German)
d. A warning light (red) will come on after six seconds.
e. A warning light—red or green—will come on.
Parentheticals are introduced by a special c-structure rule which in-
cludes the required punctuation and allows a limited number of con-
stituents within it, e.g., nps, imperatives, aps. The parenp constituent
appears in selected c-structure positions depending on the language.
Ideally, almost any constituent can be followed by a parenthetical; how-
ever, in practice this allows for extensive ambiguity, and so the paren-
theticals appear only in select positions, as dictated by the corpus at
hand.
The parenp constituent corresponds to an adjunct-paren feature
a. [np [d a]
       [npap [npzero [nmod [n warning]]
                     [n light]
                     [parenp [left-paren (] [ap [a red]] [right-paren )]]]]]
b. [ pred          'light'
     ntype         count
     spec          [ spec-type indef
                     spec-form a ]
     anim          −
     pers          3
     num           sg
     compound      [ pred  'warning'
                     ntype mass
                     spec  [ spec-type def ]
                     anim  −
                     pers  3
                     num   sg ]
     adjunct-paren [ pred  'red'
                     atype predicative ] ]
9.2 Headers
Headers are the nominal and clausal elements that appear as section
headers in documents, newspapers, chapter titles, etc. Some examples
from our tractor manual corpus are given in (3).
(3) a. Hydraulic quadrant control lever
b. Voyant de filtre à air sec
warning.light of filter for air dry
‘Dry air filter warning light’ (French)
c. Kontrolleuchte Ladekontrolle
warning.light charge.control
‘Warning light for charge control’ (German)
Headers appear as a special c-structure category, header, directly
under root. As a root category, they are assigned a stmt-type header.
In general, headers are types of nps, although in certain genres, e.g.,
newspaper headlines, this would need to be expanded. Unlike regular
nps, headers do not need to have determiners, as seen in (3); as such,
the header rule supplies spec features to the f-structure in order to
satisfy the requirements of nps which would normally appear with de-
terminers. The c- and f-structures for (3a) are shown in (4).
(4) a. [root [header [np [npap [apattr [a hydraulic]]
                               [npzero [nmod [nmod [n quadrant]] [n control]]
                                       [n lever]]]]]]
b. [ pred       'lever'
     stmt-type  header
     ntype      count
     spec       [ spec-type def ]
     anim       −
     pers       3
     num        sg
     adjunct    { [ pred  'hydraulic'
                    atype attributive ] }
     compound   [ pred     'control'
                  ntype    count
                  spec     [ spec-type def ]
                  anim     −
                  pers     3
                  num      sg
                  compound [ pred  'quadrant'
                             ntype count
                             spec  [ spec-type def ]
                             anim  −
                             pers  3
                             num   sg ] ] ]
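The way the header rule supplies this information can be sketched roughly as follows; in the actual grammars the spec equations are presumably only supplied when the np brings no determiner of its own, a refinement omitted from this illustration:

    header −→ np: ↑=↓
                  (↑stmt-type) = header       "headers are assigned stmt-type header"
                  (↑spec spec-type) = def.    "supply the spec features a determiner would normally provide"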
Grammar Engineering
10
Overview
This part of the book discusses issues that arise with respect to the
engineering aspects of grammar development. In the first part, we pre-
sented analyses of constructions in English, French, and German. In
this part, we focus on issues such as the maintainability of grammars,
how to achieve robustness in parsing while at the same time avoiding
overgeneration, and how a grammar’s performance may be measured
and improved. These issues are at the heart of much discussion in com-
putational linguistics today and are far from resolved. The material
in the following chapters is meant to make a contribution to the dis-
cussion by reporting on some of the ideas, solutions, and experiments
conducted within our project.
Chapter 11 first recapitulates the architecture of the parser and de-
scribes the individual components. Chapter 12 then describes some of
the finite state tools we used in more detail. Note that we do not dis-
cuss implementation issues with regard to the Xerox Linguistic Envi-
ronment (xle). As grammar writers, we saw xle as a black box whose
design and implementation we could not influence directly. We did,
however, influence its development indirectly by reporting on various
needs such as better debugging tools, a better method for integrating
lexical entries with one another, and rule notation that would allow
easier maintenance and modularization of the grammar. This interac-
tion has resulted in a platform that caters to many of our needs, and
which continues to grow and improve even as this book goes to press.
We describe some of the relevant features and functions in the chapter
on modularity and maintainability (Chapter 13), and in the chapter on
robustness and measuring performance (Chapter 14).
11
Architecture and User Interface
This chapter discusses the architecture and user interface of the Xe-
rox Linguistic Environment (xle). The architecture of the parser is as
shown in (1) (repeated from Chapter 1).
(1) [Architecture diagram, repeated from Chapter 1: the input string is passed
    through a cascade of finite-state transducers (a tokenization fsm, a
    morphological analysis fsm, and other transducers); lexical look-up against
    the lfg lexicon and chart parsing with the lfg rules then produce an initial
    chart, which is decorated with constraints and passed to unification,
    yielding the complete analysis; a graphical user interface displays the
    results.]
We discuss each of the components in turn, beginning with the tok-
enizer, and ending with a brief description of a generation component
and a fledgling machine translation component. The generation com-
ponent basically reverses the parsing process shown in (1) and the ma-
chine translation component calls up two such grammars: it works on
the output from one grammar and sends its result to another grammar
for generation.
Once xle is called, it loads the platform and then waits for the user
to do something. In this case, the user has typed “create-parser en-
glish.lfg”. This command enables the loading of the English grammar
defined in the file “english.lfg”. xle reports how many rules, states,
arcs and disjuncts the grammar has and then proceeds to load a cas-
cade of finite-state modules (see Chapter 12). In this case there are
three such modules, which accomplish the tokenization and the mor-
phological analysis of the input string. Finally, the rules are loaded,
xle reports success and is ready to parse a sentence or constituent.
the f-structure a given c-structure node maps to. The “Options” button
allows the user to toggle the display of the numbers since a tree without
number annotations is sometimes easier to read.
state techniques for several languages: Arabic, Czech, Danish, Dutch, English,
Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Spanish, and
Swedish.
tations are used: one level containing the surface form of a word and
the other containing the base form (canonical dictionary form) with
an attached list of morphological features. The list of morphological
features depends on the language. Examples are shown below in (3a,
b) for English, (3c, d) for French, and (3e) for German.
(3) a. warning
1. warn+Verb+Prog
2. warning+Adj
3. warning+Noun+Sg
b. which
1. which+Pron+Rel+NomObl+3P+SP
2. which+Pron+Wh+NomObl+3P+SP
3. which+Det+Wh+SP
c. clignotant
1. clignoter+PrPrt+Verb
2. clignotant+Masc+Sg+Noun
3. clignotant+Masc+Sg+Adj
d. que
1. qui+Acc+InvGen+InvPL+Rel+Int+Pro
2. que+ConjQue
e. den
1. der+Det+Art+Sg+Masc+Akk+St
2. der+Det+Art+Pl+FMN+Dat+St
3. der+Pron+Dem+Sg+Masc+Akk
4. der+Pron+Rel+Sg+Masc+Akk
As seen in the above examples, the Xerox two-level morphologies
provide more information (e.g., that the pronoun is a demonstrative
one) than what linguists usually expect from a morphological compo-
nent. However, most of the information provided by the morphological
analyzers is extremely useful for the grammar and has been integrated
within the ParGram development effort.
11.3.2.2 Interfacing Morphology with Syntax
The finite-state morphological analyzers are interfaced with xle by
means of sublexical rules (see Kaplan and Newman 1997), which parse
the output of the morphology. For instance clignotant+Masc+Sg+Noun
is considered to be a phrase which consists of the stem and each of the
morphological tags.
(4) the base form: clignotant
the finite-state symbols: +Masc, +Sg, +Noun
Each item is listed and identified in the lexicon, just like any other
lexical item, i.e., it has the item, its category, the xle tag, and then
any relevant equations or template calls. Sample entries for the tags in
(4) are shown in (5): +Masc is parsed as a gender tag which is inter-
preted as contributing the functional information gend masc to the
f-structure of the stem it appears on, +Sg is a number tag and assigns
num sg to either the f-structure itself or the subject’s f-structure, and
+Noun is a word class tag which is interpreted as assigning pers 3.
The reasoning for the latter assignment is that any full noun could not
be first or second person, but would necessarily be analyzed as third
person. Finally, clignotant is parsed as a terminal (lexical) entry of the
type noun (n). The template @(noun clignotant) assigns clignotant
as the value of a pred feature (among other things).
(5) +Masc gend tag (↑gend)= masc.
+Sg nbr tag (↑num)=sg;
vnbr xle (↑subj num)=sg.
+Noun ntag tag (↑pers)=3.
clignotant n xle @(noun clignotant).
Then, by using sublexical rules which function in the same way as
the usual phrase structure rules in lfg, the grammar is able to parse
the output of the morphological analyzer. A rule which parses this
particular order, which is characteristic of nouns in French, is shown in
(6).2
(6) n −→ n_base gend_base nbr_base ntag_base.
The fragmentary f-structure that results from this parse is shown in
(7).
(7) [ pred 'clignotant'
      gend masc
      num  sg
      pers 3 ]
As should be evident from (7) and the discussion, the integration of
the finite-state morphological analyzers with an lfg parser turned out
to be quite easy and natural.
Furthermore, if the morphological analysis should produce a tag that
is irrelevant (or even wrong due to a mistake in the morphological
analyzer), this can be counteracted via an appropriate formulation of
2 Note that base is added to each sublexical item by xle so that morphological
and syntactic categories can be distinguished. Otherwise, the rule would be n −→ n
gend nbr ntag, implying an unintended recursion.
the sublexical rule: the offending tag will be parsed by the rule, but will
not provide any functional information to be passed to the f-structure.
On the other hand, there are cases where the information encoded
by the finite-state morphology needs to be enriched. That is, since
the finite-state morphological analyzers are meant to be as general as
possible, they sometimes lack the information necessary for a specific
purpose, e.g., lfg grammar writing in our case. This extra information
is provided directly by the grammar writers via annotations and tem-
plate calls in the lexical entry. This additional information then works
together with the information provided by the tags.
The word slow in English can be used as an illustration. The English
finite-state morphology analyzes slow as an adjective and a verb, as
seen in (8).
(8) slow
1. slow+Adj
2. slow+Verb+Pres+Non3Sg
However, in the tractor manual we were working on, slow is also used
as a noun, as in, for example, To operate the speedshift, press either button
to move from fast to slow. As such, the ParGram lexical entry for slow is
as in (9). The adjectival and verbal entries include the annotation xle.
This encodes the fact that these stems are relying on a morphological
analysis from the finite-state morphological analyzers which have been
integrated into xle. The nominal entry, however, is annotated with
an * to encode the fact that here no help from the morphology can
be expected and that all of the information must come from this one
entry.
(9) slow a xle @(adj slow);
v xle @(trans slow);
n * @(noun slow).
As can be gleaned from this discussion, the interaction between lex-
ical items is quite complex within xle because stems may be coming
out of the morphological analyzer, be user-defined, and additionally
may span several different word classes, as in the case of slow. A more
in-depth discussion of the lexicon and the structure of lexical entries
follows below in 11.4. However, before we proceed to that discussion, we
first introduce another method of introducing stems into the grammar.
11.3.2.3 Extending Capabilities via Unknowns
The morphological analyzers can be further exploited in order to parse
words which have not explicitly been entered in the lexicon. Words un-
known to the lexicon, i.e., words which lack an explicit lexical entry,
are referred to as unknowns. From the point of view of xle, two types
of unknowns are distinguished: unknowns which the morphological an-
alyzer knows about but which have not been encoded in the lexicon
files; unknowns which the morphology does not recognize and which
have not been encoded in the lexicon files either. An example of this
latter type are proper names, often foreign ones, which no reasonable
morphological analyzer can be expected to recognize.
By way of the morphological analyzer, the grammar can deal with
both types of unknowns by means of a series of educated guesses. This
allows the grammar to process many more words than are encoded in
the lexicons, whether they be hand entered, or automatically generated
(section 14.1.1).
Items which are unknown to the grammar (i.e., to the lexicons), but
which are encoded in the morphology may be dealt with by matching
the information from the morphological analyzer with a type of generic
entry in the lexicon. An example of this kind of generic entry is shown in
(10) which is an entry for an -Lunknown item (unknown to the lexicon).
(10) -Lunknown
a xle @(adj %stem);
adv xle @(adverb %stem)
@notadjadvmod;
n xle @(noun %stem)
(↑ntype)=common;
number xle (↑pred)=’%stem’.
What this entry basically says is the following: try to match the out-
put of the morphological analyzer to one of the following: an adjective,
an adverb, a noun, or a number.3 Closed class items such as auxiliaries
or prepositions are not matched via this method. It is assumed that
the grammar writer is responsible for taking care of closed class items,
but not for encoding every member of an open class.
For example, consider the situation where there is no entry for red
in the lexicon. The morphology, however, knows that red can have the
two parses in (11).
(11) red +Adj
red +Noun +Sg
The grammar can guess at a parse for red based on the generic un-
known entry, which contains blueprints for both an adjective and a
3 There is a separate entry for capitalized forms, -LUnknown. In English this has
the same possibilities (a, adv, n, number) as the lowercase -Lunknown, except that
it allows for proper nouns via name and titles via title; that is, proper nouns and
titles can only be upper cased and so there is no entry for them under -Lunknown.
On the other hand, if a particular entry is the only possible entry for
a given item, then the only feature is added to the lexical entry.
In addition to the etc and only notation, which apply to whole
entries, there are four operators that allow the grammar writer to ma-
nipulate subentries. These are ‘+’, ‘−’ , ‘!’, and ‘=’. The operators
are placed in front of a subentry, as shown in (18). The ‘+’ adds a
new subentry. The ‘!’ replaces an existing subentry. The ‘=’ retains an
earlier subentry, while the ‘−’ deletes it.
(18) a. foo !n xle @(noun foo);
−p
+a xle @(adj foo);
etc.
b. foo =p;
+a xle @(adj foo);
only.
Note that, as shown in (18), both etc and only can be combined
with each of the four subentry operators to achieve different effects. For
example, placing etc as the final subentry in a later lexicon results in
retaining all previous subentries, unless they are explicitly removed
with the ‘−’ operator. Placing only as the final subentry in a later
lexicon will remove all earlier subentries unless they are explicitly re-
tained with the ‘=’ operator. For example, consider (18a, b) as possible
entries for ‘foo’ in a later lexicon (the earlier lexicon entry continues to
be (12)). Note that if one subentry has an operator, all of them must
be preceded by an operator.
With (18a) in the later lexicon, the effective entry for ‘foo’ will be
(19a). With (18b) in the later lexicon, it will be (19b). In (19a), the
n entry has been replaced by the one in the later lexicon, the p entry
has been deleted, and a new entry has been added. In (19b), only the p
entry from the earlier lexicon has been retained, while the a entry has
been added:
(19) a. foo n xle @(noun foo);
v xle @(verb foo);
a xle @(adj foo).
b. foo p xle @(prep foo);
a xle @(adj foo).
These tools for manipulating lexical entries are extremely useful when
several lexicons are being maintained for different purposes. For exam-
ple, they allow modification of the effective entries of a core lexicon
issue, can also be found in Dalrymple, Kaplan, Maxwell and Zaenen 1995.
ber of features to aid in this process, such as the use of macros and
templates. The result of applying the transfer rules to the f-structure
is a new f-structure. This new f-structure is then used as input to the
generator of the target language, producing the desired translations.
12
Finite-State Technology
12.1 Preprocessing
Certain processing difficulties can be resolved at an early stage, before
attempting a parse with a full grammar. Preprocessing can greatly
simplify the task of the parser with respect to multiword expressions,
or other parts of the grammar which are assembled according to a
certain pattern, such as time expressions (e.g., Monday morning) or
titles (e.g., Frau Professor Doktor Schmidt).
In the ParGram project, multiword expressions are dealt with via
finite-state preprocessing, as are time expressions (this was limited
to the French team). The preprocessing is accomplished in two main
stages: tokenization and morphological analysis. Both stages are per-
ductive expressions, as was the case with the multiwords above. Nev-
ertheless, there is good reason for treating them as single tokens since
they tend to be used as fixed expressions or names in technical texts.
Precisely because they are used like fixed expressions, technical terms
can be easily and successfully extracted from a technical text by par-
tially or fully automated methods (see Brun 1998 with respect to the
French experiment). This first stage of extracting terminology from a
corpus (in our case the tractor manual) results in a list of items which
can then be turned into a lexicon of multiword items.
The extraction within ParGram with respect to the tractor manual
was done as follows. Because we had parallel aligned English-French-
German texts at our disposal, we used the English translation to decide
when a potential candidate was a technical term. The terminology we
were dealing with consisted mainly of nouns. To perform the extraction
task, we used a tagger to disambiguate the French text (Chanod and
Tapanainen 1995), and then extracted the syntactic patterns, n p n, n
n, n a, a n, which are good candidates to be technical terms. These
candidates were considered as terms when the corresponding English
translation formed a unit, or when their translation differed from a
word to word translation. Some candidates which passed these tests
and were therefore extracted are shown in (2).
(2) vitesses rampantes (gears creeping) ‘creepers’
boı̂te de vitesse (box of gear) ‘gearbox’
arbre de transmission (tree of transmission) ‘drive shaft’
tableau de bord (table of edge) ‘instrument panel’
Once the terminology was extracted, a tokenizer was built which
split the input string into tokens using the list of extracted multiwords
(Grefenstette and Tapanainen 1994, Ait-Mokhtar 1997). A tokenizer
can be set up to provide only the multiword expression analysis of a
string (deterministic tokenization), or it can provide both the multi-
word expression analysis and the canonical one in which each element
of the multiword is returned as a separate token (nondeterministic to-
kenization).
Experience has shown that the first approach is often best in situa-
tions in which there is a constrained corpus, such as a technical text, or
for words which have no possible canonical (nonmultiword) parse. The
second approach is better in the general case where both parses are
likely to be encountered. For example, the French conjunction bien que
can be considered a multiword expression; however, the string bien que
is also found in situations where bien is an independent noun while que
is a complementizer. In (3a) bien que as one two-word unit is clearly
wrong; instead bien is a noun and que is a relative pronoun (the mul-
tiword expression use is shown in (3b)).
(3) a. Jean me dit tout le bien que Pierre pense de Paul.
Jean me tells all the good that Pierre thinks of Paul
‘Jean tells me all the good that Pierre thinks about Paul.’
(French)
b. Jean écoute silencieusement bien qu’ il ne soit pas
Jean listens quietly although he not is neg
d’accord avec Paul.
agreement with Paul
‘Jean listens quietly although he completely disagrees with
Paul.’ (French)
Due to the occurrence of such ambiguities in our corpus, we built a
nondeterministic tokenizer within ParGram. The tokenization is per-
formed by applying finite-state transducers on the input string. The
Xerox two-level finite-state morphological analyzers were already dis-
cussed in section 11.3.2. In order to provide the reader with a better
idea of how they function, we here go through some examples in detail.
For example, take the sentence in (4). Applying the finite-state trans-
ducer to this input results in the following tokenization, where the token
boundary is signaled by the @ sign.
(4) Le tracteur est à l’arrêt.
the tractor is at the stop
‘The tractor is stationary.’ (French)
Le@tracteur@est@à@l’@arrêt@.@
In this particular case, each word is a token. But several words can
be a unit, as is the case for technical terms. Examples (5) and (6) show
instances of tokenization in which technical terms are treated as units.
(5) La boı̂te de vitesse est en deux sections.
the box of speed is in two sections
‘The gearbox is in two sections.’ (French)
La@boı̂te de vitesse@est@en@deux@sections@.@
(6) Ce levier engage l’arbre de transmission.
this lever engages the tree of transmission
‘This lever engages the drive shaft.’ (French)
Ce@levier@engage@l’@arbre de transmission@.@
Tokenization takes place in two logical steps. First, the basic trans-
ducer splits the sentence into a sequence of single words. Then a second
transducer containing a list of multiword expressions is applied. It rec-
ognizes these expressions and marks them as units. When more than
one expression in the list matches the input, the longest matching ex-
pression is marked. We included all the extracted technical terms and
their morphological variations in this last transducer, so that the mul-
tiwords could be analyzed as single tokens later in the process.
The next step is to associate these multiword units with a morpho-
logical analysis in the cases where one is needed. One type of multiword
which interacts with morphological analysis is represented by French
compounds. These compounds have to be integrated into the morpho-
logical analyzer because they may be inflected according to number, as
shown in (7). In the tractor corpus, we identified two kinds of morphological
variation: either the first part of the compound may be inflected,
or both parts of the compound may be inflected.
(7) The first part varies in number:
      gyrophare de toit, gyrophares de toit   'roof flashing beacon(s)'
      régime moteur, régimes moteur           'engine speed(s)'
    Both parts vary in number:
      roue motrice, roues motrices            'wheel drive'
This is of course not general for French compounds; there are other
patterns of morphological inflection. However, this pattern is reliable for
the technical manual we were dealing with. Other inflectional schemes
and exceptions can be easily added to the regular grammar as needed
(see Quint 1997 and Karttunen, Kaplan and Zaenen 1992 for further
discussion).
A cascade of regular rules is applied to the different parts of the com-
pound in order to build the morphological analysis of the whole com-
pound. For example, roue motrice is marked with the diacritic +dpl
(double plural). A first rule is then applied which copies the morpho-
logical tags from the end to the middle if the diacritic is present in the
right context:
FIGURE 1 (Rule 1):  roue 0 0 -motrice+DPL+Fem+PL
FIGURE 2 (Rule 2):  roue 0 s -motrice 0 s
are unexpected capital letters, and missing accents over two of the “e”s
which have been removed as part of the capitalization process.
(10) a. EQUIPEMENT SUPPLEMENTAIRE
equipment additional
‘additional equipment’ (French)
b. équipement supplémentaire
Without a normalizer to turn form (10a) into form (10b), the gram-
mar cannot recognize the words or parse the header.
A guesser deals with unknown words after the normalizer has done
its work. As discussed in section 11.3.2.3, there is a way of incorporat-
ing into the grammar words which are not in the lfg lexicon. If the
words are known by the morphology, they can be incorporated into the
grammar in a constrained manner. However, if they are not known to
the morphology, the unknown words will not be provided with morpho-
logical tags, making it difficult to incorporate them into the grammar
in a productive fashion. A guesser uses morphological tools to guess
the part of speech of a word and provides the appropriate tags. For
example, an unknown English word ending in -tion is likely to be a
noun, while one ending in -ate is likely to be a verb.
pos> pairs which are run through the morphological analyzer, yielding
<base form, pos> pairs. Once duplicates are eliminated, pairs with
the same base form are combined to give <base form, pos-list> pairs.
Then each entry in the lexicon is compared with the list derived from
the corpus and parts of speech which do not occur in the corpus are
eliminated.
13
Modularity and Maintainability
Our previous discussion of the xle architecture (Chapter 11) and the
interaction of the lfg lexicons with finite-state morphological analyzers
and phrase structure rules should have made it clear that modularity
is at the heart of our grammar development effort. Given that the
projection architecture of lfg as described in the introduction is also
founded on the principle of modularity, this is not surprising.
On the other hand, too much modularity can also work against trans-
parency: rules may be more difficult to formulate (and understand) be-
cause they access different modules of the grammar, and errors may
become more difficult to track down, as they could spring from various
different sources. Alternatively, therefore, one could decide to follow an
approach similar to that of hpsg, which is to encode the different types
of linguistic information within one level of representation and extend
this design decision to the grammar implementation so that everything
is dealt with within one and the same module.
However, it is clear from general programming experience that a
modular approach to software design is generally preferable (see Knuth
1992 on the concept of structured programming), as it furthers the main-
tainability and transparency of the product. Moreover, with respect
to grammar design in particular, packing the phrase structure analy-
sis and the unification-based analysis into the same module increases
the complexity of the parsing problem. Most implementations of hpsg
grammars therefore realize a separate context-free backbone for the
phrase structure analysis (as is done in lfg), thus modularizing the
grammar (Carpenter 1992, Penn 1993).
Without taking this discussion any further, it seems fair to conclude
that some degree of modularity is desirable in the design of a grammar.
1. A small file contains the closed class items and exceptional entries.
2. One file contains the technical terminology or special entries needed
for the text at hand. If the text is a specific application like the
tractor manual, then this lexicon contains items extracted specif-
ically from that text. If the text to be parsed is of a more general
nature, such as a newspaper text, no specialized file is included.
3. One or more large files with semi-automatically generated lexical
entries are always included.
This modular division developed naturally within the grammars, as
it reflects the different resources and applications that went into the
creation of the lexicons. The small file of closed class items contains
the core entries that cannot be done without. These entries are hand-
coded. The technical terminology, on the other hand, is application-
specific and can be extracted semi-automatically from the technical
text.
The larger files reflect open class items such as nouns, verbs, and ad-
jectives, which are needed for the parsing of large corpora of a general
type (e.g., newspaper texts). These were created through the semi-
automatic extraction of subcategorization frames, as is detailed in sec-
tion 14.1.1.
Another effort at modularization of file systems and storage did not
grow naturally out of the development, but came about because of a
design decision that was aimed at ensuring transparency across the
three parallel grammars. In an effort to encode the fact that some rules
and generalizations were in fact valid in all of the ParGram grammars,
the grammar writers agreed on a naming convention by which these
identical rules and generalizations could be identified in each of the
three grammars. That is, generalizations as to how the passive works,
or how transitive verbs differ from intransitive verbs, were captured
by giving the same name to the relevant lexical rule (section 13.2.3)
or template (section 13.2.1) in each of the grammars, and by ensuring
that the same material was contained by these rules and templates in
each of the grammars.
In effect, however, these rules and templates could easily be experi-
mented with and changed by each individual grammar writer without
any consequences for the other grammars. This brought up the prob-
lem of continued maintenance and transparency of the three ParGram
grammars with respect to one another. It was therefore decided to ex-
periment with locating crosslinguistically valid rules and templates in a
single file that would be shared across grammars. That is, the same file
containing crosslinguistic generalizations is included as input to every
tense( p) defines the name of the template and the number of argu-
ments (if any). The @ sign precedes the call of a template, as illustrated
in (1b) for the template assign-case. This template is defined in (1c)
and takes two arguments (see Maxwell and Kaplan 1996 for details on
the notation and the formal power of templates).
(1) a. tense(_p) = (↑tns-asp tense) = _p
                   (↑fin) = +.
    b. fin(_t) = @(tense _t)
                 @(assign-case subj nom).
    c. assign-case(_gf _c) = (↑ _gf case) = _c.
The template in (1a) expresses the generalization that when a clause
includes a specification for tense, then it is also finite. This is a true
generalization for languages like German, English, and French, but is
not necessarily true for other languages (e.g., Japanese). This template
takes one argument: a specification of the type of tense involved. So
regardless of whether a verb marks present, past, or future tense, this
same template can be used.
The template in (1b) expresses another linguistic generalization. Once
again, it has to do with tense and finiteness, but looks at things the
other way around. In this case, a template fin is defined, which again
takes a tense specification as an argument, but which also specifies
that the subject must be marked as nominative. Again, this template
captures a generalization that in a finite clause the subject must be
nominative. This is true for English, French and German (for German
a provision must be added for subjectless clauses), but it does not hold
for other languages such as Urdu/Hindi or Icelandic, which may have
dative or other kinds of subjects.
Templates are especially useful for describing verb subcategorization
and generalizations over verb classes. The lexicons are organized so
that each verb subcategorization schema corresponds to a template.
These have varying degrees of complexity, depending on issues such as
case marking (German), extraction (English), clitic climbing (French),
etc. We here illustrate two simple templates for basic intransitive and
transitive verbs.
(2) a. intrans(_p) = (↑pred) = '_p<(↑subj)>'
                     @nopass.
    b. trans(_p) = @(pass (↑pred) = '_p<(↑subj)(↑obj)>').
The templates in (2) specify subcategorization frames. The intrans
template calls another template that ensures that no passivization will
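For illustration, verb entries in the format of (9) and (10) above would call these templates roughly as follows; the particular verbs are taken from the testsuite examples in Chapter 14, and the entries themselves are our sketch rather than the actual ParGram entries:

    appear    v xle @(intrans appear).
    actuate   v xle @(trans actuate).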
Another example of the use of macros comes from the German gram-
mar. In this case, a number of disjunctions on the right-hand side of a
phrase structure rule are bundled together, rather than just the func-
tional annotations, as was the case in (3). As German has relatively free
word order, the rules which introduce core arguments and adjuncts of
predicates (both verbs and adjectives) are called in numerous places.
This wide distribution of essentially the same set of rules over different
parts of the grammar poses a maintenance problem: it is easy to realize
that an argument must be defined differently or that another condition
must be added in one part of the grammar, without realizing that this
change must also be made in another part of the grammar. Even if the
grammar writer realizes that the same change must be made in other
parts of the grammar, it is easy to miss one of the instances, especially
as the grammar grows.
Thus, despite all good intentions, the grammar tends to become in-
ternally inconsistent quite quickly. The use of macros is one way of
ensuring that this problem does not arise. Consider the macro in (4). It
disjunctively introduces object arguments, prepositional phrases, and
predicative adjectival arguments such as red in The tractor is red.
(4) adj-args = { np: (↑{ obj|obj2})=↓
|pp: { (↑obl)=↓
|↓∈(↑adjunct) }
|ap: (↑predlink)=↓
(↓atype) =c predicative}.
Whenever this macro is called, it is as if the rules on the right-hand side
of the macro were substituted in at the place of the call in the phrase
structure rule. The macro thus encodes the linguistic generalization
that this particular set of arguments and adjuncts distributes similarly
in German.
A slightly different example of this is coordination (Chapter 8) in
which the same basic schema is used for coordinating several differ-
ent c-structure categories. The coordination macro in (5) expresses
the general linguistic fact that constituent coordination is composed
of constituents of the same category (cat) separated by a comma or a
conjunction.
(5) sccoord(cat) = cat: ↓∈ ↑;
([comma
cat: ↓∈ ↑]+
(comma))
conj: ↑=↓
cat: ↓∈ ↑.
This macro is called from a rule as in (6) and expands into (7).
(6) sadj −→ {s
|@(sccoord sadj)}.
(7) sadj −→ {s
| sadj: ↓∈ ↑;
([comma
sadj: ↓∈ ↑]+
(comma))
conj: ↑=↓
sadj: ↓∈ ↑}.
13.2.2 Complex Categories
Another method of expressing generalizations is the use of complex
categories. These are phrase structure categories which take arguments,
just as templates do. The argument instantiates a variable in the right
hand side of the rule.
A simplified example from the German grammar is given in (8). This
is a rule which parametrizes across different types of nps: standard,
interrogative and relative nps. The effect that is being illustrated here
is that in German some standard nps like proper names (Jonas) or
mass nouns (e.g., Wasser ‘water’) may appear without a determiner.
Interrogative and relative nps, on the other hand, may never appear
without a determiner. The rule for the np in (8) allows the type of the
np to be set via the variable type. When the value of the variable is
instantiated as std, the rule expands to the option in (9a) to allow a
standard type of np with an optional determiner. On the other hand,
when the variable is instantiated as int, the np is marked as being of
the type interrogative and the determiner is obligatory, as shown in
(9b). An instantiation with rel works exactly the same way as in (9b)
for the purposes of our example. However, with respect to the larger
grammar, the identification of an np as interrogative vs. relative will
have consequences in other parts of the grammar.
(8) np[_type] −→ { (d[_type]: _type = std)
                  | d[_type]: { _type = int | _type = rel } }
                  npap.
set of possibilities with regard to impersonal passives such as Hier wird getanzt. ‘here
is danced’, which are considered to be subjectless. However, the basic specifications
are the same.
14
Performance
14.1 Robustness
The notion of robustness in grammar engineering differs slightly from
the notion a linguist might entertain. A linguist might consider a gram-
mar robust if it provides an analysis for every grammatical sentence or
clause and refuses to provide a parse when the sentence or clause is un-
grammatical. From the point of view of grammar engineering, on the
other hand, a failure to provide some kind of output is unacceptable as
this means that any application depending on output from the gram-
mar will be left floundering. A robust grammar from this perspective
is therefore a grammar that never fails to return some output. If the
http://www.dfki.uni-sb.de/verbmobil/
German (Feldweg 1993) and the Xerox tagger for English (Cutting, Kupiec, Peder-
sen and Sibun 1992).
3 Another shallow parser which was experimented with for French within Par-
Gram is the finite-state parser discussed in Chanod and Tapanainen (1996, 1997).
(1) vp −→ v
          np:  (↑obj) =↓
          pp*: { (↑obl) =↓
                 mark1 ∈ o*
               | ↓ ∈ (↑adjuncts)
                 mark2 ∈ o* }.
the grammar, the use of optimality marks provides a simple yet effective
way of rendering parts of the grammar ineffective by moving a mark
into the nogood category.
14.2 Testing
Testing the grammar is an indispensable part of ensuring robustness
and increasing the performance of a grammar. Although individual con-
structions can be tested with a few sample sentences as they are de-
veloped, more systematic testing is required to maintain the quality of
the grammar over time and to catch unexpected results of the addition
of new rules and lexical items. Much of the methodology of grammar
testing is dictated by common sense. However, issues do arise with re-
spect to what kinds of testsuites to use, what other kinds of methods
apart from testsuites one might use to test the grammar for continued
consistency, and how to get a stable measure of the grammar’s perfor-
mance as it grows and changes. In the following sections we address
these issues as they arose within our grammar development effort.
14.2.1 Types of Testsuites
14.2.1.1 Testsuites Internal to ParGram
One of the hazards of developing large grammars is that although a
construction may be tested relatively thoroughly while it is first being
implemented, subsequent changes may alter the grammar in such a way
as to affect the behavior of the construction. That is, the addition of
rules may inadvertently block already implemented constructions or
allow them to occur in contexts where they should not. The best way
to ensure that this does not go unnoticed is by developing and utilizing
testsuites, in addition to extensive commenting and documentation of
the grammar.
A testsuite is a series of sentences (or nps, pps, etc.) which can be fed
to the grammar. For example, in the xle environment, the command
parse-testfile <name-of-file> results in the grammar parsing all
of the sentences in the specified file and recording, for each sentence, the
number of parses, the parse time, and the number of subtrees
(a measure of complexity). That is, this system allows the
Performance / 213
grammar tester to see some information from the results of the analysis,
but not the analysis itself. An example from a testfile of the English
auxiliary system is shown in (11). The numbers in parentheses provide
information on performance: the first number indicates the number of
parses, the second the number of seconds the parse took, and the third
the number of subtrees that the parser considered.
(11) root: It has appeared. (1 1.3 283)
root: It has been actuated. (1 2.15 294)
root: It has been appearing. (1 1.23 294)
root: It has been being actuated. (1 2.45 308)
Within xle these results are automatically stored in a separate file,
thus providing a benchmark for future users of the grammar.
In order to thoroughly test a grammar, a variety of testsuites is
necessary. It is best to have testsuites for each of the major types of
constructions which the grammar handles. For example, there might be
testsuites for the auxiliary system, for coordination, for questions, for
relative clauses, for different types of verb subcategorization frames, etc.
Each testsuite should contain simple examples of all possible versions of
the construction in question. In addition, ungrammatical variants of the
construction should be included in order to ensure that the grammar is
not overgenerating. For example, a testsuite for the English auxiliary
system should contain ungrammatical sentences like those in (12) in
addition to grammatical sentences like those in (11).
(12) root: It has appearing. (0 1.38 283)
root: It have is actuated. (0 2.3 294)
root: It is having actuates. (0 2.36 312)
root: It has been appears. (0 1.15 294)
The sentences constructed for the basic testing of a construction type
should be very simple, to ensure that the fundamentals of the con-
struction are correct. However, in order to test the grammar properly,
naturally occurring complex examples as in (13) should also be tested
at regular intervals.
(13) np: the two-speed blower which provides increased air circulation
for the heating system (8 2.525 605)
np: a deflector that may be opened to permit greater directional
control of the airflow (1 4.611 581)
This allows for the testing of constructions the grammar writers may
not have thought of themselves, and also tests interactions between
parts of the grammar, such as relative clauses in combination with
embedded clauses. This issue of complex rule interactions is further
discussed in section 14.3.1.
5 More information on these projects can be found at
http://cl-www.dfki.uni-sb.de/tsnlp/index.html.
(15)                            average    median
     optimal solutions            1.742         1
     suboptimal solutions         3.778         1
     runtime                      0.718     0.540
     words per test item              0         0
     ratio runtime/subtree        0.004     0.004
These two tables break up the data in a number of ways. The first
table shows that of a total number of 1,561 sentences, about 72% were
parsed. All of the sentences that were parsed were indeed grammatical
sentences of the language (i.e., the features of the grammar that are
designed to ensure robustness did not kick in to parse ungrammatical
sentences). The grammar could not deal with about 28% of the cor-
pus. Furthermore, there was only one sentence that displayed massive
ambiguity, there was no sentence which required a parse time of over
30 seconds, and no sentence timed out during parsing. The second ta-
ble calculates averages for the results shown in table (14). The average
number of optimal solutions was 1.7 with an average runtime of 0.7
seconds.
14.2.2 Further Tools and Databases
While statistics of the type illustrated in the previous section go a long
way towards measuring the performance of a grammar (see also section
14.3) and providing the grammar writer with information as to where
the grammar needs to be cleaned up or developed further, there are
a number of things such statistics cannot be used for. For example,
while we know that the German grammar was able to parse 72% of
the sentences, we do not actually know whether it parsed the sentences
correctly, or whether the grammar simply came up with an analysis
that would be considered wrong by most linguists. In cases of massive
ambiguity, as with the one sentence that had more than 25 optimal
solutions in table (14), it is furthermore very difficult for the grammar
writer to determine whether the desired analysis is even among the
many analyses the grammar has produced.
14.2.2.1 Treebanks
One way of storing information about the desired analysis or anal-
yses for a given construction is the use of a treebanking system. In
our project, the use of treebanks was inspired by the Penn Treebank.
However, given that in lfg much of the useful information about the
analysis is represented in the form of avms, our treebank contained
both trees (the c-structure) and the avms (the f-structure).
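Schematically, a treebank entry in such a system pairs the test sentence with its
stored c-structure and f-structure. The following is only an illustration of the kind
of record involved; the bracketing and the feature names are made up for the
example and are not taken from the actual ParGram treebank.

     sentence:     It has appeared.
     c-structure:  [root [np It ] [vp has [vp appeared ]]]
     f-structure:  [ pred  'appear<subj>'
                     subj  [ pred 'it' ]
                     tense pres
                     perf  + ]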
Ideally, numbers should always have a tag indicating that they are num-
bers, even if they also have a noun tag. It would then be possible to
block numbers from surfacing as nouns and instead have them always
analyzed as numbers and picked up by the np rule in that capacity, i.e., only analysis
(21b) would surface. (The number option is needed independently for
places in which only numbers, and not common nouns, are allowed,
e.g., in dates.) Unlike the other interactions discussed in this section,
there is no way to block this undesirable overgeneration without either
giving up the power of using the unknowns or requesting a significant
modification to the morphology provided. Instead, optimality marks
(see section 14.1.3) were used to constrain this ambiguity.
Finally consider an example of two rules interacting in an undesir-
able manner. This occurred in both the English and French grammars
with the introduction of headers (section 9.2) into the grammars in
conjunction with the rule allowing noun-noun compounds. Headers are
designed to allow certain nps to be root level categories, as in (22a),
while noun-noun compounds occur in nps like (22b).
(22) a. root: Gearshifts
b. np: the oil filter
When both of these exist, roots such as (23) have two analyses, one
which forms a sentence s (the dominant reading in both the French
and English case) and one which forms a header (the less common
reading).6
(23) a. root: the beacon flashes
s = [the beacon]np [flashes]vp
header = [the [beacon]n flashes]np
b. root: Le tracteur part
the tractor leaves/portion
s = [le tracteur]np [part]vp (‘the tractor left’)
header = [le tracteur [part]n ]np (#‘the portion tractor’)
However, there is a difference between sentences and headers which
can be exploited to block this double parse. Namely, sentences end with
punctuation marks, at least in written text such as the tractor manual,
while headers do not. As such, if punctuation is made obligatory at
the end of sentences, then (23) will only have one parse: s if there is a
punctuation mark and header if there is not.
6 Some sentences have the header reading dominant, as with the French example
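For concreteness, the relevant part of the root rule can be pictured roughly as
follows; the category names period and np-header are illustrative, not the
grammar's actual ones.

     root −→ { s period          "a sentence must be closed off by punctuation"
             | np-header }.      "a header is a bare np with no final punctuation"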
In sum, as the grammar is expanded to include a wider variety of
constructions, these can interact in unpredictable ways, resulting both in
spurious parses for grammatical constructions and in parses for ungrammatical
ones. Since such interactions can be detected by rigorous testing,
they should be eliminated whenever possible before the grammar
increases further in complexity.
14.3.1.2 Legitimate Interactions
Consider next the case of interactions which are unforeseen and result
in a proliferation of analyses, but which are legitimate. Although it is
usually not desirable to block such interactions, it is important to know
that they exist, since they can result in unexpectedly large numbers of
analyses.
An example of a legitimate, but unanticipated, interaction between
rules arises from the introduction of present participles as adverbial
modifiers in conjunction with the np rule which allows present par-
ticiples to act as adjectival modifiers of nouns. Both rules are needed
independently for constructions like those in (24).
(24) a. s = [Turning the wheel to the left]advp the driver should
gently press the brake.
b. np = the [turning]ap wheels
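Schematically, the two independently motivated rules look roughly as follows;
the category names are again purely illustrative.

     s  −→ (advp) np vp.      "sentence-initial participial adjunct, as in (24a)"
     np −→ (d) (ap) n.        "prenominal participial modifier, as in (24b)"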
In certain circumstances, sentences can have two parses, one in which an
initial present participle is interpreted as heading a sentence adverbial
and one in which it is interpreted as the adjectival modifier of the
subject np. Such an example is seen in (25).
(25) Flashing lights can be seen for miles around.
[Flashing]s.adv [[lights]np can be seen for miles around.]s
[[Flashing]a lights]np can be seen for miles around.
Since both types of constructions legitimately occur, there is no means,
or reason, to block one parse or the other. In fact, (25) could have either
parse, depending on the context in which it appears. However, knowing
that such interactions occur can help to explain sudden increases in
parses when running testsuites.
A similar type of example comes from the German grammar in which
the inclusion of adverbs, which morphologically resemble their adjec-
tival counterparts, allows for additional interpretations of certain sen-
tences. So, (26) has two readings, one in which früh ‘early’ is an
adjective modifying Montag ‘Monday’ (in both readings this is a noun
phrase acting as an adverb) and one in which it is an adverb modifying
the vp.
(26) Wir fangen Montag früh an.
we start Monday early part
Wir fangen [Montag [früh]ap ]advp an.
(=‘Early Monday we start.’)
Wir fangen [Montag]advp [früh]advp an.
(=‘Monday we start early.’)
Once again, since both constructions occur legitimately, one parse can-
not be blocked in favor of the other.
Grammar Version                              A           B           C           D
                                             no ranking  ranking     *empty N    *dem. pronoun
                                                                                  *empty N

(i)    Er sucht die mittelgroßen.            1   0.442   1   0.548   0   0.299   0   0.275
(ii)   Die gefallen ihr.                     1   0.305   1   0.305   1   0.463   0   0.194
(iii)  In der Stadt fehlen gemütliche        2   0.483   1   0.473   1   0.450   1   0.384
       Kneipen.
(iv)   In der Stadt fehlen die schönen       6   5.404   1   5.471   1   1.248   1   0.938
       kleinen angenehmen gemütlichen
       Kneipen.
(v)    Er sieht das Kind.                    1   0.312   1   0.301   1   0.334   1   0.278
(vi)   Er sieht das Kind mit der Mütze.      2   0.673   2   0.673   2   0.696   2   0.395
(vii)  Er sieht das Kind mit der Mütze       5   1.505   5   1.516   5   1.603   5   0.594
       in der Hand.
(viii) Die Erfahrungen sollen später in      2  12.625   2  14.944   2   4.542   2   1.186
       die künftigen Planungen für die
       gesamte Stadt einfließen.
(ix)   Hinter dem Betrug werden die         92 217.418  20 222.776  20  35.580  20   4.632
       gleichen Täter vermutet, die
       während der vergangenen Tage in
       Griechenland gefälschte Banknoten
       in Umlauf brachten.

(i) He is looking for the middlesized (ones); (ii) She likes those (ones); (iii) Cozy
pubs are missing in the city; (iv) The nice, small, comfortable, cozy pubs are missing
in the city; (v) He sees the child; (vi) He sees the child with the cap; (vii) He sees
the child with the cap in its hand; (viii) The experiences are supposed to enter into
the future plans of the entire city; (ix) The same suspects that brought counterfeit
money into circulation in Greece in the last few days are thought to be behind the
fraud.
Grammar Version                         A             B                C
                                        general       cp clause        additional
                                        analysis      types param-     np-internal
                                                      eterized         parametrization

Results based on entire testsuite (25 sentences)
parsing time
   average [sec]                        > 300         > 75             1.19
   standard deviation [sec]             > 370         > 220            1.13
   median [sec]                         > 130         > 3              0.94
   quartile distance [sec]              > 730         > 14             1.01
   maximal time [sec]                   > 900         > 900            5.6
# sentences beyond timeout              6             1                —

Results based on the 19 “easiest” sentences
parsing time
   average [sec]                        121.19        4.83             0.92
   standard deviation [sec]             163.38        6.82             0.65
   median [sec]                         69.89         2.15             0.88
   quartile distance [sec]              161.62        4.70             0.69
   maximal time [sec]                   720.76        29.2             5.6
# sentences beyond timeout              —             —                —
The three versions of the grammar were run on a testsuite that in-
cluded phenomena known to pose difficulties for computational gram-
mars, such as relative clauses, coordination, and headless construc-
tions.8 The result of the experiment clearly indicates that the introduction
of complex categories, which achieves greater modularity in the grammar via a
parametrization over linguistically relevant categories, is very desirable.
14.3.3 Cross-grammar Performance
Comparing grammars to one another is a more difficult problem than
determining grammar internal performance over time because the in-
ternal structure of the different grammars may not be accessible to the
tester to create different versions for comparison.
8 The entire testsuite is provided in an Appendix in Kuhn 1999.
Appendix
This appendix provides the basic guidelines we used for positing fea-
tures and their values in the ParGram project, as well as sample fea-
tures.
Abney, Steven. 1987. The English Noun Phrase in its Sentential Aspects.
Ph.D. thesis, MIT.
Abusch, Dorit. 1994. Sequence of tense. In H. Kamp, ed., Ellipsis, Tense and
Questions. IMS, Stuttgart. DYANA deliverable R 2.2.3.
Alshawi, Hiyan, ed. 1992. The Core Language Engine. Cambridge, Mas-
sachusetts: The MIT Press.
Baker, Mark. 1983. Objects, themes, and lexical rules in Italian. In L. Levin,
M. Rappaport, and A. Zaenen, eds., Papers in Lexical-Functional Gram-
mar . Bloomington, Indiana: Indiana University Linguistics Club.
Baur, Judith, Fred Oberhauser, and Klaus Netter. 1994. SADAW Ab-
schlußbericht. Tech. rep., Universität des Saarlandes and SIEMENS AG.
Bech, G. 1983. Studien über das deutsche Verbum infinitum. Tübingen: Max
Niemeyer Verlag. First published in 1955.
Berman, Judith and Anette Frank. 1995. Deutsche und französische Syntax
im Formalismus der LFG. Tübingen: Max Niemeyer Verlag.
Bod, Rens and Ronald Kaplan. 1998. A probabilistic corpus-driven model for
lexical-functional analysis. In Proceedings of 17th International Conference
on Computational Linguistics ACL/COLING-98 . Montreal, Canada.
Breidt, Lisa, Frédérique Segond, and Giuseppe Valetto. 1996. Formal de-
scription of multi-word lexemes with the finite state formalism: Idarex. In
Proceedings of the 16th International Conference on Computational Lin-
guistics (COLING-96), vol. 2, pages 1036–1040. Copenhagen, Denmark.
Bresnan, Joan, Ronald Kaplan, and Peter Peterson. 1985. Coordination and
the flow of information through phrase structure. Unpublished manuscript,
Xerox PARC.
Carpenter, Bob. 1992. ALE user’s guide. Tech. Rep. CM-LCL-92-1, Carnegie
Mellon University, Laboratory for Computational Linguistics.
Cutting, Doug, Julian Kupiec, Jan Pedersen, and Penelope Sibun. 1992. A
practical part-of-speech tagger. In 3rd Conference on Applied Natural Lan-
guage Processing, pages 133–140. Trento, Italy.
Dalrymple, Mary, Ronald Kaplan, John T. Maxwell III, and Annie Zaenen,
eds. 1995. Formal Issues in Lexical-Functional Grammar . Stanford, Cali-
fornia: CSLI Publications.
Dalrymple, Mary, John Lamping, and Vijay Saraswat. 1993. LFG semantics
via constraints. In Proceedings of the 6th Meeting of the EACL, pages
97–105.
Eckle, Judith and Ulrich Heid. 1996. Extracting raw material for a Ger-
man subcategorization lexicon from newspaper text. In Proceedings of the
4th International Conference on Computational Lexicography (COMPLEX
’96). Budapest, Hungary.
Johnson, Mark. 1986. The LFG treatment of discontinuity and the double
infinitive construction in Dutch. In M. Dalrymple, J. Goldberg, K. Hanson,
M. Inman, C. Piñon, and S. Wechsler, eds., Proceedings of the Fifth
West Coast Conference on Formal Linguistics, pages 102–118. Stanford,
California: Stanford Linguistics Association.
Kamp, Hans and Uwe Reyle. 1993. From Discourse to Logic. Dordrecht:
Kluwer Academic Publishers.
Kaplan, Ronald and John T. Maxwell III. 1988a. An algorithm for Func-
tional Uncertainty. In Proceedings of the 12th International Conference
on Computational Linguistics (COLING-88), vol. 1, pages 297–302. Bu-
dapest, Hungary. Reprinted in Dalrymple et al. 1995, pp. 177–198.
Kaplan, Ronald, Klaus Netter, Jürgen Wedekind, and Annie Zaenen. 1989.
Translation by structural correspondences. In EACL 4 , pages 272–281.
University of Manchester.
Karttunen, Lauri, Ronald Kaplan, and Annie Zaenen. 1992. Two-level mor-
phology with composition. In Proceedings of the 14th International Con-
ference on Computational Linguistics (COLING-92), pages 141–148.
Kay, Paul and Charles Fillmore. 1994. Grammatical constructions and lin-
guistic generalizations: The What's X doing Y? construction. Unpublished manuscript,
UC Berkeley.
Kehler, Andrew, Mary Dalrymple, John Lamping, and Vijay Saraswat. 1995.
The semantics of resource-sharing in Lexical-Functional Grammar. In
EACL95 . University College Dublin.
King, Tracy Holloway. 1995. Configuring Topic and Focus in Russian. Stan-
ford, California: CSLI Publications.
Maxwell III, John T. and Ronald Kaplan. 1991. A method for disjunctive
constraint satisfaction. In M. Tomita, ed., Current Issues in Parsing
Technology, pages 173–190. Dordrecht: Kluwer Academic
Publishers. Reprinted in Dalrymple et al. 1995, pp. 381–402.
Maxwell III, John T. and Ronald Kaplan. 1993. The interface between
phrasal and functional constraints. Computational Linguistics 19(4):571–
590. Reprinted in Dalrymple et al. 1995, pp. 403–429.
Nølke, Henning and Hanne Korzen. 1996. L’ordre des mots. Langue Française
111. Larousse.
Nunberg, Geoff, Thomas Wasow, and Ivan Sag. 1994. Idioms. Language
70(3):491–538.
Pollard, Carl and Ivan Sag. 1987. Information-Based Syntax and Semantics,
Volume 1: Fundamentals. Stanford, California: CSLI Publications.
Pollard, Carl and Ivan Sag. 1994. Head-Driven Phrase Structure Grammar .
Chicago, Illinois: The University of Chicago Press.
Prince, Alan and Paul Smolensky. 1993. Optimality theory: constraint in-
teraction in generative grammar. Tech. Rep. 2, Rutgers University Center
for Cognitive Science, Piscataway, New Jersey.
Rambow, Owen. 1996. Word order, clause union, and the for-
mal machinery of syntax. In M. Butt and T. H. King,
eds., Proceedings of the LFG96 Conference. Grenoble, France.
http://www-csli.stanford.edu/publications/LFG/lfg1.html.
Sag, Ivan, Gerald Gazdar, Thomas Wasow, and Steven Weisler. 1985. Coordi-
nation and how to distinguish categories. Natural Language and Linguistic
Theory 3:117–171.
Segond, Frédérique and Pasi Tapanainen. 1995. Using a finite-state based for-
malism to identify and generate multiword expressions. MLTT-19, Xerox
Research Centre Europe, Grenoble.
van Genabith, Josef and Dick Crouch. 1996. Direct and underspecified in-
terpretations of LFG f-structures. In Proceedings of the 16th Interna-
tional Conference on Computational Linguistics (COLING-96), vol. 1,
pages 262–267. Copenhagen, Denmark.
Zaenen, Annie and Ronald Kaplan. 1995. Formal devices for linguistic gen-
eralizations: West Germanic word order in LFG. In J. Cole, G. Green,
and J. Morgan, eds., Linguistics and Computation, pages 3–27. Stanford,
California: CSLI Publications. Reprinted in Dalrymple et al. 1995, pp.
215–239.