Parsing

2.3 Parsing
Natural language analysis in the early days of AI tended to rely on template matching, for
example, matching templates such as (X has Y) or (how many Y are there on X) to the input to be
analyzed. This of course depended on having a very restricted discourse and task domain. By the
late 1960s and early 70s, quite sophisticated recursive parsing techniques were being employed.
For example, Woods' LUNAR system used a top-down recursive parsing strategy interpreting an
ATN in the manner roughly indicated in section 2.2 (though ATNs in principle allow other parsing
styles). It also saved recognized constituents in a table, much like the class of parsers we are
about to describe. Later parsers were influenced by the efficient and conceptually elegant CFG
parsers described by Jay Earley (1970) and (separately) by John Cocke, Tadao Kasami, and Daniel
Younger (e.g., Younger 1967). The latter algorithm, termed the CYK or CKY algorithm for the
three separate authors, was particularly simple, using a bottom-up dynamic programming
approach to first identify and tabulate the possible types (nonterminal labels) of sentence
segments of length 1 (i.e., words), then the possible types of sentence segments of length 2, and
so on, always building on the previously discovered segment types to recognize longer phrases.
This process runs in cubic time in the length of the sentence, and a parse tree can be
constructed from the tabulated constituents in quadratic time. The CYK algorithm assumes a
Chomsky Normal Form (CNF) grammar, allowing only productions of the form N_p → N_q N_r or
N_p → w, i.e., expansion of any given nonterminal into either two nonterminals or a single word.
This is only a superficial limitation, because arbitrary CF grammars are easily converted to CNF.
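
To make the bottom-up tabulation concrete, here is a minimal sketch of CYK recognition in Python; the toy CNF grammar, lexicon, and sentence are invented for illustration and are not part of the original discussion.

```python
# A minimal sketch of CYK recognition for a toy grammar in Chomsky Normal Form.
# The grammar, lexicon, and sentence are illustrative assumptions, not from the text.
from itertools import product

# Binary rules N_p -> N_q N_r and lexical rules N_p -> w
binary_rules = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexical_rules = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}

def cyk_recognize(words, start="S"):
    n = len(words)
    # table[i][j] holds the nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Length-1 segments: look up each word in the lexicon
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical_rules.get(w, ()))
    # Longer segments, built from previously tabulated shorter ones
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary_rules.get((B, C), set())
    return start in table[0][n]

print(cyk_recognize("the dog saw the cat".split()))  # True
```

The three nested loops over span length, start position, and split point give the cubic time bound mentioned above.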
The method most frequently employed nowadays in fully analyzing sentential structure is chart
parsing. This is a conceptually simple and efficient dynamic programming method closely related
to the algorithms just mentioned; i.e., it begins by assigning possible analyses to the smallest
constituents and then inferring larger constituents based on these, until an instance of the
top-level category (usually S) is found that spans the given text or text segment. There are many
variants, depending on whether only complete constituents are posited or incomplete ones as
well (to be progressively extended), and whether we proceed left-to-right through the word
stream or in some other order (e.g., some seemingly best-first order). A common variant is
a left-corner chart parser, in which partial constituents are posited whenever their “left
corner”—i.e., leftmost constituent on the right-hand side of a rule—is already in place. Newly
completed constituents are placed on an agenda, and items are successively taken off the
agenda and used if possible as left corners of new, higher-level constituents, and to extend
partially completed constituents. At the same time, completed constituents (or rather,
categories) are placed in a chart, which can be thought of as a triangular table of width n and
height n (the number of words processed), where the cell at indices (i, j), with j > i, contains the
categories of all complete constituents so far verified reaching from position i to position j in the
input. The chart is used both to avoid duplication of constituents already built, and ultimately to
reconstruct one or more global structural analyses. (If all possible chart entries are built, the
final chart will allow reconstruction of all possible parses.) Chart-parsing methods carry over to
PCFGs essentially without change, still running within a cubic time bound in terms of sentence
length. An extra task is maintaining probabilities of completed chart entries (and perhaps
bounds on probabilities of incomplete entries, for pruning purposes).
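
The agenda-and-chart machinery just described can be sketched roughly as follows. This is a simplified bottom-up, left-corner recognizer over an invented toy grammar (note that chart parsing, unlike CYK, does not require CNF); it records only complete categories in the chart and omits probability bookkeeping and parse-tree reconstruction, though PCFG probabilities could be attached to chart entries in the same structure.

```python
# A simplified left-corner chart parser with an agenda; grammar and lexicon are
# illustrative assumptions (one category per word, no unary rules).
from collections import defaultdict

grammar = {            # LHS -> list of right-hand sides
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}
lexicon = {"the": "Det", "dog": "N", "cat": "N", "park": "N",
           "saw": "V", "in": "P"}
# Index rules by their left corner (first right-hand-side symbol)
left_corner = defaultdict(list)
for lhs, rhss in grammar.items():
    for rhs in rhss:
        left_corner[rhs[0]].append((lhs, rhs))

def chart_parse(words, start="S"):
    n = len(words)
    chart = defaultdict(set)    # chart[(i, j)] = complete categories over words[i:j]
    active = defaultdict(list)  # active[(j, C)] = partial constituents needing C at j
    agenda = [(i, i + 1, lexicon[w]) for i, w in enumerate(words)]

    def add_active(i, j, lhs, needed):
        if not needed:          # nothing left to find: the constituent is complete
            agenda.append((i, j, lhs))
            return
        active[(j, needed[0])].append((i, lhs, needed))
        # a constituent already in the chart may extend this partial edge at once
        for k in range(j + 1, n + 1):
            if needed[0] in chart[(j, k)]:
                add_active(i, k, lhs, needed[1:])

    while agenda:
        i, j, cat = agenda.pop()
        if cat in chart[(i, j)]:
            continue            # avoid duplicating constituents already built
        chart[(i, j)].add(cat)
        # use the new constituent as the left corner of higher-level rules
        for lhs, rhs in left_corner[cat]:
            add_active(i, j, lhs, rhs[1:])
        # use it to extend partial constituents that were waiting for it
        for (k, lhs, needed) in list(active[(i, cat)]):
            add_active(k, j, lhs, needed[1:])
    return start in chart[(0, n)]

print(chart_parse("the dog saw the cat in the park".split()))  # True
```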
Because of their greater expressiveness, TAGs and CCGs are harder to parse in the worst case
(O(n^6)) than CFGs and projective DGs (O(n^3)), at least with current algorithms (see Vijay-Shanker
& Weir 1994 for parsing algorithms for TAG, CCG, and LIG based on bottom-up dynamic
programming). However, it does not follow that TAG parsing or CCG parsing is impractical for real
grammars and real language, and in fact parsers exist for both that are competitive with more
common CFG-based parsers.
Finally we mention connectionist models of parsing, which perform syntactic analysis using
layered (artificial) neural nets (ANNs, NNs) (see Palmer-Brown et al. 2002; Mayberry and
Miikkulainen 2008; and Bengio 2008 for surveys). There is typically a layer of input units (nodes),
one or more layers of hidden units, and an output layer, where each layer has (excitatory and
inhibitory) connections forward to the next layer, typically conveying evidence for higher-level
constituents to that layer. There may also be connections within a hidden layer, implementing
cooperation or competition among alternatives. A linguistic entity such as a phoneme, word, or
phrase of a particular type may be represented within a layer either by a pattern of activation of
units in that layer (a distributed representation) or by a single activated unit
(a localist representation).
One of the problems that connectionist models need to confront is that inputs are temporally
sequenced, so that in order to combine constituent parts, the network must retain information
about recently processed parts. Two possible approaches are the use of simple recurrent
networks (SRNs) and, in localist networks, sustained activation. SRNs use one-to-one feedback
connections from the hidden layer to special context units aligned with the previous layer
(normally the input layer or perhaps a secondary hidden layer), in effect storing their current
outputs in those context units. Thus at the next cycle, the hidden units can use their own
previous outputs, along with the new inputs from the input layer, to determine their next
outputs. In localist models it is common to assume that once a unit (standing for a particular
concept) becomes active, it stays active for some length of time, so that multiple concepts
corresponding to multiple parts of the same sentence, and their properties, can be
simultaneously active. A problem that arises is how the properties of an entity that are active at
a given point in time can be properly tied to that entity, and not to other activated entities. (This
is the variable binding problem, which has spawned a variety of approaches—see Browne and
Sun 1999). One solution is to assume that unit activation consists of pulses emitted at a globally
fixed frequency, and pulse trains that are in phase with one another correspond to the same
entity (e.g., see Henderson 1994). Much current connectionist research borrows from symbolic
processing perspectives, by assuming that parsing assigns linguistic phrase structures to
sentences, and treating the choice of a structure as simultaneous satisfaction of symbolic
linguistic constraints (or biases). Also, more radical forms of hybridization and modularization
are being explored, such as interfacing a NN parser to a symbolic stack, or using a neural net to
learn the probabilities needed in a statistical parser, or interconnecting the parser network with
separate prediction networks and learning networks. For an overview of connectionist sentence
processing and some hybrid methods, see Crocker (2010).
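
The copy-back mechanism of an SRN described above can be sketched in a few lines; the layer sizes, random weights, and word encoding below are arbitrary choices for illustration, and no training procedure is shown.

```python
# A minimal sketch of an Elman-style simple recurrent network (SRN) forward pass,
# illustrating the one-to-one copy-back of hidden activations to context units.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 16, 5          # e.g., word code, hidden state, category scores

W_ih = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input   -> hidden
W_ch = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # context -> hidden
W_ho = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden  -> output

def srn_run(word_vectors):
    context = np.zeros(n_hidden)          # context units start out inactive
    outputs = []
    for x in word_vectors:                # words are presented one at a time
        hidden = np.tanh(W_ih @ x + W_ch @ context)
        outputs.append(W_ho @ hidden)
        context = hidden.copy()           # copy hidden outputs back to context units
    return outputs

# A toy "sentence" of three random word vectors
sentence = [rng.normal(size=n_in) for _ in range(3)]
print([o.shape for o in srn_run(sentence)])   # [(5,), (5,), (5,)]
```

At each cycle the hidden units thus see both the current word and their own previous outputs, giving the network a memory of recently processed parts of the input.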

2.4 Coping with syntactic ambiguity


If natural language were structurally unambiguous with respect to some comprehensive,
effectively parsable grammar, our parsing technology would presumably have attained
human-like accuracy some time ago, instead of levelling off at about 90% constituent
recognition accuracy. In fact, however, language is ambiguous at all structural levels: at the level
of speech sounds (“recognize speech” vs. “wreck a nice beach”); morphology (“un-wrapped” vs.
“unwrap-ped”); word category (round as an adjective, noun, verb or adverb); compound word
structure (wild goose chase); phrase category (nominal that-clause vs. relative clause in “the
idea that he is entertaining”); and modifier (or complement) attachment (“He hit the man with
the baguette”). The parenthetical examples here have been chosen so that their ambiguity is
readily noticeable, but ambiguities are far more abundant than is intuitively apparent, and the
number of alternative analyses of a moderately long sentence can easily run into the thousands.
Naturally, alternative structures lead to alternative meanings, as the above examples show, and
so structural disambiguation is essential. The problem is exacerbated by ambiguities in the
meanings and discourse function even of syntactically unambiguous words and phrases. But
here we just mention some of the structural preference principles that have been employed to
achieve at least partial structural disambiguation. First, some psycholinguistic principles that
have been suggested are Right Association (RA) (or Late Closure, LC), Minimal Attachment (MA),
and Lexical Preference (LP). The following examples illustrate these principles:
(2.1)
(RA) He bought the book that I had selected for Mary.
(Note the preference for attaching for Mary to selected rather than bought.)
(2.2)
(MA?) She carried the groceries for Mary.
(Note the preference for attaching for Mary to carried, rather than groceries, despite RA. The
putative MA-effect might actually be an LP-like verb modification preference.)
(2.3)
(LP) She describes men who have worked on farms as cowboys.
(Note the preference for attaching as cowboys to describes, rather than worked.)


Another preference noted in the literature is for parallel structure in coordination, as illustrated
by the following examples:
(2.4)
They asked for tea and coffee with sugar.
(Note the preference for the grouping [[tea and coffee] with sugar], despite RA.)
(2.5)
John decided to buy a novel, and Mary, a biography.
(The partially elided conjunct is understood as “Mary decided to buy a biography”.)
(2.6)
John submitted short stories to the editor, and poems too.
(The partially elided conjunct is understood as “submitted poems to the editor too”.)
Finally, the following example serves to illustrate the significance of frequency effects, though
such effects are hard to disentangle from semantic biases for any single sentence (improvements
in parsing through the use of word and phrase frequencies provide more compelling evidence):
(2.7)
What are the degrees of freedom that an object in space has?
(Note the preference for attaching the relative clause to degrees of freedom, rather
than freedom, attributable to the tendency of degree(s) of freedom to occur as a “multiword”.)
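
As a rough illustration of how such frequency information might be brought to bear, the sketch below compares invented attachment frequencies for the PP-attachment example given earlier ("He hit the man with the baguette"); a real system would estimate such statistics from a treebank or corpus rather than hand-code them.

```python
# Invented, purely illustrative relative frequencies of attaching a "with"-PP
# to the verb vs. to the noun, conditioned on the particular words involved.
p_attach = {
    ("hit", "verb", "with", "baguette"): 0.08,   # hit ... with a baguette (instrument)
    ("man", "noun", "with", "baguette"): 0.01,   # the man with the baguette (possession)
}

def prefer_attachment(verb, noun, prep, obj):
    verb_score = p_attach.get((verb, "verb", prep, obj), 1e-6)
    noun_score = p_attach.get((noun, "noun", prep, obj), 1e-6)
    return "verb attachment" if verb_score > noun_score else "noun attachment"

print(prefer_attachment("hit", "man", "with", "baguette"))  # verb attachment
```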
