2.3 Parsing
Natural language analysis in the early days of AI tended to rely on template matching, for
example, matching templates such as (X has Y) or (how many Y are there on X) to the input to be
analyzed. This of course depended on having a very restricted discourse and task domain. By the
late 1960s and early 70s, quite sophisticated recursive parsing techniques were being employed.
For example, Woods' LUNAR system used a top-down recursive parsing strategy interpreting an
ATN in the manner roughly indicated in section 2.2 (though ATNs in principle allow other parsing
styles). It also saved recognized constituents in a table, much like the class of parsers we are
about to describe. Later parsers were influenced by the efficient and conceptually elegant CFG
parsers described by Jay Earley (1970) and (separately) by John Cocke, Tadao Kasami, and Daniel
Younger (e.g., Younger 1967). The latter algorithm, termed the CYK or CKY algorithm for the
three separate authors, was particularly simple, using a bottom-up dynamic programming
approach to first identify and tabulate the possible types (nonterminal labels) of sentence
segments of length 1 (i.e., words), then the possible types of sentence segments of length 2, and
so on, always building on the previously discovered segment types to recognize longer phrases.
This process runs in cubic time in the length of the sentence, and a parse tree can be
constructed from the tabulated constituents in quadratic time. The CYK algorithm assumes a
Chomsky Normal Form (CNF) grammar, allowing only productions of the form A → B C or
A → w, i.e., expansion of a nonterminal into exactly two nonterminals or into a single word. This is only
a superficial limitation, because arbitrary CF grammars are easily converted to CNF.
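As a concrete illustration of the bottom-up tabulation just described, the following is a minimal sketch of a CYK recognizer in Python; the toy CNF grammar, its dictionary encoding, and the example sentence are illustrative assumptions rather than anything prescribed by the algorithm itself.

    from itertools import product

    # Toy CNF grammar (illustrative): binary rules A -> B C and lexical rules A -> w
    binary_rules = {
        ("NP", "VP"): {"S"},
        ("Det", "N"): {"NP"},
        ("V", "NP"): {"VP"},
    }
    lexical_rules = {
        "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"},
    }

    def cyk_recognize(words, start="S"):
        n = len(words)
        # table[i][j] holds the nonterminal categories that can derive words[i:j]
        table = [[set() for _ in range(n + 1)] for _ in range(n)]
        for i, w in enumerate(words):                   # segments of length 1 (words)
            table[i][i + 1] = set(lexical_rules.get(w, ()))
        for length in range(2, n + 1):                  # longer segments, bottom up
            for i in range(n - length + 1):
                j = i + length
                for k in range(i + 1, j):               # split point between the two parts
                    for b, c in product(table[i][k], table[k][j]):
                        table[i][j] |= binary_rules.get((b, c), set())
        return start in table[0][n]

    print(cyk_recognize("the dog saw the cat".split()))   # True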
The method most frequently employed nowadays in fully analyzing sentential structure is chart
parsing. This is a conceptually simple and efficient dynamic programming method closely related
to the algorithms just mentioned; i.e., it begins by assigning possible analyses to the smallest
constituents and then inferring larger constituents based on these, until an instance of the
top-level category (usually S) is found that spans the given text or text segment. There are many
variants, depending on whether only complete constituents are posited or incomplete ones as
well (to be progressively extended), and whether we proceed left-to-right through the word
stream or in some other order (e.g., some seemingly best-first order). A common variant is
a left-corner chart parser, in which partial constituents are posited whenever their “left
corner”—i.e., leftmost constituent on the right-hand side of a rule—is already in place. Newly
completed constituents are placed on an agenda, and items are successively taken off the
agenda and used if possible as left corners of new, higher-level constituents, and to extend
partially completed constituents. At the same time, completed constituents (or rather,
categories) are placed in a chart, which can be thought of as a triangular table of width n and
height n (the number of words processed), where the cell at indices (i, j), with j > i, contains the
categories of all complete constituents so far verified reaching from position i to position j in the
input. The chart is used both to avoid duplication of constituents already built, and ultimately to
reconstruct one or more global structural analyses. (If all possible chart entries are built, the
final chart will allow reconstruction of all possible parses.) Chart-parsing methods carry over to
PCFGs essentially without change, still running within a cubic time bound in terms of sentence
length. An extra task is maintaining probabilities of completed chart entries (and perhaps
bounds on probabilities of incomplete entries, for pruning purposes).
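To illustrate how chart entries can carry probabilities when the method is applied to a PCFG, here is a minimal Viterbi-style sketch over a CNF grammar: each (i, j) cell of the triangular chart records the best log probability and a backpointer for every category spanning positions i to j. The grammar, its encoding as Python dictionaries, and the example sentence are illustrative assumptions.

    import math
    from collections import defaultdict

    # Illustrative CNF PCFG: each rule maps to its probability (assumed format)
    binary = {("S", ("NP", "VP")): 1.0, ("NP", ("Det", "N")): 1.0, ("VP", ("V", "NP")): 1.0}
    lexical = {("Det", "the"): 1.0, ("N", "dog"): 0.5, ("N", "cat"): 0.5, ("V", "saw"): 1.0}

    def viterbi_cky(words):
        n = len(words)
        best = defaultdict(dict)   # best[(i, j)][A] = best log prob of A spanning words[i:j]
        back = {}                  # backpointers for reconstructing a parse tree
        for i, w in enumerate(words):
            for (a, word), p in lexical.items():
                if word == w:
                    best[(i, i + 1)][a] = math.log(p)
                    back[(i, i + 1, a)] = w
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length
                for k in range(i + 1, j):
                    for (a, (b, c)), p in binary.items():
                        if b in best[(i, k)] and c in best[(k, j)]:
                            score = math.log(p) + best[(i, k)][b] + best[(k, j)][c]
                            if score > best[(i, j)].get(a, float("-inf")):
                                best[(i, j)][a] = score
                                back[(i, j, a)] = (k, b, c)
        return best, back

    def tree(back, i, j, a):
        # Rebuild a parse tree from the backpointers stored in the chart
        entry = back[(i, j, a)]
        if isinstance(entry, str):
            return (a, entry)
        k, b, c = entry
        return (a, tree(back, i, k, b), tree(back, k, j, c))

    best, back = viterbi_cky("the dog saw the cat".split())
    print(tree(back, 0, 5, "S"))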
Because of their greater expressiveness, TAGs and CCGs are harder to parse in the worst case
(O(n⁶)) than CFGs and projective DGs (O(n³)), at least with current algorithms (see Vijay-Shanker
& Weir 1994 for parsing algorithms for TAG, CCG, and LIG based on bottom-up dynamic
programming). However, it does not follow that TAG parsing or CCG parsing is impractical for real
grammars and real language, and in fact parsers exist for both that are competitive with more
common CFG-based parsers.
Finally we mention connectionist models of parsing, which perform syntactic analysis using
layered (artificial) neural nets (ANNs, NNs) (see Palmer-Brown et al. 2002; Mayberry and
Miikkulainen 2008; and Bengio 2008 for surveys). There is typically a layer of input units (nodes),
one or more layers of hidden units, and an output layer, where each layer has (excitatory and
inhibitory) connections forward to the next layer, typically conveying evidence for higher-level
constituents to that layer. There may also be connections within a hidden layer, implementing
cooperation or competition among alternatives. A linguistic entity such as a phoneme, word, or
phrase of a particular type may be represented within a layer either by a pattern of activation of
units in that layer (a distributed representation) or by a single activated unit
(a localist representation).
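As a small illustration of the distinction, a word such as “dog” might be encoded in either of these two ways (the vocabulary, dimensions, and activation values below are made up purely for illustration):

    import numpy as np

    vocab = ["the", "dog", "saw", "cat"]

    # Localist representation: one dedicated unit per word (a one-hot vector)
    def localist(word):
        vec = np.zeros(len(vocab))
        vec[vocab.index(word)] = 1.0
        return vec

    # Distributed representation: the word is a pattern of activation spread
    # over many units, none of which stands for the word by itself
    distributed = {
        "dog": np.array([0.8, 0.1, 0.6, 0.3]),
        "cat": np.array([0.7, 0.2, 0.5, 0.4]),
    }

    print(localist("dog"))        # [0. 1. 0. 0.]
    print(distributed["dog"])     # pattern of activation shared across units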
One of the problems that connectionist models need to confront is that inputs are temporally
sequenced, so that in order to combine constituent parts, the network must retain information
about recently processed parts. Two possible approaches are the use of simple recurrent
networks (SRNs) and, in localist networks, sustained activation. SRNs use one-to-one feedback
connections from the hidden layer to special context units aligned with the previous layer
(normally the input layer or perhaps a secondary hidden layer), in effect storing their current
outputs in those context units. Thus at the next cycle, the hidden units can use their own
previous outputs, along with the new inputs from the input layer, to determine their next
outputs. In localist models it is common to assume that once a unit (standing for a particular
concept) becomes active, it stays active for some length of time, so that multiple concepts
corresponding to multiple parts of the same sentence, and their properties, can be
simultaneously active. A problem that arises is how the properties of an entity that are active at
a given point in time can be properly tied to that entity, and not to other activated entities. (This
is the variable binding problem, which has spawned a variety of approaches—see Browne and
Sun 1999). One solution is to assume that unit activation consists of pulses emitted at a globally
fixed frequency, and pulse trains that are in phase with one another correspond to the same
entity (e.g., see Henderson 1994). Much current connectionist research borrows from symbolic
processing perspectives, by assuming that parsing assigns linguistic phrase structures to
sentences, and treating the choice of a structure as simultaneous satisfaction of symbolic
linguistic constraints (or biases). Also, more radical forms of hybridization and modularization
are being explored, such as interfacing an NN parser to a symbolic stack, or using a neural net to
learn the probabilities needed in a statistical parser, or interconnecting the parser network with
separate prediction networks and learning networks. For an overview of connectionist sentence
processing and some hybrid methods, see Crocker (2010).
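To make the SRN mechanism described above concrete, the following is a minimal sketch of an Elman-style recurrent pass: the hidden layer's outputs are copied into context units and combined with the new input on the next cycle. The layer sizes, random weights, and vector encoding of the words are all illustrative assumptions, and no training is shown.

    import numpy as np

    rng = np.random.default_rng(0)

    n_input, n_hidden, n_output = 8, 16, 4          # illustrative layer sizes
    W_in = rng.normal(scale=0.1, size=(n_hidden, n_input))     # input -> hidden
    W_ctx = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # context -> hidden
    W_out = rng.normal(scale=0.1, size=(n_output, n_hidden))   # hidden -> output

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def srn_run(inputs):
        """Process a sequence of input vectors one at a time.

        The context units hold a copy of the hidden layer's previous output,
        so each step sees both the current input and the recent past.
        """
        context = np.zeros(n_hidden)                # context units start empty
        outputs = []
        for x in inputs:
            hidden = sigmoid(W_in @ x + W_ctx @ context)
            outputs.append(sigmoid(W_out @ hidden))
            context = hidden                        # copy hidden state for the next cycle
        return outputs

    # One random input vector per "word" of a five-word sentence (illustrative)
    sentence = [rng.normal(size=n_input) for _ in range(5)]
    for step, y in enumerate(srn_run(sentence)):
        print(step, np.round(y, 2))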