Faster Generalized LR Parsing
3 Faster LR Parsing
Much attention has been devoted to speeding up LR parsers, and the majority
of this research pertains to implementation techniques. The argument is that
interpreted, table-driven programs are inherently slower than hardcoded, directly-
executable programs; given that, the best way to speed up a table-driven LR
parser is to convert it into a directly-executable form that needs no tables.
[16, 8, 17, 3] all start with an LR parser's handle-finding automaton and trans-
late it directly into source code; this source code can then be compiled¹ to
create an executable LR parser. Basically, each state of the automaton is directly
translated into source form using boilerplate code. This process tends to produce
inefficient code, so these papers expend effort optimizing the source code output.
Several other papers [18, 19, 13, 14, 7] have taken a slightly different approach,
introducing a technique called recursive ascent parsing. Here, an LR parser is
implemented with a set of mutually recursive functions, one for each state² in a
table-driven LR parser's handle-finding automaton.
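To give the flavor of recursive ascent, here is a sketch of our own (not taken from the cited papers) for the tiny grammar S → ( S ) | x, with one function per LR(0) state. A function returning corresponds to popping a state; the returned count says how many more pops the current reduction still needs before the goto can be performed.

```python
# Recursive ascent recognizer for the grammar
#   S' -> S $     S -> ( S ) | x
# One function per LR(0) state. Each function returns
# (lhs, pops_left, next_index): returning pops one state, and
# pops_left says how many more returns the reduction needs.

class Accept(Exception):
    """Raised when the start symbol has been recognized."""

def state0(toks, i):
    # items: S' -> .S$   S -> .(S)   S -> .x
    t = toks[i]
    if t == '(':
        lhs, n, i = state2(toks, i + 1)
    elif t == 'x':
        lhs, n, i = state3(toks, i + 1)
    else:
        raise SyntaxError("expected '(' or 'x'")
    # any reduction bottoming out here is a goto on S
    state1(toks, i)              # never returns normally

def state1(toks, i):
    # item: S' -> S.$
    if toks[i] == '$':
        raise Accept
    raise SyntaxError("expected end of input")

def state2(toks, i):
    # items: S -> (.S)   S -> .(S)   S -> .x
    t = toks[i]
    if t == '(':
        lhs, n, i = state2(toks, i + 1)
    elif t == 'x':
        lhs, n, i = state3(toks, i + 1)
    else:
        raise SyntaxError("expected '(' or 'x'")
    while n == 0:                # goto on S from this state
        lhs, n, i = state4(toks, i)
    return lhs, n - 1, i         # pop this state

def state3(toks, i):
    # item: S -> x.   reduce by S -> x (one state to pop)
    return 'S', 0, i

def state4(toks, i):
    # item: S -> (S.)
    if toks[i] != ')':
        raise SyntaxError("expected ')'")
    lhs, n, i = state5(toks, i + 1)
    return lhs, n - 1, i         # pop this state

def state5(toks, i):
    # item: S -> (S).   reduce by S -> (S) (three states to pop)
    return 'S', 2, i

def recognize(s):
    try:
        state0(list(s) + ['$'], 0)
    except Accept:
        return True
    except SyntaxError:
        return False
    return False
```

Note that no explicit parse stack appears: the CPU's call stack plays that role, which is exactly what makes this style fast for LR but awkward for GLR.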
Unfortunately, all of the above work is of limited use when applied to a GLR
parser. LR parsers produce a single derivation for an input string. In terms of
implementation, an LR parser only needs to keep track of a single set of informa-
tion: the current parser state, i.e. what the parser is doing right now, and what
it has done in the past. In a table-driven LR parser, this information is kept on an
explicit stack; in a directly-executable LR parser, the information exists through
a combination of the CPU's execution stack and program counter.
¹ Or assembled, as is the case in [16].
² Two functions per state are reputedly required in [14].
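The explicit-stack bookkeeping of a table-driven LR parser amounts to a small driver loop over ACTION and GOTO tables; a generic sketch (the table encoding here is our own illustration) looks roughly like this:

```python
# Generic table-driven LR driver (schematic). ACTION maps
# (state, lookahead) to ("shift", s), ("reduce", lhs, rhs_len),
# or ("accept",); GOTO maps (state, nonterminal) to a state.
def lr_parse(tokens, ACTION, GOTO):
    stack = [0]                      # the explicit state stack
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[i]!r}")
        if act[0] == "shift":
            stack.append(act[1])
            i += 1
        elif act[0] == "reduce":
            _, lhs, rhs_len = act
            if rhs_len:
                del stack[-rhs_len:]          # pop the handle
            stack.append(GOTO[(stack[-1], lhs)])
        else:                        # accept
            return True

# tables for the toy grammar  S' -> S $,  S -> x
ACTION = {(0, 'x'): ("shift", 2),
          (2, '$'): ("reduce", 'S', 1),
          (1, '$'): ("accept",)}
GOTO = {(0, 'S'): 1}
```

Every shift and every reduce manipulates the stack, which is precisely the overhead targeted in the remainder of this section.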
In contrast, a GLR parser produces all derivations for an input string. This
means that a GLR parser may need to keep track of multiple parser states
concurrently. To construct a directly-executable GLR parser, one would need to
maintain multiple CPU stacks and program counters. Certainly this is possible,
but the overhead of maintaining these stacks and switching between them frequently
would be prohibitive, at least on a uniprocessor architecture.
Once direct execution of GLR parsers is ruled out, the obvious approach is to
speed up table-driven LR (and thereby GLR) parsers. Looking at the LR parsing
algorithm and its operation, one source of improvement would be to reduce the
reliance on the stack. Fewer stack operations would mean less overhead, resulting
in a faster parser.
The ideal situation, of course, is to have no stack at all! This would mean us-
ing finite automata to parse context-free languages, which is theoretically impos-
sible [15]. Instead, we approximate the ideal situation. Our LR parsing method
is thus an analogue of the GLR algorithm: it uses efficient finite automata as long as
possible, falling back on the stack when necessary.
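One way to picture such a hybrid (a schematic of our own, not the construction the following sections develop): an automaton whose ordinary transitions need no stack at all, with pushes and pops confined to a few designated edges.

```python
# Schematic hybrid recognizer (our illustration): transitions are
# plain DFA moves except a few that also push or pop a state.
def hybrid_run(delta, start, accepting, tokens):
    stack, state = [], start
    for sym in tokens:
        if (state, sym) not in delta:
            return False
        nxt, op = delta[(state, sym)]
        if op == "pop":
            if not stack:
                return False
            state = stack.pop()      # resume the pushed state
        else:
            if op is not None:       # op is ("push", s)
                stack.append(op[1])
            state = nxt
    return state in accepting and not stack

# recognizes { '('*k + 'x' + ')'*k } -- only the parenthesis
# transitions touch the stack; runs on 'x'*, '+' etc. would not
delta = {('S', '('): ('S', ("push", 'A')),
         ('S', 'x'): ('A', None),
         ('A', ')'): (None, "pop")}
```

For inputs with little nesting, nearly all work happens in the stackless DFA moves; the stack is consulted only at the edges that truly need it.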
3.1 Limit Points
Our first step is to modify the grammar. When using an LR parser, the usual
heuristic is to prefer left recursion in the grammar when possible; left recursion
yields a shallow stack, because a handle is accumulated atop the stack and is
reduced away immediately.
Non-left recursion may be ill-advised in regular LR parsers, but it is anathema
to our method. For reasons discussed in the next section, we set "limit points" in
the grammar where non-left recursion appears. A limit point is set by replacing a
nonterminal A in the right-hand side of a grammar rule (a nonterminal causing
recursion) with the terminal symbol ⊥A.
Figure 2 shows a simplified grammar for arithmetic expressions and the limit
point that is set in it. The rules of the resulting grammar are numbered for later
reference.
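The substitution itself is mechanical. A minimal sketch (the grammar encoding below is our own illustration, using Figure 2's grammar):

```python
# A grammar rule is a (lhs, rhs) pair. Setting a limit point
# replaces one nonterminal occurrence in a right-hand side with
# the corresponding terminal symbol ⊥A.
def set_limit_point(rules, rule_no, sym_no):
    lhs, rhs = rules[rule_no]
    limited = rhs[:sym_no] + ['⊥' + rhs[sym_no]] + rhs[sym_no + 1:]
    return rules[:rule_no] + [(lhs, limited)] + rules[rule_no + 1:]

# the expression grammar of Figure 2
rules = [('S', ['E', '$']),          # 0
         ('E', ['E', '+', 'F']),     # 1
         ('E', ['F']),               # 2
         ('F', ['(', 'E', ')']),     # 3
         ('F', ['n'])]               # 4

limited = set_limit_point(rules, 3, 1)   # rule 3 becomes F -> ( ⊥E )
```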
The process of finding limit points is admittedly not always straightforward.
In general, there can be many places that limit points can be placed to break
a cycle of recursion in a grammar. We will eventually be resorting to use of the
stack when we reach a limit point during parsing, so it is important to try to
find a solution which minimizes the number of limit points, both statically and
dynamically.
For large grammars, it becomes difficult to select appropriate limit points by
hand. The problem of finding limit points automatically can be modelled using
the feedback arc set (FAS) problem [20]. Unfortunately, the FAS decision prob-
lem is NP-complete [9], and the corresponding optimization problem (finding
the minimal FAS) is NP-hard [5]. There are, however, heuristic algorithms for
the problem. We have used the algorithm from [4] due to its relative simplicity.
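A sketch of that heuristic (a simplified rendering of the greedy algorithm of Eades, Lin and Smyth [4], with a graph encoding of our own): repeatedly peel off sinks and sources, otherwise greedily remove the vertex maximizing out-degree minus in-degree; edges pointing backwards in the resulting vertex sequence form the feedback arc set.

```python
# Simplified greedy feedback-arc-set heuristic: order the vertices
# so that few edges point "backwards"; those backward edges (plus
# any self-loops) form the feedback arc set.
def greedy_fas(vertices, edges):
    succ = {v: set() for v in vertices}
    pred = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:                   # self-loops are always feedback
            succ[u].add(v)
            pred[v].add(u)
    remaining = set(vertices)
    left, right = [], []             # s1 and s2 in the paper

    def delete(v):
        remaining.remove(v)
        for w in succ[v]:
            pred[w].discard(v)
        for w in pred[v]:
            succ[w].discard(v)

    while remaining:
        progress = True
        while progress:              # peel off sinks and sources
            progress = False
            for v in [v for v in remaining if not succ[v]]:
                delete(v)
                right.append(v)
                progress = True
            for v in [v for v in remaining if not pred[v]]:
                delete(v)
                left.append(v)
                progress = True
        if remaining:                # break a cycle greedily
            v = max(remaining, key=lambda x: len(succ[x]) - len(pred[x]))
            delete(v)
            left.append(v)
    order = left + right[::-1]
    pos = {v: k for k, v in enumerate(order)}
    fas = [(u, v) for u, v in edges if u == v or pos[u] > pos[v]]
    return order, fas
```

For the grammar application, the vertices are nonterminals, the edges are the (non-left) recursion dependencies, and each feedback arc indicates an occurrence to replace with a ⊥A limit point.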
The number of limit points obtained for various programming-language gram-
mars is shown in Table 1. It is important to remember that these results were
computed using a heuristic algorithm, and that the actual number of limit points
required may be lower. For example, starting with the computed limit points,
hand experimentation revealed that no more than twelve limit points are needed
for the Modula-2 grammar.

    0  S → E $          0  S → E $
    1  E → E + F        1  E → E + F
    2  E → F            2  E → F
    3  F → ( E )        3  F → ( ⊥E )
    4  F → n            4  F → n

Fig. 2. Expression grammar and limit points
[Figure: finite automata for the limited grammar, with shift transitions on (, n, ), $, + and ⊥E, and reduce actions for rules 1-4.]
4.2 Results
We performed some timing experiments to compare a standard GLR parser with
our modified GLR parser. As a basis for comparison, we used the public domain
[Figure: automaton for the limited grammar extended with stack actions ("push 10", "pop") at the limit point.]
[Figures: parse time in seconds (log scale) versus number of input symbols.]
References
1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and
   Tools. Addison-Wesley, 1986.
2. J. Aycock. Faster Tomita Parsing. MSc thesis, University of Victoria, 1998.
3. A. Bhamidipaty and T. A. Proebsting. Very Fast YACC-Compatible Parsers (For
   Very Little Effort). Technical Report TR 95-09, Department of Computer Science,
   University of Arizona, 1995.
4. P. Eades, X. Lin, and W. F. Smyth. A fast and effective heuristic for the feedback
   arc set problem. Information Processing Letters, 47:319–323, 1993.
5. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the
   Theory of NP-Completeness. W. H. Freeman, 1979.
6. D. Grune and C. J. H. Jacobs. Parsing Techniques: A Practical Guide. Ellis
   Horwood, 1990.
7. R. N. Horspool. Recursive Ascent-Descent Parsing. Journal of Computer Lan-
   guages, 18(1):1–16, 1993.
8. R. N. Horspool and M. Whitney. Even Faster LR Parsing. Software, Practice and
   Experience, 20(6):515–535, 1990.
9. R. M. Karp. Reducibility Among Combinatorial Problems. In R. E. Miller and
   J. W. Thatcher, editors, Complexity of Computer Computations, pages 85–103.
   Plenum Press, 1972.
10. J. R. Kipps. GLR Parsing in Time O(n³). In Tomita [24], pages 43–59.
11. D. E. Knuth. On the Translation of Languages from Left to Right. Information
    and Control, 8:607–639, 1965.
12. D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching.
    Addison-Wesley, 1973.
13. F. E. J. Kruseman Aretz. On a Recursive Ascent Parser. Information Processing
    Letters, 29:201–206, 1988.
14. R. Leermakers. Recursive ascent parsing: from Earley to Marcus. Theoretical
    Computer Science, 104:299–312, 1992.
15. H. R. Lewis and C. H. Papadimitriou. Elements of the Theory of Computation.
    Prentice-Hall, 1981.
16. T. J. Pennello. Very Fast LR Parsing. In Proceedings of the SIGPLAN '86 Symposium
    on Compiler Construction, volume 21(7) of ACM SIGPLAN Notices, pages 145–151,
    1986.
17. P. Pfahler. Optimizing Directly Executable LR Parsers. In Compiler Compilers,
    Third International Workshop, CC '90, pages 179–192. Springer-Verlag, 1990.
18. G. H. Roberts. Recursive Ascent: An LR Analog to Recursive Descent. ACM
    SIGPLAN Notices, 23(8):23–29, 1988.
19. G. H. Roberts. Another Note on Recursive Ascent. Information Processing Letters,
    32:263–266, 1989.
20. E. Speckenmeyer. On Feedback Problems in Digraphs. In Graph-Theoretic Con-
    cepts in Computer Science, pages 218–231. Springer-Verlag, 1989.
21. M. Tomita. An Efficient Context-Free Parsing Algorithm for Natural Languages
    and Its Applications. PhD thesis, Carnegie-Mellon University, 1985.
22. M. Tomita. Efficient Parsing for Natural Language. Kluwer Academic, 1986.
23. M. Tomita. An Efficient Augmented-Context-Free Parsing Algorithm. Computa-
    tional Linguistics, 13(1–2):31–46, 1987.
24. M. Tomita, editor. Generalized LR Parsing. Kluwer Academic, 1991.