I'm trying to write a parser that handles some syntactic elements inside Markdown without affecting the rest of the document, and I've been running into a lot of performance problems with the grammar. While trying to convert it to LALR(1), I found that it has many ambiguities. The root cause is that much of the syntax uses characters that are also legal in other contexts. For instance, the phrase `` asdf `{some syntax}` `` could (in theory) be validly parsed either as a single string, or as the string token `asdf` followed by the rule corresponding to `{some syntax}`. There are also some multiline constructs, and the parser has to handle lists. Further complicating things, I want to preserve whitespace outside of the syntax, so ignoring it isn't really viable.
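One way to sidestep the "plain string vs. syntax rule" ambiguity is to stop asking the grammar to decide it. A small hand-written pre-lexer can cut each line into opaque text chunks and syntax chunks before any grammar runs. The sketch below is a swapped-in technique, not a feature of any particular parser library. It assumes the custom syntax is always a brace group wrapped in backticks, as in the example above. The names `SYNTAX_RE` and `tokenize_inline` are hypothetical.

```python
import re
from typing import List, Tuple

# ASSUMPTION: the custom syntax is a brace group wrapped in backticks,
# e.g. `{some syntax}`. SYNTAX_RE is a hypothetical name; adapt the
# pattern to the real syntax.
SYNTAX_RE = re.compile(r"`\{[^`{}]*\}`")


def tokenize_inline(line: str) -> List[Tuple[str, str]]:
    """Split a line into ("TEXT", raw) and ("SYNTAX", raw) chunks.

    The regex decides where the custom syntax starts and ends, so the
    grammar never has to choose between "plain string" and "syntax rule".
    Every character, including whitespace, lands in exactly one chunk,
    so joining the raw chunks reproduces the input.
    """
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for m in SYNTAX_RE.finditer(line):
        if m.start() > pos:
            # Text between syntax spans is passed through untouched.
            tokens.append(("TEXT", line[pos:m.start()]))
        tokens.append(("SYNTAX", m.group(0)))
        pos = m.end()
    if pos < len(line):
        tokens.append(("TEXT", line[pos:]))
    return tokens


print(tokenize_inline("asdf `{some syntax}`"))
# [('TEXT', 'asdf '), ('SYNTAX', '`{some syntax}`')]
```

With this split, only the `SYNTAX` chunks need a real grammar. That grammar no longer contains a catch-all free-text rule, which is typically where LALR(1) conflicts of this kind come from. The trade-off is that nested or multiline syntax would need a stateful scanner rather than a single regex.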
So, the grammar is slow, and there are quite a few issues similar to the one above. When I switched the parser to LALR(1), these ambiguities caused it to do things like treating a NEWLINE as part of the preceding paragraph rather than as the boundary between two list items. I'm kinda at my wit's end here: I have a grammar that works well, but it is incredibly slow with Earley and unusable with the other parsers. I'd appreciate any guidance on getting it to run at a usable speed.
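The NEWLINE problem is a block-structure decision, and it can be made outside the grammar. The CommonMark spec suggests parsing Markdown in two phases: block structure first, then inline content within each block. The following is a minimal sketch of that first phase. It recognizes only a simplified set of list markers, handles only lazy continuation lines, and is not a full CommonMark implementation. `LIST_ITEM_RE` and `split_blocks` are hypothetical names.

```python
import re
from typing import List, Tuple

# ASSUMPTION: list items start with "-", "*", "+" or "1." / "1)" followed
# by whitespace. This is a simplification of CommonMark's list rules.
LIST_ITEM_RE = re.compile(r"^(\s*)([-*+]|\d+[.)])\s")


def split_blocks(text: str) -> List[Tuple[str, str]]:
    """Group lines into ("PARAGRAPH" | "LIST_ITEM" | "BLANK", raw_text) blocks.

    Line boundaries are decided here, before any grammar runs, so a newline
    that starts a new list item cannot be absorbed into the previous block.
    Each block keeps its original text, including line endings.
    """
    blocks: List[Tuple[str, str]] = []
    for line in text.splitlines(keepends=True):
        if not line.strip():
            blocks.append(("BLANK", line))
        elif LIST_ITEM_RE.match(line):
            # A list marker always opens a new block.
            blocks.append(("LIST_ITEM", line))
        elif blocks and blocks[-1][0] in ("LIST_ITEM", "PARAGRAPH"):
            # Continuation line: append to the open list item or paragraph.
            kind, raw = blocks[-1]
            blocks[-1] = (kind, raw + line)
        else:
            # First non-blank, non-list line after a blank (or at the start).
            blocks.append(("PARAGRAPH", line))
    return blocks


doc = "Intro line\nstill intro\n- item one\n  wraps here\n- item two\n\nAfter list\n"
for kind, raw in split_blocks(doc):
    print(kind, repr(raw))
```

Each block's raw text could then be handed to whatever inline parser is used. That parser then only ever sees a single paragraph or list item, never the newline that separates two of them.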