Errors and Edge Cases¶
There are 3 possible ways that parsing can fail. All of them raise subclasses of ParseError
. All instances
of ParseError
contain a message
, and fields
start_index
, end_index
indicating where in
the token stream the error is occurring.
No parse¶
When the parser can find no possible parse for a token stream, it raises NoParseError
. The location will be the
furthest the parse got before there were no possible parses. Additional fields are included:
encountered
- The token we failed at, orNone
if the end of the stream was reached.expected_terminals
- All the terminals that were evaluated against encountered (and failed)expected
- Similar toexpected_terminals
, except the parser includes any non-terminals that could have started at the encountered token, and hides any terminals or non-terminals that are implicitly covered another one. This is usually a higher level summary of what was expected at any given point.
Note, you can override method ParseRuleSet.is_anonymous()
to return true for some heads. Any anonymous rule
will never be eligible to appear in expected
. This is useful if you are transforming or generating the grammar,
and some rules don’t make sense to report.
Ambiguous Parse¶
Ambiguous parses are not an immediate error. The parse
function simply returns a forest which contains all
possible parse trees. However, if you call ParseForest.single
and there is ambiguity, then AmbiguousParseError
will be thrown. It will indicate the earliest possible ambiguity, but there may be others. The error will contain a
field called values
containing the possible alternatives parses. However, it only contains a subtree of the full parse
tree, and additionally, it may be only halfway through building a rule, so the subtree may be missing elements.
These limitations ensure that values
is a short list. It is recommended you do not use ParseForest.single
if
you need more detail on ambiguity.
Infinite Parse¶
In some obscure grammars it is possible to define rules that have an infinite number of possible parses. Here is a simple example:
ParseRule("s", [NT("s")]
ParseRule("s", [T("word")]
When parsing ["word"]
, all the following are valid parse trees:
(s: word)
(s: (s: word))
(s: (s: (s: word)))
...
In this circumstance, parse
throws InfiniteParseError
. You can avoid this error with the right use of greedy and
penalty settings as they are evaluated before checking for infinite parses.
It’s possible to improve support for infinite parses if there is demand. Let me know.