The parser works on a stream of tokens. Tokens can be any Python object; they are not expected to have any particular
behaviour, though you may want to provide useful
__str__ methods to give better error messages.
The parse function works just as effectively on a stream of
bytes or a stream of single characters
(a scannerless parser). A common technique is for the
lexer to produce some sort of Token object that includes the matched text string and additional annotations.
For example, the Natural Language Toolkit can mark each token with the relevant part of speech.
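The paragraph above can be sketched as a small token class. This is purely illustrative (the class name and fields are not part of the library's API); it only assumes what the text states: tokens are arbitrary Python objects, a __str__ method improves error messages, and the default Terminal matches by equality, so the token compares equal to its text.

```python
# A hypothetical lexer token: text plus an annotation, e.g. a part of speech.
class Token:
    def __init__(self, text, pos_tag=None):
        self.text = text        # the matched source text
        self.pos_tag = pos_tag  # optional annotation, e.g. "NOUN"

    def __eq__(self, other):
        # Compare by text so a Terminal that matches by equality
        # can match this token against a plain string.
        if isinstance(other, Token):
            return self.text == other.text
        return self.text == other

    def __hash__(self):
        return hash(self.text)

    def __str__(self):
        # A readable form gives better parser error messages.
        if self.pos_tag:
            return f"{self.text!r} ({self.pos_tag})"
        return repr(self.text)

tokens = [Token("the", "DET"), Token("cat", "NOUN")]
```

Because __eq__ falls back to comparing against the raw text, the same grammar works whether the stream carries annotated tokens or bare strings.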
Symbols are objects used to define the right-hand side of a
ParseRule production. Two Symbols,
Terminal and NonTerminal, are provided in the
symbols module, but anything that duck-types the same as these can be used.
This is mostly useful for redefining
Terminal.match, the method responsible for determining whether
a given token matches the terminal. The default
Terminal class matches by equality, but you may, for example,
have terminals that match entire classes of tokens.
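A minimal sketch of the override described above. The base Terminal here is a stand-in that matches by equality, mirroring the documented default rather than the library's actual implementation; the subclass matches a whole class of tokens (integer literals) instead of a single value.

```python
# Stand-in for the library's default Terminal: matches by equality.
class Terminal:
    def __init__(self, value):
        self.value = value

    def match(self, token):
        return token == self.value

class IntegerTerminal(Terminal):
    """Matches any token that looks like an integer literal,
    rather than one specific token value."""
    def __init__(self):
        super().__init__(None)

    def match(self, token):
        return isinstance(token, str) and token.lstrip("-").isdigit()

NUMBER = IntegerTerminal()
```

One IntegerTerminal in the grammar then stands for every integer token, where the default class would need a separate Terminal per value.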
There is no way to customize the
ParseTree class, but you can avoid using it entirely by writing your own
Builder. Builders specify a semantic action to take at each step of the parse, allowing you to build your own
parse trees or abstract syntax trees directly from a
ParseForest. See Builders
for more details.
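As a rough illustration of the idea, a builder can accumulate nested lists instead of ParseTree nodes. The callback names below (begin_rule, terminal, end_rule) are hypothetical, not the library's real Builder interface; consult the Builders documentation for the actual methods.

```python
class ListBuilder:
    """Illustrative builder: produces nested Python lists,
    one list per completed rule, instead of ParseTree objects."""
    def __init__(self):
        self.stack = []
        self.result = None

    def begin_rule(self, rule_name):
        # Hypothetical callback: a rule application starts.
        self.stack.append([rule_name])

    def terminal(self, token):
        # Hypothetical callback: a terminal was matched.
        self.stack[-1].append(token)

    def end_rule(self):
        # Hypothetical callback: the current rule is complete.
        node = self.stack.pop()
        if self.stack:
            self.stack[-1].append(node)
        else:
            self.result = node
```

Driving the callbacks by hand for a single rule application shows the shape of the output:

```python
b = ListBuilder()
b.begin_rule("sum")
b.terminal("1")
b.terminal("+")
b.terminal("2")
b.end_rule()
# b.result is now ["sum", "1", "+", "2"]
```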
You can override
ParseRuleSet.get with anything that returns a list of
ParseRule objects. Because no
preprocessing is done on the rules, you can generate a grammar on the fly. This feature lets you parse
context-sensitive grammars by passing any relevant context as part of the head and adjusting the non-terminals
of the returned rules to forward that context on. Without care, this will probably lead to
very long parse times.
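The technique above can be sketched as follows. ParseRule here is a stand-in (a head plus a list of right-hand-side symbols), and the context is a count encoded in the head name, e.g. "items_3" for "exactly three items"; each generated rule forwards the remaining count through its non-terminal.

```python
# Stand-in for the library's ParseRule: a head and its right-hand side.
class ParseRule:
    def __init__(self, head, symbols):
        self.head = head
        self.symbols = symbols

class CountedListRules:
    """Hypothetical rule set generating rules on demand: a head of the
    form 'items_N' expands to exactly N 'item' terminals."""
    def get(self, head):
        if head.startswith("items_"):
            n = int(head.split("_")[1])
            if n == 0:
                # Base case: an empty production.
                return [ParseRule(head, [])]
            # Forward the remaining count as part of the next head.
            return [ParseRule(head, ["item", f"items_{n - 1}"])]
        return []
```

Because get is called with whatever head the grammar mentions, no table of rules is built up front; each head is expanded only when the parser asks for it.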