Builders¶
Builders are an advanced way to process a ParseForest. Builders can take advantage of the shared representation the
parse trees inside a forest, and choose how to handle ambiguity.
To use builders, you must define your own builder class inheriting from Builder:
class MyBuilder(Builder):
def start_rule(self, context):
...
def end_rule(self, context, prev_value):
...
def terminal(self, context, token):
...
def skip_optional(self, context, prev_value):
...
def begin_multiple(self, context, prev_value):
...
def end_multiple(self, context, prev_value):
...
def extend(self, context, prev_value, extension_value):
...
Then you can run your builder against a parse forest using ParseForest.apply. The parse forest will then invoke
methods on your builder as it walks over the possible parse tree. Each method is given some context, and the
value currently being built, and returns a new value updated for what occured in the parse.
In this way, you can build a complete picture of the parse tree, one step at a time.
The passed context is a BuilderContext with fields rule,
symbol_index, start_index and
end_index that give details about where in the rule and where in the token stream
this invocation is occuring.
First apply will call start_rule for the matched rule. The result from that is passed to the other methods.
start_ruleis called at the start of each parsed rule.end_ruleis called at the start of each parsed rule.terminalis called when a terminal is parsed.extendis called when a given symbol in a rule has been parsed. It is passed both the previous value for that rule and the extension_value describing what was parsed for that symbol.skip_optionalis called in place ofextendwhen anoptionalsymbol is skipped over.begin_multipleandend_multipleare caused when astarorplussymbol is first encountered and left. Between them, any number ofextendcalls may be made, all corresponding to the same symbol.
You may find it easier to study the definitions of CountingBuilder and SingleParseTree builder, which are
internal classes used for implementing ParseForest.count() and ParseForest.single(), as they are both
fairly straightforward. SingleParseTree can be easily adapted to building arbitrary abstract syntax trees,
or performing other semantic actions according to the parse.
Example¶
For example, suppose that the parse forest only contains a single parse tree, which looks like this:
(rule 1: "a" (rule 2: "b") "c")
In other words, we’ve parsed the token stream ["a", "b", "c"] with the following grammar:
rule1 = ParseRule("rule 1", [T("a"), NT("rule 2"), T("c")])
rule2 = ParseRule("rule 2", [T("b")])
Then the following methods would get invoked during apply (though not necessary in this order):
v1 = builder.start_rule({rule2, 0})
v2 = builder.terminal({rule2, 0}, 'b')
v3 = builder.extend({rule2, 0}, v1, v2)
v4 = builder.end_rule({rule2, 1}, v3)
v5 = builder.start_rule({rule1, 0})
v6 = builder.terminal({rule1, 0}, 'a')
v7 = builder.extend({rule1, 0}, v5, v6)
v8 = builder.extend({rule1, 1}, v7, v4)
v9 = builder.terminal({rule1, 1}, 'c')
v10 = builder.extend({rule1, 2}, v8, v9)
v11 = builder.end_rule({rule1, 3}, v10)
Ambiguity¶
When there are multiple possible parse trees, sequences of builder results that are shared between parse trees will only get invoked once, then stored and re-used. This is why some context is omitted from builder methods, as the call may be used in several contexts. This is also why it is important not to mutate the passed in prev_value, as it may be used in other contexts. You should always return a fresh value that represents whatever change you need to make to prev value.
The easiest way to handle amiguity is to use utility methods make_list_builder and make_iter_builder. These methods
accept a builder with no ambiguity handling, and returns a new builder that simply treats every possible parse tree
independently, and return a list or iterable respectively. They directly correspond to the ParseForest.all and
ParseForest.__iter__ methods, which include some additional details.
If you do wish to directly handle ambiguity. You must override either the merge method, or both the
merge_horizontal and merge_vertical methods. All these methods work the same way: you are passed a list of values
that each represent an alternative parse of the same sequence of tokens for the same parse rule or symbol, and you
must return a single value aggregating them.
A merge is “vertical”, and calls merge_vertical when there are multiple possible ParseRule objects with the same
head that match the same sequence of tokens. The BuilderContext indicates the non terminal symbol they both match.
Conversely, merge_horizontal is called when there are multiple possible parses for a single ParseRule. In most use
cases, these methods will share the same implementation, so you are free to override merge instead.
Here is an example of the call sequence for an ambiguous parse of ["hello"] by grammar:
rule1 = ParseRule("sentence", [T("hello")])
rule2 = ParseRule("sentence", [T("hello")])
v1 = builder.start_rule({rule1, 0})
v2 = builder.terminal({rule1, 0}, 'hello')
v3 = builder.extend({rule1, 0}, v1, v2)
v4 = builder.end_rule({rule1, 1}, v3)
v5 = builder.start_rule({rule2, 0})
v6 = builder.terminal({rule2, 0}, 'hello')
v7 = builder.extend({rule2, 0}, v5, v6)
v8 = builder.end_rule({rule2, 1}, v7)
v9 = builder.merge_vertical({None, 0}, [v8, v4])
(Note that in this special case where the top level symbol itself is ambiguous, then None is passed in as the rule
being merged).
Here’s another example, ambiguously parsing ["a"]:
sentence = ParseRule("sentence", [NT("X"),NT("Y")])
X = ParseRule("X", [T("a", optional=True)])
Y = ParseRule("Y", [T("a", optional=True)])
v1 = builder.start_rule({Y, 0}) # After token 0
v2 = builder.skip_optional({Y, 0}, v1)
v3 = builder.end_rule({Y, 1}, v2)
v4 = builder.start_rule({X, 0})
v5 = builder.terminal({X, 0}, 'a')
v6 = builder.extend({X, 0}, v4, v5)
v7 = builder.end_rule({X, 1}, v6)
v8 = builder.start_rule({sentence, 0})
v9 = builder.extend({sentence, 0}, v8, v7)
v10 = builder.start_rule({Y, 0}) # Before token 0
v11 = builder.terminal({Y, 0}, 'a')
v12 = builder.extend({Y, 0}, v10, v11)
v13 = builder.end_rule({Y, 1}, v12)
v14 = builder.skip_optional({X, 0}, v4)
v15 = builder.end_rule({X, 1}, v14)
v16 = builder.extend({sentence, 0}, v8, v15)
v17 = builder.extend({sentence, 1}, v9, v3)
v18 = builder.extend({sentence, 1}, v16, v13)
v19 = builder.merge_horizontal({sentence, 2}, [v17, v18])
v20 = builder.end_rule({sentence, 2}, v19)
The two above examples give a visual indication of the terminology “vertical” and “horizontal”. In the first,
rule1 and rule2 are ambiguous and in vertically column in the grammar definition. In the second, X and
Y are ambiguous, and are horizontally next to each other in a single grammar rule.