Question
Here is the text I'm parsing:
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
The parser matches those lines using:
model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression).setResultsName('model_definition')
The problem is that when there are two model definitions, they aren't named separately in the ParseResults object:
It looks like the first one gets overridden by the second. The reason I'm naming them is to make executing the lines easier: this way I (hopefully) don't have to figure out what is going on at evaluation time, because the parser has already labelled everything. How can I get both model_definitions labelled? It would be nice if model_definition held a list of every model definition found.
Just in case, here is some more of my code:
model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression).setResultsName('model_definition')
expression << Or([function_application, number, identifier, list_literal, probability_expression])
statement = Optional(newline) + Or([model_definition, assignment, function_application]) + Optional(newline)
line = OneOrMore('\n').suppress()
comment = Group('#' + SkipTo(newline)).suppress()
program = OneOrMore(Or([line, statement, comment]))
ast = program.parseString(input_string)
return ast
Answer 1:
Not documented that I know of, but I found something in pyparsing.py:
I changed .setResultsName('model_definition') to .setResultsName('model_definition*') and both definitions were listed correctly!
Edit: it is documented, but as a flag you pass to setResultsName:
setResultsName( string, listAllMatches=False ) - name to be given to tokens matching the element; if multiple tokens within a repetition group (such as ZeroOrMore or delimitedList) the default is to return only the last matching token - if listAllMatches is set to True, then a list of matching tokens is returned.
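To make the difference concrete, here is a minimal sketch comparing the default behaviour with listAllMatches=True. The grammar is a hypothetical, stripped-down stand-in for the one in the question (a name, a '~', and a number), so only the naming behaviour is being illustrated.

```python
from pyparsing import Group, OneOrMore, Word, alphas, alphanums, nums

# Hypothetical simplified grammar: "<name> ~ <number>" (not the asker's full grammar)
identifier = Word(alphas, alphanums)
number = Word(nums)
definition = Group(identifier.setResultsName("random_variable_name") + "~" + number)

text = "x ~ 1\ny ~ 2"

# Default (listAllMatches=False): looking up the name returns only the last match.
last_only = OneOrMore(
    definition.setResultsName("model_definition")
).parseString(text)

# listAllMatches=True: looking up the name returns every match, in order.
all_matches = OneOrMore(
    definition.setResultsName("model_definition", listAllMatches=True)
).parseString(text)

print(last_only["model_definition"].asList())    # ['y', '~', '2']
print(all_matches["model_definition"].asList())  # [['x', '~', '1'], ['y', '~', '2']]
```

Both definitions are still present in the token list in the first case; only the lookup by name is limited to the last one.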
Answer 2:
Here is enough of your code to get things to work:
from pyparsing import *
# fake in the bare minimum to parse the given test strings
identifier = Word(alphas, alphanums)
integer = Word(nums)
function_call = identifier + '(' + Optional(delimitedList(identifier | integer)) + ')'
expression = function_call
model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression)
sample = """
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
"""
The trailing '*' is there in setResultsName for those cases where you use the short form of setResultsName: expr("name*") vs expr.setResultsName("name", listAllMatches=True). If you prefer calling setResultsName, then I would not use the '*' notation, but would pass the listAllMatches argument.
If you are getting names that step on each other, you may need to add a level of Grouping. Here is your solution using listAllMatches=True, by virtue of the trailing '*' notation:
model_definition1 = model_definition('model_definition*')
print(OneOrMore(model_definition1).parseString(sample).dump())
It returns this parse result:
[['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
- model_definition: [['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
[0]:
['x', '~', 'normal', '(', 'mu', '1', ')']
- random_variable_name: x
[1]:
['y', '~', 'normal', '(', 'mu2', '1', ')']
Here is a variation that does not use listAllMatches, but adds another level of Group:
model_definition2 = model_definition('model_definition')
print(OneOrMore(Group(model_definition2)).parseString(sample).dump())
gives:
[[['x', '~', 'normal', '(', 'mu', '1', ')']], [['y', '~', 'normal', '(', 'mu2', '1', ')']]]
[0]:
[['x', '~', 'normal', '(', 'mu', '1', ')']]
- model_definition: ['x', '~', 'normal', '(', 'mu', '1', ')']
- random_variable_name: x
[1]:
[['y', '~', 'normal', '(', 'mu2', '1', ')']]
- model_definition: ['y', '~', 'normal', '(', 'mu2', '1', ')']
- random_variable_name: y
In both cases, I see the full content being returned, so I don't quite understand what you mean by "if you return multiple, it fails to split out each child."
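Since the goal in the question was to make evaluation easier, here is a rough sketch of how either result could be walked afterwards to pull out each random variable name. It reuses the minimal grammar from this answer; the variable names result1, result2, names1 and names2 are only illustrative.

```python
from pyparsing import (Group, OneOrMore, Optional, Word, alphanums, alphas,
                       delimitedList, nums)

# Same bare-minimum grammar as in the answer above
identifier = Word(alphas, alphanums)
integer = Word(nums)
function_call = identifier + "(" + Optional(delimitedList(identifier | integer)) + ")"
model_definition = Group(
    identifier.setResultsName("random_variable_name") + "~" + function_call
)

sample = """
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
"""

# Variant 1: the trailing '*' collects every match under one name.
result1 = OneOrMore(model_definition("model_definition*")).parseString(sample)
names1 = [d["random_variable_name"] for d in result1["model_definition"]]

# Variant 2: an extra Group per statement keeps each name in its own scope.
result2 = OneOrMore(Group(model_definition("model_definition"))).parseString(sample)
names2 = [stmt["model_definition"]["random_variable_name"] for stmt in result2]

print(names1)
print(names2)
```

With the first variant all definitions are reachable from one top-level name; with the second, each statement is a separate group, which may be easier to extend if assignments and other statement types are added later.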
Source: https://stackoverflow.com/questions/37329296/pyparsing-setresultsname-for-multiple-elements-get-combined