pyparsing: setResultsName for multiple elements get combined

Question

Here is the text I'm parsing:

x ~ normal(mu, 1)
y ~ normal(mu2, 1)

The parser matches those lines using:

model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression).setResultsName('model_definition')

The problem is that when there are two model definitions, they aren't named separately in the ParseResults object.

It looks like the first one gets overridden by the second. The reason I'm naming them is to make executing the lines easier - this way I (hopefully) don't have to figure out what is going on at evaluation time - the parser has already labelled everything. How can I get both model_definitions labelled? It would be nice if model_definition held a list of every model definition found.

Just in case, here is some more of my code:

model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression).setResultsName('model_definition')
expression << Or([function_application, number, identifier, list_literal, probability_expression])
statement = Optional(newline) + Or([model_definition, assignment, function_application]) + Optional(newline)
line = OneOrMore('\n').suppress()
comment = Group('#' + SkipTo(newline)).suppress()
program = OneOrMore(Or([line, statement, comment]))
ast = program.parseString(input_string)
return ast

Answer 1:

Not documented that I know of, but I found something in pyparsing.py:

I changed .setResultsName('model_definition') to .setResultsName('model_definition*') and they listed correctly!

Edit: it is documented, but it is a flag you pass to setResultsName:

setResultsName( string, listAllMatches=False ) - name to be given to tokens matching the element; if multiple tokens within a repetition group (such as ZeroOrMore or delimitedList) the default is to return only the last matching token - if listAllMatches is set to True, then a list of matching tokens is returned.
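
A minimal sketch of what the flag changes, assuming pyparsing is installed and using a simplified stand-in grammar (`name ~ number`) instead of the asker's full `expression`. The names `definition`, `last_only` and `all_matches` are illustrative only:

```python
from pyparsing import Group, OneOrMore, Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums)
number = Word(nums)

# Simplified stand-in for the question's model_definition: "name ~ number"
definition = Group(identifier.setResultsName('random_variable_name') + '~' + number)

# Default behaviour: inside a repetition, the name keeps only the last match
last_only = OneOrMore(definition.setResultsName('model_definition'))

# listAllMatches=True: the name accumulates every match
all_matches = OneOrMore(
    definition.setResultsName('model_definition', listAllMatches=True)
)

text = "x ~ 1 y ~ 2"
last = last_only.parseString(text)
every = all_matches.parseString(text)

# Only the final definition is reachable by name here
last_name = last['model_definition']['random_variable_name']

# Every definition is reachable by name here, in input order
all_names = [d['random_variable_name'] for d in every['model_definition']]

print(last_name)
print(all_names)
```

With the default, `model_definition` refers to the last definition only; with `listAllMatches=True` it can be iterated like a list of definitions.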

Answer 2:

Here is enough of your code to get things to work:

from pyparsing import *

# fake in the bare minimum to parse the given test strings
identifier = Word(alphas, alphanums)
integer = Word(nums)
function_call = identifier + '(' + Optional(delimitedList(identifier | integer)) + ')'
expression = function_call

model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression)

sample = """
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
"""

The trailing '*' is there in setResultsName for those cases where you use the short form of setResultsName: expr("name*") vs expr.setResultsName("name", listAllMatches=True). If you prefer calling setResultsName, then I would not use the '*' notation, but would pass the listAllMatches argument.
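
As a small illustration of that equivalence, the sketch below builds the same parser both ways and compares the named results. It assumes pyparsing is installed; the `word` expression and the result name `'word'` are made up for the example:

```python
from pyparsing import OneOrMore, Word, alphas

word = Word(alphas)

# Short form: calling the expression, with a trailing '*' in the name
short_form = OneOrMore(word('word*'))

# Long form: explicit method call with the listAllMatches argument
long_form = OneOrMore(word.setResultsName('word', listAllMatches=True))

short_words = short_form.parseString('a b c')['word'].asList()
long_words = long_form.parseString('a b c')['word'].asList()

print(short_words)
print(long_words)
```

Both parsers should collect all three words under the same name, so the choice between the two spellings is a matter of style.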

If you are getting names that step on each other, you may need to add a level of Grouping. Here is your solution using listAllMatches=True, by virtue of the trailing '*' notation:

model_definition1 = model_definition('model_definition*')
print(OneOrMore(model_definition1).parseString(sample).dump())

It returns this parse result:

[['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
- model_definition: [['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
  [0]:
    ['x', '~', 'normal', '(', 'mu', '1', ')']
    - random_variable_name: x
  [1]:
    ['y', '~', 'normal', '(', 'mu2', '1', ')']

Here is a variation that does not use listAllMatches, but adds another level of Group:

model_definition2 = model_definition('model_definition')
print(OneOrMore(Group(model_definition2)).parseString(sample).dump())

gives:

[[['x', '~', 'normal', '(', 'mu', '1', ')']], [['y', '~', 'normal', '(', 'mu2', '1', ')']]]
[0]:
  [['x', '~', 'normal', '(', 'mu', '1', ')']]
  - model_definition: ['x', '~', 'normal', '(', 'mu', '1', ')']
    - random_variable_name: x
[1]:
  [['y', '~', 'normal', '(', 'mu2', '1', ')']]
  - model_definition: ['y', '~', 'normal', '(', 'mu2', '1', ')']
    - random_variable_name: y
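
For the evaluation step mentioned in the question, one possible way to walk the extra-Group variation is sketched below. It reuses the bare-minimum grammar from this answer and assumes pyparsing is installed; the names `wrapped` and `names` are illustrative, not part of the original code:

```python
from pyparsing import (Group, OneOrMore, Optional, Word, alphas, alphanums,
                       delimitedList, nums)

# Same bare-minimum grammar as in the answer
identifier = Word(alphas, alphanums)
integer = Word(nums)
function_call = identifier + '(' + Optional(delimitedList(identifier | integer)) + ')'
model_definition = Group(
    identifier.setResultsName('random_variable_name') + '~' + function_call
)

sample = """
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
"""

# Each statement gets its own outer Group, so the names do not collide
wrapped = OneOrMore(Group(model_definition('model_definition')))
result = wrapped.parseString(sample)

# Each top-level item carries its own 'model_definition' entry
names = [item['model_definition']['random_variable_name'] for item in result]
print(names)
```

Here each statement is visited in order and its labels are looked up by name, without inspecting token positions.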

In both cases, I see the full content being returned, so I don't quite understand what you mean by "if you return multiple, it fails to split out each child."

Source: https://stackoverflow.com/questions/37329296/pyparsing-setresultsname-for-multiple-elements-get-combined

Tags

python

pyparsing