Question
I have a text of the form name(sum(value1,sum(value2,value3)), "sumname")
and pyparsing returns the appropriate tokens. However, I want to get the original text back, and I cannot find out how.
I have tried setParseAction with a function, but since the function only receives the input string and the start location of the match, I cannot tell where the match ends. I only get the text from each match start to the end of the input:
"sum(value2,value3)), "sumname")"
"sum(value1,sum(value2,value3)), "sumname")"
"name(sum(value1,sum(value2,value3)), "sumname")"
This is not ideal: I do not want to reparse the string manually to recover the original text of each match.
What I am trying at the moment is:
tokens = grammar.parseString(target_string)
print >>sys.stderr, pyparsing.originalTextFor(tokens)
But this does not really work:
AttributeError: 'NoneType' object has no attribute 'setParseAction'
Answer 1:
Wrap your expression in the pyparsing helper originalTextFor. It takes a parser expression (not the parsed tokens) and must be applied before parsing.
from pyparsing import makeHTMLTags, originalTextFor
sample = '<tag attr1="A1" attr2="B3">'
openTag = makeHTMLTags('tag')[0]
# the expression returned by makeHTMLTags parses the tag and
# attributes into a list (along with a series of helpful
# results names)
print (openTag.parseString(sample).asList())
# prints
# ['tag', ['attr1', 'A1'], ['attr2', 'B3'], False]
# wrap in 'originalTextFor' to get back the original source text
print (originalTextFor(openTag).parseString(sample).asList())
# prints
# ['<tag attr1="A1" attr2="B3">']
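The same helper can be applied to a nested grammar like the one in the question. The following is only a sketch: the grammar (identifier, call, expr) and the matched_calls list are assumptions made up for illustration, since the question does not show its grammar. Wrapping the call expression itself in originalTextFor means each nested call reports exactly the text it matched, without the trailing part.

```python
from pyparsing import (Forward, Optional, Suppress, Word, alphanums, alphas,
                       delimitedList, originalTextFor, quotedString)

# Hypothetical collector for the source text of every call that is matched.
matched_calls = []

# Assumed grammar: a call is identifier(arg, arg, ...), where each argument
# is another call, a quoted string, or a plain identifier.
identifier = Word(alphas, alphanums + "_")
expr = Forward()
call = originalTextFor(
    identifier + Suppress("(") + Optional(delimitedList(expr)) + Suppress(")")
)
# This parse action runs after originalTextFor has replaced the tokens
# with the matched source text, so tokens[0] is that text.
call.addParseAction(lambda tokens: matched_calls.append(tokens[0]))
expr <<= call | quotedString | identifier

target = 'name(sum(value1,sum(value2,value3)), "sumname")'
result = expr.parseString(target)

print(result[0])           # text of the outermost call
for text in matched_calls:
    print(text)            # innermost call first, outermost last
```

Because the parse action fires as each call finishes matching, the inner calls are collected before the outer ones. If the structured tokens are needed as well, one option is to keep a second, unwrapped copy of the grammar for that purpose.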
Answer 2:
Depending on what you are trying to accomplish by getting the original matching text, you might find a better solution using scanString or transformString:
from pyparsing import makeHTMLTags, replaceWith
sample = '<other><div></div><tag attr1="A1" attr2="B3"><something>'
openTag = makeHTMLTags('tag')[0]
# grammar.scanString is a generator, yielding (tokens, start, end) tuples;
# from the start:end values you can slice the original text from the
# source string
for tokens, start, end in openTag.scanString(sample):
    print(tokens.dump())
    print(sample[start:end])
# if your goal in getting the original data is to do some kind of string
# replacement, use transformString - here we convert all <TAG> tags to <REPLACE> tags
print(openTag.setParseAction(replaceWith("<REPLACE>")).transformString(sample))
prints:
['tag', ['attr1', 'A1'], ['attr2', 'B3'], False]
- attr1: A1
- attr2: B3
- empty: False
- startTag: ['tag', ['attr1', 'A1'], ['attr2', 'B3'], False]
  - attr1: A1
  - attr2: B3
  - empty: False
  - tag: tag
- tag: tag
<tag attr1="A1" attr2="B3">
<other><div></div><REPLACE><something>
Source: https://stackoverflow.com/questions/17395193/how-to-get-the-original-text-back-from-a-pyparsing-token