Pyparsing OR operation: use the shortest string when more than one alternative matches

Submitted by 徘徊边缘 on 2019-12-12 23:40:57

Question


I need to parse some statements, but I want the flexibility of using several different tokens to signal the end of a statement.

For example:

string = """
start some statement end
other stuff in between
start some other statement.
other stuff in between
start another statement
"""

In this case, 'end', '.', and the end of the line are the tokens that signal the end of the statement I am looking for.

I tried the following:

from pyparsing import restOfLine, SkipTo

skip_to_end_of_line = restOfLine
skip_to_dot = SkipTo('.', include=False)
skip_to_end = SkipTo('end', include=False)

statement = 'start' + skip_to_end_of_line^skip_to_dot^skip_to_end

statement.searchString(string)

([(['start some statement end\nother stuff in between\nstart some other statement'], {}), (['start', ' another statement'], {})], {})

When I combine the alternatives with the OR operator (^), pyparsing returns the longest match whenever more than one alternative matches. I would like it to return the shortest match instead, which would give:

([(['start', ' some statement end'], {}), (['start', ' some other statement.'], {}), (['start', ' another statement'], {})], {})

Answer 1:


SkipTo is one of the less predictable features of pyparsing, because it is easy for the input data to cause more (or less) skipping than intended.
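If you would rather keep SkipTo, one way to get the "shortest match" the question asks for is to give a single SkipTo an alternation of all three terminators, instead of combining three separate SkipTo expressions with ^ (which keeps the longest alternative). SkipTo advances one character at a time and stops at the first position where any terminator matches, so the earliest terminator wins. This is a suggested alternative, not part of the original answer; the names text, terminator, body and results are illustrative, and the strip parse action is added because the text SkipTo skips can carry surrounding whitespace. Note also that in the question's code, + binds more tightly than ^, so 'start' + skip_to_end_of_line^skip_to_dot^skip_to_end groups as ('start' + skip_to_end_of_line) ^ skip_to_dot ^ skip_to_end.

```python
from pyparsing import LineEnd, SkipTo

# The question's sample input (renamed to avoid shadowing the string module).
text = """
start some statement end
other stuff in between
start some other statement.
other stuff in between
start another statement
"""

# One SkipTo whose target is an alternation of every terminator.
# SkipTo stops at the FIRST position where any alternative matches,
# which gives the shortest skip rather than the longest.
terminator = LineEnd() | '.' | 'end'
# Strip leading/trailing whitespace from the skipped text.
body = SkipTo(terminator).setParseAction(lambda toks: toks[0].strip())
statement = 'start' + body

results = [tokens.asList() for tokens in statement.searchString(text)]
for r in results:
    print(r)
```

As with the question's SkipTo('end'), the literal 'end' here would also match inside a longer word such as "friend".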

Try this instead:

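from pyparsing import LineEnd, OneOrMore, Word, alphas
# a statement ends at end-of-line, a '.', or the word 'end'
term = LineEnd().suppress() | '.' | 'end'
statement = 'start' + OneOrMore(~term + Word(alphas)) + term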

Instead of skipping blindly, this expression matches one word at a time (the ~term lookahead rejects any word that is actually a terminator) and stops as soon as it reaches one of your terminating conditions.
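A self-contained run of this expression on the question's sample input is sketched below, assuming pyparsing is installed. The input is stored in text rather than string (an illustrative rename), and results simply collects the token list of each match. Because LineEnd is suppressed, a statement terminated by end-of-line has no terminator token, while '.' and 'end' are kept.

```python
from pyparsing import LineEnd, OneOrMore, Word, alphas

text = """
start some statement end
other stuff in between
start some other statement.
other stuff in between
start another statement
"""

# A statement ends at end-of-line, a '.', or the word 'end'.
term = LineEnd().suppress() | '.' | 'end'
# ~term is a negative lookahead: keep taking words only while the
# next token is not a terminator, then consume the terminator.
statement = 'start' + OneOrMore(~term + Word(alphas)) + term

results = [tokens.asList() for tokens in statement.searchString(text)]
for r in results:
    print(r)
```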

If you want the actual body string instead of the collection of words, you can use originalTextFor:

from pyparsing import originalTextFor
statement = 'start' + originalTextFor(OneOrMore(~term + Word(alphas))) + term
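The originalTextFor variant can be checked the same way. This sketch reuses the same illustrative names; originalTextFor returns the slice of the input covered by the words, so the spaces between them are preserved and the body comes back as a single string.

```python
from pyparsing import LineEnd, OneOrMore, Word, alphas, originalTextFor

text = """
start some statement end
other stuff in between
start some other statement.
other stuff in between
start another statement
"""

term = LineEnd().suppress() | '.' | 'end'
# originalTextFor replaces the matched words with the original input
# text they span, e.g. 'some other statement' instead of three tokens.
statement = 'start' + originalTextFor(OneOrMore(~term + Word(alphas))) + term

results = [tokens.asList() for tokens in statement.searchString(text)]
for r in results:
    print(r)
```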


Source: https://stackoverflow.com/questions/38640948/pyparsing-or-operation-use-shortest-string-when-more-than-two-match
