Python parsing bracketed blocks

独厮守ぢ 2020-11-27 04:44

What would be the best way in Python to parse out chunks of text contained in matching brackets?

"{ { a } { b } { { { c } } } }"

should i

9 answers
  •  庸人自扰
    2020-11-27 05:32

    If you want to use a parser (lepl in this case), but still want the intermediate results rather than a final parsed list, then I think this is the kind of thing you were looking for:

    >>> from lepl import *
    >>> nested = Delayed()
    >>> nested += "{" + (nested[1:,...]|Any()) + "}"
    >>> split = (Drop("{") & (nested[:,...]|Any()) & Drop("}"))[:].parse
    >>> split("{{a}{b}{{{c}}}}")
    ['{a}{b}{{{c}}}']
    >>> split("{a}{b}{{{c}}}")
    ['a', 'b', '{{c}}']
    >>> split("{{c}}")
    ['{c}']
    >>> split("{c}")
    ['c']
    

    That might look opaque at first, but it's fairly simple really :o)

    nested is a recursive definition of a matcher for nested brackets (the "+" and the "..." in [1:,...] keep everything as a single string after it has been matched). split then matches as many as possible ("[:]") of something that is surrounded by "{" ... "}" (which we discard with "Drop") and contains either nested expressions or any single character.
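
For comparison, the same one-level split can be sketched in plain Python with a depth counter instead of lepl. This is not the lepl approach, just a hand-rolled equivalent for the inputs shown above; the name split_groups is made up for illustration, and unlike the lepl matcher it silently skips any characters that sit outside the top-level braces.

```python
def split_groups(text):
    """Return the contents of each top-level {...} group in text."""
    groups = []
    depth = 0      # how many braces are currently open
    start = None   # index just after the current top-level "{"
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1  # content begins after the opening brace
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched '}}' at index {i}")
            if depth == 0:
                # Closing a top-level group: keep its inner text verbatim.
                groups.append(text[start:i])
    if depth != 0:
        raise ValueError("unmatched '{'")
    return groups


print(split_groups("{{a}{b}{{{c}}}}"))
print(split_groups("{a}{b}{{{c}}}"))
```

Calling split_groups again on each returned string peels off one more level, which mirrors the repeated split(...) calls in the session above.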

    Finally, here's a lepl version of the "all in one" parser that gives a result in the same format as the pyparsing example above, but which (I believe) is more flexible about how spaces appear in the input:

    >>> with Separator(~Space()[:]):
    ...     nested = Delayed()
    ...     nested += Drop("{") & (nested[1:] | Any()) & Drop("}") > list
    ...
    >>> nested.parse("{{ a }{ b}{{{c}}}}")
    [[['a'], ['b'], [[['c']]]]]
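
If installing lepl is not an option, a small recursive-descent parser gives a similar nested-list result. This is only a sketch under a few assumptions: parse_nested is a hypothetical name, every non-brace, non-space character is treated as a one-character leaf (as Any() does), whitespace is ignored everywhere, and empty groups are accepted. It returns the outer group itself, so the result has one less level of list wrapping than nested.parse(...).

```python
def parse_nested(text):
    """Parse one brace-delimited expression into nested lists."""
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_group():
        nonlocal pos
        pos += 1  # consume the "{" the caller has already checked for
        items = []
        while True:
            skip_spaces()
            if pos >= len(text):
                raise ValueError("unmatched '{'")
            ch = text[pos]
            if ch == "}":
                pos += 1
                return items
            if ch == "{":
                items.append(parse_group())  # recurse into the nested group
            else:
                items.append(ch)  # any other character is a leaf
                pos += 1

    skip_spaces()
    if pos >= len(text) or text[pos] != "{":
        raise ValueError("expected '{'")
    result = parse_group()
    skip_spaces()
    if pos != len(text):
        raise ValueError(f"unexpected text at index {pos}")
    return result


print(parse_nested("{{ a }{ b}{{{c}}}}"))
```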
    
