pyparsing: skip to the next token ignoring everything in between

Submitted by 醉酒当歌 on 2019-12-11 07:25:50

Question


I am trying to parse a log file that contains multiple entries with the following format:

ITEM_BEGIN item_name
  some_text

some_text may optionally contain an expression matched by my_expr anywhere within itself. I am only interested in item_name and my_expr (or None if it is missing). Ideally, what I want is a list of (item_name, my_expr) pairs. What is the best way to extract this information using pyparsing?


Answer 1:


If you are not trying to define a parser for the entire input text, but only some pieces of it, look into using pyparsing's searchString or scanString methods - something along these lines:

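import pyparsing as pp

# An identifier: a letter followed by letters, digits, or underscores
ident = pp.Word(pp.alphas, pp.alphanums + '_')
item_header = pp.Keyword("ITEM_BEGIN") + ident("name")
# Placeholder: substitute the expression you are looking for (my_expr in the question)
other_expr = ... whatever ...

search_expr = item_header | other_expr

found = {}
current_name = ''
# searchString yields one ParseResults per match, so index into it directly
for result in search_expr.searchString(input_text):
    if result[0] == "ITEM_BEGIN":
        print("found an item header with name {name}".format_map(result))
        current_name = result.name
        found[result.name] = []
    else:
        # found an other expr; file it under the most recent item header
        found[current_name].append(result.asList())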
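
To make the approach above concrete, here is a minimal runnable sketch that produces the (item_name, my_expr) pairs the question asks for, with None where no match follows a header. The question never shows my_expr, so the "key=value" pattern below is purely a hypothetical stand-in, and extract_pairs and sample are names invented for this illustration. It keeps only the first match after each header, which is an assumption about the desired behavior.

```python
import pyparsing as pp

# Hypothetical stand-in for my_expr: a "key=value" setting with an integer value.
# Replace this with the real expression you are searching for.
my_expr = pp.Combine(pp.Word(pp.alphas) + "=" + pp.Word(pp.nums))

ident = pp.Word(pp.alphas, pp.alphanums + "_")
item_header = pp.Keyword("ITEM_BEGIN") + ident("name")
search_expr = item_header | my_expr


def extract_pairs(text):
    """Return a list of (item_name, my_expr match or None) tuples."""
    pairs = []
    for tokens in search_expr.searchString(text):
        if tokens[0] == "ITEM_BEGIN":
            # New item: assume no my_expr until one is seen
            pairs.append((tokens["name"], None))
        elif pairs and pairs[-1][1] is None:
            # First my_expr match after the most recent header
            pairs[-1] = (pairs[-1][0], tokens[0])
    return pairs


sample = """\
ITEM_BEGIN first_item
  noise before timeout=30 noise after
ITEM_BEGIN second_item
  nothing interesting here
"""

print(extract_pairs(sample))
```

For the sample text this should print [('first_item', 'timeout=30'), ('second_item', None)]. Matches that appear before any ITEM_BEGIN header are silently ignored in this sketch.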


Source: https://stackoverflow.com/questions/43075547/pyparsing-skip-to-the-next-token-ignoring-everything-in-between
