PyParsing Parse nested loop with brace and specific header

淺唱寂寞╮ submitted on 2021-02-19 05:49:07

Question


I found several topics about pyparsing that deal with almost the same problem of parsing a nested loop, but even with those I can't find a solution to my errors.

I have the following format:

key value;
header_name "optional_metadata"
{
     key value;
     sub_header_name
     {
        key value;
     };
};
key value;
  • A key is alphanumeric
  • A value may be an Int or a String, made of alphanumerics plus "@._"
  • A key/value pair may appear after a brace block
  • A key/value pair may appear in the file before the first brace block
  • Key/value pairs before or after a brace block are optional
  • A header may have a name
  • A closing brace is followed by a semicolon

I used the following parser:

VALID_KEY_CHARACTERS = alphanums
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"\'\-\.@]")

lbr = Literal( '{' ).suppress()
rbr = Literal( '}' ).suppress() + Literal(";").suppress()

expr = Forward()
atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
pair = atom | lbr + OneOrMore( expr ) + rbr
expr << Group( atom + pair )

When I use it, I get only the "header_name" and "header_metadata". After I modified it, I got only the key/value inside a brace, and a Python exception reports a parsing error (it expects '}' when it reaches sub_header_name).

Can anyone help me understand why? Thank you.


Answer 1:


I think the main problem is that your grammar does not fully describe the input, which leads to several mismatches. The two main problems I saw were that you forgot that each key/value pair must end in a semicolon, and that you did not specify that a key/value pair can come after a closing curly brace. It also looks like the lines:

pair = atom | lbr + OneOrMore( expr ) + rbr
expr << Group( atom + pair )

...would require each set of curly braces to contain, at minimum, two key/value pairs, or a key/value pair and a set of curly braces. I believe this would cause an error once you encounter the lines:

{
    key value;
};

...within your input, though I'm not entirely certain.

In any case, after playing around with your grammar, I ended up with this:

from pyparsing import *

data = """key1 value1; 
header_name "optional_metadata"
{
     key2 value2;
     sub_header_name
     {
        key value;
     };
};
key3 value3;"""

# I'm reusing the key characters for the header names, which can contain an underscore
VALID_KEY_CHARACTERS = srange("[a-zA-Z0-9_]")
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"\'\-\.@]")

semicolon = Literal(';').suppress()
lbr = Literal('{').suppress()
rbr = Literal('}').suppress()

key = Word(VALID_KEY_CHARACTERS)
value = Word(VALID_VALUE_CHARACTERS)

key_pair = Group(key + value + semicolon)("key_pair")
metadata = Group(key + Optional(value))("metadata")

header = key_pair + Optional(metadata)

expr = Forward()
contents = Group(lbr + expr + rbr + semicolon)("contents")
expr << header + Optional(contents) + Optional(key_pair)

# asXML() exists in pyparsing 2.x; it was removed in pyparsing 3.0
print(expr.parseString(data).asXML())

This results in the following output:

<key_pair>
  <key_pair>
    <ITEM>key1</ITEM>
    <ITEM>value1</ITEM>
  </key_pair>
  <metadata>
    <ITEM>header_name</ITEM>
    <ITEM>&quot;optional_metadata&quot;</ITEM>
  </metadata>
  <contents>
    <key_pair>
      <ITEM>key2</ITEM>
      <ITEM>value2</ITEM>
    </key_pair>
    <metadata>
      <ITEM>sub_header_name</ITEM>
    </metadata>
    <contents>
      <key_pair>
        <ITEM>key</ITEM>
        <ITEM>value</ITEM>
      </key_pair>
    </contents>
  </contents>
  <key_pair>
    <ITEM>key3</ITEM>
    <ITEM>value3</ITEM>
  </key_pair>
</key_pair>

I'm not entirely sure whether this is exactly what you were trying to accomplish, but hopefully it is close enough that you can tweak it to suit your particular task.
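If the file can hold any number of key/value pairs and blocks at each level, a recursive "statement" rule may be easier to extend than the grammar above. The following is only a sketch under that assumption; the names statement, block, pair and document are my own and not part of the original answer, and quoted values keep their quotes:

```python
from pyparsing import (Forward, Group, Optional, Suppress, Word, ZeroOrMore,
                       alphanums, quotedString)

data = """key1 value1;
header_name "optional_metadata"
{
     key2 value2;
     sub_header_name
     {
        key value;
     };
};
key3 value3;"""

# Punctuation is matched but dropped from the results
SEMI, LBRACE, RBRACE = map(Suppress, ";{}")

key = Word(alphanums + "_")
value = quotedString | Word(alphanums + "_-.@")

# A statement is either a block or a key/value pair; blocks contain statements
statement = Forward()
pair = Group(key("key") + value("value") + SEMI)
block = Group(
    key("name")
    + Optional(quotedString)("metadata")
    + LBRACE
    + Group(ZeroOrMore(statement))("body")
    + RBRACE
    + SEMI
)
# Try block first: a pair never has '{' after its key, so block fails fast on pairs
statement <<= block | pair
document = ZeroOrMore(statement)

# parseAll=True raises ParseException if any input is left unparsed
result = document.parseString(data, parseAll=True).asList()
print(result)
```

Each pair comes back as a two-item list and each block as a list of its name, its optional metadata and a nested list for the body, so the nesting of the input is preserved in the result.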



Source: https://stackoverflow.com/questions/23675372/pyparsing-parse-nested-loop-with-brace-and-specific-header
