parsing file with curley brakets

问题

I need to parse a file with information separated by curly brackets, for example:

Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  Gray
}
Dog
{
Name    Smut
Colour  Black
}
}
}

Here is what I have tried in Python

from io import open
from pyparsing import *
import pprint

def parse(s):
    return nestedExpr('{','}').parseString(s).asList()

def test(strng):
    print strng
    try:
        cfgFile = file(strng)
        cfgData = "".join( cfgFile.readlines() )
        list = parse( cfgData )
        pp = pprint.PrettyPrinter(2)
        pp.pprint(list)

    except ParseException, err:
        print err.line
        print " "*(err.column-1) + "^"
        print err

    cfgFile.close()
    print
    return list

if __name__ == '__main__':
    test('testfile')

But this fails with an error:

testfile
Continent
^
Expected "{" (at char 0), (line:1, col:1)

Traceback (most recent call last):
  File "xxx.py", line 55, in <module>
    test('testfile')
  File "xxx.py", line 40, in test
    return list
UnboundLocalError: local variable 'list' referenced before assignment

What do I need to do to make this work? Is another parser than pyparsing better?

回答1:

Recursivity is the key here. Try something around that:

def parse(it):
    result = []
    while True:
        try:
            tk = next(it)
        except StopIteration:
            break

        if tk == '}':
            break
        val = next(it)
        if val == '{':
            result.append((tk,parse(it)))
        else:
            result.append((tk, val))

    return result

The use case:

import pprint       

data = """
Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  Gray
}
Dog
{
Name    Smut
Colour  Black
}
}
}
"""

r = parse(iter(data.split()))
pprint.pprint(r)

... which produce (Python 2.6):

[('Continent',
  [('Name', 'Europe'),
   ('Country',
    [('Name', 'UK'),
     ('Dog', [('Name', 'Fiffi'), ('Colour', 'Gray')]),
     ('Dog', [('Name', 'Smut'), ('Colour', 'Black')])])])]

Please take this as only starting point, and feel free to improve the code as you need (depending on your data, a dictionary could have been a better choice, maybe). In addition, the sample code does not handle properly ill formed data (notably extra or missing } -- I urge you to do a full test coverage ;)

EDIT: Discovering pyparsing, I tried the following which appears to work (much) better and could be (more) easily tailored for special needs:

import pprint
from pyparsing import Word, Literal, Forward, Group, ZeroOrMore, alphas

def syntax():
    lbr = Literal( '{' ).suppress()
    rbr = Literal( '}' ).suppress()
    key = Word( alphas )
    atom = Word ( alphas )
    expr = Forward()
    pair = atom | (lbr + ZeroOrMore( expr ) + rbr)
    expr << Group ( key + pair )

    return expr

expr = syntax()
result = expr.parseString(data).asList()
pprint.pprint(result)

Producing:

[['Continent',
  ['Name', 'Europe'],
  ['Country',
   ['Name', 'UK'],
   ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
   ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]

回答2:

Nested expressions are so common, and usually require recursive parser definitions or recursive code if you're not using a parsing library. This code can be daunting for beginners, and error prone even for experts, so that is why I added the nestedExpr helper to pyparsing.

The problem you are having is that your input string has more than just a nested braces expression in it. When I am first trying out a parser, I try to keep the testing as simple as possible - i.e., I inline the sample instead of reading it from a file, for instance.

test = """\
Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  "light Gray"
}
Dog
{
Name    Smut
Colour  Black
}}}"""

from pyparsing import *

expr = nestedExpr('{','}')

print expr.parseString(test).asList()

And I get the same parsing error that you do:

Traceback (most recent call last):
  File "nb.py", line 25, in <module>
    print expr.parseString(test).asList()
  File "c:\python26\lib\site-packages\pyparsing-1.5.7-py2.6.egg\pyparsing.py", line 1006, in parseString
    raise exc
pyparsing.ParseException: Expected "{" (at char 1), (line:1, col:1)

So looking at the error message (and even at your own debugging code), pyparsing is stumbling on the leading word "Continent", because this word is not the beginning of a nested expression in braces, pyparsing (as we see in the exception message) was looking for an opening '{'.

The solution is to slightly modify your parser to handle the introductory "Continent" label, by changing expr to:

expr = Word(alphas) + nestedExpr('{','}')

Now, printing out the results as a list (using pprint as done in the OP, nice work) looks like:

['Continent',
 ['Name',
  'Europe',
  'Country',
  ['Name',
   'UK',
   'Dog',
   ['Name', 'Fiffi', 'Colour', '"light Gray"'],
   'Dog',
   ['Name', 'Smut', 'Colour', 'Black']]]]

which should match up with your brace nesting.

来源：https://stackoverflow.com/questions/16958087/parsing-file-with-curley-brakets

标签

python

parsing

pyparsing