Question
I need to parse a file with information separated by curly brackets, for example:
Continent
{
    Name Europe
    Country
    {
        Name UK
        Dog
        {
            Name Fiffi
            Colour Gray
        }
        Dog
        {
            Name Smut
            Colour Black
        }
    }
}
Here is what I have tried in Python:
from io import open
from pyparsing import *
import pprint
def parse(s):
    return nestedExpr('{','}').parseString(s).asList()

def test(strng):
    print strng
    try:
        cfgFile = file(strng)
        cfgData = "".join( cfgFile.readlines() )
        list = parse( cfgData )
        pp = pprint.PrettyPrinter(2)
        pp.pprint(list)
    except ParseException, err:
        print err.line
        print " "*(err.column-1) + "^"
        print err
    cfgFile.close()
    print
    return list

if __name__ == '__main__':
    test('testfile')
But this fails with an error:
testfile
Continent
^
Expected "{" (at char 0), (line:1, col:1)
Traceback (most recent call last):
  File "xxx.py", line 55, in <module>
    test('testfile')
  File "xxx.py", line 40, in test
    return list
UnboundLocalError: local variable 'list' referenced before assignment
What do I need to do to make this work? Would a parser other than pyparsing be a better fit?
Answer 1:
Recursion is the key here. Try something along these lines:
def parse(it):
    result = []
    while True:
        try:
            tk = next(it)
        except StopIteration:
            break
        if tk == '}':
            break
        val = next(it)
        if val == '{':
            result.append((tk,parse(it)))
        else:
            result.append((tk, val))
    return result
The use case:
import pprint
data = """
Continent
{
    Name Europe
    Country
    {
        Name UK
        Dog
        {
            Name Fiffi
            Colour Gray
        }
        Dog
        {
            Name Smut
            Colour Black
        }
    }
}
"""
r = parse(iter(data.split()))
pprint.pprint(r)
... which produces (Python 2.6):
[('Continent',
  [('Name', 'Europe'),
   ('Country',
    [('Name', 'UK'),
     ('Dog', [('Name', 'Fiffi'), ('Colour', 'Gray')]),
     ('Dog', [('Name', 'Smut'), ('Colour', 'Black')])])])]
Please take this only as a starting point, and feel free to improve the code as needed (depending on your data, a dictionary might be a better choice). In addition, the sample code does not properly handle ill-formed data (notably an extra or missing '}'). I urge you to aim for full test coverage ;)
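To illustrate the point about ill-formed data, here is a minimal sketch (written for Python 3) of the same recursive idea with explicit checks for unbalanced braces. The name parse_strict and the choice of ValueError are assumptions made for illustration; they are not part of the original code.

```python
def parse_strict(tokens):
    """Parse a flat token list into nested (key, value) tuples.

    Raises ValueError on an extra or missing '}' or on a key without a value.
    """
    it = iter(tokens)

    def block(depth):
        result = []
        for tk in it:
            if tk == '}':
                if depth == 0:
                    raise ValueError("unexpected '}'")
                return result              # end of the current block
            if tk == '{':
                raise ValueError("'{' without a preceding key")
            val = next(it, None)           # the token following the key
            if val is None or val == '}':
                raise ValueError("key %r has no value" % tk)
            if val == '{':
                result.append((tk, block(depth + 1)))
            else:
                result.append((tk, val))
        if depth > 0:                      # input ended inside a block
            raise ValueError("missing '}'")
        return result

    return block(0)


tokens = "Continent { Name Europe Country { Name UK } }".split()
tree = parse_strict(tokens)
```

Like the original, this splits on whitespace only, so quoted values containing spaces would still need a real tokenizer.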
EDIT: Having discovered pyparsing, I tried the following, which appears to work (much) better and can be (more) easily tailored to special needs:
import pprint
from pyparsing import Word, Literal, Forward, Group, ZeroOrMore, alphas
def syntax():
    lbr = Literal( '{' ).suppress()
    rbr = Literal( '}' ).suppress()
    key = Word( alphas )
    atom = Word ( alphas )
    expr = Forward()
    pair = atom | (lbr + ZeroOrMore( expr ) + rbr)
    expr << Group ( key + pair )
    return expr
expr = syntax()
result = expr.parseString(data).asList()
pprint.pprint(result)
Producing:
[['Continent',
  ['Name', 'Europe'],
  ['Country',
   ['Name', 'UK'],
   ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
   ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]
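If a dictionary suits the data better, the grouped output above can be converted after parsing. The following is only a sketch for Python 3: to_dict is a hypothetical helper, and collecting repeated keys such as 'Dog' into a list is one possible design choice, not something pyparsing does for you. The result list is copied by hand from the output above so that the example runs without pyparsing installed.

```python
def to_dict(group):
    """Convert one [key, value] or [key, child, child, ...] group to (key, value)."""
    key, rest = group[0], group[1:]
    if len(rest) == 1 and isinstance(rest[0], str):
        return key, rest[0]                # simple pair such as ['Name', 'UK']
    block = {}
    for child in rest:
        k, v = to_dict(child)
        if k in block:
            # Repeated key: gather all of its values in a list.
            if not isinstance(block[k], list):
                block[k] = [block[k]]
            block[k].append(v)
        else:
            block[k] = v
    return key, block


# The parse result shown above, copied by hand.
result = [['Continent',
           ['Name', 'Europe'],
           ['Country',
            ['Name', 'UK'],
            ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
            ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]

key, tree = to_dict(result[0])
```

With this layout, tree['Country']['Dog'] is a list of two dictionaries, one per dog.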
Answer 2:
Nested expressions are very common, and usually require recursive parser definitions, or recursive code if you're not using a parsing library. This code can be daunting for beginners and error-prone even for experts, which is why I added the nestedExpr helper to pyparsing.
The problem you are having is that your input string contains more than just a nested-braces expression. When I first try out a parser, I keep the testing as simple as possible; for instance, I inline the sample instead of reading it from a file.
test = """\
Continent
{
    Name Europe
    Country
    {
        Name UK
        Dog
        {
            Name Fiffi
            Colour "light Gray"
        }
        Dog
        {
            Name Smut
            Colour Black
        }}}"""
from pyparsing import *
expr = nestedExpr('{','}')
print expr.parseString(test).asList()
And I get the same parsing error that you do:
Traceback (most recent call last):
  File "nb.py", line 25, in <module>
    print expr.parseString(test).asList()
  File "c:\python26\lib\site-packages\pyparsing-1.5.7-py2.6.egg\pyparsing.py", line 1006, in parseString
    raise exc
pyparsing.ParseException: Expected "{" (at char 1), (line:1, col:1)
Looking at the error message (and at the output of your own debugging code), pyparsing is stumbling on the leading word "Continent": this word is not the beginning of a nested expression in braces, and pyparsing (as the exception message shows) was looking for an opening '{'.
The solution is to slightly modify your parser to handle the introductory "Continent" label, by changing expr to:
expr = Word(alphas) + nestedExpr('{','}')
Now, printing out the results as a list (using pprint as done in the OP, nice work) looks like:
['Continent',
 ['Name',
  'Europe',
  'Country',
  ['Name',
   'UK',
   'Dog',
   ['Name', 'Fiffi', 'Colour', '"light Gray"'],
   'Dog',
   ['Name', 'Smut', 'Colour', 'Black']]]]
which should match up with your brace nesting.
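Within each brace level, nestedExpr returns keys and values as one flat, alternating list. As a rough sketch (Python 3), a small post-processing step can pair them up. pair_up is a hypothetical helper, and it assumes every key is followed by exactly one value or one nested block, which holds for this sample but may not hold for other inputs. The parsed list is copied by hand from the output above.

```python
def pair_up(tokens):
    """Turn a flat [key, value, key, value, ...] list into (key, value) tuples,
    recursing into nested lists."""
    if len(tokens) % 2:
        raise ValueError("odd number of tokens: %r" % (tokens,))
    pairs = []
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if isinstance(value, list):
            value = pair_up(value)         # a nested block in braces
        pairs.append((key, value))
    return pairs


# The parse result shown above, copied by hand.
parsed = ['Continent',
          ['Name', 'Europe',
           'Country',
           ['Name', 'UK',
            'Dog', ['Name', 'Fiffi', 'Colour', '"light Gray"'],
            'Dog', ['Name', 'Smut', 'Colour', 'Black']]]]

pairs = pair_up(parsed)
```

Note that the quoted value keeps its surrounding double quotes, exactly as in the parse result.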
Source: https://stackoverflow.com/questions/16958087/parsing-file-with-curley-brakets