Trouble doing simple parse in pyparsing

Posted by 老子叫甜甜 on 2020-02-03 23:45:51

Question


I'm having a basic problem using pyparsing. Below are the test program and the output from running it.

aaron-mac:sql aaron$ more s.py

from pyparsing import *

n = Word(alphanums)
a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
p = Group( a + Suppress(".") )
print a.parseString("first")
print a.parseString("first,second")
print p.parseString("first.")
print p.parseString("first,second.")


aaron-mac:sql aaron$ python s.py
[['first']]
[['first']]
[[['first']]]
Traceback (most recent call last):
 File "s.py", line 15, in <module>
   print p.parseString("first,second.")
 File "/Library/Python/2.6/site-packages/pyparsing.py", line 1032, in parseString
   raise exc
pyparsing.ParseException: Expected "." (at char 5), (line:1, col:6)
aaron-mac:sql aaron$ 

How do I modify the grammar in the test program to parse a list of comma-separated names terminated by a period? I looked in the docs and tried to find a live support list, but decided I was most likely to get a response here.


Answer 1:


The '|' operator creates a MatchFirst expression, in which the alternatives are evaluated until there is a first match.

Pyparsing works purely left-to-right, applying parser expressions to the input string as it can. The only lookahead that pyparsing does is whatever you write into the parser.

In this expression:

a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))

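Let's say n is just a literal "X". If this parser were given the input string "X", it would obviously match the leading, lone n expression. If given the string "X,X,X", it would still match just the leading n, because that is the first alternative in the parser. The remaining ",X,X" is left unconsumed, which is why the "." that follows in your p expression is reported as missing at char 5.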
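A minimal sketch of this first-match behaviour, reusing the grammar from the question. The names result and failed are mine, added only for illustration, and the camelCase calls (parseString) are the ones used in the question, which I am assuming your installed pyparsing still accepts.

```python
from pyparsing import Group, OneOrMore, ParseException, Suppress, Word, alphanums

n = Word(alphanums)
# MatchFirst ('|'): the lone n is tried first and always succeeds,
# so the list alternative is never attempted.
a = Group(n | Group(n + OneOrMore(Suppress(",") + n)))
p = Group(a + Suppress("."))

# Only "first" is consumed; ",second" is simply left unparsed.
result = a.parseString("first,second").asList()
print(result)

# Once a trailing period is required, the leftover ",second" makes the parse fail.
try:
    p.parseString("first,second.")
    failed = False
except ParseException as exc:
    failed = True
    print(exc)
```

The first print reproduces the [['first']] line from the question's output, and the exception is the same "Expected "."" failure shown in the traceback.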

If you turn the expression around to:

a = Group( Group( n + OneOrMore( Suppress(",") + n )) | n)

then to parse "X" it would first try to match the list, which will fail, and then match the lone n. To parse "X,X,X", the first alternative will be the list expression, which will match.
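As a rough check of the reordered grammar, here is a small sketch that wraps it in the question's p expression. The names name_list, single and several are hypothetical, chosen only for this example.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# The list alternative now comes first, so it gets a chance before the lone n.
name_list = Group(n + OneOrMore(Suppress(",") + n))
a = Group(name_list | n)
p = Group(a + Suppress("."))

# One name: the list alternative fails, then the lone n matches.
single = p.parseString("first.").asList()
# Several names: the list alternative matches and consumes all of them.
several = p.parseString("first,second.").asList()

print(single)
print(several)
```

Note that the list case comes back one level deeper than the single-name case, because the list alternative carries its own inner Group.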

If you want the longest alternative to match, use the '^' operator, which gives an Or expression. Or will evaluate all the given alternatives, and then select the longest match.

a = Group( n ^ Group( n + OneOrMore( Suppress(",") + n )))
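A short sketch of the '^' version, keeping the original order of alternatives to show that the order no longer matters. The names lone and listed are illustrative only.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# Or ('^'): every alternative is tried and the longest match is kept,
# even though the lone n is still written first.
a = Group(n ^ Group(n + OneOrMore(Suppress(",") + n)))

lone = a.parseString("first").asList()
listed = a.parseString("first,second").asList()

print(lone)
print(listed)
```

The trade-off is that Or has to attempt every alternative at that position, whereas MatchFirst stops at the first one that succeeds.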

You can also simplify this using the pyparsing helper method delimitedList. Parsing lists of the same expression separated by commas is a common case, so rather than see people have to reinvent expr + ZeroOrMore(Suppress(",") + expr) over and over, I added delimitedList as a standard pyparsing helper. delimitedList("X") would match both "X" and "X,X,X".
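Applied to the grammar in the question, a delimitedList version might look like the sketch below. It assumes the helper is importable under the delimitedList name used in this answer (newer pyparsing releases may prefer a different spelling), and the names names, one and two are mine.

```python
from pyparsing import Suppress, Word, alphanums, delimitedList

# delimitedList matches one or more names separated by commas
# and drops the commas from the results by default.
names = delimitedList(Word(alphanums))
p = names + Suppress(".")

one = p.parseString("first.").asList()
two = p.parseString("first,second.").asList()

print(one)
print(two)
```

Because there are no alternatives left in the grammar, the question of which one is tried first goes away entirely. Wrap names in Group if you want the list returned as a nested sublist.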




Answer 2:


If you just want to cover the case of a comma-separated list of names terminated by a period, you can use the following:

from pyparsing import *
p = Word(alphanums)+ZeroOrMore(Suppress(",")+Word(alphanums))+Suppress(".")

With this you get the following results:

>>> print(p.parseString("first."))
['first']
>>> print(p.parseString("first,second."))
['first', 'second']

With this grammar, the other inputs from your question ("first" and "first,second") fail to parse because they don't end with a period, and the trailing "." is required.
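To see which inputs this grammar accepts, a small sketch along these lines can help. The helper try_parse is hypothetical, written for this example and not part of pyparsing.

```python
from pyparsing import ParseException, Suppress, Word, ZeroOrMore, alphanums

p = Word(alphanums) + ZeroOrMore(Suppress(",") + Word(alphanums)) + Suppress(".")

def try_parse(text):
    """Return the parsed names as a list, or None if the grammar rejects the text."""
    try:
        return p.parseString(text).asList()
    except ParseException:
        return None

# The last two inputs lack the trailing period, so they are rejected.
for text in ["first.", "first,second.", "first", "first,second"]:
    print(repr(text), "->", try_parse(text))
```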



Source: https://stackoverflow.com/questions/8234923/trouble-doing-simple-parse-in-pyparsing
