PyParsing: Is this correct use of setParseAction()?

雨燕双飞 submitted on 2019-12-03 03:23:44

This solution remembers the department when it is parsed (by storing it as an attribute on the memorize function) and emits a (dept,coursenum) tuple each time a number is found.

from pyparsing import Suppress,Word,ZeroOrMore,alphas,nums,delimitedList

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
'''

def memorize(t):
    memorize.dept = t[0]

def token(t):
    return (memorize.dept,int(t[0]))

course = Suppress(Word(alphas).setParseAction(memorize))
number = Word(nums).setParseAction(token)
line = course + delimitedList(number)
lines = ZeroOrMore(line)

print(lines.parseString(data))

Output:

[('MSE', 2110), ('MSE', 3030), ('MSE', 4102), ('CSE', 1000), ('CSE', 2000), ('CSE', 3000)]
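
For comparison, here is a minimal sketch that avoids keeping state on the function object. It attaches a single parse action to the whole line, so the department and its numbers arrive in the same token list, and the list the action returns replaces the original tokens. The name expand_line is hypothetical, and the sketch assumes the classic camelCase pyparsing names used above are available.

```python
from pyparsing import Word, OneOrMore, alphas, nums, delimitedList

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
'''

def expand_line(tokens):
    # For one line, tokens looks like ['MSE', '2110', '3030', '4102'].
    dept = tokens[0]
    # Returning a list replaces the matched tokens with these tuples.
    return [(dept, int(num)) for num in tokens[1:]]

line = (Word(alphas) + delimitedList(Word(nums))).setParseAction(expand_line)
lines = OneOrMore(line)

courses = lines.parseString(data).asList()
print(courses)
```

Because nothing is stored outside the parse action, the same grammar can be reused on other input without resetting anything.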

Is this the right way to do it, or am I totally off?

It's one way to do it, though of course there are others (e.g. use two bound methods as parse actions -- so the instance the methods belong to can keep state -- one for the dept code and another for the course number).

The return value of the parseString call is harder to bend to your will (though I'm sure sufficiently dark magic will do it and I look forward to Paul McGuire explaining how;-), so why not go the bound-method route as in...:

from pyparsing import *

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

class MyParse(object):
    def __init__(self):
        self.result = None

    def makeCourseList(self, instring, location, tokens):
        print("before: %s" % tokens)

        # tokens[0] is the Group holding [dept, first course number]
        dept = tokens[0][0]
        newtokens = [(dept, tokens[0][1])]
        newtokens.extend((dept, tok) for tok in tokens[1:])

        print("after: %s" % newtokens)
        self.result = newtokens

course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")

inst = MyParse()
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)
    ).setParseAction(inst.makeCourseList)
ignore = course_data.parseString("CS 2110, 4301, 2123, 1110")
print(inst.result)

This emits:

before: [['CS', '2110'], '4301', '2123', '1110']
after: [('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
[('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]

which seems to be what you require, if I read your specs correctly.
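
The two-bound-method variant mentioned earlier (one parse action for the dept code, another for the course number) could be sketched roughly like this. The class and method names (CourseCollector, remember_dept, add_course) are hypothetical and not part of pyparsing; only Regex, Suppress, ZeroOrMore, setParseAction and parseString come from the library.

```python
from pyparsing import Regex, Suppress, ZeroOrMore

class CourseCollector(object):
    def __init__(self):
        self.dept = None
        self.result = []

    def remember_dept(self, instring, location, tokens):
        # Runs when a department code is matched; keeps it on the instance.
        self.dept = tokens[0]

    def add_course(self, instring, location, tokens):
        # Runs for each course number; pairs it with the last department seen.
        self.result.append((self.dept, tokens[0]))

collector = CourseCollector()
dept_code = Regex(r'[A-Z]{2,}').setParseAction(collector.remember_dept)
course_number = Regex(r'[0-9]{4}').setParseAction(collector.add_course)
course_data = dept_code + course_number + ZeroOrMore(Suppress(',') + course_number)

course_data.parseString("CS 2110, 4301, 2123, 1110")
print(collector.result)
```

One caveat with this design: parse actions that have side effects can fire more than once if a grammar backtracks, so collecting results on an instance is safest with simple grammars like this one.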

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

def get_courses(data):
    for row in data.splitlines():
        department, *numbers = row.replace(",", "").split()
        for number in numbers:
            yield department, number

This would give a generator for the course codes. A list can be made with list() if need be, or you can iterate over it directly.
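
A short usage sketch of both options, repeating the function so the snippet runs on its own (Python 3 is needed for the starred unpacking). Note that the numbers come back as strings; converting with int() is an addition here, not part of the answer above.

```python
data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

def get_courses(data):
    for row in data.splitlines():
        # First field is the department, the rest are course numbers.
        department, *numbers = row.replace(",", "").split()
        for number in numbers:
            yield department, number

# Materialize everything at once...
courses = list(get_courses(data))
print(courses)

# ...or iterate lazily, converting each number to int along the way.
for department, number in get_courses(data):
    print(department, int(number))
```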

Phil Cooper

Sure, everybody loves PyParsing. For easy stuff like this split is sooo much easier to grok:

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

courses = []  # named to avoid shadowing the built-in all()
for row in data.split('\n'):
    klass,num_l = row.split(' ',1)
    courses.extend((klass,int(num)) for num in num_l.split(','))