How best to parse a simple grammar?

礼貌的吻别 2020-12-07 15:13

Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a questi

5 answers
  •  旧时难觅i
    2020-12-07 15:26

    I know that this question is about a decade old and has certainly been answered by now. I am mainly posting this answer to prove to myself that I have finally understood PEG parsers. I'm using the fantastic parsimonious module here.
    That said, you can define a parsing grammar, build an AST, and visit it to obtain the desired structure:

    from parsimonious.nodes import NodeVisitor
    from parsimonious.grammar import Grammar
    from itertools import groupby
    
    grammar = Grammar(
        r"""
        term            = course (operator course)*
        course          = coursename? ws coursenumber
        coursename      = ~"[A-Z]+"
        coursenumber    = ~"\d+"
        operator        = ws (and / or / comma) ws
        and             = "and"
        or              = (comma ws)? "or"
        comma           = ","
        ws              = ~"\s*"
        """
    )
    
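    class CourseVisitor(NodeVisitor):
        def __init__(self):
            self.current = None   # most recently seen course name, e.g. "CS"
            self.courses = []     # collected (name, number, listnum) tuples
            self.listnum = 1      # incremented each time an "or" is seen
    
        def generic_visit(self, node, children):
            # Nodes without a dedicated visit_* method are ignored.
            pass
    
        def visit_coursename(self, node, children):
            # Remember the name so that bare numbers ("CS 2110, 3300") inherit it.
            if node.text:
                self.current = node.text
    
        def visit_coursenumber(self, node, children):
            course = (self.current, int(node.text), self.listnum)
            self.courses.append(course)
    
        def visit_or(self, node, children):
            # An "or" starts a new group of alternatives.
            self.listnum += 1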
    
    courses = ["CS 2110", "CS 2110 and INFO 3300",
               "CS 2110, INFO 3300", "CS 2110, 3300, 3140",
               "CS 2110 or INFO 3300", "MATH 2210, 2230, 2310, or 2940"]
    
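    for course in courses:
        tree = grammar.parse(course)
        cv = CourseVisitor()
        cv.visit(tree)
        # Group consecutive entries by list number (the third tuple element).
        # A separate name avoids rebinding `courses` while looping over it.
        grouped = [list(v) for _, v in groupby(cv.courses, lambda x: x[2])]
        print(grouped)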
    

    Here, we work our way from the bottom up, starting with building blocks such as whitespace and the operators "or", "and" and ",", which eventually lead to the course and finally the term. The visitor class builds the desired structure (well, almost: one still needs to get rid of the last tuple element).
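
    Dropping that last tuple element could look roughly like the sketch below. It uses only the standard library; the helper name strip_group_index and the sample list are made up for illustration, and the sample assumes the visitor above collected (name, number, listnum) tuples for "MATH 2210, 2230, 2310, or 2940".

```python
from itertools import groupby

# Hypothetical sample of what the visitor might collect for
# "MATH 2210, 2230, 2310, or 2940": (name, number, listnum) tuples.
collected = [
    ("MATH", 2210, 1),
    ("MATH", 2230, 1),
    ("MATH", 2310, 1),
    ("MATH", 2940, 2),
]


def strip_group_index(courses):
    """Group consecutive tuples by their third element, then drop it."""
    return [
        [(name, number) for name, number, _ in group]
        for _, group in groupby(courses, key=lambda c: c[2])
    ]


print(strip_group_index(collected))
```

    Note that groupby only merges adjacent items with the same key, which is fine here because the list number never decreases while the visitor walks the tree.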
