Writing grammar rules for context sensitive elements using Pyparsing

Submitted by 给你一囗甜甜゛ on 2019-12-04 16:10:31

Making some guesses about your grammar, here is a rough stab at it. Notice how I define the line expressions separately from the phrase expressions:

from pyparsing import (CaselessKeyword, Word, alphas, MatchFirst, quotedString,
    infixNotation, opAssoc, Suppress, Group)


# Keywords are matched case-insensitively and only as whole words.
LINE_CONTAINS, LINE_STARTSWITH, LINE_ENDSWITH = map(CaselessKeyword,
    """LINE_CONTAINS LINE_STARTSWITH LINE_ENDSWITH""".split())
NOT, AND, OR = map(CaselessKeyword, "NOT AND OR".split())
BEFORE, AFTER, JOIN = map(CaselessKeyword, "BEFORE AFTER JOIN".split())

# Any reserved word; used below to keep keywords from being read as phrase words.
keyword = MatchFirst([LINE_CONTAINS, LINE_STARTSWITH, LINE_ENDSWITH, NOT, AND, OR,
                      BEFORE, AFTER, JOIN])
# A phrase word is a run of letters and underscores that is not a keyword.
phrase_word = ~keyword + Word(alphas + '_')

# A phrase term is either a bare word or a quoted string.
phrase_term = phrase_word | quotedString

# Phrase-level operators, highest precedence first; {} groups subexpressions.
phrase_expr = infixNotation(phrase_term,
                            [
                             ((BEFORE | AFTER | JOIN), 2, opAssoc.LEFT,),
                             (NOT, 1, opAssoc.RIGHT,),
                             (AND, 2, opAssoc.LEFT,),
                             (OR, 2, opAssoc.LEFT),
                            ],
                            lpar=Suppress('{'), rpar=Suppress('}')
                            )

# A line term is a line directive followed by a phrase expression.
line_term = Group((LINE_CONTAINS | LINE_STARTSWITH | LINE_ENDSWITH)("line_directive") +
                  Group(phrase_expr)("phrase"))
# Line terms combine with NOT/AND/OR, using the default () for grouping.
line_contents_expr = infixNotation(line_term,
                                   [(NOT, 1, opAssoc.RIGHT,),
                                    (AND, 2, opAssoc.LEFT,),
                                    (OR, 2, opAssoc.LEFT),
                                    ]
                                   )

sample = """
LINE_CONTAINS transfected BEFORE {sirna} AND gene AND LINE_STARTSWITH Therefore
"""

# Parse each line of the sample and print the results (shown below).
line_contents_expr.runTests(sample)

This parses your sample as:

LINE_CONTAINS transfected BEFORE {sirna} AND gene AND LINE_STARTSWITH Therefore
[[['LINE_CONTAINS', [[['transfected', 'BEFORE', 'sirna'], 'AND', 'gene']]], 'AND', ['LINE_STARTSWITH', ['Therefore']]]]
[0]:
  [['LINE_CONTAINS', [[['transfected', 'BEFORE', 'sirna'], 'AND', 'gene']]], 'AND', ['LINE_STARTSWITH', ['Therefore']]]
  [0]:
    ['LINE_CONTAINS', [[['transfected', 'BEFORE', 'sirna'], 'AND', 'gene']]]
    - line_directive: 'LINE_CONTAINS'
    - phrase: [[['transfected', 'BEFORE', 'sirna'], 'AND', 'gene']]
      [0]:
        [['transfected', 'BEFORE', 'sirna'], 'AND', 'gene']
        [0]:
          ['transfected', 'BEFORE', 'sirna']
        [1]:
          AND
        [2]:
          gene
  [1]:
    AND
  [2]:
    ['LINE_STARTSWITH', ['Therefore']]
    - line_directive: 'LINE_STARTSWITH'
    - phrase: ['Therefore']

The phrase_word starts with a negative lookahead, to avoid accidentally treating strings like 'LINE_STARTSWITH' as phrase words. I also added quoted strings as valid phrase words, since you never know when your search will have to actually include the string "LINE_STARTSWITH".
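To see the lookahead at work in isolation, here is a small sketch that reuses the keyword and phrase_term definitions from above. The helper name parses_as_phrase is mine, purely for illustration, and is not part of pyparsing:

```python
from pyparsing import (CaselessKeyword, MatchFirst, ParseException, Word,
                       alphas, quotedString)

KEYWORDS = ("LINE_CONTAINS LINE_STARTSWITH LINE_ENDSWITH "
            "NOT AND OR BEFORE AFTER JOIN").split()
keyword = MatchFirst([CaselessKeyword(k) for k in KEYWORDS])

# ~keyword is a negative lookahead: it consumes nothing, but fails if a
# keyword starts at the current position.
phrase_word = ~keyword + Word(alphas + "_")
phrase_term = phrase_word | quotedString


def parses_as_phrase(text):
    """Hypothetical helper: True if the whole string is one phrase term."""
    try:
        phrase_term.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


for candidate in ["gene", "LINE_STARTSWITH", "line_startswith",
                  '"LINE_STARTSWITH"', "ANDROID"]:
    print(candidate, parses_as_phrase(candidate))
```

The bare keyword is rejected in either case, the quoted form is accepted, and a word like ANDROID is still accepted because CaselessKeyword only matches whole words.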

You use {}s for grouping in your phrase expressions; infixNotation has optional lpar and rpar arguments to override the defaults of ( and ).
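A cut-down sketch of just the lpar/rpar override, with only AND and OR and a simplified operand (both simplifications are mine, to keep the example short):

```python
from pyparsing import (CaselessKeyword, ParseException, Suppress, Word,
                       alphas, infixNotation, opAssoc)

AND = CaselessKeyword("AND")
OR = CaselessKeyword("OR")
operand = ~(AND | OR) + Word(alphas)

# Group with {} instead of the default (); Suppress keeps the braces
# themselves out of the parsed results.
expr = infixNotation(
    operand,
    [(AND, 2, opAssoc.LEFT), (OR, 2, opAssoc.LEFT)],
    lpar=Suppress("{"), rpar=Suppress("}"),
)

result = expr.parseString("a AND {b OR c}", parseAll=True).asList()
print(result)

# With the override in place, parentheses are no longer grouping symbols.
try:
    expr.parseString("a AND (b OR c)", parseAll=True)
    parens_accepted = True
except ParseException:
    parens_accepted = False
print(parens_accepted)
```

The braces force b OR c to be grouped before the AND is applied, even though AND binds tighter than OR in the operator list.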

From here, you can look at other infixNotation examples (such as SimpleBool.py on the pyparsing wiki examples page) to convert this into your respective regex-generating code.
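As a rough idea of what that conversion could look like, here is a sketch that walks the nested lists produced by a phrase-only grammar and builds a regex. The regex semantics are my assumptions, not something taken from your question: "a BEFORE b" becomes a.*b, AFTER is the reverse, AND accepts either order, and OR is alternation. NOT, JOIN and quoted strings are left out, and to_regex and phrase_to_regex are hypothetical helper names:

```python
import re
from pyparsing import (CaselessKeyword, MatchFirst, Suppress, Word, alphas,
                       infixNotation, opAssoc)

BEFORE, AFTER, AND, OR = map(CaselessKeyword, "BEFORE AFTER AND OR".split())
word = ~MatchFirst([BEFORE, AFTER, AND, OR]) + Word(alphas + "_")
phrase_expr = infixNotation(
    word,
    [((BEFORE | AFTER), 2, opAssoc.LEFT),
     (AND, 2, opAssoc.LEFT),
     (OR, 2, opAssoc.LEFT)],
    lpar=Suppress("{"), rpar=Suppress("}"),
)


def to_regex(node):
    """Turn a parsed node (a str or a nested list) into a regex string."""
    if isinstance(node, str):
        return re.escape(node)
    # Binary groups look like [operand, op, operand, op, operand, ...];
    # fold them from left to right.
    pattern = to_regex(node[0])
    for op, operand in zip(node[1::2], node[2::2]):
        rhs = to_regex(operand)
        if op == "BEFORE":      # assumed meaning: lhs occurs, then rhs
            pattern = f"(?:{pattern}.*{rhs})"
        elif op == "AFTER":     # assumed meaning: rhs occurs, then lhs
            pattern = f"(?:{rhs}.*{pattern})"
        elif op == "AND":       # assumed meaning: both occur, either order
            pattern = f"(?:{pattern}.*{rhs}|{rhs}.*{pattern})"
        elif op == "OR":
            pattern = f"(?:{pattern}|{rhs})"
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return pattern


def phrase_to_regex(text):
    tree = phrase_expr.parseString(text, parseAll=True).asList()[0]
    return re.compile(to_regex(tree))


pattern = phrase_to_regex("transfected BEFORE {sirna} AND gene")
print(bool(pattern.search("cells were transfected with sirna against the gene")))
print(bool(pattern.search("sirna was transfected near the gene")))
```

The simpleBool example takes a different route and attaches a class to each operator as a parse action; walking the list output as above is just the shortest way to show the idea.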

This seems to me to be a very simple grammar. I think you are overthinking the problem.

Looking at your examples, I see this:

a JOIN b
a BEFORE b

a AND b
a OR b

STARTSWITH a

Those are simply operators. Unary operators (STARTSWITH) are like ~x or -x in Python. Binary operators (JOIN, BEFORE, AND, OR) are like x + y or x in y in Python.

I don't think CONTAINS is an operator so much as a placeholder. Pretty much everything except STARTSWITH is implicitly a "contains". So it is rather like the unary-plus operator: defined, understood, allowed, but useless.

Anyway, figure out what the operators are (make a list). Figure out whether they are unary (startswith) or binary (and). Then figure out what their precedence and associativity are.

Once you know that information, you can build your parser: you will know the keywords and how to arrange them in a grammar.
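For instance, once the list is written down it can be fed almost directly to pyparsing's infixNotation. This is only a sketch: the precedence order in OPERATORS (highest first) is a guess on my part, so adjust it to whatever your language actually needs:

```python
from pyparsing import (CaselessKeyword, MatchFirst, Word, alphas,
                       infixNotation, opAssoc)

# (keyword, number of operands, associativity), highest precedence first.
# The ordering here is an assumption made for illustration.
OPERATORS = [
    ("STARTSWITH", 1, opAssoc.RIGHT),   # unary prefix operator
    ("JOIN", 2, opAssoc.LEFT),
    ("BEFORE", 2, opAssoc.LEFT),
    ("AND", 2, opAssoc.LEFT),
    ("OR", 2, opAssoc.LEFT),
]

keywords = {name: CaselessKeyword(name) for name, _, _ in OPERATORS}
# An operand is any word that is not one of the operator keywords.
operand = ~MatchFirst(list(keywords.values())) + Word(alphas)

expr = infixNotation(
    operand,
    [(keywords[name], arity, assoc) for name, arity, assoc in OPERATORS],
)

tree = expr.parseString("STARTSWITH a AND b BEFORE c OR d",
                        parseAll=True).asList()
print(tree)
```

In the printed tree the unary operator appears as [operator, operand] and each binary operator as [left, operator, right], nested according to the precedence order in the list.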
