How to write grammar for an expression when it can have many possible forms

北城以北 submitted on 2019-12-07 20:34:30

You have a two-tiered grammar here, so you would do best to focus on one tier at a time, which we have covered in some of your other questions. The lower tier is the phrase_expr, which will later serve as the argument to the line_directive_expr. So define examples of phrase expressions first; extract them from your list of complete statement samples. In your finished BNF for phrase_expr, the lowest level of recursion will look like:

phrase_atom ::= <one or more types of terminal items, like words of characters
                 or quoted strings, or *possibly* expressions of numbers of
                 words or characters>  |  brace + phrase_expr + brace
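Before translating this to pyparsing, it can help to pin down what the terminals of phrase_atom actually look like as tokens. The sketch below is plain Python using `re`, not pyparsing, and it assumes (hypothetically, since the question's exact syntax is not shown here) that the braces are `{` and `}`, quoted strings use double quotes, and any other run of non-space characters is a word. `TOKEN_RE`, `tokenize`, and `is_phrase_atom` are illustrative names.

```python
import re

# Hypothetical token syntax: a double-quoted string, a single brace,
# or any other run of characters that are not whitespace, braces, or quotes.
TOKEN_RE = re.compile(r'"[^"]*"|[{}]|[^\s{}"]+')

# Binary operators of the phrase tier; they are never phrase atoms themselves.
PHRASE_OPERATORS = {"AND", "OR", "JOIN"}


def tokenize(text):
    """Split a phrase into quoted strings, braces, and bare words."""
    return TOKEN_RE.findall(text)


def is_phrase_atom(token):
    """True for a terminal phrase_atom: a bare word or a quoted string.

    Braces are not atoms; they start or end the recursive
    'brace + phrase_expr + brace' alternative instead.
    """
    return token not in ("{", "}") and token not in PHRASE_OPERATORS


print(tokenize('"foo bar" AND {baz OR qux}'))
```

Keeping the operator words out of the atom set matters once the operators are layered on top: a bare word like `AND` must never be swallowed as a phrase atom.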

(Some other questions: Can multiple phrase_atoms appear one after another with no operator between them? If so, what does that indicate? How should it be parsed and interpreted? Should this implied operation get its own level of precedence?)

That is sufficient to close the recursion loop for your phrase expression; you should not need any other braced_xxx element in your BNF. AND, OR, and JOIN are clearly binary operators. Under normal operator precedence, ANDs are evaluated before ORs; you can decide for yourself where JOIN should fall. Write some sample phrases with no parentheses, mixing AND with JOIN and OR with JOIN, and think through what order of evaluation makes sense in your domain.
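One cheap way to think through precedence is to parse the same sample phrases under different orderings and look at the resulting trees. The sketch below is a hand-written recursive-descent parser in plain Python (not pyparsing's `infix_notation`, which would build the levels for you), with one recursion level per operator. It assumes, purely as a starting point, that JOIN binds tighter than AND, which binds tighter than OR; `PRECEDENCE` and `parse_phrase` are hypothetical names, and braces `{}` are assumed for grouping.

```python
import re

TOKEN_RE = re.compile(r'"[^"]*"|[{}]|[^\s{}"]+')

# Assumed precedence, highest first. Where JOIN belongs is your design
# decision; reorder this list to compare the trees each choice produces.
PRECEDENCE = ["JOIN", "AND", "OR"]


def parse_phrase(text):
    """Parse a phrase_expr into nested (operator, left, right) tuples."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        # phrase_atom: a word, a quoted string, or a braced phrase_expr.
        tok = take()
        if tok == "{":
            node = level(len(PRECEDENCE) - 1)
            if take() != "}":
                raise SyntaxError("expected '}'")
            return node
        if tok in PRECEDENCE or tok == "}":
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    def level(i):
        # Level i joins operands from level i-1 with PRECEDENCE[i],
        # left-associatively; level -1 is a single atom.
        if i < 0:
            return atom()
        node = level(i - 1)
        while peek() == PRECEDENCE[i]:
            take()
            node = (PRECEDENCE[i], node, level(i - 1))
        return node

    tree = level(len(PRECEDENCE) - 1)
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


print(parse_phrase("a OR b AND c"))  # ('OR', 'a', ('AND', 'b', 'c'))
```

Because each tuple is (operator, left, right), the nesting shows the evaluation order directly: with this ordering, `a JOIN b AND c` groups the JOIN first, and moving `"JOIN"` after `"AND"` in `PRECEDENCE` flips that.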

Once that is done, line_directive_expr should be simple, since it is just:

line_directive_item ::= line_directive phrase_expr | brace line_directive_expr brace
line_directive_and ::= line_directive_item (AND line_directive_item)*
line_directive_or ::= line_directive_and (OR line_directive_and)*
line_directive_expr ::= line_directive_or
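Here is a sketch of both tiers together, again as a hand-written recursive-descent parser in plain Python rather than pyparsing. It assumes (hypothetically) braces `{}`, the phrase precedence JOIN > AND > OR, and only the two directives named in this thread. One detail the BNF leaves implicit: AND and OR appear in both tiers, so after `LINE_CONTAINS a AND` the parser has to look ahead to decide whether the AND continues the phrase or joins another line directive. In this sketch, a phrase operator followed (possibly after opening braces) by a directive keyword ends the phrase. `Parser` and `parse_line_directive_expr` are illustrative names.

```python
import re

TOKEN_RE = re.compile(r'"[^"]*"|[{}]|[^\s{}"]+')
DIRECTIVES = {"LINE_STARTSWITH", "LINE_CONTAINS"}  # illustrative set
PHRASE_OPS = ["JOIN", "AND", "OR"]  # assumed phrase precedence, highest first


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def starts_directive(self, offset):
        # Skip opening braces: "{LINE_CONTAINS x}" is a directive group,
        # while "{a OR b}" is a braced phrase_expr.
        while self.peek(offset) == "{":
            offset += 1
        return self.peek(offset) in DIRECTIVES

    # Lower tier: phrase_expr
    def phrase_atom(self):
        if self.peek() == "{":
            self.take("{")
            node = self.phrase_level(len(PHRASE_OPS) - 1)
            self.take("}")
            return node
        tok = self.take()
        if tok in DIRECTIVES or tok in PHRASE_OPS or tok == "}":
            raise SyntaxError(f"unexpected {tok!r} in phrase")
        return tok

    def phrase_level(self, i):
        if i < 0:
            return self.phrase_atom()
        node = self.phrase_level(i - 1)
        # An operator followed by a directive belongs to the outer tier.
        while self.peek() == PHRASE_OPS[i] and not self.starts_directive(1):
            self.take()
            node = (PHRASE_OPS[i], node, self.phrase_level(i - 1))
        return node

    # Upper tier: line_directive_expr
    def line_directive_item(self):
        if self.peek() == "{":
            self.take("{")
            node = self.line_directive_or()
            self.take("}")
            return node
        directive = self.take()
        if directive not in DIRECTIVES:
            raise SyntaxError(f"expected a line directive, got {directive!r}")
        return (directive, self.phrase_level(len(PHRASE_OPS) - 1))

    def line_directive_and(self):
        node = self.line_directive_item()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.line_directive_item())
        return node

    def line_directive_or(self):
        node = self.line_directive_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.line_directive_and())
        return node


def parse_line_directive_expr(text):
    parser = Parser(text)
    tree = parser.line_directive_or()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

With this lookahead, `LINE_CONTAINS a AND b AND LINE_STARTSWITH c` keeps `a AND b` as the phrase and uses the second AND to join the two directives.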

Then when you translate to pyparsing, add Groups and results names a little at a time! Don't immediately Group everything or name everything. Ordinarily I recommend using results names liberally, but in infix notation grammars, lots of results names can just clutter up the results. Let the Groups (and ultimately the node classes) do the structuring, and the behavior in the node classes will show you where you want results names. For that matter, the tokens passed to the node classes usually have such a simple structure that it is often easier just to unpack the list in the class's init or evaluate method.

Start with simple expressions and work up to complicated ones. (Look at "LINE_STARTSWITH gene": it is one of your simplest test cases, but you have it as #97?) Just sorting this list by length would be a good rough cut, or sort by increasing number of operators. If you tackle the complex cases before the simple ones are working, you will have too many options for where a tweak or refinement should go, and (speaking from personal experience) you are as likely to get it wrong as get it right, except that when you get it wrong, it makes fixing the next issue more difficult.
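To make the list-unpacking and test-ordering suggestions concrete, here is a plain-Python sketch. The node classes take a token list in their constructor and unpack it directly instead of using results names. The token shapes assumed here, `[directive, phrase]` for a leaf and `[[operand, 'AND', operand, ...]]` for an operator level, mirror the nested-list shape that pyparsing's `infix_notation` commonly passes to a parse action. However, this sketch builds the lists by hand and simplifies phrases to plain strings. `LineDirective`, `AndOp`, `OrOp`, and `by_complexity` are hypothetical names.

```python
class LineDirective:
    """Leaf node; tokens look like [directive, phrase]."""

    def __init__(self, tokens):
        self.directive, self.phrase = tokens  # plain unpacking, no results names

    def evaluate(self, line):
        if self.directive == "LINE_STARTSWITH":
            return line.startswith(self.phrase)
        if self.directive == "LINE_CONTAINS":
            return self.phrase in line
        raise ValueError(f"unknown directive {self.directive!r}")


class BinaryOp:
    """Operator node; tokens look like [[operand, OP, operand, OP, operand, ...]]."""

    def __init__(self, tokens):
        self.operands = tokens[0][0::2]  # every other item is an operand


class AndOp(BinaryOp):
    def evaluate(self, line):
        return all(op.evaluate(line) for op in self.operands)


class OrOp(BinaryOp):
    def evaluate(self, line):
        return any(op.evaluate(line) for op in self.operands)


OPERATORS = {"AND", "OR", "JOIN"}


def by_complexity(test_cases):
    """Order test strings by operator count, then by length (simplest first)."""
    return sorted(test_cases,
                  key=lambda s: (sum(w in OPERATORS for w in s.split()), len(s)))


# LINE_STARTSWITH gene AND LINE_CONTAINS kinase OR LINE_CONTAINS xyz
tree = OrOp([[
    AndOp([[LineDirective(["LINE_STARTSWITH", "gene"]), "AND",
             LineDirective(["LINE_CONTAINS", "kinase"])]]),
    "OR",
    LineDirective(["LINE_CONTAINS", "xyz"]),
]])
print(tree.evaluate("gene kinase A"))  # True
```

`by_complexity` counts operators with a simple whitespace split, so it is only the rough cut described above; operators hidden inside quoted strings would be miscounted.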

And again, as we have discussed elsewhere, the devil in this second tier is in the actual interpretation of the various line directive items, since there is an implied order of evaluation for LINE_STARTSWITH vs. LINE_CONTAINS that overrides the order in which they may appear in the original string. That ball is entirely in your court, since you are the language designer for this particular domain.
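What that implied order is remains your design decision. As one hypothetical mechanism, assuming LINE_STARTSWITH should always be checked before LINE_CONTAINS, a post-parse pass can reorder the conjuncts of every AND so the evaluator sees a fixed sequence regardless of how the user wrote them. The trees here are `(operator, operand, ...)` tuples with `(directive, phrase)` leaves; `DIRECTIVE_PRIORITY` and `normalize` are illustrative names, and OR groups are deliberately left in source order.

```python
# Hypothetical priorities: a lower number is evaluated first.
DIRECTIVE_PRIORITY = {"LINE_STARTSWITH": 0, "LINE_CONTAINS": 1}


def priority(node):
    # Composite nodes (e.g. a nested OR group) sort after all simple directives.
    return DIRECTIVE_PRIORITY.get(node[0], len(DIRECTIVE_PRIORITY))


def normalize(node):
    """Reorder AND operands by directive priority; leave leaves and OR order alone."""
    op = node[0]
    if op in DIRECTIVE_PRIORITY:  # leaf: (directive, phrase)
        return node
    children = [normalize(child) for child in node[1:]]
    if op == "AND":
        # Flatten nested ANDs so one priority sort covers every conjunct.
        flat = []
        for child in children:
            flat.extend(child[1:] if child[0] == "AND" else [child])
        # list.sort is stable: equal-priority operands keep their source order.
        flat.sort(key=priority)
        return ("AND", *flat)
    return (op, *children)


print(normalize(("AND", ("LINE_CONTAINS", "kinase"), ("LINE_STARTSWITH", "gene"))))
# ('AND', ('LINE_STARTSWITH', 'gene'), ('LINE_CONTAINS', 'kinase'))
```

Reordering an AND is only safe if each directive is a side-effect-free check; if your directives consume or transform the line, the priority table becomes part of the language's semantics and should be documented as such.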
