Optimizing a boost::spirit::qi parser

Submitted by ╄→гoц情女王★ on 2019-12-06 07:36:18
sehe

The optimizations depend on what you want to achieve. Therefore, I think you're optimizing prematurely.

E.g. parsing a variable_combo as a raw[] input sequence does not make any sense if you want to interpret the symbols later (because you'll have to parse the variable combo again, and you'll even have to anticipate whitespace in there: "foo . bar .tux" is a valid variable combo here).

I have written quite a few posts on optimizing Boost Spirit in general. Quick observations here:

  • consider correctness under backtracking; with your grammar, parsing 'ceil(3.7)' gives:

    Expression: ceil(3.7)
    PushInt: 3
    PushInt: ceil
    Remaining: (3.7)
    

    Note how this emits opcodes even though parsing failed. Note also that it emits the wrong opcodes:

    • it pushes 3 instead of 3.7
    • it pushes ceil as a PushInt?

    So not only does it detect the parse failure too late, it also ignores the parentheses, fails to spot the function call, and parses the number wrong.

    Regarding the premature evaluation, I'm going to point to this popular answer: Boost Spirit: "Semantic actions are evil"?

    Other than that, I'm just confirming my suspicion that you're doing premature optimization. Consider doing

    #define BOOST_SPIRIT_DEBUG
    

    and then later, in the grammar constructor:

    BOOST_SPIRIT_DEBUG_NODES(
            (expression)(logical_or_expression)(logical_and_expression)(negate_expression)(series_expression)(single_expression)
            (inclusive_or_expression)(exclusive_or_expression)(and_expression)(equality_expression)(relational_expression)
            (shift_expression)(additive_expression)(multiplicative_expression)(term)(complement_factor)(factor)(result)(integer)
            (variable)(variable_combo)(word)(prefix)
    );
    

    to really see how your parser behaves.

  • consider qi::symbols e.g.:

    qi::symbols<char,const char*> unary_function;
    
    unary_function.add
        ("ceil",    "OP_CEIL")
        ("wrap",    "OP_WRAP")
        ("abs",     "OP_ABS")
        ("count1",  "OP_COUNT1")
        ("pick",    "OP_PICK")
        ("defined", "OP_DEF");
    
    unary_call = (unary_function >> "(" >> expression >> ')') [phx::bind(&fPushOp, qi::_1)];
    
  • traits might leave more potential for the compiler to optimize after inlining (as opposed to semantic actions, since the many levels of template instantiation can obscure some cases, especially when bind is involved)

You may want to make operator precedence table-driven here, as some of the Spirit samples show. The traditional approach you've taken, using a rule hierarchy to enforce precedence, complicates the grammar. This has two key downsides:

  • each rule introduces a virtual dispatch (Spirit X3 may not have this limitation anymore in the future)
  • your grammar has become so complicated that you have already lost the overview (see first bullet)
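As a rough illustration of what "table-driven" can mean, the sketch below uses precedence climbing in plain C++ rather than Spirit. It is not taken from the Spirit samples. The operator set, the `OpInfo` struct and the `evaluate()` function are assumptions made for this example, and it handles only non-negative integer literals, parentheses and left-associative binary operators. The point is that adding an operator means adding a table row, not a new rule level.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical operator table: higher precedence binds tighter.
struct OpInfo {
    int precedence;
    long (*apply)(long, long);
};

static const std::map<char, OpInfo> binary_ops = {
    {'|', {1, [](long a, long b) { return a | b; }}},
    {'^', {2, [](long a, long b) { return a ^ b; }}},
    {'&', {3, [](long a, long b) { return a & b; }}},
    {'+', {4, [](long a, long b) { return a + b; }}},
    {'-', {4, [](long a, long b) { return a - b; }}},
    {'*', {5, [](long a, long b) { return a * b; }}},
};

class Evaluator {
public:
    explicit Evaluator(std::string const& text) : text_(text), pos_(0) {}

    long run() {
        long value = expression(1);
        skip_space();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void skip_space() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // primary := integer | '(' expression ')'
    long primary() {
        skip_space();
        if (pos_ < text_.size() && text_[pos_] == '(') {
            ++pos_;
            long value = expression(1);
            skip_space();
            if (pos_ >= text_.size() || text_[pos_] != ')')
                throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        std::size_t start = pos_;
        long value = 0;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            value = value * 10 + (text_[pos_++] - '0');
        if (pos_ == start) throw std::runtime_error("expected a number");
        return value;
    }

    // One loop replaces the whole chain of per-precedence rules.
    long expression(int min_precedence) {
        long lhs = primary();
        for (;;) {
            skip_space();
            if (pos_ >= text_.size()) return lhs;
            auto op = binary_ops.find(text_[pos_]);
            if (op == binary_ops.end() || op->second.precedence < min_precedence)
                return lhs;
            ++pos_;
            // precedence + 1 makes every operator left-associative
            long rhs = expression(op->second.precedence + 1);
            lhs = op->second.apply(lhs, rhs);
        }
    }

    std::string text_;
    std::size_t pos_;
};

inline long evaluate(std::string const& text) { return Evaluator(text).run(); }
```

This sketch evaluates directly only to stay short. A hand-written loop like this never backtracks, so the side-effect problem described above does not arise here. In a Spirit grammar the same table would more naturally drive AST construction.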

Recommendations

I'd suggest

  1. moving away from evaluating during parsing: the semantic actions grow unwieldy and are very (very) tricky to get right in the face of (late) backtracking (or even parser failures; the latter are easy to detect, but backtracking can also be benign and is very hard to correct for when semantic actions have side effects).

  2. building the grammar up from the simplest rule, extending it gradually as you add test cases
