Boost spirit takes for ever to parse expressions

问题

i am new here working with boost spirit

Reading alot of very good articels for boost Spirit i decide to make an own Parser and run into the Problem that parsing an Expression like this

1+(2+(3+(4+(5+(6+(7+(8)))))))

takes forever on runtime.. making it more simple 1+(2+(3)) works fine. I Looks like the backtracking of the Parser is active. Please give me a hint how to modify the grammer or behaviour to make this run in time.

Here is a bit code from the grammer. I use the "iter_pos" for tracking the Position.

regards Markus

  primary            = functioncall 
                     | constant_double   
                     | constant_integer 
                     | name 
                     | string;

  constant_double   = real_parser< double, strict_ureal_policies<double> >()                    [_val = construct<common_node>(type_const_double, key_value, _1)];
  name              = name_pure_location                                                        [_val = construct<common_node>(type_name, phoenix::bind(&getLocation, _1),key_value, phoenix::bind(&getString, _1))];
  string            = (lexeme[L'"' >> +(boost::spirit::standard_wide::char_ - L'"') >> L'"'])   [_val = construct<common_node>(type_const_string, key_value,phoenix::bind(&makeString, _1))];
  constant_integer  = int_                                                                      [_val = construct<common_node>(type_const_int, key_value, construct<int>(_1))];

  parenthetical =
              lit('(') >> expression >> lit(')')
              | primary;

  unary =
     (iter_pos >> unary_op >> unary >> iter_pos)                            [_val = construct<common_node>(
                                                                                 type_cmd_member_call,
                                                                                 LOCATION(_1,_4),
                                                                                 key_callname, construct<std::wstring>(_2),
                                                                                 key_this,construct<common_node>(_3)
                                                                            )]
    | parenthetical[_val = _1]
    ;

  multiplicative =

    (iter_pos >> unary >> (multiply_op | divide_op | modulo_op) >> multiplicative >> iter_pos)    [_val = construct<common_node>(
                                                                            type_cmd_member_call,
                                                                            LOCATION(_1, _5),
                                                                            key_callname, construct<std::wstring>(_3),
                                                                            key_this, construct<common_node>(_2),
                                                                            key_parameter, construct<common_node>(_4)
                                                                            )]
    | unary[_val = _1];

  additive =

    (iter_pos >> multiplicative >> (add_op | subtract_op) >> additive >> iter_pos) [_val = construct<common_node>(
                                                                                type_cmd_member_call,
                                                                                LOCATION(_1, _5),
                                                                                key_callname, construct<std::wstring>(_3),
                                                                                key_this, construct<common_node>(_2),
                                                                                key_parameter, construct<common_node>(_4)
                                                                              )]
    | multiplicative[_val = _1]
    ;

  compares =

    (iter_pos >> additive >> (compare_op) >> compares >> iter_pos)      [_val = construct<common_node>(
                                                                            type_cmd_member_call,
                                                                            LOCATION(_1, _5),
                                                                            key_callname, construct<std::wstring>(_3),
                                                                            key_this, construct<common_node>(_2),
                                                                            key_parameter, construct<common_node>(_4)
                                                                          )]
    | additive[_val = _1]
    ;

  expression = compares[_val = _1];

回答1:

You correctly identified the source of the trouble: the rules are specified "lazily" (in that they - as they should - descriptively give the productions for the rule).

As you can see, in a PEG grammar this quickly leads to a bad performance if there's a lot of backtracking.

I've already shown optimization of very similar expressions. The summary is this: instead of "expecting" a binary expression and backtracking if it turns out not to be satisfied, parse the first operand "greedily" and compose different AST expression nodes depending on what follows.

Boost spirit poor perfomance with Alternative parser
Similar treatment with more visual explanation Boost::Spirit : Optimizing an expression parser

Perhaps interesting:

Optimizing a boost::spirit::qi parser
https://stackoverflow.com/a/20387628/85371

回答2:

thx for this Input. I try your sample and it works like expected (and fast) .. anyway i was not able to use the solution from "Boost::Spirit : Optimizing an expression Parser" with my sematic actions. Simply don´t know how to place my actions into this sample .. All the other links are a greate source and i also understand the ideas behind the scene but it seems that this is still a workaround of the more general problem of PEG grammer and backtracking. So is there a more General way to solve this? E.g. using the ">" Operator instead of ">>" to avoid backtracking at some point. So far i failed in using the ">" Operator in my sample.

After playing around with the other info on the links i rewrite the rules to this one and now it works fine and fast.

primary = functioncall | constant_double | constant_integer | name | string;

constant_double = real_parser<double, strict_ureal_policies<double>>()
    [_val = construct<common_node>(type_const_double, key_value, _1)];

name = name_pure_location[_val = construct<common_node>(
                              type_name, phoenix::bind(&getLocation, _1),
                              key_value, phoenix::bind(&getString, _1))];

string = (lexeme[L'"' >> +(boost::spirit::standard_wide::char_ - L'"') >> L'"'])
    [_val = construct<common_node>(type_const_string, key_value,
                                   phoenix::bind(&makeString, _1))];

constant_integer = int_[_val = construct<common_node>(type_const_int, key_value,
                                                      construct<int>(_1))];

parenthetical = lit('(') >> expression >> lit(')') | primary;

// this define is used to make to code more readable ..
#define EXPR_ACTION(NAME)                                                      \
  [_val = construct<common_node>(type_cmd_member_call, LOCATION(_1, _3),       \
                                 key_callname, construct<std::wstring>(NAME),  \
                                 key_this, construct<common_node>(_val),       \
                                 key_parameter, construct<common_node>(_2))]

unary = (iter_pos >> unary_op >> unary >>
         iter_pos)[_val = construct<common_node>(
                       type_cmd_member_call, LOCATION(_1, _4), key_callname,
                       construct<std::wstring>(_2), key_this,
                       construct<common_node>(_3))] 
         | parenthetical[_val = _1];

multiplicative = unary[_val = _1] >> 
           *(iter_pos >> lit('*') >> unary >> iter_pos) EXPR_ACTION(L"_mul") >>
           *(iter_pos >> lit('/') >> unary >> iter_pos) EXPR_ACTION(L"_div");

additive = multiplicative[_val = _1] >>
           *(iter_pos >> lit('-') >> multiplicative >> iter_pos) EXPR_ACTION(L"_sub") >>
           *(iter_pos >> lit('+') >> multiplicative >> iter_pos) EXPR_ACTION(L"_add");

compares =
    additive[_val = _1] >>
    *(iter_pos >> lit('<') >> additive >> iter_pos) EXPR_ACTION(L"_cmp_low") >>
    *(iter_pos >> lit('>') >> additive >> iter_pos) EXPR_ACTION(L"_cmp_gre") >>
    *(iter_pos >> lit('=') >> additive >> iter_pos) EXPR_ACTION(L"_cmp_equ");

logical = compares[_val = _1] >> 
    *(iter_pos >> nocaselit(L"and") >> compares >> iter_pos) EXPR_ACTION(L"_and") >>
    *(iter_pos >> nocaselit(L"or")  >> compares >> iter_pos) EXPR_ACTION(L"_or");

expression = logical[_val = _1];

thank you very much for the good Inputs ! Markus

来源：https://stackoverflow.com/questions/41809417/boost-spirit-takes-for-ever-to-parse-expressions

标签

c++

boost

runtime

expression

boost-spirit