Spirit parser segfaults

丶灬走出姿态 提交于 2019-12-12 04:38:14

问题


I'm getting a segfault when I run this. It looks like the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. I'd also appreciate, if possible any tips/tricks for cleaning up this grammar.

Thanks!

//code here:
/***
*I-EBNF parser
*
*This defines a grammar for BNF.
*/

//Speeds up compilation times.
//This is a relatively small grammar, this is useful.
#define BOOST_SPIRIT_NO_PREDEFINED_TERMINALS
#define BOOST_SPIRIT_QI_DEBUG

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/support.hpp>
#include <vector>
#include <string>
#include <iostream>

namespace Parser
{

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

enum class RHSType
{
    Terminal, Identifier
};
struct RHS
{
    RHSType type;
    std::string value;
};
struct Rule
{
    std::string identifier; //lhs
    std::vector<RHS> rhs;
};
}

//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(
    Parser::RHS,
    (Parser::RHSType, type)
    (std::string, value)
)
BOOST_FUSION_ADAPT_STRUCT(
    Parser::Rule,
    (std::string, identifier)
    (std::vector<Parser::RHS>, rhs)
)

namespace Parser
{
typedef std::vector<Rule> RuleList;

//our grammar definition
template <typename Iterator>
struct Grammar: qi::grammar<Iterator, std::list<Rule>, ascii::space_type>
{
    Grammar(): Grammar::base_type(rules)
    {
        qi::char_type char_;

        letter = char_("a-zA-Z");
        digit = char_('0', '9');
        symbol = char_('[') | ']' | '[' | ']' | '(' | ')' | '<' | '>'
| '\'' | '\"' | '=' | '|' | '.' | ',' | ';';
        character = letter | digit | symbol | '_';
        identifier = letter >> *(letter | digit | '_');
        terminal = (char_('\'') >> character >> *character >>
char_('\'')) | (char_('\"') >> character >> *character >> char_('\"'));
        lhs = identifier;
        rhs = terminal | identifier | char_('[') >> rhs >> char_(']')
| char_('{') >> rhs >> char_('}') | char_('(') >> rhs >> char_(')') |
rhs >> char_('|') >> rhs | rhs >> char_(',') >> rhs;
        rule = identifier >> char_('=') >> rhs;
        rules = rule >> *rule;
    }

private:
    qi::rule<Iterator, char(), ascii::space_type> letter, digit,
symbol, character;
    qi::rule<Iterator, std::string(), ascii::space_type> identifier,
lhs, terminal;
    qi::rule<Iterator, RHS, ascii::space_type> rhs;
    qi::rule<Iterator, Rule, ascii::space_type> rule;
    qi::rule<Iterator, std::list<Rule>, ascii::space_type> rules;
};

}

int main()
{
    Parser::Grammar<std::string::const_iterator> parser;
    boost::spirit::ascii::space_type space;
    std::string input;
    std::vector<std::string> output;
    bool result;

    while (std::getline(std::cin, input))
        {
            if (input.empty())
                {
                    break;
                }
            std::string::const_iterator it, itEnd;
            it = input.begin();
            itEnd = input.end();
            result = phrase_parse(it, itEnd, parser, space, output);
            if (result && it == itEnd)
                {
                    std::cout << "success" << std::endl;
                }
        }

    return 0;
}

¹ cross posted from the [spirit-general] mailing list: http://boost.2283326.n4.nabble.com/parser-segfault-tips-tricks-td4680336.html


回答1:


On 09/26/2015 01:45 AM, Littlefield, Tyler wrote:

Hello all: I'm getting a segfault when I run this. It looks like the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. I'd also appreciate, if possible any tips/tricks for cleaning up this grammar.

First of all, it doesn't compile.

It shouldn't compile, since the grammar doesn't expose an attribute (did you mean list<Rule>() instead of list<Rule>?).

But you could never assign that to the output variable (which is std::vector<std::string>?!?)

Likewise you forget parentheses here

qi::rule<Iterator, RHS(), ascii::space_type> rhs;
qi::rule<Iterator, Rule(), ascii::space_type> rule;
qi::rule<Iterator, std::list<Rule>(), ascii::space_type> rules;

The rhs rule has infinite left-recursion:

    rhs               = terminal
                      | identifier
                      | ('[' >> rhs >> ']')
                      | ('{' >> rhs >> '}')
                      | ('(' >> rhs >> ')')
                      | (rhs >> '|' >> rhs)  // OOPS
                      | (rhs >> ',' >> rhs)  // OOPS
                      ;

This likely explains the crashes because it leads to stackoverflow.

Notes

The livestream recording (part #1 and part #2) shows exactly the steps I took to cleanup the grammar first, and subsequently make things actually compile-worthy.

There was quite a lot of work there:

  • cleanup: use implicit qi::lit for the interpunction ([], {}, () and =,|,,)
  • use kleene+ instead of a >> *a (in more than one occasion)
  • prefer the % parser to parse... lists
  • I had to "wiggle" the rules around "RHS" a bit; There was the infinite recursion in the last two branches (see // OOPS). I fixed it by introducing a "pure" expression rule (which parses just a single RHS structure. I've renamed this type to Expression.

    The "list" parsing (that recognizes lists of expressions separated by , or |) is moved into the original rhs rule, which I renamed to expr_list to be more descriptive:

    expression        = qi::attr(Parser::ExprType::Terminal)   >> terminal 
                      | qi::attr(Parser::ExprType::Identifier) >> identifier 
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ] 
                      | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ] 
                      ;
    
    expr_list         = expression % (char_("|,")) // TODO FIXME?
                      ;
    
  • In order for the synthesized attributes to actually transform into the RHS (now: Expression) type, we need to actually expose a RHSType (now: ExprType) value for the first adapted member. You can see in the above lines that we used qi::attr() for this purpose.

  • Modern compilers and boost versions can simplify the BOOST_FUSION_ADAPT_STRUCT invocations considerably:

    BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
    BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)
    
  • I "upgraded" some rules to be lexemes, meaning they don't obey a skipper.

    I guessed that I should probably add the space character to the acceptable character set inside a terminal (string literal). If that's not what you want, just remove the last character from char_("[][]()<>\'\"=|.,;_ ");.

  • I also changed the skipper to be blank_type as that will not skip newline characters. You could use this to easily parse multi-line input directly with the same grammar. Do, e.g.:

    rules = rule % qi::eol;
    

See also: Boost spirit skipper issues for additional information regarding skippers, lexemes and their interaction(s).

Without further ado, here is a working example:

Live On Coliru

#define BOOST_SPIRIT_DEBUG

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace Parser
{
    namespace qi    = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    enum class ExprType { Terminal, Identifier, Compound };

    static inline std::ostream& operator<<(std::ostream& os, ExprType type) {
        switch (type) {
            case ExprType::Terminal:   return os << "Terminal";
            case ExprType::Identifier: return os << "Identifier";
            case ExprType::Compound:   return os << "Compound";
        }
        return os << "(unknown)";
    }

    struct Expression { // TODO make recursive (see `boost::make_recursive_variant`)
        ExprType    type;
        std::string value;
    };

    using ExprList = std::vector<Expression>;

    struct Rule {
        std::string identifier; // lhs
        ExprList    expr_list;
    };
}

//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)

namespace Parser
{
    typedef std::list<Rule> RuleList;

    //our grammar definition
    template <typename Iterator>
    struct Grammar: qi::grammar<Iterator, RuleList(), ascii::blank_type>
    {
        Grammar(): Grammar::base_type(rules)
        {
            qi::char_type char_;

            symbol            = char_("[][]()<>\'\"=|.,;_ ");
            character         = qi::alpha | qi::digit | symbol;
            identifier        = qi::alpha >> *(qi::alnum | char_('_'));


            // TODO capture strings including interpunction(?)
            terminal          = ('\'' >> +(character - '\'') >> '\'') 
                              | ('\"' >> +(character - '\"') >> '\"');

            expression        = qi::attr(Parser::ExprType::Terminal)   >> terminal 
                              | qi::attr(Parser::ExprType::Identifier) >> identifier 
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ] 
                              | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ] 
                              ;

            expr_list         = expression % (char_("|,")) // TODO FIXME?
                              ;
                                    // above accepts mixed separators:
                                    //     a, b, c | d, e
                                    //
                                    // original accepted:
                                    //
                                    //     a, b, [ c | d ], e
                                    //     a| b| [ c , d ]| e
                                    //     a| b| [ c | d ]| e
                                    //     a, b, [ c , d ], e

            rule              = identifier >> '=' >> expr_list;
            //rules           = rule % qi::eol; // alternatively, parse multi-line input in one go
            rules             = +rule;

            BOOST_SPIRIT_DEBUG_NODES((rules)(rule)(expr_list)(expression)(identifier)(terminal))
        }

    private:
        qi::rule<Iterator, Expression(),  ascii::blank_type> expression;
        qi::rule<Iterator, ExprList(),    ascii::blank_type> expr_list;
        qi::rule<Iterator, Rule(),        ascii::blank_type> rule;
        qi::rule<Iterator, RuleList(),    ascii::blank_type> rules;
        // lexemes:
        qi::rule<Iterator, std::string()> terminal, identifier;
        qi::rule<Iterator, char()>        symbol, character;
    };

}

int main() {
    using It = std::string::const_iterator;

    Parser::Grammar<It> parser;
    boost::spirit::ascii::blank_type blank;

    std::string input;

    while (std::getline(std::cin, input))
    {
        if (input.empty()) {
            break;
        }

        It it = input.begin(), itEnd = input.end();

        Parser::RuleList output;
        bool result = phrase_parse(it, itEnd, parser, blank, output);
        if (result) {
            std::cout << "success\n";
            for (auto& rule : output) {
                std::cout << "\ntarget: " << rule.identifier << "\n";
                for (auto& rhs : rule.expr_list) {
                    std::cout << "rhs:    " << boost::fusion::as_vector(rhs) << "\n";
                }
            }
        } else {
            std::cout << "parse failed\n";
        }

        if (it != itEnd)
            std::cout << "remaining unparsed: '" << std::string(it, itEnd) << "\n";
    }
}

Prints the output:

success

target: assigned1
rhs:    (Identifier some_var)
rhs:    (Terminal a 'string' value)
rhs:    (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs:    (Terminal done)
success

target: assigned2
rhs:    (Compound { a })

And with debug enabled (#define BOOST_SPIRIT_DEBUG):

<rules>
<try>assigned1 = some_var</try>
<rule>
    <try>assigned1 = some_var</try>
    <identifier>
    <try>assigned1 = some_var</try>
    <success> = some_var | "a 'st</success>
    <attributes>[[a, s, s, i, g, n, e, d, 1]]</attributes>
    </identifier>
    <expr_list>
    <try> some_var | "a 'stri</try>
    <expression>
        <try> some_var | "a 'stri</try>
        <terminal>
        <try>some_var | "a 'strin</try>
        <fail/>
        </terminal>
        <identifier>
        <try>some_var | "a 'strin</try>
        <success> | "a 'string' value</success>
        <attributes>[[s, o, m, e, _, v, a, r]]</attributes>
        </identifier>
        <success> | "a 'string' value</success>
        <attributes>[[Identifier, [s, o, m, e, _, v, a, r]]]</attributes>
    </expression>
    <expression>
        <try> "a 'string' value" </try>
        <terminal>
        <try>"a 'string' value" |</try>
        <success> | [ 'okay', "this",</success>
        <attributes>[[a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]]</attributes>
        </terminal>
        <success> | [ 'okay', "this",</success>
        <attributes>[[Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]]]</attributes>
    </expression>
    <expression>
        <try> [ 'okay', "this", i</try>
        <terminal>
        <try>[ 'okay', "this", is</try>
        <fail/>
        </terminal>
        <identifier>
        <try>[ 'okay', "this", is</try>
        <fail/>
        </identifier>
        <expr_list>
        <try> 'okay', "this", is_</try>
        <expression>
            <try> 'okay', "this", is_</try>
            <terminal>
            <try>'okay', "this", is_a</try>
            <success>, "this", is_another</success>
            <attributes>[[o, k, a, y]]</attributes>
            </terminal>
            <success>, "this", is_another</success>
            <attributes>[[Terminal, [o, k, a, y]]]</attributes>
        </expression>
        <expression>
            <try> "this", is_another,</try>
            <terminal>
            <try>"this", is_another, </try>
            <success>, is_another, identi</success>
            <attributes>[[t, h, i, s]]</attributes>
            </terminal>
            <success>, is_another, identi</success>
            <attributes>[[Terminal, [t, h, i, s]]]</attributes>
        </expression>
        <expression>
            <try> is_another, identif</try>
            <terminal>
            <try>is_another, identifi</try>
            <fail/>
            </terminal>
            <identifier>
            <try>is_another, identifi</try>
            <success>, identifier, { ( "n</success>
            <attributes>[[i, s, _, a, n, o, t, h, e, r]]</attributes>
            </identifier>
            <success>, identifier, { ( "n</success>
            <attributes>[[Identifier, [i, s, _, a, n, o, t, h, e, r]]]</attributes>
        </expression>
        <expression>
            <try> identifier, { ( "ne</try>
            <terminal>
            <try>identifier, { ( "nes</try>
            <fail/>
            </terminal>
            <identifier>
            <try>identifier, { ( "nes</try>
            <success>, { ( "nested" ) } ]</success>
            <attributes>[[i, d, e, n, t, i, f, i, e, r]]</attributes>
            </identifier>
            <success>, { ( "nested" ) } ]</success>
            <attributes>[[Identifier, [i, d, e, n, t, i, f, i, e, r]]]</attributes>
        </expression>
        <expression>
            <try> { ( "nested" ) } ] </try>
            <terminal>
            <try>{ ( "nested" ) } ] |</try>
            <fail/>
            </terminal>
            <identifier>
            <try>{ ( "nested" ) } ] |</try>
            <fail/>
            </identifier>
            <expr_list>
            <try> ( "nested" ) } ] | </try>
            <expression>
                <try> ( "nested" ) } ] | </try>
                <terminal>
                <try>( "nested" ) } ] | "</try>
                <fail/>
                </terminal>
                <identifier>
                <try>( "nested" ) } ] | "</try>
                <fail/>
                </identifier>
                <expr_list>
                <try> "nested" ) } ] | "d</try>
                <expression>
                    <try> "nested" ) } ] | "d</try>
                    <terminal>
                    <try>"nested" ) } ] | "do</try>
                    <success> ) } ] | "done"</success>
                    <attributes>[[n, e, s, t, e, d]]</attributes>
                    </terminal>
                    <success> ) } ] | "done"</success>
                    <attributes>[[Terminal, [n, e, s, t, e, d]]]</attributes>
                </expression>
                <success> ) } ] | "done"</success>
                <attributes>[[[Terminal, [n, e, s, t, e, d]]]]</attributes>
                </expr_list>
                <success> } ] | "done"</success>
                <attributes>[[Compound, [(,  , ", n, e, s, t, e, d, ",  , )]]]</attributes>
            </expression>
            <success> } ] | "done"</success>
            <attributes>[[[Compound, [(,  , ", n, e, s, t, e, d, ",  , )]]]]</attributes>
            </expr_list>
            <success> ] | "done"</success>
            <attributes>[[Compound, [{,  , (,  , ", n, e, s, t, e, d, ",  , ),  , }]]]</attributes>
        </expression>
        <success> ] | "done"</success>
        <attributes>[[[Terminal, [o, k, a, y]], [Terminal, [t, h, i, s]], [Identifier, [i, s, _, a, n, o, t, h, e, r]], [Identifier, [i, d, e, n, t, i, f, i, e, r]], [Compound, [{,  , (,  , ", n, e, s, t, e, d, ",  , ),  , }]]]]</attributes>
        </expr_list>
        <success> | "done"</success>
        <attributes>[[Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]]]</attributes>
    </expression>
    <expression>
        <try> "done"</try>
        <terminal>
        <try>"done"</try>
        <success></success>
        <attributes>[[d, o, n, e]]</attributes>
        </terminal>
        <success></success>
        <attributes>[[Terminal, [d, o, n, e]]]</attributes>
    </expression>
    <success></success>
    <attributes>[[[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]</attributes>
    </expr_list>
    <success></success>
    <attributes>[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]]</attributes>
</rule>
<rule>
    <try></try>
    <identifier>
    <try></try>
    <fail/>
    </identifier>
    <fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 1], [[Identifier, [s, o, m, e, _, v, a, r]], [Terminal, [a,  , ', s, t, r, i, n, g, ',  , v, a, l, u, e]], [Compound, [[,  , ', o, k, a, y, ', ,,  , ", t, h, i, s, ", ,,  , i, s, _, a, n, o, t, h, e, r, ,,  , i, d, e, n, t, i, f, i, e, r, ,,  , {,  , (,  , ", n, e, s, t, e, d, ",  , ),  , },  , ]]], [Terminal, [d, o, n, e]]]]]]</attributes>
</rules>
success

target: assigned1
rhs:    (Identifier some_var)
rhs:    (Terminal a 'string' value)
rhs:    (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
rhs:    (Terminal done)
<rules>
<try>assigned2 = { a }</try>
<rule>
    <try>assigned2 = { a }</try>
    <identifier>
    <try>assigned2 = { a }</try>
    <success> = { a }</success>
    <attributes>[[a, s, s, i, g, n, e, d, 2]]</attributes>
    </identifier>
    <expr_list>
    <try> { a }</try>
    <expression>
        <try> { a }</try>
        <terminal>
        <try>{ a }</try>
        <fail/>
        </terminal>
        <identifier>
        <try>{ a }</try>
        <fail/>
        </identifier>
        <expr_list>
        <try> a }</try>
        <expression>
            <try> a }</try>
            <terminal>
            <try>a }</try>
            <fail/>
            </terminal>
            <identifier>
            <try>a }</try>
            <success> }</success>
            <attributes>[[a]]</attributes>
            </identifier>
            <success> }</success>
            <attributes>[[Identifier, [a]]]</attributes>
        </expression>
        <success> }</success>
        <attributes>[[[Identifier, [a]]]]</attributes>
        </expr_list>
        <success></success>
        <attributes>[[Compound, [{,  , a,  , }]]]</attributes>
    </expression>
    <success></success>
    <attributes>[[[Compound, [{,  , a,  , }]]]]</attributes>
    </expr_list>
    <success></success>
    <attributes>[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{,  , a,  , }]]]]]</attributes>
</rule>
<rule>
    <try></try>
    <identifier>
    <try></try>
    <fail/>
    </identifier>
    <fail/>
</rule>
<success></success>
<attributes>[[[[a, s, s, i, g, n, e, d, 2], [[Compound, [{,  , a,  , }]]]]]]</attributes>
</rules>
success

target: assigned2
rhs:    (Compound { a })

Lose Ends

There are some TODO's left in the code. Implementing them would require some more effort, but I'm not sure it's actually the right direction to go in, so I'll await feedback :)

One of the elementary TODO's left would be to represent the recursive nature of expressions in the AST proper. Right now, I "copped out" by just inserting the source string for a nested Compound expression.



来源:https://stackoverflow.com/questions/32801103/spirit-parser-segfaults

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!