Boost Spirit 2 - Symbol expansion and backtracking

六月ゝ 毕业季﹏ 提交于 2019-12-07 19:24:02

问题


I am having some issues specifying and parsing a rather simple grammar.

vertex = char+
edge = vertex " -> " vertex
start = ((vertex | edge) eol)*

input = "a\nb\na -> b\n"

Spirit is doing the following:

"a" -> vertex
"\n" -> eol
-> start

"b" -> vertex
"\n" -> eol
-> start

and

"a" -> vertex
terminate

instead of identifying the edge in the end and parsing the entire input. That is, it could parse the entire input but it's not. Shouldn't it backtrack and attempt to parse using the alternate rule? And thus complete the start rule.

Is it because Spirit uses PEGs? (http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/abstracts/parsing_expression_grammar.html#spirit.abstracts.parsing_expression_grammar.alternatives)

Minimal working example:

#include <cstdio>
#include <string>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

void print(const std::string & str) {
  printf("%s\n", str.c_str());
}

void print_eol() {
  printf("eol\n");
}

int main() {
  std::string str = "a\nb\na -> b\n";
  std::string::iterator it = str.begin(), begin = str.begin(), end = str.end();

  qi::rule<std::string::iterator, std::string()> vertex = +qi::alnum;
  qi::rule<std::string::iterator, std::string()> edge = vertex >> " -> " >> vertex;
  qi::rule<std::string::iterator> start = (vertex[&print] | edge[&print]) % qi::eol[&print_eol];

  bool matched = parse(it, end, start);

  if (matched) {
    printf("matched\n");
  }
  if (it == end) {
    printf("all\n");
  } else if (it != begin) {
    printf("some\n");
  } else {
    printf("none\n");
  }

  return 0;
}

Output:

$ ./a.out
a
eol
b
eol
a
matched
some

I'm using Boost 1.57.0 and Clang 3.5.1 on MSYS2.


回答1:


It seems that you answered your own question: yes, it's because PEG grammars are inherently greedy, left-to-right matching.

Notes

  • printing from a semantic action is not a fair test, because even when the parser does backtrack (it can continue the next alternative branch on failure of a single expression) the side effect will already have fired. This is a major draw back of using Semantic Actions - unless you're careful (Boost Spirit: "Semantic actions are evil"?)

  • BOOST_SPIRIT_DEBUG and BOOST_SPIRIT_DEBUG_NODES macros are there fopr this purpose and give a lot more information

  • The obvious solutions here would be

    1. reorder the branches:

      start = (edge[&print] | vertex[&print]) % qi::eol[&print_eol];
      

      Live On Coliru

      a
      eol
      b
      eol
      a -> b
      matched all
      
    2. left - factorisation:

      start = (vertex[print] >> -(" -> " >> vertex)[print]) % qi::eol[&print_eol];
      

      Live On Coliru

      a
      eol
      b
      eol
      a
      b
      matched all
      

If you want some real-life ideas beyond satisfying your curiosity about PEG grammars, you can search SO answers for ideas how to parse into separate collections of vertices/edges in roughly a dozen boost-spirit answers on SO, as I remember.



来源:https://stackoverflow.com/questions/28952703/boost-spirit-2-symbol-expansion-and-backtracking

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!