Question
I am struggling to write an identifier parser that parses an alphanumeric string which is not a keyword. The keywords are all in a table:
struct keywords_t : x3::symbols<x3::unused_type> {
    keywords_t() {
        add("for", x3::unused)
           ("in", x3::unused)
           ("while", x3::unused);
    }
} const keywords;
The parser for an identifier should be this:
auto const identifier_def =
    x3::lexeme[
        (x3::alpha | '_') >> *(x3::alnum | '_')
    ];
Now I am trying to combine these so that the identifier parser fails when it parses a keyword. I tried it like this:
auto const identifier_def =
    x3::lexeme[
        (x3::alpha | '_') >> *(x3::alnum | '_')
    ] - keywords;
and this:
auto const identifier_def =
    x3::lexeme[
        (x3::alpha | '_') >> *(x3::alnum | '_') - keywords
    ];
It works on most inputs, but if a string starts with a keyword, like int, whilefoo or forbar, the parser fails to parse it.
How can I get this parser right?
Answer 1:
Your problem is caused by the semantics of the difference operator in Spirit. When you have a - b, Spirit does the following:
- check whether b matches:
  - if it does, a - b fails and nothing is parsed.
  - if b fails, then it checks whether a matches:
    - if a fails, a - b fails and nothing is parsed.
    - if a succeeds, a - b succeeds and parses whatever a parses.
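To make the steps above concrete, here is a minimal sketch in plain standard C++ (no Boost) that models the difference operator with hand-written functions. It is not Spirit's implementation: the names Result, keywords, unchecked_identifier, difference and parses_fully are hypothetical, each "parser" just returns the position after its match (or std::nullopt on failure), and the whitespace skipper and attributes are ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model, not Spirit: a parser takes (input, pos) and returns
// the position after its match, or std::nullopt if it fails.
using Result = std::optional<std::size_t>;

// Symbol table lookup: longest keyword that is a prefix of the input at pos.
Result keywords(std::string_view in, std::size_t pos) {
    Result best;
    for (std::string_view kw : {"for", "in", "while"})
        if (in.substr(pos, kw.size()) == kw && (!best || pos + kw.size() > *best))
            best = pos + kw.size();
    return best;
}

// (alpha | '_') >> *(alnum | '_')
Result unchecked_identifier(std::string_view in, std::size_t pos) {
    auto uc = [](char c) { return static_cast<unsigned char>(c); };
    if (pos >= in.size() || !(std::isalpha(uc(in[pos])) || in[pos] == '_'))
        return std::nullopt;
    ++pos;
    while (pos < in.size() && (std::isalnum(uc(in[pos])) || in[pos] == '_'))
        ++pos;
    return pos;
}

// a - b: if b matches, fail without consuming anything;
// otherwise the result is exactly whatever a returns.
template <typename A, typename B>
auto difference(A a, B b) {
    return [=](std::string_view in, std::size_t pos) -> Result {
        if (b(in, pos)) return std::nullopt;
        return a(in, pos);
    };
}

// Mirrors "result && iter == end": the parser must consume the whole input.
template <typename P>
bool parses_fully(P p, std::string_view in) {
    Result end = p(in, 0);
    return end && *end == in.size();
}
```

With this model, difference(unchecked_identifier, keywords) plays the role of the question's attempt: keywords succeeds on the for, in and while prefixes of forbar, int and whilefoo, so the whole parser fails even though unchecked_identifier alone would accept those inputs.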
In your case (unchecked_identifier - keywords), as long as the identifier starts with a keyword, keywords will match and your parser will fail. So you need to replace keywords with something that matches whenever a distinct keyword is passed, but fails whenever the keyword is followed by another identifier character (a letter, digit or '_'). The not predicate (!) can help with that:
auto const distinct_keyword = x3::lexeme[ keywords >> !(x3::alnum | '_') ];
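The effect of the not predicate can be sketched with the same kind of hand-rolled model in plain C++. This is an illustration under assumptions, not Spirit's code: distinct_keyword, identifier and is_identifier are hypothetical stand-ins for the X3 rules, and the skipper is ignored. The key point is that the lookahead inspects the next character without consuming it, and succeeds at the end of the input.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model, not Spirit: return the position after a match, or nullopt.
using Result = std::optional<std::size_t>;

bool is_ident_char(char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; }

// Symbol table lookup: longest keyword that is a prefix of the input at pos.
Result keywords(std::string_view in, std::size_t pos) {
    Result best;
    for (std::string_view kw : {"for", "in", "while"})
        if (in.substr(pos, kw.size()) == kw && (!best || pos + kw.size() > *best))
            best = pos + kw.size();
    return best;
}

// (alpha | '_') >> *(alnum | '_')
Result unchecked_identifier(std::string_view in, std::size_t pos) {
    if (pos >= in.size() || !(std::isalpha(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
        return std::nullopt;
    ++pos;
    while (pos < in.size() && is_ident_char(in[pos])) ++pos;
    return pos;
}

// keywords >> !(alnum | '_'): the not predicate peeks at the next character
// and fails if it could continue an identifier. It consumes nothing and
// succeeds at end of input.
Result distinct_keyword(std::string_view in, std::size_t pos) {
    Result after = keywords(in, pos);
    if (!after) return std::nullopt;
    if (*after < in.size() && is_ident_char(in[*after])) return std::nullopt;
    return after;
}

// unchecked_identifier - distinct_keyword
Result identifier(std::string_view in, std::size_t pos) {
    if (distinct_keyword(in, pos)) return std::nullopt;
    return unchecked_identifier(in, pos);
}

// True only if the whole input is one identifier.
bool is_identifier(std::string_view in) {
    Result end = identifier(in, 0);
    return end && *end == in.size();
}
```

In this model, !distinct_keyword >> unchecked_identifier would have exactly the same body as identifier, because the lookahead consumes nothing; that is consistent with the commented-out alternative in the full sample.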
Full Sample (Running on Coliru):
//#define BOOST_SPIRIT_X3_DEBUG
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>
namespace parser {
    namespace x3 = boost::spirit::x3;
    // Reserved words; only whether they match matters, so the attribute is unused.
    struct keywords_t : x3::symbols<x3::unused_type> {
        keywords_t() {
            add("for", x3::unused)
               ("in", x3::unused)
               ("while", x3::unused);
        }
    } const keywords;
    x3::rule<struct identifier_tag, std::string> const identifier("identifier");
    // A keyword only counts if it is not directly followed by an identifier character.
    auto const distinct_keyword = x3::lexeme[keywords >> !(x3::alnum | '_')];
    // Any identifier-shaped word, keyword or not. char_('_') (instead of the
    // literal '_') keeps the underscore in the std::string attribute.
    auto const unchecked_identifier = x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];
    auto const identifier_def = unchecked_identifier - distinct_keyword;
    // This should also work:
    // auto const identifier_def = !distinct_keyword >> unchecked_identifier;
    BOOST_SPIRIT_DEFINE(identifier);
    // True only if the whole input is a single identifier.
    bool is_identifier(const std::string& input)
    {
        auto iter = std::begin(input), end = std::end(input);
        bool result = x3::phrase_parse(iter, end, identifier, x3::space);
        return result && iter == end;
    }
}
int main() {
    std::cout << parser::is_identifier("fortran") << std::endl;
    std::cout << parser::is_identifier("for") << std::endl;
    std::cout << parser::is_identifier("integer") << std::endl;
    std::cout << parser::is_identifier("in") << std::endl;
    std::cout << parser::is_identifier("whileechoyote") << std::endl;
    std::cout << parser::is_identifier("while") << std::endl;
}
Answer 2:
The problem is that this runs without a lexer. That is, if you write
keyword >> *char_
and put in whilefoo, it will parse while as the keyword and foo as the *char_.
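A tiny plain-C++ model of that scanner-less behaviour shows where the split happens. The helper names (keywords, Split, keyword_then_chars) are hypothetical and only mimic, rather than use, the Spirit keyword table and *char_: the table matches the longest keyword prefix and *char_ then takes whatever is left.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Symbol table lookup (hypothetical model): end of the longest keyword prefix at pos.
std::optional<std::size_t> keywords(std::string_view in, std::size_t pos) {
    std::optional<std::size_t> best;
    for (std::string_view kw : {"for", "in", "while"})
        if (in.substr(pos, kw.size()) == kw && (!best || pos + kw.size() > *best))
            best = pos + kw.size();
    return best;
}

struct Split {
    std::string_view keyword;  // what the keyword table matched
    std::string_view rest;     // what *char_ consumes afterwards
};

// keyword >> *char_: nothing forces a word boundary after the keyword,
// so *char_ simply takes every remaining character.
std::optional<Split> keyword_then_chars(std::string_view in) {
    auto end = keywords(in, 0);
    if (!end) return std::nullopt;
    return Split{in.substr(0, *end), in.substr(*end)};
}
```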
You can prevent that in two ways. Either require a space after the keyword, i.e.
auto keyword_rule = (keyword >> x3::space);
//or if you use phrase_parse
auto keyword_rule = x3::lexeme[keyword >> x3::space];
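Modelled the same way (plain C++, hypothetical names, standing in for lexeme[keyword >> x3::space] rather than calling Spirit), the space requirement rejects whilefoo as a keyword. One consequence worth keeping in mind: because x3::space has to consume an actual whitespace character, a keyword at the very end of the input does not match either.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Symbol table lookup (hypothetical model): end of the longest keyword prefix at pos.
std::optional<std::size_t> keywords(std::string_view in, std::size_t pos) {
    std::optional<std::size_t> best;
    for (std::string_view kw : {"for", "in", "while"})
        if (in.substr(pos, kw.size()) == kw && (!best || pos + kw.size() > *best))
            best = pos + kw.size();
    return best;
}

// lexeme[keyword >> space]: the keyword must be followed by one whitespace
// character, which is consumed as part of the match.
std::optional<std::size_t> keyword_rule(std::string_view in, std::size_t pos) {
    auto after = keywords(in, pos);
    if (!after || *after >= in.size() || !std::isspace(static_cast<unsigned char>(in[*after])))
        return std::nullopt;
    return *after + 1;
}
```

The not predicate from Answer 1 (keywords >> !(x3::alnum | '_')) avoids the end-of-input issue because it only looks at the next character without consuming it, and succeeds when there is none.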
The other way you described is also possible, i.e. explicitly excluding the keyword from the string rule (I'd do it that way):
auto string = x3::lexeme[!keyword >> (x3::alpha | '_') >> *(x3::alnum | '_')];
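One caveat, checked here with the same kind of plain-C++ model rather than with Spirit itself (string_rule and distinct_string_rule are hypothetical names): a bare !keyword lookahead still sees the while prefix of whilefoo, so on its own it rejects that input just like the '-' version in the question. Making the lookahead distinct, i.e. !(keyword >> !(alnum | '_')) as in Answer 1, accepts it.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model, not Spirit: return the position after a match, or nullopt.
using Result = std::optional<std::size_t>;

bool is_ident_char(char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; }

// Symbol table lookup: longest keyword that is a prefix of the input at pos.
Result keywords(std::string_view in, std::size_t pos) {
    Result best;
    for (std::string_view kw : {"for", "in", "while"})
        if (in.substr(pos, kw.size()) == kw && (!best || pos + kw.size() > *best))
            best = pos + kw.size();
    return best;
}

// (alpha | '_') >> *(alnum | '_')
Result unchecked_identifier(std::string_view in, std::size_t pos) {
    if (pos >= in.size() || !(std::isalpha(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
        return std::nullopt;
    ++pos;
    while (pos < in.size() && is_ident_char(in[pos])) ++pos;
    return pos;
}

// !keyword >> (alpha | '_') >> *(alnum | '_')
Result string_rule(std::string_view in, std::size_t pos) {
    if (keywords(in, pos)) return std::nullopt;  // any keyword prefix blocks the match
    return unchecked_identifier(in, pos);
}

// !(keyword >> !(alnum | '_')) >> (alpha | '_') >> *(alnum | '_')
Result distinct_string_rule(std::string_view in, std::size_t pos) {
    Result after = keywords(in, pos);
    bool whole_keyword = after && !(*after < in.size() && is_ident_char(in[*after]));
    if (whole_keyword) return std::nullopt;  // only a complete keyword blocks the match
    return unchecked_identifier(in, pos);
}
```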
The problem with your definition is that it interprets the first characters as the keyword and therefore does not parse the input at all. The 'x - y' operator means: parse x, but only if y does not match at the same position. So if you pass 'whilefoo', it interprets 'while' as the keyword and therefore parses nothing.
Source: https://stackoverflow.com/questions/38039237/parsing-identifiers-except-keywords