Question
I'm learning how to use the Boost.Spirit library for parsing strings. It seems to be a very nice tool, but a difficult one as well. I want to parse a string of words separated by / and put them in a vector of strings. Here is an example: word1/word2/word3. That's a simple task; I can do it with the following call:
bool r = phrase_parse(first, last, (+~char_("/") % qi::lit("/")), space, v);
where v is a std::vector<std::string>. In general, though, I'd like to parse something like w1/[w2/w3]2/w4, which is equivalent to w1/w2/w3/w2/w3/w4; that is, [w2/w3]2 means that w2/w3 is repeated twice. Could anyone give me some ideas on that? I read the documentation but still have some problems.
Thank you in advance!
Answer 1:
Fully working demo: live on Coliru
What this adds over a naive approach is that raw values are optionally ended at ] when the state is in_group.
I elected to pass the state using an inherited attribute (bool).
This implementation allows nested sub-groups as well, e.g. "[w1/[w2/w3]2/w4]3":
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace phx = boost::phoenix;

int main()
{
    typedef std::string::const_iterator It;
    const std::string input = "[w1/[w2/w3]2/w4]3";
    std::vector<std::string> v;
    It first(input.begin()), last(input.end());

    using namespace boost::spirit::qi;

    rule<It, std::string(bool in_group)> raw;
    rule<It, std::vector<std::string>(bool in_group), space_type>
        group,
        delimited;

    _r1_type in_group; // friendly alias for the inherited attribute

    // inside a group a word also ends at ']'; outside a group it only ends at '/'
    raw       = eps(in_group) >> +~char_("/]")
              | +~char_("/");

    delimited = (group(in_group)|raw(in_group)) % '/';

    // append the group's contents (_1) to _val once per repetition (_2)
    group     = ('[' >> delimited(in_group=true) >> ']' >> int_)
        [ phx::while_(_2--)
            [ phx::insert(_val, phx::end(_val), phx::begin(_1), phx::end(_1)) ]
        ];

    BOOST_SPIRIT_DEBUG_NODES((raw)(delimited)(group));

    bool r = phrase_parse(first, last,
            delimited(false),
            space, v);

    if (r)
        std::copy(v.begin(), v.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
Prints:
w1
w2
w3
w2
w3
w4
w1
w2
w3
w2
w3
w4
w1
w2
w3
w2
w3
w4
(besides debug info)
Answer 2:
This is my quick implementation (C++11). You can find a lot of examples of how to tackle various problems in boost-spirit-qi, and I agree that learning Spirit takes some effort :-)
#define BOOST_RESULT_OF_USE_DECLTYPE
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Appends the contents of `in` to `out`, `counter` times.
struct SInsert
{
    struct result
    {
        typedef void type;
    };

    void operator()( std::vector<std::string>&out,
                     std::vector<std::string>&in, int counter ) const
    {
        for( int i=0; i<counter; ++i )
            std::copy( in.begin(), in.end(), std::back_inserter(out) );
    }
};

boost::phoenix::function<SInsert> inserter;

int main()
{
    namespace qi = boost::spirit::qi;
    namespace ph = boost::phoenix;
    namespace ascii = boost::spirit::ascii;

    for ( auto &str : std::vector< std::string >
                      { "w1/ w2 /w4 ",
                        "[w2]1 /w4 ",
                        "[w2/w3]2 /w4 ",
                        "[]0",
                        "[]0 / w4"
                      }
        )
    {
        std::cout << "input:" << str << std::endl;
        std::string::const_iterator iter( str.begin() );
        std::string::const_iterator last( str.end() );
        std::vector< std::string > v;

        // _a (the rule's local) collects the words of the current group
        qi::rule<std::string::const_iterator,
                 qi::locals< std::vector<std::string> >,
                 ascii::space_type ,std::vector<std::string>()> mrule =
            ( qi::as_string[ qi::lexeme[ +(qi::graph -"/"-"[") ] ][ ph::push_back( qi::_val,qi::_1 )] |
              (
                qi::lit("[")
                >> -(
                     qi::eps[ ph::clear( qi::_a ) ]
                     >> qi::as_string[ qi::lexeme[ +(qi::graph-"/"-"]") ] ][ ph::push_back( qi::_a ,qi::_1 ) ]
                        % qi::lit("/")
                    )
              )
              >> qi::lit("]" )
              >> qi::int_[ inserter( qi::_val,qi::_a,qi::_1 ) ]
            )
            % qi::lit("/");

        if( qi::phrase_parse( iter, last, mrule , ascii::space, v ) && iter==last )
            std::copy( v.begin(), v.end(),
                       std::ostream_iterator<std::string>( std::cout,"\n" ));
        else
            // print the unparsed remainder; dereferencing iter is unsafe when iter==last
            std::cerr << "parsing failed at:" << std::string( iter, last ) << std::endl;
    }
    return 0;
}
You can further simplify mrule so that the attributes are synthesized automatically instead of being built by semantic actions, even though you won't avoid semantic actions altogether:
qi::rule<std::string::const_iterator,
         qi::locals< std::vector<std::string> >,
         ascii::space_type ,std::vector<std::string>()> mrule;

mrule %=
    (
      qi::as_string[ qi::lexeme[ +(qi::graph -"/"-"[") ] ] |
      qi::lit("[")
      >> -(
           qi::eps[ ph::clear( qi::_a ) ]
           >> qi::as_string[ qi::lexeme[ +(qi::graph-"/"-"]") ] ][ ph::push_back( qi::_a ,qi::_1 ) ]
              % qi::lit("/")
          )
      >> qi::lit("]" )
      >> qi::omit[ qi::int_[ inserter( qi::_val,qi::_a,qi::_1-1 ) ] ]
    )
    % qi::lit("/");
Since sehe pointed out some ugly constructs, here is a minor simplification:
qi::rule<std::string::const_iterator,
         qi::locals< std::vector<std::string> >,
         ascii::space_type ,std::vector<std::string>()> mrule;

mrule %= (
           qi::as_string[ qi::lexeme[ +qi::alnum ] ] |
           qi::lit("[")
           >> -(
                qi::eps[ ph::clear( qi::_a ) ] >>
                qi::as_string[ qi::lexeme[ +qi::alnum ] ][ ph::push_back( qi::_a ,qi::_1 ) ]
                % qi::lit("/")
               )
           >> qi::lit("]")
           >> qi::omit[ qi::int_[ inserter( qi::_val,qi::_a,qi::_1-1 ) ] ]
         ) % qi::lit("/");
Source: https://stackoverflow.com/questions/18377380/parsing-a-simple-repeated-text-macro-with-boost-spirit