Question
To handle long compile times and to reuse grammars, I've split my grammar into several sub-grammars which are called in sequence. One of them (call it the SETUP grammar) offers some configuration of the parser (via a symbols parser), so later sub-grammars logically depend on that one (again via different symbols parsers). So, after SETUP is parsed, the symbols parsers of the following sub-grammars need to be altered.
My question is, how to approach this efficiently while preserving loose coupling between the sub grammars?
Currently I see only two possibilities:
- The on_success handler of the SETUP grammar, which could do the work, but this would introduce quite some coupling.
- After the SETUP, parse everything into a string, build up a new parser (from the altered symbols) and parse that string in a second step. This would leave quite some overhead.
What I would like to have is an on_before_parse handler, which could be implemented by any grammar that needs to do some work before each parse. From my point of view, this would introduce less coupling, and some setup of the parser could come in handy in other situations, too. Is something like this possible?
Update:
Sorry for being sketchy, that wasn't my intention.
The task is to parse an input I with some keywords like #task1 and #task2. But there will be cases where these keywords need to be different, say $$task1 and $$task2.
So the parsed file will start with
setup {
#task1=$$task1
#task2=$$task2
}
realwork {
...
}
Some code sketches: Given is a main parser, consisting of several (at least two) parsers.
template <typename Iterator>
struct MainParser : qi::grammar<Iterator, Skipper<Iterator>> {
    MainParser() : MainParser::base_type(start) {
        start = setup >> realwork;  // sub-grammars run in sequence
    }
    Setup<Iterator> setup;        // parses the setup { ... } block
    RealWork<Iterator> realwork;  // parses the realwork { ... } block
    qi::rule<Iterator, Skipper<Iterator>> start;
};
Setup and RealWork are themselves parsers (my sub parsers from above). During the setup part, some keywords of the grammar may be altered, so the setup part has a qi::symbols<char, keywords> rule. In the beginning these symbols will contain #task1 and #task2. After parsing the first part of the file, they contain $$task1 and $$task2.
Since the keywords have changed and since RealWork needs to parse I, it needs to know about the new keywords. So I have to transfer the symbols from Setup to RealWork during the parsing of the file.
The two approaches I see are:
- Make Setup aware of RealWork and transfer the symbols from Setup to RealWork in the qi::on_success handler of Setup. (bad, coupling)
- Switch to two parsing steps. start of MainParser will look like start = setup >> unparsed_rest and there will be a second parser after MainParser. Schematically:
    SymbolTable Table;
    string Unparsed_Rest;
    MainParser.parse(Input, (Unparsed_Rest, Table));
    RealWorkParser.setupFromAlteredSymbolTable(Table);
    RealWorkParser.parse(Unparsed_Rest);
  Overhead of several parsing steps.
So, up to now, attributes have not come into play. It is just about changing the parser at parse time to handle several kinds of input files.
My hope is a handler qi::on_before_parse like qi::on_success. The idea is that this handler would be triggered each time the parser starts parsing an input. Theoretically it is just an interception at the beginning of parsing, like the interceptions we have with on_success and on_error.
Answer 1:
Sadly, you showed no code, and your description is a bit... sketchy. So here's a fairly generic answer that addresses some of the points I was able to distill from your question:
Separation of concerns
It sounds very much like you need to separate AST building from transformation/processing steps.
Parser composition
Of course you can compose grammars. Simply compose grammars as you would rules and hide the implementation of these grammars in any traditional way you would (pImpl idiom, const static internal rules, whatever fits the bill).
However, the composition usually doesn't require an 'event' driven element: if you feel the need to parse in two phases, it sounds to me you're just struggling to keep the overview, but recursive descent or PEG grammars are naturally well-suited to describe grammars like that in one swoop (or one pass, if you will).
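To make the one-pass idea concrete, here is a minimal, dependency-free sketch in plain C++. It is not Spirit code, and every name in it (Keyword, KeywordTable, parse_setup, parse_realwork, parse_all) is hypothetical. Both stages receive the same keyword table by reference: the setup stage renames entries in place, and the work stage simply looks up whatever the table contains by the time it runs. Neither stage knows about the other. In a Qi grammar the analogous step would presumably be to hand both sub-grammars a reference to one shared qi::symbols instance, but that is an assumption about your design.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical keyword identifiers; the names are illustrative only.
enum class Keyword { Task1, Task2 };

// Spelling -> keyword. A single instance is shared by both stages.
using KeywordTable = std::map<std::string, Keyword>;

// Setup stage: reads lines of the form "old=new" until "}" and renames
// the matching table entries in place. It only knows the table.
void parse_setup(std::istream& in, KeywordTable& table) {
    std::string line;
    while (std::getline(in, line) && line != "}") {
        auto eq = line.find('=');
        if (eq == std::string::npos) continue;   // e.g. the "setup {" header
        auto it = table.find(line.substr(0, eq));
        if (it == table.end()) continue;         // unknown keyword: skipped in this sketch
        Keyword kw = it->second;
        table.erase(it);                         // drop the old spelling
        table[line.substr(eq + 1)] = kw;         // register the new spelling
    }
}

// Work stage: resolves each line against the table as it is now.
std::vector<Keyword> parse_realwork(std::istream& in, const KeywordTable& table) {
    std::vector<Keyword> found;
    std::string line;
    while (std::getline(in, line) && line != "}") {
        auto it = table.find(line);
        if (it != table.end()) found.push_back(it->second);
    }
    return found;
}

// Driver: one table, one pass over the input, two loosely coupled stages.
std::vector<Keyword> parse_all(const std::string& input) {
    KeywordTable table{{"#task1", Keyword::Task1}, {"#task2", Keyword::Task2}};
    std::istringstream in(input);
    parse_setup(in, table);
    return parse_realwork(in, table);
}
```

The only thing the two stages share is the table itself, so the setup stage needs no on_success hook that knows about the work stage, and the input is read once.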
However, if you find that
(a) your grammar gets complicated, or
(b) you want to be able to selectively plug in subgrammars depending on runtime features,
you could consider:
- The Nabialek trick (I've shown/mentioned this on several occasions in my [tag:boost-spirit] answers on this site).
- Building rules dynamically. This is not readily recommended because you'll run into deadly traps having to do with copying Proto expression trees, which leads to dangling references. I have also shown some answers doing this on occasion:
  - Generating Spirit parser expressions from a variadic list of alternative parser expressions
  - C++ Boost qi recursive rule construction
  - Boost.Spirit.Qi: dynamically create "difference" parser at parse time
REPEAT: don't try this unless you know how to detect UB and fix things with Proto.
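For readers who have not seen the Nabialek trick, its shape can be sketched without Spirit. The following is a plain C++ analogue, not Qi code, and all names (SubParser, Dispatch, parse_statements, parse_number, parse_word) are hypothetical. A keyword is looked up in a table, and the table entry decides which sub-parser handles what follows. In Qi, the same shape is usually written as a qi::symbols whose values are pointers to rules, followed by qi::lazy on the pointer that was just looked up.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// A sub-parser consumes its own arguments from the stream, appends a
// description of what it parsed to `out`, and reports success.
using SubParser = std::function<bool(std::istream&, std::string&)>;

// Keyword -> sub-parser. This table stands in for the symbols table of
// rule pointers used in the Spirit version of the trick.
using Dispatch = std::map<std::string, SubParser>;

// Reads "keyword args..." statements. After each keyword, control is
// handed to whichever sub-parser the table maps that keyword to.
bool parse_statements(std::istream& in, const Dispatch& table, std::string& out) {
    std::string kw;
    while (in >> kw) {
        auto it = table.find(kw);
        if (it == table.end()) return false;     // unknown keyword
        if (!it->second(in, out)) return false;  // sub-parser failed
    }
    return true;
}

// Two hypothetical sub-parsers.
bool parse_number(std::istream& in, std::string& out) {
    int n = 0;
    if (!(in >> n)) return false;
    out += "int:" + std::to_string(n) + ";";
    return true;
}

bool parse_word(std::istream& in, std::string& out) {
    std::string w;
    if (!(in >> w)) return false;
    out += "word:" + w + ";";
    return true;
}
```

Because the table is ordinary runtime data, keywords can be renamed or sub-parsers swapped before or during a parse without rebuilding the grammar, which is what makes this approach attractive for the runtime-configurable case (b) above.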
Hope these things help you get on track. If not, I suggest you come back with a concrete question. I'm much more at home with code than 'ideas' because ideas often mean something else to you than to me.
Source: https://stackoverflow.com/questions/17790320/boost-spirit-setup-sub-grammar-during-parsing