Best parser generator for parsing many small texts in C++?

旧城冷巷雨未停 提交于 2019-12-08 03:10:30

问题


I am, for performance reason, porting a C# library to C++. During normal operation, this library needs, amongst other things, to parse about 150'000 math expressions (think excel formulas) with an average length of less than 150 characters.

In the C# version, I used GOLD parser to generate parsing code. It can parse all 150'000 expressions in under one second.

Because we were thinking about extending our language, I figured the move to C++ might be a good chance to change to ANTLR. I have ported the (simple) grammar to ANTLR and generated C code out of it. Parsing the 150'000 expressions takes over 12 seconds, because for each expression, I need to create a new ANTL3_INPUT_STREAM, token stream, lexer and parser - there is, at least in version 3.4, no way to reuse them.

I'd be grateful is someone could give me a recommendation what to use instead - GOLD is of course an option though generating C++ or C code seems a lot more complicated than the C# variety. My grammar is LALR and LL(1) compatible. Paramount concern is parsing performance on small inputs.


回答1:


I would try boost::spirit. It is often extreamly fast (even for parsing simple things like an integer it can be faster than the C function atoi http://alexott.blogspot.com/2010/01/boostspirit2-vs-atoi.html)

http://boost-spirit.com/home/

It has nice things : header only, so dependency hell, liberal licence.

However be warned that the learning curve is difficult. It's modern C++ (no pointer, but a lot of template and very frustrating compiling errors), so coming from C or C#, you might not be very comfortable.




回答2:


If the grammar to be parsed is simple, you might just write the parser by hand.

Most parser generators are designed to make it easy to whip up a working parser, and execution time often suffers as a result.




回答3:


The best performance I have seen in parsing came from Boost.Spirit.Qi which expresses the grammar in C++ using meta-template programming. It is not for the faint of heart though.

This will need be well insulated, and the compilation time of the file containing the parser will increase to several seconds (so better make sure there is as few as possible there).




回答4:


If the syntax of your expression is simple enough, consider making a hand-written recursive descent parser. It can run really fast, and give you the ability (with enough care) to report nicely and specifically syntax errors.

You could also use bison, but I believe a hand-written recursive parser would probably go faster.

And you can do lexing with a flex generated lexer, and do the parsing manually, in a recursive descent way.

For your information the GCC compiler has its own recursive-descent parsers for C++ & C at least. It does not use parser generators (like bison or ANTLR) anymore.




回答5:


Instead of expr make you grammar recognize sequence-of-expr.

EDIT:

Instead of having (bison syntax):

start: expr { process_expr ($1); }
     ;

have:

start: expr_seq ;

expr_seq:   expr          { process_expr ($1); }
          | expr_seq expr { process_expr ($2); }
          ;



回答6:


I've written many parsers, and hand-coded recursive-descent is the way I do it. They are easy to write and pretty much optimal.

That said, if speed is what you're after, no matter what you write there will be plenty of room to speed it up. These will be in ways that could surprise you, because anything you could think of, you would have already done.

Here's a slide set showing how to do it.



来源:https://stackoverflow.com/questions/8356206/best-parser-generator-for-parsing-many-small-texts-in-c

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!