Is the ANTLR parser generator best for a C++ app with constrained memory?

喜你入骨 提交于 2019-12-20 14:11:36

问题


I'm looking for a good parser generator that I can use to read a custom text-file format in our large commercial app. Currently this particular file format is read with a handmade recursive parser but the format has grown and complexified to the point where that approach has become unmanageable.

It seems like the ultimate solution would be to build a proper grammar for this format and then use a real parser generator like yacc to read it, but I'm having trouble deciding which such generator to use or even if they're worth the trouble at all. I've looked at ANTLR and Spirit, but our project has specific constraints beyond earlier answers that make me wonder if they're as appropriate for us. In particular, I need:

  • A parser that generates C or C++ code with MSVC. ANTLR 3 doesn't support C++; it claims to generate straight C but the docs on getting it to actually work are sort of confusing.
  • Severely constrained memory usage. Memory is at a huge premium in our app and even tiny leaks are fatal. I need to be able to override the parser's memory allocator to use our custom malloc(), or at the very least I need to give it a contiguous pool from which it draws all its memory (and which I can deallocate en bloc afterwards). I can spare about 200kb for the parser executable itself, but whatever dynamic heap it allocates in parsing has to get freed afterwards.
  • Good performance. This is less critical but we ought to be able to parse 100kb of text in no more than a second on a 3ghz processor.
  • Must be GPL-free. We can't use GNU code.

I like ANTLRworks' IDE and debugging tools, but it looks like getting its C target to actually work with our app will be a huge undertaking. Before I embark on that palaver, is ANTLR the right tool for this job?

The text format in question looks something like:

attribute "FluxCapacitance"  real constant

asset DeLorean
{
    //comment foo bar baz
    model "delorean.mdl"
    animation "gullwing.anm"
    references "Marty"
    loadonce
}

template TimeMachine
{
    attribute FluxCapacitance 10      
    asset DeLorean
}

回答1:


ANTLR 3 doesn't support C++; it claims to generate straight C but the docs on getting it to actually work are sort of confusing.

It does generate C, and furthermore, it works with Visual Studio and C++. I know this because I've done it before and submitted a patch to get it to work with stdcall.

Memory is at a huge premium in our app and even tiny leaks are fatal. I need to be able to override the parser's memory allocator to use our custom malloc(), or at the very least I need to give it a contiguous pool from which it draws all its memory (and which I can deallocate en bloc afterwards). I can spare about 200kb for the parser executable itself, but whatever dynamic heap it allocates in parsing has to get freed afterwards.

The antlr3c runtime, last time I checked does not have a memory leak, and uses the Memory pool paradigm which you describe. However, it does have one shortcoming in the API which the author refuses to change, which is that if you request the string of a node, it will create a new copy each time until you free the entire parser.

I have no comment on the ease of using a custom malloc, but it does have a macro to define what malloc function to use in the entire project.

As for the executable size, my compilation was about 100 kb in size including a small interpreter.

My suggestion to you is to keep learning ANTLR, because it still fits your requirements and you probably need to sacrifice a little more time before it will start working for you.




回答2:


We use Boost Spirit successfully in our application. The Boost license is a very liberal one, so there is no problem using it in commercial applications.

Quote from the documentation:

Spirit is an object-oriented recursive-descent parser generator framework implemented using template meta-programming techniques. Expression templates allow us to approximate the syntax of Extended Backus-Normal Form (EBNF) completely in C++. The Spirit framework enables a target grammar to be written exclusively in C++. Inline EBNF grammar specifications can mix freely with other C++ code and, thanks to the generative power of C++ templates, are immediately executable. In retrospect, conventional compiler-compilers or parser-generators have to perform an additional translation step from the source EBNF code to C or C++ code.




回答3:


A hand-coded recursive descent parser is actually quite fast and can be very compact. The only downside is you have to be careful to code essentially LL(1) grammars. [If you use ANTLR, you have similar restrictions so this isn't that big a deal].

You can hand code such parsers as plain recursive C code. (See this answer for complete details: Is there an alternative for flex/bison that is usable on 8-bit embedded systems?)

If you are really tight on space, you can define a parsing virtual machine, and build a tiny C interpreter to run it. I used to build BASIC interpreters this way back in the early 70s.

By sticking to the very simple conventions that make these parsers actually work, you can guarantee that there is no memory leak caused by the parsing machinery. (Of course, you may attach arbitrary actions to the parser where it recognizes items of interest; whether those actions leak is a matter of general programming, not the parser).

The ideas came from a 1964 paper on metacompilers by Val Schorre, who shows how to build complete compilers in 10 pages. Shorre's tiny parser generator produces pretty good recursive descent parsers. A site describing this paper and showing precisely how to build such parsers can be found at http://www.bayfronttechnologies.com/metaii.html

I used Schorre's methods to build Basic compilers in the late 70s, after I got tired of hand-coding complex grammars.




回答4:


Please have a look at the new C++ target I have posted for ANTLR. It also has the option to restrict the memory usage of the parser, and it exposes all the necessary memory management routines in the form of traits.

http://www.antlr.org/wiki/pages/viewpage.action?pageId=29130826




回答5:


Then why don't you use flex/yacc? It generates C code,can be run from MSVC, was developped with efficiency in mind, can have malloc overriden (google for yymalloc), they are themselves GPL, but the resulting code (the code you use in your project) AFAIK not.

Or use a hand-made parser.




回答6:


ANTLR parsers, and in fact any parser built with something LALR or the like, tend to be big. Do you have an actual grammar for this? It looks like it might be most readily parsed with a hand-written recursive-descent parser, but it's not much of a sample.

Oops, my mistake, as ANTLR apparently generates recursive-descent. Still, I've had problems with ANTLR generating big parsers.




回答7:


Realistically, if your grammar is relatively small, and doesn't contain many ambiguities or parsing complexities then it shouldn't matter if you use a recursive descent parser or a shift-reduce parser.

I would say look at ANTLR and Spirit, also look at Flex and Bison. There are other, less known parsers like Coco/R (contains generators for a lot of languages, including C++) also.



来源:https://stackoverflow.com/questions/770577/is-the-antlr-parser-generator-best-for-a-c-app-with-constrained-memory

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!