I am looking for a parser generator for Java that does the following: My language project is pretty simple and only contains a small set of tokens.
Output in pure R
Maybe ANTLR will do it for you. It's a nice parser generator with a fine book available for documentation.
We are using JavaCC for our (also rather small) language and are happy with it.
I had a good experience with SableCC.
It works differently from most generators, in that you're given an AST/Visitor model that you extend (via inheritance).
I can't comment on the "quality" of its code in terms of readability (it's been a while since I've used it), but it does have the quality that you don't have to read the code at all. Just the code in your subclass.
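To give a feel for the "extend via inheritance" idea, here is a rough hand-written sketch. All class and method names (Node, NumberNode, AddNode, Visitor, Evaluator) are made up for illustration and are not SableCC's actual generated output; the point is only that the base visitor would be generated, and you write nothing but the subclass.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins for generated classes; not SableCC's real output.
abstract class Node {
    abstract void apply(Visitor v);
}

class NumberNode extends Node {
    final int value;
    NumberNode(int value) { this.value = value; }
    @Override void apply(Visitor v) { v.caseNumber(this); }
}

class AddNode extends Node {
    final Node left;
    final Node right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    @Override void apply(Visitor v) { v.caseAdd(this); }
}

// Stand-in for the generated base visitor: it walks the tree and does nothing else.
class Visitor {
    void caseNumber(NumberNode n) { }
    void caseAdd(AddNode n) {
        n.left.apply(this);
        n.right.apply(this);
    }
}

// The only code you would write yourself: override the cases you care about.
class Evaluator extends Visitor {
    private final Deque<Integer> stack = new ArrayDeque<>();

    @Override void caseNumber(NumberNode n) { stack.push(n.value); }

    @Override void caseAdd(AddNode n) {
        super.caseAdd(n);          // visit both children first
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    int result() { return stack.peek(); }
}
```

Usage would be something like building a tree, calling tree.apply(new Evaluator()), and reading result(). With a real generator the tree comes from the parser rather than being built by hand.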
Maybe you're looking for parser combinators instead of parser generators? See this paper and JParsec.
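In case the term is unfamiliar: with combinators, the parser is ordinary code built by composing small parser functions, so there is no generated source at all. Below is a minimal hand-rolled sketch of the idea, assuming Java 8 or later. It is not JParsec's API; Result, Parser, Parsers and the method names then, or, many and ch are all invented here for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// A successful parse: the value plus the position where parsing stopped.
final class Result<T> {
    final T value;
    final int next;
    Result(T value, int next) { this.value = value; this.next = next; }
}

// A parser is a function from (input, position) to an optional result.
interface Parser<T> {
    Optional<Result<T>> parse(String in, int pos);

    // Sequencing: run this parser, then `other`, keeping the second value.
    default <U> Parser<U> then(Parser<U> other) {
        return (in, pos) -> parse(in, pos).flatMap(r -> other.parse(in, r.next));
    }

    // Choice: try this parser; if it fails, try `other` at the same position.
    default Parser<T> or(Parser<T> other) {
        return (in, pos) -> {
            Optional<Result<T>> r = parse(in, pos);
            return r.isPresent() ? r : other.parse(in, pos);
        };
    }

    // Repetition: apply this parser zero or more times, collecting the values.
    default Parser<List<T>> many() {
        return (in, pos) -> {
            List<T> out = new ArrayList<>();
            int p = pos;
            Optional<Result<T>> r;
            // Stop on failure, or if a match consumed nothing (avoids looping forever).
            while ((r = parse(in, p)).isPresent() && r.get().next > p) {
                out.add(r.get().value);
                p = r.get().next;
            }
            return Optional.of(new Result<List<T>>(out, p));
        };
    }
}

final class Parsers {
    // Primitive parser: match exactly one given character.
    static Parser<Character> ch(char c) {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) == c) {
                return Optional.of(new Result<Character>(c, pos + 1));
            }
            return Optional.empty();
        };
    }
}
```

For example, Parsers.ch('a').or(Parsers.ch('b')).many() builds a parser for any run of a's and b's. A real library adds error reporting, token handling and much more on top of this.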
It's a really bad idea to edit generated parser code; it's a lot easier to edit the grammar file and then regenerate the parser. Unless you're doing it for educational purposes, in which case ANTLR prides itself on generating pretty readable code for such a powerful parser generator.
For a language that simple, JFlex might suffice. It's similar to JLex but faster (which might also mean less readable, but I've not seen JLex's output).
It is a lexer, not a parser, but it is built to interface easily with CUP or BYacc/J. And again, for a simple language, it might be easier to just write your own parser (I've done this before).
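To show how little code "write your own parser" can mean, here is a small recursive-descent sketch. The grammar is a made-up arithmetic language chosen purely for illustration (I don't know what your tokens are), and the class name TinyParser is hypothetical. It skips a separate lexer and reads characters directly, which is only reasonable for very small languages.

```java
// Grammar (hypothetical, for illustration only):
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
final class TinyParser {
    private final String src;
    private int pos = 0;

    private TinyParser(String src) { this.src = src; }

    // Parse and evaluate a whole input string; rejects trailing garbage.
    static int evaluate(String src) {
        TinyParser p = new TinyParser(src);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw p.error("unexpected trailing input");
        return value;
    }

    // One method per grammar rule; each returns the value of what it parsed.
    private int expr() {
        int value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    private int term() {
        int value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    private int factor() {
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) throw error("expected ')'");
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw error("expected a number");
        return Integer.parseInt(src.substring(start, pos));
    }

    // Consume `c` (after optional whitespace) if it is next; report whether it was.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }
}
```

Each grammar rule maps to one method, so the code stays readable and easy to change by hand. Once the language grows, swapping the character-level accept for tokens from a JFlex-generated lexer would be the natural next step.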
You should use Rats! With it, you don't have to separate lexer and parser, so extending your project later will be trivial. It's written in Java, so you can process your AST in Java as well.