antlr | 易学教程

Parsing a templating language

I'm trying to parse a templating language, and I'm having trouble correctly parsing the arbitrary HTML that can appear between tags. What I have so far is below. Any suggestions? An example of valid input would be:

    {foo}{#bar}blah blah blah{zed}{/bar}{>foo2}{#bar2}This Should Be Parsed as a Buffer.{/bar2}

And the grammar is:

    grammar g;

    options {
      language=Java;
      output=AST;
      ASTLabelType=CommonTree;
    }

    /* LEXER RULES */
    tokens { }

    LD : '{';
    RD : '}';
    LOOP : '#';
    END_LOOP: '/';
    PARTIAL : '>';
    fragment DIGIT : '0'..'9';
    fragment LETTER : ('a'..'z' | 'A'..'Z');
    IDENT : (LETTER | '_') (LETTER | '_' |

ANTLR - Distinguish 'IS NOT NULL' from 'IS NULL'

Question: How can I differentiate 'IS NOT NULL' from 'IS NULL'? 'IS' and 'IS NOT' are defined in one parser rule and 'NULL' in another, and the second follows the first. When I write 'IS NULL', the parser expects 'IS NOT NULL', because both second words begin with 'N'. How can I distinguish the two? Grammar file:

    query : expr EOF -> ^(QUERY expr) ;
    expr : logical_expr ;
    logical_expr : equality_expr (logical_op^ equality_expr)* ;
    equality_expr : ID equality_op atom -> ^

Using ANTLR to generate lexer/parser as streams

Question: Can I use the ANTLR Java API to generate the lexer/parser as streams and save them somewhere other than in files? Also, is there a simple example of using the API to generate the required files from a given grammar? Thanks.

Answer 1: I am not 100% sure I understand your question correctly, but you might want to have a look at https://stackoverflow.com/a/38052798/5068458. This is an in-memory compiler for ANTLR grammars that generates the lexer and parser for a given grammar in memory. You do not

Token type depends on following token

Question: I am stuck on a pretty simple grammar. Googling and reading books did not help. I started using ANTLR quite recently, so this is probably a very simple question. I am trying to write a very simple lexer using ANTLR v3.

    grammar TestLexer;

    options {
      language = Java;
    }

    TEST_COMMENT : '/*' WS? TEST WS? '*/' ;
    ML_COMMENT : '/*' ( options {greedy=false;} : .)* '*/' {$channel=HIDDEN;} ;
    TEST : 'TEST' ;
    WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel=HIDDEN;} ;

The test class:

    public class

Antlr superfluous Predicate required?

Question: I have a file of which I want to ignore parts. In the lexer I use gated semantic predicates to avoid creating tokens for the uninteresting parts of the file. My rules are similar to the following:

    A : {!ignore}?=> 'A' ;
    START_IGNORE : 'foo' {ignore = true; skip();} ;
    END_IGNORE : 'oof' {ignore = false; skip();} ;
    IGNORE : {ignore}?=> . {skip();} ;

However, unless I change START and END to also use semantic predicates (as below), it does not work:

    A : {!ignore}?=> 'A' ;
    START_IGNORE : {true}?
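
The behaviour the first set of rules is aiming for can be sketched in plain Java, as a rough illustration only. This is a hand-written scanner, not ANTLR-generated code, and it does not reproduce the predicate issue the question asks about. The class name `IgnoreFlagScanner` and the method `scan` are hypothetical, and the sketch assumes that 'foo' and 'oof' simply toggle the flag and that 'A' is the only token-producing rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner mirroring the intent of the rules above.
public class IgnoreFlagScanner {

    // Returns the tokens emitted for the input; ignored regions produce nothing.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        boolean ignore = false; // plays the role of the lexer member 'ignore'
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("foo", i)) {          // START_IGNORE
                ignore = true;
                i += 3;
            } else if (input.startsWith("oof", i)) {   // END_IGNORE
                ignore = false;
                i += 3;
            } else if (ignore) {                       // IGNORE: skip one character
                i++;
            } else if (input.charAt(i) == 'A') {       // A: only outside ignored regions
                tokens.add("A");
                i++;
            } else {
                throw new IllegalArgumentException("no rule matches at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("AfooAAxyzoofA")); // prints [A, A]
    }
}
```

Only the first and last 'A' become tokens here, because everything between 'foo' and 'oof' is skipped.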

How does ANTLR decide which lexer rule to apply? The longest matching lexer rule wins?

Question: The input content:

The grammar:

    grammar test;
    p : EOF;
    Char : [a-z];
    fragment Tab : '\t';
    fragment Space : ' ';
    T1 : (Tab|Space)+ ->skip;
    T2 : '#' T1+ Char+;

The matching result is this:

    [@0,0:6='# abc',<T2>,1:0] <<<<<<<< PLACE 1
    [@1,7:6='<EOF>',<EOF>,1:7]
    line 1:0 extraneous input '# abc' expecting <EOF>

Please ignore the error in the last line. I am wondering why the token matched at PLACE 1 is T2. In the grammar file, the T2 lexer rule goes after the T1 lexer rule. So I expect T1 rule
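
As a rough illustration of the "longest match wins" idea raised in the title, the sketch below simulates it with `java.util.regex` rather than with ANTLR itself. At each position it tries every rule and keeps the one that consumes the most characters, and on a tie it keeps the rule listed first. The class name `LongestMatchDemo`, the regular expressions (hand translations of `Char`, `T1` and `T2`), and the sample inputs are assumptions made for this example, not taken from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of longest-match rule selection; not ANTLR's implementation.
public class LongestMatchDemo {

    // Rules in grammar order (LinkedHashMap keeps insertion order).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Char", Pattern.compile("[a-z]"));
        RULES.put("T1", Pattern.compile("[\\t ]+"));        // skipped, like '-> skip'
        RULES.put("T2", Pattern.compile("#[\\t ]+[a-z]+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strict '>' means an earlier rule keeps the win on equal length.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestRule.equals("T1")) {
                tokens.add(bestRule + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("# abc")); // prints [T2:# abc]
        System.out.println(tokenize("ab c"));  // prints [Char:a, Char:b, Char:c]
    }
}
```

At the start of `# abc`, neither `Char` nor `T1` can match the `#` at all, while `T2` consumes the whole input, so rule order never comes into play in this sketch.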

Tolerating malformed statements with ANTLR (e.g., for code-completion)

Question: I have an ANTLR grammar for a simple DSL, and everything works swimmingly when there are no syntax errors. Now, however, I need to support an auto-completion mechanism, where I need to get possible completions from my tree grammars that perform basic type-checking on attributes, functions, etc. The problem is that ANTLR isn't reporting syntax errors at the local statement level, but farther up the parse tree, e.g., at the program or function level. Hence, instead of an AST that looks like program

ANTLR syntax error unexpected token: +

Question: Hi, I have a small problem in my ANTLR tree grammar. I am using ANTLRWorks 1.4. In the parser grammar I have a rule like this:

    declaration : 'variable' IDENTIFIER ( ',' IDENTIFIER)* ':' TYPE ';' -> ^('variable' IDENTIFIER TYPE)+

So I wanted one tree per IDENTIFIER. In the tree grammar I left only the rewrite rules:

    declaration : ^('variable' IDENTIFIER TYPE)+

But when I check the grammar I get a syntax error: unexpected token +. It is the + sign at the end of the declaration rule in the tree

Error handling in ANTLR 3.0

Question: I need to report a customized error whenever user input does not match our defined rules. Here is my code:

    grammar second1;

    @lexer::members {
      @Override
      public void reportError(RecognitionException e) {
        System.out.println("Throwing Exception: "+ e.getMessage());
        throw new IllegalArgumentException(e);
      }
    }

    @parser::members {
      private boolean inbounds(Token t, int min, int max, String methodName) {
        int n = Integer.parseInt(t.getText());
        if(n >= min && n <= max) {
          return true;
        } else {
          System.out

Best parser generator for parsing many small texts in C++?

Question: For performance reasons, I am porting a C# library to C++. During normal operation, this library needs, among other things, to parse about 150'000 math expressions (think Excel formulas) with an average length of less than 150 characters. In the C# version, I used GOLD parser to generate the parsing code. It can parse all 150'000 expressions in under one second. Because we were thinking about extending our language, I figured the move to C++ might be a good chance to change to ANTLR. I have