antlr4 | 易学教程

ANTLR4: Using non-ASCII characters in token rules

On page 74 of the ANTLR4 book it says that any Unicode character can be used in a grammar simply by specifying its code point in this manner: '\uxxxx', where xxxx is the hexadecimal value of the Unicode code point. So I used that technique in a token rule for an ID token: grammar ID; id : ID EOF ; ID : ('a' .. 'z' | 'A' .. 'Z' | '\u0100' .. '\u017E')+ ; WS : [ \t\r\n]+ -> skip ; When I tried to parse this input: Gŭnter, ANTLR threw an error saying that it does not recognize ŭ. (The ŭ character is hex 016D, so it is within the specified range.) What am I doing wrong, please? ANTLR is ready to
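The excerpt is cut off before the answer, so the following is only a sketch of how one might check the question's two premises in plain Java, without ANTLR. It confirms that ŭ (U+016D) lies inside the `'\u0100' .. '\u017E'` range, and it shows what happens if the UTF-8 bytes of the input are decoded with a different charset. That a wrong input encoding is the cause here is an assumption, not something the excerpt states. The class `CodePointCheck` and its methods are hypothetical names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

class CodePointCheck {
    // Mirrors the ranges of the ID rule: a-z, A-Z, U+0100..U+017E.
    static boolean inIdRange(int cp) {
        return (cp >= 'a' && cp <= 'z')
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= 0x0100 && cp <= 0x017E);
    }

    // True when every code point of the text is covered by the ID rule.
    static boolean allInIdRange(String text) {
        return text.codePoints().allMatch(CodePointCheck::inIdRange);
    }

    // Simulates reading UTF-8 bytes with the wrong charset (assumed cause).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String input = "G\u016Dnter"; // "Gŭnter", written with an escape
        System.out.println("correctly decoded matches ID: " + allInIdRange(input));
        System.out.println("mis-decoded matches ID: " + allInIdRange(misdecode(input)));
    }
}
```

If the first check passes and the second fails, the grammar range itself is fine, and the encoding used to read the input is worth a look.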

Dynamically create lexer rule

Here is a simple rule: NAME : 'name1' | 'name2' | 'name3'; Is it possible to provide the alternatives for such a rule dynamically, using an array that contains strings? Yes, dynamic tokens match the IDENTIFIER rule. In that case, simply do a check after the Id has matched completely to see whether the text the Id matched is in a predefined collection. If it is in the collection (a Set in my example), change the type of the token. A small demo: grammar T; @lexer::members { private java.util.Set<String> special; public TLexer(ANTLRStringStream input, java.util.Set<String> special) { super(input); this.special =
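The demo above is truncated, so here is a minimal plain-Java sketch of just the decision it describes: once an identifier has matched, look its text up in a set supplied at runtime and pick the token type accordingly. It does not use the ANTLR runtime. The class `TokenRetyper`, the method `typeFor`, and the numeric token types are hypothetical stand-ins for what a generated lexer would provide.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TokenRetyper {
    // Hypothetical token type numbers; a generated lexer defines its own.
    static final int IDENTIFIER = 1;
    static final int NAME = 2;

    private final Set<String> special;

    TokenRetyper(Set<String> special) {
        this.special = special;
    }

    // Called after the identifier rule has matched the whole text.
    int typeFor(String matchedText) {
        return special.contains(matchedText) ? NAME : IDENTIFIER;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList("name1", "name2", "name3"));
        TokenRetyper retyper = new TokenRetyper(names);
        System.out.println(retyper.typeFor("name2")); // type chosen for a listed name
        System.out.println(retyper.typeFor("other")); // type chosen for any other identifier
    }
}
```

In a real grammar the same lookup would sit in an action on the identifier rule, with the set passed in through the lexer's constructor as the excerpt's demo begins to show.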

Can we define a non context-free grammar with ANTLR?

Question: I'm pretty new to ANTLR4 and I'm trying to understand which kinds of grammars we can define with it. As far as I understand, there are two kinds of rules in ANTLR: parser rules (lower-case words) and lexer rules (upper-case words). Example: grammar Test; init: prog(','prog)*; prog: A | prog ; A: [a-z]+; From the grammar production rule standpoint, I would say that parser rules are NON-TERMINAL symbols which can be replaced with a sequence of tokens defined by lexer rules. So, it's perfectly clear

Antlr4 how to build a grammar allowed keywords as identifier

Question: This is demo code: label: var id let id = 10 goto label If keywords are allowed as identifiers, it could be: let: var var let var = 10 goto let This is totally legal code, but it seems very hard to do in ANTLR. AFAIK, once ANTLR matches a let token, it will never fall back to the id token, so ANTLR will see LET_TOKEN : VAR_TOKEN <missing ID_TOKEN>VAR_TOKEN LET_TOKEN <missing ID_TOKEN>VAR_TOKEN = 10. Although ANTLR allows predicates, I would have to control every token match, which is problematic. The grammar becomes this

Trouble Setting Up ANTLR 4 IDE on Eclipse Luna (4.4)

I'm trying to install the ANTLR 4 IDE on Eclipse Luna (4.4). I've installed it from the Marketplace, but I have no idea how to create a project that has an ANTLR 4 lexer/parser in it. When I go to create a new project, I don't see any options for ANTLR 4. I tried creating a .g4 file and it opens in the editor, but when I save it, nothing happens. I looked all over the internet, found a handful of resources that I cobbled together, and arrived at a solution by trial and error. Below is a guide that I've used on a few of my machines to get ANTLR 4 IDE set up in Eclipse. I figured I should

ANTLR4 negative lookahead in lexer

Question: I am trying to define lexer rules for PostgreSQL SQL. The problem is that the operator definition and line comments conflict with each other. For example, @--- is an operator token @- followed by the -- comment, and not an operator token @---. In grako it would be possible to define a negative lookahead for the - fragment like: OP_MINUS: '-' ! ( '-' ) . In ANTLR4 I could not find any way to roll back an already consumed fragment. Any ideas? Here the original definition what the PostgreSQL

Can ANTLR4 java parser handle very large files or can it stream files

Question: Is the Java parser generated by ANTLR capable of streaming arbitrarily large files? I tried constructing a Lexer with an UnbufferedCharStream and passed that to the parser. I got an UnsupportedOperationException because of a call to size on the UnbufferedCharStream, and the exception contained an explanation that you can't call size on an UnbufferedCharStream. new Lexer(new UnbufferedCharStream( new CharArrayReader("".toCharArray()))); CommonTokenStream stream = new CommonTokenStream(lexer);

Antlr4 unexpectedly stops parsing expression

How Get error messages of antlr parsing?

Question: I wrote a grammar with ANTLR 4.4 like this: grammar CSV; file : row+ EOF ; row : value (Comma value)* (LineBreak | EOF) ; value : SimpleValueA | QuotedValue ; Comma : ',' ; LineBreak : '\r'? '\n' | '\r' ; SimpleValue : ~(',' | '\r' | '\n' | '"')+ ; QuotedValue : '"' ('""' | ~'"')* '"' ; Then I used ANTLR 4.4 to generate the parser and lexer, which succeeded. After generating the classes, I wrote some Java code that uses the grammar: import org.antlr.v4.runtime.ANTLRInputStream; import org.antlr.v4

How to match any symbol in ANTLR parser (not lexer)?

Question: How to match any symbol in ANTLR parser (not lexer)? Where is the complete language description for ANTLR4 parsers? UPDATE: Is the answer "impossible"? Answer 1: You first need to understand the roles of each part in parsing: The lexer: this is the object that tokenizes your input string. Tokenizing means converting a stream of input characters into abstract token symbols (usually just numbers). The parser: this is the object that works only with tokens to determine the structure of a language.
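To make the lexer's role concrete, here is a small hand-written sketch in plain Java that turns characters into numeric token types, which is all a parser would then see. It is not how ANTLR implements its lexers. The class `TinyLexer`, the method `tokenTypes`, and the three token types are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TinyLexer {
    // Hypothetical token type numbers for this sketch.
    static final int WORD = 1;
    static final int NUMBER = 2;
    static final int OTHER = 3;

    // Converts the input characters into a list of token type numbers.
    static List<Integer> tokenTypes(String input) {
        List<Integer> types = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace is skipped and produces no token
            } else if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
                types.add(WORD);
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                types.add(NUMBER);
            } else {
                i++; // any other single character becomes its own token
                types.add(OTHER);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("x = 42")); // prints [1, 3, 2]
    }
}
```

A parser built on top of this would only ever compare those numbers. It has no access to individual characters, which is why "any symbol" means something different at the parser level than at the lexer level.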