antlr4

Can't import module ANTLR MyGrammarLexer and MyGrammarParser

雨燕双飞 submitted on 2019-12-11 05:45:49
Question: I'm trying to get started with ANTLR. Importing the antlr4 module works fine, but when I try to import MyGrammarLexer and MyGrammarParser, I'm told that MyGrammarLexer and MyGrammarParser aren't in the library. I'm using PyCharm, and I installed ANTLR with: pip3 install antlr4-python3-runtime My code is: import sys from antlr4 import * import MyGrammarLexer import MyGrammarParser def main(argv): input = FileStream(argv[1]) lexer = MyGrammarLexer(input) stream = CommonTokenStream(lexer) parser =
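
One possible cause, offered as an assumption since the excerpt above is cut off: the ANTLR tool writes each generated class into a module of the same name, so `import MyGrammarLexer` binds the module rather than the class, and the generated files must also sit somewhere on the import path. The sketch below uses a hypothetical stand-in file (no ANTLR involved) to show the difference between the two import forms.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a generated MyGrammarLexer.py; the real file
# would be produced by the ANTLR tool and is not reproduced here.
STUB_SOURCE = (
    "class MyGrammarLexer:\n"
    "    def __init__(self, source):\n"
    "        self.source = source\n"
)


def load_stub():
    """Write the stub module to a temp directory and make it importable."""
    workdir = tempfile.mkdtemp()
    Path(workdir, "MyGrammarLexer.py").write_text(STUB_SOURCE)
    sys.path.insert(0, workdir)  # the generated file has to be on the path
    importlib.invalidate_caches()
    return importlib.import_module("MyGrammarLexer")


module = load_stub()

# `import MyGrammarLexer` binds this module object, which cannot be called.
module_is_callable = callable(module)

# `from MyGrammarLexer import MyGrammarLexer` binds the class inside it.
from MyGrammarLexer import MyGrammarLexer

lexer = MyGrammarLexer("some input")
```

If this is what is happening, the same `from ... import ...` form would apply to MyGrammarParser as well.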

how to report grammar ambiguity in antlr4

a 夏天 submitted on 2019-12-11 05:29:47
Question: According to the ANTLR 4 book (page 159), and using the grammar Ambig.g4, grammar ambiguity can be reported with: grun Ambig stat -diagnostics or equivalently, in code: parser.removeErrorListeners(); parser.addErrorListener(new DiagnosticErrorListener()); parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION); The grun command reports the ambiguity properly for me, using antlr-4.5.3. But when I use the code form, I don't get the ambiguity report. Here is the

Getting plain text in antlr instead of tokens

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-11 04:38:27
Question: I'm trying to create a parser using ANTLR. My grammar is as follows: code : codeBlock* EOF; codeBlock : text | tag1Ops | tag2Ops ; tag1Ops: START_1_TAG ID END_2_TAG ; tag2Ops: START_2_TAG ID END_2_TAG ; text: ~(START_1_TAG|START_2_TAG)+; START_1_TAG : '<%' ; END_1_TAG : '%>' ; START_2_TAG : '<<'; END_2_TAG : '>>' ; ID : [A-Za-z_][A-Za-z0-9_]*; INT_NUMBER: [0-9]+; WS : ( ' ' | '\n' | '\r' | '\t')+ -> channel(HIDDEN); SPACES: SPACE+; ANY_CHAR : .; fragment SPACE : ' ' | '\r' | '\n' | '\t' ;

check previous/left token in lexer

六月ゝ 毕业季﹏ submitted on 2019-12-11 04:37:49
Question: How can I find the previous (left) token in the lexer? For example: lexer grammar TLexer; ID : [a-zA-Z] [a-zA-Z0-9]*; CARET : '^'; RTN : {someCond1}? CARET ID; // CARET should not be included in this token GLB : {someCond2}? CARET ID; // CARET should not be included in this token etc. Answer 1: Thanks, I did it this way: lexer grammar TLexer; @lexer::members { int lastTokenType = 0; public void emit(Token token) { super.emit(token); lastTokenType = token.getType(); } } CARET : '^'; RTN : {someCond1&&(lastTokenType==CARET)}? ID; GLB :
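
The idea in the answer (remember the type of the last emitted token and consult it when typing the next one) can be illustrated without ANTLR. The following is a hand-written toy lexer, not generated code; the token names mirror the grammar above, the `tokenize` helper is hypothetical, and the `someCond1`/`someCond2` predicates are left out because the excerpt does not define them.

```python
import re

# Token type names mirror the grammar above; the lexer itself is a hypothetical toy.
CARET, ID, RTN = "CARET", "ID", "RTN"

# Optional leading whitespace, then either a caret or an identifier.
TOKEN_RE = re.compile(r"\s*(?:(\^)|([a-zA-Z][a-zA-Z0-9]*))")


def tokenize(text):
    """Return (type, text) pairs; an ID emitted right after a CARET becomes RTN."""
    text = text.rstrip()
    tokens = []
    last_token_type = None  # plays the role of lastTokenType in the answer
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at or after offset {pos}")
        if match.group(1) is not None:
            token_type, token_text = CARET, match.group(1)
        else:
            token_text = match.group(2)
            # Consult the remembered type, as the predicate does in the answer.
            token_type = RTN if last_token_type == CARET else ID
        tokens.append((token_type, token_text))
        last_token_type = token_type  # what the overridden emit() records
        pos = match.end()
    return tokens
```

Because whitespace is never emitted as a token in this toy, an identifier separated from the caret by spaces is still typed as RTN; a lexer that emits whitespace tokens would need to decide whether those should update the remembered type.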

Antlr4 doesn't correctly recognize unicode characters

落爺英雄遲暮 submitted on 2019-12-11 03:54:01
Question: I have a very simple grammar that tries to match 'é' to the token E_CODE. I tested it using the TestRig tool (with the -tokens option), but the parser can't match it correctly. My input file was encoded in UTF-8 without a BOM, and I used ANTLR version 4.4. Could somebody else check this as well? I got this output on my console: line 1:0 token recognition error at: 'Ă' grammar Unicode; stat:EOF; E_CODE: '\u00E9' | 'é'; Answer 1: I tested the grammar: grammar Unicode; stat: E_CODE* EOF; E_CODE: '\u00E9' | 'é'; as
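
The 'Ă' in the error message is consistent with the UTF-8 bytes of 'é' being decoded with a single-byte code page. The sketch below reproduces that effect in plain Python; the choice of cp1250 is an assumption about the asker's platform default, not something the excerpt states.

```python
# 'é' is U+00E9; UTF-8 stores it as the two bytes C3 A9.
utf8_bytes = "\u00e9".encode("utf-8")

# Assumption: the file was read with a Central European code page (cp1250
# here) instead of UTF-8, so each byte turned into its own character.
misread = utf8_bytes.decode("cp1250")

# Decoding with the encoding the file was actually saved in round-trips.
correct = utf8_bytes.decode("utf-8")

print(repr(misread), repr(correct))
```

If this is the cause, telling the tool which encoding the input file uses (TestRig accepts an -encoding option) would be the first thing to try.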

ANTLR4 not reporting ambiguity

微笑、不失礼 submitted on 2019-12-11 03:47:52
Question: Given the following grammar: grammar ReportAmbiguity; unit : statements+; statements : callStatement+ // '.' // <- uncomment this line ; callStatement : 'CALL' ID (argsByRef | argsByVal)*; argsByRef : ('BY' 'REF')? ID+; argsByVal : 'BY' 'VAL' ID+; ID : ('A'..'Z')+; WS : (' '|'\n')+ -> channel(HIDDEN); When parsing the string "CALL FUNCTION BY VAL A B" through the non-root rule callStatement, everything works and the parser correctly reports an ambiguity: line 1:24 reportAttemptingFullContext d

Return the line number of the last character for current token

╄→гoц情女王★ submitted on 2019-12-11 03:34:15
Question: Is there a way in ANTLR 4 to return the line number of the last character of the current token? I looked at "Antlr, get last line from token", but that would be specific to a rule. I wanted something more generic but couldn't find anything suitable in the ANTLR API. Answer 1: There is no direct way to get this information. However, if you don't have any -> skip commands in your lexer, you can derive it from the following token. Suppose token b follows token a. If b
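
A different route from the one the answer starts to describe, sketched under the assumption that the token's start line and its text are both available (in the Java runtime, `Token.getLine()` and `Token.getText()`): count the line breaks inside the token text. The helper name `last_line_of_token` is hypothetical, and only '\n' is treated as a line break.

```python
def last_line_of_token(start_line, text):
    """Line number of the token's final character.

    start_line is the 1-based line on which the token begins; text is the
    token's matched text.
    """
    if not text:
        return start_line
    # A newline that is itself the final character still sits on the line
    # it terminates, so only newlines before the last character count.
    return start_line + text[:-1].count("\n")
```

For example, a block comment starting on line 3 whose text spans two line breaks would end on line 5, while a single-line identifier ends on the line it starts on.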

antlr4 array implementation : getting values of elements

匆匆过客 submitted on 2019-12-11 03:08:43
Question: I'm trying to implement arrays in ANTLR 4, and I'm lost as to how to get the multiple elements of the array when it is initialized like so: int array[] = {1, 2}; I was thinking of placing them in a HashMap like this, with the index as the key: public Map<Integer, Value> array_memory = new HashMap<Integer, Value>(); Below is the grammar I'm following: grammar GaleugParserNew; /* * PARSER RULES */ declare_var : INTEGER ID '[' (INT)? ']' (ASSIGN '{' array_init '}')? SCOL ; array_init : INT ','
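
The index-keyed map the question proposes could be filled roughly as follows. This is a minimal sketch that assumes a visitor or listener has already collected the texts of the INT tokens in array_init in source order; the `build_array_memory` helper is hypothetical, and plain Python ints stand in for the question's `Value` class.

```python
def build_array_memory(element_texts):
    """Map each element's position to its integer value (index -> value)."""
    return {index: int(text) for index, text in enumerate(element_texts)}


# Texts of the INT tokens for `int array[] = {1, 2};`, hard-coded here for
# illustration instead of being read from a parse tree.
array_memory = build_array_memory(["1", "2"])
```

Since the keys are consecutive indices starting at 0, an ordinary list would serve equally well; a map mainly helps if elements can be assigned sparsely later on.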

Why does ANTLR require all or none alternatives be labeled?

為{幸葍}努か submitted on 2019-12-11 03:05:55
Question: I'm new to ANTLR. I just discovered that it is possible to label each alternative in a production like so: foo : a # aLabel | b # bLabel | // ... ; However, I find it unpleasant that either all or none of the alternatives must be labeled. I recently needed to label just 2 alternatives of a production with 20+ branches, and I ended up labelling each of the others # stubLabel. Is there any reason why all or none have to be labeled? Answer 1: As soon as you add a label, ANTLR4 will no longer generate a context

What to use in ANTLR4 to resolve ambiguities in more complex cases (instead of syntactic predicates)?

蹲街弑〆低调 submitted on 2019-12-11 02:56:29
Question: In ANTLR v3, syntactic predicates could be used to resolve ambiguities, i.e., to explicitly tell ANTLR which alternative should be chosen. ANTLR4 seems to simply accept grammars with similar ambiguities, but it reports these ambiguities during parsing. It produces a parse tree despite these ambiguities (by choosing the first alternative, according to the documentation). But what can I do if I want it to choose some other alternative? In other words, how can I explicitly resolve ambiguities?