antlr4

Nested Boolean Expression Parser using ANTLR

主宰稳场 submitted on 2019-11-30 03:42:02
I'm trying to parse a nested Boolean expression and get the individual conditions within the expression separately. For example, if the input string is: (A = a OR B = b OR C = c AND ((D = d AND E = e) OR (F = f AND G = g))) I would like to get the conditions in the correct order, i.e.: D = d AND E = e OR F = f AND G = g AND A = a OR B = b OR C = c. I'm using ANTLR 4 to parse the input text, and here's my grammar: grammar SimpleBoolean; rule_set : nestedCondition* EOF; AND : 'AND' ; OR : 'OR' ; NOT : 'NOT'; TRUE : 'TRUE' ; FALSE : 'FALSE' ; GT : '>' ; GE : '>=' ; LT : '<' ; LE : '<=' ; EQ : '=' ;

ANTLR 4 $channel = HIDDEN and options

此生再无相见时 submitted on 2019-11-30 02:49:22
I need help with my ANTLR 4 grammar after deciding to switch to v4 from v3. I am not very experienced with ANTLR so I am really sorry if my question is dumb ;) In v3 I used the following code to detect Java-style comments: COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;} | '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;} ; In v4 there are no rule-specific options. The actions (move to hidden channel) are also invalid. Could somebody please give me a hint how to do it in ANTLR v4? The v4 equivalent would look like: COMMENT : ( '//' ~[\r\n]* '\r'? '\n' | '/*' .*? '*/' ) ->
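The key change in the v4 rule is the non-greedy `.*?`, which replaces the v3 `options {greedy=false;}` subrule. As a rough analogue only, the sketch below uses Python's `re` module (not ANTLR) to show why the non-greedy form matters: a block comment must end at the first `*/`, not the last. The name `strip_comments` is hypothetical, and unlike the grammar rule this pattern does not consume the newline after a line comment.

```python
import re

# Python analogue of the two comment shapes; this is not ANTLR syntax.
# `.*?` is non-greedy, so a block comment stops at the FIRST `*/`.
COMMENT = re.compile(r"//[^\r\n]*|/\*.*?\*/", re.DOTALL)

# Greedy variant, kept only to show what goes wrong without `?`.
GREEDY_BLOCK = re.compile(r"/\*.*\*/", re.DOTALL)


def strip_comments(source: str) -> str:
    """Drop line and block comments, a stand-in for hiding them from the parser."""
    return COMMENT.sub("", source)


if __name__ == "__main__":
    sample = "a /* x */ b /* y */ c"
    print(repr(strip_comments(sample)))        # keeps 'b'
    print(repr(GREEDY_BLOCK.sub("", sample)))  # greedy match swallows 'b'
```

The greedy pattern treats everything from the first `/*` to the last `*/` as one comment, which is the behavior the v3 option and the v4 `.*?` both exist to prevent.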

Mismatched input error in simple antlr4 grammar

不羁岁月 submitted on 2019-11-29 22:58:24
Question: I'm trying to parse a simple subset of SQL using ANTLR 4. My grammar looks like this: grammar Query; query : select; select : 'select' colname (',' colname)* 'from' tablename; colname : COLNAME; tablename : TABLENAME; COLNAME: [a-z]+ ; TABLENAME : [a-z]+; WS : [ \t\n\r]+ -> skip ; // skip spaces, tabs, newlines I am testing this with a simple Java application as follows: import java.io.ByteArrayInputStream; import java.io.InputStream; import org.antlr.v4.runtime.*; import org.antlr.v4.runtime
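A likely cause of the mismatch is that COLNAME and TABLENAME have identical patterns. An ANTLR lexer picks the longest match and, on a tie, the rule defined first, so TABLENAME can never be produced. The sketch below imitates that selection policy in plain Python; it is not the ANTLR runtime, and the `RULES` table and `tokenize` function are hypothetical names that mirror the grammar, with the literals from the parser rule listed ahead of the named lexer rules.

```python
import re

# Rules in priority order, mirroring the grammar in the question.
RULES = [
    ("'select'", r"select"),
    ("'from'", r"from"),
    ("','", r","),
    ("COLNAME", r"[a-z]+"),
    ("TABLENAME", r"[a-z]+"),   # same pattern as COLNAME, so it never wins
    ("WS", r"[ \t\r\n]+"),
]


def tokenize(text):
    """Longest match wins; on equal length the earlier rule is kept."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":               # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    for token in tokenize("select a, b from t"):
        print(token)
```

The table name `t` comes out as COLNAME, which is why a parser rule expecting TABLENAME reports mismatched input. A common way out is a single identifier token that both `colname` and `tablename` refer to.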

In antlr4 lexer, How to have a rule that catches all remaining “words” as Unknown token?

十年热恋 submitted on 2019-11-29 18:08:24
Question: I have an ANTLR 4 lexer grammar. It has many rules for words, but I also want it to create an Unknown token for any word that it cannot match with the other rules. I have something like this: Whitespace : [ \t\n\r]+ -> skip; Punctuation : [.,:;?!]; // Other rules here Unknown : .+? ; Now the generated lexer catches '~' as unknown, but it creates 3 '~' Unknown tokens for the input '~~~' instead of a single '~~~' token. What should I do to tell the lexer to generate a single word token for consecutive unknown characters?
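The non-greedy `.+?` stops after a single character, which explains the three separate tokens. One common approach is to define Unknown as a run of characters that are neither whitespace nor punctuation, i.e. a complemented character set. The sketch below shows that idea with Python's `re` module rather than ANTLR; the pattern and the `tokenize` name are assumptions, and the "other rules" from the question are left out.

```python
import re

# Ordered alternatives. UNKNOWN takes a whole run of characters that are
# neither whitespace nor punctuation, instead of one character at a time.
TOKEN = re.compile(
    r"(?P<WS>[ \t\r\n]+)"
    r"|(?P<PUNCTUATION>[.,:;?!])"
    r"|(?P<UNKNOWN>[^ \t\r\n.,:;?!]+)"
)


def tokenize(text):
    """Return (type, text) pairs, dropping whitespace."""
    return [
        (m.lastgroup, m.group())
        for m in TOKEN.finditer(text)
        if m.lastgroup != "WS"
    ]


if __name__ == "__main__":
    print(tokenize("~~~ hi!"))
```

Every character falls into exactly one of the three classes, so the scan leaves no gaps and `~~~` arrives as one UNKNOWN token.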

ANTLR4: Unexpected behavior that I can't understand

限于喜欢 submitted on 2019-11-29 18:04:36
I'm very new to ANTLR4 and am trying to build my own language. My grammar starts at program: <EOF> | statement | functionDef | statement program | functionDef program; and my statement is statement: selectionStatement | compoundStatement | ...; and selectionStatement : If LeftParen expression RightParen compoundStatement (Else compoundStatement)? | Switch LeftParen expression RightParen compoundStatement ; compoundStatement : LeftBrace statement* RightBrace; The problem is that when I test a piece of code against selectionStatement or statement it passes the test, but when I test it

ANTLR4 Semantic Predicates that is Context Dependent Does Not Work

99封情书 submitted on 2019-11-29 17:26:16
I am parsing a C++-like declaration with this scaled-down grammar (many details removed to make it a fully working example). It mysteriously fails to work (at least to me). Is it related to the use of a context-dependent predicate? If so, what is the proper way to implement the "counting the number of child nodes" logic? grammar CPPProcessor; cppCompilationUnit : decl_specifier_seq? init_declarator* ';' EOF; init_declarator: declarator initializer?; declarator: identifier; initializer: '=0'; decl_specifier_seq locals [int cnt=0] @init { $cnt=0; } : decl_specifier+ ; decl_specifier : @init {

How to create a antlr4 grammar which will parse date

纵然是瞬间 submitted on 2019-11-29 17:03:37
I want to parse a few date formats using the following ANTLR4 grammar. grammar Variables; //varTable : tableNameFormat dateFormat? ; //tableNameFormat: (ID SEPERATOR); dateFormat : YEAR UNDERSCORE MONTH UNDERSCORE TODAY | YEAR ; YEAR : DIGIT DIGIT DIGIT DIGIT; // 4-digits YYYY MONTH : DIGIT DIGIT; // 2-digits MM TODAY : DIGIT DIGIT ; // 2-digits DD UNDERSCORE: ('_' | '-' ); fragment DIGIT : [0-9] ; ID : [a-zA-Z][a-zA-Z0-9]? ; WS : [ \t\r\n]+ -> skip ; This grammar should parse "2016-01-01" easily, but it gives an input mismatch. Please help. For such a task a regex is a much better solution. But if you
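The mismatch follows from MONTH and TODAY sharing the same two-digit pattern: the lexer emits MONTH for both "01" fields, so the parser never sees a TODAY token. The regex route suggested in the answer could look like the sketch below, written in Python as an assumption about what "regex" means here. The `parse_date` name is hypothetical; the pattern accepts the same two shapes as `dateFormat` (a bare year, or year, month and day separated by `_` or `-`) and does not range-check the month or day.

```python
import re
from typing import Optional, Tuple

# YYYY, optionally followed by <sep>MM<sep>DD, where <sep> is '_' or '-'.
DATE = re.compile(r"(\d{4})(?:[_-](\d{2})[_-](\d{2}))?")


def parse_date(text: str) -> Optional[Tuple[int, Optional[int], Optional[int]]]:
    """Return (year, month, day); month and day are None for a bare year."""
    m = DATE.fullmatch(text)
    if m is None:
        return None
    year, month, day = m.groups()
    return (
        int(year),
        int(month) if month else None,
        int(day) if day else None,
    )


if __name__ == "__main__":
    for candidate in ("2016-01-01", "2016_01_01", "2016", "2016-1-1"):
        print(candidate, parse_date(candidate))
```

Within ANTLR itself, the usual remedy is a single two-digit token that the parser rule uses in both the month and day positions.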

ANTLR4 mutual left recursion grammar

房东的猫 submitted on 2019-11-29 16:44:31
I have read many questions here on StackOverflow about mutual left-recursion issues in LL(k) parsers. I found the general algorithm for removing left-recursion: A : Aa | b ; becomes A : bR ; R : (aA)? ; However, I cannot figure out how to apply it to my situation. I have left_exp: IDENT | exp DOT IDENT ; exp : handful | of | other rules | left_exp ; The "handful of other rules" all contain regular recursion, such as exp : exp PLUS exp , etc. and have no issues. The issue is with left_exp and exp being mutually recursive. I thought about just adding IDENT and exp DOT IDENT to the exp rules, but
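For reference, the textbook rewrite of `A : A a | b ;` is usually given as `A : b R ; R : (a R)? ;`, where the tail rule recurses on itself, so `A` matches `b` followed by any number of `a`. Applied to `left_exp`, with `b` as IDENT and `a` as DOT IDENT, that becomes a loop. The sketch below is a hand-written Python illustration under those assumptions, not ANTLR output; the `(kind, text)` token pairs and the `parse_left_exp` name are hypothetical.

```python
def parse_left_exp(tokens):
    """Parse IDENT (DOT IDENT)* and fold it into a left-nested tree.

    `tokens` is a list of (kind, text) pairs. The loop plays the role of
    the tail rule R, and folding into `tree` restores left associativity.
    """
    it = iter(tokens)
    first = next(it, None)
    if first is None or first[0] != "IDENT":
        raise ValueError("expected IDENT at start")
    tree = first[1]                      # the 'b' part
    for kind, _ in it:                   # each pass consumes one 'a' part
        if kind != "DOT":
            raise ValueError(f"expected DOT, got {kind}")
        nxt = next(it, None)
        if nxt is None or nxt[0] != "IDENT":
            raise ValueError("expected IDENT after DOT")
        tree = ("member", tree, nxt[1])
    return tree


if __name__ == "__main__":
    toks = [("IDENT", "a"), ("DOT", "."), ("IDENT", "b"),
            ("DOT", "."), ("IDENT", "c")]
    print(parse_left_exp(toks))
```

ANTLR 4 rewrites direct left recursion inside a single rule on its own but rejects mutual left recursion across rules, so the loop form above is mainly useful for seeing what the elimination does.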

Is there a way to generate unit test to test my grammar

点点圈 submitted on 2019-11-29 16:00:26
I created my grammar using ANTLR4, but I want to test its robustness. Is there an automatic tool or a good way to do that fast? Thanks :) The only way I found to create unit tests for a grammar is to create a number of examples from a written spec of the given language. This is neither fast nor complete, but I see no other way. You could be tempted to create test cases directly from the grammar (writing a tool for that isn't that hard). But think a moment about this. What would you test then? Your unit tests would always succeed, unless you use generated test cases from an earlier version of the

Can we define a non context-free grammar with ANTLR?

夙愿已清 submitted on 2019-11-29 15:19:55
I'm pretty new to ANTLR4 and am trying to understand which kinds of grammars we can define with it. As far as I understand, there are two kinds of rules in ANTLR: parser rules (lower-case words) and lexer rules (upper-case words). Example: grammar Test; init: prog(','prog)*; prog: A | prog ; A: [a-z]+; From the grammar production rule standpoint, I would say that parser rules are non-terminal symbols which can be replaced with a sequence of tokens defined by lexer rules. So it's perfectly clear that the grammar is context-free by definition. The alphabet of the language generated by the