antlr4

ANTLR 4 extraneous input matching non lexer item

痞子三分冷 submitted on 2019-12-11 16:47:20
Question: I have a grammar like this:

    grammar MyGrammar;
    field : f1 (STROKE f2 f3)? ;
    f1 : FIELDTEXT+ ;
    f2 : 'A' ;
    f3 : NUMBER4 ;
    FIELDTEXT : ~['/'] ;
    NUMBER4 : [0-9][0-9][0-9][0-9];
    STROKE : '/' ;

This works well enough, and fields f1, f2 and f3 are all populated correctly, except when there is an A to the left of the / (regardless of whether the optional part is present). That additionally causes an error:

    extraneous input 'A' expecting {<EOF>, FIELDTEXT, '/'}

Some sample data: PHOEN -> OK. KLM405/A4046 ->
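The reported error is consistent with how an ANTLR 4 lexer picks between rules: the longest match wins, and on a tie the rule defined first wins. In a combined grammar, the literal 'A' used in f2 becomes an implicit token that is placed ahead of the named lexer rules. The Python sketch below only imitates that selection with regular expressions; it is not ANTLR, and `RULES`, `tokenize` and the input `KA/A4046` are made up for illustration.

```python
import re

# Rules in the order the lexer is assumed to see them. The implicit token for
# the literal 'A' (from parser rule f2) comes before the named lexer rules.
RULES = [
    ("'A'", re.compile(r"A")),
    ("FIELDTEXT", re.compile(r"[^'/]")),   # ~['/'] excludes both ' and /
    ("NUMBER4", re.compile(r"[0-9]{4}")),
    ("STROKE", re.compile(r"/")),
]

def tokenize(text):
    """Longest match wins; on a tie the earlier rule is kept."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current candidate.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical input with an A on the left of the slash.
for token in tokenize("KA/A4046"):
    print(token)
```

In this imitation the A before the slash comes out as the literal token `'A'` rather than FIELDTEXT, so a rule like `f1 : FIELDTEXT+` would stop at it. That would explain an "extraneous input 'A'" message, assuming the real lexer behaves the same way here.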

Are there any good ways to improve the performance of a parser generated with antlr4?

旧时模样 submitted on 2019-12-11 16:28:00
Question: I have spent a few days trying to fix my grammar file (uniformSQL.g4) to improve parser performance, but without success. The parser takes 4000+ ms to parse the SQL case. I also tried the SLL(*) strategy; it is fast but produces a lot of mismatches. So I wonder how to get the best performance when designing the grammar. I also tried to reduce the parse tree's height when designing the grammar, but that turned out to be slower. Looking forward to your suggestions, thanks. This

Antlr4: single quote rule fails when there are escape chars plus carriage return, new line

孤街浪徒 submitted on 2019-12-11 15:52:55
Question: I have a grammar as such:

    grammar Testquote;
    program : (Line ';')+ ;
    Line: L_S_STRING ;
    L_S_STRING : '\'' (('\'' '\'') | ('\\' '\'') | ~('\''))* '\''; // Single quoted string literal
    L_WS : L_BLANK+ -> skip ; // Whitespace
    fragment L_BLANK : (' ' | '\t' | '\r' | '\n') ;

This grammar, and L_S_STRING in particular, seems to work fine with vanilla inputs like:

    'ab';
    'cd';

However, it fails with this input:

    'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';
    'cd';

Yet works when I changed the first line to

ANTLR parser for alpha numeric words which may have whitespace in between

隐身守侯 submitted on 2019-12-11 15:25:16
Question: First I tried to identify a normal word, and the grammar below works fine:

    grammar Test;
    myToken: WORD;
    WORD: (LOWERCASE | UPPERCASE )+ ;
    fragment LOWERCASE : [a-z] ;
    fragment UPPERCASE : [A-Z] ;
    fragment DIGIT: '0'..'9' ;
    WHITESPACE : (' ' | '\t')+;

But as soon as I added the rule below just beneath "myToken", even my WORD tokens were no longer recognised for the input string "abc":

    ALPHA_NUMERIC_WS: ( WORD | DIGIT | WHITESPACE)+;

Does anyone have any idea why that is?
Answer 1: This is because ANTLR's lexer

Antlr4 import of combined grammar failing

寵の児 submitted on 2019-12-11 14:43:57
Question: I am presently getting...

    error(56): AqlCommentTest.g4:12:4: reference to undefined rule: htmlCommentDeclaration
    error(56): AqlCommentTest.g4:13:4: reference to undefined rule: mdCommentDeclaration

The import for the lexer grammar does seem to be loading. The following files present the problem.

AqlCommentTest.g4

    grammar AqlCommentTest;
    import AqlLexerRules;
    import AqlComment;
    program: commentDeclaration+;
    commentDeclaration: htmlCommentDeclaration #Comment_HTML | mdCommentDeclaration

Antlr4, How to report specific syntax error

自古美人都是妖i submitted on 2019-12-11 13:36:41
Question: I am trying to use antlr4 to write some error checking for my simple grammar. The grammar itself is made up of functions, i.e.

    FUNCTION hello (n){ ...... }
    FUNCTION main (n) { ...... }

I am not sure how I am supposed to catch specific errors such as a missing function name or a missing main function. Here is what my ErrorListener looks like:

    import org.antlr.v4.runtime.*;
    import org.antlr.v4.runtime.tree.*;
    public class SimpleErrorListener extends BaseErrorListener {
        @Override
        public void

How do I use custom tokens and contexts in ANTLR 4

末鹿安然 submitted on 2019-12-11 13:15:27
Question: I've used ANTLR 3 for quite a while and am just switching to ANTLR 4. It is, in general, much more understandable for the students in my compiler class. However, it's not clear from the book and the other documentation I've located how to make the tokens and contexts that form the nodes of the parse tree customized classes. With ANTLR 3 I just used the options to have the generated code rename them. What about in ANTLR 4? Is there documentation that I should have been able

antlr4 mixed fragments in tokens

时光总嘲笑我的痴心妄想 submitted on 2019-12-11 12:23:12
Question: I observe strange behavior when trying to parse text using a grammar that contains statements like the following:

    fragment A : ('a'|'A') ;
    fragment D : ('d'|'D') ;
    fragment N : ('n'|'N') ;
    KEY_AND : A N D;

I created a simple grammar to reproduce the issue I experience:

    grammar AndTest;
    mainRule: NAME SEP KEY_AND SEP NAME;
    NAME: ('A'..'Z')+ ;
    SEP: ';' ;
    fragment A : ('a'|'A') ;
    fragment D : ('d'|'D') ;
    fragment N : ('n'|'N') ;
    KEY_AND : A N D;
    WS: [ \r\t\n]+ -> skip ;

During grun execution I
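The excerpt is cut off before the observed output, so the following is only a guess at the cause. In the posted grammar NAME is defined before KEY_AND, and an ANTLR 4 lexer resolves equal-length matches in favour of the earlier rule. The Python sketch below imitates that tie-break with regular expressions; `first_token` and the two rule lists are hypothetical names, not part of ANTLR.

```python
import re

def first_token(text, rules):
    """Return the name of the rule that wins at the start of `text`.

    Longest match wins; on a tie the rule listed first is kept.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name

NAME = ("NAME", r"[A-Z]+")
KEY_AND = ("KEY_AND", r"[aA][nN][dD]")

as_posted = [NAME, KEY_AND]      # order used in the question
keyword_first = [KEY_AND, NAME]  # keyword rule moved above NAME

print(first_token("AND", as_posted))       # NAME
print(first_token("and", as_posted))       # KEY_AND
print(first_token("AND", keyword_first))   # KEY_AND
print(first_token("ANDY", keyword_first))  # NAME
```

If the real lexer behaves like this imitation, an upper-case AND would arrive at mainRule as NAME, while lower-case and would arrive as KEY_AND. Listing the keyword rule first changes the tie without affecting longer names such as ANDY.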

Antlr: how to match everything between the other recognized tokens?

ぃ、小莉子 submitted on 2019-12-11 10:39:39
Question: How do I match all of the leftover text between the other tokens in my lexer? Here's my code:

    grammar UserQuery;
    expr: expr AND expr | expr OR expr | NOT expr | TEXT+ | '(' expr ')' ;
    OR : 'OR';
    AND : 'AND';
    NOT : 'NOT';
    LPAREN : '(';
    RPAREN : ')';
    TEXT: .+?;

When I run the lexer on "xx AND yy", I get these tokens: x type:TEXT x type:TEXT type:TEXT AND type:'AND' type:TEXT y type:TEXT y type:TEXT

This sort of works, except that I don't want each character to be a token. I'd like to
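The one-character TEXT tokens follow from the non-greedy `.+?`: with nothing after it in the rule, it is satisfied after a single character. The Python sketch below reproduces the token stream from the question with regular expressions and then shows one possible workaround, merging adjacent TEXT tokens after lexing. It is an imitation rather than ANTLR code, and `lex` and `merge_text` are hypothetical helper names.

```python
import re

KEYWORDS = re.compile(r"AND|OR|NOT|\(|\)")
# A non-greedy `.+?` with nothing after it stops after one character.
TEXT = re.compile(r".+?", re.S)

def lex(text):
    """Emit keyword tokens where they match, otherwise one-character TEXT."""
    tokens, pos = [], 0
    while pos < len(text):
        m = KEYWORDS.match(text, pos)
        kind = m.group() if m else "TEXT"
        if m is None:
            m = TEXT.match(text, pos)
        tokens.append((kind, m.group()))
        pos = m.end()
    return tokens

def merge_text(tokens):
    """Join runs of adjacent TEXT tokens into a single TEXT token."""
    merged = []
    for kind, value in tokens:
        if kind == "TEXT" and merged and merged[-1][0] == "TEXT":
            merged[-1] = ("TEXT", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    return merged

raw = lex("xx AND yy")
print(raw)              # seven tokens, one per character of leftover text
print(merge_text(raw))  # [('TEXT', 'xx '), ('AND', 'AND'), ('TEXT', ' yy')]
```

Merging after the fact keeps the keyword rules untouched. Whether it is preferable to restructuring the lexer rules depends on the rest of the grammar, which the excerpt does not show.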

How to rewrite Antlr4 Parse Tree manually?

你。 submitted on 2019-12-11 10:28:33
Question: I am working on a simple XQuery processor and using Antlr4 to parse the grammar. I use the visitor pattern to walk through the parse tree. Now I want to rewrite a query if it meets some condition. The processor can currently process a query if the query directly uses a keyword like "join" and matches the "join" grammar. I want to first rewrite the parse tree if the query can be changed to a join query, or do nothing if not. Is there a way to manually manipulate the parse tree? Like adding a