antlr | 易学教程

ANTLR lexer rule consumes too much

Question: ANTLR Lexer Rule Design. I have a requirement for the following token: allowable characters include uppercase, lowercase, numeric, space, and hyphen characters; unfixed length (must be at least two characters long); the token must contain at least one space or hyphen; the token must start and end with an uppercase, lowercase, numeric, or hyphen character (it cannot begin or end with a space). The ANTLR lexer rule "AlphaNumericSpaceHyphen" in the grammar below almost works except for one case. Using ...
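The four requirements listed in the question can be written down as an executable check, which is handy as a reference when testing a lexer rule against sample inputs. The following is only a minimal sketch in plain Python, not the asker's grammar and not ANTLR code; the function name `is_valid_token` is hypothetical, and it assumes "cannot begin or end with a space" takes precedence over the list of allowed start and end characters.

```python
import re

# Allowed characters: letters, digits, space, hyphen; at least two of them.
ALLOWED = re.compile(r"[A-Za-z0-9 -]{2,}")


def is_valid_token(text: str) -> bool:
    """Check a candidate token against the requirements in the question.

    Hypothetical helper for illustration only.
    """
    # Only allowed characters, and a minimum length of two.
    if ALLOWED.fullmatch(text) is None:
        return False
    # Must not begin or end with a space.
    if text[0] == " " or text[-1] == " ":
        return False
    # Must contain at least one space or hyphen.
    return " " in text or "-" in text
```

A set of accepted and rejected strings produced this way can serve as test input for whichever lexer rule is eventually chosen.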

antlr3 remove treenode with subtree

Question: I am trying to do a tree-to-tree transform with ANTLR 3.4. For this question, it is about boolean expressions where "AND" and "OR" are allowed to bind to n expressions. The parser stage creates something like this: (OR (AND (expr1) (expr2) (expr3) (OR (AND (expr4)) (AND (expr5)) (AND (expr6)) ) ) ) Unfortunately, there are AST nodes for "AND" and "OR" that bind to just one expression. (Which is useless, but hey, the rules andExpr and orExpr are invoked.) I tried to kick them out (I mean, replace them by ...
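The intended transformation (replace every "AND" or "OR" node that has exactly one child by that child) can be illustrated independently of ANTLR. The sketch below is plain Python over nested tuples, not an ANTLR 3 tree grammar or rewrite rule; the tuple representation and the function name `collapse` are assumptions made for illustration.

```python
def collapse(node):
    """Return a copy of the tree without single-child AND/OR nodes.

    A node is either a leaf (a string such as "expr1") or a tuple
    ("AND" | "OR", child, child, ...). Hypothetical representation.
    """
    if isinstance(node, str):
        return node
    op, *children = node
    # Simplify bottom-up so nested single-child chains disappear too.
    children = [collapse(child) for child in children]
    if op in ("AND", "OR") and len(children) == 1:
        return children[0]
    return (op, *children)


# The tree from the question, written as nested tuples.
tree = ("OR",
        ("AND", "expr1", "expr2", "expr3",
         ("OR", ("AND", "expr4"), ("AND", "expr5"), ("AND", "expr6"))))

simplified = collapse(tree)
# simplified == ("AND", "expr1", "expr2", "expr3",
#                ("OR", "expr4", "expr5", "expr6"))
```

The recursion simplifies children before looking at the parent, so a single-child node wrapped in another single-child node is removed in one pass.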

ANTLR Lua long string grammar rules

Question: I'm trying to create an ANTLR parser for Lua, so I took the grammar produced by Nicolai Mainero (available at ANTLR's site, Lua 5.1 grammar) and began to work. The grammar is good. One thing is not working: long strings. The Lua specification says: Literal strings can also be defined using a long format enclosed by long brackets. We define an opening long bracket of level n as an opening square bracket followed by n equal signs followed by another opening square bracket. So, an opening long bracket of level 0 ...
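The difficulty with long brackets is that the closing bracket must carry the same number of equal signs as the opening one. A regular expression with a backreference can express that directly, which shows what the lexer has to achieve. This is a minimal sketch in Python, not an ANTLR rule; the name `match_long_string` is hypothetical, and the sketch returns the raw text between the brackets without any further processing.

```python
import re

# "[", n equal signs, "[", body, "]", the same n equal signs, "]".
# The backreference \1 forces the closing level to equal the opening level.
LONG_STRING = re.compile(r"\[(=*)\[(.*?)\]\1\]", re.DOTALL)


def match_long_string(source: str, pos: int = 0):
    """Match a long string starting at pos.

    Returns (level, body), or None if no long string starts there.
    Hypothetical helper for illustration only.
    """
    m = LONG_STRING.match(source, pos)
    if m is None:
        return None
    return len(m.group(1)), m.group(2)
```

For example, `match_long_string("[==[a]]b]==]")` yields level 2 with body `a]]b`, because the inner `]]` is a level 0 closing bracket and does not terminate a level 2 string.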

Antlr - Parsing Multiline #define for C.g4

Question: I am using ANTLR 4 to parse C code. I want to parse multiline #defines along with the C.g4 grammar provided in C.g4. However, the grammar mentioned in the link above does not support preprocessor directives, so I have added the following new rules to support preprocessing. Link to my previous question. Whitespace : [ \t]+ -> channel(HIDDEN) ; Newline : ( '\r' '\n'? | '\n' ) -> channel(HIDDEN) ; BlockComment : '/*' .*? '*/' ; LineComment : '//' ~[\r\n]* ; IncludeBlock : '#' Whitespace? 'include' ~[\r\n]* ; ...

my lexer token action is not invoked

Question: I use ANTLR 4 with the JavaScript target. Here is a sample grammar: P : T ; T : [a-z]+ {console.log(this.text);} ; start: P ; When I run the generated parser, nothing is printed, although the input is matched. If I move the action to the token P, then it gets invoked. Why is that? Answer 1: Actions are ignored in referenced rules. This was the original behavior of ANTLR 4, back when the lexer only supported a single action per token (and that action had to appear at the end of the token). Several ...

How can I parse a special character differently in two terminal rules using antlr?

Question: I have a grammar that uses the $ character at the start of many terminal rules, such as $video{, $audio{, $image{, $link{ and others like these. However, I'd also like to match all the $, { and } characters that don't match these rules. For example, my grammar does not properly match $100 in the CHUNK rule, but adding $ to the long list of acceptable characters in CHUNK causes the other production rules to break. How can I change my grammar so that it's smart enough to ...

ANTLR generating empty conditions

Question: I'm trying to learn to use ANTLR, but I cannot figure out what's wrong with my code in this case. I hope this will be really easy for anyone with some experience with it. This is the grammar (really short): grammar SmallTest; @header { package parseTest; import java.util.ArrayList; } prog returns [ArrayList<ArrayList<String>> all] :(stat { if ($all == null) $all = new ArrayList<ArrayList<String>>(); $all.add($stat.res); } )+ ; stat returns [ArrayList<String> res] :(element { if ($res == null) ...

Migration tool for ANTLR grammar

Question: Suppose I have the following simple grammar (a query DSL): grammar TestGrammar; term : textTerm ; textTerm : 'Text' '(' T_VALUE '=' STRING+ ')' ; T_VALUE : 'value' ; STRING : '"' .+? '"' ; WS : [ \t\r\n]+ -> skip ; Then at some point I decide that the text term format needs to be changed, for example: Text(value = "123") -> MyText(val = "123"). How should I approach migrating existing data that users have generated with the previous version of the grammar? Answer 1: Assumption: Let's make one simplification of your ...

Antlrworks - extraneous input

Question: I am new to this, so I will need your help. I am trying to parse the Wikipedia dump, and my first step is to map each rule they define into ANTLR. Unfortunately, I hit my first barrier: line 1:8 extraneous input ''''' expecting '\'\'' I do not understand what is going on; please help. My code: grammar Test; options { language = Java; } parse : term+ EOF ; term : IDENT | '[[' term ']]' | '\'\'' term '\'\'' | '\'\'\'' term '\'\'\'' ; IDENT : ('a'..'z' | ...

ANTLR: Lexer does not recognize token

Question: Given the following lexer grammar: lexer grammar CodeTableLexer; CodeTabHeader : '[code table 1.0]'; Code : 'code'; Table : 'table'; End : 'end'; Row : 'row'; Naming : 'naming'; Dfltlang : 'dfltlang'; Language : 'english' | 'german' | 'french' | 'italian' | 'spanish'; Null : 'null'; Number : Int ('.' Digit*)? ; Identifier : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '$' | '.' | Digit)* ; String @after { setText(getText().substring(1, getText().length() - 1).replaceAll("\\\\(.)", ...