antlr | 易学教程 (Yixue Tutorials)

ANTLR4: Unrecognized constant value in a lexer command

Question: I am learning how to use the "more" lexer command. I typed in the lexer grammar shown in the ANTLR book, page 281:

    lexer grammar Lexer_To_Test_More_Command ;
    LQUOTE : '"' -> more, mode(STR) ;
    WS : [ \t\r\n]+ -> skip ;
    mode STR ;
    STRING : '"' -> mode(DEFAULT_MODE) ;
    TEXT : . -> more ;

Then I created this simple parser to use the lexer:

    grammar Parser_To_Test_More_Command ;
    import Lexer_To_Test_More_Command ;
    test: STRING EOF ;

Then I opened a DOS window and entered this command: antlr4 Parser…
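
The book grammar is meant to glue the opening quote, everything after it, and the closing quote into a single STRING token. The toy Python model below is a sketch of that intended behavior, not ANTLR's implementation; the function name lex and the choice to raise ValueError (a real ANTLR lexer reports an error and keeps going) are assumptions. "more" keeps the matched text pending instead of emitting a token, and the rule that finally emits (STRING) receives all of the accumulated text.

```python
# Toy model of ANTLR's "more" and "mode" lexer commands (not ANTLR itself).
# The rules mirror the book grammar:
#   DEFAULT: LQUOTE : '"' -> more, mode(STR) ;  WS : [ \t\r\n]+ -> skip ;
#   STR:     STRING : '"' -> mode(DEFAULT_MODE) ;  TEXT : . -> more ;

def lex(text):
    tokens = []       # emitted (type, text) pairs
    pending = ""      # text kept by "more" until some rule emits a token
    mode = "DEFAULT"
    i = 0
    while i < len(text):
        ch = text[i]
        if mode == "DEFAULT":
            if ch in " \t\r\n":
                i += 1                 # WS -> skip
                continue
            if ch == '"':
                pending += ch          # LQUOTE -> more: keep the text, emit nothing
                mode = "STR"           # mode(STR)
                i += 1
                continue
            # Hypothetical simplification: ANTLR would report and recover instead.
            raise ValueError(f"no rule matches {ch!r} at {i}")
        # mode == "STR"
        if ch == '"':
            # STRING: the emitted token carries everything "more" accumulated
            tokens.append(("STRING", pending + ch))
            pending = ""
            mode = "DEFAULT"           # mode(DEFAULT_MODE)
        else:
            pending += ch              # TEXT -> more
        i += 1
    if pending:
        raise ValueError("unterminated string")
    return tokens
```

For example, lex('"hi there"  "x"') yields two STRING tokens, and the whitespace inside the first string survives because WS only applies in the default mode.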

Ambiguous ANTLR parser rule

Question: I have a very simple example text that I want to parse with ANTLR, yet I'm getting wrong results because of an ambiguous rule definition. Here is the grammar:

    grammar SimpleExampleGrammar;
    prog : event EOF;
    event : DEFINE EVT_HEADER eventName=eventNameRule;
    eventNameRule : DIGIT+;
    DEFINE : '#define';
    EVT_HEADER : 'EVT_';
    DIGIT : [0-9a-zA-Z_];
    WS : ('' | ' ' | '\r' | '\n' | '\t') -> channel(HIDDEN);

First text example:

    #define EVT_EX1

Second text example:

    #define EVT_EX1
    #define EVT_EX2

Antlr4 JavaScript target: issue with visitor and labeled alternatives

Question: I'm using ANTLR4 (4.5.3) with the JavaScript target and trying to implement a visitor. Following the calculator example in the ANTLR4 book (great book, by the way), I'm trying to create a similar grammar:

    ...
    expr: expr op=('*'|'/') expr   # MulDiv
        | expr op=('+'|'-') expr   # AddSub
        | INT                      # int
        | '(' expr ')'             # parens
        ;
    ...

The issue: visitor methods are created for the labeled alternatives (for example, visitMulDiv), but two things are missing: an implementation of visitExpr in the base visitor…
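
When every alternative of a rule is labeled, ANTLR 4 generates one visit method per label (here visitMulDiv, visitAddSub, visitInt and visitParens) rather than a single method for the rule. The toy Python model below sketches the double dispatch behind that; the node classes and EvalVisitor are hand-written stand-ins, not the generated JavaScript, and exist only to show why no generic visitExpr is needed to walk the tree.

```python
# Toy model of the double dispatch behind labeled alternatives (hand-written,
# not ANTLR-generated). Each label gets its own node class, and accept() calls
# the matching visitXxx method, so a generic visitExpr is never consulted.

class Node:
    def accept(self, visitor):
        # e.g. a MulDiv node calls visitor.visitMulDiv(self)
        return getattr(visitor, "visit" + type(self).__name__)(self)

class MulDiv(Node):
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

class AddSub(Node):
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

class Int(Node):
    def __init__(self, value):
        self.value = value

class Parens(Node):
    def __init__(self, expr):
        self.expr = expr

class EvalVisitor:
    def visit(self, node):
        return node.accept(self)

    def visitMulDiv(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        # floor division; matches Java int division only for non-negative operands
        return left * right if node.op == "*" else left // right

    def visitAddSub(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        return left + right if node.op == "+" else left - right

    def visitInt(self, node):
        return node.value

    def visitParens(self, node):
        return self.visit(node.expr)
```

Evaluating the tree for (1 + 2) * 3 dispatches to visitMulDiv, then visitParens, visitAddSub and visitInt, and returns 9.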

ANTLR4 Mutual left recursion

Question: I just ran into a strange problem with ANTLR 4.2.2. Consider a (simplified) Java grammar. This does not compile:

    classOrInterfaceType
        : (classOrInterfaceType) '.' Identifier
        | Identifier
        ;

ANTLR outputs the following error:

    error(119): Java.g4::: The following sets of rules are mutually left-recursive [classOrInterfaceType]

Yes, I also see left recursion, but I do not see mutual left recursion, only ordinary direct left recursion. When I remove the parentheses around (classOrInterfaceType), then it…
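
Written as direct left recursion, the rule describes an Identifier followed by zero or more '.' Identifier pairs, nested to the left. ANTLR 4 rewrites direct left recursion of this kind into a loop internally. The toy Python parser below is an illustration of an equivalent loop, not ANTLR's actual transformation; the function name parse_type and the list-of-strings token input are assumptions.

```python
# Toy parser showing that the left-recursive rule
#   classOrInterfaceType : classOrInterfaceType '.' Identifier | Identifier ;
# matches Identifier ('.' Identifier)* and can be parsed with a loop that
# still builds the left-nested (left-associative) tree.

def parse_type(tokens):
    """tokens: e.g. ["java", ".", "util", ".", "List"]; returns a nested tuple."""
    pos = 0

    def expect_identifier():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"expected Identifier at token {pos}")
        pos += 1
        return tokens[pos - 1]

    tree = expect_identifier()                     # base alternative: Identifier
    while pos < len(tokens) and tokens[pos] == ".":
        pos += 1                                   # consume '.'
        tree = (tree, ".", expect_identifier())    # wrap everything parsed so far
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For java.util.List the result is (("java", ".", "util"), ".", "List"), the same left-nested shape the recursive rule describes.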

ANTLR: multiplication omitting the '*' symbol

Question: I'm trying to create a grammar for multiplying and dividing numbers in which the '*' symbol does not need to be written. I need it to output an AST. So for input like this:

    1 2 / 3 4

I want the AST to be:

    (* (/ (* 1 2) 3) 4)

I've hit upon the following, which uses Java code to create the appropriate nodes:

    grammar TestProd;
    options { output = AST; }
    tokens { PROD; }
    DIV : '/';
    multExpr: (INTEGER -> INTEGER) ( {div = null;} div=DIV? b=INTEGER -> ^({$div == null ? (Object)adaptor.create(PROD, "
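
The tree the question asks for is a left fold over the integers, using '*' whenever no '/' separates two numbers. The toy Python sketch below only makes that target structure concrete; it is not ANTLR 3's tree-rewrite machinery, and the function name mult_expr_ast and the whitespace-split token list are assumptions.

```python
# Build the S-expression from a token list in which a missing operator
# between two integers means multiplication (the PROD case).

def mult_expr_ast(tokens):
    tokens = list(tokens)
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expression must start with an INTEGER")
    tree = tokens[0]
    i = 1
    while i < len(tokens):
        op = "*"                          # implicit PROD when no DIV is present
        if tokens[i] == "/":
            op = "/"
            i += 1
        if i >= len(tokens) or not tokens[i].isdigit():
            raise SyntaxError(f"expected INTEGER at token {i}")
        tree = f"({op} {tree} {tokens[i]})"   # left-associative: wrap the tree so far
        i += 1
    return tree
```

Calling mult_expr_ast("1 2 / 3 4".split()) produces (* (/ (* 1 2) 3) 4), matching the AST in the question.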

ANTLR behaviour with conflicting tokens

Question: How is ANTLR lexer behavior defined in the case of conflicting tokens? Let me explain what I mean by "conflicting" tokens. For example, assume the following is defined:

    INT_STAGE : '1'..'6';
    INT : '0'..'9'+;

There is a conflict here, because after reading a sequence of digits, the lexer would not know whether there is one INT or many INT_STAGE tokens (or some combination of both). After a test, it looks as if, when INT is defined after INT_STAGE, the lexer prefers to find INT…
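
ANTLR's lexer settles such conflicts by taking the longest possible match at the current position; when several rules match the same number of characters, the rule defined first wins. So a run of several digits always becomes one INT regardless of rule order, while a lone digit from 1 to 6 becomes INT_STAGE only when INT_STAGE is listed first. The toy Python model below (not ANTLR's DFA-based lexer; tokenize and RULES are hypothetical names) reproduces both effects.

```python
import re

# Toy model of longest-match tokenization with first-rule tie-breaking.
# Rules mirror INT_STAGE : '1'..'6' ; INT : '0'..'9'+ ;
RULES = [
    ("INT_STAGE", re.compile(r"[1-6]")),
    ("INT", re.compile(r"[0-9]+")),
]

def tokenize(text, rules=RULES):
    tokens, pos = [], 0
    while pos < len(text):
        best = None                                   # (length, type, text)
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best, so on a
            # tie the rule listed earlier keeps the token.
            if m and m.end() > pos and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[1], best[2]))
        pos += best[0]
    return tokens
```

With the rules in the order above, tokenize("123") gives a single INT and tokenize("5") gives INT_STAGE; passing the rules in reverse order turns "5" into INT.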

Antlr (lexer): matching the right token

Question: In my ANTLR 3 grammar, I have several "overlapping" lexer rules, like this:

    NAT: ('0' .. '9')+ ;
    INT: ('+' | '-')? ('0' .. '9')+ ;
    BITVECTOR: ('0' | '1')* ;

Although tokens like 100110 and 123 can be matched by more than one of these rules, the context always determines which one they must be. Example:

    s: a | b | c ;
    a: '<' NAT '>' ;
    b: '{' INT '}' ;
    c: '[' BITVECTOR ']' ;

The input {17} should then match '{', INT and '}', but the lexer has already decided that 17 is a NAT token. How…
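
A common way around this (swapped in here, not taken from the question) is to let the lexer emit one kind of token for every digit run and have each parser rule decide how to interpret it from context, as rules a, b and c already encode. The toy Python sketch below shows the idea outside ANTLR; the names lex and parse_s, and the error messages, are hypothetical.

```python
import re

# One token kind for every digit run; the surrounding brackets decide whether
# it is read as NAT (<...>), INT ({...}, optional sign) or BITVECTOR ([...]).
TOKEN = re.compile(r"\s*([<>{}\[\]+-]|[0-9]+)")

def lex(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"bad input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_s(text):
    """Return (rule, value) for one of the forms <NAT>, {INT} or [BITVECTOR]."""
    toks = lex(text)
    if len(toks) < 2:
        raise SyntaxError("expected a bracketed value")
    first, body, last = toks[0], toks[1:-1], toks[-1]
    if (first, last) == ("<", ">") and len(body) == 1 and body[0].isdigit():
        return ("a", int(body[0]))                   # NAT: unsigned digits
    if (first, last) == ("{", "}"):
        if len(body) == 1 and body[0].isdigit():
            return ("b", int(body[0]))               # INT without a sign
        if len(body) == 2 and body[0] in "+-" and body[1].isdigit():
            return ("b", int(body[0] + body[1]))     # INT with a sign
    if (first, last) == ("[", "]") and len(body) <= 1 and set("".join(body)) <= {"0", "1"}:
        return ("c", "".join(body))                  # BITVECTOR, possibly empty
    raise SyntaxError(f"no alternative matches {text!r}")
```

Here {17} is read as an INT and <17> as a NAT from the same digit token, while [123] is rejected because it is not a bit vector.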

ANTLR and Xtext integration for developing a plugin

Question: My current project focuses on code generation from high-level specifications. More specifically, developers write high-level specifications, and a compiler parses them and generates Java code. For the parser I have used an ANTLR grammar, and for code generation I have used StringTemplate files. For nice editor support (syntax highlighting and coloring) I have used Xtext. Now the real problem: how can I integrate the Xtext editor support with the ANTLR parser and code generator? I want…

Parse strings in ANTLR

Question: I define strings as a parser rule rather than a lexer rule, because strings may contain escapes with expressions in them, such as "The variable is \(variable)".

    string : '"' character* '"' ;
    character : escapeSequence | . ;
    escapeSequence : '\(' expression ')' ;
    IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
    WHITESPACE : [ \r\t,] -> skip ;

This doesn't work because . matches any token rather than any character, so many identifiers will be matched and whitespace will be ignored completely. How can I parse strings…
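
A common alternative to a parser-level string rule is to keep strings in the lexer and use lexer modes: push a string mode on the opening quote, push back to normal lexing on \(, and pop on the closing quote and on the matching ). The toy Python model below is a hand-written sketch of that mode-stack idea, not ANTLR; the function name lex and token names such as STRING_START, TEXT and INTERP_START are hypothetical. Whitespace inside a string stays part of TEXT, while the embedded expression is lexed normally.

```python
# Toy lexer with a mode stack, in the spirit of ANTLR 4's pushMode/popMode.
# Inside a string every character is TEXT except '"' (end of string) and
# '\(' (start of an embedded expression, lexed in the normal mode).

def lex(text):
    tokens, modes, i = [], ["DEFAULT"], 0
    while i < len(text):
        ch = text[i]
        if modes[-1] == "STRING":
            if ch == '"':
                tokens.append(("STRING_END", ch))
                modes.pop()                        # back to the enclosing mode
                i += 1
            elif text.startswith("\\(", i):
                tokens.append(("INTERP_START", "\\("))
                modes.append("DEFAULT")            # lex the expression normally
                i += 2
            else:
                # glue consecutive characters (spaces included) into one TEXT token
                j = i
                while j < len(text) and text[j] != '"' and not text.startswith("\\(", j):
                    j += 1
                tokens.append(("TEXT", text[i:j]))
                i = j
        elif ch == '"':
            tokens.append(("STRING_START", ch))
            modes.append("STRING")
            i += 1
        elif ch.isalpha():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            tokens.append(("IDENTIFIER", text[i:j]))
            i = j
        elif ch in " \r\t,":
            i += 1                                 # WHITESPACE -> skip (outside strings only)
        elif ch == "(":
            tokens.append(("LPAREN", ch))
            modes.append("DEFAULT")                # so the matching ')' pops back here
            i += 1
        elif ch == ")":
            if len(modes) > 1:
                modes.pop()                        # closes '(' or '\(' and resumes the outer mode
            tokens.append(("RPAREN", ch))
            i += 1
        else:
            raise ValueError(f"unexpected {ch!r} at {i}")
    return tokens
```

On "The variable is \(variable)" this yields STRING_START, TEXT "The variable is ", INTERP_START, IDENTIFIER "variable", RPAREN and STRING_END, so the words inside the string are no longer lexed as identifiers.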

How can I check if the first character of a line is “*” in ANTLR4?

Question: I am trying to write a parser for a relatively simple but idiosyncratic language. Simply put, one of the rules is that comment lines are marked by an asterisk, but only if that asterisk is the first character of the line. How might I go about formalising such a rule in ANTLR4? I thought about using:

    START_LINE_COMMENT: '\n*' .*? '\n' -> skip;

But I am certain this won't work with more than one comment line in a row, as the newline at the end will be consumed as part of the START_LINE_COMMENT…
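
The underlying idea is to test the current column instead of consuming the preceding newline. ANTLR 4 lexer rules can carry a semantic predicate for this, and in the Java target the Lexer class exposes getCharPositionInLine() for such a check. The toy Python model below is not ANTLR (the function name strip_star_comments is hypothetical); it shows why leaving the trailing newline unconsumed lets consecutive comment lines each start at column 0.

```python
# Toy line-oriented check: '*' starts a comment only at column 0.
# The comment stops before the newline, so the newline is kept and the
# next line again starts at column 0.

def strip_star_comments(text):
    out, col, i = [], 0, 0      # col plays the role of the lexer's char position in line
    while i < len(text):
        if col == 0 and text[i] == "*":
            # skip up to, but not including, the end of the line
            while i < len(text) and text[i] not in "\r\n":
                i += 1
            continue
        ch = text[i]
        out.append(ch)
        col = 0 if ch == "\n" else col + 1
        i += 1
    return "".join(out)
```

Both lines of "* c1\n* c2\n" are removed, while the '*' in "x = 2 * 3" is kept because it is not at column 0.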