antlr4 | 易学教程

How to make antlr4 fully tokenize terminal nodes

Question: I'm trying to use ANTLR to make a very simple parser that basically tokenizes a series of .-delimited identifiers. I've made a simple grammar: r : STRUCTURE_SELECTOR ; STRUCTURE_SELECTOR: '.' (ID STRUCTURE_SELECTOR?)? ; ID : [_a-z0-9$]* ; WS : [ \t\r\n]+ -> skip ; When the parser is generated, I end up with a single terminal node that represents the whole string instead of being able to find further STRUCTURE_SELECTORs. I'd like instead to see a sequence (perhaps represented as children of the

context.getText() excludes spaces in ANTLR4

Question: getText() returns the complete statement without the spaces between the words. One way to keep the spaces is to include them in the grammar. But is there any other way to get the complete string with the spaces preserved? Answer 1: Yes, there is (assuming you are using ParserRuleContext.getText()). The idea is to ask the input char stream for a range of characters. The position values are stored in the start and stop tokens of the context. Here's some code (converted from C++, so
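The idea in the answer above can be modeled without the ANTLR runtime. The following is a minimal sketch, not the answer's own code: `Token`, `tokenize_skipping_whitespace`, `concatenated_text`, and `source_text` are hypothetical names, and the regex tokenizer only stands in for a lexer whose whitespace rule uses `-> skip`. In the Java runtime the corresponding pieces would be the context's start and stop tokens, `Token.getStartIndex()`, `Token.getStopIndex()`, and `CharStream.getText(Interval)`.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    start: int  # index of the first character in the input
    stop: int   # index of the last character (inclusive)


def tokenize_skipping_whitespace(source):
    """Stand-in lexer: every run of non-space characters is one token."""
    return [Token(m.group(), m.start(), m.end() - 1)
            for m in re.finditer(r"\S+", source)]


def concatenated_text(tokens):
    """Models getText(): joins token texts, so skipped whitespace is lost."""
    return "".join(t.text for t in tokens)


def source_text(source, tokens):
    """Slices the original input from the first token's start to the last token's stop."""
    if not tokens:
        return ""
    return source[tokens[0].start:tokens[-1].stop + 1]


source = "  SELECT name FROM users  "
tokens = tokenize_skipping_whitespace(source)
print(concatenated_text(tokens))    # SELECTnameFROMusers
print(source_text(source, tokens))  # SELECT name FROM users
```

Slicing the original input keeps everything the lexer skipped between the first and last token, which is why it also preserves comments or other hidden content inside that range.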

mismatched Input when lexing and parsing with modes

Question: I'm having an ANTLR4 problem with mismatched input but can't solve it. I've found a lot of questions dealing with that, and they usually revolve around the lexer matching something else to the token, but I don't see it in my case. I've got this lexer grammar: FieldStart : '[' Definition ']' -> pushMode(INFIELD) ; Definition : 'Element'; mode INFIELD; FieldEnd : '[end]' -> popMode ; ContentValue : ~[[]* ; Which then runs on the following parser: field : FieldStart ContentValue FieldEnd #Field

Antlr4: The following sets of rules are mutually left-recursive

Question: I am trying to describe a simple grammar with AND and OR, but it fails with the following error: The following sets of rules are mutually left-recursive. The grammar is as follows: expr: NAME | and | or; and: expr AND expr; or: expr OR expr; NAME : 'A' .. 'B' + ; OR: 'OR' | '|'; AND: 'AND' | '&'; At the same time, the following grammar expr: NAME | expr AND expr | expr OR expr; NAME : 'A' .. 'B' + ; OR: 'OR' | '|'; AND: 'AND' | '&'; does compile. Why? Answer 1: ANTLR4 supports only direct left recursion
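The difference between the two grammars can be checked mechanically. The sketch below is an illustration under simplifying assumptions, not how ANTLR itself performs the check: the function names are hypothetical, each grammar is written as a dict of alternatives, and only the first symbol of each alternative is inspected (optional or empty prefixes are ignored).

```python
def leftmost_rules(grammar):
    """For each rule, collect the parser rules that appear first in an alternative."""
    return {
        rule: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for rule, alts in grammar.items()
    }


def classify_left_recursion(grammar):
    """Label each rule 'direct', 'indirect', or 'none'."""
    leftmost = leftmost_rules(grammar)
    result = {}
    for rule in grammar:
        direct = rule in leftmost[rule]
        # Look for a path back to `rule` that passes through another rule.
        indirect = False
        seen = set()
        stack = [r for r in leftmost[rule] if r != rule]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if rule in leftmost[current]:
                indirect = True
                break
            stack.extend(r for r in leftmost[current] if r != rule)
        result[rule] = "indirect" if indirect else "direct" if direct else "none"
    return result


# First grammar from the question: expr reaches itself through `and` / `or`.
rejected = {
    "expr": [["NAME"], ["and"], ["or"]],
    "and": [["expr", "AND", "expr"]],
    "or": [["expr", "OR", "expr"]],
}
# Second grammar: every left-recursive alternative sits inside expr itself.
accepted = {
    "expr": [["NAME"], ["expr", "AND", "expr"], ["expr", "OR", "expr"]],
}

print(classify_left_recursion(rejected))
print(classify_left_recursion(accepted))
```

In the first grammar every rule comes out as indirect, which matches the "mutually left-recursive" error. In the second, expr is only directly left-recursive, which is the form the answer says ANTLR4 supports.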

Antlr4: Mismatched input

Here's a simple grammar test I thought would be easy to parse, but I get 'mismatched input' right off the bat and I can't figure out what Antlr is looking for. The input: # include "something" program TEST1 { BLAH BLAH } My grammar: grammar ProgHeader; program: header* prog EOF ; header: '#' ( include | define ) ; include: 'include' string ; define: 'define' string string? ; string: '"' QTEXT '"' ; prog: 'program' QTEXT '{' BLOCK '}' ; QTEXT: ~[\r\n\"]+ ; BLOCK: ~[}]+ ; // don't care, example block WS: [ \t\r\n] -> skip ; The output error message: line 1:0 mismatched input '# include

can an element contain attribute as parsed by parser generated by ANTLR? if so, how?

I am following this tutorial and successfully replicated its behavior, except that I am using ANTLR 4.7 instead of the 4.5 that the tutorial was using. I am trying to build a DSL for an expense tracker, and I was wondering if each element can have attributes. E.g. this is what it looks like now. This is the code for todo.g4, as seen in https://github.com/simkimsia/learn-antlr-web-js/blob/master/todo.g4 grammar todo; elements : (element|emptyLine)* EOF ; element : '*' ( ' ' | '\t' )* CONTENT NL+ ; emptyLine : NL ; NL : '\r' | '\n' ; CONTENT : [a-zA-Z0-9_][a-zA-Z0-9_ \t]* ; Meaning to say the element

HTML/Markdown style grammar for ANTLR4

I want to define an HTML/Markdown-like grammar for a document that gets transformed to an AST. I'm aware that ANTLR4 is not the best tool for doing Markdown things, but I'm way closer to the HTML direction. At least I think I am. :) Here's my lexer definition: lexer grammar dnpMDLexer; NL : [\r\n] ; HEAD_TAG : '#' ; HEADING_TEXT : ('\\#'|~[#`\r\n])+ ; ITALIC_TAG : '*' ; ITALIC_TEXT : ('\\*'|~[#`*\r\n]).+? ; LISTING_TAG : '`' ; RUNNING_TEXT : ('\\#'|'\\`'|'\\*'|~[#*`])+ ; And here's my parser definition: parser grammar dnpMDParser; options { tokenVocab=dnpMDLexer; } dnpMD : subheadline headline

Why does not ANTLR4 match “of” as a word and “,” as punctuation?

I have a Hello.g4 grammar file with a grammar definition: definition : wordsWithPunctuation ; words : (WORD)+ ; wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ; NUMBER : [0-9]+ ; word : WORD ; WORD : [A-Za-z-]+ ; punctuation : PUNCTUATION ; PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ; WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines Now, if I am trying to build a parse tree from the following input: a b c d of at of abc bcd of a b c d at abc, bcd a b c d of at of abc, bcd of it returns errors:

Token with different interpretations (i.e. keyword and identifier)

Question: I am writing a grammar with a lot of case-insensitive keywords in ANTLR4. I collected some example files for the format that I am trying to test-parse, and some of them use tokens that exist as keywords as identifiers in other places. For example, there is a CORE keyword, which in other places is used as an ID for a structure from user input. Here are some parts of my grammar: fragment A : [aA]; // match either an 'a' or 'A' fragment B : [bB]; fragment C : [cC]; [...] CORE: C O R E ; [...] IDSTRING: [a