antlr4

How to make antlr4 fully tokenize terminal nodes

北城以北 submitted on 2019-12-02 19:59:19
Question: I'm trying to use ANTLR to make a very simple parser that basically tokenizes a series of "."-delimited identifiers. I've made a simple grammar: r : STRUCTURE_SELECTOR ; STRUCTURE_SELECTOR: '.' (ID STRUCTURE_SELECTOR?)? ; ID : [_a-z0-9$]* ; WS : [ \t\r\n]+ -> skip ; When the parser is generated, I end up with a single terminal node that represents the whole string instead of being able to find further STRUCTURE_SELECTORs. I'd like instead to see a sequence (perhaps represented as children of the
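One possible direction, not taken from the original thread: a lexer rule always produces exactly one token, so a recursive lexer rule such as STRUCTURE_SELECTOR matches the whole chain as a single terminal. Moving the repetition into parser rules gives one subtree per selector. The sketch below is an assumption-laden rewrite: the grammar name Selector and the rule name selector are made up for illustration, a bare trailing '.' is no longer accepted, and ID uses + so that it cannot match the empty string.

```antlr
grammar Selector;   // grammar name is an assumption

// Parser rules build the tree: each '.'-prefixed identifier
// becomes its own "selector" child of r.
r        : selector+ EOF ;
selector : '.' ID ;

// Lexer rules only describe the smallest units.
ID : [_a-z0-9$]+ ;            // '+' instead of '*': never matches an empty string
WS : [ \t\r\n]+ -> skip ;
```

With this shape, an input such as .a.b.c should yield three selector nodes under r rather than one terminal node.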

HTML/Markdown style grammar for ANTLR4

痴心易碎 submitted on 2019-12-02 17:26:29
Question: I want to define an HTML/Markdown-like grammar for a document that gets transformed to an AST. I'm aware that ANTLR4 is not the best tool for doing Markdown things, but I'm way closer to the HTML direction. At least I think I am. :) Here's my lexer definition: lexer grammar dnpMDLexer; NL : [\r\n] ; HEAD_TAG : '#' ; HEADING_TEXT : ('\\#'|~[#`\r\n])+ ; ITALIC_TAG : '*' ; ITALIC_TEXT : ('\\*'|~[#`*\r\n]).+? ; LISTING_TAG : '`' ; RUNNING_TEXT : ('\\#'|'\\`'|'\\*'|~[#*`])+ ; And here's my parser

context.getText() excludes spaces in ANTLR4

一曲冷凌霜 submitted on 2019-12-02 16:11:22
Question: getText() returns the complete statement without the spaces between the words. One way to keep the spaces is to include them in the grammar. But is there any other way to get the complete string, spaces included? Answer 1: Yes, there is (assuming here you are using ParserRuleContext.getText()). The idea is to ask the input char stream for a range of characters. The position values are stored in the start and stop tokens of the context. Here's some code (converted from C++, so

mismatched Input when lexing and parsing with modes

北慕城南 submitted on 2019-12-02 15:35:55
Question: I'm having an ANTLR4 problem with mismatched input but can't solve it. I've found a lot of questions dealing with that, and they usually revolve around the lexer matching something else to the token, but I don't see it in my case. I've got this lexer grammar: FieldStart : '[' Definition ']' -> pushMode(INFIELD) ; Definition : 'Element'; mode INFIELD; FieldEnd : '[end]' -> popMode ; ContentValue : ~[[]* ; Which then runs on the following parser: field : FieldStart ContentValue FieldEnd #Field

Antlr4: The following sets of rules are mutually left-recursive

北慕城南 submitted on 2019-12-02 10:25:35
Question: I am trying to describe a simple grammar with AND and OR, but it fails with the following error: The following sets of rules are mutually left-recursive. The grammar is as follows: expr: NAME | and | or; and: expr AND expr; or: expr OR expr; NAME : 'A' .. 'B' + ; OR: 'OR' | '|'; AND: 'AND' | '&'; At the same time, the following grammar expr: NAME | expr AND expr | expr OR expr; NAME : 'A' .. 'B' + ; OR: 'OR' | '|'; AND: 'AND' | '&'; does compile. Why? Answer 1: ANTLR4 supports only direct left recursion
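To illustrate the distinction: in the first grammar, expr reaches itself only through and and or, which is indirect left recursion; in the second, every recursive alternative sits inside expr itself. A minimal sketch of the direct form, with labeled alternatives standing in for the separate and/or rules, might look like the following. The grammar name Logic, the labels, and the WS rule are assumptions added for illustration; they are not part of the original question.

```antlr
grammar Logic;   // grammar name is an assumption

// All recursion is direct: every recursive alternative lives in expr itself.
// Alternatives listed earlier bind tighter, so AND takes precedence over OR.
expr
    : expr AND expr   # AndExpr    // labels replace the separate and/or rules
    | expr OR expr    # OrExpr
    | NAME            # NameExpr
    ;

NAME : [A-B]+ ;
AND  : 'AND' | '&' ;
OR   : 'OR'  | '|' ;
WS   : [ \t\r\n]+ -> skip ;   // assumed; the original grammar has no whitespace rule
```

The labels give each alternative its own context class in the generated parser, which keeps the tree shape that the separate and/or rules were presumably meant to provide.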

Antlr4: Mismatched input

折月煮酒 submitted on 2019-12-02 08:52:55
Here's a simple grammar test I thought would be easy to parse, but I get 'mismatched input' right off the bat and I can't figure out what Antlr is looking for. The input: # include "something" program TEST1 { BLAH BLAH } My grammar: grammar ProgHeader; program: header* prog EOF ; header: '#' ( include | define ) ; include: 'include' string ; define: 'define' string string? ; string: '"' QTEXT '"' ; prog: 'program' QTEXT '{' BLOCK '}' ; QTEXT: ~[\r\n\"]+ ; BLOCK: ~[}]+ ; // don't care, example block WS: [ \t\r\n] -> skip ; The output error message: line 1:0 mismatched input '# include
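A likely cause, offered as an assumption since no answer is shown here: the lexer runs independently of the parser and prefers the longest match, so QTEXT (anything except a line break or a double quote) swallows "# include " as a single token before '#' or 'include' can be recognized. That would agree with the token text quoted in the error message. A sketch of a narrower token set follows; the STRING and ID tokens and the block rule are hypothetical replacements for QTEXT and BLOCK, not the author's definitions.

```antlr
grammar ProgHeader;

program : header* prog EOF ;
header  : '#' ( include | define ) ;
include : 'include' STRING ;
define  : 'define' STRING STRING? ;
prog    : 'program' ID block ;
block   : '{' ID* '}' ;               // placeholder body, as in the original example

// The whole quoted string is one token, so no catch-all text rule is needed.
STRING : '"' ~["\r\n]* '"' ;
ID     : [A-Za-z_] [A-Za-z_0-9]* ;    // assumed identifier shape
WS     : [ \t\r\n]+ -> skip ;
```

Because no lexer rule can now match across '#', spaces, or keywords, each piece of the sample input should arrive at the parser as its own token.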

can an element contain attribute as parsed by parser generated by ANTLR? if so, how?

。_饼干妹妹 submitted on 2019-12-02 08:46:55
I am following this tutorial and have successfully replicated its behavior, except that I am using ANTLR 4.7 instead of the 4.5 that the tutorial used. I am trying to build a DSL for an expense tracker and was wondering whether each element can have attributes. E.g. this is what it looks like now. This is the code for todo.g4 as seen in https://github.com/simkimsia/learn-antlr-web-js/blob/master/todo.g4 grammar todo; elements : (element|emptyLine)* EOF ; element : '*' ( ' ' | '\t' )* CONTENT NL+ ; emptyLine : NL ; NL : '\r' | '\n' ; CONTENT : [a-zA-Z0-9_][a-zA-Z0-9_ \t]* ; Meaning to say the element

HTML/Markdown style grammar for ANTLR4

我们两清 submitted on 2019-12-02 08:40:30
I want to define an HTML/Markdown-like grammar for a document that gets transformed to an AST. I'm aware that ANTLR4 is not the best tool for doing Markdown things, but I'm way closer to the HTML direction. At least I think I am. :) Here's my lexer definition: lexer grammar dnpMDLexer; NL : [\r\n] ; HEAD_TAG : '#' ; HEADING_TEXT : ('\\#'|~[#`\r\n])+ ; ITALIC_TAG : '*' ; ITALIC_TEXT : ('\\*'|~[#`*\r\n]).+? ; LISTING_TAG : '`' ; RUNNING_TEXT : ('\\#'|'\\`'|'\\*'|~[#*`])+ ; And here's my parser definition: parser grammar dnpMDParser; options { tokenVocab=dnpMDLexer; } dnpMD : subheadline headline

Why does not ANTLR4 match “of” as a word and “,” as punctuation?

心已入冬 submitted on 2019-12-02 08:30:43
I have a Hello.g4 grammar file with a grammar definition: definition : wordsWithPunctuation ; words : (WORD)+ ; wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ; NUMBER : [0-9]+ ; word : WORD ; WORD : [A-Za-z-]+ ; punctuation : PUNCTUATION ; PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ; WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines Now, if I try to build a parse tree from the following input: a b c d of at of abc bcd of a b c d at abc, bcd a b c d of at of abc, bcd of it returns errors:

Token with different interpretations (i.e. keyword and identifier)

十年热恋 submitted on 2019-12-02 08:07:09
Question: I am writing a grammar with a lot of case-insensitive keywords in ANTLR4. I collected some example files for the format to use as parsing tests, and some of them use words that are keywords in one place as identifiers in another. For example, there is a CORE keyword, which in other places is used as an ID for a structure from user input. Here are some parts of my grammar: fragment A : [aA]; // match either an 'a' or 'A' fragment B : [bB]; fragment C : [cC]; [...] CORE: C O R E ; [...] IDSTRING: [a
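A common way to handle this, sketched here under assumptions rather than taken from the thread: the lexer always emits CORE for that text (it is defined before IDSTRING, so it wins a tie), and a parser rule then accepts the keyword token wherever an identifier is allowed. The grammar name Keywords, the rules coreDecl and identifier, and the IDSTRING definition are all hypothetical; the original IDSTRING rule is cut off, so the one below is only a stand-in.

```antlr
grammar Keywords;   // hypothetical grammar name

// Hypothetical rule: the CORE keyword followed by a user-chosen name.
coreDecl   : CORE identifier EOF ;

// Accept keyword tokens wherever an identifier is allowed.
// Every keyword that may double as a name has to be listed here.
identifier : IDSTRING | CORE ;

CORE     : C O R E ;
IDSTRING : [a-zA-Z_] [a-zA-Z0-9_]* ;   // assumed stand-in; the original rule is truncated

WS : [ \t\r\n]+ -> skip ;

fragment C : [cC] ;
fragment O : [oO] ;
fragment R : [rR] ;
fragment E : [eE] ;
```

With these rules, an input such as core Core should parse as the keyword followed by an identifier whose underlying token happens to be CORE.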