grammar

ANTLR3: Cannot match a token in parser rules when it is also used in a lexer rule

老子叫甜甜 posted on 2019-12-02 13:44:43
I have these lexer rules in ANTLR3:

    HYPHEN : '-';
    TOKEN : HYPHEN CHARS;
    CHARS : 'a'..'z';

The parser rules are:

    exp : CHARS | some complex expression;
    parser_rule : exp HYPHEN exp;

If I try to match 'abc-abc' with parser_rule, it fails, because the lexer creates a TOKEN for HYPHEN exp. How can I match it correctly with parser_rule?

In an ANTLR lexer, the lexer rule that can match the longest stretch of input is used. So your input abc-abc will be tokenized as CHARS("abc") TOKEN("-abc") and therefore will not match the expected CHARS HYPHEN CHARS. You should consider making TOKEN a parser rule instead
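The longest-match behaviour described in the answer can be illustrated with a small Python sketch. This is not ANTLR's implementation: the regex table is a hypothetical stand-in for the three lexer rules, and CHARS is assumed to match one or more letters, as the answer's CHARS("abc") implies.

```python
import re

# Hypothetical stand-ins for the lexer rules in the question.
# Assumption: CHARS matches one or more letters, as CHARS("abc") implies.
RULES = [
    ("HYPHEN", re.compile(r"-")),
    ("TOKEN", re.compile(r"-[a-z]+")),
    ("CHARS", re.compile(r"[a-z]+")),
]


def tokenize(text, rules=RULES):
    """Split text into (rule_name, lexeme) pairs using longest-match."""
    pos = 0
    tokens = []
    while pos < len(text):
        best = None  # (rule name, end offset) of the longest match so far
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # A strictly longer match wins; ties go to the rule listed first.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


# With TOKEN in the lexer, "-abc" is swallowed as one token.
print(tokenize("abc-abc"))

# Dropping TOKEN from the lexer (the effect of making it a parser rule)
# lets the hyphen surface as its own token.
lexer_without_token = [rule for rule in RULES if rule[0] != "TOKEN"]
print(tokenize("abc-abc", lexer_without_token))
```

The first call yields CHARS, TOKEN; the second yields CHARS, HYPHEN, CHARS, which is the sequence parser_rule expects.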

Can all ambiguous grammars be converted to unambiguous grammars?

谁说胖子不能爱 posted on 2019-12-02 10:23:51
Question: There are grammars we convert to unambiguous by using left recursion. Are there grammars that cannot be converted to unambiguous grammars?

Answer 1: There are unambiguous context-free grammars for most practical languages (ignoring context-sensitive features such as variable declarations, whitespace sensitivity, etc.). But there is no algorithm which can find an unambiguous grammar given an ambiguous grammar. Furthermore, there is not even an algorithm which can tell you for certain whether a

Make a calculator grammar that builds a binary tree with JavaCC

半腔热情 posted on 2019-12-02 10:23:05
I need to write a parser for a simple calculator (with infix operators) that handles the operators +, -, *, / as well as floats and variables. I used JavaCC for this and wrote the grammar with JJTree. It works, but it doesn't ensure that the final tree is a binary tree, which I need. I want something like 5*3+x-y to generate the following tree:

      *
     / \
    5   +
       / \
      3   -
         / \
        x   y

What would be a proper grammar for that, one that is not left-recursive?

Something like the following will give you the tree you asked for.

    void sum(): {} { term() [ plus() sum() | minus() sum() | times() sum() | divide() sum(
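The right-recursive shape of the quoted sum() production (a term, optionally followed by an operator and another sum) can be sketched outside JavaCC. The Python below is only an illustration under assumptions: the tokenizer, the tuple representation of nodes, and the function names are hypothetical and not part of the original JJTree grammar.

```python
import re

OPERATORS = {"+", "-", "*", "/"}


def tokenize(src):
    # Numbers (optionally with a fraction), identifiers, and the four operators.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/]", src)


def parse_sum(tokens, pos=0):
    """sum := term [ operator sum ]  (right-recursive, no left recursion)."""
    if pos >= len(tokens) or tokens[pos] in OPERATORS:
        raise SyntaxError(f"expected a number or variable at token {pos}")
    left = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] in OPERATORS:
        op = tokens[pos]
        right, pos = parse_sum(tokens, pos + 1)
        # Every operator node has exactly two children, so the tree is binary.
        return (op, left, right), pos
    return left, pos


def parse(src):
    tokens = tokenize(src)
    tree, pos = parse_sum(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("5*3+x-y"))
```

For 5*3+x-y this produces the nesting requested in the question: * at the root, then +, then -. Note that this shape gives all four operators the same precedence and makes them right-associative, so it matches the requested tree rather than conventional arithmetic precedence.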

Proving a grammar is not LL(1) without the classical methods, and transforming it to LL(1)

烈酒焚心 posted on 2019-12-02 10:22:49
Let's say I have this grammar:

    S -> A C x | u B A
    A -> z A y | S u | ε
    B -> C x | y B u
    C -> B w B | w A

This grammar is obviously not LL(1), which I can find by constructing the parsing table. But is there any way I can prove that this grammar is not LL(1) without using the classical methods, i.e. without constructing the parsing table or finding any conflicts? Also, how can I convert this grammar to LL(1)? I think I have to use both epsilon-derivation elimination and left-recursion elimination, but it's a bit tricky, and however many times I've tried, I couldn't transform it to LL(1). Thank you in advance
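One check that avoids the parsing table is left recursion: a grammar without useless symbols that has a left-recursive nonterminal cannot be LL(k) for any k. Here S -> A C x and A -> S u already give S => A C x => S u C x. The Python sketch below automates that check for the grammar in the question; the dictionary encoding and function names are assumptions made for illustration.

```python
# Grammar from the question; () encodes the empty production (epsilon).
GRAMMAR = {
    "S": [("A", "C", "x"), ("u", "B", "A")],
    "A": [("z", "A", "y"), ("S", "u"), ()],
    "B": [("C", "x"), ("y", "B", "u")],
    "C": [("B", "w", "B"), ("w", "A")],
}


def nullable_set(grammar):
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def left_corners(grammar, nullable):
    """Nonterminals that can appear leftmost in one step, skipping nullables."""
    corners = {head: set() for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            for sym in body:
                if sym in grammar:
                    corners[head].add(sym)
                if sym not in nullable:
                    break  # a terminal or non-nullable symbol ends the prefix
    return corners


def left_recursive(grammar):
    """Nonterminals that can reach themselves through leftmost positions."""
    corners = left_corners(grammar, nullable_set(grammar))
    result = set()
    for start in grammar:
        seen, stack = set(), list(corners[start])
        while stack:
            nt = stack.pop()
            if nt not in seen:
                seen.add(nt)
                stack.extend(corners[nt])
        if start in seen:
            result.add(start)
    return result


print(sorted(left_recursive(GRAMMAR)))
```

For this grammar the check reports all four nonterminals: S and A through the cycle S -> A -> S, and B and C through B -> C -> B. Any transformation to LL(1) therefore has to start by eliminating these indirect left recursions.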

Why does ANTLR4 not match “of” as a word and “,” as punctuation?

心已入冬 posted on 2019-12-02 08:30:43
I have a Hello.g4 grammar file with a grammar definition:

    definition : wordsWithPunctuation ;
    words : (WORD)+ ;
    wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ;
    NUMBER : [0-9]+ ;
    word : WORD ;
    WORD : [A-Za-z-]+ ;
    punctuation : PUNCTUATION ;
    PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
    WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

Now, if I try to build a parse tree from the following input:

    a b c d of at of abc bcd of a b c d at abc, bcd a b c d of at of abc, bcd of

it returns errors:

How to implement JavaScript automatic semicolon insertion in JavaCC?

佐手、 posted on 2019-12-02 08:28:25
I am finishing my ECMAScript 5.1/JavaScript grammar for JavaCC. I've done all the tokens and productions according to the specification. Now I'm facing a big question which I don't know how to solve. JavaScript has this nice feature of automatic semicolon insertion: What are the rules for JavaScript's automatic semicolon insertion (ASI)?

To quote the specification, the rules are: There are three basic rules of semicolon insertion: When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar,

ANTLRWorks debugging - the meaning of the different colors?

前提是你 posted on 2019-12-02 07:11:52
Question: I'm using the debugging mode of ANTLRWorks to test my C grammar. Debugging in ANTLRWorks is really helpful for understanding, but I have trouble understanding the different colors in the output tree. I'm using backtrack=true in my grammar. I thought that red means the debugger went the wrong way, while green tells me that it went the right way. But what about dark red and dark green? I added a picture of a "small tree" which only matches the following input: int

Can all ambiguous grammars be converted to unambiguous grammars?

自古美人都是妖i posted on 2019-12-02 05:37:29
There are grammars we convert to unambiguous by using left recursion. Are there grammars that cannot be converted to unambiguous grammars?

There are unambiguous context-free grammars for most practical languages (ignoring context-sensitive features such as variable declarations, whitespace sensitivity, etc.). But there is no algorithm which can find an unambiguous grammar given an ambiguous grammar. Furthermore, there is not even an algorithm which can tell you for certain whether a given grammar is ambiguous. These are both undecidable problems. And, to answer your question, yes there are

Closure properties of context free languages

混江龙づ霸主 posted on 2019-12-02 04:50:22
I have the following problem: the languages L1 = {a^n b^n : n >= 0} and L2 = {b^n a^n : n >= 0} are context-free, and context-free languages are closed under the concatenation L1L2, so L = {a^n b^2n a^n : n >= 0} must be context-free too, because it is generated by a closure property. I have to prove whether this is true or not. I checked the language L and I don't think it is context-free; I also noticed that L2 is just L1 reversed. Do I have to check whether L1 and L2 are deterministic?

L1 = {a^n b^n : n >= 0} and L2 = {b^n a^n : n >= 0} are both context-free. Since context-free languages are closed under concatenation, L3 = L1L2 is also
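A point worth checking concretely is that the concatenation L1L2 pairs every string of L1 with every string of L2, so the two exponents vary independently: L1L2 = {a^n b^(n+m) a^m : n, m >= 0}, which is strictly larger than L = {a^n b^2n a^n : n >= 0}. The brute-force membership tests below are a hypothetical Python sketch of that difference, not part of the original answer.

```python
def in_l1(s):
    """Membership in L1 = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def in_l2(s):
    """Membership in L2 = {b^n a^n : n >= 0}."""
    n = len(s) // 2
    return s == "b" * n + "a" * n


def in_concat(s):
    """Membership in L1L2: some split point puts a prefix in L1 and the rest in L2."""
    return any(in_l1(s[:i]) and in_l2(s[i:]) for i in range(len(s) + 1))


def in_l(s):
    """Membership in L = {a^n b^2n a^n : n >= 0}."""
    n = len(s) // 4
    return s == "a" * n + "b" * (2 * n) + "a" * n


for word in ["ab", "abba", "aabbba"]:
    print(word, in_concat(word), in_l(word))
```

The strings ab (n=1, m=0) and aabbba (n=2, m=1) are in L1L2 but not in L, while abba is in both. So closure under concatenation says something about L1L2, not about L, and whether L is context-free has to be settled separately.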

How to match parentheses / brackets in pyparsing

℡╲_俬逩灬. posted on 2019-12-01 20:38:43
I have a grammar token specified as:

    list_value = Suppress(oneOf("[ (")) + Group( delimitedList(string_value | int_value))("list") + Suppress(oneOf("] )"))

However, this obviously allows (foo, bar]. How do I enforce that the list's opening and closing characters must match?

You make a list a choice between two rules: one for parentheses and one for square brackets.

Thanks for bringing up pyparsing. I like it. My answer to your question is:

    delim_value = Group(delimitedList(string_value | int_value))("list")
    list_value = Or( (Suppress("[") + delim_value + Suppress("]"), Suppress("(") + delim
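The idea in the first answer, a choice between one rule per bracket type, does not depend on pyparsing. The sketch below shows the same alternation using only the standard-library re module instead of pyparsing; the pattern, the group names and the parse_list helper are hypothetical, and items are simply split on commas rather than parsed as string_value or int_value.

```python
import re

# One alternative per bracket type, so an opener can only pair with its own closer.
LIST_RE = re.compile(
    r"\[(?P<square>[^\[\]()]*)\]"   # [ ... ]
    r"|"
    r"\((?P<round>[^\[\]()]*)\)"    # ( ... )
)


def parse_list(text):
    """Return the comma-separated items of a bracketed list, or raise ValueError."""
    m = LIST_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"mismatched or malformed list: {text!r}")
    body = m.group("square") if m.group("square") is not None else m.group("round")
    return [item.strip() for item in body.split(",") if item.strip()]


print(parse_list("(foo, bar)"))
print(parse_list("[1, 2, 3]"))

try:
    parse_list("(foo, bar]")
except ValueError as exc:
    print("rejected:", exc)
```

Because each alternative carries both its opening and its closing delimiter, a mixed pair such as (foo, bar] matches neither branch and is rejected.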