antlr | 易学教程

Why my antlr lexer java class is “code too large”?

Question: This is the lexer in Antlr (sorry for the long file):

    lexer grammar SqlServerDialectLexer;

    /* T-SQL words */
    AND: 'AND';
    BIGINT: 'BIGINT';
    BIT: 'BIT';
    CASE: 'CASE';
    CHAR: 'CHAR';
    COUNT: 'COUNT';
    CREATE: 'CREATE';
    CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP';
    DATETIME: 'DATETIME';
    DECLARE: 'DECLARE';
    ELSE: 'ELSE';
    END: 'END';
    FLOAT: 'FLOAT';
    FROM: 'FROM';
    GO: 'GO';
    IMAGE: 'IMAGE';
    INNER: 'INNER';
    INSERT: 'INSERT';
    INT: 'INT';
    INTO: 'INTO';
    IS: 'IS';
    JOIN: 'JOIN';
    NOT: 'NOT';
    NULL: 'NULL';
    NUMERIC:

Extending simple ANTLR grammar to support input variables

I'm still on my quest for a really simple language, and I now know that there are none. So I'm writing one myself using ANTLR3. I found a really great example in this answer:

Exp.g:

    grammar Exp;

    eval returns [double value]
        : exp=additionExp {$value = $exp.value;}
        ;

    additionExp returns [double value]
        : m1=multiplyExp {$value = $m1.value;}
          ( '+' m2=multiplyExp {$value += $m2.value;}
          | '-' m2=multiplyExp {$value -= $m2.value;}
          )*
        ;

    multiplyExp returns [double value]
        : a1=atomExp {$value = $a1.value;}
          ( '*' a2=atomExp {$value *= $a2.value;}
          | '/' a2=atomExp {$value /= $a2.value;}
          )*
        ;

    atomExp
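To make the quoted grammar's embedded actions easier to follow, here is a minimal plain-Java sketch of the same structure: one method per rule, with the `( ... )*` loops accumulating into `value`. It is not ANTLR-generated code. The class name `ExpSketch` is hypothetical, and because the definition of `atomExp` is cut off in the excerpt, the sketch assumes an atom is just an unsigned decimal number.

```java
/** Hand-written reading of the Exp grammar; not ANTLR output. */
final class ExpSketch {
    private final String src;
    private int pos = 0;

    ExpSketch(String src) {
        this.src = src.replace(" ", ""); // assumption: spaces are insignificant
    }

    /** Mirrors the 'eval' rule: evaluate one additionExp over the whole input. */
    static double eval(String src) {
        ExpSketch p = new ExpSketch(src);
        double value = p.additionExp();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return value;
    }

    /** additionExp : multiplyExp ( '+' multiplyExp | '-' multiplyExp )* */
    double additionExp() {
        double value = multiplyExp();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            double m2 = multiplyExp();
            if (op == '+') value += m2; else value -= m2;
        }
        return value;
    }

    /** multiplyExp : atomExp ( '*' atomExp | '/' atomExp )* */
    double multiplyExp() {
        double value = atomExp();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            double a2 = atomExp();
            if (op == '*') value *= a2; else value /= a2;
        }
        return value;
    }

    /** Assumed atom: digits with an optional decimal point (the excerpt elides the real rule). */
    double atomExp() {
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Number expected at " + pos);
        }
        return Double.parseDouble(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3")); // prints 7.0
    }
}
```

Because `additionExp` calls `multiplyExp` for each operand, multiplication and division bind tighter than addition and subtraction, which is the same precedence the grammar's rule nesting encodes.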

How to match any symbol in ANTLR parser (not lexer)?

Question: How to match any symbol in ANTLR parser (not lexer)? Where is the complete language description for ANTLR4 parsers? UPDATE: Is the answer "impossible"?

Answer 1: You first need to understand the roles of each part in parsing:

The lexer: this is the object that tokenizes your input string. Tokenizing means converting a stream of input characters into abstract token symbols (usually just numbers).

The parser: this is the object that works only with tokens to determine the structure of a language.
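The split between the two roles can be illustrated with a small hand-written sketch. This is plain Java, not ANTLR-generated code; the class name `LexerParserSketch` and the token types `NUMBER` and `PLUS` are hypothetical. The point it tries to show is that the parser method never sees characters, only token type numbers.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the lexer/parser split; not ANTLR output. */
final class LexerParserSketch {
    // Hypothetical token types: each token is "usually just a number".
    static final int NUMBER = 1;
    static final int PLUS = 2;

    /** Lexer role: convert a stream of characters into a list of token types. */
    static List<Integer> tokenize(String input) {
        List<Integer> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                // Consume the whole run of digits as a single NUMBER token.
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(NUMBER);
            } else if (c == '+') {
                tokens.add(PLUS);
                i++;
            } else if (Character.isWhitespace(c)) {
                i++; // whitespace produces no token
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    /** Parser role: check the structure NUMBER (PLUS NUMBER)* using token types only. */
    static boolean parseSum(List<Integer> tokens) {
        if (tokens.isEmpty() || tokens.get(0) != NUMBER) {
            return false;
        }
        int pos = 1;
        while (pos < tokens.size()) {
            if (tokens.get(pos) != PLUS) return false;
            if (pos + 1 >= tokens.size() || tokens.get(pos + 1) != NUMBER) return false;
            pos += 2;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> tokens = tokenize("12 + 345");
        System.out.println(tokens);           // prints [1, 2, 1]
        System.out.println(parseSum(tokens)); // prints true
    }
}
```

In this sketch, any character the lexer does not recognize is rejected before the parser runs, so a parser-level "match any symbol" only makes sense in terms of tokens, not raw characters.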

Antlr rule priorities

Question: Firstly, I know this grammar doesn't make sense, but it was created to test out the ANTLR rule priority behaviour:

    grammar test;

    options {
      output=AST;
      backtrack=true;
      memoize=true;
    }

    rule_list_in_order : ( first_rule | second_rule | any_left_over_tokens)+ ;
    first_rule : FIRST_TOKEN ;
    second_rule: FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE;
    any_left_over_tokens : NEW_LINE | FIRST_TOKEN | SECOND_TOKEN;

    FIRST_TOKEN : 'First token here' ;
    SECOND_TOKEN : 'Second token here';
    NEW_LINE : ('\r'?'\n') ;

What's better, ANTLR or JavaCC? [closed]

Concerns are documentation/learnability, Eclipse integration, tooling, community support and performance (in roughly that order).

Wilfred Springer: There are a couple of alternatives you shouldn't rule out: JParsec is a parser combinator framework that allows you to construct your parser entirely from code. Scala's parser combinator framework addresses a similar concern; however, Scala's syntax makes all of this much more readable. Then there's also the parser combinator framework done by John Metsker, for his book Building Parsers With Java; I don't remember exactly where the library is, but

Advantages of Antlr (versus say, lex/yacc/bison) [closed]

I've used lex and yacc (more usually bison) in the past for various projects, usually translators (such as a subset of EDIF streamed into an EDA app). Additionally, I've had to support code based on lex/yacc grammars dating back decades. So I know my way around the tools, though I'm no expert. I've seen positive comments about Antlr in various fora in the past, and I'm curious as to what I may be missing. So if you've used both, please tell me what's better or more advanced in Antlr. My current constraints are that I work in a C++ shop, and any product we ship will not include Java, so the

build AST in antlr4

Question: I was wondering whether we could build an AST using Antlr version 4. I couldn't find any reference on building one using antlr4. One SO answer says that it would be easy to use antlr4, which produces only a parse tree, but my question is: what about efficiency? It forces us to crawl the whole parse tree instead of an abstract syntax tree, which is not an efficient way to walk the tree and perform tasks using visitors.

Answer 1: There are two key items I'd like to point out first: Efficiency

antlr3 - Generating a Parse Tree

I'm having trouble figuring out the antlr3 API so I can generate and use a parse tree in some JavaScript code. When I open the grammar file using ANTLRWorks (their IDE), the interpreter is able to show me the parse tree, and it's even correct. I'm having a lot of difficulty tracking down resources on how to get this parse tree in my code using the antlr3 runtime. I've been messing around with the various functions in the runtime and Parser files, but to no avail:

    var input = "(PR=5000)",
        cstream = new org.antlr.runtime.ANTLRStringStream(input),
        lexer = new TLexer(cstream),
        tstream = new org

How do I get an Antlr Parser rule to read from both default AND hidden channel

I use the normal whitespace separation into the hidden channel, but I have one rule where I would like to include any whitespace for later processing, and every example I have found requires some very strange manual coding. Is there no easy option to read from multiple channels, like the option to put the whitespace there from the beginning? For example, this is the whitespace lexer rule:

    WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;

And this is my rule where I would like to include whitespace:

    raw : '{'? (~('{'))*;

Basically it's a catch-all rule to capture any content that does not match other

Can I add Antlr tokens at runtime?

I have a situation where my language contains some words that aren't known at build time but will be known at run time, causing the need to constantly rebuild and redeploy the program to take new words into account. I was wondering if it is possible in Antlr to generate some of the tokens from a config file. E.g., in a simplified example, if I have the rule

    rule : WORDS+;
    WORDS : 'abc';

and my language comes across 'bcd' at run time, I would like to be able to modify a config file to define bcd as a word, rather than having to rebuild and then redeploy.

You could add some sort of collection to your lexer
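The "collection" idea can be sketched in plain Java as a set of words that is filled at run time and consulted when deciding a token's type. Everything here is an assumption made for illustration: the class `RuntimeWords`, its methods, and the token types `WORDS` and `OTHER` are hypothetical names, not ANTLR API. In a real grammar the lookup would presumably sit in a lexer action or predicate behind a generic word rule; that wiring is not shown.

```java
import java.util.HashSet;
import java.util.Set;

/** Plain-Java sketch of a run-time word list; hypothetical names, not ANTLR API. */
final class RuntimeWords {
    static final int WORDS = 1; // token type for a configured word
    static final int OTHER = 2; // token type for anything else

    private final Set<String> words = new HashSet<>();

    /** Would be called once per entry read from the config file. */
    void addWord(String word) {
        words.add(word.trim());
    }

    /** Decide the token type of an already-matched piece of text. */
    int classify(String text) {
        return words.contains(text) ? WORDS : OTHER;
    }

    public static void main(String[] args) {
        RuntimeWords lexerWords = new RuntimeWords();
        lexerWords.addWord("abc");
        System.out.println(lexerWords.classify("bcd") == WORDS); // prints false

        // Simulates editing the config file: no rebuild, just a new entry.
        lexerWords.addWord("bcd");
        System.out.println(lexerWords.classify("bcd") == WORDS); // prints true
    }
}
```

The design choice is that the grammar stays fixed and only data changes, so new words need a config reload rather than regenerating and redeploying the lexer.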