antlr | 易学教程

ANTLR: how to parse a region within matching brackets with a lexer

I want to parse something like this in my lexer: ( begin expression ), where expressions are also surrounded by brackets. It isn't important what is in the expression; I just want everything between the (begin and the matching ) as a single token. An example would be: (begin (define x (+ 1 2))), so the text of the token should be (define x (+ 1 2))). Something like PROGRAM : LPAREN BEGIN .* RPAREN; does (obviously) not work, because as soon as the lexer sees a ")" it considers the rule finished, but I need the matching bracket. How can I do that?

Bart Kiers: Inside lexer rules, you can invoke rules
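The bracket-matching idea in the question above can be sketched outside ANTLR with a simple depth counter. This is a minimal illustration in Python, not the answer's ANTLR grammar; the function name `extract_begin_body` is hypothetical, and the sketch assumes the input starts with `(begin` and contains no string literals or comments that hide parentheses. It returns the text strictly between `(begin` and its matching `)`.

```python
def extract_begin_body(source: str) -> str:
    """Return the text between '(begin' and its matching ')'.

    Hypothetical helper for illustration: it tracks nesting depth
    instead of stopping at the first ')'.
    """
    prefix = "(begin"
    if not source.startswith(prefix):
        raise ValueError("input must start with '(begin'")

    depth = 1  # the '(' of '(begin' is already open
    for index in range(len(prefix), len(source)):
        char = source[index]
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # Found the ')' that closes '(begin'.
                return source[len(prefix):index].strip()
    raise ValueError("unbalanced parentheses")


if __name__ == "__main__":
    print(extract_begin_body("(begin (define x (+ 1 2)))"))
```

A pattern such as `.*` followed by `)` cannot do this because it has no notion of depth; counting (or recursion) is what ties each `)` to its own `(`.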

How to merge two ASTs?

I'm trying to implement a tool for merging different versions of some source code. Given two versions of the same source code, the idea is to parse them, generate the respective Abstract Syntax Trees (ASTs), and finally merge them into a single output source while keeping grammatical consistency - the lexer and parser are those of the question "ANTLR: How to skip multiline comments". I know there is a class ParserRuleReturnScope that helps... but getStop() and getStart() always return null :-( Here is a snippet that illustrates how I modified my parser to get rules printed: parser grammar

ANTLR lexer rule consumes characters even if not matched?

I've got a strange side effect of an ANTLR lexer rule, and I've created an (almost) minimal working example to demonstrate it. In this example I want to match, say, the string [0..1]. But when I debug the grammar, the token stream that reaches the parser only contains [..1]. The first integer, no matter how many digits it contains, is always consumed, and I have no clue how that happens. If I remove the FLOAT rule everything is fine, so I guess the mistake lies somewhere in that rule. But since it shouldn't match anything in [0..1] at all, I'm quite puzzled. I'd be happy for any

Token return values in ANTLR 3 C

I'm new to ANTLR, and I'm attempting to write a simple parser using the C language target (antlr3c). The grammar is simple enough that I'd like to have each rule return a value, e.g.:

number returns [long value]
    : ( INT {$value = $INT.ivalue;}
      | HEX {$value = $HEX.hvalue;}
      )
    ;

HEX returns [long hvalue]
    : '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+ {$hvalue = strtol((char*)$text->chars,NULL,16);}
    ;

INT returns [long ivalue]
    : '0'..'9'+ {$ivalue = strtol((char*)$text->chars,NULL,10);}
    ;

Each rule collects the return value of its child rules until the topmost rule returns a nice struct full of my data. As

ANTLR - NoViableAltException

I'm trying to learn ANTLR by writing a grammar (I'm using Eclipse with the plugins for ANTLR), and it was going all right until I ran into this error:

NoViableAltException: line 0:-1 no viable alternative at input '<EOF>'

It appears when I try to test my args parser rule:

typedident : (INT|CHAR) IDENT;
args : (typedident ( COMMA typedident)*)?;

An ident is a letter followed by any character; this works, I've tested it. typedident also works for the test. I'm using the input int a12q2efwe, char a12eqdsf (totally random), and the tree appears fine in the interpreter; the only problem is that args has four

Antlr lexer tokens that match similar strings, what if the greedy lexer makes a mistake?

It seems that sometimes the ANTLR lexer makes a bad choice about which rule to use when tokenizing a stream of characters... I'm trying to figure out how to help ANTLR make the obvious-to-a-human right choice. I want to parse text like this:

d/dt(x)=a
a=d/dt
d=3
dt=4

This is an unfortunate syntax used by an existing language that I'm trying to write a parser for. The "d/dt(x)" represents the left-hand side of a differential equation. Ignore the lingo if you must; just know that it is not "d" divided by "dt". However, the second occurrence of "d/dt" really is "d" divided by "dt". Here's my

Hive SQL Compilation Process (repost)

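Reposted from: https://www.cnblogs.com/zhzhang/p/5691997.html

Hive is a data warehouse system built on Hadoop and is widely used at many companies. Meituan's data warehouse is also built on Hive: it runs nearly ten thousand Hive ETL workflows every day and handles the storage and analysis of hundreds of GB of data daily. Hive's stability and performance are critical to our data analysis. Over several Hive upgrades we ran into problems large and small. By consulting the community and through our own efforts, we solved these problems and, in the process, gained a fairly deep understanding of how Hive compiles SQL into MapReduce. That understanding not only helped us fix some Hive bugs; it also helps us optimize Hive SQL, gives us better control over Hive, and lets us customize the features we need.

1. How MapReduce implements basic SQL operations

Before explaining in detail how SQL is compiled into MapReduce, let's look at how the MapReduce framework implements basic SQL operations.

1.1 How Join is implemented

select u.name, o.orderid from order o join user u on o.uid = u.uid;

In the map output value, the data from each table is marked with a tag; in the reduce phase, the tag is used to determine which table each record came from. The MapReduce process is as follows (this only illustrates the most basic Join implementation; there are other implementations)

1.2 Group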
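The tag-based join described in section 1.1 can be sketched in plain Python. This is a minimal in-memory illustration, not Hive's actual implementation: the function names (`map_phase`, `reduce_phase`, `join`) and the dictionary record layout are assumptions made for the example, and the shuffle step is simulated by grouping on the join key.

```python
from collections import defaultdict


def map_phase(orders, users):
    """Emit (join key, (tag, value)) pairs; the tag records the source table."""
    for order in orders:
        yield order["uid"], ("order", order["orderid"])
    for user in users:
        yield user["uid"], ("user", user["name"])


def reduce_phase(pairs):
    """Group by key (simulated shuffle), then pair rows using their tags."""
    grouped = defaultdict(list)
    for key, tagged_value in pairs:
        grouped[key].append(tagged_value)

    for key in sorted(grouped):
        names = [value for tag, value in grouped[key] if tag == "user"]
        order_ids = [value for tag, value in grouped[key] if tag == "order"]
        # Inner join: a key produces output only if both tables have rows.
        for name in names:
            for order_id in order_ids:
                yield name, order_id


def join(orders, users):
    """Equivalent of: select u.name, o.orderid from order o join user u on o.uid = u.uid"""
    return list(reduce_phase(map_phase(orders, users)))


if __name__ == "__main__":
    orders = [{"uid": 1, "orderid": "a"}, {"uid": 2, "orderid": "b"}]
    users = [{"uid": 1, "name": "alice"}]
    print(join(orders, users))
```

The tag is what lets a single reducer receive rows from both tables under the same key and still tell them apart.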

ANTLR: Help on Lexing Errors for a custom grammar example

Question

What approach would let me get the most out of reporting lexing errors? As a simple example, I would like to write a grammar for the following text (white space is ignored, and string constants cannot contain a \" for simplicity):

myvariable = 2
myvariable = "hello world"
Group myvariablegroup {
    myvariable = 3
    anothervariable = 4
}

Catching errors with a lexer

How can you maximize the error reporting potential of a lexer? After reading this post: Where should I draw the line between

Is there a valid alternative to ANTLR written in C#? [closed]

ANTLR is a great piece of software but, in my opinion, it is a little uncomfortable for a C# programmer (the C# port is out of date, the antlr-3.1.3.jar tool requires Java, etc.). I'm looking for a "more C# native" language tool to parse a simple JSON-like grammar. Any suggestions?

I've used the GOLD Parser Generator, a freeware tool that you can use to specify BNF grammars, and then generate a

“FOLLOW_set_in_”… is undefined in generated parser

I have written a grammar for a vaguely Java-like DSL. While there are still some issues with it (it doesn't recognize all the inputs the way I want it to), what concerns me most is that the generated C code does not compile. I use ANTLRWorks 1.5 with ANTLR 3.5 (ANTLR 4 apparently does not support the C target). The problem is with the expression rules. I have rules prio14Expression to prio0Expression, which handle operator precedence. The problem is at priority 2, which evaluates prefix and postfix operators:

...
prio3Expression: prio2Expression (('*' | '/' | '%') prio2Expression)*;
prio2Expression: ('+