lexer

Why using antlr for a template engine is not a good idea

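爱⌒轻易说出口 submitted on 2019-12-23 02:19:09
When I released jfinal 3.0, I argued that using antlr for a "template engine" was not a good idea. More than two years have passed, and my view has gone a step further: I now think antlr is not a good idea in most "non-template-engine" scenarios either. What I said about antlr at the jfinal 3.0 release was only a few words, carried too little information, and led some people to misunderstand, so today I will expand on it a little.

1. Parsers generated by antlr are hard to debug and hard to read

First, let's get a direct feel for the difference between jfinal's handwritten Parser and a parser generated by antlr. Here is the parser handwritten for the jfinal enjoy template engine: https://gitee.com/jfinal/jfinal/blob/master/src/main/java/com/jfinal/template/stat/Parser.java

Blank lines, comments, and Java code together come to 278 lines: clean, tidy, and easy for a human to read. More importantly, the Recursive Descent algorithm it uses is simple, reliable, powerful, and readily available. Anyone who understands how the algorithm works can write their own Parser by hand in a few hours.

Now look at the parser that antlr generated for a template engine: https://gitee.com/xiandafu/beetl/blob/master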
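To make the Recursive Descent technique mentioned in the excerpt above concrete, here is a minimal sketch. It is not the jfinal Parser; the toy arithmetic grammar and the names `tokenize`, `Parser`, and `evaluate` are assumptions made purely for illustration. Each grammar rule becomes one method, and the methods call each other in the same shape as the grammar.

```python
import re

# Hypothetical toy grammar (not taken from jfinal):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Split the input into ("NUM", int) and ("OP", char) tokens."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif op.strip():  # ignore stray whitespace matches
            tokens.append(("OP", op))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.advance()
        if tok[0] == "NUM":
            return tok[1]
        if tok == ("OP", "("):
            value = self.expr()
            closing = self.advance()
            if closing != ("OP", ")"):
                raise SyntaxError(f"expected ')', got {closing!r}")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError(f"trailing input at token {parser.peek()!r}")
    return result
```

Calling `evaluate("(1 + 2) * 3")` walks `expr`, `term`, and `factor` recursively. Because every rule is an ordinary method, a breakpoint or a stack trace points straight at the grammar rule involved, which is the readability argument the post is making.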

Parse tree generation with Java CUP

大憨熊 submitted on 2019-12-22 08:07:03
Question: I am using CUP with JFlex to validate expression syntax. I have the basic functionality working: I can tell whether an expression is valid or not. The next step is to implement simple arithmetic operations, such as "add 1". For example, if my expression is "1 + a", the result should be "2 + a". I need access to the parse tree to do that, because simply identifying a numeric term won't do it: the result of adding 1 to "(1 + a) * b" should be "(1 + a) * b + 1", not "(2 + a) * b". Does anyone have a CUP
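The reason the question needs a tree rather than a flat token scan can be sketched as follows. This is not CUP code; the node classes `Num`, `Var`, `BinOp` and the helpers `add_one` and `render` are hypothetical names, and the sketch assumes only the operators `+` and `*`. In a CUP grammar, nodes like these would typically be built in the semantic actions of each production.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Var, BinOp]

# Assumption: only '+' and '*' appear; higher number binds tighter.
PRECEDENCE = {"+": 1, "*": 2}


def add_one(tree):
    """Add 1 to the whole expression, folding into a top-level constant if any."""
    if isinstance(tree, Num):
        return Num(tree.value + 1)
    # Only a '+' at the ROOT may absorb the 1; a '+' nested under '*' may not.
    if isinstance(tree, BinOp) and tree.op == "+":
        if isinstance(tree.left, Num):
            return BinOp("+", Num(tree.left.value + 1), tree.right)
        if isinstance(tree.right, Num):
            return BinOp("+", tree.left, Num(tree.right.value + 1))
    return BinOp("+", tree, Num(1))


def render(tree, parent_prec=0):
    """Print the tree back as text, adding parentheses only where needed."""
    if isinstance(tree, Num):
        return str(tree.value)
    if isinstance(tree, Var):
        return tree.name
    prec = PRECEDENCE[tree.op]
    text = f"{render(tree.left, prec)} {tree.op} {render(tree.right, prec)}"
    return f"({text})" if prec < parent_prec else text
```

With the tree for "(1 + a) * b", the root is a `*` node, so `add_one` appends `+ 1` instead of touching the nested `1`, which is exactly the distinction the question describes.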

Parsing optional semicolon at statement end

回眸只為那壹抹淺笑 submitted on 2019-12-21 08:02:09
Question: I am writing a parser for C-like grammars. So far it can parse code like: a = 1; b = 2; Now I want to make the semicolon at the end of a line optional. The original YACC rule was: stmt: expr ';' { ... } Newlines are handled by a lexer that I wrote myself (the code is simplified): rule(/\r\n|\r|\n/) { increase_lineno(); return :PASS } The instruction :PASS here is equivalent to returning nothing in LEX, which drops the currently matched text and skips to the next rule, just like

Where should I draw the line between lexer and parser?

狂风中的少年 submitted on 2019-12-21 07:55:25
Question: I'm writing a lexer for the IMAP protocol for educational purposes and I'm stumped as to where I should draw the line between lexer and parser. Take this example of an IMAP server response:

    * FLAGS (\Answered \Deleted)

This response is defined in the formal syntax like this:

    mailbox-data = "FLAGS" SP flag-list
    flag-list = "(" [flag *(SP flag)] ")"
    flag = "\Answered" / "\Deleted"

Since they are specified as string literals (aka "terminal" tokens) would it be more correct for the lexer to emit

Examples / tutorials for usage of javax.lang.model or ANTLR JavaParser to get information on Java Source Code

Deadly submitted on 2019-12-21 05:25:10
Question: I would like to create an automatic flowchart-like visualization of simple Java logic. For this I need to parse Java source code. I have two candidates: ANTLR and javax.lang.model from Java 6. Neither is easy. I have yet to find a single working example that is even remotely close to what I want to achieve. I want to find simple variable declarations, assignments, and flows (if, for, switch, boolean conditions, etc.). Is there a simple example or tutorial for either of these? I found very few

Where can I find a formal grammar for MATLAB?

爷,独闯天下 submitted on 2019-12-20 09:06:51
Question: I would like to write a lexer generator to convert a basic subset of the MATLAB language to C#, C++, etc. To help me do this, I would like to find a document containing the formal grammar for MATLAB. Having spent a bit of time investigating this, it seems that Mathworks does not provide one. Does anyone know where I could find such a document?

Answer 1: Excellent opportunity to write your own formal grammar :) If you choose to write the grammar yourself, I can recommend BNFC which can take a

Antlr3: Could not match token in parser rules which is used in lexer rule

旧城冷巷雨未停 submitted on 2019-12-20 07:27:58
Question: I have lexer rules in Antlr3 as:

    HYPHEN : '-';
    TOKEN : HYPHEN CHARS;
    CHARS : 'a' ..'z';

The parser rules are:

    exp : CHARS | some complex expression;
    parser_rule : exp HYPHEN exp;

If I try to match 'abc-abc' with parser_rule, it fails, because the lexer creates a TOKEN for HYPHEN exp. How can I match it correctly with parser_rule?

Answer 1: In an ANTLR lexer, the lexer rule that can match the longest sub-sequence of the input is used. So your input abc-abc will be tokenized as CHARS("abc") TOKEN("-abc") and
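The longest-match behaviour described in the answer above can be imitated with a small sketch. This is not ANTLR and does not reproduce its implementation: the rule table, the regular expressions, and the `tokenize` function are assumptions for illustration. In particular, CHARS is written here as a run of letters (`[a-z]+`), as the answer's CHARS("abc") implies, and ties between equally long matches go to the earlier rule, which is a choice of this sketch.

```python
import re

# Hypothetical regex stand-ins for the three lexer rules in the question.
RULES = [
    ("HYPHEN", re.compile(r"-")),
    ("TOKEN", re.compile(r"-[a-z]+")),
    ("CHARS", re.compile(r"[a-z]+")),
]


def tokenize(text, rules=RULES):
    """At each position, emit the token of whichever rule matches the most text."""
    pos = 0
    tokens = []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = pattern.match(text, pos)
            # Strictly longer wins, so on a tie the earlier rule is kept.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# With all three rules, "-abc" is swallowed by TOKEN because it is the longest match.
with_token_rule = tokenize("abc-abc")

# Without the TOKEN rule, the same input splits into three tokens.
without_token_rule = tokenize("abc-abc", [r for r in RULES if r[0] != "TOKEN"])
```

Under these assumptions, `with_token_rule` contains a CHARS token followed by a TOKEN token, so a parser rule expecting `exp HYPHEN exp` never sees a HYPHEN. The second call only illustrates how the token stream changes when no longer rule competes with HYPHEN.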

Oracle text search on multiple tables and joins

瘦欲@ submitted on 2019-12-18 18:04:45
Question: I have the following SQL statement:

    select emp_no, dob, dept_no
    from v_depts
    where catsearch (emp_no,'abc',NULL) > 0
       or catsearch (dept_no,'abc',NULL) > 0

where v_depts is a view. Now I would like to add one or more tables as joins so that I can do a text search on their columns. For example, employee_details contains employee information and can be joined on emp_no. I have created an index on the emp_name column of the employee_details table; however, I am not able to join with v_depts to search because I modify my sql

How would you parse indentation (python style)?

谁说我不能喝 submitted on 2019-12-18 11:58:02
Question: How would you define your parser and lexer rules to parse a language that uses indentation for defining scope? I have already googled and found a clever approach for parsing it by generating INDENT and DEDENT tokens in the lexer. I will go deeper on this problem and post an answer if I come to something interesting, but I would like to see other approaches to the problem. EDIT: As Charlie pointed out, there is already another thread very similar if not the same. Should my post be deleted?

Answer 1
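The INDENT/DEDENT approach the question refers to can be sketched as a lexer pass that keeps a stack of indentation widths. This is a minimal illustration, not any particular language's implementation: the function name `indent_tokens` and the token names `LINE`, `INDENT`, `DEDENT` are hypothetical, and the sketch assumes indentation uses spaces only and ignores blank lines.

```python
def indent_tokens(source):
    """Turn leading whitespace into explicit INDENT / DEDENT tokens."""
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines do not open or close blocks
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            tokens.append(("INDENT", None))
        else:
            # Shallower: close blocks until the widths line up again.
            while width < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", None))
            if width != stack[-1]:
                raise IndentationError(f"inconsistent dedent: {line!r}")
        tokens.append(("LINE", stripped))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", None))
    return tokens


example = indent_tokens("if x:\n    a\n    if y:\n        b\nc\n")
```

Once the lexer emits these tokens, the parser can treat INDENT and DEDENT like opening and closing braces, so the grammar itself stays context-free.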

How would you parse indentation (python style)?

孤者浪人 submitted on 2019-12-18 11:57:13
Question: How would you define your parser and lexer rules to parse a language that uses indentation for defining scope? I have already googled and found a clever approach for parsing it by generating INDENT and DEDENT tokens in the lexer. I will go deeper on this problem and post an answer if I come to something interesting, but I would like to see other approaches to the problem. EDIT: As Charlie pointed out, there is already another thread very similar if not the same. Should my post be deleted?

Answer 1