lexer | 易学教程

Why using antlr for a template engine is not a good idea

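When I released jfinal 3.0, I argued that using antlr for a "template engine" is not a good idea. More than two years have passed and I now hold that view more strongly: I think antlr is not a good idea in most "non-template-engine" scenarios either. When jfinal 3.0 was released, my remarks about antlr were brief and carried too little information, which led some people to misunderstand them, so today I will expand on the topic a little.

1. Parsers generated by antlr are hard to debug and hard to read

First, let's compare the hand-written jfinal Parser directly with a parser generated by antlr. Below is the parser hand-written for the jfinal enjoy template engine: https://gitee.com/jfinal/jfinal/blob/master/src/main/java/com/jfinal/template/stat/Parser.java

Blank lines, comments, and Java code together come to 278 lines. It is clean and easy for a human to read. More importantly, the Recursive Descent algorithm it uses is simple, reliable, powerful, and readily available. Anyone who understands how the algorithm works can hand-write a Parser of their own in a few hours. Now look at the parser that antlr generates for a template engine: https://gitee.com/xiandafu/beetl/blob/master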
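As a rough illustration of why a hand-written Recursive Descent parser can stay small, here is a minimal Python sketch for arithmetic expressions. It is not the jfinal enjoy parser: the grammar, the `Parser` class, and the `tokenize` and `evaluate` helpers are all assumptions made up for this example.

```python
import re

# Hypothetical grammar for this sketch (not taken from jfinal):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """One method per grammar rule; each consumes tokens and returns a value."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return result
```

Each grammar rule maps to one method, so a debugger stack trace mirrors the grammar directly, which is the readability property the post attributes to hand-written parsers.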

Parse tree generation with Java CUP

Question: I am using CUP with JFlex to validate expression syntax. I have the basic functionality working: I can tell whether an expression is valid or not. The next step is to implement simple arithmetic operations, such as "add 1". For example, if my expression is "1 + a", the result should be "2 + a". I need access to the parse tree to do that, because simply identifying a numeric term won't do it: the result of adding 1 to "(1 + a) * b" should be "(1 + a) * b + 1", not "(2 + a) * b". Does anyone have a CUP
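To make the "add 1" requirement concrete, here is a minimal Python sketch that works on an expression tree rather than on text. It does not use CUP or JFlex. The node classes `Num`, `Var`, and `BinOp` and the helpers `add_one` and `show` are hypothetical names invented for this illustration, and the folding rule (only fold into a numeric operand of a top-level `+`) is one assumed reading of the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str        # "+" or "*" in this sketch
    left: object   # Num, Var, or BinOp
    right: object  # Num, Var, or BinOp


def add_one(node):
    """Return a tree for `node + 1`, folding into a top-level constant if any."""
    if isinstance(node, Num):
        return Num(node.value + 1)
    if isinstance(node, BinOp) and node.op == "+":
        if isinstance(node.left, Num):
            return BinOp("+", Num(node.left.value + 1), node.right)
        if isinstance(node.right, Num):
            return BinOp("+", node.left, Num(node.right.value + 1))
    # No constant is reachable at the top level, so append "+ 1".
    return BinOp("+", node, Num(1))


PRECEDENCE = {"+": 1, "*": 2}


def show(node, parent_prec=0):
    """Print the tree back as text, adding parentheses only where needed."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    prec = PRECEDENCE[node.op]
    text = f"{show(node.left, prec)} {node.op} {show(node.right, prec)}"
    return f"({text})" if prec < parent_prec else text
```

With this sketch, `show(add_one(BinOp("+", Num(1), Var("a"))))` gives `2 + a`, while the tree for `(1 + a) * b` becomes `(1 + a) * b + 1`, because the `1` inside the product is not at the top level.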

Parsing optional semicolon at statement end

Question: I am writing a parser for C-like grammars. It can currently parse code like: a = 1; b = 2; Now I want to make the semicolon at the end of a line optional. The original YACC rule was: stmt: expr ';' { ... } Newlines are handled by a lexer I wrote myself (the code is simplified): rule(/\r\n|\r|\n/) { increase_lineno(); return :PASS } The instruction :PASS here is equivalent to returning nothing in LEX, which drops the currently matched text and skips to the next rule, just like

Where should I draw the line between lexer and parser?

Question: I'm writing a lexer for the IMAP protocol for educational purposes and I'm stumped as to where I should draw the line between lexer and parser. Take this example of an IMAP server response: * FLAGS (\Answered \Deleted) This response is defined in the formal syntax like this: mailbox-data = "FLAGS" SP flag-list flag-list = "(" [flag *(SP flag)] ")" flag = "\Answered" / "\Deleted" Since they are specified as string literals (aka "terminal" tokens), would it be more correct for the lexer to emit

Examples / tutorials for usage of javax.lang.model or ANTLR JavaParser to get information on Java Source Code

Question: I would like to create an automatic flowchart-like visualization of simple Java logic. For this I need to parse Java source code, and I have two candidates: ANTLR and javax.lang.model from Java 6. Neither is easy. I have yet to find a single working example that is even remotely close to what I want to achieve. I want to find simple variable declarations, assignments, and control flow (if, for, switch, boolean conditions, etc.). Is there a simple example or tutorial for either of these? I found very few

Where can I find a formal grammar for MATLAB?

Question: I would like to write a lexer generator to convert a basic subset of the MATLAB language to C#, C++, etc. To help me do this, I would like to find a document containing the formal grammar for MATLAB. Having spent a bit of time investigating this, it seems that MathWorks does not provide one. Does anyone know where I could find such a document?

Answer 1: Excellent opportunity to write your own formal grammar :) If you choose to write the grammar yourself, I can recommend BNFC, which can take a

Antlr3: Could not match token in parser rules which is used in lexer rule

Question: I have lexer rules in ANTLR 3 as: HYPHEN : '-'; TOKEN : HYPHEN CHARS; CHARS : 'a' ..'z'; The parser rules are: exp : CHARS | some complex expression; parser_rule : exp HYPHEN exp; If I try to match 'abc-abc' with parser_rule, it fails, because the lexer creates a TOKEN for HYPHEN exp. How can I match it correctly with parser_rule?

Answer 1: In an ANTLR lexer, the lexer rule that can match the longest sub-sequence of the input is used. So your input abc-abc will be tokenized as CHARS("abc") TOKEN("-abc") and
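The longest-match behavior described in the answer can be reproduced with a small Python sketch. This is not ANTLR's implementation, only an imitation of the rule selection. The `RULES` table and the `tokenize` function are hypothetical, and `CHARS` is assumed to match one or more letters, as the answer's CHARS("abc") suggests.

```python
import re

# Lexer rules in declaration order. CHARS is assumed to match one or more
# letters (an assumption based on the answer's CHARS("abc")).
RULES = [
    ("HYPHEN", re.compile(r"-")),
    ("TOKEN", re.compile(r"-[a-z]+")),
    ("CHARS", re.compile(r"[a-z]+")),
]


def tokenize(text, rules=RULES):
    """At each position, emit the token of the rule with the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

In this sketch, `tokenize("abc-abc")` yields CHARS("abc") followed by TOKEN("-abc"), so a parser waiting for a HYPHEN never sees one. Running the same function without the TOKEN entry yields CHARS, HYPHEN, CHARS instead.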

Oracle text search on multiple tables and joins

Question: I have the following SQL statement: select emp_no,dob,dept_no from v_depts where catsearch (emp_no,'abc',NULL) > 0 or catsearch (dept_no,'abc',NULL) > 0 where v_depts is a view. Now I would like to join one or more tables so that I can run a text search on their columns. For example, employee_details contains employee information and I can join on emp_no. I have created an index on the emp_name column of the employee_details table; however, I am not able to join with v_depts to search because I modify my sql

How would you parse indentation (python style)?

Question: How would you define your parser and lexer rules to parse a language that uses indentation to define scope? I have already googled and found a clever approach: generating INDENT and DEDENT tokens in the lexer. I will dig deeper into this problem and post an answer if I come up with something interesting, but I would like to see other approaches to the problem. EDIT: As Charlie pointed out, there is already another thread that is very similar, if not the same. Should my post be deleted?

Answer 1
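The INDENT/DEDENT approach mentioned in the question can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the answer from the original thread: the function name `indent_tokens` and the `LINE` token kind are invented here, indentation is assumed to use spaces only, and each non-blank line is passed through as a single token instead of being lexed further.

```python
def indent_tokens(source):
    """Yield (kind, value) pairs; kind is "INDENT", "DEDENT", or "LINE"."""
    stack = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect block structure
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            yield ("INDENT", None)
        while width < stack[-1]:
            # Shallower: close blocks until the widths line up.
            stack.pop()
            yield ("DEDENT", None)
        if width != stack[-1]:
            raise IndentationError("dedent does not match any outer level")
        yield ("LINE", raw.strip())
    # Close every block still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", None)
```

Because the lexer turns indentation changes into explicit tokens, the parser can treat INDENT and DEDENT like the opening and closing braces of a block and stay context-free.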
