lexer

Antlr(DSL)

柔情痞子 submitted on 2020-03-02 08:32:13
Antlr
Name: ANother Tool for Language Recognition
Site:
https://github.com/antlr/
https://theantlrguy.atlassian.net/wiki/display/ANTLR3/ANTLR+v3+documentation
http://www.antlr3.org/grammar/list.html
http://www.crifan.com/files/doc/docbook/antlr_tutorial/release/pdf/antlr_tutorial.pdf
Purpose: generates a Lexer, a Parser, or a Tree Walker for a given language, or a combined Lexer & Parser.
Use cases: Hibernate parsing HQL; Spring parsing EL; Gemfire (or Geode) parsing OQL.
Version: 3.3 (the parser inside 3.3 is itself generated by 2.7 from the Antlr.g grammar file; that parser parses our grammar file, and another library, StringTemplate, then generates our parser or lexer).
Input: the grammar file (.g file) of some language A.
Output: a parsing program for language A (in Java, C#, C++, and so on).
Grammar file

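Two Weeks to Build Your Own Scripting Language - Day 6: Running Programs with an Interpreter

白昼怎懂夜的黑 submitted on 2020-02-27 15:16:40
Day 6: Running Programs with an Interpreter
The interpreter walks the abstract syntax tree from the root node down to the leaf nodes and computes the result of each node.
6.1 The eval method and the environment object
The eval method: eval is short for evaluate. The eval method evaluates the statement, expression, or subexpression that corresponds to the subtree rooted at the node, and returns the result.
The eval method recursively calls the eval methods of the child nodes.
Each node class defines its own version of eval.
A simplified version of the eval method:
    public Object eval(Environment env) {
        Object left = left().eval(env);
        Object right = right().eval(env);
        return (Integer) left + (Integer) right;
    }
    public Object eval(Environment env) { return value(); } // value() returns the integer literal this object represents
This resembles a depth-first traversal of the tree nodes.
Listing 6.1 The environment object interface, Environment.java
    package chap6;
    public interface Environment {
        void put(String name, Object value);
        Object get(String name);
    }
Listing 6.2 The environment object class BasicEnv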
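The recursive eval idea described above can be sketched end to end as follows. This is only an illustration under stated assumptions: the `Environment` interface follows Listing 6.1, but `MapEnv`, `Node`, `NumberLiteral`, `Name`, `Add`, and `EvalDemo` are hypothetical names made up for this sketch and are not the book's classes.

```java
import java.util.HashMap;
import java.util.Map;

// Same shape as Listing 6.1.
interface Environment {
    void put(String name, Object value);
    Object get(String name);
}

// Hypothetical hash-map backed environment (the excerpt's BasicEnv is cut off).
class MapEnv implements Environment {
    private final Map<String, Object> values = new HashMap<>();
    public void put(String name, Object value) { values.put(name, value); }
    public Object get(String name) { return values.get(name); }
}

// Hypothetical node type: every node knows how to evaluate itself.
interface Node {
    Object eval(Environment env);
}

// Leaf node: an integer literal evaluates to its own value.
class NumberLiteral implements Node {
    private final int value;
    NumberLiteral(int value) { this.value = value; }
    public Object eval(Environment env) { return value; }
}

// Leaf node: a variable name evaluates to whatever the environment holds.
class Name implements Node {
    private final String name;
    Name(String name) { this.name = name; }
    public Object eval(Environment env) { return env.get(name); }
}

// Inner node: evaluates both children first, then combines the results.
class Add implements Node {
    private final Node left;
    private final Node right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public Object eval(Environment env) {
        Object l = left.eval(env);
        Object r = right.eval(env);
        return (Integer) l + (Integer) r;
    }
}

class EvalDemo {
    // Builds the tree for "x + 2" and evaluates it with x bound to 40.
    static Object run() {
        Environment env = new MapEnv();
        env.put("x", 40);
        Node tree = new Add(new Name("x"), new NumberLiteral(2));
        return tree.eval(env);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

Calling `eval` on the root is enough: each node asks its children for their values, so the traversal order falls out of the recursion.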

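Two Weeks to Build Your Own Scripting Language - Day 5: Designing the Parser

若如初见. submitted on 2020-02-26 15:42:56
Day 5: Designing the Parser
5.1 The grammar of the Stone language
Listing 5.1 Grammar definition of the Stone language
    primary   : "(" expr ")" | NUMBER | IDENTIFIER | STRING
    factor    : "-" primary | primary
    expr      : factor { OP factor }
    block     : "{" [ statement ] { (";" | EOL) [ statement ] } "}"
    simple    : expr
    statement : "if" expr block [ "else" block ]
              | "while" expr block
              | simple
    program   : [ statement ] (";" | EOF)
5.2 Using the parser and combinators
The Parser library:
A parser-combinator style library.
Its job is to rewrite grammar rules written in BNF as Java code.
Chapter 17 of the book explains it in detail.
Listing 5.2 The Stone language parser, BasicParser.java
    // Listing 5.2: the parser program converted from the Stone grammar listed in Listing 5.1.
    /* A basic Parser for Stone grammatical analysis */
    package stone; import stone.Parser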
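To make the mapping from grammar rules to Java code concrete, here is a hand-written recursive-descent sketch of just the `primary`, `factor`, and `expr` rules from Listing 5.1. It is not the book's Parser combinator library: `MiniParser` and its methods are hypothetical, the input is assumed to be already split into tokens, the operator set is an arbitrary choice, and operators are folded left to right with no precedence, exactly as the flat rule `factor { OP factor }` reads.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written parser; one method per grammar rule.
class MiniParser {
    private final List<String> tokens;
    private int pos = 0;

    MiniParser(List<String> tokens) { this.tokens = tokens; }

    // Returns the next token without consuming it, or null at end of input.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    // Assumed operator set for this sketch: single-character operators only.
    private static boolean isOp(String t) {
        return t != null && t.length() == 1 && "+-*/<>".indexOf(t.charAt(0)) >= 0;
    }

    // primary : "(" expr ")" | NUMBER | IDENTIFIER | STRING
    String primary() {
        String t = next();
        if (t.equals("(")) {
            String inner = expr();
            if (!")".equals(next())) throw new IllegalStateException("expected )");
            return inner;
        }
        return t; // NUMBER, IDENTIFIER or STRING, kept as its text
    }

    // factor : "-" primary | primary
    String factor() {
        if ("-".equals(peek())) {
            next();
            return "(- " + primary() + ")";
        }
        return primary();
    }

    // expr : factor { OP factor }
    String expr() {
        String left = factor();
        while (isOp(peek())) {
            String op = next();
            String right = factor();
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    // Parses a whole token sequence and returns the tree in prefix notation.
    static String parse(String... tokens) {
        MiniParser p = new MiniParser(Arrays.asList(tokens));
        String tree = p.expr();
        if (p.peek() != null) throw new IllegalStateException("unexpected token " + p.peek());
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("1", "+", "2", "*", "x")); // prints (* (+ 1 2) x)
    }
}
```

The `[ ... ]` and `{ ... }` parts of the BNF turn into `if` and `while` in the code; a combinator library hides the same control flow behind method calls.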

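Two Weeks to Build Your Own Scripting Language - Day 11: Optimizing Variable Read/Write Performance

给你一囗甜甜゛ submitted on 2020-02-26 08:23:47
Day 11: Optimizing Variable Read/Write Performance
Using the reading and writing of variable values as an example, this chapter introduces a way to optimize language processor performance based on this idea.
11.1 Implementing the environment with a simple array
Suppose a function has local variables x and y. The program can decide ahead of time that x is element 0 of an array, y is element 1, and so on. The language processor then no longer has to compute a hash value when it references a variable. In other words, this is an environment that looks up variable values by number rather than by name.
To implement this design, the language processor has to traverse the corresponding abstract syntax tree nodes once the function definition is complete, collecting every function parameter and local variable used there. After the traversal the program knows how many parameters and local variables the function uses, which fixes the length of the array that holds them.
Later, when the function is actually called and variable values are read or written, the language processor accesses the array elements directly. Variable references no longer go through a hash-table lookup of the variable name as before.
Once the position of a variable's value in the array is fixed, that information is recorded in a field of the abstract syntax tree node object. For example, a variable name that appears in the program is represented in the abstract syntax tree by a Name object. The Name object stores the array index in a field ahead of time, so that when the language processor needs to reference the variable it knows which array element to use. The eval method of the Name object uses this field to access the array element and obtain the variable's value.
There is no need to look variables up by name while the program runs.
If the field of a Name object is to hold a reference to a variable, the array index alone is not enough; the scope that corresponds to the environment must be recorded as well. Environments are nested in order to implement closures. To this end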
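The index-based environment described above can be sketched as follows. All names here (`SymbolTable`, `ArrayEnvSketch`, `VarRef`, `ArrayEnvDemo`) are hypothetical, and the sketch assumes a single flat scope; it leaves out the nested scopes that the text says are needed for closures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical table used once, after a function definition has been parsed,
// to assign each variable name a fixed slot number.
class SymbolTable {
    private final Map<String, Integer> indexes = new HashMap<>();

    // Returns the slot for a name, allocating the next free slot on first sight.
    int indexOf(String name) {
        Integer i = indexes.get(name);
        if (i == null) {
            i = indexes.size();
            indexes.put(name, i);
        }
        return i;
    }

    int size() { return indexes.size(); }
}

// Hypothetical environment backed by a plain array: lookups are by number.
class ArrayEnvSketch {
    private final Object[] slots;
    ArrayEnvSketch(int size) { slots = new Object[size]; }
    Object get(int index) { return slots[index]; }
    void put(int index, Object value) { slots[index] = value; }
}

// Hypothetical AST node for a variable reference.
class VarRef {
    private final String name;
    private int index = -1; // filled in by resolve(), before execution

    VarRef(String name) { this.name = name; }

    // The only place the name is hashed; runs once per node.
    void resolve(SymbolTable symbols) { index = symbols.indexOf(name); }

    // At run time only the stored index is used.
    Object eval(ArrayEnvSketch env) { return env.get(index); }
    void assign(ArrayEnvSketch env, Object value) { env.put(index, value); }
}

class ArrayEnvDemo {
    static int run() {
        SymbolTable symbols = new SymbolTable();
        VarRef x = new VarRef("x");
        VarRef y = new VarRef("y");
        VarRef xAgain = new VarRef("x"); // a second reference to the same variable
        x.resolve(symbols);
        y.resolve(symbols);
        xAgain.resolve(symbols);

        // The number of distinct names fixes the array length.
        ArrayEnvSketch env = new ArrayEnvSketch(symbols.size());
        x.assign(env, 10);
        y.assign(env, 32);
        return (Integer) xAgain.eval(env) + (Integer) y.eval(env);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

The hash lookup moves from every variable access to a one-time resolve step, which is the trade the text describes.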

ANTLR How to use lexer rules having same starting?

走远了吗. submitted on 2020-01-14 09:53:29
Question
How to use lexer rules having same starting? I am trying to use two similar lexer rules (having the same starting):
    TIMECONSTANT: ('0'..'9')+ ':' ('0'..'9')+;
    INTEGER : ('0'..'9')+;
    COLON : ':';
Here is my sample grammar:
    grammar TestTime;
    text : (timeexpr | caseblock)*;
    timeexpr : TIME;
    caseblock : INT COLON ID;
    TIME : ('0'..'9')+ ':' ('0'..'9')+;
    INT : ('0'..'9')+;
    COLON : ':';
    ID : ('a'..'z')+;
    WS : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
When I try to parse text: 12:44 123 : abc

How to make lex/flex recognize tokens not separated by whitespace?

不羁的心 submitted on 2020-01-12 08:07:43
Question
I'm taking a course in compiler construction, and my current assignment is to write the lexer for the language we're implementing. I can't figure out how to satisfy the requirement that the lexer must recognize concatenated tokens, that is, tokens not separated by whitespace. E.g.: the string 39if is supposed to be recognized as the number 39 and the keyword if. Simultaneously, the lexer must also exit(1) when it encounters invalid input. A simplified version of the code I have: %{ #include
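The requirement in the question, splitting 39if into a number and a keyword without any whitespace between them, comes down to longest-match scanning from the current position. The sketch below illustrates that idea in Java rather than in lex/flex; `MunchLexer`, its token names, and its three patterns are assumptions made for this example, and throwing an exception stands in for the `exit(1)` the assignment asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical longest-match scanner; not generated by lex/flex.
class MunchLexer {
    private static final String[] NAMES = {"NUMBER", "WORD", "SPACE"};
    private static final Pattern[] PATTERNS = {
        Pattern.compile("[0-9]+"),
        Pattern.compile("[A-Za-z]+"),
        Pattern.compile("\\s+"),
    };

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            int best = -1;
            // Try every pattern at the current position and keep the longest match.
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    best = i;
                }
            }
            if (best < 0) {
                // Nothing matches here: invalid input.
                throw new IllegalArgumentException("invalid input at offset " + pos);
            }
            String text = input.substring(pos, pos + bestLen);
            if (best != 2) { // whitespace is skipped, never required
                String kind = text.equals("if") ? "IF" : NAMES[best];
                out.add(kind + ":" + text);
            }
            pos += bestLen; // the next token starts right where this one ended
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("39if")); // prints [NUMBER:39, IF:if]
    }
}
```

Because the number pattern stops at the first non-digit and scanning resumes at that exact offset, no separator is needed between the two tokens.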

How to make lex/flex recognize tokens not separated by whitespace?

泪湿孤枕 submitted on 2020-01-12 08:06:51
Question
I'm taking a course in compiler construction, and my current assignment is to write the lexer for the language we're implementing. I can't figure out how to satisfy the requirement that the lexer must recognize concatenated tokens, that is, tokens not separated by whitespace. E.g.: the string 39if is supposed to be recognized as the number 39 and the keyword if. Simultaneously, the lexer must also exit(1) when it encounters invalid input. A simplified version of the code I have: %{ #include

How would I go about Implementing A Simple Stack-Based Programming Language

走远了吗. submitted on 2020-01-11 18:33:51
Question
I am interested in extending my knowledge of computer programming by implementing a stack-based programming language. I am seeking out advice on where to begin, as I intend for it to have functions like "pushint 1", which would push an integer with value 1 on to the top of the stack, and flow-control via labels like "L01: jump L01:". So far I have made a C# implementation of what I want my language to act like (wanted to link to it but IDEOne is blocked), but it is very messy and needs
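One possible starting point for the language described in the question is a two-pass interpreter: first record where each label sits, then execute instructions with a program counter. The sketch below is in Java rather than the asker's C#. `StackMachine` is a hypothetical name, `pushint`, labels, and `jump` come from the question, and the `add` instruction is an assumption added only so the demo has something to compute.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interpreter: one instruction or label per array element.
class StackMachine {
    static int run(String[] program) {
        // Pass 1: record the instruction index of every label line such as "L01:".
        Map<String, Integer> labels = new HashMap<>();
        for (int i = 0; i < program.length; i++) {
            String line = program[i].trim();
            if (line.endsWith(":") && !line.contains(" ")) {
                labels.put(line.substring(0, line.length() - 1), i);
            }
        }

        // Pass 2: execute with an explicit program counter.
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < program.length) {
            String[] parts = program[pc].trim().split("\\s+");
            switch (parts[0]) {
                case "pushint":
                    stack.push(Integer.parseInt(parts[1]));
                    pc++;
                    break;
                case "add": // assumed instruction, not from the question
                    stack.push(stack.pop() + stack.pop());
                    pc++;
                    break;
                case "jump": {
                    // Accept both "jump L01" and "jump L01:".
                    String name = parts[1].endsWith(":")
                            ? parts[1].substring(0, parts[1].length() - 1)
                            : parts[1];
                    Integer target = labels.get(name);
                    if (target == null) {
                        throw new IllegalArgumentException("unknown label " + name);
                    }
                    pc = target;
                    break;
                }
                default:
                    if (parts[0].endsWith(":")) { // a label line does nothing at run time
                        pc++;
                        break;
                    }
                    throw new IllegalArgumentException("unknown instruction " + parts[0]);
            }
        }
        return stack.pop(); // result is whatever is left on top
    }

    public static void main(String[] args) {
        String[] program = {
            "pushint 1", "jump L01", "pushint 100", "L01:", "pushint 2", "add"
        };
        System.out.println(run(program)); // prints 3
    }
}
```

Collecting labels before execution lets a `jump` refer to a label that appears later in the program, which a single pass could not resolve.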

ANTLR lexer can't lookahead at all

老子叫甜甜 submitted on 2020-01-11 09:23:10
Question
I have the following grammar:
    rule: 'aaa' | 'a' 'a';
It can successfully parse the string 'aaa', but it fails to parse 'aa' with the following error:
    line 1:2 mismatched character '<EOF>' expecting 'a'
FYI, it is the lexer's problem, not the parser's, because I don't even call the parser. The main function looks like:
    @members { public static void main(String[] args) throws Exception { RecipeLexer lexer = new RecipeLexer(new ANTLRInputStream(System.in)); for (Token t = lexer.nextToken(); t

ANTLR3 grammar does not match rule with predicate

这一生的挚爱 submitted on 2020-01-06 04:16:29
Question
I have a combined grammar where I need to provide for two identifier lexer rules. Both identifiers can be used at the same time. Identifier1 comes before Identifier2 in the grammar. The first identifier is static, whereas the second identifier rule changes on the basis of some flag (using a predicate). I want the second identifier to match in parser rules. But as both identifiers may match some common inputs, it does not fall through to Identifier2. I have created a small grammar to make it understandable. Grammar is