lex | 易学教程

Remote version of flex misinterprets my rules

Question: I've written a little assembler using flex and bison that builds and runs OK on my machine (Ubuntu 10.10). Someone else is now trying to build it on Arch Linux, and their install of flex produces a different lex.yy.c which mis-matches rules. Both versions report the same lex 2.5.35 version, but I've already seen differences between mine and another flex on Mac OS X which didn't grok (?i patterns, so I don't trust that version string much. I don't have access to the remote machine, so I'm

Tracking source position of AST nodes in a compiler (ocaml)

Question: I'm writing a compiler in OCaml, using ocamllex/yacc. Things are going well, but I have a design problem. For each AST node I create, it would be good to have the line/character position of that node in the source code. That would be useful for providing error messages to the user later. Now, I can add some kind of meta type to my nodes: type node = Node1 of ... * meta | Node2 of ... * meta but that seems redundant. Later, when I'm done with verifying the AST, I'll have to write match

Lex and Yacc

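lex handles lexical analysis and yacc handles syntax analysis. Put plainly, lex matches the input string against the regular expressions you specify and splits it into tokens; for each match it lets you process the matched text and return a code that identifies the token. yacc then parses the token stream, assembling the individual tokens into a complete grammatical structure.

lex and yacc files share a similar layout, divided into three parts:

  %{
    any C code, for example some state initialisation
  %}
  lex or yacc definitions, for example %s in lex, or %type %token %left %right %union in yacc
  %%
  lex or yacc rules
  %%
  any C code

Note that a parser does not have to use lex; you can write the lexer yourself as needed. Using lex lets you match with regular expressions directly, which makes the whole process simpler. Communication between lex and yacc goes through the union defined in %union. That union ends up as the type of a global variable called yylval, and this global variable carries values between lex and yacc.

%type<xxx>: xxx must be a member of the union defined in %union; the value of a symbol declared with %type<xxx> is stored in that member.
%token<xxx>: like %type, except that it declares a token.
$$ stands for the final computed result of the current type, and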
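The division of labour described above can be sketched in Python. This is only an illustration under stated assumptions, not what lex generates: the token codes, the `RULES` table, and the `tokenize` function are hypothetical names, and the sketch takes the first rule that matches, whereas lex prefers the longest match.

```python
import re

# Hypothetical token codes; a yacc-generated parser would normally define these.
NUMBER, IDENT, PLUS = 258, 259, 260

# Each rule pairs a regular expression with a token code (None means "skip").
RULES = [
    (re.compile(r"[0-9]+"), NUMBER),
    (re.compile(r"[A-Za-z_][A-Za-z0-9_]*"), IDENT),
    (re.compile(r"\+"), PLUS),
    (re.compile(r"[ \t\n]+"), None),
]


def tokenize(text):
    """Yield (code, value) pairs; value plays the role that yylval plays in lex/yacc."""
    pos = 0
    while pos < len(text):
        for pattern, code in RULES:
            m = pattern.match(text, pos)
            if m:
                if code is not None:
                    lexeme = m.group(0)
                    # Process the matched text before handing it to the parser.
                    value = int(lexeme) if code == NUMBER else lexeme
                    yield code, value
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
```

Calling `list(tokenize("x + 42"))` produces one pair per token: the code tells the parser which kind of token it received, and the value carries the associated data.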

Removing nested comments by lex

Question: How should I write a program in lex (or flex) that removes nested comments from text and prints only the text that is not inside comments? I should probably track a state for being inside a comment, plus the number of block-comment opening "tags" seen so far. Let's have these rules:
1. Block comment: /* block comment */
2. Line comment: // line comment
3. Comments can be nested.
Example 1: show /* comment /* comment */ comment */ show
Output: show show
Example 2: show /* // comment comment */ show
Output: show show
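The counting idea mentioned in the question can be sketched in Python rather than flex. The function name `strip_comments` is hypothetical, and the sketch assumes that `//` has no special meaning inside a block comment. In flex, the same idea is usually expressed with an exclusive start condition plus a depth counter.

```python
def strip_comments(text):
    """Return text with nested /* ... */ blocks and // line comments removed."""
    out = []
    depth = 0  # number of currently open /* ... */ blocks
    i = 0
    n = len(text)
    while i < n:
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a block comment: drop the character
        elif pair == "//":
            # Line comment: skip to the end of the line, keep the newline itself.
            while i < n and text[i] != "\n":
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

The whitespace around a removed comment is kept, so the first example yields the two words separated by two spaces.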

Error recovery in an LALR(1) grammar

Question: I'm using some parser and lexer generating tools (similar to Lex and Bison, but for C#) to generate programs that parse strings into abstract syntax trees that can later be evaluated. I wanted to do error recovery (i.e. report in the produced abstract syntax tree that there are missing tokens and such). I had two approaches in mind for structuring the generated grammars, and I was wondering which approach was better, more flexible, or free of conflicts (the .y and .lex files are generated

How do I write a non-greedy match in LEX / FLEX?

Question: I'm trying to parse a legacy language (which is similar to 'C') using FLEX and BISON. Everything is working nicely except for matching strings. This rather odd legacy language doesn't support quoting characters in string literals, so the following are all valid string literals: "hello" "" "\" I'm using the following rule to match string literals: \".*\" { yylval.strval = _strdup( yytext ); return LIT_STRING; } Unfortunately this is a greedy match, so it matches code like the following: "hello
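The greedy-match problem described in the question can be reproduced with Python's `re` module instead of flex. flex has no non-greedy operators, so a common workaround is a negated character class that cannot run past the first closing quote. The names `GREEDY`, `BOUNDED`, and `string_literals` are hypothetical, and the sample input is made up because the question's own example is cut off.

```python
import re

GREEDY = re.compile(r'".*"')        # same shape as the rule in the question
BOUNDED = re.compile(r'"[^"\n]*"')  # stops at the first closing quote


def string_literals(pattern, line):
    """Return every substring of line that the pattern matches."""
    return pattern.findall(line)


line = '"hello" + "world"'          # made-up input with two literals
greedy_result = string_literals(GREEDY, line)
bounded_result = string_literals(BOUNDED, line)
```

Here `greedy_result` holds a single match spanning both literals, while `bounded_result` holds the two literals separately. Because this language has no escape sequences inside strings, the bounded pattern also accepts the literal `"\"` unchanged.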

Class File Format

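Java is a cross-platform language: the Java code we write is compiled into intermediate class files that the Java virtual machine parses and runs. The Java Virtual Machine Specification describes only an abstract Java virtual machine; for concrete implementations it lays down design rules only. An implementation must embody what the specification says, but should be constrained by it only where that is truly necessary. For the full content, see the original document (for JDK 7: https://docs.oracle.com/javase/specs/jvms/se7/html/ ) or the book 《深入理解Java虚拟机 JVM高级特性与最佳实践》. The complete specification mainly covers the following:
Chapter 2: an overview of the overall Java virtual machine architecture
Chapter 3: how programs written in the Java language are translated into the virtual machine instruction set
Chapter 4: the class file format, a binary format independent of hardware and operating system that represents compiled classes and interfaces
Chapter 5: Java virtual machine start-up and the loading, linking, and initialization of classes and interfaces
Chapter 6: the Java virtual machine instruction set
Chapter 7: a table of Java virtual machine opcode mnemonics indexed by opcode value
This article only sketches the basic concepts the project needs, with the emphasis on the Class file format, as groundwork for later posts in this series.
A Class file is a binary stream whose basic unit is the 8-bit byte. The data items are packed into the class file strictly in order, with no separators between them. Every Class file consists of a stream of 8-bit bytes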
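As a small illustration of the "ordered, tightly packed, no separators" layout, the first three items of a class file can be decoded with Python's `struct` module. According to the JVM specification, a class file starts with a 4-byte magic number (0xCAFEBABE), a 2-byte minor version, and a 2-byte major version, all big-endian. The function name `parse_class_header` and the sample bytes are hypothetical and are not taken from the article.

```python
import struct


def parse_class_header(data):
    """Decode the first 8 bytes of a class file: magic, minor, major."""
    if len(data) < 8:
        raise ValueError("a class file header needs at least 8 bytes")
    # ">IHH": big-endian u4 magic, u2 minor_version, u2 major_version.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError(f"bad magic number: {magic:#010x}")
    return {"magic": magic, "minor": minor, "major": major}


# Made-up header bytes: magic, minor version 0, major version 0x33 (51, Java SE 7).
sample = bytes.fromhex("cafebabe00000033")
header = parse_class_header(sample)
```

Because the fields are not delimited, the reader has to know each field's width in advance and consume the bytes in exactly the specified order.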

Indentation control while developing a small Python-like language

Question: I'm developing a small Python-like language using flex, byacc (for lexing and parsing) and C++, but I have a few questions regarding scope control. Just like Python, it uses whitespace (spaces or tabs) for indentation. On top of that, I want to implement indexed breaking: for instance, if you type "break 2" inside a while loop that is nested inside another while loop, it would break out of not only the inner loop but the outer loop as well (hence the number 2 after break), and so on. Example: while 1

Lexer and Parser Generators for Common Lisp [closed]

Question: Can you recommend lexer and parser generators for Common Lisp? I have seen the following lists on CLiki, but most entries on the lists seem to be in their alpha stages: http://www.cliki.net/LEXER http://www.cliki.net/parser%20generator So it would be helpful if you could share your good or bad experience with any of
