lex

Ignoring errors in yacc/lex

半世苍凉 submitted on 2019-12-20 06:32:50
Question: I'm new to yacc/lex and I'm working on a parser that was written by someone else. I've noticed that when an undefined token is found, the parser reports an error and stops. Is there a simple way to make it completely ignore lines it cannot parse and just move on to the next one?

Answer 1: Just add a rule like

    . { /* do nothing */ }

at the bottom of all your rules, and the scanner will ignore everything it comes across that doesn't match any of the previous rules. Edit: if you have
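The catch-all . rule works in the scanner. It silently drops characters that no other rule matches, so the parser never sees them. Skipping a whole line that fails to parse is a parser-level job, and yacc and bison provide the predefined error token for it. Assuming the scanner returns newlines as '\n' tokens, a rule such as line : error '\n' { yyerrok; } makes the parser discard input up to the next newline and then resume. The Python sketch below only models that resynchronize-at-the-next-line strategy. The one-statement-per-line grammar (NAME = NUMBER ;) and the names STATEMENT and parse_lines are hypothetical.

```python
import re

# Hypothetical grammar: every line is one statement of the form  NAME = NUMBER ;
STATEMENT = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*;\s*")


def parse_lines(source):
    """Parse source line by line, skipping (but recording) lines that fail to parse.

    Returns (assignments, bad_lines): a dict of NAME -> value for the good lines
    and a list of 1-based line numbers that were discarded.
    """
    assignments = {}
    bad_lines = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = STATEMENT.fullmatch(line)
        if match is None:
            # Same idea as yacc's "error '\n'" rule: note the error, drop the
            # rest of this line and resume at the start of the next one.
            bad_lines.append(lineno)
            continue
        assignments[match.group(1)] = int(match.group(2))
    return assignments, bad_lines


if __name__ == "__main__":
    print(parse_lines("a = 1;\n@@ not valid @@\nb = 2;"))  # ({'a': 1, 'b': 2}, [2])
```

Keeping a record of the skipped lines mirrors what yyerror would report. Dropping bad lines with no trace at all makes malformed input hard to debug.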

Lex regex gets some extra characters

為{幸葍}努か submitted on 2019-12-20 06:01:07
Question: I have the following definitions in my lex file:

    L [a-zA-Z_]
    A [a-zA-Z_0-9]
    %%
    {L}{A}*   { yylval.id = yytext; return IDENTIFIER; }

And I do the following in my yacc file:

    primary_expression : IDENTIFIER { puts("IDENTIFIER: "); printf("%s", $1); }

The source code I'm analyzing contains the following assignment:

    ab= 10;

For some reason, the printf("%s", $1); part prints ab= instead of just ab. I'm pretty sure that's the part printing ab=, because when I delete the printf("%s",
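The excerpt stops before any answer, but the output is consistent with how flex manages yytext. yytext points into flex's input buffer, and flex NUL-terminates each match only temporarily, putting the overwritten character back when it scans the next token. Suppose the parser has already fetched the lookahead token '=' before reducing IDENTIFIER to primary_expression. A saved yytext pointer then reads up to the NUL written after '=', which gives ab=. The usual remedy is to copy the lexeme in the scanner, for example with yylval.id = strdup(yytext);, and free it when it is no longer needed. The Python sketch below models the buffer as a bytearray. ToyScanBuffer and demo are hypothetical names, and the model mimics only this one aspect of flex.

```python
class ToyScanBuffer:
    """Hypothetical model of flex's input buffer and its yytext handling."""

    def __init__(self, text):
        # flex keeps the input in a char buffer; model it as NUL-terminated bytes.
        self.buf = bytearray(text.encode("ascii") + b"\0")
        self.hold_pos = None   # where the temporary NUL was written
        self.hold_char = None  # the character it replaced (cf. flex's yy_hold_char)

    def match(self, start, length):
        """Simulate matching one token; return its start offset (the 'yytext pointer')."""
        if self.hold_pos is not None:
            # New scan: restore the character overwritten for the previous token.
            self.buf[self.hold_pos] = self.hold_char
        end = start + length
        self.hold_pos, self.hold_char = end, self.buf[end]
        self.buf[end] = 0  # NUL-terminate the current match only
        return start

    def c_string(self, offset):
        """Read a C string starting at offset: everything up to the next NUL byte."""
        return bytes(self.buf[offset:self.buf.index(0, offset)])


def demo():
    scanner = ToyScanBuffer("ab= 10;")
    yytext = scanner.match(0, 2)            # IDENTIFIER "ab"
    saved_pointer = yytext                  # like: yylval.id = yytext;
    saved_copy = scanner.c_string(yytext)   # like: yylval.id = strdup(yytext);
    scanner.match(2, 1)                     # lookahead token "="
    return scanner.c_string(saved_pointer), saved_copy


if __name__ == "__main__":
    print(demo())  # (b'ab=', b'ab')
```

The aliased pointer ends up showing stale buffer contents, while the copy keeps just the token.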

Parser - Segmentation fault when calling yytext

自古美人都是妖i submitted on 2019-12-20 05:47:09
Question: My parser recognizes the grammar and reports the correct error line using yylineno. I want to print the symbol which caused the error.

    int yyerror(string s) {
        extern int yylineno;  // defined and maintained in lex.yy.c
        extern char *yytext;  // defined and maintained in lex.yy.c
        cerr << "error: " << s << " -> " << yytext << " @ line " << yylineno << endl;
        //exit(1);
    }

I get this error when I write something the grammar doesn't accept:

    error: syntax error -> Segmentation fault

Am I not

What is the regular expression for a multi-line string?

喜欢而已 submitted on 2019-12-20 05:01:57
Question: I am learning to write a compiler, and it has rules for a single-line string:

    char ch[] ="abcd";

and for a multi-line string:

    printf("This is\
    a multi\
    string");

I wrote the regular expression

    STRING \"([^\"\n]|\\{NEWLINE})*\"

It works fine with single-line strings, but it doesn't work with multi-line strings where a line ends with a '\' character. What should I change?

Answer 1: A common string pattern is

    \"([^"\\\n]|\\(.|\n))*\"

This will match strings which include escaped double quotes (\") and
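One way to experiment with the pattern from the answer outside flex is to transcribe it into Python's re module. Like flex's, Python's . does not match a newline by default, so the \\(.|\n) branch is what lets a backslash-newline continue a string. This is a sketch for testing the regex, not the flex rule itself. STRING_RE and is_string_literal are hypothetical names.

```python
import re

# Transcription of the flex pattern  \"([^"\\\n]|\\(.|\n))*\"
#   [^"\\\n]   any character except a quote, a backslash or a newline
#   \\(.|\n)   a backslash followed by any character, including a newline
STRING_RE = re.compile(r'"(?:[^"\\\n]|\\(?:.|\n))*"')


def is_string_literal(text):
    """Return True if the whole of text is one string literal."""
    return STRING_RE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        '"abcd"',                            # single-line string
        '"This is\\\na multi\\\nstring"',    # lines continued with backslash-newline
        '"say \\"hi\\""',                    # escaped double quotes
        '"broken\nstring"',                  # bare newline inside the quotes
    ]
    for s in samples:
        print(repr(s), is_string_literal(s))
```

The first three samples are accepted. The last one is rejected because it has a bare newline inside the quotes.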

[flex & bison translation] Preface

时间秒杀一切 submitted on 2019-12-20 00:37:59
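Translator's note: Last year I planned to translate this book, but in the end laziness won and I put it aside. There's a classic saying: the effort you make today is to make good on the boasts you made as a child. I really feel that now...

I typed this article character by character in LibreOffice Writer on Ubuntu 12.04.1, then adjusted the fonts and font sizes by hand before publishing it. The moment it went live felt really good. I hope I can keep it up: writing blog posts and technical articles, and translating technical articles and books. Keep going!

Of course, if any sentence reads awkwardly or any description is inaccurate, please don't hesitate to point it out, and I will correct it as soon as possible.

In this translation I rendered some terms as follows: lexical as 词法, syntax as 句法, grammar as 语法. Whether these choices are appropriate is open to discussion.

Flex and bison are tools designed for developers of compilers and interpreters. But flex and bison can do more than that. Any program that matches patterns in its input, or that has a command-line (CLI) interface, can be built with flex and bison. Furthermore, they (meaning flex and bison, here and below) let you build application prototypes quickly and make them easy to modify and maintain, so for developers who are not writing compilers, flex and bison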

Ply Lex parsing problem

牧云@^-^@ submitted on 2019-12-19 18:24:53
Question: I'm using PLY as my lexer. My specifications are the following:

    t_WHILE = r'while'
    t_THEN = r'then'
    t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
    t_NUMBER = r'\d+'
    t_LESSEQUAL = r'<='
    t_ASSIGN = r'='
    t_ignore = r' \t'

When I try to parse the following string:

    "while n <= 0 then h = 1"

it gives the following output:

    LexToken(ID,'while',1,0)
    LexToken(ID,'n',1,6)
    LexToken(LESSEQUAL,'<=',1,8)
    LexToken(NUMBER,'0',1,11)
    LexToken(ID,'hen',1,14) ------> PROBLEM!
    LexToken(ID,'h',1,18)
    LexToken(ASSIGN,'=',1,20)
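No answer is included in this excerpt, but two details of the specification would explain the output. First, PLY adds string-defined rules to its master regular expression in order of decreasing regex length. The longer t_ID pattern is therefore tried before t_WHILE, and 'while' comes out as an ID. The PLY documentation recommends matching reserved words with the identifier rule and then looking them up in a dictionary. Second, r' \t' is a raw string of three characters: space, backslash and the letter t. The 't' of 'then' is therefore silently ignored, leaving 'hen'; t_ignore = ' \t' (not raw) is what was intended. The sketch below reproduces both effects with plain re so it runs without PLY installed. tokenize, TOKEN_RE, RESERVED and the *_IGNORE constants are hypothetical names, and the scanning loop is a simplification of what PLY does.

```python
import re

RAW_IGNORE = r' \t'    # raw string: space, backslash and the letter t
FIXED_IGNORE = ' \t'   # space and tab, which is what t_ignore should contain

# Reserved words are matched by the identifier rule and then looked up here.
RESERVED = {'while': 'WHILE', 'then': 'THEN'}

TOKEN_RE = re.compile(
    r'(?P<ID>[a-zA-Z_][a-zA-Z0-9_]*)'
    r'|(?P<NUMBER>\d+)'
    r'|(?P<LESSEQUAL><=)'
    r'|(?P<ASSIGN>=)'
)


def tokenize(text, ignore):
    """Return (type, value, position) tuples, skipping characters listed in ignore."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in ignore:
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f'illegal character {text[pos]!r} at {pos}')
        kind, value = match.lastgroup, match.group()
        if kind == 'ID':
            kind = RESERVED.get(value, 'ID')  # reserved-word lookup
        tokens.append((kind, value, pos))
        pos = match.end()
    return tokens


if __name__ == '__main__':
    source = 'while n <= 0 then h = 1'
    print(tokenize(source, RAW_IGNORE))    # contains ('ID', 'hen', 14)
    print(tokenize(source, FIXED_IGNORE))  # contains ('THEN', 'then', 13)
```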

Are there any differences between the terms parse tree and derivation tree?

断了今生、忘了曾经 submitted on 2019-12-18 12:25:15
Question: The terms AST (abstract syntax tree), parse tree and derivation tree are bandied about by different people when referring to the result of parsing text that conforms to a grammar. Assuming we are talking about parsing computer languages, are the differences small enough that we can use these terms interchangeably? If not, how do we use the terms correctly?

Answer 1: AFAIK, "derivation tree" and "parse tree" are the same. Abstract syntax tree: In computer science, an abstract syntax tree (AST),
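To make the distinction concrete, the sketch below parses 1+2*3 with a tiny hypothetical grammar and builds both trees at once. The parse tree records every grammar symbol used in the derivation, including the operator tokens and chains such as expr → term → NUMBER. The AST keeps only the operators and their operands. The grammar is right-recursive to keep the recursive-descent code short, which makes + and * right-associative; that does not matter for this example. The function names are illustrative.

```python
import re

# Hypothetical grammar (right-recursive so recursive descent stays simple):
#   expr -> term '+' expr | term
#   term -> NUMBER '*' term | NUMBER


def parse_expr(toks, i):
    """Return (parse_tree, ast, next_index) for an expr starting at toks[i]."""
    t_tree, t_ast, i = parse_term(toks, i)
    if i < len(toks) and toks[i] == '+':
        e_tree, e_ast, i = parse_expr(toks, i + 1)
        return ('expr', [t_tree, '+', e_tree]), ('+', t_ast, e_ast), i
    return ('expr', [t_tree]), t_ast, i


def parse_term(toks, i):
    """Return (parse_tree, ast, next_index) for a term starting at toks[i]."""
    if i >= len(toks) or not toks[i].isdigit():
        raise SyntaxError('expected a number')
    number = toks[i]
    if i + 1 < len(toks) and toks[i + 1] == '*':
        t_tree, t_ast, j = parse_term(toks, i + 2)
        return ('term', [number, '*', t_tree]), ('*', int(number), t_ast), j
    return ('term', [number]), int(number), i + 1


def parse(text):
    """Parse text and return (parse_tree, ast)."""
    toks = re.findall(r'\d+|[+*]', text)
    tree, ast, end = parse_expr(toks, 0)
    if end != len(toks):
        raise SyntaxError(f'unexpected token {toks[end]!r}')
    return tree, ast


if __name__ == '__main__':
    tree, ast = parse('1+2*3')
    print(tree)
    print(ast)   # ('+', 1, ('*', 2, 3))
```

The parse tree contains expr and term nodes that exist only because of how the grammar is written. The AST drops them and keeps just the structure needed to evaluate the expression.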

Simple Flex/Bison C++

三世轮回 submitted on 2019-12-18 11:50:49
Question: I already looked for an answer, but I didn't find a quick response with a simple example. I want to compile a flex/bison scanner+parser with g++, because I want to use C++ classes to build an AST and similar things. Searching the internet, I've found some tricks, all saying that the only thing needed is to declare some function prototypes using extern "C" in the lex file. So my shady.y file is

    %{
    #include <stdio.h>
    #include "opcodes.h"
    #include "utils.h"

    void yyerror(const char *s) {
        fprintf

Shift Reduce Conflict

99封情书 submitted on 2019-12-18 09:38:48
Question: I'm having trouble fixing a shift/reduce conflict in my grammar. I added -v to read the report on the issue, and it points me to State 0 and mentions that my INT and FLOAT are reduced to variable_definitions by rule 9. I cannot see the conflict and I'm having trouble finding a solution.

    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token INT FLOAT
    %token ADDOP MULOP INCOP
    %token WHILE IF ELSE RETURN
    %token NUM ID
    %token INCLUDE
    %token STREAMIN ENDL STREAMOUT
    %token CIN COUT
    %token

yytext contains characters not in match

谁都会走 submitted on 2019-12-18 09:31:19
Question:

Background

I am using flex to generate a lexer for a programming language I am implementing. I have some problems with this rule for identifiers:

    [a-zA-Z_][a-zA-Z_0-9]* {
        printf("yytext is %s\n", yytext);
        yylval.s = yytext;
        return TOK_IDENTIFIER;
    }

The rule works as it should when my parser is parsing expressions like this:

    var0 = var1 + var2;

The printf statement prints this:

    yytext is 'var0'
    yytext is 'var1'
    yytext is 'var2'

which is what it should print.

The problem

But when my parser is