lex

Add error checking via production rules to LALR(1) grammar to handle all inputs

北慕城南 submitted on 2019-12-06 08:05:58
I have a grammar that represents expressions. Let's say for simplicity it's:
    S -> E
    E -> T + E | T
    T -> P * T | P
    P -> a | (E)
with a, +, *, ( and ) being the letters in my alphabet. The above rules generate valid arithmetic expressions containing parentheses, multiplication and addition, with the proper order of operations and associativity. My goal is to accept every string containing 0 or more of the letters of my alphabet. Here are my constraints: The grammar must "accept" all strings containing 0 or more letters of my alphabet. New terminals may be introduced and inserted into the
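To make the starting point concrete, here is a minimal sketch of a recognizer for the grammar in the question, written as a hand-coded recursive-descent check in C rather than as an LALR(1) parser. It only shows which strings the unmodified grammar accepts, that is, the set that any added error productions would have to extend to all strings. The function names (`parse_E`, `parse_T`, `parse_P`, `is_valid_expression`) are hypothetical and do not come from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Grammar from the question:
 *   S -> E
 *   E -> T + E | T
 *   T -> P * T | P
 *   P -> a | ( E )
 * Each parse_* function returns the position just after the phrase it
 * recognized, or NULL if no phrase of that kind starts at p. */
static const char *parse_E(const char *p);

static const char *parse_P(const char *p) {
    if (*p == 'a') return p + 1;                 /* P -> a */
    if (*p == '(') {                             /* P -> ( E ) */
        const char *q = parse_E(p + 1);
        return (q != NULL && *q == ')') ? q + 1 : NULL;
    }
    return NULL;
}

static const char *parse_T(const char *p) {
    const char *q = parse_P(p);
    if (q == NULL) return NULL;
    if (*q == '*') return parse_T(q + 1);        /* T -> P * T */
    return q;                                    /* T -> P */
}

static const char *parse_E(const char *p) {
    const char *q = parse_T(p);
    if (q == NULL) return NULL;
    if (*q == '+') return parse_E(q + 1);        /* E -> T + E */
    return q;                                    /* E -> T */
}

/* Returns 1 if the whole string s is derivable from S, 0 otherwise. */
int is_valid_expression(const char *s) {
    assert(s != NULL);
    const char *end = parse_E(s);
    return end != NULL && *end == '\0';
}
```

With this sketch, `is_valid_expression("a+a*(a+a)")` reports a match, while inputs such as `"a+"`, `"a)"` or the empty string are rejected; those rejected inputs are the ones the question wants the extended grammar to accept as well.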

Lex: identifier vs integer

元气小坏坏 submitted on 2019-12-06 02:21:33
I'm trying to create my own simple programming language. For this I need to insert some regexes into Lex. I'm using the following rules to match identifiers and integers:
    [a-zA-Z][a-zA-Z0-9]*   /* identifier */ return IDENTIFIER;
    ("+"|"-")?[0-9]+       /* integer */ return INTEGER;
Now when I check, for example, an illegal identifier like:
    0a = 1;
the leading zero is recognized as an integer, followed by the 'a' recognized as an identifier. Instead, I want the token '0a' to be recognized as an illegal character. How do I include this functionality? What regex do I have to adjust? The easiest way to
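One possible approach relies on the fact that Lex and Flex prefer the longest match: a pattern covering digits that run straight into identifier characters would win over the integer rule for input like `0a`. The hand-written C sketch below imitates that longest-match behaviour for the two rules in the question plus such an "illegal" case. It is an illustration only, not generated scanner code; the `TOK_*` names, `struct token` and `next_token` are hypothetical, and the sketch does not skip whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum token_kind { TOK_END, TOK_IDENTIFIER, TOK_INTEGER, TOK_ILLEGAL };

struct token {
    enum token_kind kind;
    size_t length;          /* number of characters consumed */
};

static int is_alpha(char c) { return isalpha((unsigned char)c) != 0; }
static int is_digit(char c) { return isdigit((unsigned char)c) != 0; }
static int is_alnum(char c) { return isalnum((unsigned char)c) != 0; }

/* Scans one token at the start of s, always taking the longest match. */
struct token next_token(const char *s) {
    assert(s != NULL);
    struct token t = { TOK_ILLEGAL, 1 };   /* default: one illegal character */
    size_t i = 0;

    if (s[0] == '\0') { t.kind = TOK_END; t.length = 0; return t; }

    if (is_alpha(s[0])) {                  /* [a-zA-Z][a-zA-Z0-9]* */
        while (is_alnum(s[i])) i++;
        t.kind = TOK_IDENTIFIER;
        t.length = i;
        return t;
    }

    /* Optional sign, but only when a digit follows it. */
    if ((s[0] == '+' || s[0] == '-') && is_digit(s[1])) i = 1;

    if (is_digit(s[i])) {                  /* ("+"|"-")?[0-9]+ */
        while (is_digit(s[i])) i++;
        if (is_alpha(s[i])) {              /* digits glued to letters, e.g. 0a */
            while (is_alnum(s[i])) i++;
            t.kind = TOK_ILLEGAL;
        } else {
            t.kind = TOK_INTEGER;
        }
        t.length = i;
        return t;
    }
    return t;
}
```

For the input `0a = 1;` this sketch consumes `0a` as a single two-character illegal token instead of splitting it into an integer and an identifier.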

Why is yylval null?

拈花ヽ惹草 submitted on 2019-12-06 01:00:11
I'm trying to write my first parser with Flex & Bison. When parsing numbers, I'm trying to save their values into the yylval structure. The problem is that yylval is null when the lexer reaches a number, which causes a segmentation fault. (Related point of confusion: why is it that in most Flex examples (e.g. here), yylval is a structure rather than a pointer to a structure? I couldn't get yylval to be recognized in test.l without %option bison-bridge, and that option made yylval a pointer. Also, I tried initializing yylval in main of test.y, but yylval = malloc(...) gives a type mismatch-- as
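As far as I understand it, `%option bison-bridge` makes the generated scanner receive a `YYSTYPE *` from its caller, and Bison only passes such a pointer when the parser is generated as a pure parser; otherwise nothing supplies the storage. The C sketch below imitates that calling convention with hypothetical stand-ins (`YYSTYPE` here is a toy union, and `lex_number`, `parse_one_number` and the `NUMBER` value are invented for illustration, not generated code).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for definitions a generated parser would provide. */
typedef union { int ival; } YYSTYPE;
enum { NUMBER = 258 };

/* Bridge-style scanner action: the semantic value is written through a
 * pointer that the caller must supply. */
int lex_number(const char *text, YYSTYPE *lvalp) {
    assert(text != NULL);
    if (lvalp == NULL) return -1;   /* nobody supplied storage for the value */
    lvalp->ival = atoi(text);
    return NUMBER;
}

/* Pure-parser-style caller: owns the YYSTYPE object and passes its address. */
int parse_one_number(const char *text) {
    YYSTYPE lval;
    if (lex_number(text, &lval) != NUMBER) return -1;
    return lval.ival;
}
```

The design point is that the caller owns a `YYSTYPE` object and hands its address to the scanner, so no `malloc` is involved. If the caller passes nothing usable, the scanner has nowhere to store the value, which would match the null pointer described in the question.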

Boost spirit lex write token value back to input stream

橙三吉。 submitted on 2019-12-05 20:01:21
I'm wondering if there's a way in boost::spirit::lex to write a token value back to the input stream (possibly after editing) and rescan it. What I'm basically looking for is functionality like that offered by unput() in Flex. Thanks!
Sounds like you just want to accept tokens in different orders but with the same meaning. Without further ado, here is a complete sample that shows how this would be done, exposing the identifier regardless of input order.
Output:
    Input 'abc(' Parsed as: '(abc'
    Input '(abc' Parsed as: '(abc'
Code:
    #include <boost/spirit/include/qi.hpp>
    #include <boost

Flex rule with a period “.” is not compiling

人走茶凉 submitted on 2019-12-05 17:56:57
I am facing a problem compiling this regular expression with flex:
    "on"[ \t\r]*[.\n]{0,300}"."[ \t\r]*[.\n]{0,300}"from" {counter++;}
I had 100 rules in the rules section of my flex specification file. I tried to compile it with:
    flex -Ce -Ca rule.flex
I waited 10 hours and it still hadn't completed, so I killed it. I then narrowed the problem down to this rule. If I remove it from the 100 rules, flex takes 21 seconds to generate the C code. If I replace the period with some other character, it compiles successfully, e.g.:
    "on"[ \t\r]*[.\n]{0,300}"A"[ \t\r]*[.\n]{0,300}"from"

Is there a Sublime Text Syntax for Flex and Bison?

梦想与她 submitted on 2019-12-05 15:09:26
Question: I'm looking for a syntax in Sublime Text that highlights my Flex and Bison files (or lex/yacc) in a way that makes them readable... Sublime Text automatically chooses Lisp for Flex files, but that doesn't do the trick all that well. Any suggestions for another syntax to try? Or is there a useful plugin somewhere (I haven't found anything so far)?
Answer 1: I haven't found one built specifically for Sublime, but I've found one for TextMate, which Sublime is compatible with. Therefore, for Flex

Multiple flex/bison parsers

旧巷老猫 submitted on 2019-12-05 06:44:35
What is the best way to handle multiple Flex/Bison parsers inside a project? I wrote a parser and now I need a second one in the same project. So far, in the third section of parser1.y, I inserted the main(..) method and called yyparse from there. What I want is to have two different parsers (parser1.y and parser2.y) and be able to use them from an external function (let's assume main in main.cpp). What precautions should I take to export the yyparse functions outside the .y files, and how should I handle two parsers? PS. I'm using g++ to compile but not the C++ versions of Flex and Bison and

How do I write a non-greedy match in LEX / FLEX?

我的未来我决定 submitted on 2019-12-05 00:37:20
I'm trying to parse a legacy language (which is similar to 'C') using FLEX and BISON. Everything is working nicely except for matching strings. This rather odd legacy language doesn't support quoting characters in string literals, so the following are all valid string literals:
    "hello"
    ""
    "\"
I'm using the following rule to match string literals:
    \".*\" { yylval.strval = _strdup( yytext ); return LIT_STRING; }
Unfortunately this is a greedy match, so it matches code like the following:
    "hello", "world"
as a single string (hello", "world). The usual non-greedy quantifier .*? doesn't seem to
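Flex patterns, as far as I know, have no lazy quantifiers, so the usual workaround is to forbid the closing delimiter inside the literal, for example with a negated character class along the lines of `\"[^"\n]*\"`. The C sketch below shows the same idea by hand: the literal ends at the first closing quote. The function name `match_string_literal` is hypothetical, and stopping at a newline is an assumption about the legacy language, not something the question states.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the string literal that starts at s (including both
 * quotes), or 0 if s does not start with a complete literal. The literal
 * ends at the FIRST closing quote; a backslash has no special meaning,
 * matching the language described in the question. */
size_t match_string_literal(const char *s) {
    assert(s != NULL);
    if (s[0] != '"') return 0;
    size_t i = 1;
    /* Assumption: a literal may not span lines, so stop at a newline too. */
    while (s[i] != '\0' && s[i] != '"' && s[i] != '\n') i++;
    return (s[i] == '"') ? i + 1 : 0;
}
```

Applied to `"hello", "world"`, the sketch matches only `"hello"` (7 characters) and leaves the rest for the next scan, which is the behaviour the greedy `\".*\"` rule fails to give.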

What is the regex expression for CDATA

浪子不回头ぞ submitted on 2019-12-05 00:34:03
Question: Hi, I have an example CDATA here:
    <![CDATA[asd[f]]]>
and
    <tag1><![CDATA[asd[f]]]></tag1><tag2><![CDATA[asd[f]]]></tag2>
The CDATA regex I have is not able to recognize this:
    "<![CDATA["([^\]]|"]"[^\]]|"]]"[^>])*"]]>"
This does not work either:
    "<![CDATA["[^\]]*[\]]{2,}([^\]>][^\]]*[\]]{2,})*">"
Will someone please give me a regex for <![CDATA[asd[f]]]>? I need to use it in Lex/Flex. I have answered this question, please vote on my answer, thanks.
Answer 1: Easy enough, it should be this: <!\[CDATA\[.*?
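For comparison with the regex attempts above, here is a minimal C sketch of the matching rule itself: a CDATA section starts with `<![CDATA[` and ends at the first `]]>` that follows. This is not the regex from the truncated answer, only a hand-written illustration of what a correct pattern has to accept; `match_cdata` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length of the CDATA section that starts at s, including the
 * opening "<![CDATA[" and the closing "]]>", or 0 if s does not start with
 * a complete CDATA section. */
size_t match_cdata(const char *s) {
    static const char open_marker[] = "<![CDATA[";
    const size_t open_len = sizeof open_marker - 1;

    assert(s != NULL);
    if (strncmp(s, open_marker, open_len) != 0) return 0;

    /* The section ends at the first "]]>" after the opening marker. */
    const char *close_marker = strstr(s + open_len, "]]>");
    if (close_marker == NULL) return 0;
    return (size_t)(close_marker - s) + 3;
}
```

For `<![CDATA[asd[f]]]>` the sketch returns 18: the content `asd[f]` keeps its own trailing `]`, and the first `]]>` closes the section. That extra `]` before the terminator is the case both regexes in the question have to handle.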

error handling in YACC

微笑、不失礼 submitted on 2019-12-04 15:51:13
Hi there, I'm trying to make a simple parser using lex and yacc. The thing is, I want to print my own error messages rather than rely on the error symbol used by yacc, which prints "syntax error". For example, this is my yacc code:
    %{
    #include <stdio.h>
    #include <string.h>
    #include "y.tab.h"
    extern FILE *yyin;
    extern int linenum;
    %}
    %token INTRSW IDENTIFIER INTEGER ASSIGNOP SEMICOLON DOUBLEVAL DOUBLERSW COMMA
    %token IF ELSE WHILE FOR
    %token CLOSE_BRA OPEN_BRA CLOSE_PARA OPEN_PARA EQ LE GE
    %token SUM MINUS MULTIP DIV
    %left OPEN_BRA OPEN_PARA
    %left MULTIP DIV
    %left SUM MINUS
    %union {
        int number;
        char* string;