lex

lex & yacc get current position

末鹿安然 submitted on 2019-12-01 09:55:54
Question: In lex & yacc there is a macro called YY_INPUT that can be redefined, for example like this:

    #define YY_INPUT(buf,result,maxlen) do { \
        const int n = gzread(gz_yyin, buf, maxlen); \
        if (n < 0) { \
            int errNumber = 0; \
            reportError(gzerror(gz_yyin, &errNumber)); \
        } \
        result = n > 0 ? n : YY_NULL; \
    } while (0)

I have a grammar rule that calls the YYACCEPT macro. If I call gztell (or ftell) after YYACCEPT, I get the wrong number, because the parser has already read some unnecessary data. So

Non-Greedy Regular Expression Matching in Flex

爷,独闯天下 submitted on 2019-12-01 05:45:05
I have just started with Flex and can't figure out how to match the following expression:

    "Dog".*"Cat"

Input:

    Dog Ca Cat Cc Cat

Output:

    Dog Ca Cat Cc Cat

But I want non-greedy matching, with the following output:

    Dog Ca Cat

How can this be achieved in Flex?

EDIT: I tried the following:

    %%
    Dog.*Cat/.*Cat  printf("Matched : ||%s||", yytext);
    dog.*cat        printf("Matched : ||%s||", yytext);
    dOg[^c]*cAt     printf("Matched : ||%s||", yytext);
    DOG.*?CAT       printf("Matched : ||%s||", yytext);
    %%

Input:

    Dog Ca Cat Cc Cat
    dog Ca cat Cc cat
    dOg Ca cAt Cc

Emulation of lex like functionality in Perl or Python

落花浮王杯 submitted on 2019-12-01 05:09:19
Question: Here's the deal. Is there a way to tokenize a line based on multiple regexes? For example, I need to get all href tags, their corresponding text, and some other text matched by a different regex. So I have three expressions and would like to tokenize the line and extract the tokens of text matching each expression. I have actually done this using flex (not to be confused with Adobe Flex), which is an implementation of the good old lex. lex provides an elegant way to do this by executing

Flex / Lex Encoding Strings with Escaped Characters

眉间皱痕 submitted on 2019-12-01 00:06:22
I'll refer to this question for some of the background: Regular expression for a string literal in flex/lex

The problem I am having is handling input with escaped characters in my lexer. I think it may be an issue with the encoding of the string, but I'm not sure. Here is how I am handling string literals in my lexer:

    \"(\\.|[^\\"])*\" {
        char* text1 = strndup(yytext + 1, strlen(yytext) - 2);
        char* text2 = "text\n";
        printf("value = <%s> <%x>\n", text1, text1);
        printf("value = <%s> <%x>\n", text2, text2);
    }

This outputs the following:

    value = <text\n"> <15a1bb0>
    value = <text >

Regular expression to recognize variable declarations in C

梦想的初衷 submitted on 2019-11-30 16:12:44
Question: I'm working on a regular expression to recognize variable declarations in C, and I have got this:

    [a-zA-Z_][a-zA-Z0-9]*

Is there a better solution?

Answer 1: A pattern to recognize variable declarations in C. Looking at a conventional declaration, we see:

    int variable;

If that's the case, one should test for the type keyword before anything else, to avoid matching something else, like a string or a constant defined with the preprocessor:

    (?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)

The variable name resides in \1.

Generating a compiler from lex and yacc grammar

南楼画角 submitted on 2019-11-30 14:01:16
I'm trying to generate a compiler so that I can pass it a .c file afterwards. I've downloaded both the YACC and LEX grammars from http://www.quut.com/c/ANSI-C-grammar-y.html and named them clexyacc.l and clexyacc.y.

To generate it, I ran the following in a terminal:

    yacc -d clexyacc.y
    lex clexyacc.l

All went fine. When I move on to the last step, I get a few errors. The last step is:

    cc lex.yy.c y.tab.c -oclexyacc.exe

But I get these errors:

    y.tab.c:2261:16: warning: implicit declaration of function 'yylex' is invalid in C99 [-Wimplicit-function-declaration]
        yychar = YYLEX;
                 ^
    y.tab.c:1617:16: note: expanded from macro
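The warning says that y.tab.c calls yylex() with no prototype in scope. A common fix, sketched here as an assumption about these particular files, is to declare the scanner and error functions in the C prologue of clexyacc.y (the `%{ ... %}` block at the top), which yacc copies into y.tab.c ahead of the generated parser code:

```c
/* Inside the %{ ... %} block at the top of clexyacc.y (sketch) */
#include <stdio.h>

int yylex(void);              /* defined in lex.yy.c, generated from clexyacc.l */
void yyerror(const char *s);  /* called by the generated parser on a syntax error */
```

If the link step then reports undefined symbols such as main, yyerror or yywrap, those still have to be provided, for example a main() that calls yyparse(), or `%option noyywrap` in the .l file when lex is flex. Which of these the downloaded files already contain is not visible from the truncated error output.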

Boost.Spirit: Lex + Qi error reporting

為{幸葍}努か submitted on 2019-11-30 12:03:37
I am writing a parser for quite complicated config files that make use of indentation, etc. I decided to use Lex to break the input into tokens, as it seems to make life easier. The problem is that I cannot find any examples of using Qi's error-reporting tools (on_error) with parsers that operate on a stream of tokens instead of characters. The error handler used with on_error takes some … to be able to indicate exactly where the error is in the input stream. All the examples just construct a std::string from the pair of iterators and print it. But if Lex is used, those iterators are iterators to the

Lex - How to run / compile a lex program on commandline

痞子三分冷 submitted on 2019-11-30 09:48:43
I am very new to Lex and Yacc. I have a Lex program, for example wordcount.l. I am using Windows and PuTTY, and I am just trying to run this file. Does the wordcount.l file go on the C: drive? Do I compile the Lex program so that it generates a .c program, and then what do I run? I tried this on the command line:

    Lex wordcount.l

but I just get "file not found"...

wordcount.l:

    %{
    #include <stdlib.h>
    #include <stdio.h>
    int charCount=0;
    int wordCount=0;
    int lineCount=0;
    %}
    %%
    \n        {charCount++; lineCount++;}
    [^ \t\n]+ {wordCount++; charCount+=yyleng;}
    .         {charCount++;}
    %%
    main(argc, argv)
    int argc;
    char** argv;
    {
    if