lex | 易学教程

The re module

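The re module. I. Regular expressions. A regular expression is itself a small, highly specialized programming language; it is not part of Python. Regular expressions are a powerful tool for processing strings, with their own syntax and an independent processing engine. They may be less efficient than the built-in str methods, but they are far more capable. Because of this, regular expression syntax is largely the same across the languages that provide it; the main difference is how much of the syntax each implementation supports, and the unsupported parts are usually the rarely used ones. If you have already used regular expressions in another language, a quick look is enough to get started. In Python, the built-in re module lets programmers call regular expression matching directly. A regular expression pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C. The figure below shows the flow of matching with a regular expression: Roughly, matching works by taking characters from the expression and from the text in turn and comparing them; if every character matches, the match succeeds, and as soon as one character fails to match, the match fails. If the expression contains quantifiers or boundaries the process differs slightly, but it is still easy to follow; the examples in the figure below and a little practice make it clear. The figure below lists the regular expression metacharacters and syntax that Python supports: 1.1 Greedy and non-greedy quantifiers. Regular expressions are typically used to find matching strings in text. In Python, quantifiers are greedy by default (in a few languages the default may be non-greedy) and always try to match as many characters as possible; non-greedy quantifiers do the opposite and always try to match as few characters as possible. For example: the regular expression "ab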
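The difference between greedy and non-greedy quantifiers described in this excerpt can be seen in a minimal sketch. The patterns and the sample string here are chosen purely for illustration and are not taken from the original article; only `re.search` and the `?` suffix on a quantifier are standard `re` features.

```python
import re

# Sample input chosen for illustration only.
text = "<a><b>"

# Greedy: .+ consumes as many characters as it can while still
# letting the rest of the pattern (the closing ">") match.
greedy = re.search(r"<.+>", text).group()

# Non-greedy: appending ? makes .+ consume as few characters as it can.
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <a><b>
print(lazy)    # <a>
```

The greedy pattern runs to the last `>` in the string, while the non-greedy one stops at the first.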

BISON + FLEX grammar - why tokens are being concatenated together

Question: I would like to understand why Bison is concatenating two tokens in the following rule: stmt: declaration { ... } | assignment { ... } | exp { ... } | ID ';' <-- this rule { ... fprintf(stderr, "\n my id is '%s'", $1); ... If you check the output you will see what I mean. I run my parser and input the characters ab; to the program. According to my Bison grammar this should be parsed as an ID followed by a ;. To some extent that is what happens. However, when I try to use the $1 variable of

How to pass the yytext from the lex file to yacc?

I am facing a simple problem. In my lex file I have something similar to: char *ptr_String; "name = " { BEGIN sName; } <sName>.+ { ptr_String = (char *)calloc(strlen(yytext)+1, sizeof(char)); strcpy(ptr_String, yytext); yylval.sValue = ptr_String; return NAME; } Now in my Yacc file I have something similar to: stmt_Name: NAME { /* Here I need to get the string matched by <sName>.+ and measure its length. */ /* The aim is simply to output the name to the screen and store the length in a global variable. */ } ; Any suggestions? Thanks so much for all your

Make bison reduce to start symbol only if EOF is found

Question: I am using Bison with Flex. I have the following rule in my Yacc input file: program : PROGRAM m2 declarations m0 block {cout << "Success\n";} ; The problem is that if I have a program that is partially correct but is followed by some "garbage" before EOF, the parser reduces according to the rule above, reports "Success", and only then reports an error. I want to include EOF at the end of the rule above, but then Flex would have to return EOF when it reads <<EOF>>, and how would Bison know when

Lex/Flex - Scanning for the EOF character

Question: Other people have had the same problem that I am having, but I can't find anyone who has reported a solution: getting Flex to spot the EOF (end of file). I need Flex to find EOF and return a token indicating that it has found it, so that it can tell Yacc/Bison that the end of an input source file has been reached and a successful parse can be reported. Note that this question is not the same as this one because this is about Lex/Flex. Any help would be awesome. Thank you. Answer 1: Flex has <<EOF>>

How do I remove the following 'implicit declaration of function' warnings?

Question: How do I compile the lex file with gcc without receiving the following warnings? lex.yy.c: In function `yy_init_buffer': lex.yy.c:1688: warning: implicit declaration of function `fileno' lex.l: In function `storeLexeme': lex.l:134: warning: implicit declaration of function `strdup' These are the headers I included. %{ #include <stdio.h> #include <stdlib.h> #include <ctype.h> #include <string.h> %} The function yy_init_buffer is not in the file. The following is the function storeLexeme. int

Is there a Sublime Text Syntax for Flex and Bison?

I'm looking for a syntax in Sublime Text that highlights my Flex and Bison files (or lex/yacc) in a way that makes them readable. Sublime Text automatically chooses Lisp for Flex files, but that doesn't do the trick all that well. Any suggestions for another syntax to try? Or is there a useful plugin somewhere (I haven't found anything so far)? I haven't found one built specifically for Sublime, but I've found one for TextMate, which Sublime is compatible with. Therefore, for Flex highlighting, all you need to do is git clone the TextMate syntax files into your Packages folder. Regarding

Flex / Lex Encoding Strings with Escaped Characters

Question: I'll refer to this question for some of the background: Regular expression for a string literal in flex/lex. The problem I am having is handling input with escaped characters in my lexer. I think it may be an issue with the encoding of the string, but I'm not sure. Here is how I am handling string literals in my lexer: \"(\\.|[^\\"])*\" { char* text1 = strndup(yytext + 1, strlen(yytext) - 2); char* text2 = "text\n"; printf("value = <%s> <%x>\n", text1, text1); printf("value = <

Library to parse ERB files

I am attempting to parse, not evaluate, Rails ERB files in a Hpricot/Nokogiri-like manner. The files I am attempting to parse contain HTML fragments intermixed with dynamic content generated using ERB (standard Rails view files). I am looking for a library that will not only parse the surrounding content, much the way that Hpricot or Nokogiri will, but will also treat the ERB symbols, <%, <%= etc., as though they were HTML/XML tags. Ideally I would get back a DOM-like structure where the <%, <%= etc. symbols would be included as their own node types. I know that it is possible to hack something

How to make lex/flex recognize tokens not separated by whitespace?

I'm taking a course in compiler construction, and my current assignment is to write the lexer for the language we're implementing. I can't figure out how to satisfy the requirement that the lexer must recognize concatenated tokens, that is, tokens not separated by whitespace. For example, the string 39if is supposed to be recognized as the number 39 and the keyword if. At the same time, the lexer must also exit(1) when it encounters invalid input. A simplified version of the code I have: %{ #include <stdio.h> %} %option main warn debug %% if | then | else printf("keyword: %s\n", yytext); [[:digit:]]+