lex

Print tokens properly using Lex and Yacc

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-13 05:32:48
Question: I'm having difficulty printing a sequence of tokens that behaves recursively. To explain better, I will show the corresponding sections of the code. First, the Lex code:

    %{
    #include <stdio.h>
    #include "y.tab.h"
    installID(){ }
    %}
    abreparentese "("
    fechaparentese ")"
    pontoevirgula ";"
    virgula ","
    id {letra}(({letra}|{digito})|({letra}|{digito}|{underline}))*
    digito [0-9]
    letra [a-z|A-Z]
    porreal "%real"
    portexto "%texto"
    porinteiro "%inteiro"
    leia "leia"
    %%
    {abreparentese} { return

Distinguishing identifiers from common strings

Submitted by 风格不统一 on 2019-12-13 04:49:39
Question: I want to write a parser using Bison/Yacc + Lex that can parse statements like:

    VARIABLE_ID = 'STRING'

where:

    ID [a-zA-Z_][a-zA-Z0-9_]*

and:

    STRING [a-zA-Z0-9_]+

So var1 = '123abc' is a valid statement, while 1var = '123abc' isn't. Therefore a VARIABLE_ID is a STRING, but a STRING is not always a VARIABLE_ID. What I would like to know is whether the only way to distinguish between the two is to write a checking procedure at a higher level (i.e. inside the Bison code) or whether I can work it out in the
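The relationship between the two patterns quoted in the question can be checked directly. The sketch below uses Python's `re` module only as a stand-in for the Lex patterns; the names `ID_RE`, `STRING_RE` and `classify` are illustrative and do not come from the original post.

```python
import re

# Patterns copied from the question; Python's re is only a stand-in for Lex here.
ID_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
STRING_RE = re.compile(r"[a-zA-Z0-9_]+")


def classify(text):
    """Return the names of the patterns that match the whole of `text`."""
    kinds = []
    if ID_RE.fullmatch(text):
        kinds.append("ID")
    if STRING_RE.fullmatch(text):
        kinds.append("STRING")
    return kinds


if __name__ == "__main__":
    for sample in ("var1", "1var", "123abc"):
        print(sample, classify(sample))
```

Every text that matches ID also matches STRING, so the characters alone cannot separate the two; something else, such as the surrounding quotes in the statement, has to.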

getter setter as function in python class giving “no attribute found” error

Submitted by 风格不统一 on 2019-12-13 03:59:53
Question:

    import operator
    import re
    from ply import lex, yacc

    class Lexer(object):
        tokens = [
            'COMMA', 'TILDE', 'PARAM', 'LP', 'RP', 'FUNC'
        ]

        # Regular expression rules for simple tokens
        t_COMMA = r'\,'
        t_TILDE = r'\~'
        t_PARAM = r'[^\s\(\),&:\"\'~]+'

        def __init__(self, dict_obj):
            self.dict_obj = dict_obj

        def t_LP(self, t):
            r'\('
            return t

        def t_RP(self, t):
            r'\)'
            return t

        def t_FUNC(self, t):
            # I want to generate token for this FUNC from the keys of model map
            # For eg: r'key1|key2'
            r'(?i)FUNC'
            return t
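The comment in `t_FUNC` asks for a pattern of the form `r'key1|key2'` built from dictionary keys. A minimal sketch of that string-building step, using only the standard library, might look like the following; `build_func_pattern` and `model_map` are hypothetical names, and how the resulting string is attached to the PLY rule is not covered here.

```python
import re


def build_func_pattern(dict_obj):
    """Build an alternation such as 'key10|key1|key2' from the keys of dict_obj."""
    # Longest keys first, so that 'key10' is tried before its prefix 'key1'.
    keys = sorted(dict_obj, key=len, reverse=True)
    # re.escape guards against keys that contain regex metacharacters.
    return "|".join(re.escape(key) for key in keys)


if __name__ == "__main__":
    model_map = {"key1": None, "key2": None, "key10": None}  # hypothetical data
    pattern = build_func_pattern(model_map)
    print(pattern)
    print(re.match(pattern, "KEY10(", re.IGNORECASE).group(0))
```

Sorting by length matters because Python's alternation takes the first alternative that matches, not the longest one.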

(F) Lex, how do I match negation?

Submitted by 谁都会走 on 2019-12-13 02:06:29
Question: Some language grammars use negations in their rules. For example, the Dart specification uses the following rule:

    ~('\'|'"'|'$'|NEWLINE)

which means: match anything that is not one of the rules inside the parentheses. Now, I know that in flex I can negate character rules (e.g. [^ab]), but some of the rules I want to negate could be more complicated than a single character, so I don't think I could use character rules for that. For example, I may need to negate the sequence '"""' for multiline
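One common way to express "anything up to a multi-character delimiter" without a negation operator is to spell out the partial delimiters that are still allowed. The sketch below does this for a `"""` delimiter, using Python's `re` as a stand-in for flex pattern syntax; `TRIPLE_RE` and `triple_quoted_body` are illustrative names, and this is one possible formulation rather than an answer taken from the original thread.

```python
import re

# Body of a triple-quoted string: a non-quote character, or one or two quotes
# that are followed by a non-quote. Three quotes in a row end the string.
TRIPLE_RE = re.compile(r'"""((?:[^"]|"[^"]|""[^"])*)"""')


def triple_quoted_body(text):
    """Return the body of a triple-quoted string at the start of text, or None."""
    match = TRIPLE_RE.match(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(triple_quoted_body('"""a "quoted" word"""rest'))
    print(triple_quoted_body('"""unterminated'))
```

The same alternation uses only character classes and literals, so it should carry over to flex syntax once the quote characters are escaped for flex.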

Initiating Short circuit rule in Bison for && and || operations

Submitted by 人盡茶涼 on 2019-12-12 12:15:50
Question: I'm programming a simple calculator in Bison & Flex, using C/C++ (the logic is done in Bison, and the C/C++ part is responsible for the data structures, e.g. STL and more). I have the following problem: in my calculator the dollar sign $ means i++ and ++i (both prefix and postfix), e.g.: int y = 3; -> $y = 4 -> y$ = 4. When the user enters int_expression1 && int_expression2, if int_expression1 evaluates to 0 (i.e. false), then I don't want Bison to evaluate int_expression2! For

Flex default rule

Submitted by 随声附和 on 2019-12-12 07:23:48
Question: How do I customize the default action in flex? I found something like <*>, but when I run it, it says "flex scanner jammed". Also, the . rule only adds a rule, so it does not work either. What I want is:

    comment "/*"[^"*/"]*"*/"
    %%
    {comment} return 1;
    {default} return 0;
    <<EOF>> return -1;

Is it possible to change the behavior from matching the longest to matching the first? If so, I would do something like this:

    default (.|\n)*

but because this almost always gives a longer match, it will hide the comment rule.
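A side note on the comment definition quoted in the question: in flex, a bracket expression such as `[^"*/"]` excludes individual characters (here `"`, `*` and `/`); it does not exclude the two-character sequence `*/`. The sketch below illustrates the difference, using Python's `re` as a stand-in for flex pattern syntax. `NAIVE_RE` and `COMMENT_RE` are illustrative names, and the second pattern is a commonly used formulation rather than something taken from the original post.

```python
import re

# Character class from the question: rejects any single ", * or / in the body.
NAIVE_RE = re.compile(r'/\*[^"*/"]*\*/')

# Alternative: a '*' is allowed in the body as long as it is not followed by '/'.
COMMENT_RE = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def first_comment(text):
    """Return the comment at the start of text, or None if there is none."""
    match = COMMENT_RE.match(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = "/* a/b */"
    print(NAIVE_RE.fullmatch(sample) is not None)
    print(COMMENT_RE.fullmatch(sample) is not None)
    print(first_comment("/* x */ y /* z */"))
```

Because the second pattern cannot run past the first `*/`, it stops at the end of the first comment even when a later comment follows on the same input.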

Meaning of “<*>” in lex

Submitted by 只愿长相守 on 2019-12-12 05:23:15
Question: I know we can define some conditions in lex, matching:

    1.<DIRECTIVE>{STRING} {printf("Matching the DIRECTIVE state!");}
    2.<REFERENCE>{INTEGER} {printf("Matching the REFERNCE state!");}
    3.[\n] {printf("Matching the INITIAL state?");}
    4.<*>{DOBULE} {printf("Matching all state include INITIAL? Seem not!");}

How do I use the states in the right way? What is the difference between the conditions on lines 3 and 4? Here is the whole .l file, trimmed down by me so that it now only implements a reference. When I run it, it works well

lex parser not displaying hex correctly

Submitted by ℡╲_俬逩灬. on 2019-12-12 03:39:56
Question: I'm trying to identify hex numbers in a parsed text file, and everything is about 99% accurate. However, I keep having an issue with one particular instance: 0xa98h. Whenever it finds this line, it outputs 0xa98 instead of ignoring it altogether, since it is not valid. I've tried many variations of this code and have yet to find a way to exclude that case.

    [-]?[0][x|X][0-9A-F]+ {cout << yytext << " Number" << endl; }

Answer 1: The pattern for hex numbers does not consider the digits 'a' ... 'f'.
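The point made in the answer can be tried out in isolation. The sketch below uses Python's `re` as a stand-in for the lex pattern; `ORIGINAL_RE`, `HEX_RE` and `is_hex_token` are illustrative names. Checking a whole whitespace-delimited token with `fullmatch` is an assumption made here to show `0xa98h` being rejected; how that rejection is expressed inside the scanner itself is not covered by the excerpt.

```python
import re

# Pattern from the question. Note that [x|X] also accepts a literal '|'.
ORIGINAL_RE = re.compile(r"[-]?[0][x|X][0-9A-F]+")

# Adjusted pattern: lower-case hex digits are allowed and '|' is no longer accepted.
HEX_RE = re.compile(r"-?0[xX][0-9A-Fa-f]+")


def is_hex_token(token):
    """True if the entire token is a hex literal under the adjusted pattern."""
    return HEX_RE.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ("0xa98", "0xa98h", "-0XFF", "0|1F"):
        print(token, ORIGINAL_RE.fullmatch(token) is not None, is_hex_token(token))
```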

How to detect partial unfinished token and join its pieces that are obtained from two consequent portions of input?

Submitted by 可紊 on 2019-12-12 03:36:38
Question: I am writing a toy terminal, where I use Flex to parse the normal text and control sequences that I get from the tty. One detail of the Cocoa machinery is that it reads from the tty in chunks of 1024 bytes, so any token described in my .lex file can at any time be broken into two parts: some bytes of a token are the last bytes of the first 1024-byte chunk, and the remaining bytes are the very first bytes of the next 1024-byte chunk. So I need to somehow: First of all, detect this situation: when a token is split between

My lex pattern doesn't work to match my input file, how to correct it?

Submitted by 烈酒焚心 on 2019-12-12 03:36:33
Question: I have a simple pattern to match: head+content+tail. I have a lex file like the one below:

    $ cat b.l
    %{
    #include<stdio.h>
    %}
    %%
    "12" {printf("head\n");}
    "34" {printf("tail\n");}
    .* {printf("content\n");}
    %%

I expect it to print "head" when it meets "12", "tail" when it meets "34", and "content" for any other contiguous string. So I compile and run it:

    lex b.l && gcc lex.yy.c -ll
    $ echo '12sdaesre34'|a.out
    content

My expectation is that it will print

    head
    content
    tail

But actually it
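The output reported in the question ("content" only) is consistent with lex's rule selection: the rule with the longest match wins, and ties go to the rule listed first. The sketch below is a simplified simulation of that selection in Python, not a reimplementation of lex; `RULES` and `scan` are illustrative names, and each pattern is assumed to be simple enough that Python's `re` finds its longest match.

```python
import re

# Rules in the order they appear in b.l: (pattern, label).
RULES = [
    (re.compile(r"12"), "head"),
    (re.compile(r"34"), "tail"),
    (re.compile(r".*"), "content"),
]


def scan(text):
    """Simplified longest-match scan: longest match wins, ties go to the earlier rule."""
    labels = []
    pos = 0
    while pos < len(text):
        best_len, best_label = 0, None
        for pattern, label in RULES:
            match = pattern.match(text, pos)
            # Strictly greater, so an earlier rule keeps a tie.
            if match and len(match.group(0)) > best_len:
                best_len, best_label = len(match.group(0)), label
        if best_label is None:
            pos += 1  # no rule matched; lex would echo this character by default
            continue
        labels.append(best_label)
        pos += best_len
    return labels


if __name__ == "__main__":
    print(scan("12sdaesre34\n"))
    print(scan("12"))
```

At the start of the input, `.*` matches all eleven characters of the line while `"12"` matches only two, so the `.*` rule is chosen and the head and tail rules never get a chance.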