lexical-analysis

How to write simple parser for if and while statements? [closed]

折月煮酒 posted on 2019-12-02 11:17:30
I need to write a simple parser that converts tokens into a parse tree. I have already written a lexical analyzer that returns the tokens. Now I want to write rules for "if" and "while" statements (to start with), so that I can pass these rules to the parser and it builds a tree. I need to write the parser in such a way that I can add new rules later. Can you advise me how to implement this in C#? Can you give me an example?

In a recursive descent parser these statements are easy to implement once you have the normal block and expression parsers. In pseudo-code, they are basically: void ParseIf()

How to define a Regex in StandardTokenParsers to identify path?

允我心安 posted on 2019-12-02 08:37:06
I am writing a parser in which I want to parse arithmetic expressions like:

    /hdfs://xxx.xx.xx.x:xxxx/path1/file1.jpg+1

I want to parse it, convert the infix form to postfix, and do the calculation. I reused part of the code from another discussion as well.

    class InfixToPostfix extends StandardTokenParsers {
      import lexical._

      def regexStringLit(r: Regex): Parser[String] = acceptMatch(
        "string literal matching regex " + r,
        { case StringLit(s) if r.unapplySeq(s).isDefined => s })

      def pathIdent: Parser[String] =
        regexStringLit("""/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))""".r)

      lexical.delimiters ++= List

How to use Finite Automaton to implement a scanner

时光怂恿深爱的人放手 posted on 2019-12-02 05:39:17
I'm building a simple scanner. Suppose I have the following tokens defined for my language: !, !=, !==, <, <<, {. I can specify them using a regular expression:

    !=?=? | { | <<?

Then I used http://hackingoff.com to build an NFA and a DFA. Each machine can now determine whether an input is in the language of the regexp or not. But my program is a sequence of tokens, not one token:

    !!=!<!==<<!{

My question is how I should use the machines to split the string into tokens. I'm interested in the approach rather than the implementation.

The most common rule is "maximal munch", which always selects the longest

How to use backslash escape char for new line in JavaCC?

邮差的信 posted on 2019-12-01 23:30:55
I have an assignment to create a lexical analyser and I've got everything working except for one thing. I need to create a string token that will accept a new line, and the string is delimited by double quotes. The string accepts any number, letter, some specified punctuation, backslashes and double quotes within the delimiters. I can't seem to figure out how to escape a new-line character. Is there a certain way of escaping characters like new line and tab? Here's some of my code that might help:

    < STRING : ( < QUOTE > ( < QUOTE > | < BACKSLASH > | < ID > | < NUM > | " " )* < QUOTE > ) >
    < #QUOTE : "\"" >
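In JavaCC token definitions, Java-style escapes such as "\n" and "\t" are legal inside string literals, so newline and tab can be named as ordinary private tokens. A sketch reusing the question's token names (the exact productions depend on the assignment's grammar, so treat this as illustrative):

```
< #ESCAPE  : "\\" ( "n" | "t" | "\"" | "\\" ) >
< #NEWLINE : "\n" | "\r\n" >
< STRING   : < QUOTE > ( < ID > | < NUM > | < ESCAPE > | < NEWLINE > | " " )* < QUOTE > >
```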

lex & yacc get current position

断了今生、忘了曾经 posted on 2019-12-01 12:59:17
In lex & yacc there is a macro called YY_INPUT which can be redefined, for example like this:

    #define YY_INPUT(buf, result, maxlen) do {              \
        const int n = gzread(gz_yyin, buf, maxlen);         \
        if (n < 0) {                                        \
            int errNumber = 0;                              \
            reportError(gzerror(gz_yyin, &errNumber));      \
        }                                                   \
        result = n > 0 ? n : YY_NULL;                       \
    } while (0)

I have a grammar rule that calls the YYACCEPT macro. If I call gztell (or ftell) after YYACCEPT, I get a wrong number, because the parser has already read some unnecessary data. So how can I get the current position when I have a rule that calls YYACCEPT in it (one bad solution would be
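One common workaround (a hedged sketch, not quoted from the thread; `g_offset` is an invented name) is to keep the position bookkeeping in the scanner itself. flex invokes YY_USER_ACTION before every rule's action, so a running byte offset of consumed tokens can be maintained from yyleng, independently of how far ahead gzread has buffered:

```c
/* Running end offset of the last token the scanner actually matched. */
static long g_offset = 0;                 /* invented name, not a flex symbol */

/* YY_USER_ACTION runs before each rule's action; yyleng is the match length. */
#define YY_USER_ACTION  g_offset += yyleng;
```

After YYACCEPT, `g_offset` reflects the end of the last consumed token, whereas gztell/ftell report how far the read-ahead got.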

In function ‘yylex’: ‘variable’ undeclared

只愿长相守 posted on 2019-12-01 06:19:21
Question: I am working on lexical analysis. For this I am using Flex, and I ran into the following problem.

work.l:

    int cnt = 0, num_lines = 0, num_chars = 0;   // Problem here.
    %%
    [" "]+[a-zA-Z0-9]+   { ++cnt; }
    \n                   { ++num_lines; ++num_chars; }
    .                    { ++num_chars; }
    %%
    int yywrap() { return 1; }
    int main() {
        yyin = freopen("in.txt", "r", stdin);
        yylex();
        printf("%d %d %d\n", cnt, num_lines, num_chars);
        return 0;
    }

Then I use the following command; it works properly and creates lex.yy.c:

    Rezwans-iMac:laqb-2 rezwan$ flex work.l
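The likely cause, sketched here as the common fix rather than quoted from an answer: C declarations in a .l file's definitions section must be wrapped in %{ ... %} (or indented) to be copied into the generated scanner. Left at column zero, they never reach lex.yy.c, so cnt, num_lines and num_chars are undeclared inside yylex:

```
%{
int cnt = 0, num_lines = 0, num_chars = 0;
%}
%%
/* rules unchanged */
```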

Writing re-entrant lexer with Flex

為{幸葍}努か posted on 2019-12-01 05:11:46
I'm a newbie to flex. I'm trying to write a simple re-entrant lexer/scanner with flex. The lexer definition is below. I get stuck with compilation errors, as shown below (a yyg issue).

reentrant.l:

    /* Definitions */
    digit       [0-9]
    letter      [a-zA-Z]
    alphanum    [a-zA-Z0-9]
    identifier  [a-zA-Z_][a-zA-Z0-9_]+
    integer     [0-9]+
    natural     [0-9]*[1-9][0-9]*
    decimal     ([0-9]+\.|\.[0-9]+|[0-9]+\.[0-9]+)

    %{
    #include <stdio.h>
    #define ECHO fwrite(yytext, yyleng, 1, yyout)
    int totalNums = 0;
    %}

    %option reentrant
    %option prefix="simpleit_"

    %%
    ^(.*)\r?\n    printf("%d\t%s", yylineno++, yytext);
    %%

    /* Routines */
    int yywrap
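In a reentrant scanner, macros such as yytext, yyleng and yylineno expand to fields of the internal scanner state (the yyg the error message mentions), which only exists inside functions that receive the yyscan_t handle. Inside rule actions that handle is in scope, but user routines such as yywrap must declare it explicitly; and with %option prefix="simpleit_" the function is renamed as well. A sketch of the shape the routines section needs (adapted to the question's options, not taken from a posted answer):

```
%%
^(.*)\r?\n    printf("%d\t%s", yylineno++, yytext);   /* ok: yyg in scope */
%%
/* Routines: every entry point takes the scanner handle in reentrant mode. */
int simpleit_wrap(yyscan_t yyscanner) { return 1; }
```

Alternatively, %option noyywrap removes the need to define the function at all.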

Profiler/Analyzer for Erlang?

℡╲_俬逩灬. posted on 2019-12-01 02:37:30
Are there any good code profilers/analyzers for Erlang? I need something that can build a call graph for my code.

For static code analysis you have xref and Dialyzer; for profiling you can use cprof, fprof or eprof. You can get a good reference here ...

The fprof module includes profiling features. From the fprof module documentation:

    fprof:apply(foo, create_file_slow, [junk, 1024]).
    fprof:profile().
    fprof:analyse().

fprof:apply (or trace) runs the function, profile converts the trace file into something useful, and analyse prints out the summary. This will give you a list of function calls

How to make a flex (lexical scanner) to read UTF-8 characters input?

自作多情 posted on 2019-11-30 23:57:11
It seems that flex doesn't support UTF-8 input. Whenever the scanner encounters a non-ASCII character, it stops scanning as if it had hit an EOF. Is there a way to force flex to eat my UTF-8 chars? I don't want it to actually match UTF-8 characters, just eat them when using the '.' pattern. Any suggestions?

EDIT: The simplest solution would be:

    ANY    [\x00-\xff]

and to use {ANY} instead of '.' in my rules.

I have been looking into this myself and reading the Flex mailing list to see if anyone has thought about it. Getting flex to read Unicode is a complex affair ... UTF-8 encoding can be done, and most other
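The byte-range trick above consumes input one byte at a time. A definition that eats one well-formed UTF-8 code point at a time can also be written with byte classes. This is a folklore sketch seen in various flex/Unicode discussions, not quoted from this thread, and it does not reject every ill-formed sequence (overlong encodings, for instance):

```
UTF8CONT   [\x80-\xbf]
UTF8CHAR   ([\x00-\x7f]|[\xc2-\xdf]{UTF8CONT}|[\xe0-\xef]{UTF8CONT}{UTF8CONT}|[\xf0-\xf4]{UTF8CONT}{UTF8CONT}{UTF8CONT})
```

Use {UTF8CHAR} wherever '.' is meant to mean "one character" rather than "one byte".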