lexical-analysis

How to turn a token stream into a parse tree [closed]

Submitted by a 夏天 on 2019-12-05 11:39:59
I have a lexer built that streams out tokens from an input, but I'm not sure how to build the next step in the process: the parse tree. Does anybody have any good resources or examples on how to accomplish this? I would really recommend http://www.antlr.org/ and, of course, the classic Dragon compilers book. For an easy language like JavaScript it's not hard to hand-roll a recursive descent parser, but it's almost
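The hand-rolled recursive descent parser the answer mentions can be made concrete. Below is a minimal sketch in Python; the grammar, token shapes, and tuple-based tree nodes are my own choices for illustration, not from the answer. The pattern is one method per grammar rule, each consuming tokens and returning a subtree:

```python
import re

# Sketch of a recursive descent parser for the grammar:
#   expr   = term (('+'|'-') term)*
#   term   = factor (('*'|'/') factor)*
#   factor = NUMBER | '(' expr ')'

def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        cur = self.peek()
        if expected is not None and cur != expected:
            raise SyntaxError(f"expected {expected!r}, got {cur!r}")
        self.pos += 1
        return cur

    def expr(self):
        node = self.term()
        while self.peek() in ('+', '-'):
            op = self.eat()
            node = (op, node, self.term())   # tree node as a nested tuple
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ('*', '/'):
            op = self.eat()
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == '(':
            self.eat('(')
            node = self.expr()
            self.eat(')')
            return node
        return ('num', int(self.eat()))

tree = Parser(tokenize("1+2*3")).expr()
print(tree)  # ('+', ('num', 1), ('*', ('num', 2), ('num', 3)))
```

Because `term` calls `factor` and `expr` calls `term`, operator precedence falls out of the call structure with no extra machinery.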

Ignore whitespace with PEG.js

Submitted by 北城余情 on 2019-12-04 15:55:43
Question: I want to ignore whitespace and new lines with my grammar so they are missing from the PEG.js output. Also, a literal within brackets should be returned in a new array.

Grammar:

start = 'a'? sep+ ('cat'/'dog') sep* '(' sep* stmt_list sep* ')'
stmt_list = exp: [a-zA-Z]+ { return new Array(exp.join('')) }
sep = [' '\t\r\n]

Test case: a dog( Harry )

Output: [ "a", [ " " ], "dog", [], "(", [ " " ], [ "Harry" ], [ " " ], ")" ]

Output I want: [ "a", "dog", [ "Harry" ] ]

Answer 1: You have to break up the

Error handling in YACC

Submitted by 微笑、不失礼 on 2019-12-04 15:51:13
Hi there, I'm trying to make a simple parser using lex and yacc. The thing is, I want to print my own error messages rather than the default behavior of yacc, which just prints "syntax error". For example, this is my yacc code:

%{
#include <stdio.h>
#include <string.h>
#include "y.tab.h"
extern FILE *yyin;
extern int linenum;
%}
%token INTRSW IDENTIFIER INTEGER ASSIGNOP SEMICOLON DOUBLEVAL DOUBLERSW COMMA
%token IF ELSE WHILE FOR
%token CLOSE_BRA OPEN_BRA CLOSE_PARA OPEN_PARA EQ LE GE
%token SUM MINUS MULTIP DIV
%left OPEN_BRA OPEN_PARA
%left MULTIP DIV
%left SUM MINUS
%union { int number; char* string;

How to combine Regexp and keywords in Scala parser combinators

Submitted by ≡放荡痞女 on 2019-12-04 13:03:41
I've seen two approaches to building parsers in Scala. The first is to extend RegexParsers and define your own lexical patterns. The issue I see with this is that I don't really understand how it deals with keyword ambiguities. For example, if my keywords match the same pattern as idents, then it processes the keywords as idents. To counter that, I've seen posts like this one that show how to use StandardTokenParsers to specify keywords. But then I don't understand how to specify the regexp patterns! Yes, StandardTokenParsers comes with "ident", but it doesn't come with the other ones

Get character offsets for elements in jsoup

Submitted by 一世执手 on 2019-12-04 12:16:11
Question: I need to map jsoup elements back to specific character offsets in the source HTML. In other words, if I have HTML that looks like this: Hello <br/> World I need to know that "Hello " starts at offset 0 and has a length of 6 characters, <br/> starts at offset 6 and has a length of 5 characters, and so on. I could not find a getter in the Element javadoc that returns this information. Can it be retrieved? Answer 1: I don't believe Jsoup has this functionality. This question seems closer to lexical

Parsing Python function calls to get argument positions

Submitted by 拟墨画扇 on 2019-12-04 07:21:51
I want code that can analyze a function call like this: whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs) and return the positions of each and every argument, in this case foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs. I tried using the _ast module, and it seems to be just the thing for the job, but unfortunately there were problems. For example, for an argument like baz(), which is a function call itself, I couldn't find a simple way to get its length. (And even if I found one, I don't want a bunch of special cases for every different kind of argument.) I
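On Python 3.8 and later, the public `ast` module (which backs `_ast`) handles this uniformly: every expression node carries `col_offset` and `end_col_offset`, and `ast.get_source_segment` recovers the exact source text, with no per-node-type special cases. A sketch:

```python
import ast

src = "whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)"
call = ast.parse(src, mode="eval").body   # the ast.Call node

# Positional (and *starred) arguments live in .args; keyword (and
# **double-starred) arguments live in .keywords as `keyword` nodes.
for node in call.args:
    print(node.col_offset, node.end_col_offset,
          ast.get_source_segment(src, node))

for kw in call.keywords:
    # kw.arg is the keyword name (None for **kwargs); kw.value is the
    # value expression, which carries its own offsets.
    print(kw.value.col_offset, kw.value.end_col_offset,
          ast.get_source_segment(src, kw.value))
```

The first line printed is `9 12 foo`; a nested call like `baz()` needs no special handling because its `end_col_offset` already covers the closing parenthesis.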

How do I write a parser in C or Objective-C without a parser generator?

Submitted by ↘锁芯ラ on 2019-12-04 03:24:28
I am trying to make a calculator in C or Objective-C that accepts a string along the lines of 8/2+4(3*9)^2 and returns the answer 2920. I would prefer not to use a generator like Lex or Yacc; I want to code it from the ground up. How should I go about doing this? Other than the Dragon book, are there any recommended texts that cover this subject matter? Try this: http://en.wikipedia.org/wiki/Shunting-yard_algorithm Dave DeLong's DDMathParser class may save you a lot of time and trouble. If I remember correctly, you can solve this problem with two stacks, one for the operators, the other for
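The two-stack shunting-yard idea the answers point at can be sketched as follows. I've used Python for brevity; the structure maps directly onto C. Note that plain shunting-yard expects explicit operators, so the handling of implicit multiplication in inputs like 4(3*9) is my own addition, done during tokenization:

```python
import re
import operator

# precedence, associativity, implementation for each binary operator
OPS = {'+': (1, 'L', operator.add), '-': (1, 'L', operator.sub),
       '*': (2, 'L', operator.mul), '/': (2, 'L', operator.truediv),
       '^': (3, 'R', operator.pow)}

def tokenize(src):
    raw = re.findall(r"\d+(?:\.\d+)?|[-+*/^()]", src)
    out = []
    for tok in raw:
        # insert an implicit '*' for forms like "4(" or ")("
        if tok == '(' and out and out[-1] not in OPS and out[-1] != '(':
            out.append('*')
        out.append(tok)
    return out

def to_rpn(tokens):
    """Shunting-yard: operator stack + output queue -> postfix."""
    output, stack = [], []
    for tok in tokens:
        if tok in OPS:
            prec, assoc, _ = OPS[tok]
            while (stack and stack[-1] in OPS and
                   (OPS[stack[-1]][0] > prec or
                    (OPS[stack[-1]][0] == prec and assoc == 'L'))):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()   # discard the '('
        else:
            output.append(float(tok))
    while stack:
        output.append(stack.pop())
    return output

def evaluate(rpn):
    """The second stack: evaluate the postfix form."""
    st = []
    for tok in rpn:
        if tok in OPS:
            b, a = st.pop(), st.pop()
            st.append(OPS[tok][2](a, b))
        else:
            st.append(tok)
    return st[0]

print(evaluate(to_rpn(tokenize("8/2+4(3*9)^2"))))  # 2920.0
```

This reproduces the 2920 from the question: 8/2 + 4*(27)^2 = 4 + 4*729.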

How to use yylval with strings in yacc

Submitted by 拥有回忆 on 2019-12-03 16:07:35
I want to pass the actual string of a token. If I have a token called ID, then I want my yacc file to actually know what the ID is called. I think I have to pass a string using yylval from the flex file to the yacc file. How do I do that? See the Flex manual, section 15, "Interfacing with Yacc": One of the main uses of flex is as a companion to the yacc parser generator. yacc parsers expect to call a routine named yylex() to find the next input token. The routine is supposed to return the type of the next token as well as putting any associated value in the global yylval. To

Writing a Z80 assembler - lexing ASM and building a parse tree using composition?

Submitted by 核能气质少年 on 2019-12-03 13:06:02
I'm very new to the concept of writing an assembler, and even after reading a great deal of material I'm still having difficulty wrapping my head around a couple of concepts. What is the process to actually break a source file up into tokens? I believe this process is called lexing, and I've searched high and low for real code examples that make sense, but I can't find a thing, so simple code examples are very welcome ;) When parsing, does information ever need to be passed up or down the tree? The reason I ask is as follows; take: LD BC, nn It needs to be turned into the following parse tree
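As a starting point for the lexing part of the question, here is a minimal regex-driven tokenizer plus a trivial per-line parse step, sketched in Python. The token names and the tuple-shaped output are invented for illustration; a real Z80 assembler also needs labels, directives, comments, and the full set of addressing modes:

```python
import re

# Each token class is a named regex group; first match wins, so NUMBER
# is listed before IDENT. These names are my own, not a standard.
TOKEN_SPEC = [
    ('NUMBER', r'\$[0-9A-Fa-f]+|\d+'),      # $1234 hex or decimal
    ('IDENT',  r'[A-Za-z_][A-Za-z0-9_]*'),  # mnemonics, registers, symbols
    ('COMMA',  r','),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('SKIP',   r'[ \t]+'),                  # whitespace, dropped
]
PATTERN = re.compile('|'.join(f'(?P<{name}>{rx})' for name, rx in TOKEN_SPEC))

def lex(line):
    """Break one source line into (kind, text) tokens."""
    return [(m.lastgroup, m.group())
            for m in PATTERN.finditer(line)
            if m.lastgroup != 'SKIP']

def parse_line(line):
    """First token is the mnemonic; commas separate the operands."""
    toks = lex(line)
    mnemonic = toks[0][1]
    operands, cur = [], []
    for kind, text in toks[1:]:
        if kind == 'COMMA':
            operands.append(cur)
            cur = []
        else:
            cur.append(text)
    if cur:
        operands.append(cur)
    return (mnemonic, operands)

print(lex("LD BC, $1234"))
# [('IDENT', 'LD'), ('IDENT', 'BC'), ('COMMA', ','), ('NUMBER', '$1234')]
print(parse_line("LD BC, nn"))
# ('LD', [['BC'], ['nn']])
```

The `('LD', [['BC'], ['nn']])` shape is one possible flat encoding of the parse tree the question describes: the mnemonic at the root with one child per operand.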

Python - lexical analysis and tokenization

Submitted by ☆樱花仙子☆ on 2019-12-03 10:10:37
Question: I'm looking to speed along my discovery process here quite a bit, as this is my first venture into the world of lexical analysis. Maybe this is even the wrong path. First, I'll describe my problem: I've got very large properties files (on the order of 1,000 properties) which, when distilled, are really just about 15 important properties; the rest can be generated or rarely ever change. So, for example:

general {
    name = myname
    ip = 127.0.0.1
}

component1 {
    key = value
    foo = bar
}

This is
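For a format this small, Python's `re` module is enough to both lex and parse the blocks shown above. The grammar below is inferred from the example alone (name { key = value ... } sections), so treat it as a sketch rather than the asker's actual format:

```python
import re

# Lexer: braces, '=', and bare words; whitespace is skipped.
TOKENS = re.compile(r'(?P<LBRACE>\{)|(?P<RBRACE>\})|(?P<EQ>=)|'
                    r'(?P<WORD>[^\s{}=]+)|(?P<WS>\s+)')

def lex(text):
    for m in TOKENS.finditer(text):
        if m.lastgroup != 'WS':
            yield (m.lastgroup, m.group())

def parse(tokens):
    """Turn the token stream into {section: {key: value}} dicts."""
    tokens = list(tokens)
    result, i = {}, 0
    while i < len(tokens):
        name = tokens[i][1]                 # section name
        assert tokens[i + 1][0] == 'LBRACE'
        i += 2
        section = {}
        while tokens[i][0] != 'RBRACE':
            key = tokens[i][1]
            assert tokens[i + 1][0] == 'EQ'
            section[key] = tokens[i + 2][1]
            i += 3
        result[name] = section
        i += 1                              # skip the '}'
    return result

cfg = parse(lex("general { name = myname ip = 127.0.0.1 }"))
print(cfg)  # {'general': {'name': 'myname', 'ip': '127.0.0.1'}}
```

Once the file is a dict of dicts, deriving the ~15 important properties and regenerating the rest becomes ordinary dictionary manipulation rather than a parsing problem.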