lexer

Non-left-recursive PEG grammar for an “expression”

China☆狼群 submitted on 2019-12-03 12:00:55
It's either a simple identifier (like cow), something surrounded by brackets ((...)), something that looks like a method call (...(...)), or something that looks like a member access (thing.member):
def expr = identifier | "(" ~> expr <~ ")" | expr ~ ("(" ~> expr <~ ")") | expr ~ "." ~ identifier
It's given in Scala Parser Combinator syntax, but it should be pretty straightforward to understand. It's similar to how expressions end up looking in many programming languages (hence the name expr). However, as it stands, it is left-recursive and causes my nice PEG parser to explode. I have not
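The usual way out is to replace the left recursion with repetition: parse one primary expression, then loop over any number of call or member-access suffixes and fold them left to right. Below is a minimal sketch of that rewrite as a hand-rolled Python recursive-descent parser rather than Scala Parser Combinators (in Scala the same shape would be roughly primary ~ rep(suffix) followed by a foldLeft). The tuple-based tree and all function names are illustrative assumptions, not part of the question.

```python
import re

# Grammar with the left recursion replaced by repetition (PEG notation):
#   expr    <- primary suffix*
#   primary <- identifier / "(" expr ")"
#   suffix  <- "(" expr ")" / "." identifier
# Suffixes are folded left to right, so a.b(c).d becomes ((a.b)(c)).d,
# the same left-associative shape the left-recursive rules describe.

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[().])")


def tokenize(text):
    """Split the input into identifiers and the punctuation '(', ')' and '.'."""
    tokens, pos = [], 0
    while text[pos:].strip():
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or tok in ("(", ")", "."):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def primary(self):
        # primary <- identifier / "(" expr ")"
        if self.peek() == "(":
            self.expect("(")
            inner = self.expr()
            self.expect(")")
            return inner
        return ("id", self.identifier())

    def expr(self):
        # expr <- primary suffix*, building the tree left to right
        node = self.primary()
        while self.peek() in ("(", "."):
            if self.peek() == "(":
                self.expect("(")
                arg = self.expr()
                self.expect(")")
                node = ("call", node, arg)
            else:
                self.expect(".")
                node = ("member", node, self.identifier())
        return node


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

Because every path consumes a token before recursing into expr again, the parser can no longer re-enter expr at the same position without progress, which is what made the left-recursive version blow up.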

Lexing partial SQL in C#

烈酒焚心 submitted on 2019-12-03 11:28:46
I need to parse partial SQL queries (it's for a SQL injection auditing tool). For example, '1' AND 1=1-- should break down into tokens like [0] => [SQL_STRING, '1'] [1] => [SQL_AND] [2] => [SQL_INT, 1] [3] => [SQL_AND] [4] => [SQL_INT, 1] [5] => [SQL_COMMENT] [6] => [SQL_QUERY_END]. Are there any existing SQL lexers, at least, that I could base mine on, or any good tools like bison for C#? (I'd rather not write my own grammar, though, as I need to support most if not all of the MySQL 5 grammar.) Matt DeKrey: It seems there are a few good parsers out there. This SO article has a sample using MS's Entity
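The core of such a lexer is language-neutral: an ordered list of token patterns tried at the current position, with keywords recognised after an identifier has been scanned. The sketch below is in Python rather than C#, covers only a tiny subset of MySQL, and is not an existing library. SQL_STRING, SQL_AND, SQL_INT, SQL_COMMENT and SQL_QUERY_END follow the question; the other token names (SQL_EQ for =, SQL_IDENT, SQL_OP and the extra keywords) are hypothetical.

```python
import re

# Ordered token patterns; Python's alternation is leftmost-first, so order matters
# (comments must come before any rule that could claim '-' or '/').
TOKEN_SPEC = [
    # MySQL itself requires whitespace after '--'; this sketch accepts '--'
    # without it so that the question's example ends in a comment.
    ("SQL_COMMENT", r"--[^\n]*|#[^\n]*|/\*.*?\*/"),
    ("SQL_STRING", r"'(?:[^'\\]|\\.|'')*'"),
    ("SQL_INT", r"[0-9]+"),
    ("SQL_WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SQL_EQ", r"="),
    ("SQL_OP", r"<>|!=|<=|>=|[<>+\-*/,().;]"),
    ("SQL_WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL)

# Hypothetical, deliberately tiny keyword table.
KEYWORDS = {"AND": "SQL_AND", "OR": "SQL_OR", "SELECT": "SQL_SELECT", "UNION": "SQL_UNION"}


def lex_sql(text):
    """Return (token_type, value) pairs; value is None for valueless tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SQL_WS":
            continue
        if kind == "SQL_STRING":
            tokens.append((kind, lexeme[1:-1]))  # raw contents, escapes not decoded
        elif kind == "SQL_INT":
            tokens.append((kind, int(lexeme)))
        elif kind == "SQL_WORD":
            upper = lexeme.upper()
            if upper in KEYWORDS:
                tokens.append((KEYWORDS[upper], None))
            else:
                tokens.append(("SQL_IDENT", lexeme))
        elif kind == "SQL_OP":
            tokens.append((kind, lexeme))
        else:  # SQL_EQ, SQL_COMMENT
            tokens.append((kind, None))
    tokens.append(("SQL_QUERY_END", None))
    return tokens
```

Partial input is the hard part for an auditing tool: a fragment such as admin' OR 'a'='a begins inside a string literal, so a real tool would probably need to lex under more than one assumption about the surrounding quote state.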

How to write a (shell) lexer by hand

混江龙づ霸主 submitted on 2019-12-03 10:07:58
I'm working on a shell, a small bash-like shell without scripting (if, while, ...). I have to write the lexer/parser (LL) by hand. The lexer will transform the command (char *cmd) into a linked list (t_list *list), and the LL parser will transform the linked list (t_list *list) into an AST (binary tree t_btree *root) according to a grammar. I know how to write the LL parser, but I don't know how to tokenize my command. For example: ps | grep ls >> file ; make && ./a.out => 'ps' '|' 'grep' 'ls' '>

How do I implement a lexer given that I have already implemented a basic regular expression matcher?

折月煮酒 submitted on 2019-12-03 08:32:54
I'm trying to implement a lexer for fun. I have already implemented a basic regular expression matcher (by first converting a pattern to an NFA and then to a DFA). Now I'm clueless about how to proceed. My lexer would take a list of tokens and their corresponding regexes. What is the general algorithm used to create a lexer out of this? I thought about OR-ing all the regexes, but then I can't identify which specific token was matched. Even if I extend my regex module to return the pattern matched when a match is successful, how do I implement lookahead in the matcher? Assuming you have a
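A common answer to both sub-questions is the longest-match ("maximal munch") rule: at each position, find the token pattern that consumes the most characters, and break ties in favour of the rule listed first. Knowing which token matched then falls out of testing each rule (or, in a combined DFA, of tagging each accepting state with its rule), and the "lookahead" amounts to continuing the scan past an accepting state while remembering the last accepting position. The sketch below uses Python's re module as a stand-in for the asker's own NFA/DFA matcher, which is an assumption; the rule set is made up for illustration.

```python
import re

# Hypothetical rules, in priority order. Each pattern stands in for one of the
# asker's own DFAs: all the lexer needs is "how many characters match here?".
RULES = [
    ("IF", re.compile(r"if")),
    ("IDENT", re.compile(r"[a-z]+")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("EQ", re.compile(r"==")),
    ("ASSIGN", re.compile(r"=")),
    ("WS", re.compile(r"\s+")),
]


def match_length(pattern, text, pos):
    # A DFA would report its longest accepting prefix; Python's backtracking
    # re returns its first match, which is the longest for these simple patterns.
    m = pattern.match(text, pos)
    return m.end() - pos if m else 0


def lex_longest(text, rules=RULES, skip=("WS",)):
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            length = match_length(pattern, text, pos)
            # Strictly longer wins, so on a tie the earlier rule keeps priority.
            if length > best_len:
                best_name, best_len = name, length
        if best_len == 0:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

Here "if" becomes IF (tie with IDENT, earlier rule wins) while "iffy" becomes IDENT (longer match wins), and "==" beats "=". Trying every rule separately costs one scan per rule at each position; building a single DFA from all patterns, with each accepting state labeled by its highest-priority rule, does the same work in one pass, which is how generators like flex typically operate.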

How can we get the Syntax Tree of TypeScript?

别等时光非礼了梦想. submitted on 2019-12-03 06:53:30
Is there a process for getting the syntax tree of a compiler? We have been assigned a project that needs to access TypeScript's syntax tree (the compiler is open source, so we can see all of its code), but we don't know how to get it. I've been reading some articles on the Internet, but I can't find a user-friendly article or one written in layman's terms. I believe some mentioned that the first step is to find the parsing step, but after that we had no idea what to

Does C# have a (direct) flex/yacc port? Or what lexer/parser do people use for C#? [closed]

无人久伴 submitted on 2019-12-03 02:48:38
I might be wrong, but it looks like there's no direct flex/bison (lex/yacc) port for C#/.NET so far. For an LALR parser, I found

Should I use a lexer when using a parser combinator library like Parsec?

前提是你 submitted on 2019-12-03 01:52:15
When writing a parser in a parser combinator library like Haskell's Parsec, you usually have two choices: (1) write a lexer to split your String input into tokens, then perform parsing on [Token]; or (2) write parser combinators directly on String. The first method often seems to make sense given that many parsing inputs can be understood as tokens separated by whitespace. In other places, I have seen people recommend against tokenizing (or scanning or lexing, as some call it), with simplicity being
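To make the trade-off concrete, here is the same tiny grammar (integers separated by +) parsed both ways. It is a Python sketch, not Parsec: with a separate lexer, whitespace is discarded once and the parser only ever sees tokens; without one, every primitive has to skip the whitespace that follows it, the same lexeme-wrapping idiom that Parsec's Text.Parsec.Token module packages up. All function names are illustrative.

```python
import re

# Grammar for both versions:  sum <- INT ("+" INT)*


# Version 1: a separate lexer. Whitespace is dropped here, once.
def lex(text):
    """Turn '1 + 22+3' into [('INT', 1), ('PLUS', '+'), ...]."""
    tokens = []
    for number, other in re.findall(r"\s*(?:([0-9]+)|(\S))", text):
        if number:
            tokens.append(("INT", int(number)))
        elif other == "+":
            tokens.append(("PLUS", other))
        else:
            raise SyntaxError(f"unexpected character {other!r}")
    return tokens


def parse_tokens(tokens):
    """The parser works on tokens and never sees whitespace."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        pos += 1
        return tokens[pos - 1][1]

    total = expect("INT")
    while pos < len(tokens):
        expect("PLUS")
        total += expect("INT")
    return total


# Version 2: scannerless. Each primitive skips the whitespace after it
# (the "lexeme" idiom), so whitespace handling is spread across the parser.
def parse_chars(text):
    pos = 0

    def spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def integer():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in "0123456789":
            pos += 1
        if start == pos:
            raise SyntaxError(f"expected a digit at position {pos}")
        value = int(text[start:pos])
        spaces()
        return value

    def plus():
        nonlocal pos
        if pos >= len(text) or text[pos] != "+":
            raise SyntaxError(f"expected '+' at position {pos}")
        pos += 1
        spaces()

    spaces()  # leading whitespace
    total = integer()
    while pos < len(text):
        plus()
        total += integer()
    return total
```

Both give the same result here; the difference is where the whitespace and token-boundary logic lives, which is what the simplicity argument on either side is usually about.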

Different lexer rules in different state

不羁的心 submitted on 2019-12-03 00:40:37
I've been working on a parser for a template language embedded in HTML (FreeMarker). Here is a sample: ${abc} <html> <head> <title>Welcome!</title> </head> <body> <h1> Welcome ${user}<#if user == "Big Joe">, our beloved leader</#if>! </h1> <p>Our latest product: <a href="${latestProduct}">${latestProduct}</a>! </body> </html> The template language appears between specific delimiters, e.g. '${' ... '}' and '<#' ... '>'. Other raw text in between can be treated as a single kind of token (RAW). The key point here is that the same text, e.g. an integer, will mean different things to the parser depending on
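One common way to handle this is a lexer with modes (flex calls them start conditions, ANTLR 4 calls them lexer modes): the lexer tracks whether it is in raw text or inside a '${' ... '}' or '<#' ... '>' section and applies a different rule set in each. The Python sketch below is a simplified, hand-written version of that idea, not FreeMarker's actual lexer; the token names are hypothetical, and it assumes '>' always ends a directive, ignoring how a real template language handles comparison operators.

```python
import re

# RAW mode stops at any opening delimiter from the question ('${', '<#')
# plus the closing-directive form '</#'.
RAW_STOP = re.compile(r"\$\{|</#|<#")

# EXPR mode: a separate, hypothetical rule set used only inside delimiters.
EXPR_TOKEN = re.compile(
    r'\s*(?:(?P<INT>[0-9]+)'
    r'|(?P<STRING>"[^"]*")'
    r'|(?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)'
    r'|(?P<OP>==|!=|[,.])'
    r'|(?P<CLOSE>[}>]))'
)


def lex_template(text):
    tokens, pos, mode, closer = [], 0, "RAW", None
    while pos < len(text):
        if mode == "RAW":
            # Everything up to the next opening delimiter is one RAW token.
            m = RAW_STOP.search(text, pos)
            end = m.start() if m else len(text)
            if end > pos:
                tokens.append(("RAW", text[pos:end]))
            if m is None:
                break
            tokens.append(("OPEN", m.group()))
            closer = "}" if m.group() == "${" else ">"
            mode, pos = "EXPR", m.end()
        else:
            # Same characters, different rules: digits here become INT tokens.
            m = EXPR_TOKEN.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected input at position {pos}")
            kind, value = m.lastgroup, m.group(m.lastgroup)
            pos = m.end()
            if kind == "CLOSE":
                if value != closer:
                    raise SyntaxError(f"expected {closer!r} at position {m.start(kind)}")
                mode = "RAW"
            tokens.append((kind, value))
    if mode == "EXPR":
        raise SyntaxError("unterminated template section")
    return tokens
```

With this sketch, lexing 42 ${42} gives ("RAW", "42 ") for the first 42 and ("INT", "42") for the second, which is exactly the context dependence described in the question.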

How to write a (shell) lexer by hand

霸气de小男生 submitted on 2019-12-03 00:39:52
I'm working on a shell, a small bash-like shell without scripting (if, while, ...). I have to write the lexer/parser (LL) by hand. The lexer will transform the command (char *cmd) into a linked list (t_list *list), and the LL parser will transform the linked list (t_list *list) into an AST (binary tree t_btree *root) according to a grammar. I know how to write the LL parser, but I don't know how to tokenize my command. For example: ps | grep ls >> file ; make && ./a.out => 'ps' '|' 'grep' 'ls' '>>' 'file' ';' 'make' '&&' './a.out'. Thanks. (I don't want to use any generator.) (This explains the idea
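A hand-written shell lexer can be a single loop over the characters: skip whitespace, try the operators longest first so that >> wins over > and && over &, and otherwise read a word up to the next whitespace or operator character. The sketch below is in Python for brevity (the same loop carries over to C with char *cmd and the asker's t_list, which it does not reproduce). It keeps quoted sections verbatim inside a word and deliberately ignores backslash escapes, variables and quote removal.

```python
# Two-character operators are listed before their one-character prefixes,
# so the first match found is also the longest one.
OPERATORS = [">>", "<<", "&&", "||", "|", ">", "<", ";", "&"]
OPERATOR_CHARS = set("><&|;")


def lex_shell(cmd):
    tokens, i, n = [], 0, len(cmd)
    while i < n:
        if cmd[i].isspace():
            i += 1
            continue
        op = next((o for o in OPERATORS if cmd.startswith(o, i)), None)
        if op is not None:
            tokens.append(op)
            i += len(op)
            continue
        # A word runs until whitespace or an operator character; quoted
        # sections are copied as-is, so "a | b" stays inside one word.
        start = i
        while i < n and not cmd[i].isspace() and cmd[i] not in OPERATOR_CHARS:
            if cmd[i] in "'\"":
                end = cmd.find(cmd[i], i + 1)
                if end == -1:
                    raise SyntaxError(f"unterminated {cmd[i]} starting at {i}")
                i = end + 1
            else:
                i += 1
        tokens.append(cmd[start:i])
    return tokens
```

For the command in the question, lex_shell returns the token list the asker expects, with 'make' as a single word.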

How can we get the Syntax Tree of TypeScript?

◇◆丶佛笑我妖孽 submitted on 2019-12-02 20:33:14
Is there a process for getting the syntax tree of a compiler? We have been assigned a project that needs to access TypeScript's syntax tree (the compiler is open source, so we can see all of its code), but we don't know how to get it. I've been reading some articles on the Internet, but I can't find a user-friendly article or one written in layman's terms. I believe some mentioned that the first step is to find the parsing step, but after that we had no idea what to do next. Sorry for the noob question. :) The TypeScript compiler API is really quite easy to use. To