lexer

ANTLR Parser with manual lexer

别等时光非礼了梦想. Submitted on 2019-11-30 04:13:07
I'm migrating a C#-based programming language compiler from a manual lexer/parser to ANTLR. ANTLR has been giving me severe headaches because it mostly works, but the small parts that do not are incredibly painful to solve. I discovered that most of my headaches are caused by the lexer parts of ANTLR rather than the parser. Then I noticed parser grammar X; and realized that perhaps I could keep my manually written lexer and use an ANTLR-generated parser. So I'm looking for more documentation on this topic. I guess a custom ITokenStream could work, but there appears

hand coding a parser

天涯浪子 Submitted on 2019-11-29 19:45:32
For all you compiler gurus, I wanna write a recursive descent parser and I wanna do it with just code. No generating lexers and parsers from some other grammar, and don't tell me to read the dragon book, I'll come around to that eventually. I wanna get into the gritty details of implementing a lexer and parser for a reasonably simple language, say CSS. And I wanna do this right. This will probably end up being a series of questions, but right now I'm starting with a lexer. Tokenization rules for CSS can be found here. I find myself writing code like this (hopefully you can infer the rest

In antlr4 lexer, How to have a rule that catches all remaining “words” as Unknown token?

十年热恋 Submitted on 2019-11-29 18:08:24
Question: I have an antlr4 lexer grammar. It has many rules for words, but I also want it to create an Unknown token for any word that it cannot match by other rules. I have something like this:

Whitespace : [ \t\n\r]+ -> skip;
Punctuation : [.,:;?!];
// Other rules here
Unknown : .+? ;

The generated matcher now catches '~' as Unknown, but for the input '~~~' it creates three '~' Unknown tokens instead of a single '~~~' token. How can I tell the lexer to generate a single word token for consecutive unknown characters?
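The difference between a lazy catch-all and a greedy one can be seen outside ANTLR. The sketch below is a Python `re` analogue, not ANTLR output: the names `make_scanner`, `tokenize`, `LAZY` and `GREEDY` are hypothetical, and only the three rules shown in the question are modelled. It suggests that a catch-all of the form `.+?` stops after one character, while a negated character class with `+` consumes the whole run. In ANTLR 4 the analogous rule would presumably be a negated set such as `Unknown : ~[ \t\n\r.,:;?!]+ ;`, which is untested here and may need to exclude further characters depending on the other word rules.

```python
import re

def make_scanner(unknown_pattern):
    """Build a scanner for the three rules in the question.

    Alternatives are tried in order, so Unknown is only reached when
    Whitespace and Punctuation do not match at the current position.
    """
    return re.compile(
        r"(?P<Whitespace>[ \t\n\r]+)"
        r"|(?P<Punctuation>[.,:;?!])"
        r"|(?P<Unknown>" + unknown_pattern + r")",
        re.DOTALL,
    )

def tokenize(text, scanner):
    """Return (token_type, token_text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = scanner.match(text, pos)
        if m is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if m.lastgroup != "Whitespace":  # mimic '-> skip'
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# Lazy catch-all: '.+?' is satisfied by a single character.
LAZY = make_scanner(r".+?")
# Greedy catch-all: everything that is not whitespace or punctuation.
GREEDY = make_scanner(r"[^ \t\n\r.,:;?!]+")

print(tokenize("~~~ ok!", LAZY))
print(tokenize("~~~ ok!", GREEDY))
```

With the lazy pattern each `~` becomes its own Unknown token; with the negated class the three characters come back as one token.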

Generate AST of a PHP source file

强颜欢笑 Submitted on 2019-11-29 16:48:27
Question: I want to parse a PHP source file into an AST (preferably as a nested array of instructions). I basically want to convert things like f($a, $b + 1) into something like array( 'function_call', array( array( 'var', '$a' ), array( 'expression', array( array( 'binary_operation', '+', array ('var', '$b'), array( 'int', '1' ) ) ) ) ) ) Is there any built-in PHP library or third-party library (preferably in PHP) that would let me do this? Answer 1: I have implemented a PHP Parser after I figured out

Island grammar antlr3

大城市里の小女人 Submitted on 2019-11-29 15:44:06
What is an "island grammar" and how do I use one in antlr3? An island grammar is one that treats most of a language as a blob of text ("water") and picks out the part of the language of interest to parse using grammar rules ("island"). For instance, you might choose to build an island grammar to pick out all the expressions found in a C# program, and ignore the variable/method/class declarations and the statement syntax (if, while, ...). The real question is, "Should you use island grammars at all?". The positive benefits: you don't have to write a full, complete grammar for the language you want

C++ parser generator [closed]

六眼飞鱼酱① Submitted on 2019-11-29 12:03:05
I'm writing my own scripting language and I need a software tool that generates C++ code for parsing my language. I need a lexical analyzer and a parser generator that generate C++ code. It would also be nice to be able to generate a Visual C++ 2010 project. Suggestions? Try Flex and Bison. They are good lexical analyzer and parser generators, useful for defining new languages. http://en.wikipedia.org/wiki/Flex_lexical_analyser http://en.wikipedia.org/wiki/Comparison_of_parser_generators for C/C++: http://epaperpress.com/lexandyacc/ Or look at: Boost.Spirit: "Spirit is a set of C++

How to get error messages from ANTLR parsing?

假如想象 Submitted on 2019-11-29 11:12:11
I wrote a grammar with ANTLR 4.4 like this:

grammar CSV;
file : row+ EOF ;
row : value (Comma value)* (LineBreak | EOF) ;
value : SimpleValueA | QuotedValue ;
Comma : ',' ;
LineBreak : '\r'? '\n' | '\r' ;
SimpleValue : ~(',' | '\r' | '\n' | '"')+ ;
QuotedValue : '"' ('""' | ~'"')* '"' ;

I then used ANTLR 4.4 to generate the parser and lexer, which succeeded. After generating the classes, I wrote some Java code to use the grammar:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {
    public static void main(String[] args) {
        String source

ANTLR Parser with manual lexer

江枫思渺然 Submitted on 2019-11-29 01:02:49
Question: I'm migrating a C#-based programming language compiler from a manual lexer/parser to ANTLR. ANTLR has been giving me severe headaches because it mostly works, but the small parts that do not are incredibly painful to solve. I discovered that most of my headaches are caused by the lexer parts of ANTLR rather than the parser. Then I noticed parser grammar X; and realized that perhaps I could keep my manually written lexer and use an ANTLR-generated parser. So I'm

Good parser generator (think lex/yacc or antlr) for .NET? Build time only? [closed]

本秂侑毒 Submitted on 2019-11-28 19:22:58
Is there a good parser generator (think lex/yacc or antlr) for .NET? Any that have a license that would not scare lawyers? There are lots of LGPL options, but I am working on embedded components and some organizations are not comfortable with me taking an LGPL dependency. I've heard that Oslo may provide this functionality, but I'm not sure if it's a build-time dependency or also a runtime dependency. Can anyone clarify what Oslo will provide? UPDATE: What I would really like is a parser generator that is a build-time-only dependency. It looks like ANTLR has a runtime component. I just discovered that F# ships

Island grammar antlr3

孤者浪人 Submitted on 2019-11-28 10:04:28
Question: What is an "island grammar" and how do I use one in antlr3? Answer 1: An island grammar is one that treats most of a language as a blob of text ("water") and picks out the part of the language of interest to parse using grammar rules ("island"). For instance, you might choose to build an island grammar to pick out all the expressions found in a C# program, and ignore the variable/method/class declarations and the statement syntax (if, while, ...). The real question is, "Should you use island grammars
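The island/water split described in this answer can be illustrated without ANTLR. The following Python sketch is not an antlr3 grammar and makes loud assumptions: the "island" is arbitrarily taken to be a double-quoted string literal rather than a full expression, everything else is treated as water, and the names `ISLAND` and `find_islands` are hypothetical.

```python
import re

# Assumed island: a double-quoted string literal with backslash escapes.
# Everything between islands is "water" and is not analysed further.
ISLAND = re.compile(r'"(?:[^"\\]|\\.)*"')

def find_islands(source):
    """Split source into ('water', text) and ('island', text) chunks."""
    chunks = []
    pos = 0
    for m in ISLAND.finditer(source):
        if m.start() > pos:
            # Unparsed text before this island.
            chunks.append(("water", source[pos:m.start()]))
        chunks.append(("island", m.group()))
        pos = m.end()
    if pos < len(source):
        # Trailing water after the last island.
        chunks.append(("water", source[pos:]))
    return chunks

sample = 'int x = f("a b"); while (y) { log("done"); }'
for kind, text in find_islands(sample):
    print(kind, repr(text))
```

A real island grammar would replace the single regular expression with proper grammar rules for the constructs of interest, but the shape is the same: recognise the islands precisely and pass over the water without needing a grammar for it.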