lexer

Using C++11 regex to capture the contents of a context-free-grammar file

Submitted by 痴心易碎 on 2019-12-01 07:34:05
Question: Preface: I'm trying to write my own context-free-grammar specification to associate with the rules of my lexer/parser. It is meant to be similar to ANTLR's, where upper-case identifiers classify as a lexer rule and lower-case identifiers classify as a parser rule. It is meant to accept any combination of string literals and/or regular expressions for lexer rules, and any combination of lexer/regex rules and/or other parser identifiers for parser rules. Each rule is in the format of
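
The excerpt is cut off before the actual rule format, so the following is only a rough sketch of the capture idea the question describes: match a rule name and its definition, then classify the rule as lexer or parser by the case of its first letter. It is written with java.util.regex rather than C++11 std::regex, and the rule shape "name : definition ;" is an assumption.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GrammarRuleScanner {
        // Assumed rule shape: "name : definition ;" -- the excerpt is cut off
        // before the real format, so this pattern is only illustrative.
        private static final Pattern RULE =
                Pattern.compile("^\\s*([A-Za-z][A-Za-z0-9_]*)\\s*:\\s*(.*?)\\s*;\\s*$");

        public static void main(String[] args) {
            String[] lines = {
                    "NUMBER : [0-9]+ ;",            // upper-case start -> lexer rule
                    "expr : NUMBER ('+' NUMBER)* ;" // lower-case start -> parser rule
            };
            for (String line : lines) {
                Matcher m = RULE.matcher(line);
                if (!m.matches()) continue;
                String name = m.group(1);
                String body = m.group(2);
                String kind = Character.isUpperCase(name.charAt(0)) ? "lexer" : "parser";
                System.out.printf("%s rule %s -> %s%n", kind, name, body);
            }
        }
    }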

Call methods on native JavaScript types without wrapping with ()

Submitted by 让人想犯罪 __ on 2019-12-01 06:53:55
Question: In JavaScript, we can call methods on string literals directly without enclosing them in round brackets, but not on other types such as numbers or functions. It is a syntax error, but is there a reason why the JavaScript lexer needs these other types to be enclosed in round brackets? For example, if we extend Number, String, and Function with an alert method and try calling this method on the literals, it's a SyntaxError for Number and Function, while it works for a String. function
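
The usual explanation for the number case is greedy (maximal-munch) lexing of numeric literals: in 1.alert(), the scanner reads "1." as the start of a number, so no "." token is left over for member access, whereas in "abc".alert() the string literal ends at the closing quote and the "." survives as its own token. The toy Java tokenizer below only illustrates that greedy step; it is not a real JavaScript lexer.

    import java.util.ArrayList;
    import java.util.List;

    public class GreedyLexDemo {
        // Toy tokenizer: numbers may include a trailing '.', mimicking how a
        // JavaScript scanner starts a decimal literal. Not a real JS lexer.
        static List<String> lex(String src) {
            List<String> toks = new ArrayList<>();
            int i = 0;
            while (i < src.length()) {
                char c = src.charAt(i);
                if (Character.isDigit(c)) {
                    int j = i;
                    while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                    if (j < src.length() && src.charAt(j) == '.') j++;  // greedy: '.' joins the number
                    toks.add("NUMBER(" + src.substring(i, j) + ")");
                    i = j;
                } else if (c == '"') {
                    int j = src.indexOf('"', i + 1);
                    toks.add("STRING(" + src.substring(i, j + 1) + ")");
                    i = j + 1;
                } else if (Character.isLetter(c)) {
                    int j = i;
                    while (j < src.length() && Character.isLetter(src.charAt(j))) j++;
                    toks.add("IDENT(" + src.substring(i, j) + ")");
                    i = j;
                } else {
                    toks.add("PUNCT(" + c + ")");
                    i++;
                }
            }
            return toks;
        }

        public static void main(String[] args) {
            // [NUMBER(1.), IDENT(alert), ...] -- no '.' token, hence the SyntaxError
            System.out.println(lex("1.alert()"));
            // [STRING("hi"), PUNCT(.), IDENT(alert), ...] -- '.' survives, so it parses
            System.out.println(lex("\"hi\".alert()"));
        }
    }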

C# Lua Parser / Analyser

Submitted by 守給你的承諾、 on 2019-12-01 06:47:36
Question: First things first: I am writing a little Lua IDE in C#. Code execution is done by an assembly named LuaInterface. Code editing is done by a Scintilla port, and the RAD/UI interface is via the extensible IDesignSurfaceExt for Visual Studio (one-way code generation). File handling is provided by a little SQLite DB used as a project package file. So all in all I've got everything I need together... The only unsolved problem is the parser/lexer for Lua. I do not want to load & execute the

ANTLR4: TokenStreamRewriter output doesn't have proper format (removes whitespaces)

Submitted by 拈花ヽ惹草 on 2019-12-01 05:59:37
I am using ANTLR4 and the java7 grammar (source) to modify an input Java source file. More specifically, I am using the TokenStreamRewriter class to modify some tokens. The following sample shows how the tokens are modified: public class TestListener extends JavaBaseListener { private TokenStreamRewriter rewriter; rewriter = new TokenStreamRewriter(tokenStream); rewriter.replace(ctx.getStart(), ctx.getStop(), "someText"); } When I print the altered source code, the whitespace and tabs are removed and the new source file looks like this: importjava.util.ArrayList
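
A common cause, assuming the grammar's WS rule uses -> skip: skipped tokens never reach the token stream, so TokenStreamRewriter has only the visible tokens to concatenate. Sending whitespace to the hidden channel instead (-> channel(HIDDEN)) keeps it in the buffer, and the rewriter then reproduces it. A minimal Java driver sketch; the Java7Lexer/Java7Parser class names, the compilationUnit start rule, and the TestListener constructor are assumptions.

    import org.antlr.v4.runtime.*;
    import org.antlr.v4.runtime.tree.*;

    public class RewriteDemo {
        public static void main(String[] args) throws Exception {
            // In the grammar, whitespace should go to the hidden channel instead of
            // being skipped, e.g.  WS : [ \t\r\n]+ -> channel(HIDDEN) ;
            // Skipped tokens never reach the token stream, so the rewriter has
            // nothing to put between the remaining tokens.
            CharStream input = CharStreams.fromFileName("Example.java");
            Java7Lexer lexer = new Java7Lexer(input);        // generated lexer (assumed class name)
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            Java7Parser parser = new Java7Parser(tokens);    // generated parser (assumed class name)
            ParseTree tree = parser.compilationUnit();       // assumed start rule

            TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
            ParseTreeWalker.DEFAULT.walk(new TestListener(rewriter), tree);  // assumes such a constructor

            System.out.println(rewriter.getText());          // spacing preserved when WS is hidden
        }
    }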

Using ANTLR Parser and Lexer Separately

Submitted by 孤街醉人 on 2019-12-01 04:12:51
I used ANTLR version 4 to create a compiler. The first phase was the lexer part. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine. CompilerLexer.g4: lexer grammar CompilerLexer; INT : 'int' ; //1 FLOAT : 'float' ; //2 BEGIN : 'begin' ; //3 END : 'end' ; //4 To : 'to' ; //5 NEXT : 'next' ; //6 REAL : 'real' ; //7 BOOLEAN : 'bool' ; //8 . . . NOTEQUAL : '!=' ; //46 AND : '&&' ; //47 OR : '||' ; //48 POW : '^' ; //49 ID : [a-zA-Z]+ ; //50 WS : ' ' -> channel(HIDDEN) //50 ; Now it is time for phase 2, which is the parser. I created a "CompilerParser.g4" file and put the grammars
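
When the lexer and parser live in separate .g4 files, the parser grammar needs options { tokenVocab = CompilerLexer; } so its token references resolve to the lexer's token types rather than getting a fresh, mismatched numbering. A rough Java sketch of wiring the two generated classes together; the program start rule and the sample input are assumptions.

    import org.antlr.v4.runtime.*;
    import org.antlr.v4.runtime.tree.ParseTree;

    public class CompilerDriver {
        public static void main(String[] args) {
            // CompilerParser.g4 is expected to start with:
            //   parser grammar CompilerParser;
            //   options { tokenVocab = CompilerLexer; }
            // so that token references such as INT or BEGIN resolve to the lexer's types.
            CharStream input = CharStreams.fromString("int x begin x = 1 end");  // made-up sample input
            CompilerLexer lexer = new CompilerLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            CompilerParser parser = new CompilerParser(tokens);
            ParseTree tree = parser.program();               // 'program' is a placeholder start rule
            System.out.println(tree.toStringTree(parser));
        }
    }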

Oracle text search on multiple tables and joins

Submitted by 杀马特。学长 韩版系。学妹 on 2019-11-30 16:14:24
I have the following SQL statement, where v_depts is a view: select emp_no, dob, dept_no from v_depts where catsearch(emp_no, 'abc', NULL) > 0 or catsearch(dept_no, 'abc', NULL) > 0. Now I would like to add one or more tables as a join so that I can do a text search on their columns; e.g., employee_details contains employee information and I can join it on emp_no. I have created an index on the employee_details table for the emp_name column; however, I am not able to join with v_depts to search, because when I modify my SQL statement to: select a.emp_no, a.dob, a.dept_no from v_depts a left outer join employee_details b on (a.emp_no
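
catsearch() applies to a column that carries a CTXCAT index, so after the join the text predicate has to name the indexed column of the joined table directly. A rough JDBC sketch of that shape, assuming the CTXCAT index is on employee_details.emp_name and using a placeholder connection URL.

    import java.sql.*;

    public class TextSearchJoin {
        public static void main(String[] args) throws SQLException {
            // Assumes a CTXCAT index exists on employee_details.emp_name;
            // catsearch() must reference the indexed column directly.
            String sql =
                "select a.emp_no, a.dob, a.dept_no " +
                "from v_depts a " +
                "left outer join employee_details b on a.emp_no = b.emp_no " +
                "where catsearch(b.emp_name, ?, null) > 0";
            try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//host:1521/service", "user", "password"); // placeholder URL
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, "abc");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("emp_no") + " " + rs.getString("dept_no"));
                    }
                }
            }
        }
    }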

In an ANTLR4 lexer, how to have a rule that catches all remaining “words” as Unknown tokens?

Submitted by 我的梦境 on 2019-11-30 12:47:25
I have an ANTLR4 lexer grammar. It has many rules for words, but I also want it to create an Unknown token for any word that it cannot match with the other rules. I have something like this: Whitespace : [ \t\n\r]+ -> skip; Punctuation : [.,:;?!]; // Other rules here Unknown : .+? ; Now the generated lexer catches '~' as unknown but creates three '~' Unknown tokens for the input '~~~' instead of a single '~~~' token. What should I do to tell the lexer to generate word tokens for consecutive unknown characters? I also tried "Unknown : . ;" and "Unknown : .+ ;" with no results. EDIT: In current ANTLR versions .+?
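
In an ANTLR4 lexer, .+? at the end of a rule matches exactly one character, which is why each '~' becomes its own Unknown token. One way to get whole unknown "words" without touching the other rules is to keep Unknown a single character and glue adjacent Unknown tokens together after lexing; a rough Java sketch, where MyLexer and its Unknown token type are placeholder names.

    import org.antlr.v4.runtime.*;

    public class MergeUnknownDemo {
        public static void main(String[] args) {
            // MyLexer and MyLexer.Unknown are placeholders for the generated lexer
            // and its Unknown token type; the grammar keeps a one-character rule
            // such as  Unknown : . ;  as the last lexer rule.
            MyLexer lexer = new MyLexer(CharStreams.fromString("hello ~~~ world"));
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            tokens.fill();

            StringBuilder run = new StringBuilder();
            int runEnd = -2;  // stop index of the current run of unknown characters
            for (Token t : tokens.getTokens()) {
                boolean unknown = t.getType() == MyLexer.Unknown;
                boolean adjacent = t.getStartIndex() == runEnd + 1;
                if (unknown && (run.length() == 0 || adjacent)) {
                    run.append(t.getText());                 // extend the current unknown "word"
                    runEnd = t.getStopIndex();
                    continue;
                }
                if (run.length() > 0) {                      // a run just ended: emit it as one unit
                    System.out.println("Unknown word: " + run);
                    run.setLength(0);
                }
                if (unknown) { run.append(t.getText()); runEnd = t.getStopIndex(); }
            }
            if (run.length() > 0) System.out.println("Unknown word: " + run);
        }
    }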

Generate AST of a PHP source file

Submitted by 倾然丶 夕夏残阳落幕 on 2019-11-30 11:26:52
I want to parse a PHP source file into an AST (preferably as a nested array of instructions). I basically want to convert things like f($a, $b + 1) into something like array( 'function_call', array( array( 'var', '$a' ), array( 'expression', array( array( 'binary_operation', '+', array( 'var', '$b' ), array( 'int', '1' ) ) ) ) ) ) Are there any built-in PHP libraries or third-party libraries (preferably in PHP) that would let me do this? NikiC: I have implemented a PHP parser after I figured out that there was no existing parser. It parses the PHP code into a node tree. HipHop You can use

Hand coding a parser

Submitted by 拈花ヽ惹草 on 2019-11-30 10:17:39
Question: For all you compiler gurus, I want to write a recursive descent parser and I want to do it with just code. No generating lexers and parsers from some other grammar, and don't tell me to read the dragon book; I'll come around to that eventually. I want to get into the gritty details of implementing a lexer and parser for a reasonably simple language, say CSS. And I want to do this right. This will probably end up being a series of questions, but right now I'm starting with a lexer. Tokenization rules
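
Since the excerpt breaks off at "Tokenization rules", here is a rough sketch of the shape a hand-written lexer usually takes: one scan loop that skips whitespace, dispatches on the current character, and applies maximal munch for multi-character tokens. The tiny CSS-flavoured token set and the Java host language are illustrative choices only, not the asker's actual rules.

    import java.util.ArrayList;
    import java.util.List;

    public class TinyCssLexer {
        enum Kind { IDENT, NUMBER, HASH, LBRACE, RBRACE, COLON, SEMI, EOF }

        record Token(Kind kind, String text) {}   // Java 16+; use a small class on older JDKs

        private final String src;
        private int pos;

        TinyCssLexer(String src) { this.src = src; }

        List<Token> lex() {
            List<Token> out = new ArrayList<>();
            while (true) {
                while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++; // skip whitespace
                if (pos >= src.length()) { out.add(new Token(Kind.EOF, "")); return out; }
                char c = src.charAt(pos);
                if (c == '{')      { out.add(new Token(Kind.LBRACE, "{")); pos++; }
                else if (c == '}') { out.add(new Token(Kind.RBRACE, "}")); pos++; }
                else if (c == ':') { out.add(new Token(Kind.COLON, ":")); pos++; }
                else if (c == ';') { out.add(new Token(Kind.SEMI, ";")); pos++; }
                else if (c == '#') { out.add(new Token(Kind.HASH, read(TinyCssLexer::isIdentChar))); }
                else if (Character.isDigit(c)) { out.add(new Token(Kind.NUMBER, read(Character::isDigit))); }
                else if (isIdentChar(c)) { out.add(new Token(Kind.IDENT, read(TinyCssLexer::isIdentChar))); }
                else throw new IllegalStateException("Unexpected character '" + c + "' at offset " + pos);
            }
        }

        // Maximal munch: take the current character plus every following one the predicate accepts.
        private String read(java.util.function.IntPredicate accept) {
            int start = pos++;
            while (pos < src.length() && accept.test(src.charAt(pos))) pos++;
            return src.substring(start, pos);
        }

        private static boolean isIdentChar(int ch) {
            return Character.isLetterOrDigit(ch) || ch == '-' || ch == '_';
        }

        public static void main(String[] args) {
            for (Token t : new TinyCssLexer("h1 { color: #ff0000; margin: 10px; }").lex()) {
                System.out.println(t.kind() + " '" + t.text() + "'");
            }
        }
    }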