lexer

Using C++11 regex to capture the contents of a context-free-grammar file

Submitted by 痴心易碎 on 2019-12-01 07:34:05
Question: Preface: I'm trying to write my own context-free-grammar specification to associate with the rules of my lexer/parser. It is meant to be similar to ANTLR's, where upper-case identifiers classify as a lexer rule and lower-case identifiers classify as a parser rule. It is meant to accept any combination of string literals and/or regular expressions for lexer rules, and any combination of lexer/regex rules and/or other parser identifiers for parser rules. Each rule is in the format of
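
The excerpt is cut off before the actual rule format, so the following is only a rough sketch of the capture idea the question describes: match a rule name and its definition, then classify the rule as lexer or parser by the case of its first letter. It is written with java.util.regex rather than C++11 std::regex, and the rule shape "name : definition ;" is an assumption.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GrammarRuleScanner {
        // Assumed rule shape: "name : definition ;" -- the excerpt is cut off
        // before the real format, so this pattern is only illustrative.
        private static final Pattern RULE =
                Pattern.compile("^\\s*([A-Za-z][A-Za-z0-9_]*)\\s*:\\s*(.*?)\\s*;\\s*$");

        public static void main(String[] args) {
            String[] lines = {
                    "NUMBER : [0-9]+ ;",            // upper-case start -> lexer rule
                    "expr : NUMBER ('+' NUMBER)* ;" // lower-case start -> parser rule
            };
            for (String line : lines) {
                Matcher m = RULE.matcher(line);
                if (!m.matches()) continue;
                String name = m.group(1);
                String body = m.group(2);
                String kind = Character.isUpperCase(name.charAt(0)) ? "lexer" : "parser";
                System.out.printf("%s rule %s -> %s%n", kind, name, body);
            }
        }
    }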

Call methods on native JavaScript types without wrapping with ()

Submitted by 让人想犯罪 __ on 2019-12-01 06:53:55
Question: In JavaScript, we can call methods on string literals directly without enclosing them in round brackets, but not on other types such as numbers or functions. It is a syntax error, but is there a reason why the JavaScript lexer needs these other types to be enclosed in round brackets? For example, if we extend Number, String, and Function with an alert method and try calling this method on the literals, it's a SyntaxError for Number and Function, while it works for a String. function
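
The usual explanation for the number case is greedy (maximal-munch) lexing of numeric literals: in 1.alert(), the scanner reads "1." as the start of a number, so no "." token is left over for member access, whereas in "abc".alert() the string literal ends at the closing quote and the "." survives as its own token. The toy Java tokenizer below only illustrates that greedy step; it is not a real JavaScript lexer.

    import java.util.ArrayList;
    import java.util.List;

    public class GreedyLexDemo {
        // Toy tokenizer: numbers may include a trailing '.', mimicking how a
        // JavaScript scanner starts a decimal literal. Not a real JS lexer.
        static List<String> lex(String src) {
            List<String> toks = new ArrayList<>();
            int i = 0;
            while (i < src.length()) {
                char c = src.charAt(i);
                if (Character.isDigit(c)) {
                    int j = i;
                    while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                    if (j < src.length() && src.charAt(j) == '.') j++;  // greedy: '.' joins the number
                    toks.add("NUMBER(" + src.substring(i, j) + ")");
                    i = j;
                } else if (c == '"') {
                    int j = src.indexOf('"', i + 1);
                    toks.add("STRING(" + src.substring(i, j + 1) + ")");
                    i = j + 1;
                } else if (Character.isLetter(c)) {
                    int j = i;
                    while (j < src.length() && Character.isLetter(src.charAt(j))) j++;
                    toks.add("IDENT(" + src.substring(i, j) + ")");
                    i = j;
                } else {
                    toks.add("PUNCT(" + c + ")");
                    i++;
                }
            }
            return toks;
        }

        public static void main(String[] args) {
            // [NUMBER(1.), IDENT(alert), ...] -- no '.' token, hence the SyntaxError
            System.out.println(lex("1.alert()"));
            // [STRING("hi"), PUNCT(.), IDENT(alert), ...] -- '.' survives, so it parses
            System.out.println(lex("\"hi\".alert()"));
        }
    }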

C# Lua Parser / Analyser

Submitted by 守給你的承諾、 on 2019-12-01 06:47:36
Question: First things first: I am writing a little Lua IDE in C#. Code execution is done by an assembly named LuaInterface. Code editing is done by a Scintilla port, and the RAD/UI interface is via the extensible IDesignSurfaceExt for Visual Studio (one-way code generation). File handling is provided by a little SQLite DB used as a project package file. So all in all I've got everything I need together... The only unsolved problem is the parser/lexer for Lua. I do not want to load & execute the

ANTLR4: TokenStreamRewriter output doesn't have proper format (removes whitespaces)

Submitted by 拈花ヽ惹草 on 2019-12-01 05:59:37
I am using ANTLR4 and the java7 grammar (source) to modify an input Java source file. More specifically, I am using the TokenStreamRewriter class to modify some tokens. The following sample shows how the tokens are modified: public class TestListener extends JavaBaseListener { private TokenStreamRewriter rewriter; rewriter = new TokenStreamRewriter(tokenStream); rewriter.replace(ctx.getStart(), ctx.getStop(), "someText"); } When I print the altered source code, the whitespace and tabs are removed and the new source file looks like this: importjava.util.ArrayList
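
A common cause, assuming the grammar's WS rule uses -> skip: skipped tokens never reach the token stream, so TokenStreamRewriter has only the visible tokens to concatenate. Sending whitespace to the hidden channel instead (-> channel(HIDDEN)) keeps it in the buffer, and the rewriter then reproduces it. A minimal Java driver sketch; the Java7Lexer/Java7Parser class names, the compilationUnit start rule, and the TestListener constructor are assumptions.

    import org.antlr.v4.runtime.*;
    import org.antlr.v4.runtime.tree.*;

    public class RewriteDemo {
        public static void main(String[] args) throws Exception {
            // In the grammar, whitespace should go to the hidden channel instead of
            // being skipped, e.g.  WS : [ \t\r\n]+ -> channel(HIDDEN) ;
            // Skipped tokens never reach the token stream, so the rewriter has
            // nothing to put between the remaining tokens.
            CharStream input = CharStreams.fromFileName("Example.java");
            Java7Lexer lexer = new Java7Lexer(input);        // generated lexer (assumed class name)
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            Java7Parser parser = new Java7Parser(tokens);    // generated parser (assumed class name)
            ParseTree tree = parser.compilationUnit();       // assumed start rule

            TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
            ParseTreeWalker.DEFAULT.walk(new TestListener(rewriter), tree);  // assumes such a constructor

            System.out.println(rewriter.getText());          // spacing preserved when WS is hidden
        }
    }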

Using ANTLR Parser and Lexer Separately

Submitted by 孤街醉人 on 2019-12-01 04:12:51
I used ANTLR version 4 to create a compiler. The first phase was the lexer part. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine. CompilerLexer.g4: lexer grammar CompilerLexer; INT : 'int' ; //1 FLOAT : 'float' ; //2 BEGIN : 'begin' ; //3 END : 'end' ; //4 To : 'to' ; //5 NEXT : 'next' ; //6 REAL : 'real' ; //7 BOOLEAN : 'bool' ; //8 . . . NOTEQUAL : '!=' ; //46 AND : '&&' ; //47 OR : '||' ; //48 POW : '^' ; //49 ID : [a-zA-Z]+ ; //50 WS : ' ' -> channel(HIDDEN) //50 ; Now it is time for phase 2, which is the parser. I created a "CompilerParser.g4" file and put the grammars
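
When the lexer and parser live in separate .g4 files, the parser grammar needs options { tokenVocab = CompilerLexer; } so its token references resolve to the lexer's token types rather than getting a fresh, mismatched numbering. A rough Java sketch of wiring the two generated classes together; the program start rule and the sample input are assumptions.

    import org.antlr.v4.runtime.*;
    import org.antlr.v4.runtime.tree.ParseTree;

    public class CompilerDriver {
        public static void main(String[] args) {
            // CompilerParser.g4 is expected to start with:
            //   parser grammar CompilerParser;
            //   options { tokenVocab = CompilerLexer; }
            // so that token references such as INT or BEGIN resolve to the lexer's types.
            CharStream input = CharStreams.fromString("int x begin x = 1 end");  // made-up sample input
            CompilerLexer lexer = new CompilerLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            CompilerParser parser = new CompilerParser(tokens);
            ParseTree tree = parser.program();               // 'program' is a placeholder start rule
            System.out.println(tree.toStringTree(parser));
        }
    }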

Oracle text search on multiple tables and joins

Submitted by 杀马特。学长 韩版系。学妹 on 2019-11-30 16:14:24
I have the following SQL statement, where v_depts is a view: select emp_no, dob, dept_no from v_depts where catsearch(emp_no, 'abc', NULL) > 0 or catsearch(dept_no, 'abc', NULL) > 0. Now I would like to add one or more tables as a join so that I can do a text search on their columns; e.g., employee_details contains employee information and I can join it on emp_no. I have created an index on the employee_details table for the emp_name column; however, I am not able to join with v_depts to search, because when I modify my SQL statement to: select a.emp_no, a.dob, a.dept_no from v_depts a left outer join employee_details b on (a.emp_no
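
catsearch() applies to a column that carries a CTXCAT index, so after the join the text predicate has to name the indexed column of the joined table directly. A rough JDBC sketch of that shape, assuming the CTXCAT index is on employee_details.emp_name and using a placeholder connection URL.

    import java.sql.*;

    public class TextSearchJoin {
        public static void main(String[] args) throws SQLException {
            // Assumes a CTXCAT index exists on employee_details.emp_name;
            // catsearch() must reference the indexed column directly.
            String sql =
                "select a.emp_no, a.dob, a.dept_no " +
                "from v_depts a " +
                "left outer join employee_details b on a.emp_no = b.emp_no " +
                "where catsearch(b.emp_name, ?, null) > 0";
            try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//host:1521/service", "user", "password"); // placeholder URL
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, "abc");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("emp_no") + " " + rs.getString("dept_no"));
                    }
                }
            }
        }
    }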

In an ANTLR4 lexer, how to have a rule that catches all remaining “words” as Unknown tokens?

Submitted by 我的梦境 on 2019-11-30 12:47:25
I have an ANTLR4 lexer grammar. It has many rules for words, but I also want it to create an Unknown token for any word that it cannot match with the other rules. I have something like this: Whitespace : [ \t\n\r]+ -> skip; Punctuation : [.,:;?!]; // Other rules here Unknown : .+? ; Now the generated lexer catches '~' as unknown but creates three '~' Unknown tokens for the input '~~~' instead of a single '~~~' token. What should I do to tell the lexer to generate word tokens for consecutive unknown characters? I also tried "Unknown : . ;" and "Unknown : .+ ;" with no results. EDIT: In current ANTLR versions .+?
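
In an ANTLR4 lexer, .+? at the end of a rule matches exactly one character, which is why each '~' becomes its own Unknown token. One way to get whole unknown "words" without touching the other rules is to keep Unknown a single character and glue adjacent Unknown tokens together after lexing; a rough Java sketch, where MyLexer and its Unknown token type are placeholder names.

    import org.antlr.v4.runtime.*;

    public class MergeUnknownDemo {
        public static void main(String[] args) {
            // MyLexer and MyLexer.Unknown are placeholders for the generated lexer
            // and its Unknown token type; the grammar keeps a one-character rule
            // such as  Unknown : . ;  as the last lexer rule.
            MyLexer lexer = new MyLexer(CharStreams.fromString("hello ~~~ world"));
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            tokens.fill();

            StringBuilder run = new StringBuilder();
            int runEnd = -2;  // stop index of the current run of unknown characters
            for (Token t : tokens.getTokens()) {
                boolean unknown = t.getType() == MyLexer.Unknown;
                boolean adjacent = t.getStartIndex() == runEnd + 1;
                if (unknown && (run.length() == 0 || adjacent)) {
                    run.append(t.getText());                 // extend the current unknown "word"
                    runEnd = t.getStopIndex();
                    continue;
                }
                if (run.length() > 0) {                      // a run just ended: emit it as one unit
                    System.out.println("Unknown word: " + run);
                    run.setLength(0);
                }
                if (unknown) { run.append(t.getText()); runEnd = t.getStopIndex(); }
            }
            if (run.length() > 0) System.out.println("Unknown word: " + run);
        }
    }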

Generate AST of a PHP source file

Submitted by 倾然丶 夕夏残阳落幕 on 2019-11-30 11:26:52
I want to parse a PHP source file into an AST (preferably as a nested array of instructions). I basically want to convert things like f($a, $b + 1) into something like array( 'function_call', array( array( 'var', '$a' ), array( 'expression', array( array( 'binary_operation', '+', array( 'var', '$b' ), array( 'int', '1' ) ) ) ) ) ) Are there any built-in PHP libraries or third-party libraries (preferably in PHP) that would let me do this? NikiC: I have implemented a PHP parser after I figured out that there was no existing parser. It parses the PHP code into a node tree. HipHop You can use

Hand coding a parser

Submitted by 拈花ヽ惹草 on 2019-11-30 10:17:39
Question: For all you compiler gurus, I want to write a recursive descent parser and I want to do it with just code. No generating lexers and parsers from some other grammar, and don't tell me to read the dragon book; I'll come around to that eventually. I want to get into the gritty details of implementing a lexer and parser for a reasonably simple language, say CSS. And I want to do this right. This will probably end up being a series of questions, but right now I'm starting with a lexer. Tokenization rules
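
Since the excerpt breaks off at "Tokenization rules", here is a rough sketch of the shape a hand-written lexer usually takes: one scan loop that skips whitespace, dispatches on the current character, and applies maximal munch for multi-character tokens. The tiny CSS-flavoured token set and the Java host language are illustrative choices only, not the asker's actual rules.

    import java.util.ArrayList;
    import java.util.List;

    public class TinyCssLexer {
        enum Kind { IDENT, NUMBER, HASH, LBRACE, RBRACE, COLON, SEMI, EOF }

        record Token(Kind kind, String text) {}   // Java 16+; use a small class on older JDKs

        private final String src;
        private int pos;

        TinyCssLexer(String src) { this.src = src; }

        List<Token> lex() {
            List<Token> out = new ArrayList<>();
            while (true) {
                while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++; // skip whitespace
                if (pos >= src.length()) { out.add(new Token(Kind.EOF, "")); return out; }
                char c = src.charAt(pos);
                if (c == '{')      { out.add(new Token(Kind.LBRACE, "{")); pos++; }
                else if (c == '}') { out.add(new Token(Kind.RBRACE, "}")); pos++; }
                else if (c == ':') { out.add(new Token(Kind.COLON, ":")); pos++; }
                else if (c == ';') { out.add(new Token(Kind.SEMI, ";")); pos++; }
                else if (c == '#') { out.add(new Token(Kind.HASH, read(TinyCssLexer::isIdentChar))); }
                else if (Character.isDigit(c)) { out.add(new Token(Kind.NUMBER, read(Character::isDigit))); }
                else if (isIdentChar(c)) { out.add(new Token(Kind.IDENT, read(TinyCssLexer::isIdentChar))); }
                else throw new IllegalStateException("Unexpected character '" + c + "' at offset " + pos);
            }
        }

        // Maximal munch: take the current character plus every following one the predicate accepts.
        private String read(java.util.function.IntPredicate accept) {
            int start = pos++;
            while (pos < src.length() && accept.test(src.charAt(pos))) pos++;
            return src.substring(start, pos);
        }

        private static boolean isIdentChar(int ch) {
            return Character.isLetterOrDigit(ch) || ch == '-' || ch == '_';
        }

        public static void main(String[] args) {
            for (Token t : new TinyCssLexer("h1 { color: #ff0000; margin: 10px; }").lex()) {
                System.out.println(t.kind() + " '" + t.text() + "'");
            }
        }
    }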