lexer

RegEx with variable data in it - ply.lex

允我心安 posted on 2019-12-05 19:12:39
I'm using the Python module ply.lex to write a lexer. I have specified some of my tokens with regular expressions, but now I'm stuck. I have a list of keywords that should each be recognised as a token. data is a list of about 1000 keywords, all of which should be recognised as the same kind of keyword, for example _Function1, _UDFType2 and so on. The words in the list are separated by whitespace, that's it. I just want the lexer to recognise the words in this list and return a token of type `KEYWORD`. data = 'Keyword1 Keyword2 Keyword3 Keyword4' def t_KEYWORD(t): # ... r'\$' + data ?? return t text =
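One way to approach the question above is to turn the whitespace-separated list into a single alternation pattern. The sketch below uses only the standard `re` module; the helper names `build_keyword_pattern` and `find_keywords` are made up for illustration, and the word-boundary handling assumes keywords consist of letters, digits and underscores.

```python
import re

data = 'Keyword1 Keyword2 Keyword3 Keyword4'

def build_keyword_pattern(words):
    """Build one regex that matches any word from a whitespace-separated list."""
    # Longest alternatives first, so a longer keyword is tried before its prefix.
    alternatives = sorted(set(words.split()), key=len, reverse=True)
    body = '|'.join(re.escape(word) for word in alternatives)
    # The lookarounds stop the pattern from matching inside a longer word.
    return r'(?<![A-Za-z0-9_])(?:' + body + r')(?![A-Za-z0-9_])'

KEYWORD_RE = re.compile(build_keyword_pattern(data))

def find_keywords(text):
    """Return every keyword occurrence in text, in order of appearance."""
    return [match.group(0) for match in KEYWORD_RE.finditer(text)]
```

In ply, a pattern string built this way can be attached to a token function with the `TOKEN` decorator from `ply.lex` instead of a docstring, since a docstring cannot be computed from a variable. With around 1000 keywords, a dictionary lookup inside a generic identifier rule may be a simpler alternative.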

Writing a custom Xtext/ANTLR lexer without a grammar file

那年仲夏 posted on 2019-12-05 18:34:42
Question: I'm writing an Eclipse/Xtext plugin for CoffeeScript, and I realized I'll probably need to write a lexer for it by hand. The CoffeeScript parser also uses a hand-written lexer to handle indentation and other tricks in the grammar. Xtext generates a class that extends org.eclipse.xtext.parser.antlr.Lexer, which in turn extends org.antlr.runtime.Lexer. So I suppose I'll have to extend it. I can see two ways to do that: Override mTokens(). This is done by the generated code, changing the internal state

Is there a working C++ grammar file for ANTLR?

自作多情 posted on 2019-12-05 17:22:28
Question: Are there any existing C++ grammar files for ANTLR? I'm looking to lex, not parse, some C++ source code files. I've looked on the ANTLR grammar page and it looks like there is one listed, created by Sun Microsystems, here. However, it seems to be a generated parser. Can anyone point me to a C++ ANTLR lexer or grammar file? Answer 1: C++ parsers are tough to build. I can't speak with experience about using ANTLR's C++ grammars. Here I discuss what I learned by reading the notes attached to the one

ANTLR: how to parse a region within matching brackets with a lexer

喜夏-厌秋 posted on 2019-12-05 12:20:49
I want to parse something like this in my lexer: ( begin expression ) where expressions are also surrounded by brackets. It isn't important what is in the expression; I just want everything between the (begin and the matching ) as a token. An example would be: (begin (define x (+ 1 2))) so the text of the token should be (define x (+ 1 2))) Something like PROGRAM : LPAREN BEGIN .* RPAREN; does (obviously) not work, because as soon as the lexer sees a ")" it thinks the rule is over, but I need the matching bracket for this. How can I do that? Bart Kiers: Inside lexer rules, you can invoke rules
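The underlying idea, finding the closing bracket that matches the opening `(begin`, can be sketched outside ANTLR with a simple depth counter. This is a plain Python illustration, not ANTLR syntax and not the answer's grammar; the function name `match_begin_block` is hypothetical, and it assumes the input contains no string literals or comments with brackets in them.

```python
def match_begin_block(text, start=0):
    """Return (inner_text, end_index) for a '(begin ...)' block, or None.

    end_index is the position just past the matching closing bracket.
    """
    prefix = '(begin'
    if not text.startswith(prefix, start):
        return None
    depth = 1                       # the '(' of '(begin' is already open
    i = start + len(prefix)
    while i < len(text):
        ch = text[i]
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:          # this ')' closes the original '(begin'
                inner = text[start + len(prefix):i].strip()
                return inner, i + 1
        i += 1
    return None                     # ran out of input before the match
```

For the input `(begin (define x (+ 1 2)))` this yields the inner text `(define x (+ 1 2))`. A recursive lexer rule achieves the same effect by letting the rule for a bracketed region call itself for each nested bracket.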

How to merge two ASTs?

一曲冷凌霜 posted on 2019-12-05 11:19:34
I'm trying to implement a tool for merging different versions of some source code. Given two versions of the same source code, the idea would be to parse them, generate the respective Abstract Syntax Trees (ASTs), and finally merge them into a single output source while keeping grammatical consistency. The lexer and parser are those of the question "ANTLR: How to skip multiline comments". I know there is a class ParserRuleReturnScope that helps... but getStop() and getStart() always return null :-( Here is a snippet that illustrates how I modified my parser to get rules printed: parser grammar

What would be a good Delphi lexer/parser for Javascript language file? [closed]

两盒软妹~` posted on 2019-12-05 11:03:09
Background: I want to be able to parse JavaScript source in a Delphi application. I need to be able to identify variables and functions within the source so that the code can later be changed programmatically. I understand that I probably need to use a lexer for this purpose, but I have not had much luck with the lexer I found ( Dyaclexx ). Question: Is there a suitable freeware or open source Delphi parser/lexer which already has token sets for JavaScript, or which could be modified for this purpose without too much trouble? If there isn't such a tool already available then what

How Lexer lookahead works with greedy and non-greedy matching in ANTLR3 and ANTLR4?

北城以北 posted on 2019-12-04 21:35:01
If someone could clear up my confusion about how look-ahead relates to tokenizing with greedy/non-greedy matching, I'd be more than glad. Beware, this is a slightly long post because it follows my thought process. I'm trying to write an ANTLR 3 grammar that allows me to match input such as: "identifierkeyword" I came up with a grammar like so in ANTLR 3.4: KEYWORD: 'keyword' ; IDENTIFIER : (options {greedy=false;}: (LOWCHAR|HIGHCHAR))+ ; /** lowercase letters */ fragment LOWCHAR : 'a'..'z'; /** uppercase letters */ fragment HIGHCHAR : 'A'..'Z'; parse: IDENTIFIER KEYWORD EOF
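To see why input like "identifierkeyword" is awkward for a tokenizer, here is a small Python analogy. It is not ANTLR's algorithm; it is a generic longest-match tokenizer written for illustration, and `TOKEN_SPECS`, `tokenize` and `split_with_backtracking` are hypothetical names. It contrasts a lexer that commits to the longest token with a regex engine that can backtrack across two patterns.

```python
import re

# Rules in priority order; on a tie in length the earlier rule wins.
TOKEN_SPECS = [
    ('KEYWORD', re.compile(r'keyword')),
    ('IDENTIFIER', re.compile(r'[a-zA-Z]+')),
]

def tokenize(text):
    """Longest-match tokenizer: each token is chosen without looking at what follows."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f'no rule matches at position {pos}')
        tokens.append((best[0], best[1].group(0)))
        pos = best[1].end()
    return tokens

def split_with_backtracking(text):
    """A single regex can split the input because the engine backtracks."""
    m = re.fullmatch(r'([a-zA-Z]+?)(keyword)', text)
    return m.groups() if m else None
```

Here `tokenize('identifierkeyword')` produces a single IDENTIFIER token, because the identifier rule happily consumes the letters of "keyword" too, whereas `split_with_backtracking` separates the two parts. A lexer that decides each token in isolation has no parser context telling it to stop early.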

Non-left-recursive PEG grammar for an “expression”

强颜欢笑 posted on 2019-12-04 18:37:37
Question: It's either a simple identifier (like cow ), something surrounded by brackets ( (...) ), something that looks like a method call ( ...(...) ), or something that looks like a member access ( thing.member ): def expr = identifier | "(" ~> expr <~ ")" | expr ~ ("(" ~> expr <~ ")") | expr ~ "." ~ identifier It's given in Scala Parser Combinator syntax, but it should be pretty straightforward to understand. It's similar to how expressions end up looking in many programming languages (hence the name
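One common way to remove this kind of left recursion is to parse a non-recursive primary first and then loop over any number of suffixes: `expr = primary suffix*`, with `primary = identifier | "(" expr ")"` and `suffix = "(" expr ")" | "." identifier`. The Python sketch below illustrates that rewrite as a hand-written recursive-descent parser. It is not Scala parser combinator code, and the function names and tuple-shaped tree nodes are assumptions made for the example.

```python
import re

TOKEN_RE = re.compile(r'\s*([A-Za-z_]\w*|[().])')

def tokenize(src):
    """Split the source into identifiers and the symbols ( ) ."""
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f'unexpected character at {pos}')
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def is_ident(tok):
    return tok[0].isalpha() or tok[0] == '_'

def expect(tokens, i, symbol):
    if i >= len(tokens) or tokens[i] != symbol:
        raise SyntaxError(f'expected {symbol!r} at token {i}')
    return i + 1

def parse_primary(tokens, i):
    """primary = identifier | "(" expr ")" """
    if i >= len(tokens):
        raise SyntaxError('unexpected end of input')
    if tokens[i] == '(':
        node, i = parse_expr(tokens, i + 1)
        return node, expect(tokens, i, ')')
    if is_ident(tokens[i]):
        return ('ident', tokens[i]), i + 1
    raise SyntaxError(f'unexpected token {tokens[i]!r}')

def parse_expr(tokens, i=0):
    """expr = primary suffix*, building a left-nested tree in a loop."""
    node, i = parse_primary(tokens, i)
    while i < len(tokens):
        if tokens[i] == '(':                       # call suffix
            arg, i = parse_expr(tokens, i + 1)
            i = expect(tokens, i, ')')
            node = ('call', node, arg)
        elif tokens[i] == '.':                     # member-access suffix
            if i + 1 >= len(tokens) or not is_ident(tokens[i + 1]):
                raise SyntaxError('expected identifier after "."')
            node = ('member', node, tokens[i + 1])
            i += 2
        else:
            break
    return node, i
```

The loop keeps wrapping the node built so far, so `cow.moo(grass)` still comes out left-associated as a call on a member access, which is the tree shape the left-recursive grammar was meant to describe.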

Lexing partial SQL in C#

老子叫甜甜 posted on 2019-12-04 17:49:28
Question: I need to parse partial SQL queries (it's for a SQL injection auditing tool). For example '1' AND 1=1-- should break down into tokens like [0] => [SQL_STRING, '1'] [1] => [SQL_AND] [2] => [SQL_INT, 1] [3] => [SQL_AND] [4] => [SQL_INT, 1] [5] => [SQL_COMMENT] [6] => [SQL_QUERY_END] Are there any lexers for SQL, at least, that I could base mine on, or any good tools like bison for C# (though I'd rather not write my own grammar, as I need to support most if not all of the grammar of MySQL 5)? Answer 1: Seems
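The token stream in the question can be approximated with a small table-driven lexer. The sketch below is in Python rather than C# and covers only the handful of token kinds needed for the example, nowhere near the MySQL 5 grammar. `SQL_EQ` is a name invented here for the `=` sign, which the question's listing labels differently, and `lex_sql` is likewise hypothetical.

```python
import re

# (token name, pattern) pairs; earlier entries take priority.
TOKEN_SPEC = [
    ('SQL_STRING', r"'(?:[^'\\]|\\.|'')*'"),   # single-quoted string literal
    ('SQL_COMMENT', r'--[^\n]*|#[^\n]*'),      # comment running to end of line
    ('SQL_INT', r'\d+'),
    ('SQL_AND', r'AND\b'),
    ('SQL_EQ', r'='),                          # hypothetical name for '='
    ('SKIP', r'\s+'),                          # whitespace, not emitted
]
MASTER_RE = re.compile(
    '|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)

def lex_sql(fragment):
    """Tokenize a partial SQL fragment into (kind, text) pairs."""
    tokens, pos = [], 0
    while pos < len(fragment):
        m = MASTER_RE.match(fragment, pos)
        if m is None:
            raise ValueError(f'cannot tokenize at position {pos}')
        if m.lastgroup != 'SKIP':
            tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    tokens.append(('SQL_QUERY_END', ''))       # end-of-input marker
    return tokens
```

Because the input is partial, a real auditing tool would also need rules for unterminated strings and for the many other operators and keywords; this sketch raises an error on anything outside its small table.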

Writing part of a compiler (written in c++) in Perl

。_饼干妹妹 posted on 2019-12-04 14:57:56
I am trying to learn more about compilers and programming languages. Unfortunately my university doesn't offer a course on compilers, so I have to do it myself (thank you, internet). At the moment I'm trying to understand and implement a lexer for my language, and I need regular expressions. I am used to writing Perl regexes pretty quickly, and I thought that I could embed Perl in my C++ lexer. Now the questions are: Will it cause heavy overhead? Should I try to make peace with Boost (or any other C++ library that is good for regex)? Thank you for reading this :) No reason you can't, part of being a