lexer | 易学教程

RegEx with variable data in it - ply.lex

I'm using the Python module ply.lex to write a lexer. I have some of my tokens specified with regular expressions, but now I'm stuck. I have a list of keywords, each of which should be a token. data is a list of about 1000 keywords, all of which should be recognised as one kind of keyword, for example _Function1, _UDFType2 and so on. The words in the list are separated by whitespace, nothing else. I just want the lexer to recognise the words in this list and return a token of type `KEYWORD` for them. data = 'Keyword1 Keyword2 Keyword3 Keyword4' def t_KEYWORD(t): # ... r'\$' + data ?? return t text =
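
Two common ways to handle this can be sketched in plain Python with the standard `re` module; `data` below is only a stand-in for the real ~1000-word list, and the names `KEYWORD_RE`, `IDENT_RE` and `classify` are illustrative. The first builds one escaped alternation from the list, which in PLY could be attached to the rule with the `TOKEN` decorator from `ply.lex` (`@TOKEN(KEYWORD_RE)` above `def t_KEYWORD(t): return t`) instead of a docstring. The second matches any identifier and then looks it up, which mirrors the reserved-word idiom shown in the PLY documentation (`t.type = reserved.get(t.value, 'ID')`).

```python
import re

# Stand-in for the question's whitespace-separated keyword list (illustrative names).
data = "Keyword1 Keyword2 Keyword3 Keyword4 _Function1 _UDFType2"
KEYWORDS = frozenset(data.split())

# Option 1: one alternation built from the list. re.escape protects against
# regex metacharacters inside a keyword; the \b anchors stop 'Keyword1' from
# matching the first part of 'Keyword10'. Longest-first ordering is only a
# safeguard in case the \b anchors are ever dropped.
KEYWORD_RE = r"\b(?:" + "|".join(
    re.escape(word) for word in sorted(KEYWORDS, key=len, reverse=True)
) + r")\b"

# Option 2: match any identifier, then classify it with a set lookup.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(text):
    """Return (token_type, value) pairs: KEYWORD for listed words, else ID."""
    return [
        ("KEYWORD" if m.group() in KEYWORDS else "ID", m.group())
        for m in IDENT_RE.finditer(text)
    ]
```

With around 1000 entries, the lookup variant keeps the lexer's combined regex small and turns each check into a set membership test; the alternation is the closer answer to the question as asked.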

Writing a custom Xtext/ANTLR lexer without a grammar file

Question: I'm writing an Eclipse/Xtext plugin for CoffeeScript, and I realized I'll probably need to write a lexer for it by hand. The CoffeeScript parser also uses a hand-written lexer to handle indentation and other tricks in the grammar. Xtext generates a class that extends org.eclipse.xtext.parser.antlr.Lexer, which in turn extends org.antlr.runtime.Lexer, so I suppose I'll have to extend it. I can see two ways to do that: Override mTokens(). This is done by the generated code, changing the internal state
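
Whichever hook ends up being overridden, the indentation handling mentioned above usually comes down to keeping a stack of indentation widths and emitting synthetic tokens when the width grows or shrinks. Below is a language-neutral sketch in Python, not tied to the Xtext or ANTLR runtime classes; the token names `INDENT`, `DEDENT` and `LINE` and the function `indent_tokens` are hypothetical, and indentation is assumed to use spaces only.

```python
def indent_tokens(source):
    """Turn leading-space indentation into INDENT/DEDENT tokens around each line."""
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines never open or close a block
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new one.
            stack.append(width)
            tokens.append(("INDENT", ""))
        else:
            # Close every block that is deeper than this line.
            while width < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", ""))
            if width != stack[-1]:
                raise IndentationError(f"dedent to an unknown level: {line!r}")
        tokens.append(("LINE", line.strip()))
    # Close any blocks still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens
```

For example, a CoffeeScript-style function body indented under `square = (x) ->` yields `INDENT` before the body line and `DEDENT` before the next top-level line.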

Is there a working C++ grammar file for ANTLR?

Question: Are there any existing C++ grammar files for ANTLR? I'm looking to lex, not parse, some C++ source code files. I've looked on the ANTLR grammar page, and it looks like there is one listed there, created by Sun Microsystems. However, it seems to be a generated parser. Can anyone point me to a C++ ANTLR lexer or grammar file? Answer 1: C++ parsers are tough to build. I can't speak from experience about using ANTLR's C++ grammars. Here I discuss what I learned by reading the notes attached to the one

ANTLR: how to parse a region within matching brackets with a lexer

I want to parse something like this in my lexer: ( begin expression ), where expressions are also surrounded by brackets. It isn't important what is in the expression; I just want everything between the (begin and the matching ) as a token. An example would be (begin (define x (+ 1 2))), so the text of the token should be (define x (+ 1 2)). Something like PROGRAM : LPAREN BEGIN .* RPAREN; does (obviously) not work, because as soon as the lexer sees a ")", it thinks the rule is over, but I need the matching bracket for this. How can I do that? Answer (Bart Kiers): Inside lexer rules, you can invoke rules
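
The truncated answer points at calling lexer rules from other lexer rules, which is how an ANTLR lexer rule can recurse into itself once per nested pair of brackets. The same idea, written out as an explicit depth counter, looks like this in Python; the function name `capture_begin_block` is hypothetical, and parentheses inside string literals or comments are deliberately not handled.

```python
def capture_begin_block(text, start=0):
    """Return (body, end) for the text between '(begin' and its matching ')'.

    Assumes text[start:] begins with '(begin'. Parentheses inside string
    literals or comments are NOT treated specially (simplifying assumption).
    """
    opener = "(begin"
    if not text.startswith(opener, start):
        raise ValueError("expected '(begin'")
    depth = 1  # the '(' of '(begin' is already open
    i = body_start = start + len(opener)
    while i < len(text):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # Matching ')' found: return the body and the index after it.
                return text[body_start:i].strip(), i + 1
        i += 1
    raise ValueError("unbalanced parentheses: missing ')'")
```

A plain `.*` cannot do this because it has no memory of how many brackets are open; the counter (or, in ANTLR, the recursion) supplies exactly that memory.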

How to merge two ASTs?

I'm trying to implement a tool for merging different versions of some source code. Given two versions of the same source code, the idea would be to parse them, generate the respective abstract syntax trees (ASTs), and finally merge them into a single output source while keeping grammatical consistency. The lexer and parser are those of the question "ANTLR: How to skip multiline comments". I know there is a class, ParserRuleReturnScope, that helps... but getStop() and getStart() always return null :-( Here is a snippet that illustrates how I modified my parser to get rules printed: parser grammar
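
To make the "merge at tree level, then regenerate source" idea concrete, here is a deliberately naive two-way merge using Python's standard `ast` module rather than ANTLR: top-level functions and classes are matched by name, the second version wins on conflicts, and the merged tree is printed back with `ast.unparse` (Python 3.9+). The function `merge_modules` is hypothetical, and it does no three-way merging or conflict detection.

```python
import ast


def merge_modules(base_src, other_src):
    """Naive two-way merge of top-level statements; `other` wins on name clashes."""

    def key(node):
        # Functions and classes are matched by name; any other statement by its
        # structure, so identical statements in both versions appear only once.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            return node.name
        return ast.dump(node)

    merged = {}
    for node in ast.parse(base_src).body + ast.parse(other_src).body:
        # Re-assigning an existing key keeps its original position but replaces
        # the node, so base order is preserved and other's version wins.
        merged[key(node)] = node
    module = ast.Module(body=list(merged.values()), type_ignores=[])
    return ast.unparse(module)
```

Because the output is regenerated from a tree, it always parses; the trade-off is that `ast.unparse` drops comments and original formatting, which is one reason to keep token positions around alongside the tree.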

What would be a good Delphi lexer/parser for Javascript language file? [closed]

Background: I want to be able to parse JavaScript source in a Delphi application. I need to identify variables and functions within the source so that later code can make changes to it. I understand that I probably need a lexer for this purpose, but I have not had much luck with the one I found (Dyaclexx). Question: Is there a suitable freeware or open-source Delphi parser/lexer that already has token sets for JavaScript, or that could be modified for this purpose without too much trouble? If there isn't such a tool already available, then what

How Lexer lookahead works with greedy and non-greedy matching in ANTLR3 and ANTLR4?

I'd be more than glad if someone could clear up my confusion about how look-ahead relates to tokenizing with greedy/non-greedy matching. Beware, this is a slightly long post because it follows my thought process. I'm trying to write an ANTLR3 grammar that allows me to match input such as "identifierkeyword". I came up with a grammar like this in ANTLR 3.4: KEYWORD: 'keyword' ; IDENTIFIER : (options {greedy=false;}: (LOWCHAR|HIGHCHAR))+ ; /** lowercase letters */ fragment LOWCHAR : 'a'..'z'; /** uppercase letters */ fragment HIGHCHAR : 'A'..'Z'; parse: IDENTIFIER KEYWORD EOF
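
The difference between greedy matching, non-greedy matching and lookahead can be seen in isolation with Python's backtracking `re` engine. This is only an analogy: ANTLR 3 and ANTLR 4 lexers decide token boundaries with their own prediction machinery rather than by backtracking through a regex, so the sketch shows what the three notions mean, not how ANTLR resolves "identifierkeyword".

```python
import re

s = "identifierkeyword"

# Greedy: consume as many letters as possible (maximal munch), i.e. the whole input.
greedy = re.match(r"[A-Za-z]+", s).group()

# Non-greedy with nothing after it: stop as early as possible, i.e. one letter.
lazy = re.match(r"[A-Za-z]+?", s).group()

# Non-greedy plus lookahead: grow one letter at a time until 'keyword' follows.
lazy_until_keyword = re.match(r"[A-Za-z]+?(?=keyword)", s).group()
```

Only the last pattern splits the input into an identifier followed by the keyword, and it does so because the lookahead tells the non-greedy loop where to stop.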

Non-left-recursive PEG grammar for an “expression”

Question: An expression is either a simple identifier (like cow), something surrounded by brackets ((...)), something that looks like a method call (...(...)), or something that looks like a member access (thing.member): def expr = identifier | "(" ~> expr <~ ")" | expr ~ ("(" ~> expr <~ ")") | expr ~ "." ~ identifier It's given in Scala Parser Combinator syntax, but it should be pretty straightforward to understand. It's similar to how expressions end up looking in many programming languages (hence the name
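
A standard way out is to parse a non-recursive "primary" first and then loop over any number of call or member-access suffixes, building the tree left to right, so the parser never recurses on `expr` before consuming input. Below is a small recursive-descent sketch of that rewritten grammar in Python; the class `ExprParser`, the function `parse`, and the tuple shapes `("call", f, arg)` and `("member", obj, name)` are illustrative choices. In Scala's parser combinators the same shape is typically written with `rep` and a `foldLeft`.

```python
import re

# Grammar rewritten without left recursion (a sketch, not the asker's code):
#   expr    := primary ( "(" expr ")" | "." identifier )*
#   primary := identifier | "(" expr ")"


class ExprParser:
    def __init__(self, text):
        # Identifiers, or any single non-space character such as ( ) .
        self.tokens = re.findall(r"[A-Za-z_]\w*|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def primary(self):
        if self.peek() == "(":
            self.expect("(")
            inner = self.expr()
            self.expect(")")
            return inner
        return self.identifier()

    def expr(self):
        node = self.primary()
        # Each suffix wraps the tree built so far, giving left associativity
        # (a.b.c parses as (a.b).c) without a left-recursive rule.
        while self.peek() in ("(", "."):
            if self.peek() == "(":
                self.expect("(")
                node = ("call", node, self.expr())
                self.expect(")")
            else:
                self.expect(".")
                node = ("member", node, self.identifier())
        return node


def parse(text):
    parser = ExprParser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```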

Lexing partial SQL in C#

Question: I need to parse partial SQL queries (it's for a SQL injection auditing tool). For example, '1' AND 1=1-- should break down into tokens like [0] => [SQL_STRING, '1'] [1] => [SQL_AND] [2] => [SQL_INT, 1] [3] => [SQL_AND] [4] => [SQL_INT, 1] [5] => [SQL_COMMENT] [6] => [SQL_QUERY_END]. Are there any lexers for SQL, at least, that I could base mine on, or any good tools like bison for C#? (I'd rather not write my own grammar, as I need to support most if not all of the MySQL 5 grammar.) Answer 1: Seems
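
Even without a full MySQL grammar, a lexer for query fragments can be driven by one combined regular expression with named groups, tried at each position in turn. Below is a minimal sketch in Python rather than C#, covering only the constructs in the example above; the token names follow the question's `SQL_*` style, but `SQL_EQ`, `SQL_IDENT` and `SQL_OR` are hypothetical additions, and real MySQL details (backslash escapes in strings, `#` and `/* */` comments, the whitespace required after `--`) are left out.

```python
import re

# One alternative per token kind; order matters where patterns could overlap.
TOKEN_RE = re.compile(r"""
    (?P<SQL_COMMENT>--[^\n]*)          # '--' comment to end of line
  | (?P<SQL_STRING>'(?:[^']|'')*')     # single-quoted string; '' is an escaped quote
  | (?P<SQL_INT>\d+)                   # integer literal
  | (?P<WORD>[A-Za-z_]\w*)             # keyword or identifier, classified below
  | (?P<SQL_EQ>=)                      # comparison operator (hypothetical name)
  | (?P<WS>\s+)                        # whitespace, skipped
""", re.VERBOSE)

KEYWORDS = {"AND": "SQL_AND", "OR": "SQL_OR"}  # hypothetical, extend as needed


def lex_sql(text):
    """Return a list of (token_type, text) pairs, ending with SQL_QUERY_END."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue
        if kind == "WORD":
            # SQL keywords are case-insensitive.
            kind = KEYWORDS.get(value.upper(), "SQL_IDENT")
        tokens.append((kind, value))
    tokens.append(("SQL_QUERY_END", ""))
    return tokens
```

Note that this sketch gives `=` its own token type, so the fourth token of the example comes out as `SQL_EQ`.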

Writing part of a compiler (written in c++) in Perl

I am trying to learn more about compilers and programming languages. Unfortunately my university doesn't offer a course on compilers, so I have to teach myself (thank you, internet). At the moment I'm trying to understand and implement a lexer for my language, and I need regular expressions. I'm used to writing Perl regexes quickly, so I thought I could embed Perl in my C++ lexer. Now the questions are: Will it cause heavy overhead? Should I try to make peace with Boost (or any other C++ library that is good for regex)? Thank you for reading this :) Answer: No reason you can't, part of being a