lexical-analysis

How can I modify the text of tokens in a CommonTokenStream with ANTLR?

喜你入骨 submitted on 2019-12-03 08:08:19
I'm trying to learn ANTLR and at the same time use it for a current project. I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This works fine, and I've verified that the source text is being broken up into the appropriate tokens. Now I would like to be able to modify the text of certain tokens in this stream, and display the now-modified source code. For example, I've tried:

    import org.antlr.runtime.*;
    import java.util.*;

    public class LexerTest {
        public static final int IDENTIFIER_TYPE = 4;

        public static void main(String[] args) {

Efficiently match multiple regexes in Python

戏子无情 submitted on 2019-12-03 05:08:48
Question: Lexical analyzers are quite easy to write when you have regexes. Today I wanted to write a simple general analyzer in Python, and came up with:

    import re
    import sys

    class Token(object):
        """ A simple Token structure.
            Contains the token type, value and position.
        """
        def __init__(self, type, val, pos):
            self.type = type
            self.val = val
            self.pos = pos

        def __str__(self):
            return '%s(%s) at %s' % (self.type, self.val, self.pos)

    class LexerError(Exception):
        """ Lexer error exception.

            pos: Position in the input line where the error occurred.
        """
        def __init__(self, pos):
            self.pos = pos

    class Lexer(object)
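The question's code cuts off at the Lexer class, but the shape of the usual answer is to combine all the token regexes into one pattern with named groups, so each input position needs only a single match attempt. A minimal sketch along those lines (the NUMBER/ID/PLUS/EQ rule set here is an illustrative assumption, not the questioner's actual grammar):

```python
import re

class Token(object):
    def __init__(self, type, val, pos):
        self.type = type
        self.val = val
        self.pos = pos

    def __repr__(self):
        return '%s(%s) at %s' % (self.type, self.val, self.pos)

class Lexer(object):
    def __init__(self, rules):
        # Combine the per-token regexes into one alternation of named
        # groups; m.lastgroup then tells us which rule matched.
        parts = ['(?P<%s>%s)' % (name, pattern) for pattern, name in rules]
        self.regex = re.compile('|'.join(parts))

    def tokens(self, text):
        pos = 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            m = self.regex.match(text, pos)
            if m is None:
                raise ValueError('illegal character at position %d' % pos)
            yield Token(m.lastgroup, m.group(), pos)
            pos = m.end()

rules = [(r'\d+', 'NUMBER'), (r'[a-zA-Z_]\w*', 'ID'),
         (r'\+', 'PLUS'), (r'=', 'EQ')]
lexer = Lexer(rules)
print([t.type for t in lexer.tokens('x = a + 42')])
# → ['ID', 'EQ', 'ID', 'PLUS', 'NUMBER']
```

Because all alternatives are tried in one `re.match` call, this avoids looping over every rule at every position, which is the efficiency concern the question raises.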

is there a simple compiler for a small language

被刻印的时光 ゝ submitted on 2019-12-03 01:55:52
Question: I am looking for a simple compiler that compiles a simple language. I need it to write a paper and to learn how compilers work. I am not looking for anything sophisticated, just a simple language (by simple I mean a small codebase, because gcc, for example, is far too big). Any help is appreciated.

Answer 1: If you want to look at code, I'm very impressed with Eijiro Sumii's MinCaml compiler. It's only 2000 lines long. It compiles a pretty interesting source language. It generates real machine code, none of this namby-pamby C or LLVM stuff :-) Speed of compiled code is competitive with gcc and the
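For a feel of how small a complete lex/parse/compile/run pipeline can be, here is a toy sketch (entirely illustrative, unrelated to MinCaml) that compiles arithmetic expressions to instructions for a stack machine and executes them:

```python
import re

def tokenize(src):
    # Numbers, '+', '*', and parentheses; everything else is ignored.
    return re.findall(r'\d+|[()+*]', src)

class Parser:
    """Recursive-descent parser emitting stack-machine code.

    Grammar:  expr -> term ('+' term)*
              term -> factor ('*' factor)*
              factor -> NUMBER | '(' expr ')'
    """
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expr(self):
        code = self.term()
        while self.peek() == '+':
            self.next()
            code += self.term() + [('ADD',)]
        return code

    def term(self):
        code = self.factor()
        while self.peek() == '*':
            self.next()
            code += self.factor() + [('MUL',)]
        return code

    def factor(self):
        tok = self.next()
        if tok == '(':
            code = self.expr()
            self.next()  # consume ')'
            return code
        return [('PUSH', int(tok))]

def compile_expr(src):
    return Parser(tokenize(src)).expr()

def run(code):
    stack = []
    for op in code:
        if op[0] == 'PUSH':
            stack.append(op[1])
        elif op[0] == 'ADD':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op[0] == 'MUL':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[0]

print(run(compile_expr('2+3*4')))    # → 14
print(run(compile_expr('(2+3)*4')))  # → 20
```

The same division of labor (tokenizer, parser, code generator, evaluator) is what you would find in MinCaml or any textbook compiler, just at a much smaller scale.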

Python - lexical analysis and tokenization

人盡茶涼 submitted on 2019-12-02 23:37:20
I'm looking to speed along my discovery process here quite a bit, as this is my first venture into the world of lexical analysis. Maybe this is even the wrong path. First, I'll describe my problem: I've got very large properties files (on the order of 1,000 properties) which, when distilled, are really just about 15 important properties; the rest can be generated or rarely ever change. So, for example:

    general {
        name = myname
        ip = 127.0.0.1
    }

    component1 {
        key = value
        foo = bar
    }

This is the type of format I want to create, to tokenize something like:

    property.${general.name}blah.home
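For a block format like the one above, a small regex tokenizer plus a hand-written parser is often enough; no parser generator is required. A sketch under that assumption (the token names and the nested-dict output shape are my choices, not from the question):

```python
import re

# Token patterns for the block format; names are illustrative assumptions.
TOKEN_RE = re.compile(r'''
    (?P<LBRACE>\{)
  | (?P<RBRACE>\})
  | (?P<EQ>=)
  | (?P<WORD>[^\s{}=]+)   # section names, keys, and values alike
  | (?P<WS>\s+)
''', re.VERBOSE)

def tokenize(text):
    """Flatten the input into (kind, value) pairs, dropping whitespace."""
    return [(m.lastgroup, m.group())
            for m in TOKEN_RE.finditer(text)
            if m.lastgroup != 'WS']

def parse(text):
    """Turn the token stream into {section: {key: value}}."""
    result, it = {}, iter(tokenize(text))
    for kind, value in it:
        if kind == 'WORD':
            section, body = value, {}
            next(it)                      # consume LBRACE
            for kind, value in it:
                if kind == 'RBRACE':
                    break
                key = value
                next(it)                  # consume EQ
                _, body[key] = next(it)   # the value WORD
            result[section] = body
    return result

cfg = parse('general {\n  name = myname\n  ip = 127.0.0.1\n}')
print(cfg)  # → {'general': {'name': 'myname', 'ip': '127.0.0.1'}}
```

With the file parsed into a plain dictionary, substituting references like `${general.name}` elsewhere becomes a simple lookup plus string replacement.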

Haskell Parsec - error messages are less helpful while using custom tokens

[亡魂溺海] submitted on 2019-12-02 18:37:47
I'm working on separating the lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using tokens other than Parsec's Char tokens. Here are some examples of Parsec's error messages while using Char tokens:

    ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf wrong"
    parse error at (line 1, column 7):
    unexpected "w"
    expecting space or "ok"

    ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
    parse error at (line 1, column 1):
    unexpected "w"
    expecting "ok" or "nop"

So the string parser shows what string is expected when

How to define a Regex in StandardTokenParsers to identify path?

╄→гoц情女王★ submitted on 2019-12-02 16:11:18
Question: I am writing a parser in which I want to parse arithmetic expressions like:

    /hdfs://xxx.xx.xx.x:xxxx/path1/file1.jpg+1

I want to parse it, change the infix to postfix, and do the calculation. I also used help from a piece of code in another discussion:

    class InfixToPostfix extends StandardTokenParsers {
      import lexical._

      def regexStringLit(r: Regex): Parser[String] = acceptMatch(
        "string literal matching regex " + r,
        { case StringLit(s) if r.unapplySeq(s).isDefined => s })

      def pathIdent:

How to write simple parser for if and while statements? [closed]

你说的曾经没有我的故事 submitted on 2019-12-02 14:22:51
Question (closed 8 years ago as unclear): I need to write a simple parser that will convert tokens to a parse tree. I've already written a LexicalAnalyzer that returns the tokens. Now I want to write rules for "if" and "while" statements (for a start).
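The usual approach is recursive descent: each grammar rule becomes a function that consumes tokens and returns a tree node. A sketch under stated assumptions (the question's LexicalAnalyzer isn't shown, so the `(kind, value)` token tuples, the tiny grammar, and the tuple-based tree shape here are all illustrative):

```python
# Grammar sketch:
#   stmt -> 'if' '(' IDENT ')' stmt
#         | 'while' '(' IDENT ')' stmt
#         | IDENT ';'
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # e.g. [('KEYWORD', 'if'), ('LPAREN', '('), ...]
        self.i = 0

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expect(self, kind):
        tok = self.next()
        if tok[0] != kind:
            raise SyntaxError('expected %s, got %r' % (kind, tok))
        return tok

    def stmt(self):
        kind, value = self.next()
        if kind == 'KEYWORD' and value in ('if', 'while'):
            self.expect('LPAREN')
            cond = self.expect('IDENT')[1]
            self.expect('RPAREN')
            return (value, cond, self.stmt())  # recurse for the body
        if kind == 'IDENT':
            self.expect('SEMI')
            return ('call', value)
        raise SyntaxError('unexpected token %r' % ((kind, value),))

# Tokens for:  while (flag) if (ok) work;
tokens = [('KEYWORD', 'while'), ('LPAREN', '('), ('IDENT', 'flag'),
          ('RPAREN', ')'), ('KEYWORD', 'if'), ('LPAREN', '('),
          ('IDENT', 'ok'), ('RPAREN', ')'), ('IDENT', 'work'), ('SEMI', ';')]
tree = Parser(tokens).stmt()
print(tree)  # → ('while', 'flag', ('if', 'ok', ('call', 'work')))
```

Each rule in the grammar maps one-to-one onto a method, so extending this to assignments, blocks, or real conditions is a matter of adding more methods of the same shape.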

How to use Finite Automaton to implement a scanner

若如初见. submitted on 2019-12-02 12:18:09
Question: I'm building a simple scanner. Suppose I have the following tokens defined for my language:

    !, !=, !==, <, <<, {

Now I can specify them using a regular expression:

    !=?=? | { | <<?

Then I used http://hackingoff.com to build the NFA and DFA. Each machine can now determine whether an input is in the language of the regexp or not. But my program is a sequence of tokens, not one token:

    !!=!<!==<<!{

My question is how I should use the machines to parse the string into tokens. I'm interested in the approach
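The standard answer is maximal munch: run the machine from the current position, remember the last accepting state you passed through, emit the longest token recognized, then restart the machine at the next character. For a fixed token set like this one, the DFA's behavior can be sketched as a longest-prefix match (a simplification of running the actual automaton; the real DFA would do this in one left-to-right pass per token):

```python
# Token set from the question, ordered longest-first so that the
# longest alternative wins at each position (maximal munch).
TOKENS = ['!==', '!=', '!', '<<', '<', '{']

def scan(text):
    """Repeatedly take the longest token that matches at the current position."""
    pos, out = 0, []
    while pos < len(text):
        for tok in TOKENS:
            if text.startswith(tok, pos):
                out.append(tok)
                pos += len(tok)
                break
        else:
            raise ValueError('no token matches at position %d' % pos)
    return out

print(scan('!!=!<!==<<!{'))
# → ['!', '!=', '!', '<', '!==', '<<', '!', '{']
```

Note how maximal munch resolves the ambiguity in the question's input: at position 1 the scanner could emit `!` twice, but it prefers the longer `!=`, and later it prefers `!==` over `!=` and `<<` over `<`, exactly as a DFA tracking its last accepting state would.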