lexical-analysis

How can I modify the text of tokens in a CommonTokenStream with ANTLR?

喜你入骨 submitted on 2019-12-03 08:08:19
I'm trying to learn ANTLR and at the same time use it for a current project. I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This works fine, and I've verified that the source text is being broken up into the appropriate tokens. Now I would like to be able to modify the text of certain tokens in this stream, and display the now-modified source code. For example, I've tried:

    import org.antlr.runtime.*;
    import java.util.*;

    public class LexerTest {
        public static final int IDENTIFIER_TYPE = 4;

        public static void main(String[] args) {

Efficiently match multiple regexes in Python

戏子无情 submitted on 2019-12-03 05:08:48
Question: Lexical analyzers are quite easy to write when you have regexes. Today I wanted to write a simple general analyzer in Python, and came up with:

    import re
    import sys

    class Token(object):
        """ A simple Token structure.
            Contains the token type, value and position.
        """
        def __init__(self, type, val, pos):
            self.type = type
            self.val = val
            self.pos = pos

        def __str__(self):
            return '%s(%s) at %s' % (self.type, self.val, self.pos)

    class LexerError(Exception):
        """ Lexer error exception.

            pos: Position in the input line where the error occurred.
        """
        def __init__(self, pos):
            self.pos = pos

    class Lexer(object)
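The question's code cuts off at the Lexer class, but the shape of the usual answer is to combine all the token regexes into one pattern with named groups, so each input position needs only a single match attempt. A minimal sketch along those lines (the NUMBER/ID/PLUS/EQ rule set here is an illustrative assumption, not the questioner's actual grammar):

```python
import re

class Token(object):
    def __init__(self, type, val, pos):
        self.type = type
        self.val = val
        self.pos = pos

    def __repr__(self):
        return '%s(%s) at %s' % (self.type, self.val, self.pos)

class Lexer(object):
    def __init__(self, rules):
        # Combine the per-token regexes into one alternation of named
        # groups; m.lastgroup then tells us which rule matched.
        parts = ['(?P<%s>%s)' % (name, pattern) for pattern, name in rules]
        self.regex = re.compile('|'.join(parts))

    def tokens(self, text):
        pos = 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            m = self.regex.match(text, pos)
            if m is None:
                raise ValueError('illegal character at position %d' % pos)
            yield Token(m.lastgroup, m.group(), pos)
            pos = m.end()

rules = [(r'\d+', 'NUMBER'), (r'[a-zA-Z_]\w*', 'ID'),
         (r'\+', 'PLUS'), (r'=', 'EQ')]
lexer = Lexer(rules)
print([t.type for t in lexer.tokens('x = a + 42')])
# → ['ID', 'EQ', 'ID', 'PLUS', 'NUMBER']
```

Because all alternatives are tried in one `re.match` call, this avoids looping over every rule at every position, which is the efficiency concern the question raises.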

is there a simple compiler for a small language

被刻印的时光 ゝ submitted on 2019-12-03 01:55:52
Question: I am looking for a simple compiler that compiles a simple language. I need it to write a paper and to learn how compilers work. I am not looking for anything sophisticated, just a simple language (by simple I mean a small codebase, because gcc, for example, is far too big). Any help is appreciated.

Answer 1: If you want to look at code, I'm very impressed with Eijiro Sumii's MinCaml compiler. It's only 2000 lines long. It compiles a pretty interesting source language. It generates real machine code, none of this namby-pamby C or LLVM stuff :-) Speed of compiled code is competitive with gcc and the
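For a feel of how small a complete lex/parse/compile/run pipeline can be, here is a toy sketch (entirely illustrative, unrelated to MinCaml) that compiles arithmetic expressions to instructions for a stack machine and executes them:

```python
import re

def tokenize(src):
    # Numbers, '+', '*', and parentheses; everything else is ignored.
    return re.findall(r'\d+|[()+*]', src)

class Parser:
    """Recursive-descent parser emitting stack-machine code.

    Grammar:  expr -> term ('+' term)*
              term -> factor ('*' factor)*
              factor -> NUMBER | '(' expr ')'
    """
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expr(self):
        code = self.term()
        while self.peek() == '+':
            self.next()
            code += self.term() + [('ADD',)]
        return code

    def term(self):
        code = self.factor()
        while self.peek() == '*':
            self.next()
            code += self.factor() + [('MUL',)]
        return code

    def factor(self):
        tok = self.next()
        if tok == '(':
            code = self.expr()
            self.next()  # consume ')'
            return code
        return [('PUSH', int(tok))]

def compile_expr(src):
    return Parser(tokenize(src)).expr()

def run(code):
    stack = []
    for op in code:
        if op[0] == 'PUSH':
            stack.append(op[1])
        elif op[0] == 'ADD':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op[0] == 'MUL':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[0]

print(run(compile_expr('2+3*4')))    # → 14
print(run(compile_expr('(2+3)*4')))  # → 20
```

The same division of labor (tokenizer, parser, code generator, evaluator) is what you would find in MinCaml or any textbook compiler, just at a much smaller scale.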

Python - lexical analysis and tokenization

人盡茶涼 submitted on 2019-12-02 23:37:20
I'm looking to speed along my discovery process here quite a bit, as this is my first venture into the world of lexical analysis. Maybe this is even the wrong path. First, I'll describe my problem: I've got very large properties files (on the order of 1,000 properties) which, when distilled, are really just about 15 important properties; the rest can be generated or rarely ever change. So, for example:

    general {
        name = myname
        ip = 127.0.0.1
    }

    component1 {
        key = value
        foo = bar
    }

This is the type of format I want to create, to tokenize something like:

    property.${general.name}blah.home
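For a block format like the one above, a small regex tokenizer plus a hand-written parser is often enough; no parser generator is required. A sketch under that assumption (the token names and the nested-dict output shape are my choices, not from the question):

```python
import re

# Token patterns for the block format; names are illustrative assumptions.
TOKEN_RE = re.compile(r'''
    (?P<LBRACE>\{)
  | (?P<RBRACE>\})
  | (?P<EQ>=)
  | (?P<WORD>[^\s{}=]+)   # section names, keys, and values alike
  | (?P<WS>\s+)
''', re.VERBOSE)

def tokenize(text):
    """Flatten the input into (kind, value) pairs, dropping whitespace."""
    return [(m.lastgroup, m.group())
            for m in TOKEN_RE.finditer(text)
            if m.lastgroup != 'WS']

def parse(text):
    """Turn the token stream into {section: {key: value}}."""
    result, it = {}, iter(tokenize(text))
    for kind, value in it:
        if kind == 'WORD':
            section, body = value, {}
            next(it)                      # consume LBRACE
            for kind, value in it:
                if kind == 'RBRACE':
                    break
                key = value
                next(it)                  # consume EQ
                _, body[key] = next(it)   # the value WORD
            result[section] = body
    return result

cfg = parse('general {\n  name = myname\n  ip = 127.0.0.1\n}')
print(cfg)  # → {'general': {'name': 'myname', 'ip': '127.0.0.1'}}
```

With the file parsed into a plain dictionary, substituting references like `${general.name}` elsewhere becomes a simple lookup plus string replacement.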

Haskell Parsec - error messages are less helpful while using custom tokens

[亡魂溺海] submitted on 2019-12-02 18:37:47
I'm working on separating the lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using tokens other than Parsec's Char tokens. Here are some examples of Parsec's error messages while using Char tokens:

    ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf wrong"
    parse error at (line 1, column 7):
    unexpected "w"
    expecting space or "ok"

    ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
    parse error at (line 1, column 1):
    unexpected "w"
    expecting "ok" or "nop"

So the string parser shows what string is expected when

How to define a Regex in StandardTokenParsers to identify path?

╄→гoц情女王★ submitted on 2019-12-02 16:11:18
Question: I am writing a parser in which I want to parse arithmetic expressions like:

    /hdfs://xxx.xx.xx.x:xxxx/path1/file1.jpg+1

I want to parse it, change the infix to postfix, and do the calculation. I also used help from a piece of code in another discussion:

    class InfixToPostfix extends StandardTokenParsers {
      import lexical._

      def regexStringLit(r: Regex): Parser[String] = acceptMatch(
        "string literal matching regex " + r,
        { case StringLit(s) if r.unapplySeq(s).isDefined => s })

      def pathIdent:

How to write simple parser for if and while statements? [closed]

你说的曾经没有我的故事 submitted on 2019-12-02 14:22:51
Question (closed 8 years ago as unclear): I need to write a simple parser that will convert tokens to a parse tree. I've already written a LexicalAnalyzer that returns the tokens. Now I want to write rules for "if" and "while" statements (for a start).
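The usual approach is recursive descent: each grammar rule becomes a function that consumes tokens and returns a tree node. A sketch under stated assumptions (the question's LexicalAnalyzer isn't shown, so the `(kind, value)` token tuples, the tiny grammar, and the tuple-based tree shape here are all illustrative):

```python
# Grammar sketch:
#   stmt -> 'if' '(' IDENT ')' stmt
#         | 'while' '(' IDENT ')' stmt
#         | IDENT ';'
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # e.g. [('KEYWORD', 'if'), ('LPAREN', '('), ...]
        self.i = 0

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expect(self, kind):
        tok = self.next()
        if tok[0] != kind:
            raise SyntaxError('expected %s, got %r' % (kind, tok))
        return tok

    def stmt(self):
        kind, value = self.next()
        if kind == 'KEYWORD' and value in ('if', 'while'):
            self.expect('LPAREN')
            cond = self.expect('IDENT')[1]
            self.expect('RPAREN')
            return (value, cond, self.stmt())  # recurse for the body
        if kind == 'IDENT':
            self.expect('SEMI')
            return ('call', value)
        raise SyntaxError('unexpected token %r' % ((kind, value),))

# Tokens for:  while (flag) if (ok) work;
tokens = [('KEYWORD', 'while'), ('LPAREN', '('), ('IDENT', 'flag'),
          ('RPAREN', ')'), ('KEYWORD', 'if'), ('LPAREN', '('),
          ('IDENT', 'ok'), ('RPAREN', ')'), ('IDENT', 'work'), ('SEMI', ';')]
tree = Parser(tokens).stmt()
print(tree)  # → ('while', 'flag', ('if', 'ok', ('call', 'work')))
```

Each rule in the grammar maps one-to-one onto a method, so extending this to assignments, blocks, or real conditions is a matter of adding more methods of the same shape.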

How to use Finite Automaton to implement a scanner

若如初见. submitted on 2019-12-02 12:18:09
Question: I'm building a simple scanner. Suppose I have the following tokens defined for my language:

    !, !=, !==, <, <<, {

Now I can specify them using a regular expression:

    !=?=? | { | <<?

Then I used http://hackingoff.com to build the NFA and DFA. Each machine can now determine whether an input is in the language of the regexp or not. But my program is a sequence of tokens, not one token:

    !!=!<!==<<!{

My question is how I should use the machines to parse the string into tokens. I'm interested in the approach
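The standard answer is maximal munch: run the machine from the current position, remember the last accepting state you passed through, emit the longest token recognized, then restart the machine at the next character. For a fixed token set like this one, the DFA's behavior can be sketched as a longest-prefix match (a simplification of running the actual automaton; the real DFA would do this in one left-to-right pass per token):

```python
# Token set from the question, ordered longest-first so that the
# longest alternative wins at each position (maximal munch).
TOKENS = ['!==', '!=', '!', '<<', '<', '{']

def scan(text):
    """Repeatedly take the longest token that matches at the current position."""
    pos, out = 0, []
    while pos < len(text):
        for tok in TOKENS:
            if text.startswith(tok, pos):
                out.append(tok)
                pos += len(tok)
                break
        else:
            raise ValueError('no token matches at position %d' % pos)
    return out

print(scan('!!=!<!==<<!{'))
# → ['!', '!=', '!', '<', '!==', '<<', '!', '{']
```

Note how maximal munch resolves the ambiguity in the question's input: at position 1 the scanner could emit `!` twice, but it prefers the longer `!=`, and later it prefers `!==` over `!=` and `<<` over `<`, exactly as a DFA tracking its last accepting state would.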