lexer

How would I go about Implementing A Simple Stack-Based Programming Language

痞子三分冷 posted on 2019-12-02 19:44:20
I am interested in extending my knowledge of computer programming by implementing a stack-based programming language. I am seeking advice on where to begin, as I intend for it to have functions like "pushint 1", which would push an integer with value 1 onto the top of the stack, and flow control via labels like "L01: jump L01:". So far I have made a C# implementation of how I want my language to behave (I wanted to link to it, but IDEOne is blocked), but it is very messy and needs optimization. It translates the input to XML and then parses it. My goals are to go to a lower level
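A minimal sketch of the kind of interpreter the question describes, written in Python rather than the asker's C#. Only `pushint`, labels, and `jump` come from the question; `add` and `jumpz`, the one-instruction-per-line layout, and the rule that a label sits on its own line are assumptions made for illustration.

```python
def run(source):
    """Interpret a tiny stack program and return the final stack."""
    # One instruction per non-empty line, split on whitespace.
    program = [line.split() for line in source.splitlines() if line.strip()]
    # First pass: map each label ("L01:") to the index of its line.
    labels = {ins[0][:-1]: i for i, ins in enumerate(program) if ins[0].endswith(":")}
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op.endswith(":"):
            continue  # a label is only a jump target, nothing to execute
        elif op == "pushint":
            stack.append(int(args[0]))
        elif op == "add":  # hypothetical instruction, not from the question
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "jump":
            # Accept both "jump L01" and "jump L01:" as in the question.
            pc = labels[args[0].rstrip(":")]
        elif op == "jumpz":  # hypothetical: pop a value, jump if it is zero
            if stack.pop() == 0:
                pc = labels[args[0].rstrip(":")]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack
```

Resolving labels in a separate pass before execution lets a `jump` refer to a label that appears later in the program, which a single-pass design would have to patch up afterwards.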

Should I use a lexer when using a parser combinator library like Parsec?

前提是你 posted on 2019-12-02 15:29:06
When writing a parser in a parser combinator library like Haskell's Parsec, you usually have two choices: (1) write a lexer to split your String input into tokens, then perform parsing on [Token]; or (2) write parser combinators directly on String. The first method often seems to make sense, given that many parsing inputs can be understood as tokens separated by whitespace. In other places, I have seen people recommend against tokenizing (or scanning, or lexing, as some call it), with simplicity being quoted as the main reason. What are the general trade-offs between lexing and not doing it? nh2 The most

Where can I learn the basics of writing a lexer?

独自空忆成欢 posted on 2019-12-02 13:50:42
I want to learn how to write a lexer. My university course had an assignment where we had to write a parser (and a lexer to go along with it), but this was given to us with no instruction or feedback (beyond the mark), so I didn't really learn much from it. After searching for this topic, I can only find fairly advanced write-ups which focus on areas that I feel are a few steps ahead of where I am. I want a discussion on the basics of writing a lexer for a very simple language, which I can use as a basis for investigating tokenising more complex languages. At this stage I'm not really

Antlr3: Could not match token in parser rules which is used in lexer rule

老子叫甜甜 posted on 2019-12-02 13:44:43
I have lexer rules in Antlr3 as:

    HYPHEN : '-';
    TOKEN : HYPHEN CHARS;
    CHARS : 'a'..'z';

The parser rules are:

    exp : CHARS | some complex expression;
    parser_rule : exp HYPHEN exp;

If I try to match 'abc-abc' with parser_rule, it fails, because the lexer creates a TOKEN for HYPHEN exp. How can I match it correctly with parser_rule?

In the ANTLR lexer, the lexer rule that can match the longest sub-sequence of input is used. So your input abc-abc will be tokenized as CHARS("abc") TOKEN("-abc") and therefore will not match the expected CHARS HYPHEN CHARS. You should consider making TOKEN a parser rule instead
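The longest-match behaviour described in the answer can be imitated in a few lines of Python. This is a simulation for illustration only, not ANTLR itself; it assumes CHARS matches one or more letters, as the answer's CHARS("abc") implies, and the names `RULES` and `tokenize` are made up here.

```python
import re

# Lexer rules mirroring the grammar in the question. CHARS is assumed to
# match one or more letters so that "abc" is a single token.
RULES = [
    ("HYPHEN", re.compile(r"-")),
    ("TOKEN", re.compile(r"-[a-z]+")),
    ("CHARS", re.compile(r"[a-z]+")),
]

def tokenize(text):
    """At each position, keep the rule whose match is longest."""
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer wins, so ties go to the earlier rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out

print(tokenize("abc-abc"))
```

At the hyphen, both HYPHEN and TOKEN match, but TOKEN consumes four characters against HYPHEN's one, so the parser never sees a standalone HYPHEN. Deleting the TOKEN entry from `RULES` yields CHARS, HYPHEN, CHARS, which is the sequence parser_rule expects.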

adjacency as an operator - can any lexer handle it?

◇◆丶佛笑我妖孽 posted on 2019-12-02 03:46:32
Question: Say a language defines adjacency of two mathematical Unicode alphanumerical symbols as an operator. Say, 𝑥𝑦+1 means 𝑥 %adj 𝑦 + 1, where %adj stands for whatever operator adjacency defines, multiplication in this case. I was wondering, can any existing lexical analysis tool handle this? Answer 1: Invisible operators cannot be recognized with lexical analysis, for reasons which should be more or less obvious. You can only deduce the presence of an invisible operator by analyzing the syntactic

adjacency as an operator - can any lexer handle it?

倖福魔咒の posted on 2019-12-02 02:28:36
Say a language defines adjacency of two mathematical Unicode alphanumerical symbols as an operator. Say, 𝑥𝑦+1 means 𝑥 %adj 𝑦 + 1, where %adj stands for whatever operator adjacency defines, multiplication in this case. I was wondering, can any existing lexical analysis tool handle this? rici Invisible operators cannot be recognized with lexical analysis, for reasons which should be more or less obvious. You can only deduce the presence of an invisible operator by analyzing the syntactic context, which is the role of a parser. Of course, most lexical analysis tools allow arbitrary code to be
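To illustrate the point that adjacency is deduced from syntactic context, here is a small Python sketch in which the lexer emits no token for the invisible operator and a recursive-descent parser infers it. The grammar (single-letter names, integers, and `+` only), the function names, and the `"adj"` node label are assumptions chosen for this example.

```python
import re

def lex(text):
    """Return single-letter names, integers, and '+'; other characters are skipped.

    Note that nothing is emitted for adjacency: the lexer cannot see it.
    """
    return re.findall(r"[^\W\d_]|\d+|\+", text)

def parse(tokens):
    """Grammar: total := product ('+' product)* ; product := atom atom*"""
    pos = 0

    def product():
        nonlocal pos
        node = tokens[pos]
        pos += 1
        # An atom directly following another atom implies the invisible operator.
        while pos < len(tokens) and tokens[pos] != "+":
            node = ("adj", node, tokens[pos])
            pos += 1
        return node

    def total():
        nonlocal pos
        node = product()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1  # consume '+'
            node = ("+", node, product())
        return node

    return total()

print(parse(lex("xy+1")))
```

Because `product` sits below `total` in the grammar, adjacency binds tighter than `+`, matching the reading of 𝑥𝑦+1 as (𝑥 %adj 𝑦) + 1 given in the question.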

ANTLR4: How to inject tokens

我只是一个虾纸丫 posted on 2019-12-02 00:04:20
I'm trying to implement a preprocessor for a DSL, modeled after the CPP example in code/extras. However, I'm not using a token factory. Is one required? Calling emit(token) does not inject the tokens into the token stream as expected. Here's the lexer:

    // string-delimited path
    SPATH : '"' (~[\n\r])*? '"'
        {
            emit(); // inject the current token
            // launch another lexer on the include file, get tokens,
            // emit them all at once here
            List<CommonToken> tokens = Preprocessor.include(getText());
            if (null != tokens) {
                for (CommonToken tok : tokens) {
                    emit(tok);
                }
            }
        }
        ;

Here's the include method:

Output of Lexer

主宰稳场 posted on 2019-12-01 11:16:48
I am currently writing a compiler and I'm in the lexer phase. I know that the lexer tokenizes the input stream. However, consider the following stream: int foo = 0; Should the output of the lexer be: Keyword letter letter letter equals digit semicolon? And then the parser reduces the letter letter letter to an identifier?

In general, your lexer should produce a stream of structs that contain language elements: operators, identifiers, keywords, comments, etc. Each struct should be marked with the type of the lexeme and carry content relevant to the type of lexeme it represents. To enable good
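As a rough sketch of what such a stream of structs might look like for the input in the question, the Python below groups characters into whole lexemes before the parser ever sees them. The `Token` fields, the kind names, the keyword set, and the `offset` field are assumptions for illustration, not something the answer specifies.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    kind: str    # type of the lexeme, e.g. "Identifier"
    text: str    # content of the lexeme
    offset: int  # position in the source (an assumed extra field)

KEYWORDS = {"int", "return"}  # hypothetical keyword set

TOKEN_RE = re.compile(r"""
    (?P<Number>\d+)
  | (?P<Identifier>[A-Za-z_]\w*)
  | (?P<Equals>=)
  | (?P<Semicolon>;)
  | (?P<Skip>\s+)
""", re.VERBOSE)

def lex_source(source):
    """Turn source text into a list of Token structs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        # Keywords are lexed as identifiers first, then reclassified.
        if kind == "Identifier" and text in KEYWORDS:
            kind = "Keyword"
        if kind != "Skip":
            tokens.append(Token(kind, text, pos))
        pos = m.end()
    return tokens

for tok in lex_source("int foo = 0;"):
    print(tok)
```

With this design the parser receives one Identifier token for `foo` rather than three letter tokens, so it never has to reassemble names from individual characters.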

C# Lua Parser / Analyser

此生再无相见时 posted on 2019-12-01 08:46:45
First things first: I am writing a little Lua IDE in C#. The code execution is done by an assembly named LuaInterface. The code editing is done by a Scintilla port, and the RAD / UI interface is via the extensible IDesignSurfaceExt Visual Studio (one-way code generation). File handling is provided by a little SQLite database used as a project package file. So all in all I've got everything I need together... The only unsolved problem is the parser / lexer for Lua. I do not want to load and execute the code! I just want to parse the String containing the Lua code and get some information about it like

Output of Lexer

五迷三道 posted on 2019-12-01 08:22:57
Question: I am currently writing a compiler and I'm in the lexer phase. I know that the lexer tokenizes the input stream. However, consider the following stream: int foo = 0; Should the output of the lexer be: Keyword letter letter letter equals digit semicolon? And then the parser reduces the letter letter letter to an identifier? Answer 1: In general, your lexer should produce a stream of structs that contain language elements: operators, identifiers, keywords, comments, etc. These structs should be