lexer

Looking for a clear definition of what a “tokenizer”, “parser” and “lexers” are and how they are related to each other and used?

强颜欢笑 submitted on 2019-11-27 05:50:39
I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice versa). I need to create a program that will go through c/h source files to extract data declarations and definitions. I have been looking for examples and can find some info, but I am really struggling to grasp the underlying concepts like grammar rules, parse trees and abstract syntax trees, and how they interrelate. Eventually these concepts need to be stored in an actual program, but 1) what do they look like, 2) are
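As a rough illustration of how the pieces in the question fit together, here is a minimal sketch in Python (not taken from the original thread). It assumes a made-up, tiny subset of C declarations; the token names, the `tokenize` and `parse_declaration` functions, and the dictionary used as a tree are all hypothetical. The tokenizer (lexer) turns characters into tokens, and the parser consumes those tokens according to a grammar rule.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a tiny subset of C declarations (an assumption).
TOKEN_RE = re.compile(r"""
    (?P<KEYWORD>\b(?:int|char|float|double)\b)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<PUNCT>[=;])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Lexer/tokenizer: characters in, flat list of tokens out."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "WS":  # whitespace separates tokens but is dropped
            tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_declaration(tokens):
    """Parser: tokens in, a small tree (here a dict) out.

    Grammar rule: declaration : KEYWORD IDENT ('=' NUMBER)? ';'
    """
    pos = 0

    def expect(kind, text=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"expected {kind}, got end of input")
        tok = tokens[pos]
        if tok.kind != kind or (text is not None and tok.text != text):
            raise SyntaxError(f"expected {kind}, got {tok}")
        pos += 1
        return tok

    type_tok = expect("KEYWORD")
    name_tok = expect("IDENT")
    init = None
    # Optional initializer: '=' NUMBER
    if pos < len(tokens) and tokens[pos] == Token("PUNCT", "="):
        pos += 1
        init = int(expect("NUMBER").text)
    expect("PUNCT", ";")
    return {"type": type_tok.text, "name": name_tok.text, "init": init}
```

For example, `parse_declaration(tokenize("int x = 42;"))` builds a dictionary describing the declaration. In this sketch the parser never looks at raw characters, only at the tokens the tokenizer produced.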

How do I get an Antlr Parser rule to read from both default AND hidden channel

只谈情不闲聊 submitted on 2019-11-26 23:17:22
Question: I use the normal whitespace separation into the hidden channel, but I have one rule where I would like to include any whitespace for later processing, and every example I have found requires some very strange manual coding. Is there no easy option to read from multiple channels, like the option to put the whitespace there from the beginning? For example, this is the whitespace lexer rule:

    WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;

And this is my rule where I would like to include whitespace raw

ANTLR4 visitor pattern on simple arithmetic example

左心房为你撑大大i submitted on 2019-11-26 22:53:47
Question: I am a complete ANTLR4 newbie, so please forgive my ignorance. I ran into this presentation where a very simple arithmetic expression grammar is defined. It looks like:

    grammar Expressions;
    start : expr ;
    expr : left=expr op=('*'|'/') right=expr #opExpr
         | left=expr op=('+'|'-') right=expr #opExpr
         | atom=INT #atomExpr
         ;
    INT : ('0'..'9')+ ;
    WS : [ \t\r\n]+ -> skip ;

Which is great because it will generate a very simple binary tree that can be traversed using the visitor pattern as explained in
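To make the idea of traversing such a binary tree with a visitor concrete, here is a minimal sketch in plain Python. It does not use the ANTLR runtime or ANTLR-generated classes: the `OpExpr` and `AtomExpr` node classes are hand-written stand-ins named after the `#opExpr` and `#atomExpr` labels in the grammar, and `EvalVisitor` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class AtomExpr:
    """Stand-in for the #atomExpr alternative: a single integer."""
    value: int


@dataclass
class OpExpr:
    """Stand-in for the #opExpr alternative: left op right."""
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[AtomExpr, OpExpr]


class EvalVisitor:
    """Evaluates the tree by dispatching on the node's class name."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_AtomExpr(self, node):
        return node.value

    def visit_OpExpr(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if node.op == "+":
            return left + right
        if node.op == "-":
            return left - right
        if node.op == "*":
            return left * right
        if node.op == "/":
            return left / right
        raise ValueError(f"unknown operator {node.op!r}")


# Tree for 2 + 3 * 4, with '*' nested deeper because it binds tighter.
tree = OpExpr(AtomExpr(2), "+", OpExpr(AtomExpr(3), "*", AtomExpr(4)))
result = EvalVisitor().visit(tree)
```

Each node type gets one `visit_...` method, so adding a new operation over the tree (printing, simplifying) means writing a new visitor class rather than changing the node classes.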

ANTLR: What is the simplest way to implement a Python-like indentation-dependent grammar?

不羁岁月 submitted on 2019-11-26 17:39:14
Question: I am trying to implement a Python-like, indentation-dependent grammar. Source example: ABC QWE CDE EFG EFG CDE ABC QWE ZXC As I see it, what I need is to implement two tokens, INDENT and DEDENT, so I could write something like:

    grammar mygrammar;
    text: (ID | block)+;
    block: INDENT (ID|block)+ DEDENT;
    INDENT: ????;
    DEDENT: ????;

Is there any simple way to implement this using ANTLR? (I'd prefer, if possible, to use the standard ANTLR lexer.)

Answer 1: I don't know what the easiest way to handle it is, but the
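One common way to produce INDENT and DEDENT tokens is to keep a stack of indentation widths, which is the approach Python's own tokenizer uses. The sketch below shows that technique in plain Python, outside ANTLR. It is not the answer from the original thread, and it rests on several assumptions: indentation is made of spaces only, every whitespace-separated word is an ID, and `indent_tokens` is a hypothetical name.

```python
def indent_tokens(lines):
    """Turn indented lines into ('INDENT'|'DEDENT'|'ID', text) tokens.

    Assumes space-only indentation; blank lines are ignored.
    """
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        if not line.strip():
            continue
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            tokens.append(("INDENT", ""))
        else:
            # Shallower: close blocks until the widths line up again.
            while width < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", ""))
            if width != stack[-1]:
                raise IndentationError(
                    f"dedent to width {width} matches no open block"
                )
        for word in line.split():
            tokens.append(("ID", word))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


# The indentation below is assumed; the original layout of the
# question's sample input is not recoverable from the excerpt.
sample = [
    "ABC QWE",
    "  CDE EFG",
    "  EFG CDE",
    "    ABC",
    "QWE ZXC",
]
kinds = [kind for kind, _ in indent_tokens(sample)]
```

A token stream produced this way can then be fed to a parser rule shaped like `block: INDENT (ID|block)+ DEDENT;`, since every INDENT is guaranteed a matching DEDENT.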

Looking for a clear definition of what a “tokenizer”, “parser” and “lexers” are and how they are related to each other and used?

ぃ、小莉子 submitted on 2019-11-26 17:29:29
Question: I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice versa). I need to create a program that will go through c/h source files to extract data declarations and definitions. I have been looking for examples and can find some info, but I am really struggling to grasp the underlying concepts like grammar rules, parse trees and abstract syntax trees, and how they interrelate.

Lexer written in Javascript?

空扰寡人 submitted on 2019-11-26 12:07:22
Question: I have a project where a user needs to define a set of instructions for a UI that is written entirely in JavaScript. I need the ability to parse a string of instructions and then translate them into instructions. Are there any libraries out there for parsing that are 100% JavaScript? Or a generator that will generate JavaScript? Thanks!

Answer 1: Something like http://jscc.phorward-software.com/, maybe? JS/CC is the first available parser development system for JavaScript and

How does the ANTLR lexer disambiguate its rules (or why does my parser produce “mismatched input” errors)?

旧巷老猫 submitted on 2019-11-26 06:48:05
Question: Note: This is a self-answered question that aims to provide a reference about one of the most common mistakes made by ANTLR users. When I test this very simple grammar:

    grammar KeyValues;
    keyValueList: keyValue*;
    keyValue: key=IDENTIFIER '=' value=INTEGER ';';
    IDENTIFIER: [A-Za-z0-9]+;
    INTEGER: [0-9]+;
    WS: [ \t\r\n]+ -> skip;

With the following input:

    foo = 42;

I end up with the following run-time error:

    line 1:6 mismatched input '42' expecting INTEGER
    line 1:8 mismatched input ';'
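The behaviour behind this error can be reproduced with a small simulation. The ANTLR lexer picks the rule that matches the longest input, and when several rules match the same length, the rule defined first wins. The sketch below imitates that policy in plain Python; it is not ANTLR code, `lex` is a hypothetical helper, and the regular expressions are simplified stand-ins for the lexer rules above.

```python
import re


def lex(rules, text):
    """Tokenize text with an ordered list of (name, regex) rules.

    At each position every rule is tried; the longest match wins, and on
    a tie the rule listed first wins.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' means a later rule must match MORE text to win,
            # so ties go to the earlier rule.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise SyntaxError(f"no rule matches at offset {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Rule order as in the grammar: IDENTIFIER comes before INTEGER.
ORIGINAL = [
    ("'='", r"="),
    ("';'", r";"),
    ("IDENTIFIER", r"[A-Za-z0-9]+"),
    ("INTEGER", r"[0-9]+"),
    ("WS", r"[ \t\r\n]+"),
]

# Same rules with INTEGER moved ahead of IDENTIFIER.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[3], ORIGINAL[2], ORIGINAL[4]]

original_tokens = lex(ORIGINAL, "foo = 42;")
reordered_tokens = lex(REORDERED, "foo = 42;")
```

With the original order, `42` at offset 6 is matched equally well by IDENTIFIER and INTEGER, so it comes out as an IDENTIFIER and the parser never sees the INTEGER it expects. With INTEGER listed first, the same input yields an INTEGER token. Reordering is only one possible adjustment: it also means an all-digit string can no longer be lexed as an IDENTIFIER.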

Poor man's “lexer” for C#

早过忘川 submitted on 2019-11-26 06:17:19
Question: I'm trying to write a very simple parser in C#. I need a lexer -- something that lets me associate regular expressions with tokens, so it reads in regexes and gives me back symbols. It seems like I ought to be able to use Regex to do the actual heavy lifting, but I can't see an easy way to do it. For one thing, Regex only seems to work on strings, not streams (why is that!?!?). Basically, I want an implementation of the following interface:

    interface ILexer : IDisposable
    {
        /// <summary>
        ///

lexers vs parsers

扶醉桌前 submitted on 2019-11-26 02:39:07
Question: Are lexers and parsers really that different in theory? It seems fashionable to hate regular expressions: coding horror, another blog post. However, popular lexing-based tools such as pygments, geshi, or prettify all use regular expressions. They seem to lex anything... When is lexing enough, and when do you need EBNF? Has anyone used the tokens produced by these lexers with bison or antlr parser generators?

Answer 1: What parsers and lexers have in common: They read symbols of some alphabet from their