lexer

Good parser generator (think lex/yacc or antlr) for .NET? Build time only? [closed]

Submitted by 六月ゝ 毕业季﹏ on 2019-12-17 23:12:39

Question (closed as off-topic for Stack Overflow; no longer accepting answers): Is there a good parser generator (think lex/yacc or ANTLR) for .NET? Any that have a license that would not scare lawyers? Lots of LGPL, but I am working on embedded components, and some organizations are not comfortable with me taking an LGPL dependency. I've heard that Oslo may provide this functionality but I

Antlr v3 error with parser/lexer rules

Submitted by 走远了吗. on 2019-12-17 21:13:45

Question: I am having problems with my ANTLR grammar. I'm trying to write a parser rule 'typedident' which can accept the following inputs: 'int a' or 'char a'. The variable name 'a' comes from my lexer rule 'IDENT', which is defined as follows: IDENT : (('a'..'z'|'A'..'Z') | '_') (('a'..'z'|'A'..'Z')|('0'..'9')| '_')*; My 'typedident' parser rule is as follows: typedident : (INT|CHAR) IDENT; INT and CHAR have been defined as tokens. The problem I'm having is that when I test 'typedident' the variable
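The usual culprit here is lexer rule ordering: if INT and CHAR are only declared in a tokens {} section, or appear after IDENT, the lexer matches 'int' as IDENT and typedident never sees an INT token. A quick way to confirm is to dump the token stream; a minimal sketch, assuming ANTLR v3's Java runtime and a hypothetical generated lexer class MyLexer:

```java
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.Token;

public class TokenDump {
    public static void main(String[] args) {
        // MyLexer is a hypothetical generated class; substitute your own.
        MyLexer lexer = new MyLexer(new ANTLRStringStream("int a"));
        Token t;
        while ((t = lexer.nextToken()).getType() != Token.EOF) {
            // If "int" prints with IDENT's type rather than INT's, the
            // keyword rule never fires: declare INT : 'int'; before IDENT.
            System.out.println(t.getType() + " -> '" + t.getText() + "'");
        }
    }
}
```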

boost-spirit-lex: unifying multiple tokens into a single token in lex, differentiated by the id

Submitted by 霸气de小男生 on 2019-12-17 19:56:28

Question (edit): I have ripped out the lexer, as it does not cleanly integrate with Qi and just obfuscates grammars (see answer below). My lexer looks as follows: template <typename Lexer> struct tokens : lex::lexer<Lexer> { tokens() : left_curly("\"{\""), right_curly("\"}\""), left_paren("\"(\""), right_paren("\")\""), colon(":"), scolon(";"), namespace_("(?i:namespace)"), event("(?i:event)"), optional("(?i:optional)"), required("(?i:required)"), repeated("(?i:repeated)"), t_int_4("(?i:int4)"), t_int
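The idea the question is after, several keyword lexemes collapsing into one token kind while the concrete keyword survives as a payload, can be sketched independently of Spirit.Lex. A toy Java illustration (the Kind enum and all names are invented for the example):

```java
import java.util.List;
import java.util.Map;

// Toy illustration (not Spirit.Lex): several keyword lexemes are
// unified into one token kind, and the concrete keyword survives as
// the token's lexeme so later stages can still tell them apart.
public class UnifiedTokens {
    enum Kind { TYPE_NAME, QUALIFIER, WORD }

    record Tok(Kind kind, String lexeme) {}

    private static final Map<String, Kind> KEYWORDS = Map.of(
            "int4", Kind.TYPE_NAME,
            "int8", Kind.TYPE_NAME,
            "string", Kind.TYPE_NAME,
            "optional", Kind.QUALIFIER,
            "required", Kind.QUALIFIER,
            "repeated", Kind.QUALIFIER);

    static Tok classify(String word) {
        // Case-insensitive, mirroring the (?i:...) patterns in the question.
        Kind k = KEYWORDS.getOrDefault(word.toLowerCase(), Kind.WORD);
        return new Tok(k, word);
    }

    public static void main(String[] args) {
        for (String w : List.of("INT4", "optional", "foo")) {
            System.out.println(classify(w)); // e.g. Tok[kind=TYPE_NAME, lexeme=INT4]
        }
    }
}
```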

Is it a Lexer's Job to Parse Numbers and Strings?

Submitted by 你说的曾经没有我的故事 on 2019-12-17 10:52:43

Question: Is it a lexer's job to parse numbers and strings? This may or may not sound dumb, given the fact that I'm asking whether a lexer should parse input. However, I'm not sure whether that's in fact the lexer's job or the parser's job, because in order to lex properly, the lexer needs to parse the string/number in the first place, so it would seem that code would be duplicated if the parser does this. Is it indeed the lexer's job? Or should the lexer simply break up a string like 123.456 into
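The common answer is that the lexer recognises the shape of a literal and converts it exactly once, so the parser only ever sees finished tokens and no scanning is duplicated. A minimal sketch of that division of labour (plain Java, invented names):

```java
// Minimal sketch: the lexer recognises the *shape* of a number and
// (optionally) converts it once; the parser only ever sees finished
// tokens, so nothing is scanned twice.
public class NumberLexing {
    record NumberToken(String lexeme, double value) {}

    static NumberToken lexNumber(String input, int start) {
        int i = start;
        while (i < input.length()
                && (Character.isDigit(input.charAt(i)) || input.charAt(i) == '.')) {
            i++;
        }
        String lexeme = input.substring(start, i);
        // Conversion happens here, in the lexer, exactly once.
        return new NumberToken(lexeme, Double.parseDouble(lexeme));
    }

    public static void main(String[] args) {
        System.out.println(lexNumber("123.456", 0)); // NumberToken[lexeme=123.456, value=123.456]
    }
}
```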

Matching Lua's “Long bracket” string syntax

Submitted by 安稳与你 on 2019-12-14 03:21:27

Question: I'm writing a jFlex lexer for Lua, and I'm having problems designing a regular expression to match one particular part of the language specification: Literal strings can also be defined using a long format enclosed by long brackets. We define an opening long bracket of level n as an opening square bracket followed by n equal signs followed by another opening square bracket. So, an opening long bracket of level 0 is written as [[, an opening long bracket of level 1 is written as [=[, and so on
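For comparison, a regex engine with backreferences can express the level-n constraint directly, because \1 forces the closing bracket to carry the same number of equal signs as the opening one; jFlex regexes have no backreferences, so there the standard approach is to count the equal signs in the opening-bracket action and match the close in a dedicated lexer state. A sketch with java.util.regex (not jFlex):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using java.util.regex: the backreference \1 requires the
// closing bracket to repeat exactly the '=' signs captured by the
// opening bracket, so [==[ ... ]==] matches but [==[ ... ]=] does not.
public class LongBrackets {
    private static final Pattern LONG_STRING =
            Pattern.compile("\\[(=*)\\[(.*?)]\\1]", Pattern.DOTALL);

    public static void main(String[] args) {
        Matcher m = LONG_STRING.matcher("x = [==[ a ]=] still inside ]==]");
        if (m.find()) {
            System.out.println("level " + m.group(1).length()
                    + ": '" + m.group(2) + "'"); // level 2: ' a ]=] still inside '
        }
    }
}
```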

Reading the Jsoup Source, Part 5: parser (middle)

Submitted by 那年仲夏 on 2019-12-12 20:28:10

The previous article covered the basics of state machines and lexical analysis; in this installment we look at how Jsoup does its lexing. Code structure: first, the main classes in the parser package. Parser is the entry facade of the Jsoup parser, wrapping the commonly used static parse methods. You can set maxErrors to collect error records; the default is 0, i.e. errors are not collected. The related classes are ParseError and ParseErrorList. Building on this feature, I wrote a PageErrorChecker that syntax-checks a page and prints its syntax errors. Token holds a single lexing result. Token is an abstract class with six implementations, Doctype, StartTag, EndTag, Comment, Character, and EOF, corresponding to the six token types. Tokeniser holds the state and output of the lexing process. Its two most important fields are state and emitPending; the former holds the current state and the latter the pending output. There are also tagPending/doctypePending/commentPending, which hold Tokens that have not yet been fully populated. CharacterReader wraps the character-reading logic and serves as the character input during tokenizing. CharacterReader contains, much like NIO's ByteBuffer, a
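As a small illustration of the maxErrors feature described above, a sketch using Jsoup's public API (method names as in recent Jsoup releases; older versions may differ):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.ParseError;
import org.jsoup.parser.Parser;

public class ErrorCheck {
    public static void main(String[] args) {
        // Track up to 10 parse errors instead of the default 0.
        Parser parser = Parser.htmlParser().setTrackErrors(10);
        Document doc = Jsoup.parse("<div><p>unclosed", "", parser);
        for (ParseError e : parser.getErrors()) {
            System.out.println(e.getPosition() + ": " + e.getErrorMessage());
        }
    }
}
```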

How to Lex, Parse, and Serialize-to-XML Email Messages using Alex and Happy

Submitted by 回眸只為那壹抹淺笑 on 2019-12-12 17:17:59

Question: I am working toward being able to input any email message and output an equivalent XML encoding. I am starting small, with one of the email headers: the "From" header. Here is an example of a From header: From: John Doe <john@doe.org> I want it transformed into this XML: <From> <Mailbox> <DisplayName>John Doe</DisplayName> <Address>john@doe.org</Address> </Mailbox> </From> I want to use the lexical analyzer "Alex" (http://www.haskell.org/alex/doc/html/) to break apart (tokenize) the From
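To pin down the expected input/output pair before writing any Alex/Happy code, here is the same transformation sketched in plain Java with a regular expression (not Alex/Happy; the pattern and class name are invented for the example, and real From headers are considerably messier):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the target transformation only: extract the display name
// and address from a simple From header and emit the XML shape above.
public class FromHeader {
    private static final Pattern FROM =
            Pattern.compile("From:\\s*(.*?)\\s*<([^>]+)>");

    public static void main(String[] args) {
        Matcher m = FROM.matcher("From: John Doe <john@doe.org>");
        if (m.matches()) {
            System.out.printf(
                "<From><Mailbox><DisplayName>%s</DisplayName>"
                + "<Address>%s</Address></Mailbox></From>%n",
                m.group(1), m.group(2));
        }
    }
}
```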

Understanding ANTLR4 Tokens

Submitted by 試著忘記壹切 on 2019-12-12 03:54:50

Question: I'm pretty new to ANTLR and I'm trying to understand what exactly a Token is in ANTLR4. Consider the following pretty nonsensical grammar: grammar Tst; init: A token=('+'|'-') B; A: .+?; B: .+?; ADD: '+'; SUB: '-'; ANTLR4 generates the following TstParser.InitContext for it: public static class InitContext extends ParserRuleContext { public Token token; //<---------------------------- HERE public TerminalNode A() { return getToken(TstParser.A, 0); } public TerminalNode B() { return getToken
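The label in token=('+'|'-') is what makes ANTLR4 emit that public Token field: unlike A and B, which are lexer rules and therefore surface as TerminalNode accessors, the label is just a handle to whichever Token matched the subrule. A sketch of reading it from the generated parser (hypothetical generated classes TstLexer/TstParser; since the grammar is deliberately nonsensical, the parse may not succeed, and the point is only the access pattern):

```java
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class TokenLabelDemo {
    public static void main(String[] args) {
        // Hypothetical classes generated from grammar Tst.
        TstLexer lexer = new TstLexer(CharStreams.fromString("a+b"));
        TstParser parser = new TstParser(new CommonTokenStream(lexer));
        TstParser.InitContext ctx = parser.init();
        Token t = ctx.token; // whichever token matched the labelled ('+'|'-')
        if (t != null) {
            // A Token is essentially (type, text, position).
            System.out.println(t.getType() + " '" + t.getText()
                    + "' @ line " + t.getLine());
        }
    }
}
```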

How to detect a partial unfinished token and join its pieces obtained from two consecutive portions of input?

Submitted by 可紊 on 2019-12-12 03:36:38

Question: I am writing a toy terminal, where I use Flex to parse normal text and the control sequences that I get from the tty. One detail of the Cocoa machinery is that it reads from the tty in chunks of 1024 bytes, so any token described in my .lex file can at any time become broken into two parts: some bytes of the token are the last bytes of the first 1024-byte chunk, and the remaining bytes are the very first bytes of the next 1024-byte chunk. So I need to somehow: First of all, detect this situation: when a token is split between
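A common fix, independent of Flex, is to never hand the scanner a chunk boundary: keep back the tail that might be an unfinished token and prepend it to the next chunk before scanning. A toy Java sketch, assuming (for illustration only) that the only tokens that can straddle a boundary are ESC-initiated CSI control sequences:

```java
// Toy sketch (not Flex-specific): hold back a possibly unfinished
// token at the end of each chunk and prepend it to the next one.
public class ChunkJoiner {
    private final StringBuilder carry = new StringBuilder();

    /** Feed one raw chunk; returns the text that is safe to scan now. */
    String feed(String chunk) {
        carry.append(chunk);
        int esc = carry.lastIndexOf("\u001b");
        int safeEnd = carry.length();
        if (esc >= 0 && !csiComplete(carry, esc)) {
            safeEnd = esc; // hold the unfinished control sequence back
        }
        String ready = carry.substring(0, safeEnd);
        carry.delete(0, safeEnd);
        return ready;
    }

    // A CSI sequence "ESC [ params final-byte" is complete once a
    // character in '@'..'~' follows the parameter bytes.
    static boolean csiComplete(CharSequence s, int esc) {
        for (int i = esc + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i == esc + 1 && c != '[') return true; // not CSI at all
            if (i > esc + 1 && c >= '@' && c <= '~') return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ChunkJoiner j = new ChunkJoiner();
        System.out.println(j.feed("red: \u001b[3")); // "red: " (ESC held back)
        System.out.println(j.feed("1mtext"));        // ESC[31mtext, now whole
    }
}
```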

Overlapping rules - mismatched input

Submitted by 爱⌒轻易说出口 on 2019-12-11 21:00:08

Question: My grammar (shown below, trimmed down from the original) requires somewhat overlapping rules grammar NOVIANum; statement : (priorityStatement | integerStatement)* ; priorityStatement : T_PRIO TwoDigits ; integerStatement : T_INTEGER Integer ; WS : [ \t\r\n]+ -> skip ; T_PRIO : 'PRIO' ; T_INTEGER : 'INTEGER' ; Integer: OneToNine Digit* | ZERO ; TwoDigits : Digit Digit ; fragment OneToNine : ('1'..'9') ; fragment Digit: ('0'..'9'); ZERO : [0] ; so "Integer" and "TwoDigits" overlap to a certain
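ANTLR resolves such overlaps in the lexer, before the parser runs: the longest match wins and, on a tie, the rule declared first wins, so "12" always lexes as Integer and never as TwoDigits, which is why priorityStatement then reports mismatched input. A toy re-enactment of that tie-breaking (plain Java, not ANTLR internals):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy re-enactment of lexer disambiguation: longest match wins, and
// on equal length the rule listed first wins. With Integer declared
// before TwoDigits, the input "12" can only ever become Integer.
public class MaximalMunch {
    public static void main(String[] args) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("Integer", Pattern.compile("[1-9][0-9]*|0"));
        rules.put("TwoDigits", Pattern.compile("[0-9][0-9]"));

        String input = "12";
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, Pattern> r : rules.entrySet()) {
            Matcher m = r.getValue().matcher(input);
            // Strictly longer wins; a tie keeps the earlier rule.
            if (m.lookingAt() && m.end() > bestLen) {
                best = r.getKey();
                bestLen = m.end();
            }
        }
        System.out.println(input + " -> " + best); // 12 -> Integer
    }
}
```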