lexer

Recognize multiple line comments within a single line with ANTLR4

Submitted by 北城余情 on 2019-12-10 21:52:04
Question: I want to parse PostScript code with ANTLR4. I have finished the grammar, but one particular language extension (which was introduced by someone else) is troublesome to recognize. A short example:

1: % This is a line comment
2: % The next line just pushes the value 10 onto the stack
3: 10
4:
5: %?description This is the special line-comment in question
6: /procedure {
7:   /var1 30 def %This just creates a variable
8:   /var2 10 def %?description A description associated with var2 %?default 20
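
The crux is that a single source line can carry several `%?key value` comments back to back. As an illustration only (a Python sketch, not the asker's ANTLR grammar), one regex that anchors on the `%?` prefix and stops a value at the next `%` can pull every special comment out of a line; the function name `special_comments` is my own:

```python
import re

# A %?key value comment: literal "%?", a key, then a value that runs
# until the next "%" or end of line. Plain "%" comments never match
# because they lack the "?".
SPECIAL = re.compile(r'%\?(\w+)[ \t]*([^%\n]*)')

def special_comments(line):
    """Extract every (key, value) pair of %?key value comments on one line."""
    return [(key, value.rstrip()) for key, value in SPECIAL.findall(line)]
```

An equivalent ANTLR lexer rule would similarly need its value loop to exclude `%`, so that the second `%?default` comment starts a fresh token instead of being swallowed by the first.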

Token recognition error: antlr

Submitted by 强颜欢笑 on 2019-12-10 15:25:39
Question: I have an ANTLR 4 grammar:

grammar Test;

start : NonZeroDigit '.' Digit Digit? EOF ;

DOT : '.' ;
PLUS : '+' ;
MINUS : '-' ;
COLON : ':' ;
COMMA : ',' ;
QUOTE : '\"' ;
EQUALS : '=' ;
SEMICOLON : ';' ;
UNDERLINE : '_' ;
BACKSLASH : '\\' ;
SINGLEQUOTE : '\'' ;
RESULT_TYPE_NONE : 'NONE' ;
RESULT_TYPE_RESULT : 'RESULT' ;
RESULT_TYPE_RESULT_SET : 'RESULT_SET' ;
TYPE_INT : 'Int' ;
TYPE_LONG : 'Long' ;
TYPE_BOOL : 'Bool' ;
TYPE_DATE : 'Date' ;
TYPE_DOUBLE : 'Double' ;
TYPE_STRING : 'String' ;
TYPE

lexers / parsers for (un) structured text documents [closed]

Submitted by 强颜欢笑 on 2019-12-10 13:11:10
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 2 years ago. There are lots of parsers and lexers for scripts (i.e. structured computer languages), but I'm looking for one that can break an (almost) unstructured text document into larger sections, e.g. chapters, paragraphs, etc. It's relatively easy for a person to identify them: where the Table of Contents,
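
For document segmentation of this kind, the usual starting point is not a token lexer but a line-oriented pass that treats heading-shaped lines as boundaries. A minimal sketch, assuming headings look like "Chapter N" or "N. Title" (that pattern is an assumption; real documents need their own heuristics, and the function name `split_sections` is mine):

```python
import re

# Treat lines shaped like "Chapter 3" or "3. Introduction" as section
# boundaries; everything up to the next such line is that section's body.
HEADING = re.compile(r'^(Chapter \d+.*|\d+\.\s+\S.*)$', re.MULTILINE)

def split_sections(text):
    """Return a list of (heading, body) pairs for a plain-text document."""
    sections = []
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(0), text[m.end():end].strip()))
    return sections
```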

How do I implement a lexer given that I have already implemented a basic regular expression matcher?

Submitted by 点点圈 on 2019-12-09 06:34:54
Question: I'm trying to implement a lexer for fun. I have already implemented a basic regular expression matcher (by first converting a pattern to an NFA and then to a DFA). Now I'm clueless about how to proceed. My lexer would take a list of tokens and their corresponding regexes. What is the general algorithm used to create a lexer out of this? I thought about OR-ing all the regexes together, but then I can't identify which specific token was matched. Even if I extend my regex module to return the pattern
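
The standard answer is maximal munch: at each input position, run every rule's matcher, take the longest match, and break length ties by rule order. A sketch using Python's `re` as a stand-in for the asker's own DFA matcher (the rule set and names here are invented for illustration):

```python
import re

# (token name, pattern) pairs -- declaration order is the tie-break priority.
RULES = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[a-zA-Z_]\w*'),
    ('OP',     r'[+\-*/=]'),
    ('WS',     r'\s+'),
]

def tokenize(text):
    compiled = [(name, re.compile(pat)) for name, pat in RULES]
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, rx in compiled:
            m = rx.match(text, pos)
            # Strictly longer wins; an equal-length match keeps the earlier rule.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError('no rule matches at position %d' % pos)
        name, m = best
        if name != 'WS':              # skip whitespace tokens
            tokens.append((name, m.group(0)))
        pos = m.end()
    return tokens
```

With a real DFA engine one would instead build a single combined automaton whose accepting states remember which rule they came from, but the longest-match/rule-order policy is the same.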

Using ANTLR Parser and Lexer Separately

Submitted by 若如初见. on 2019-12-09 03:08:00
Question: I used ANTLR version 4 to create a compiler. The first phase was the lexer part. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine. CompilerLexer.g4:

lexer grammar CompilerLexer;

INT     : 'int' ;    //1
FLOAT   : 'float' ;  //2
BEGIN   : 'begin' ;  //3
END     : 'end' ;    //4
To      : 'to' ;     //5
NEXT    : 'next' ;   //6
REAL    : 'real' ;   //7
BOOLEAN : 'bool' ;   //8
. . .
NOTEQUAL : '!=' ;    //46
AND      : '&&' ;    //47
OR       : '||' ;    //48
POW      : '^' ;     //49
ID       : [a-zA-Z]+ ;              //50
WS       : ' ' -> channel(HIDDEN)   //50

How does ANTLR decide which lexer rule to apply? The longest matching lexer rule wins?

Submitted by 孤街醉人 on 2019-12-08 04:32:09
Question: The input content: The grammar:

grammar test;
p : EOF;
Char : [a-z];
fragment Tab : '\t';
fragment Space : ' ';
T1 : (Tab|Space)+ ->skip;
T2 : '#' T1+ Char+;

The matching result is this:

[@0,0:6='# abc',<T2>,1:0]    <<<<<<<< PLACE 1
[@1,7:6='<EOF>',<EOF>,1:7]
line 1:0 extraneous input '# abc' expecting <EOF>

Please ignore the error in the last line. I am wondering why the token matched at PLACE 1 is T2. In the grammar file, the T2 lexer rule comes after the T1 lexer rule. So I expect the T1 rule
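
ANTLR's lexer picks the rule that matches the longest input from the current position; declaration order only breaks exact-length ties. Starting at `#`, T1 cannot match at all, while T2 consumes the whole `# abc`, so T2 wins regardless of order. That decision can be mirrored in miniature (a Python sketch, with the two patterns transliterated from the question's grammar):

```python
import re

T1 = re.compile(r'[\t ]+')         # the skip rule, declared first
T2 = re.compile(r'#[\t ]+[a-z]+')  # declared second

def winning_rule(text):
    """Return (rule name, lexeme) the way an ANTLR lexer would decide."""
    m1, m2 = T1.match(text), T2.match(text)
    len1 = m1.end() if m1 else -1
    len2 = m2.end() if m2 else -1
    if len1 >= len2:               # equal length: the earlier rule, T1, wins
        return ('T1', m1.group(0)) if m1 else None
    return ('T2', m2.group(0))
```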

RegEx with variable data in it - ply.lex

Submitted by 此生再无相见时 on 2019-12-07 12:35:34
Question: I'm using the Python module ply.lex to write a lexer. I have some of my tokens specified with regular expressions, but now I'm stuck. I have a list of keywords that should be a token. data is a list of about 1000 keywords, all of which should be recognized as one sort of keyword. These can be, for example: _Function1, _UDFType2, and so on. All words in the list are separated by whitespace, that's it. I just want the lexer to recognize the words in this list, so that it would return a token of type
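
Rather than generating one giant alternation of 1000 patterns, the usual approach is to match every word with a single identifier rule and then reclassify it against the keyword set, the same idiom the ply documentation recommends for reserved words. A sketch of that lookup with plain `re` (the two sample keywords come from the question; `token_type` is my own name):

```python
import re

# Stand-in for the real 1000-entry list loaded from `data`.
KEYWORDS = {'_Function1', '_UDFType2'}

IDENT = re.compile(r'[A-Za-z_]\w*')

def token_type(word):
    """Classify one matched word as a KEYWORD or a plain ID."""
    if IDENT.fullmatch(word) is None:
        raise ValueError('not an identifier: %r' % word)
    return 'KEYWORD' if word in KEYWORDS else 'ID'
```

In ply itself this lookup would sit inside the `t_ID` token function, reassigning `t.type` before returning the token.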

ANTLR: how to parse a region within matching brackets with a lexer

Submitted by 别来无恙 on 2019-12-07 09:00:42
Question: I want to parse something like this in my lexer: ( begin expression ), where expressions are also surrounded by brackets. It isn't important what is in the expression; I just want to have everything between the "(begin" and the matching ")" as a token. An example would be: (begin (define x (+ 1 2))), so the text of the token should be (define x (+ 1 2)). Something like PROGRAM : LPAREN BEGIN .* RPAREN; does (obviously) not work, because as soon as it sees a ")", it thinks the rule is over, but I
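
A single regular-expression rule cannot count nesting, but a short hand-rolled scan can: track a depth counter and stop when the opener's matching `)` brings it back to zero. This Python sketch mirrors what an ANTLR lexer action or a pre-pass would do (the function name is mine):

```python
def begin_block(src):
    """Return the text between '(begin' and its matching ')'."""
    start = src.index('(begin') + len('(begin')
    depth = 1                      # we are inside the "(begin" opener
    for i in range(start, len(src)):
        if src[i] == '(':
            depth += 1
        elif src[i] == ')':
            depth -= 1
            if depth == 0:         # the opener just closed
                return src[start:i].strip()
    raise ValueError('unbalanced parentheses')
```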

How does lexer lookahead work with greedy and non-greedy matching in ANTLR3 and ANTLR4?

Submitted by 五迷三道 on 2019-12-06 12:49:41
Question: I'd be more than glad if someone could clear up my confusion about how lookahead relates to tokenizing with greedy/non-greedy matching. Be aware this is a slightly long post, because it follows my thought process. I'm trying to write an ANTLR 3 grammar that allows me to match input such as: "identifierkeyword". I came up with a grammar like so in ANTLR 3.4:

KEYWORD : 'keyword' ;
IDENTIFIER : (options {greedy=false;}: (LOWCHAR|HIGHCHAR))+ ;

/** lowercase letters */
fragment
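
The greedy/non-greedy distinction is easiest to see outside ANTLR first. In this Python `re` sketch (patterns invented for illustration), a greedy identifier loop swallows the trailing "keyword", while a lazy one gives characters back as soon as the rest of the pattern can match, which is the behavior `greedy=false` is reaching for:

```python
import re

# Greedy: [a-z]+ consumes everything, so the optional 'keyword' group
# is left unmatched. Non-greedy: [a-z]+? stops as early as possible,
# letting 'keyword' match the tail.
greedy     = re.compile(r'(?P<id>[a-z]+)(?P<kw>keyword)?$')
non_greedy = re.compile(r'(?P<id>[a-z]+?)(?P<kw>keyword)$')

m  = greedy.match('identifierkeyword')      # id swallows the whole input
m2 = non_greedy.match('identifierkeyword')  # id stops before 'keyword'
```

An ANTLR lexer adds a wrinkle on top of this: rules match independently with maximal munch, so a separate KEYWORD rule never fires mid-identifier the way a regex backreference would.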

How to manually construct an AST?

Submitted by 牧云@^-^@ on 2019-12-05 20:45:59
Question: I'm currently learning about parsing, but I'm a bit confused about how to generate an AST. I have written a parser that correctly verifies whether an expression conforms to a grammar (it is silent when the expression conforms and raises an exception when it does not). Where do I go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on then going on to build the AST. My current code (written in very simple Ruby, and including a lexer and a parser)
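
The usual move is small: each grammar rule's parse function returns a node instead of merely consuming tokens, and the callers assemble those returns into a tree. A minimal sketch (in Python rather than the asker's Ruby; the toy grammar `expr := NUM (('+'|'-') NUM)*` and tuple-shaped nodes are assumptions for illustration):

```python
def tokenize(src):
    """Toy lexer: tokens are whitespace-separated."""
    return src.split()

def parse_expr(tokens):
    """Return a nested-tuple AST such as ('+', ('-', 1, 2), 3)."""
    pos = [0]
    def next_tok():
        t = tokens[pos[0]]
        pos[0] += 1
        return t
    node = int(next_tok())                  # leaf node for the first NUM
    while pos[0] < len(tokens):
        op = next_tok()
        if op not in ('+', '-'):
            raise SyntaxError('expected + or -, got %r' % op)
        node = (op, node, int(next_tok()))  # fold into a left-associative tree
    return node
```

The validating parser already has the right call structure; the only change is that every recognition step now produces a value.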