lexer

Recognize multiple line comments within a single line with ANTLR4

Submitted by 北城余情 on 2019-12-10 21:52:04
Question: I want to parse PostScript code with ANTLR4. I have finished the grammar, but one particular language extension (which was introduced by someone else) is troublesome to recognize. A short example:

1: % This is a line comment
2: % The next line just pushes the value 10 onto the stack
3: 10
4:
5: %?description This is the special line-comment in question
6: /procedure {
7:   /var1 30 def %This just creates a variable
8:   /var2 10 def %?description A description associated with var2 %?default 20
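
The crux is that a single source line can carry several `%?key value` comments back to back. As an illustration only (a Python sketch, not the asker's ANTLR grammar), one regex that anchors on the `%?` prefix and stops a value at the next `%` can pull every special comment out of a line; the function name `special_comments` is my own:

```python
import re

# A %?key value comment: literal "%?", a key, then a value that runs
# until the next "%" or end of line. Plain "%" comments never match
# because they lack the "?".
SPECIAL = re.compile(r'%\?(\w+)[ \t]*([^%\n]*)')

def special_comments(line):
    """Extract every (key, value) pair of %?key value comments on one line."""
    return [(key, value.rstrip()) for key, value in SPECIAL.findall(line)]
```

An equivalent ANTLR lexer rule would similarly need its value loop to exclude `%`, so that the second `%?default` comment starts a fresh token instead of being swallowed by the first.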

Token recognition error: antlr

Submitted by 强颜欢笑 on 2019-12-10 15:25:39
Question: I have an ANTLR 4 grammar:

grammar Test;

start : NonZeroDigit '.' Digit Digit? EOF ;

DOT : '.' ;
PLUS : '+' ;
MINUS : '-' ;
COLON : ':' ;
COMMA : ',' ;
QUOTE : '\"' ;
EQUALS : '=' ;
SEMICOLON : ';' ;
UNDERLINE : '_' ;
BACKSLASH : '\\' ;
SINGLEQUOTE : '\'' ;
RESULT_TYPE_NONE : 'NONE' ;
RESULT_TYPE_RESULT : 'RESULT' ;
RESULT_TYPE_RESULT_SET : 'RESULT_SET' ;
TYPE_INT : 'Int' ;
TYPE_LONG : 'Long' ;
TYPE_BOOL : 'Bool' ;
TYPE_DATE : 'Date' ;
TYPE_DOUBLE : 'Double' ;
TYPE_STRING : 'String' ;
TYPE

lexers / parsers for (un) structured text documents [closed]

Submitted by 强颜欢笑 on 2019-12-10 13:11:10
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 2 years ago. There are lots of parsers and lexers for scripts (i.e. structured computer languages), but I'm looking for one that can break an (almost) unstructured text document into larger sections, e.g. chapters, paragraphs, etc. It's relatively easy for a person to identify them: where the Table of Contents,
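
For document segmentation of this kind, the usual starting point is not a token lexer but a line-oriented pass that treats heading-shaped lines as boundaries. A minimal sketch, assuming headings look like "Chapter N" or "N. Title" (that pattern is an assumption; real documents need their own heuristics, and the function name `split_sections` is mine):

```python
import re

# Treat lines shaped like "Chapter 3" or "3. Introduction" as section
# boundaries; everything up to the next such line is that section's body.
HEADING = re.compile(r'^(Chapter \d+.*|\d+\.\s+\S.*)$', re.MULTILINE)

def split_sections(text):
    """Return a list of (heading, body) pairs for a plain-text document."""
    sections = []
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(0), text[m.end():end].strip()))
    return sections
```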

How do I implement a lexer given that I have already implemented a basic regular expression matcher?

Submitted by 点点圈 on 2019-12-09 06:34:54
Question: I'm trying to implement a lexer for fun. I have already implemented a basic regular expression matcher (by first converting a pattern to an NFA and then to a DFA). Now I'm clueless about how to proceed. My lexer would take a list of tokens and their corresponding regexes. What is the general algorithm used to create a lexer out of this? I thought about OR-ing all the regexes together, but then I can't identify which specific token was matched. Even if I extend my regex module to return the pattern
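
The standard answer is maximal munch: at each input position, run every rule's matcher, take the longest match, and break length ties by rule order. A sketch using Python's `re` as a stand-in for the asker's own DFA matcher (the rule set and names here are invented for illustration):

```python
import re

# (token name, pattern) pairs -- declaration order is the tie-break priority.
RULES = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[a-zA-Z_]\w*'),
    ('OP',     r'[+\-*/=]'),
    ('WS',     r'\s+'),
]

def tokenize(text):
    compiled = [(name, re.compile(pat)) for name, pat in RULES]
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, rx in compiled:
            m = rx.match(text, pos)
            # Strictly longer wins; an equal-length match keeps the earlier rule.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError('no rule matches at position %d' % pos)
        name, m = best
        if name != 'WS':              # skip whitespace tokens
            tokens.append((name, m.group(0)))
        pos = m.end()
    return tokens
```

With a real DFA engine one would instead build a single combined automaton whose accepting states remember which rule they came from, but the longest-match/rule-order policy is the same.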

Using ANTLR Parser and Lexer Separately

Submitted by 若如初见. on 2019-12-09 03:08:00
Question: I used ANTLR version 4 to create a compiler. The first phase was the lexer part. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine. CompilerLexer.g4:

lexer grammar CompilerLexer;

INT     : 'int' ;    //1
FLOAT   : 'float' ;  //2
BEGIN   : 'begin' ;  //3
END     : 'end' ;    //4
To      : 'to' ;     //5
NEXT    : 'next' ;   //6
REAL    : 'real' ;   //7
BOOLEAN : 'bool' ;   //8
. . .
NOTEQUAL : '!=' ;    //46
AND      : '&&' ;    //47
OR       : '||' ;    //48
POW      : '^' ;     //49
ID       : [a-zA-Z]+ ;              //50
WS       : ' ' -> channel(HIDDEN)   //50

How does ANTLR decide which lexer rule to apply? The longest matching lexer rule wins?

Submitted by 孤街醉人 on 2019-12-08 04:32:09
Question: The input content: The grammar:

grammar test;
p : EOF;
Char : [a-z];
fragment Tab : '\t';
fragment Space : ' ';
T1 : (Tab|Space)+ ->skip;
T2 : '#' T1+ Char+;

The matching result is this:

[@0,0:6='# abc',<T2>,1:0]    <<<<<<<< PLACE 1
[@1,7:6='<EOF>',<EOF>,1:7]
line 1:0 extraneous input '# abc' expecting <EOF>

Please ignore the error in the last line. I am wondering why the token matched at PLACE 1 is T2. In the grammar file, the T2 lexer rule comes after the T1 lexer rule. So I expect the T1 rule
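
ANTLR's lexer picks the rule that matches the longest input from the current position; declaration order only breaks exact-length ties. Starting at `#`, T1 cannot match at all, while T2 consumes the whole `# abc`, so T2 wins regardless of order. That decision can be mirrored in miniature (a Python sketch, with the two patterns transliterated from the question's grammar):

```python
import re

T1 = re.compile(r'[\t ]+')         # the skip rule, declared first
T2 = re.compile(r'#[\t ]+[a-z]+')  # declared second

def winning_rule(text):
    """Return (rule name, lexeme) the way an ANTLR lexer would decide."""
    m1, m2 = T1.match(text), T2.match(text)
    len1 = m1.end() if m1 else -1
    len2 = m2.end() if m2 else -1
    if len1 >= len2:               # equal length: the earlier rule, T1, wins
        return ('T1', m1.group(0)) if m1 else None
    return ('T2', m2.group(0))
```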

RegEx with variable data in it - ply.lex

Submitted by 此生再无相见时 on 2019-12-07 12:35:34
Question: I'm using the Python module ply.lex to write a lexer. I have some of my tokens specified with regular expressions, but now I'm stuck. I have a list of keywords that should be a token. data is a list of about 1000 keywords, all of which should be recognized as one sort of keyword. These can be, for example: _Function1, _UDFType2, and so on. All words in the list are separated by whitespace, that's it. I just want the lexer to recognize the words in this list, so that it would return a token of type
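
Rather than generating one giant alternation of 1000 patterns, the usual approach is to match every word with a single identifier rule and then reclassify it against the keyword set, the same idiom the ply documentation recommends for reserved words. A sketch of that lookup with plain `re` (the two sample keywords come from the question; `token_type` is my own name):

```python
import re

# Stand-in for the real 1000-entry list loaded from `data`.
KEYWORDS = {'_Function1', '_UDFType2'}

IDENT = re.compile(r'[A-Za-z_]\w*')

def token_type(word):
    """Classify one matched word as a KEYWORD or a plain ID."""
    if IDENT.fullmatch(word) is None:
        raise ValueError('not an identifier: %r' % word)
    return 'KEYWORD' if word in KEYWORDS else 'ID'
```

In ply itself this lookup would sit inside the `t_ID` token function, reassigning `t.type` before returning the token.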

ANTLR: how to parse a region within matching brackets with a lexer

Submitted by 别来无恙 on 2019-12-07 09:00:42
Question: I want to parse something like this in my lexer: ( begin expression ), where expressions are also surrounded by brackets. It isn't important what is in the expression; I just want to have everything between the "(begin" and the matching ")" as a token. An example would be: (begin (define x (+ 1 2))), so the text of the token should be (define x (+ 1 2)). Something like PROGRAM : LPAREN BEGIN .* RPAREN; does (obviously) not work, because as soon as it sees a ")", it thinks the rule is over, but I
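
A single regular-expression rule cannot count nesting, but a short hand-rolled scan can: track a depth counter and stop when the opener's matching `)` brings it back to zero. This Python sketch mirrors what an ANTLR lexer action or a pre-pass would do (the function name is mine):

```python
def begin_block(src):
    """Return the text between '(begin' and its matching ')'."""
    start = src.index('(begin') + len('(begin')
    depth = 1                      # we are inside the "(begin" opener
    for i in range(start, len(src)):
        if src[i] == '(':
            depth += 1
        elif src[i] == ')':
            depth -= 1
            if depth == 0:         # the opener just closed
                return src[start:i].strip()
    raise ValueError('unbalanced parentheses')
```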

How does lexer lookahead work with greedy and non-greedy matching in ANTLR3 and ANTLR4?

Submitted by 五迷三道 on 2019-12-06 12:49:41
Question: I'd be more than glad if someone could clear up my confusion about how lookahead relates to tokenizing with greedy/non-greedy matching. Be aware this is a slightly long post, because it follows my thought process. I'm trying to write an ANTLR 3 grammar that allows me to match input such as: "identifierkeyword". I came up with a grammar like so in ANTLR 3.4:

KEYWORD : 'keyword' ;
IDENTIFIER : (options {greedy=false;}: (LOWCHAR|HIGHCHAR))+ ;

/** lowercase letters */
fragment
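
The greedy/non-greedy distinction is easiest to see outside ANTLR first. In this Python `re` sketch (patterns invented for illustration), a greedy identifier loop swallows the trailing "keyword", while a lazy one gives characters back as soon as the rest of the pattern can match, which is the behavior `greedy=false` is reaching for:

```python
import re

# Greedy: [a-z]+ consumes everything, so the optional 'keyword' group
# is left unmatched. Non-greedy: [a-z]+? stops as early as possible,
# letting 'keyword' match the tail.
greedy     = re.compile(r'(?P<id>[a-z]+)(?P<kw>keyword)?$')
non_greedy = re.compile(r'(?P<id>[a-z]+?)(?P<kw>keyword)$')

m  = greedy.match('identifierkeyword')      # id swallows the whole input
m2 = non_greedy.match('identifierkeyword')  # id stops before 'keyword'
```

An ANTLR lexer adds a wrinkle on top of this: rules match independently with maximal munch, so a separate KEYWORD rule never fires mid-identifier the way a regex backreference would.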

How to manually construct an AST?

Submitted by 牧云@^-^@ on 2019-12-05 20:45:59
Question: I'm currently learning about parsing, but I'm a bit confused about how to generate an AST. I have written a parser that correctly verifies whether an expression conforms to a grammar (it is silent when the expression conforms and raises an exception when it does not). Where do I go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on then going on to build the AST. My current code (written in very simple Ruby, and including a lexer and a parser)
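
The usual move is small: each grammar rule's parse function returns a node instead of merely consuming tokens, and the callers assemble those returns into a tree. A minimal sketch (in Python rather than the asker's Ruby; the toy grammar `expr := NUM (('+'|'-') NUM)*` and tuple-shaped nodes are assumptions for illustration):

```python
def tokenize(src):
    """Toy lexer: tokens are whitespace-separated."""
    return src.split()

def parse_expr(tokens):
    """Return a nested-tuple AST such as ('+', ('-', 1, 2), 3)."""
    pos = [0]
    def next_tok():
        t = tokens[pos[0]]
        pos[0] += 1
        return t
    node = int(next_tok())                  # leaf node for the first NUM
    while pos[0] < len(tokens):
        op = next_tok()
        if op not in ('+', '-'):
            raise SyntaxError('expected + or -, got %r' % op)
        node = (op, node, int(next_tok()))  # fold into a left-associative tree
    return node
```

The validating parser already has the right call structure; the only change is that every recognition step now produces a value.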