lexer | 易学教程

How to resolve conflict between two choices starting with same tokens in javacc

Question: I'm trying to write a compiler for a specific message format. Simplified, my problem is:

    < WORD : (<LETTER>){2,5} >
    < ANOTHER_WORD : (<LETTER>|<DIGIT>){1,5} >
    < SPECIAL_WORD : "START" >

    void grammar() :
    {}
    {
        <WORD> <ANOTHER_WORD>
      | <SPECIAL_WORD> <ANOTHER_WORD>
    }

Here my special word is always matched as a WORD, which is logical, but since the conflict is at the beginning of the production I don't know how to resolve it. Some help would be appreciated.

Answer 1: Put the rule for
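For context, JavaCC resolves overlapping tokens by taking the longest possible match and, when several regular expressions match the same longest prefix, using the one that comes first in the grammar file. The Python sketch below models only that resolution rule; the token table is a hypothetical stand-in for the question's declarations, not JavaCC code. Because "START" matches all three rules with length 5, declaration order alone decides which token it becomes.

```python
import re

# Hypothetical token table mirroring the question's rules, in declaration order.
# Resolution as in JavaCC: longest match wins; ties go to the earliest rule.
RULES = [
    ("SPECIAL_WORD", re.compile(r"START")),
    ("WORD", re.compile(r"[A-Za-z]{2,5}")),
    ("ANOTHER_WORD", re.compile(r"[A-Za-z0-9]{1,5}")),
]

def tokenize(text, rules=RULES):
    """Return (kind, lexeme) pairs, skipping whitespace between tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_kind, best_len = None, 0
        for kind, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly greater: a later rule of equal length never replaces an earlier one.
            if m and len(m.group()) > best_len:
                best_kind, best_len = kind, len(m.group())
        if best_kind is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With `SPECIAL_WORD` listed first, `tokenize("START x1")` yields `SPECIAL_WORD` then `ANOTHER_WORD`; moving `WORD` ahead of it turns `START` back into a `WORD`. The same rule also makes a purely alphabetic second word such as `abc` come out as `WORD` rather than `ANOTHER_WORD`, since `WORD` is declared earlier.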

Why is this function not breaking up this input string?

Question: I'm trying to break up a string into "symbols" with C++ for further work. I haven't written anything in C++ for a long while, so forgive me if there is something inherently wrong with this code. The purpose of the symbolize() function below is to break up a string such as "5+5" into a vector of strings, e.g. {"5","+","5"}. It's not working. If you think the code is too messy, please suggest a way to simplify it. Here's my code so far:

    #include <iostream>
    #include <string>
    #include <vector>
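The C++ code is cut off, so its bug cannot be pinpointed here. As a reference for the intended behaviour, this is a Python sketch (not the asker's C++) of the same idea: collect consecutive digits into one symbol and emit every other non-space character as a one-character symbol. A frequent pitfall in hand-written loops like this is forgetting to flush the last pending number after the loop ends.

```python
def symbolize(expr):
    """Split an arithmetic string into number and single-character operator symbols."""
    symbols = []
    current = ""  # digits collected for the number currently being read
    for ch in expr:
        if ch.isdigit():
            current += ch
            continue
        # Any non-digit ends the current number.
        if current:
            symbols.append(current)
            current = ""
        if ch.isspace():
            continue
        symbols.append(ch)  # every other character is a one-character symbol
    if current:  # flush a number that runs to the end of the input
        symbols.append(current)
    return symbols
```

For example, `symbolize("12 * (3-4)")` gives `["12", "*", "(", "3", "-", "4", ")"]`.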

Lex program for counting the number of comment lines

Question: This program counts the number of comment lines, i.e. single-line comments and multi-line comments, and outputs the total number of comments, using file.txt as input. file.txt: //hellow world /*hello world1*/ /*hello world2 */ /*hello world3 hello world3.1*/ #include<>

count.l:

    %{
    #include<stdio.h>
    #include<stdlib.h>
    int a=0,b=0,c=0,d;
    %}
    %%
    "//".*      {a++;}
    "/*"        {b++;}
    .*"*/"      {b--;c++;}
    %%
    void main(int argc,char *argv[]){
        yyin=fopen(argv[1],"r");
        yylex();
        printf("single line %d \nmultiline %d \n",a,c);
        d=a+c;
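As a cross-check for the counts the lex program should print, here is a Python sketch (not a lex program) that scans for both comment forms with one alternation. Like a lex scanner it moves left to right and never matches inside text already consumed, so a `//` inside a block comment is not counted separately. The line layout of the test input is an assumption, since file.txt lost its line breaks above; string literals containing `//` or `/*` are not handled.

```python
import re

# '//' runs to end of line; '/* ... */' may span lines thanks to DOTALL.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def count_comments(source):
    """Return (single_line, block) comment counts for the given source text."""
    single = block = 0
    for m in COMMENT.finditer(source):
        if m.group().startswith("//"):
            single += 1
        else:
            block += 1
    return single, block
```

With one comment per line (and the last block comment split across two lines), the sample gives 1 single-line and 3 block comments, for a total of 4.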

I don't understand how to use the lexeme function

Question: From Text.Parsec.Token:

    lexeme p = do { x <- p; whiteSpace; return x }

It appears that lexeme takes a parser p and delivers a parser that has the same behavior as p, except that it also skips all the trailing whitespace. Correct? Then why doesn't the following work?

    constant :: Parser Int
    constant = do
      digits <- many1 digit
      return (read digits)

    lexConst :: Parser Int
    lexConst = lexeme constant

The last line results in the following error message: Couldn't match expected type `ParsecT
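The question's reading of `lexeme` matches its definition: run `p`, skip trailing whitespace, return `p`'s result. The hand-rolled Python parser-combinator sketch below makes that behaviour concrete; `many1_digit`, `white_space`, `lexeme`, and `lex_const` are illustrative stand-ins, not Parsec's API. One thing worth checking in the Haskell code, offered as a likely rather than confirmed cause since the error message is cut off: in Text.Parsec.Token, `lexeme` is a field of the `GenTokenParser` record, so it is normally applied as `lexeme lexer constant`, where `lexer` is built with `makeTokenParser`, rather than directly to a parser.

```python
# In this sketch a parser is a function (text, pos) -> (value, new_pos), or None on failure.

def many1_digit(text, pos):
    """Match one or more digits."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None
    return text[pos:end], end

def white_space(text, pos):
    """Skip zero or more whitespace characters; never fails."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return None, pos

def lexeme(p):
    """Build a parser that runs p, then skips trailing whitespace, returning p's value."""
    def parser(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, pos = result
        _, pos = white_space(text, pos)
        return value, pos
    return parser

def constant(text, pos):
    """Parse a run of digits as an int (the question's `constant`)."""
    result = many1_digit(text, pos)
    if result is None:
        return None
    digits, pos = result
    return int(digits), pos

lex_const = lexeme(constant)
```

`lex_const("42   +", 0)` returns `(42, 5)`: the value from `constant` plus a position already past the spaces.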

Basic problem with yacc/lex

Question: I have some problems with a very simple yacc/lex program. I may have forgotten some basic steps (it's been a long time since I've used these tools). In my lex program I define some basic rules like:

    word    [a-zA-Z][a-zA-Z]*
    %%
    ":"     return(PV);
    {word}  { yylval = yytext; printf("yylval = %s\n",yylval); return(WORD); }
    "\n"    return(ENDLINE);

In my yacc program the beginning of my grammar is (where TranslationUnit is my %start):

    TranslationUnit: /* Nothing */
                   | InfoBlock Data
                   ;
    InfoBlock: /*

ANTLR4: Unrecognized constant value in a lexer command

Question: I am learning how to use the "more" lexer command. I typed in the lexer grammar shown in the ANTLR book, page 281:

    lexer grammar Lexer_To_Test_More_Command ;
    LQUOTE : '"' -> more, mode(STR) ;
    WS : [ \t\r\n]+ -> skip ;
    mode STR ;
    STRING : '"' -> mode(DEFAULT_MODE) ;
    TEXT : . -> more ;

Then I created this simple parser to use the lexer:

    grammar Parser_To_Test_More_Command ;
    import Lexer_To_Test_More_Command ;
    test: STRING EOF ;

Then I opened a DOS window and entered this command: antlr4 Parser
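In ANTLR 4, the `more` command makes the lexer match the rule but keep going, so the text matched so far becomes part of the next token that is actually emitted. The Python sketch below hand-simulates the question's two-mode grammar to show the effect: `LQUOTE` and each `TEXT` character only accumulate text, and `STRING` finally emits the whole quoted string. The function name `lex_with_more` is hypothetical; this models the behaviour and is not how ANTLR generates lexers.

```python
def lex_with_more(text):
    """Simulate LQUOTE/TEXT (-> more) and STRING across DEFAULT and STR modes."""
    tokens = []
    mode = "DEFAULT"
    pending = ""  # text accumulated by 'more' actions, not yet emitted
    for ch in text:
        if mode == "DEFAULT":
            if ch in " \t\r\n":
                continue                # WS -> skip
            if ch == '"':
                pending = ch            # LQUOTE -> more, mode(STR)
                mode = "STR"
                continue
            raise ValueError(f"unexpected character {ch!r}")
        pending += ch
        if ch == '"':
            # STRING -> mode(DEFAULT_MODE): emit everything accumulated so far.
            tokens.append(("STRING", pending))
            pending = ""
            mode = "DEFAULT"
        # Otherwise TEXT -> more: keep accumulating.
    if mode == "STR":
        raise ValueError("unterminated string")
    return tokens
```

Given ` "hello world" `, the only token produced is `STRING` with text `"hello world"`, quotes included, which is what the parser rule `test: STRING EOF` expects.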

ANTLR behaviour with conflicting tokens

Question: How is ANTLR lexer behavior defined in the case of conflicting tokens? Let me explain what I mean by "conflicting" tokens. For example, assume that the following is defined:

    INT_STAGE : '1'..'6';
    INT : '0'..'9'+;

There is a conflict here, because after reading a sequence of digits, the lexer would not know whether there is one INT or many INT_STAGE tokens (or different combinations of both). After a test, it appears that if INT is defined after INT_STAGE, the lexer would prefer to find INT
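ANTLR 4 resolves such conflicts by preferring the longest match and, among rules that match the same length, the rule defined first. The question does not say which ANTLR version is used, so the Python sketch below models that ANTLR 4 policy only; writing the two rules as Python regular expressions is an assumption of the sketch. It reports which rule wins for the token at the start of the input, under both rule orders.

```python
import re

def winning_rule(text, rules):
    """Return (rule_name, lexeme) for the token starting at text[0].

    rules is a list of (name, regex) in grammar order. A later rule only wins
    if it matches strictly more characters than every earlier rule.
    """
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

STAGE_FIRST = [("INT_STAGE", r"[1-6]"), ("INT", r"[0-9]+")]
INT_FIRST = [("INT", r"[0-9]+"), ("INT_STAGE", r"[1-6]")]
```

Under this policy rule order only matters for a single digit from 1 to 6; any longer digit run such as `123` is an `INT` in either order because it is the longer match.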

Parser for xml DTD file

Question: I am quite new to implementing parsers, and I am trying to parse an XML DTD file to generate a context-free grammar for it. I tried pyparsing and yacc but still couldn't get any result, so I would appreciate it if someone could provide some tips or sample code for writing such a parser. The following is a sample DTD file:

    <!DOCTYPE PcSpecs [
    <!ELEMENT PCS (PC*)>
    <!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>
    <!ELEMENT MODEL (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ELEMENT PROCESSOR (MANF, MODEL,
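Because DTD element declarations have a simple, regular shape, one possible starting point (a sketch, not a full DTD parser) is to pull out each `<!ELEMENT name model>` with a regular expression and turn flat sequence models into grammar-style productions. `element_productions` and `to_production` are hypothetical helper names; ATTLIST, ENTITY, comments, and nested groups such as `(A, (B, C)*)` are not handled.

```python
import re

# <!ELEMENT name content-model>; content models contain no '>', so a lazy match suffices.
ELEMENT = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.*?)\s*>", re.DOTALL)

def element_productions(dtd):
    """Map each declared element name to its raw content model string."""
    return {name: model for name, model in ELEMENT.findall(dtd)}

def to_production(name, model):
    """Render a flat sequence model like '(A, B+)' as 'name -> A B+'."""
    inner = model.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # Only top-level comma sequences are split; anything else is kept as written.
    parts = [part.strip() for part in inner.split(",")]
    return f"{name} -> {' '.join(parts)}"
```

Applied to the sample above, `PC` becomes `PC -> MODEL PRICE PROCESSOR RAM DISK+` and `PCS` becomes `PCS -> PC*`; the `*`, `+`, and `?` suffixes can then be expanded into ordinary recursive rules if a strict context-free grammar is needed.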

Antlr (lexer): matching the right token

Question: In my Antlr3 grammar, I have several "overlapping" lexer rules, like this:

    NAT: ('0' .. '9')+ ;
    INT: ('+' | '-')? ('0' .. '9')+ ;
    BITVECTOR: ('0' | '1')* ;

Although tokens like 100110 and 123 can be matched by more than one of those rules, context always determines which of them applies. Example:

    s: a | b | c ;
    a: '<' NAT '>' ;
    b: '{' INT '}' ;
    c: '[' BITVECTOR ']' ;

The input {17} should then match {, INT, and }, but the lexer has already decided that 17 is a NAT token. How
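The lexer runs before the parser and cannot see which bracket a number sits in. One common workaround is to let the lexer emit a single numeric token and have the parser (or a check after parsing) decide whether the text is a valid NAT, INT, or BITVECTOR for its context. The Python sketch below models that split for the question's three bracket forms; the token pattern and the `CHECKS` table are assumptions of the sketch, and empty bit vectors (allowed by `('0' | '1')*`) are not handled.

```python
import re

# One lexer rule for every numeric literal, so the lexer never has to guess.
TOKEN = re.compile(r"[<>{}\[\]]|[+-]?[0-9]+")

def lex(text):
    """Split text into bracket and number tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

# Hypothetical per-context validation applied once the brackets are known.
CHECKS = {
    ("<", ">"): ("NAT", re.compile(r"[0-9]+")),
    ("{", "}"): ("INT", re.compile(r"[+-]?[0-9]+")),
    ("[", "]"): ("BITVECTOR", re.compile(r"[01]+")),
}

def parse(text):
    """Parse one bracketed literal and classify it by its surrounding brackets."""
    tokens = lex(text)
    if len(tokens) != 3:
        raise ValueError("expected: open bracket, number, close bracket")
    open_, value, close = tokens
    kind, pattern = CHECKS.get((open_, close), (None, None))
    if kind is None or not pattern.fullmatch(value):
        raise ValueError(f"{value!r} is not valid between {open_} and {close}")
    return kind, value
```

Here `parse("{17}")` yields `("INT", "17")`, while `parse("[12]")` is rejected because `12` is not a bit vector.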

How can I check if first character of a line is “*” in ANTLR4?

Question: I am trying to write a parser for a relatively simple but idiosyncratic language. Simply put, one of the rules is that an asterisk denotes a comment line only if that asterisk is the first character of the line. How might I go about formalising such a rule in ANTLR4? I thought about using:

    START_LINE_COMMENT: '\n*' .*? '\n' -> skip;

But I am certain this won't work with more than one comment line in a row, as the newline at the end will be consumed as part of the START_LINE_COMMENT
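The key point is that the comment rule should stop before the newline, so the newline stays available and the next line can again start at column 0. In ANTLR 4 this is commonly expressed with a lexer semantic predicate on the column, for example `{getCharPositionInLine() == 0}?` with the Java target. The Python sketch below only models that scanning logic; `scan` and its token names are hypothetical.

```python
def scan(source):
    """Split source into COMMENT, TEXT and NEWLINE tokens.

    A line is a COMMENT only if its first character (column 0) is '*';
    otherwise the rest of the line is one TEXT token.
    """
    tokens = []
    pos = 0
    at_line_start = True
    while pos < len(source):
        if source[pos] == "\n":
            tokens.append(("NEWLINE", "\n"))
            pos += 1
            at_line_start = True
            continue
        # Take the rest of the current line, but stop *before* its newline.
        end = source.find("\n", pos)
        if end == -1:
            end = len(source)
        is_comment = at_line_start and source[pos] == "*"
        tokens.append(("COMMENT" if is_comment else "TEXT", source[pos:end]))
        pos = end
        at_line_start = False
    return tokens
```

Because the newline is its own token, two comment lines in a row both start at column 0 and are both recognised, while `x = a * b` or ` * indented` stay ordinary text.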