Antlrworks - extraneous input

Posted by 送分小仙女 on 2019-12-24 01:37:21

Question


I am new to this, so I need your help. I am trying to parse the Wikipedia dump, and my first step is to map each rule they define into ANTLR. Unfortunately, I have hit my first barrier:

line 1:8 extraneous input ''''' expecting '\'\''

I do not understand what is going on; any help would be appreciated.

My code:

grammar Test;

options {
    language = Java;
}

parse
    :  term+ EOF
    ;

term 
    :  IDENT
    |  '[[' term ']]'
    |  '\'\'' term '\'\''
    |  '\'\'\'' term '\'\'\''
    ;    

IDENT
    :  ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')*
    ;

Input '''''Hello World'''''


Answer 1:


A lexer rule must always match at least 1 character. Your rule:

IDENT : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')*;

can match the empty string, and every position in the input contains one, so the rule can succeed without consuming any characters. Change the * to a +:

IDENT : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')+;
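To see why the * version is a problem, here is a small sketch that uses java.util.regex as a stand-in for the two IDENT variants. It is only an analogy, not ANTLR's own matching code, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Regex analogues of the two IDENT variants (an illustration, not ANTLR itself).
    static final Pattern STAR = Pattern.compile("[a-zA-Z0-9=#\" ]*");
    static final Pattern PLUS = Pattern.compile("[a-zA-Z0-9=#\" ]+");

    // Length of the match anchored at the start of the input, or -1 if there is none.
    static int matchLengthAtStart(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "'''''Hello World'''''";
        // The * variant "succeeds" on a quote character by matching zero characters.
        System.out.println(matchLengthAtStart(STAR, input)); // 0
        // The + variant correctly reports that IDENT does not start here.
        System.out.println(matchLengthAtStart(PLUS, input)); // -1
    }
}
```

With the + variant, a rule only ever succeeds when it consumes at least one character.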

EDIT

Input '''''Hello World'''''

Although you put literal tokens inside parser rules ('\'\'\'', '\'\'', etc.), you must understand that they are not created at the behest of the parser. The lexer follows strict rules to create tokens:

  1. it tries to match as many characters as possible
  2. if two different lexer rules match the same number of characters, the one defined first takes precedence

Let's give your literal tokens a name:

BRACKET_OPEN  : '[[';
BRACKET_CLOSE : ']]';
Q3            : '\'\'\'';
Q2            : '\'\'';
IDENT         :  ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')+;

Now, because of rule #1 (match as much as possible), the input '''''Hello World''''' will be tokenized as follows:

  • Q3
  • Q2
  • IDENT
  • Q3 (yes, a Q3!)
  • Q2

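But your parser rule term requires nested quote markers to close in the reverse order in which they were opened: after Q3 Q2 IDENT it will only accept Q2 Q3. The lexer produced Q3 Q2 instead, so it is correct that your input fails to parse properly.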
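The tokenization above can be reproduced with a simplified model of the two lexer rules (longest match wins, and on a tie the rule defined first wins). This is a sketch built on java.util.regex, not ANTLR's generated lexer; the class name MaximalMunchDemo and the method tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaximalMunchDemo {
    // Rules in definition order; LinkedHashMap preserves that order.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BRACKET_OPEN", Pattern.compile("\\[\\["));
        RULES.put("BRACKET_CLOSE", Pattern.compile("\\]\\]"));
        RULES.put("Q3", Pattern.compile("'''"));
        RULES.put("Q2", Pattern.compile("''"));
        RULES.put("IDENT", Pattern.compile("[a-zA-Z0-9=#\" ]+"));
    }

    // Returns the token names produced for the input under the simplified model.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly greater: on a tie, the rule defined earlier is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestName = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestName);
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Q3, Q2, IDENT, Q3, Q2]
        System.out.println(tokenize("'''''Hello World'''''"));
    }
}
```

The lexer decides on the closing Q3 without knowing that the parser is waiting for a Q2 at that point, which is why the mismatch only shows up as a parse error.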

Also, I recommend you not use the interpreter: it's rather buggy. The debugger works like a charm though!



Source: https://stackoverflow.com/questions/21242730/antlrworks-extraneous-input
