ANTLR4 parsing RegEx

Posted by 痞子三分冷 on 2020-01-17 06:04:05

Question


I am trying to parse a regex, specifically the following:

[A-Z0-9]{1,20}

The problem is that I don't know how to make the following grammar work, because the Char and Int tokens both match a digit.

grammar RegEx;            

regEx : (character count? )+ ;

character : Char 
          | range ;

range  : '[' (rangeChar|rangeX)+ ']' ;
rangeX : rangeStart '-' rangeEnd ;
rangeChar : Char ;
rangeStart : Char ;
rangeEnd : Char ;

count : '{' (countExact | (countMin ',' countMax) ) '}' ;
countMin : D+ ;
countMax : Int ;
countExact : Int ;

channels {
  COUNT_CHANNEL,
  RANGE_CHANNEL
}

Char : D | C ; 
Int : D+ -> channel(COUNT_CHANNEL) ;

Semicolon : ';' ;
Comma : ',' ;
Asterisk : '*' ;
Plus : '+' ; 
Dot : '.' ;  
Dash : '-' ;
//CourlyBracketL : '{' ;
//CourlyBracketR : '}' ;

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines, \r (Windows)

fragment D : [0-9] ;
fragment C : [a-zA-Z] ;

I'm new to ANTLR and unsure whether I should try lexer modes, channels, some kind of conditionals, or something else. What is the "normal" approach here? Thanks!


Answer 1:


Putting tokens on any channel other than the default hides them from the parser, so a parser rule that expects such a token can never match it.

Try not to combine characters into larger tokens in the lexer. Doing so loses information that can be useful in the parser.
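
To see why the original grammar fails, it may help to model how the lexer chooses a token: it takes the rule that matches the longest input, and when two rules match the same length, the rule listed first wins. The Python sketch below is only a rough model of that behaviour, not ANTLR itself; the RULES table, tokenize and parser_view are hypothetical names made up for this illustration, and only the rules needed for the count part of the input are included.

```python
import re

# Hypothetical model of the question's lexer rules, in grammar order.
# Each entry is (token name, regex, channel). Not generated by ANTLR.
RULES = [
    ("LBrace", r"\{", "DEFAULT"),
    ("RBrace", r"\}", "DEFAULT"),
    ("Char", r"[0-9]|[a-zA-Z]", "DEFAULT"),   # Char : D | C ;
    ("Int", r"[0-9]+", "COUNT_CHANNEL"),      # Int : D+ -> channel(COUNT_CHANNEL) ;
    ("Comma", r",", "DEFAULT"),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule is kept."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern, channel in RULES:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so earlier rules win ties.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group(), channel)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


def parser_view(tokens):
    """Keep only what a parser reading the default channel would see."""
    return [(name, text) for name, text, channel in tokens if channel == "DEFAULT"]


all_tokens = tokenize("{1,20}")
visible = parser_view(all_tokens)
print(all_tokens)
print(visible)
```

In this model the single digit 1 becomes a Char (a tie, and Char is listed first), while 20 becomes an Int that sits on COUNT_CHANNEL and never reaches the parser, so a rule expecting Int after the comma has nothing to match.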

Try this:

grammar RegEx;

regEx   : ( value count? )+ ;

value   : alphNum | range ;
range   : LBrack set+ RBrack ;
set     : b=alphNum ( Dash e=alphNum)? ;

count   : LBrace min=num ( Comma max=num )? RBrace ;

alphNum : Char | Int ;
num     : Int+   ;

Char    : ALPHA  ;
Int     : DIGIT  ;

Semi    : ';' ;
Comma   : ',' ;
Star    : '*' ;
Plus    : '+' ;
Dot     : '.' ;
Dash    : '-' ;
LBrace  : '{' ;
RBrace  : '}' ;
LBrack  : '[' ;
RBrack  : ']' ;

WS : [ \t\r\n]+ -> skip ;

fragment DIGIT : [0-9] ;
fragment ALPHA : [a-zA-Z] ;
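
As a rough illustration of what this grammar does with the input from the question, here is a hand-written Python sketch that mirrors its rules: the lexer emits one token per character, and the parser assembles multi-digit numbers in num. This is not code generated by ANTLR. The Parser class, its method names and the tuple-based result are assumptions made for this example, and the check for leftover input is an addition, since the regEx rule above has no EOF.

```python
import string

# One token per character, mirroring the lexer rules above.
PUNCT = {"[": "LBrack", "]": "RBrack", "{": "LBrace",
         "}": "RBrace", ",": "Comma", "-": "Dash"}


def tokenize(text):
    tokens = []
    for ch in text:
        if ch in " \t\r\n":
            continue  # WS -> skip
        if ch in PUNCT:
            tokens.append((PUNCT[ch], ch))
        elif ch in string.digits:
            tokens.append(("Int", ch))    # Int : DIGIT ;
        elif ch in string.ascii_letters:
            tokens.append(("Char", ch))   # Char : ALPHA ;
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def take(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def reg_ex(self):  # regEx : ( value count? )+ ;
        items = []
        while True:
            value = self.value()
            count = self.count() if self.peek() == "LBrace" else None
            items.append((value, count))
            if self.peek() not in ("Char", "Int", "LBrack"):
                break
        if self.peek() != "EOF":  # addition: reject trailing input
            raise SyntaxError(f"unexpected {self.peek()}")
        return items

    def value(self):  # value : alphNum | range ;
        return self.range_() if self.peek() == "LBrack" else self.alph_num()

    def range_(self):  # range : LBrack set+ RBrack ;
        self.take("LBrack")
        sets = [self.set_()]
        while self.peek() in ("Char", "Int"):
            sets.append(self.set_())
        self.take("RBrack")
        return sets

    def set_(self):  # set : b=alphNum ( Dash e=alphNum )? ;
        begin = self.alph_num()
        if self.peek() == "Dash":
            self.take("Dash")
            return (begin, self.alph_num())
        return (begin, begin)  # a single character as a one-element span

    def count(self):  # count : LBrace min=num ( Comma max=num )? RBrace ;
        self.take("LBrace")
        low = self.num()
        high = low
        if self.peek() == "Comma":
            self.take("Comma")
            high = self.num()
        self.take("RBrace")
        return (low, high)

    def alph_num(self):  # alphNum : Char | Int ;
        return self.take("Int") if self.peek() == "Int" else self.take("Char")

    def num(self):  # num : Int+ ;
        digits = self.take("Int")
        while self.peek() == "Int":
            digits += self.take("Int")
        return int(digits)


result = Parser(tokenize("[A-Z0-9]{1,20}")).reg_ex()
print(result)
```

Because every digit is its own Int token, the same token type can serve as a range endpoint inside the brackets and as part of a number inside the braces. The parser decides which it is from context.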


Source: https://stackoverflow.com/questions/34177478/antlr4-parsing-regex
