Question
I am trying to parse RegEx, specifically the following:
[A-Z0-9]{1,20}
The problem is, I don't know how to make the following grammar work, because the Char and Int tokens both recognize a digit.
grammar RegEx;
regEx : (character count? )+ ;
character : Char
| range ;
range : '[' (rangeChar|rangeX)+ ']' ;
rangeX : rangeStart '-' rangeEnd ;
rangeChar : Char ;
rangeStart : Char ;
rangeEnd : Char ;
count : '{' (countExact | (countMin ',' countMax) ) '}' ;
countMin : D+ ;
countMax : Int ;
countExact : Int ;
channels {
COUNT_CHANNEL,
RANGE_CHANNEL
}
Char : D | C ;
Int : D+ -> channel(COUNT_CHANNEL) ;
Semicolon : ';' ;
Comma : ',' ;
Asterisk : '*' ;
Plus : '+' ;
Dot : '.' ;
Dash : '-' ;
//CourlyBracketL : '{' ;
//CourlyBracketR : '}' ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines, \r (Windows)
fragment D : [0-9] ;
fragment C : [a-zA-Z] ;
Now, I'm a noob and I am lost as to whether I should try lexer modes, channels, some ifs, or what the "normal" approach is here. Thanks!
Answer 1:
Putting tokens on any channel other than the default hides them from the normal operation of the parser.
Try not to combine tokens in the lexer; doing so winds up losing information that can be useful in the parser.
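To make the two points above concrete, here is a rough Python simulation of how the question's lexer rules would split `{1,20}`. It is only a sketch, not ANTLR itself: it assumes the usual ANTLR4 lexer behavior (the longest match wins, and a tie goes to the rule defined first), and the names `QUESTION_RULES`, `lex_question` and `parser_view` are made up for this illustration.

```python
import re

# Lexer rules transcribed from the question's grammar, in definition order.
# Assumption: the '{' and '}' literals are listed as ordinary rules here; their
# position does not matter because nothing else matches those characters.
QUESTION_RULES = [
    ("Char", re.compile(r"[0-9a-zA-Z]"), "DEFAULT"),    # Char : D | C ;
    ("Int", re.compile(r"[0-9]+"), "COUNT_CHANNEL"),    # Int : D+ -> channel(COUNT_CHANNEL) ;
    ("Comma", re.compile(r","), "DEFAULT"),
    ("'{'", re.compile(r"\{"), "DEFAULT"),
    ("'}'", re.compile(r"\}"), "DEFAULT"),
]


def lex_question(text):
    """Return (rule name, matched text, channel) triples for text."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern, channel in QUESTION_RULES:
            m = pattern.match(text, pos)
            # A strictly longer match replaces the current best, so on a tie
            # the rule defined first stays selected.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group(), channel)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


def parser_view(text):
    """Names of the tokens left on the default channel."""
    return [name for name, _, channel in lex_question(text) if channel == "DEFAULT"]


print(lex_question("{1,20}"))
print(parser_view("{1,20}"))
```

In this simulation the single digit `1` comes out as Char (a tie, and Char is defined first), while `20` comes out as Int (the longer match) but sits on COUNT_CHANNEL, so the simulated parser never sees it. Neither digit sequence reaches the count rule as the Int token it asks for.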
Try this:
grammar RegEx;
regEx : ( value count? )+ ;
value : alphNum | range ;
range : LBrack set+ RBrack ;
set : b=alphNum ( Dash e=alphNum)? ;
count : LBrace min=num ( Comma max=num )? RBrace ;
alphNum : Char | Int ;
num : Int+ ;
Char : ALPHA ;
Int : DIGIT ;
Semi : ';' ;
Comma : ',' ;
Star : '*' ;
Plus : '+' ;
Dot : '.' ;
Dash : '-' ;
LBrace : '{' ;
RBrace : '}' ;
LBrack : '[' ;
RBrack : ']' ;
WS : [ \t\r\n]+ -> skip ;
fragment DIGIT : [0-9] ;
fragment ALPHA : [a-zA-Z] ;
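As a quick sanity check of the grammar above, the following Python sketch mimics its lexer rules and prints the token stream for the input from the question. It is an approximation written with the `re` module, not generated ANTLR code, and `ANSWER_RULES`, `MASTER` and `lex_answer` are hypothetical names used only here.

```python
import re

# Lexer rules transcribed from the answer's grammar, in definition order.
ANSWER_RULES = [
    ("Char", r"[a-zA-Z]"),
    ("Int", r"[0-9]"),
    ("Semi", r";"),
    ("Comma", r","),
    ("Star", r"\*"),
    ("Plus", r"\+"),
    ("Dot", r"\."),
    ("Dash", r"-"),
    ("LBrace", r"\{"),
    ("RBrace", r"\}"),
    ("LBrack", r"\["),
    ("RBrack", r"\]"),
    ("WS", r"[ \t\r\n]+"),
]

# One alternation with a named group per rule. The rules never overlap,
# so the order of the alternatives has no effect on the result.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in ANSWER_RULES))


def lex_answer(text):
    """Return (rule name, matched text) pairs, dropping WS like '-> skip'."""
    pos, tokens = 0, []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no rule matches at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for name, text in lex_answer("[A-Z0-9]{1,20}"):
    print(name, text)
```

Every digit is its own Int token here, so `20` arrives as two Int tokens and the `num : Int+` rule groups them in the parser. Nothing is hidden on another channel, and the parser decides from context whether an Int is part of a range or part of a count.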
Source: https://stackoverflow.com/questions/34177478/antlr4-parsing-regex