ANTLR4: Using non-ASCII characters in token rules
On page 74 of the ANTRL4 book it says that any Unicode character can be used in a grammar simply by specifying its codepoint in this manner: '\uxxxx' where xxxx is the hexadecimal value for the Unicode codepoint. So I used that technique in a token rule for an ID token: grammar ID; id : ID EOF ; ID : ('a' .. 'z' | 'A' .. 'Z' | '\u0100' .. '\u017E')+ ; WS : [ \t\r\n]+ -> skip ; When I tried to parse this input: Gŭnter ANTLR throws an error, saying that it does not recognize ŭ . (The ŭ character is hex 016D, so it is within the range specified) What am I doing wrong please? ANTLR is ready to