Write a grammar rule name in Unicode [ANTLR 4]

Question

I am new to ANTLR 4 and was wondering whether a grammar rule name can be written in Unicode. For example, the following rule works fine:

atomExp returns [double value]
    :   n=Number {$value = Double.parseDouble($n.text);}
    |   '(' exp=additionExp ')' {$value = $exp.value;}
    ;

However, suppose I want to write the same rule with the Arabic word "تعبير" as its name instead of "atomExp":

تعبير returns [double value]
    :   n=Number {$value = Double.parseDouble($n.text);}
    |   '(' exp=additionExp ')' {$value = $exp.value;}
    ;

When I write it that way, I get a "no viable alternative" error. Can someone help me solve this? Thanks in advance.

Answer 1:

If you look at the lexer grammar that ANTLR 4 uses for grammar files, you can see that lexer and parser rule names may contain certain Unicode characters:

/** Allow unicode rule/token names */
ID  :   NameStartChar NameChar*;

fragment
NameChar
    :   NameStartChar
    |   '0'..'9'
    |   '_'
    |   '\u00B7'
    |   '\u0300'..'\u036F'
    |   '\u203F'..'\u2040'
    ;

fragment
NameStartChar
    :   'A'..'Z'
    |   'a'..'z'
    |   '\u00C0'..'\u00D6'
    |   '\u00D8'..'\u00F6'
    |   '\u00F8'..'\u02FF'
    |   '\u0370'..'\u037D'
    |   '\u037F'..'\u1FFF'
    |   '\u200C'..'\u200D'
    |   '\u2070'..'\u218F'
    |   '\u2C00'..'\u2FEF'
    |   '\u3001'..'\uD7FF'
    |   '\uF900'..'\uFDCF'
    |   '\uFDF0'..'\uFFFD'
    ; // ignores | ['\u10000-'\uEFFFF] ;

INT : [0-9]+
       ;
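
To test a candidate name against the ranges quoted above, the ID rule can be mirrored in plain Java. This is only a minimal sketch: the class and method names (RuleNameCheck, firstInvalidIndex, and so on) are made up for illustration and are not part of ANTLR, and like the grammar it ignores characters outside the Basic Multilingual Plane.

```java
public class RuleNameCheck {

    // Inclusive ranges copied from the NameStartChar fragment quoted above.
    private static final int[][] NAME_START_RANGES = {
        {'A', 'Z'}, {'a', 'z'},
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x02FF},
        {0x0370, 0x037D}, {0x037F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD},
    };

    // Additional inclusive ranges that NameChar allows after the first character.
    private static final int[][] NAME_EXTRA_RANGES = {
        {'0', '9'}, {'_', '_'}, {0x00B7, 0x00B7},
        {0x0300, 0x036F}, {0x203F, 0x2040},
    };

    private static boolean inRanges(char c, int[][] ranges) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isNameStartChar(char c) {
        return inRanges(c, NAME_START_RANGES);
    }

    static boolean isNameChar(char c) {
        return isNameStartChar(c) || inRanges(c, NAME_EXTRA_RANGES);
    }

    /**
     * Returns the index of the first char that breaks NameStartChar NameChar*,
     * or -1 if the whole name matches. An empty name is reported as index 0.
     */
    static int firstInvalidIndex(String name) {
        if (name.isEmpty() || !isNameStartChar(name.charAt(0))) {
            return 0;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!isNameChar(name.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The last entry is the name from the question, written with escapes
        // so that the encoding of this source file does not matter.
        String[] names = {"atomExp", "9lives", "\u062A\u0639\u0628\u064A\u0631"};
        for (String name : names) {
            int bad = firstInvalidIndex(name);
            System.out.println(name + " -> "
                + (bad < 0 ? "matches ID" : "first offending index " + bad));
        }
    }
}
```

Running main prints, for each name, either that it matches the ID rule or the index of the first character that falls outside the listed ranges, which makes it easy to see exactly which character (if any) is the problem.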

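Note, though, that the letters of تعبير (U+062A, U+0639, U+0628, U+064A, U+0631) all fall within the '\u037F'..'\u1FFF' range of NameStartChar, so by these rules the name is a valid ID. If the tool still reports "no viable alternative", one likely cause is that the grammar file is being read in a different encoding from the one it was saved in; running the ANTLR tool with its -encoding option (for example, -encoding UTF-8) is worth trying.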

Source: https://stackoverflow.com/questions/30614712/write-a-grammar-rule-name-in-unicode-antlr-4

Tags

java

parsing

unicode

antlr

antlr4