How to match a string, but case-insensitively?

独自空忆成欢 提交于 2019-12-18 12:13:32

问题


Let's say that I want to match "beer", but don't care about case sensitivity.

Currently I am defining a token to be ('b'|'B' 'e'|'E' 'e'|'E' 'r'|'R') but I have a lot of such and don't really want to handle 'verilythisisaverylongtokenindeedomyyesitis'.

The antlr wiki seems to suggest that it can't be done (in antlr) ... but I just wondered if anyone had some clever tricks ...


回答1:


How about define a lexer token for each permissible identifier character, then construct the parser token as a series of those?

beer: B E E R;

A : 'A'|'a';
B: 'B'|'b';

etc.




回答2:


I would like to add to the accepted answer: a ready -made set can be found at case insensitive antlr building blocks, and the relevant portion included below for convenience

fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');

So an example is

   HELLOWORLD : H E L L O W O R L D;



回答3:


Define case-insensitive tokens with

BEER: [Bb] [Ee] [Ee] [Rr];



回答4:


New documentation page has appeared in ANTLR GitHub repo: Case-Insensitive Lexing. You can use two approaches:

  1. The one described in @javadba's answer
  2. Or add a character stream to your code, which will transform an input stream to lower or upper case. Examples for the main languages you can find on the same doc page.

My opinion, it's better to use the first approach and have the grammar which describes all the rules. But if you use well-known grammar, for example from Grammars written for ANTLR v4, then second approach may be more appropriate.




回答5:


A solution I used in C#: use ASCII code to shift character to smaller case.

class CaseInsensitiveStream : Antlr4.Runtime.AntlrInputStream {
  public CaseInsensitiveStream(string sExpr)
     : base(sExpr) {
  }
  public override int La(int index) {
     if(index == 0) return 0;
     if(index < 0) index++;
     int pdx = p + index - 1;
     if(pdx < 0 || pdx >= n) return TokenConstants.Eof;
     var x1 = data[pdx];
     return (x1 >= 65 && x1 <= 90) ? (97 + x1 - 65) : x1;
  }
}


来源:https://stackoverflow.com/questions/1844562/how-to-match-a-string-but-case-insensitively

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!