How to include ё in an [а-я] regexp character range

不思量自难忘° 2021-01-07 10:54

The Russian alphabet includes the letter ё, which was undeservedly forgotten at the beginning of computing.

So, if I want to use a regexp with a character range (diapason)…

3 Answers
  •  时光取名叫无心
    2021-01-07 11:51

    This is cool - I had never thought that much about character ranges in Unicode.

    It seems that for some reason А-я were encoded in the Unicode range 0x410 to 0x44f, while some other characters were placed in 0x400 to 0x40f and 0x450 to 0x45f, including Ё (0x401) and ё (0x451). Wikipedia has a full breakdown of which characters went where.
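
    To check where each letter sits, you can print its code point. A minimal sketch (the helper name codePointHex is just for illustration):

    ```javascript
    // Return a character's code point formatted as U+XXXX.
    function codePointHex(ch) {
      return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    }

    // Letters from the ranges discussed above.
    for (const ch of ["Ѐ", "Ё", "А", "я", "ѐ", "ё"]) {
      console.log(ch, codePointHex(ch));
    }
    // Ѐ U+0400
    // Ё U+0401
    // А U+0410
    // я U+044F
    // ѐ U+0450
    // ё U+0451
    ```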

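    As a consequence, /[Ѐ-ё]/ (0x400 to 0x451) does match ё, but it might feel quite illogical to a native speaker, and it is wider than the Russian alphabet: it also covers every uppercase letter plus non-Russian letters such as Ђ and Є.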
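
    A quick comparison of the ranges. The test word ёлка and the variable names are illustrative, and the explicit [а-яё] class is the usual alternative, included here only for comparison:

    ```javascript
    // [а-я] covers U+0430–U+044F only, so ё (U+0451) falls outside it.
    const lowerRange = /^[а-я]+$/;
    // [Ѐ-ё] spans U+0400–U+0451: ё and Ё are inside, but so are
    // all uppercase А–Я and letters such as Ђ (U+0402) or Є (U+0404).
    const wideRange = /^[Ѐ-ё]+$/;
    // For comparison: list ё explicitly next to the lowercase range.
    const lowerWithYo = /^[а-яё]+$/;

    console.log(lowerRange.test("ёлка"));  // false
    console.log(wideRange.test("ёлка"));   // true
    console.log(wideRange.test("Ђ"));      // true (not a Russian letter)
    console.log(lowerWithYo.test("ёлка")); // true
    console.log(lowerWithYo.test("Ђ"));    // false
    ```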

    You can of course use raw Unicode escapes, e.g. /[\u0400-\u045f]/ (or up to \u04ff if you want the full Cyrillic block), but that means either remembering the code points or assigning the pattern to a constant for future use.
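
    Storing the ranges once might look like this (a sketch; the constant names are hypothetical):

    ```javascript
    // Hypothetical names: keep the escape ranges in one place.
    const CYRILLIC_BASIC = "\\u0400-\\u045F"; // Ѐ (U+0400) through џ (U+045F)
    const CYRILLIC_BLOCK = "\\u0400-\\u04FF"; // the whole Cyrillic block

    // Build the regexes from the stored ranges instead of retyping escapes.
    const basicWord = new RegExp(`^[${CYRILLIC_BASIC}]+$`);
    const blockWord = new RegExp(`^[${CYRILLIC_BLOCK}]+$`);

    console.log(basicWord.test("ёлка")); // true
    console.log(basicWord.test("Ѣ"));    // false: Ѣ is U+0462, past U+045F
    console.log(blockWord.test("Ѣ"));    // true
    ```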

    Lastly, you can refer to an entire script with a Unicode property escape (in JavaScript this needs the u flag and the Script= prefix):

    /\p{Script=Cyrillic}/u
    

    Note, though, that this includes many more characters than the Russian alphabet, such as Ԧ (U+0526, from the Cyrillic Supplement block).
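
    In JavaScript specifically, a bare \p{Cyrillic} is rejected in u mode because Cyrillic is a script value, not a standalone property name, and without the u flag \p is just an escaped "p". A small sketch of both behaviours (the variable name is illustrative):

    ```javascript
    // Script property escapes require the u flag.
    const cyrillicWord = /^\p{Script=Cyrillic}+$/u;

    console.log(cyrillicWord.test("ёлка"));  // true
    console.log(cyrillicWord.test("Ԧ"));     // true: U+0526, Cyrillic Supplement
    console.log(cyrillicWord.test("ёлка!")); // false: "!" belongs to the Common script

    // A bare script name is not a valid lone property name in u mode.
    try {
      new RegExp("\\p{Cyrillic}", "u");
    } catch (e) {
      console.log(e.name); // SyntaxError
    }

    // Without u, \p is an identity escape for "p", so this matches literal text.
    console.log(/\p{Cyrillic}/.test("p{Cyrillic}")); // true
    ```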
