How would you get an array of Unicode code points from a .NET String?

日久生厌 2020-12-09 03:47

I have a list of character range restrictions that I need to check a string against, but the char type in .NET is UTF-16 and therefore some characters become wa

5 answers
  •  北海茫月
    2020-12-09 04:14

    This answer is not correct. See @Virtlink's answer for the correct one.

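    using System.Collections.Generic; // List<int>
    using System.Globalization;       // StringInfo

    static int[] ExtractScalars(string s)
    {
      // Normalize to form C (the default) so composable sequences
      // become single precomposed code points where possible.
      if (!s.IsNormalized())
      {
        s = s.Normalize();
      }

      // Rough capacity guess; the list grows as needed.
      List<int> chars = new List<int>((s.Length * 3) / 2);

      // Walk the string one text element (grapheme cluster) at a time.
      var ee = StringInfo.GetTextElementEnumerator(s);

      while (ee.MoveNext())
      {
        string e = ee.GetTextElement();
        // Only the first code point of each text element is kept, so any
        // combining marks that survive normalization are dropped.
        chars.Add(char.ConvertToUtf32(e, 0));
      }

      return chars.ToArray();
    }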
    

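    Note: Normalization is required to deal with composite characters, so that a base character followed by combining marks is composed into a single code point where a precomposed form exists.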
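
For comparison, here is a minimal sketch that walks the string code point by code point rather than by text element, so nothing is dropped. `CodePointDemo` and `ToCodePoints` are hypothetical names that do not come from this thread, and this is not necessarily the approach taken in the answer referred to above. It assumes the input is well-formed UTF-16: `char.ConvertToUtf32` throws an `ArgumentException` on an unpaired surrogate.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDemo
{
    // Hypothetical helper: returns every Unicode code point in s, in order.
    // No normalization is applied, so combining marks are kept as-is.
    public static int[] ToCodePoints(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));

        // At most one code point per UTF-16 code unit.
        var result = new List<int>(s.Length);

        for (int i = 0; i < s.Length; i++)
        {
            // Combines a surrogate pair at i into one value;
            // throws ArgumentException if the surrogate is unpaired.
            result.Add(char.ConvertToUtf32(s, i));

            // A high surrogate here means the pair used two code units,
            // so skip the low surrogate that follows.
            if (char.IsHighSurrogate(s[i]))
            {
                i++;
            }
        }

        return result.ToArray();
    }
}
```

With this sketch, `CodePointDemo.ToCodePoints("e\u0301")` yields two values, 0x65 and 0x301, because the combining accent is preserved instead of being folded into its text element. Each returned `int` can then be compared directly against the character range restrictions.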
