Algorithm to check for combining characters in Unicode

耶瑟儿~ 2020-12-18 02:45

I intend to normalize to Form C, then divide into "display units", basically a glyph plus all following combining characters. For now, I'm just looking to handle the Lati

3 Answers
  •  一个人的身影
    2020-12-18 03:24

    OK, I hacked up something similar recently. Enjoy!

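      import java.util.ArrayList;
      import java.util.List;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      // Splits text into "display units": a base character plus any combining marks that follow it.
      public static List<String> stringToCharacterWithCombiningChars(String fullText) {
        // \p{M} is any kind of 'mark'; see http://stackoverflow.com/questions/29110887/detect-any-combining-character-in-java/29111105
        Pattern splitWithCombiningChars = Pattern.compile("(\\p{M}+|\\P{M}\\p{M}*)");
        Matcher matcher = splitWithCombiningChars.matcher(fullText);
        List<String> outGoing = new ArrayList<>();
        while (matcher.find()) {
          outGoing.add(matcher.group());
        }
        return outGoing;
      }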
    

    The accompanying (passing) unit test, in case it is useful to others: https://gist.github.com/rdp/0014de502f37abd64ffd
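
    The question also mentions normalizing to Form C before splitting. A minimal sketch of that combination, assuming the standard java.text.Normalizer is acceptable for the normalization step; the class name DisplayUnits and the method name split are made up for illustration and do not come from the gist:

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (name is an assumption): NFC-normalize, then split into display units.
final class DisplayUnits {
    // Either a run of marks with no base character before it,
    // or one non-mark code point followed by any marks attached to it.
    private static final Pattern UNIT = Pattern.compile("\\p{M}+|\\P{M}\\p{M}*");

    static List<String> split(String text) {
        // Form C composes base + mark pairs wherever a precomposed code point exists.
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        List<String> units = new ArrayList<>();
        Matcher matcher = UNIT.matcher(nfc);
        while (matcher.find()) {
            units.add(matcher.group());
        }
        return units;
    }
}
```

    For example, DisplayUnits.split("e\u0301q\u0323x") yields three units: the "e" and its acute accent compose into the single code point U+00E9, while "q" plus the combining dot below (U+0323) has no precomposed form and stays as two code points inside one unit. This only groups by the Unicode mark categories; it is not a full grapheme-cluster segmentation.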
