I intend to normalize to Form C, then divide into "display units", basically a glyph plus all following combining characters. For now, I'm just looking to handle the Lati
OK, I hacked up something similar recently. Enjoy!
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> stringToCharacterWithCombiningChars(String fullText) {
    // \p{M} is any kind of 'mark': http://stackoverflow.com/questions/29110887/detect-any-combining-character-in-java/29111105
    // Match either a run of marks with no base character, or one non-mark followed by its marks.
    Pattern splitWithCombiningChars = Pattern.compile("(\\p{M}+|\\P{M}\\p{M}*)");
    Matcher matcher = splitWithCombiningChars.matcher(fullText);
    List<String> outGoing = new ArrayList<>();
    while (matcher.find()) {
        outGoing.add(matcher.group());
    }
    return outGoing;
}
Here is the accompanying (passing) unit test, in case it's useful to others: https://gist.github.com/rdp/0014de502f37abd64ffd
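Since the question also mentions normalizing to Form C first, here is a rough sketch of how the two steps might be combined. The class name DisplayUnits and the method toDisplayUnits are made up for illustration; the regex is the same one as above, minus the capture group, and java.text.Normalizer is the standard JDK class.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, not part of the original answer.
class DisplayUnits {
    // A run of marks with no base character, or one non-mark followed by its marks.
    private static final Pattern UNIT = Pattern.compile("\\p{M}+|\\P{M}\\p{M}*");

    // Normalize to NFC first, then split into base-plus-marks units.
    static List<String> toDisplayUnits(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        List<String> units = new ArrayList<>();
        Matcher matcher = UNIT.matcher(nfc);
        while (matcher.find()) {
            units.add(matcher.group());
        }
        return units;
    }

    public static void main(String[] args) {
        // 'e' + combining acute composes to a single code point under NFC.
        // 'q' + combining acute has no precomposed form, so the mark stays
        // separate but still lands in the same unit as its base.
        List<String> units = toDisplayUnits("e\u0301q\u0301x");
        System.out.println(units.size()); // prints 3
    }
}
```

Normalizing first means sequences with a precomposed form collapse to one code point, while anything NFC cannot compose is still kept together by the regex. Note that this only groups a base with its marks; it does not try to handle other kinds of clusters.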