Among the vast number of Unicode characters, there are some that actually represent more than one character, like the U+FB00 ligature ff for two 'f' characters. Is the
You could try java.text.Normalizer, but I am not really sure whether it works for ligatures.
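As far as I can tell, U+FB00 has a compatibility decomposition to two 'f' characters, so the compatibility forms (NFKC or NFKD) should expand it, while the canonical forms (NFC, NFD) leave it alone. Here is a minimal sketch to try it out; the class name LigatureDemo and the helper expandLigatures are made up for illustration, only Normalizer.normalize and Normalizer.Form.NFKC come from the JDK.

```java
import java.text.Normalizer;

public class LigatureDemo {

    // Hypothetical helper (not part of the JDK): applies compatibility
    // normalization, which replaces characters such as the ff ligature
    // with their plain-letter equivalents.
    static String expandLigatures(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String withLigature = "sta\uFB00"; // "sta" followed by U+FB00
        String expanded = expandLigatures(withLigature);

        // Length grows by one because the single ligature becomes two letters.
        System.out.println(withLigature.length() + " -> " + expanded.length());
        System.out.println(expanded);
    }
}
```

Be aware that NFKC is not limited to ligatures. It also rewrites other compatibility characters (full-width letters, superscript digits and so on), so it may change more of your text than you expect.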