How do I detect unicode characters in a Java string?

不思量自难忘° 2020-12-03 10:28

Suppose I have a string that contains Ü. How would I find all those unicode characters? Should I test for their code? How would I do that?

For example, given the str

6 Answers
  •  北海茫月
    2020-12-03 10:49

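    The definition of "unicode characters" is vague, but I will take it to mean characters outside a given legacy charset such as ASCII or ISO 8859-1. (UTF-8 is an encoding of Unicode, not a character set of its own.) If that is what you mean, loop through the characters of the String and test each code point to determine whether it falls within that charset. Note that Ü (U+00DC) is part of ISO 8859-1, so it is only flagged when you test against ASCII.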
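    A minimal sketch of the code point test described above, assuming "unicode characters" means anything outside 7-bit ASCII. The class and method names (NonAsciiFinder, findNonAscii) are hypothetical, and the 0x7F threshold is an assumption; raise it to 0xFF to test against ISO 8859-1 instead.

```java
import java.util.ArrayList;
import java.util.List;

class NonAsciiFinder {

    // Assumption: a "unicode character" is any code point above 7-bit ASCII.
    static final int MAX_ASCII = 0x7F;

    /** Returns every character of the input whose code point exceeds MAX_ASCII, in order. */
    static List<String> findNonAscii(String input) {
        List<String> found = new ArrayList<>();
        // Iterate by code point rather than by char, so that characters stored
        // as surrogate pairs are tested as one unit.
        input.codePoints()
             .filter(cp -> cp > MAX_ASCII)
             .forEach(cp -> found.add(new String(Character.toChars(cp))));
        return found;
    }

    public static void main(String[] args) {
        // "\u00DC" is Ü; the returned list holds both occurrences.
        System.out.println(findNonAscii("A\u00DCA\u00DC"));
    }
}
```

    Returning the characters themselves (rather than a boolean) makes it easy to log or count what was found; a plain `anyMatch` on the same stream would do if only a yes/no answer is needed.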

    Alternatively, use a Map and replace every character that matches one of its keys. For example:

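    import java.util.HashMap;
    import java.util.Map;

    // Typed map: with a raw Map, get() returns Object and the loop below would not compile.
    Map<Character, Character> charReplacementMap = new HashMap<Character, Character>() {{
        put('Ü', 'Y');
        // Put more here.
    }};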
    
    String originalString = "AÜAÜ";
    StringBuilder builder = new StringBuilder();
    
    for (char currentChar : originalString.toCharArray()) {
        Character replacementChar = charReplacementMap.get(currentChar);
        builder.append(replacementChar != null ? replacementChar : currentChar);
    }
    
    String newString = builder.toString();
    

    Or, do you mean "all characters with diacritics"? If so, then use java.text.Normalizer to remove diacritical marks:

    import java.text.Normalizer;
    import java.text.Normalizer.Form;

    /**
     * Remove any diacritical marks (accents like ç, ñ, é, etc) from
     * the given string (so that it returns plain c, n, e, etc).
     * @param string The string to remove diacritical marks from.
     * @return The string with removed diacritical marks, if any.
     */
    public static String removeDiacriticalMarks(String string) {
        return Normalizer.normalize(string, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
    

    One pitfall of stripping diacritical marks this way: Ü becomes U, not Y. Not sure if that's what you're after. If you want to replace each character by how it is pronounced, you really do need to create a mapping. Sure, it's tedious work, but it takes less time than you needed to follow this topic.
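    The two approaches can be combined: apply an explicit mapping first, then strip whatever diacritical marks remain. The following is only a sketch under that assumption; the class name PronunciationMapper and its methods are hypothetical, and the only mapping taken from this answer is Ü to Y.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class PronunciationMapper {

    // Hypothetical override table; extend it with whatever pairs you need.
    private static final Map<Character, Character> OVERRIDES = new HashMap<>();
    static {
        OVERRIDES.put('\u00DC', 'Y'); // Ü -> Y, the pair used in the answer above
    }

    /** Applies the explicit overrides, then removes any remaining diacritical marks. */
    static String transliterate(String input) {
        // Compose first, so that "U" followed by a combining diaeresis
        // is seen as the single character Ü and hits the override table.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder builder = new StringBuilder();
        for (char c : composed.toCharArray()) {
            Character override = OVERRIDES.get(c);
            builder.append(override != null ? override.charValue() : c);
        }
        return stripDiacritics(builder.toString());
    }

    /** Decomposes the string and drops the combining marks, as in removeDiacriticalMarks. */
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}
```

    With this sketch, Ü is replaced by Y through the table, while an unmapped character such as é falls back to its base letter e.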
