How to replace \\\\u by \\u in Java String

和自甴很熟 submitted on 2019-12-06 03:50:47

Unfortunately, I do not know of any eval-like function that would interpret the escapes for you, so the conversion has to be done by hand:

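    // Requires java.util.regex.Matcher and java.util.regex.Pattern.
    String s = "aaa\\u2022bbb\\u2014ccc";
    StringBuffer buf = new StringBuffer();
    // Match a literal backslash, 'u', and exactly four hex digits.
    Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
    while (m.find()) {
        // The pattern guarantees four hex digits, so parseInt cannot fail here.
        int cp = Integer.parseInt(m.group(1), 16);
        // Copy the text before the match, dropping the escape itself.
        m.appendReplacement(buf, "");
        buf.appendCodePoint(cp);
    }
    m.appendTail(buf);
    s = buf.toString();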
Mike Clark

In addition to escaping your escapes -- as other people (e.g. barsju) have pointed out -- you must also consider that the usual conversion of the \uNNNN notation to an actual Unicode character is done by the Java compiler at compile-time.

So even once you sort out the backslash escaping issue, you may very well have further trouble getting the actual Unicode character to display, because you appear to be manipulating the string at run-time, not at compile-time.
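
The difference between the two cases can be seen in a small sketch. The class name `EscapeTiming` and its constants are made up for illustration; the only assumption is a standard Java compiler.

```java
public class EscapeTiming {
    // The compiler translates the escape inside this literal,
    // so the string holds a single bullet character.
    static final String COMPILED = "\u2022";

    // The doubled backslash produces a literal backslash, so this string
    // holds six characters: a backslash, the letter u, and four digits.
    static final String RUNTIME = "\\u2022";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());        // 1
        System.out.println(RUNTIME.length());         // 6
        System.out.println(COMPILED.equals(RUNTIME)); // false
    }
}
```

A string read from a file or a network socket behaves like `RUNTIME`: nothing decodes the escape unless your own code does.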

This answer provides a method to replace \uNNNN escape sequences in a run-time string with the actual corresponding Unicode characters. Note that the method has some TODOs left with regard to error handling, bounds checking, and unexpected input.

(Edit: I think the regex-based solutions provided here by e.g. dash1e would be better than the method I linked, as they are more polished with regard to handling unexpected input data.)
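
For comparison, here is a minimal sketch of what a hand-written (non-regex) decoder with bounds checking might look like. It is not the linked method; the class `ManualUnescape` and its helpers are hypothetical names, and malformed or truncated escapes are simply copied through unchanged, which is one possible policy among several.

```java
public class ManualUnescape {
    // True only for ASCII hex digits.
    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Replaces each backslash + 'u' + four hex digits with the UTF-16 code unit
    // it names. Anything that is not a complete, valid escape is left as-is.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // The length check comes first so the charAt calls cannot overrun.
            if (s.charAt(i) == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u'
                    && isHex(s.charAt(i + 2)) && isHex(s.charAt(i + 3))
                    && isHex(s.charAt(i + 4)) && isHex(s.charAt(i + 5))) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("aaa\\u2022bbb\\u2014ccc"));
    }
}
```

Because each escape is appended as a single `char`, a surrogate pair written as two consecutive escapes is reassembled naturally.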

Try

Pattern unicode = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");
Matcher matcher = unicode.matcher("aaa\\u2022bbb\\u2014ccc");
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    int code = Integer.parseInt(matcher.group(1), 16);
    // quoteReplacement keeps a decoded '$' or '\' from being read as a group reference or escape.
    matcher.appendReplacement(sb, Matcher.quoteReplacement(new String(Character.toChars(code))));
}
matcher.appendTail(sb);
System.out.println(sb.toString());

You need to escape your escapes:

System.out.println("aaa\\u2022bbb\\u2014ccc".replace("\\\\u", "\\u"));
String input = "aaa\\u2022bbb\\u2014ccc";
String korv = input.replace("\\\\u", "\\u");
System.out.println(korv);

=>

aaa\u2022bbb\u2014ccc

This is because "\" is a special character in a string literal, so you need to escape it as well: the literal "\\" denotes a single "\" character.
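
As a rough illustration of how the literals map to run-time characters, the sketch below starts from text that really does contain two backslashes before the `u`. The class `BackslashDemo` and the helper `collapse` are hypothetical names. Note that `String.replace` treats both arguments as literal text, not as a regex.

```java
public class BackslashDemo {
    // Hypothetical helper: turns two backslashes followed by 'u'
    // into one backslash followed by 'u'.
    static String collapse(String s) {
        return s.replace("\\\\u", "\\u");
    }

    public static void main(String[] args) {
        // Four backslashes in the literal are two backslashes at run-time.
        String doubled = "aaa\\\\u2022bbb";
        String single = collapse(doubled);

        System.out.println(doubled.length());                 // 13
        System.out.println(single.length());                  // 12
        System.out.println(single.equals("aaa\\u2022bbb"));   // true
    }
}
```

The result still contains the six-character escape text rather than the bullet character itself; decoding it is a separate step.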
