Replace any non-ASCII character in a string in Java

穿精又带淫゛_ submitted on 2021-02-07 08:26:56

Question


How would one convert -lrb-300-rrb- 922-6590 to -lrb-300-rrb- 922-6590 in java?

I have tried the following:

t.lemma = lemma.replaceAll("\\p{C}", " ");
t.lemma = lemma.replaceAll("[\u0000-\u001f]", " ");

I am probably missing something conceptual and would appreciate any pointers to the solution.

Thank you


Answer 1:


Try the following:

str = str.replaceAll("[^\\p{ASCII}]", " ");

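Note that \p{ASCII} matches every ASCII character, i.e. [\x00-\x7F], so the negated class [^\p{ASCII}] matches everything outside that range.

In addition, you should keep the compiled Pattern in a constant to avoid recompiling the expression on every call.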
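
To make that equivalence concrete, here is a minimal sketch that runs the same replacement once with \p{ASCII} and once with the explicit range. The class and method names are illustrative, and the sample input assumes the stray character is a non-breaking space (U+00A0); the question does not say which character it actually was.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the original answer.
class AsciiClassDemo {
    // Negated POSIX class: matches any character outside U+0000..U+007F.
    private static final Pattern NON_ASCII_POSIX = Pattern.compile("[^\\p{ASCII}]");
    // The same set written as an explicit hexadecimal range.
    private static final Pattern NON_ASCII_RANGE = Pattern.compile("[^\\x00-\\x7F]");

    static String withPosixClass(String s) {
        return NON_ASCII_POSIX.matcher(s).replaceAll(" ");
    }

    static String withExplicitRange(String s) {
        return NON_ASCII_RANGE.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Assumed sample: a non-breaking space (U+00A0) sits between the two parts.
        String input = "-lrb-300-rrb-\u00A0922-6590";
        System.out.println(withPosixClass(input));
        System.out.println(withExplicitRange(input));
    }
}
```

Both calls should produce the same string, with the non-breaking space replaced by an ordinary space and every ASCII character left untouched.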

import java.util.regex.Pattern;

public class Main {  // class name is arbitrary
    // Compiled once and reused on every call.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile("[^\\p{ASCII}]");

    public static void main(String[] args) {
        String input = "-lrb-300-rrb- 922-6590";
        System.out.println(
            REGEX_PATTERN.matcher(input).replaceAll(" ")
        );  // prints "-lrb-300-rrb- 922-6590"
    }
}

See also:

  • http://en.wikipedia.org/wiki/ASCII
  • http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html



Answer 2:


Assuming you only want to keep a-zA-Z0-9 and punctuation characters, you could do:

t.lemma = lemma.replaceAll("[^\\p{Punct}\\w]", " ");
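
This pattern keeps a narrower set than [^\p{ASCII}]: without the UNICODE_CHARACTER_CLASS flag, \w covers only [a-zA-Z_0-9] and \p{Punct} only ASCII punctuation, so whitespace and control characters are replaced as well. A rough sketch of the difference follows; the class name, method names, and sample string are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the two patterns suggested in this thread.
class KeepSetDemo {
    private static final Pattern NOT_ASCII = Pattern.compile("[^\\p{ASCII}]");
    private static final Pattern NOT_PUNCT_OR_WORD = Pattern.compile("[^\\p{Punct}\\w]");

    // Replaces only characters outside the ASCII range.
    static String replaceNonAscii(String s) {
        return NOT_ASCII.matcher(s).replaceAll(" ");
    }

    // Replaces everything except ASCII letters, digits, underscore, and punctuation.
    static String replaceNonPunctNonWord(String s) {
        return NOT_PUNCT_OR_WORD.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Sample with a tab (ASCII control character) and an accented letter (non-ASCII).
        String input = "a\tb\u00E9c!";
        System.out.println(replaceNonAscii(input));        // tab kept, accented letter replaced
        System.out.println(replaceNonPunctNonWord(input)); // tab and accented letter both replaced
    }
}
```

If tabs or newlines in the input matter, the \p{ASCII} variant is probably the safer choice, since it leaves them in place.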


Source: https://stackoverflow.com/questions/18623868/replace-any-non-ascii-character-in-a-string-in-java
