Question
How would one convert -lrb-300-rrb- 922-6590 to -lrb-300-rrb- 922-6590 in Java (that is, replace any non-ASCII character in the string with a plain space)?
I have tried the following:
t.lemma = lemma.replaceAll("\\p{C}", " ");
t.lemma = lemma.replaceAll("[\u0000-\u001f]", " ");
I am probably missing something conceptual and would appreciate any pointers to a solution.
Thank you
Answer 1:
Try the following:
str = str.replaceAll("[^\\p{ASCII}]", " ");
By the way, \p{ASCII} matches every ASCII character: [\x00-\x7F].
Also, String.replaceAll compiles the regular expression on every call, so it is better to keep the compiled Pattern in a constant and reuse it:
import java.util.regex.Pattern;

public class Main {
    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile("[^\\p{ASCII}]");

    public static void main(String[] args) {
        String input = "-lrb-300-rrb- 922-6590";
        System.out.println(
                REGEX_PATTERN.matcher(input).replaceAll(" ")
        ); // prints "-lrb-300-rrb- 922-6590"
    }
}
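To show why the attempts in the question had no effect, here is a minimal sketch. It assumes the offending character is a non-breaking space (U+00A0), which is only a guess since the question does not say which character it is. The class name NonAsciiDemo and the method replaceNonAscii are made up for illustration. U+00A0 belongs to the Unicode category Zs (space separator), so \p{C} (control, format and other "Other" categories) does not match it, whereas [^\p{ASCII}] does.

```java
import java.util.regex.Pattern;

public class NonAsciiDemo {
    // Matches any character outside the ASCII range [\x00-\x7F].
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // Hypothetical helper: replaces each non-ASCII character with one space.
    static String replaceNonAscii(String s) {
        return NON_ASCII.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Assumption: the separator is a non-breaking space (U+00A0).
        String input = "-lrb-300-rrb-\u00A0922-6590";

        // \p{C} does not cover U+00A0, so the string comes back unchanged.
        String viaCategoryC = input.replaceAll("\\p{C}", " ");
        System.out.println(viaCategoryC.equals(input)); // prints "true"

        // The negated ASCII class does match it.
        System.out.println(replaceNonAscii(input)); // prints "-lrb-300-rrb- 922-6590"
    }
}
```

If the real input contains a different non-ASCII character, the same pattern should still apply, since it matches anything above U+007F.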
See also:
- http://en.wikipedia.org/wiki/ASCII
- http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Answer 2:
Assuming you only want to keep a-zA-Z0-9 and punctuation characters, you could do:
t.lemma = lemma.replaceAll("[^\\p{Punct}\\w]", " ");
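As a rough sketch of how this stricter pattern behaves, the example below uses a hypothetical class KeepWordAndPunctDemo with a made-up helper clean. The sample input is invented for illustration. Note that with Java's default flags \w is [a-zA-Z_0-9] and \p{Punct} is ASCII punctuation only, so tabs and other whitespace are also turned into spaces, not just non-ASCII characters.

```java
import java.util.regex.Pattern;

public class KeepWordAndPunctDemo {
    // Matches anything that is neither ASCII punctuation nor [a-zA-Z_0-9].
    private static final Pattern NOT_WORD_OR_PUNCT =
            Pattern.compile("[^\\p{Punct}\\w]");

    // Hypothetical helper: replaces every non-matching character with a space.
    static String clean(String s) {
        return NOT_WORD_OR_PUNCT.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Invented input: an accented letter (U+00E9) followed by a tab.
        String input = "caf\u00E9\t(300) 922-6590";

        // Both the accented letter and the tab become single spaces.
        System.out.println(clean(input)); // prints "caf  (300) 922-6590"
    }
}
```

Compared with the [^\p{ASCII}] approach, this variant also removes ASCII control characters, which may or may not be what is wanted.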
Source: https://stackoverflow.com/questions/18623868/replace-any-non-ascii-character-in-a-string-in-java