Remove non-printable UTF-8 characters except control characters from a String

ぐ巨炮叔叔 submitted on 2019-12-04 07:30:37

You have already found Unicode character properties.

You can invert a character property by changing the case of the leading "p":


\p{L} matches all letters.

\P{L} matches all characters that do not have the letter property.

So if you think \P{Cc} is what you need, then \p{Cc} would match the opposite.
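
To make the inversion concrete, here is a minimal Java sketch. The class name, method names and sample string are made up for illustration; only String.replaceAll and the \p{L} / \P{L} properties come from the discussion above.

```java
class PropertyInversionDemo {
    // \p{L} matches letters, so replacing it removes every letter.
    static String removeLetters(String s) {
        return s.replaceAll("\\p{L}", "");
    }

    // \P{L} matches everything that is NOT a letter, so only letters remain.
    static String keepOnlyLetters(String s) {
        return s.replaceAll("\\P{L}", "");
    }

    public static void main(String[] args) {
        String sample = "abc 123 \u00e9!"; // \u00e9 is the letter e with acute accent
        System.out.println("[" + removeLetters(sample) + "]");   // [ 123 !]
        System.out.println("[" + keepOnlyLetters(sample) + "]"); // [abc\u00e9], i.e. the four letters
    }
}
```

The same upper/lower-case switch works for any property, including \p{Cc} and \P{Cc}.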

More details on

I am quite sure \p{Cc} is close to what you want, but be careful: it also includes, e.g., the tab (0x09), the line feed (0x0A) and the carriage return (0x0D).
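
A quick way to check this yourself is sketched below in Java. The helper name isControl is hypothetical; it simply tests a single character against \p{Cc}.

```java
class ControlCategoryDemo {
    // True if the single character c belongs to the Unicode category Cc (control).
    static boolean isControl(char c) {
        return String.valueOf(c).matches("\\p{Cc}");
    }

    public static void main(String[] args) {
        System.out.println("tab: " + isControl('\t')); // true
        System.out.println("LF:  " + isControl('\n')); // true
        System.out.println("CR:  " + isControl('\r')); // true
        System.out.println("A:   " + isControl('A'));  // false
    }
}
```

So a plain replaceAll with \p{Cc} would also strip line breaks and tabs from the text.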

But you can create your own character class, like this:

[^\P{Cc}\t\r\n]

The [^...] construct is a negated character class, so this matches everything that is not a "non-control character" (a double negation, so it matches control characters) and that is also not a tab, CR or LF.
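
A minimal Java sketch of such a class, assuming it is written as [^\P{Cc}\t\r\n] (control characters other than tab, CR and LF). The class, constant and method names are invented for this example.

```java
class StripControlDemo {
    // Assumed form of the class described above:
    // NOT (non-control OR tab OR CR OR LF) = control characters except tab, CR, LF.
    static final String CONTROL_EXCEPT_TAB_CR_LF = "[^\\P{Cc}\\t\\r\\n]";

    // Removes control characters but keeps tab, carriage return and line feed.
    static String strip(String s) {
        return s.replaceAll(CONTROL_EXCEPT_TAB_CR_LF, "");
    }

    public static void main(String[] args) {
        // Contains NUL (0x00) and BEL (0x07), plus a tab and a CR LF pair.
        String dirty = "a\u0000b\tc\u0007\r\nd";
        String clean = strip(dirty);
        System.out.println(clean.equals("ab\tc\r\nd")); // true
    }
}
```

Note the doubled backslashes: the Java string literal needs them so that the regex engine sees \P{Cc}, \t, \r and \n.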

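You can also use the following, but note that \p{C} covers the whole "Other" category, which includes the control characters (Cc), so tab, CR and LF are removed as well: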

your_string.replaceAll("\\p{C}", "");
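
For comparison, a small Java sketch of what the \p{C} replacement does. The class and method names are hypothetical, and the sample string is made up: it contains a tab (category Cc) and a zero width space U+200B (category Cf).

```java
class StripOtherCategoryDemo {
    // Removes every character in the Unicode "Other" category (\p{C}).
    static String stripAllOther(String s) {
        return s.replaceAll("\\p{C}", "");
    }

    public static void main(String[] args) {
        String sample = "a\tb\u200Bc"; // tab is Cc, U+200B is Cf
        System.out.println(stripAllOther(sample)); // abc
    }
}
```

Both the tab and the invisible format character are dropped, so this variant fits when whitespace control characters do not need to be preserved.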