Remove “empty” character from String

不思量自难忘° 2020-12-14 10:42

I'm using a framework which returns malformed Strings with "empty" characters from time to time.

"foobar", for example, is represented by: [,f,o,o,b,a,r]

T

9 answers
  • 2020-12-14 11:39

    A regex is an appropriate way to strip unwanted Unicode characters from the string in this case.

    String sanitized = dirty.replaceAll("[\uFEFF-\uFFFF]", ""); 
    

    This replaces every char in the \uFEFF-\uFFFF range with the empty string.

    The [...] construct is called a character class: [aeiou] matches any one of the lowercase vowels, and [^aeiou] matches any single character except those.

    You can take either of two approaches:

    • replaceAll("[blacklist]", "")
    • replaceAll("[^whitelist]", "")
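    The two approaches above can be sketched as follows. This is a minimal illustration, not the framework's own fix: the class name SanitizeDemo, the method names, and the choice of ASCII letters and digits as the whitelist are all assumptions made for the example.

```java
class SanitizeDemo {
    // Blacklist: remove only the characters listed in the class (here, the BOM U+FEFF).
    static String stripBlacklisted(String dirty) {
        return dirty.replaceAll("[\uFEFF]", "");
    }

    // Whitelist: remove everything NOT listed in the class.
    // Assumption for this sketch: only ASCII letters and digits are valid.
    static String keepWhitelisted(String dirty) {
        return dirty.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        String dirty = "\uFEFFfoobar";
        System.out.println(stripBlacklisted(dirty)); // foobar
        System.out.println(keepWhitelisted(dirty));  // foobar
    }
}
```

    The blacklist form is safer when you know exactly which character is the culprit; the whitelist form also drops anything you did not anticipate, including legitimate non-ASCII text.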

    References

    • regular-expressions.info
  • 2020-12-14 11:41

    trim() only removes whitespace from the left and right ends of the string. Does the string have a colon before the space?

    Beyond that, int code = (int) string.charAt(0); will show you the char code, and you can then use replace() or substring() to remove the character.
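    To find out which "empty" character is actually present, it can help to print the code of every char rather than just the first one. A small sketch, assuming a hypothetical helper class CharCodeDump that is not part of any library:

```java
class CharCodeDump {
    // Renders each char of the input as a U+XXXX label so invisible characters become visible.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\uFEFFfoo")); // U+FEFF U+0066 U+006F U+006F
    }
}
```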

  • 2020-12-14 11:42

    Thank you, Johannes Rössel. It actually was '\uFEFF'.

    The following code works:

    // Copy every character except the byte order mark (zero-width no-break space).
    final StringBuilder sb = new StringBuilder();
    for (final char character : body.toCharArray()) {
        if (character != '\uFEFF') {
            sb.append(character);
        }
    }
    final String sanitizedString = sb.toString();
    

    Does anyone know of a way to include just a range of valid characters instead of excluding 95% of the UTF-8 range?
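    One possible way to keep only a valid range is to invert the test in the loop. The sketch below assumes a hypothetical helper keepRange and treats printable ASCII (space through tilde) as the valid range; adjust the bounds to whatever your data actually allows.

```java
class WhitelistFilter {
    // Keeps only chars in the inclusive range [lo, hi]; every other char is dropped.
    static String keepRange(String input, char lo, char hi) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= lo && c <= hi) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: printable ASCII is the valid range for this input.
        System.out.println(keepRange("\uFEFFfoobar", ' ', '~')); // foobar
    }
}
```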
