Remove “empty” character from String

不思量自难忘° 2020-12-14 10:42

I'm using a framework which returns malformed Strings with "empty" characters from time to time.

"foobar", for example, is represented by: [,f,o,o,b,a,r]

T

9 Answers
  •  鱼传尺愫
    2020-12-14 11:21

    It's probably the NUL character (U+0000), which is written as \0 in Java. You can get rid of it with String#trim(), which strips leading and trailing characters at or below U+0020.
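
    As a quick check of that claim, here is a minimal sketch. The class name NulTrimDemo and the method stripEdges are made up for illustration; only String#trim() is the real API. It also shows the limit of this approach: trim() leaves anything above U+0020, such as U+FEFF, in place.

    ```java
    final class NulTrimDemo {
        // Hypothetical helper: trim() strips leading/trailing chars <= U+0020,
        // and NUL (U+0000) falls inside that range.
        static String stripEdges(String s) {
            return s.trim();
        }

        public static void main(String[] args) {
            String raw = "\0foobar";                       // leading NUL
            System.out.println(raw.length());              // 7
            System.out.println(stripEdges(raw).length());  // 6

            // U+FEFF is above U+0020, so trim() does not touch it.
            System.out.println(stripEdges("\uFEFFfoobar").length()); // 7
        }
    }
    ```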

    To nail down the exact code point, print each character of the string:

    for (char c : string.toCharArray()) {
        System.out.printf("U+%04x ", (int) c);
    }
    

    Then you can find the exact character here.


    Update: as per the update to the question:

    Anyone know of a way to just include a range of valid characters instead of excluding 95% of the UTF8 range?

    You can do that with the help of a regex. See the answer of @polygenelubricants here and this answer.
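
    A rough sketch of the whitelist idea, assuming that "valid" means printable ASCII (U+0020 to U+007E). That range is my assumption, not something stated in the question, so adjust the character class to whatever your data really allows. WhitelistFilter and keepAllowed are illustrative names.

    ```java
    import java.util.regex.Pattern;

    final class WhitelistFilter {
        // Assumption: only printable ASCII is allowed. The negated class
        // matches every char outside that range.
        private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\x20-\\x7E]");

        // Removes every character that is not in the allowed range.
        static String keepAllowed(String s) {
            return NOT_ALLOWED.matcher(s).replaceAll("");
        }

        public static void main(String[] args) {
            System.out.println(keepAllowed("\uFEFFfoobar")); // foobar
            System.out.println(keepAllowed("foo\0bar"));     // foobar
        }
    }
    ```

    Unlike trim(), this also removes unwanted characters in the middle of the string, but it will drop legitimate non-ASCII text too unless you widen the class.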

    On the other hand, you can also fix the problem at its root instead of working around it. Either update the files to get rid of the BOM (a legacy way of distinguishing UTF-8 files from others, which is worthless nowadays), or use a Reader which recognizes and skips the BOM. Also see this question.
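
    One possible shape for such a Reader, as a sketch only: it assumes the input is UTF-8 and just peeks at the first decoded character. BomSkipper, openSkippingBom and readAll are hypothetical names; the java.io classes are standard.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.PushbackReader;
    import java.io.Reader;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;

    final class BomSkipper {
        // Wraps the stream in a UTF-8 reader and drops a leading U+FEFF, if any.
        static Reader openSkippingBom(InputStream in) throws IOException {
            PushbackReader reader =
                    new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
            int first = reader.read();
            if (first != -1 && first != 0xFEFF) {
                reader.unread(first); // not a BOM: put the character back
            }
            return reader;
        }

        // Convenience helper for the demo: decodes a whole byte array.
        static String readAll(byte[] utf8Bytes) {
            try (Reader reader = openSkippingBom(new ByteArrayInputStream(utf8Bytes))) {
                StringBuilder sb = new StringBuilder();
                int c;
                while ((c = reader.read()) != -1) {
                    sb.append((char) c);
                }
                return sb.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) {
            // EF BB BF is the UTF-8 encoding of U+FEFF.
            byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'f', 'o', 'o', 'b', 'a', 'r'};
            System.out.println(readAll(withBom)); // foobar
        }
    }
    ```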
