Remove “empty” character from String

不思量自难忘°

I'm using a framework which returns malformed Strings with "empty" characters from time to time.

"foobar", for example, is represented by: [,f,o,o,b,a,r]

9 answers

爱一瞬间的悲伤

2020-12-14 11:19

This is what worked for me:

    // Keep only characters whose code is in the range 1..256; drop everything else.
    StringBuilder sb = new StringBuilder();
    for (char character : myString.toCharArray()) {
        int i = (int) character;
        if (i > 0 && i <= 256) {
            sb.append(character);
        }
    }
    return sb.toString();

The int value of my NULL characters was in the region of 8103 or something.

鱼传尺愫

2020-12-14 11:21
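It's probably the NUL character, which is written as \0. If it sits at the start or end of the string, you can get rid of it with String#trim(), since trim() strips leading and trailing characters with a code of U+0020 or below.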
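
As a rough illustration of what trim() does and does not remove, here is a minimal sketch. The class name `TrimDemo` and the helper `trimmed` are made up for this example; only `String#trim()` itself comes from the answer.

```java
// Sketch only: TrimDemo and trimmed() are hypothetical names for illustration.
class TrimDemo {
    /** Returns s without leading/trailing characters whose code is U+0020 or below. */
    static String trimmed(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        String withNul = "\0foobar";      // leading NUL (U+0000)
        String withBom = "\uFEFFfoobar";  // leading BOM (U+FEFF)

        // NUL is below U+0020, so trim() removes it.
        System.out.println(trimmed(withNul).length()); // prints 6
        // U+FEFF is above U+0020, so trim() leaves it in place.
        System.out.println(trimmed(withBom).length()); // prints 7
    }
}
```

So trim() only helps when the stray character is a control character or space at either end of the string.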

To nail down the exact code point, print the value of each character:
```
for (char c : string.toCharArray()) {
    System.out.printf("U+%04x ", (int) c);
}
```
Then you can find the exact character here.

Update: as per the update to the question:

Anyone know of a way to just include a range of valid characters instead of excluding 95% of the UTF8 range?

You can do that with the help of a regex. See the answer of @polygenelubricants here and this answer.
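
A minimal sketch of the whitelist idea, assuming that "valid" means printable ASCII (U+0020 to U+007E). That range, the class name `CharWhitelist`, and the method name are assumptions for illustration, not something the linked answers prescribe; widen the character class to whatever your data actually needs.

```java
// Sketch only: CharWhitelist and keepPrintableAscii() are hypothetical names.
class CharWhitelist {
    /** Removes every character outside the assumed valid range U+0020..U+007E. */
    static String keepPrintableAscii(String s) {
        // [^...] is a negated character class: anything NOT in the range is replaced.
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepPrintableAscii("\uFEFFfoobar")); // prints foobar
    }
}
```

Note that this also strips legitimate non-ASCII text such as accented letters, so it only fits data that is expected to be plain ASCII.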

On the other hand, you can also fix the problem at its root instead of working around it. Either update the files to get rid of the BOM (it's a legacy way to distinguish UTF-8 files from others and is nowadays worthless), or use a Reader which recognizes and skips the BOM. Also see this question.
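
One possible way to skip the BOM while reading could look like the sketch below. It assumes the input is UTF-8 and relies on Java's UTF-8 decoder passing a leading BOM through as the character U+FEFF. `BomSkippingReader`, `open`, and `firstLine` are hypothetical names for this example, not an existing library class.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch only: a hypothetical helper, assuming UTF-8 input.
class BomSkippingReader {
    /** Wraps the stream in a reader positioned after a leading BOM, if one is present. */
    static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                 // remember the start of the stream
        if (reader.read() != 0xFEFF) {  // first char is not a BOM (or stream is empty)
            reader.reset();             // rewind so nothing is lost
        }
        return reader;
    }

    /** Convenience wrapper for the demo: first line of the given UTF-8 bytes. */
    static String firstLine(byte[] utf8) {
        try (BufferedReader reader = open(new ByteArrayInputStream(utf8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFFfoobar".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(withBom).length()); // prints 6
    }
}
```

Doing this once where the stream is opened means the rest of the code never sees the stray character.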
栀梦

2020-12-14 11:25
You could check for the whitespace like this:
```
if (character == ' ') { /* handle the space character here */ }
```
栀梦

2020-12-14 11:28

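Simply calling malformedString.trim() will solve the issue, provided the unwanted character is at the start or end of the string and has a code of U+0020 or below.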

南旧

2020-12-14 11:31

for (int i = 0; i < s.length(); i++) {
    if (s.charAt(i) == ' ') {
        // your code....
    }
}

难免孤独

2020-12-14 11:32
A very simple way to remove the UTF-8 BOM from a string is to use substring, as Denis Tulskiy suggested. No looping is needed: it just checks the first character for the mark and skips it if present.
```
public static String removeUTF8BOM(String s) {
    if (s.startsWith("\uFEFF")) {
        s = s.substring(1);
    }
    return s;
}
```
I needed to add this to my code when using the Apache HttpClient EntityUtils to read from a web server. The web server was not sending the BOM, but it was getting pulled in while reading the input stream. Original article can be found here.