How to split a String by space

傲寒 2020-11-22 10:31

I need to split my String by spaces. For this I tried:

str = "Hello I'm your String";
String[] splited = str.split(" ");

But it doesn't seem to work.

15 answers
  •  忘掉有多难
    2020-11-22 10:53

    While the accepted answer is good, be aware that you will end up with a leading empty string if your input string starts with whitespace. For example, with:

    String str = " Hello I'm your String";
    String[] splitStr = str.split("\\s+");
    

    The result will be:

    splitStr[0] == "";
    splitStr[1] == "Hello";
    splitStr[2] == "I'm";
    splitStr[3] == "your";
    splitStr[4] == "String";
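
    A small self-contained sketch to reproduce this (the class and method names are illustrative, not from the answer). String.split keeps an empty leading substring when there is a positive-width match at the start of the string, although it drops trailing empty strings:

    ```java
    import java.util.Arrays;

    class LeadingSpaceSplit {
        // Illustrative helper name: splits on runs of whitespace, as in the answer above.
        static String[] splitOnWhitespace(String str) {
            return str.split("\\s+");
        }

        public static void main(String[] args) {
            String[] splitStr = splitOnWhitespace(" Hello I'm your String");
            // The leading space is a positive-width match at index 0,
            // so split() keeps an empty first element.
            System.out.println(splitStr.length);           // 5
            System.out.println(Arrays.toString(splitStr)); // [, Hello, I'm, your, String]
        }
    }
    ```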
    

    So you might want to trim your string before splitting it:

    String str = " Hello I'm your String";
    String[] splitStr = str.trim().split("\\s+");
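
    Trimming has one edge case of its own: if the input is empty or all blanks, trim() returns "", and splitting "" yields a one-element array containing "" rather than an empty array. A minimal sketch of a helper that guards against that, assuming an empty array is what you want for blank input (splitWords and TrimThenSplit are hypothetical names):

    ```java
    import java.util.Arrays;

    class TrimThenSplit {
        // Hypothetical helper: trim first so no empty leading token appears,
        // and return an empty array for blank input instead of [""].
        static String[] splitWords(String str) {
            String trimmed = str.trim();
            if (trimmed.isEmpty()) {
                return new String[0];
            }
            return trimmed.split("\\s+");
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(splitWords(" Hello I'm your String"))); // [Hello, I'm, your, String]
            System.out.println("   ".trim().split("\\s+").length);                   // 1 (the array holds "")
            System.out.println(splitWords("   ").length);                             // 0
        }
    }
    ```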
    

    [edit]

    In addition to the trim caveat, you might want to consider the Unicode non-breaking space character (U+00A0). It prints just like a regular space, and it often lurks in text copy-pasted from rich text editors or web pages. It is not handled by .trim(), which only removes characters with c <= ' ', and \s will not match it either, since by default \s only covers [ \t\n\x0B\f\r].
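
    A quick way to check these claims yourself (NbspCheck is just an illustrative name). It also shows that Character.isWhitespace excludes non-breaking spaces by design, while Character.isSpaceChar accepts them because U+00A0 belongs to the Unicode space-separator category:

    ```java
    class NbspCheck {
        // Illustrative constant for U+00A0 (NO-BREAK SPACE).
        static final char NBSP = '\u00A0';

        public static void main(String[] args) {
            String s = NBSP + "Hello" + NBSP;
            // trim() only strips chars <= ' ' (U+0020); U+00A0 is above that.
            System.out.println(s.trim().length());                   // 7, nothing removed
            // Without UNICODE_CHARACTER_CLASS, \s covers only [ \t\n\x0B\f\r].
            System.out.println(String.valueOf(NBSP).matches("\\s")); // false
            // isWhitespace deliberately excludes non-breaking spaces...
            System.out.println(Character.isWhitespace(NBSP));        // false
            // ...while isSpaceChar accepts any SPACE_SEPARATOR, including U+00A0.
            System.out.println(Character.isSpaceChar(NBSP));         // true
        }
    }
    ```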

    Instead, you can use \p{Blank}, but you also need to enable Unicode character classes (the UNICODE_CHARACTER_CLASS flag), which String.split does not do by default. For example, this works: Pattern.compile("\\p{Blank}", UNICODE_CHARACTER_CLASS).split(words), but it does not do the trimming part.
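
    If you would rather stay with String.split, the same flag can be switched on inside the regex itself with the embedded flag expression (?U), which is equivalent to UNICODE_CHARACTER_CLASS. A sketch of that approach, with hypothetical class and helper names, which also strips leading and trailing blanks before splitting:

    ```java
    import java.util.Arrays;

    class UnicodeBlankSplit {
        // Hypothetical helper. (?U) is the embedded form of Pattern.UNICODE_CHARACTER_CLASS,
        // so \p{Blank} also matches U+00A0 and the other Unicode space separators.
        static String[] splitUnicodeBlanks(String str) {
            // Remove leading and trailing blanks so no empty first element appears.
            String trimmed = str.replaceAll("(?U)^\\p{Blank}+|\\p{Blank}+$", "");
            if (trimmed.isEmpty()) {
                return new String[0];
            }
            return trimmed.split("(?U)\\p{Blank}+");
        }

        public static void main(String[] args) {
            String words = " Hello I'm\u00A0your String\u00A0";
            System.out.println(Arrays.toString(splitUnicodeBlanks(words))); // [Hello, I'm, your, String]
        }
    }
    ```

    Using \p{Blank}+ rather than a single \p{Blank} also avoids empty elements when several blanks are adjacent.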

    The following demonstrates the problem and provides a solution. Relying on a regex for this is far from optimal, but since Java stores strings internally as either 8-bit Latin-1 or 16-bit UTF-16 (compact strings), an efficient hand-written solution becomes quite long.

    import static java.util.regex.Pattern.DOTALL;
    import static java.util.regex.Pattern.UNICODE_CHARACTER_CLASS;
    
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    import org.junit.jupiter.api.Test;
    
    public class SplitStringTest
    {
        // Optional leading blanks, then the content (reluctant), then optional trailing blanks.
        // DOTALL lets (.*?) span line breaks, so matches() succeeds for every input.
        static final Pattern TRIM_UNICODE_PATTERN = Pattern.compile("^\\p{Blank}*(.*?)\\p{Blank}*$", UNICODE_CHARACTER_CLASS | DOTALL);
        // With UNICODE_CHARACTER_CLASS, \p{Blank} also matches U+00A0.
        static final Pattern SPLIT_SPACE_UNICODE_PATTERN = Pattern.compile("\\p{Blank}", UNICODE_CHARACTER_CLASS);
    
        public static String[] trimSplitUnicodeBySpace(String str)
        {
            Matcher trimMatcher = TRIM_UNICODE_PATTERN.matcher(str);
            trimMatcher.matches(); // always true, but must be called since it does the actual matching/grouping
            return SPLIT_SPACE_UNICODE_PATTERN.split(trimMatcher.group(1));
        }
    
        @Test
        void test()
        {
            String words = " Hello I'm\u00A0your String\u00A0";
            // non-breaking space here --^ and there -----^
    
            String[] split = words.split(" ");
            String[] trimAndSplit = words.trim().split(" ");
            String[] splitUnicode = SPLIT_SPACE_UNICODE_PATTERN.split(words);
            String[] trimAndSplitUnicode = trimSplitUnicodeBySpace(words);
    
            // Bracket every element so that empty strings stay visible.
            System.out.println("words: [" + words + "]");
            System.out.println("split: [" + String.join("][", split) + "]");
            System.out.println("trimAndSplit: [" + String.join("][", trimAndSplit) + "]");
            System.out.println("splitUnicode: [" + String.join("][", splitUnicode) + "]");
            System.out.println("trimAndSplitUnicode: [" + String.join("][", trimAndSplitUnicode) + "]");
        }
    }
    

    Results in:

    words: [ Hello I'm your String ]
    split: [][Hello][I'm your][String ]
    trimAndSplit: [Hello][I'm your][String ]
    splitUnicode: [][Hello][I'm][your][String]
    trimAndSplitUnicode: [Hello][I'm][your][String]
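
    For comparison, a rough sketch of the non-regex route mentioned above; it works on chars and is not tuned for the compact-string internals. Treating a character as a separator when Character.isWhitespace or Character.isSpaceChar returns true is an assumption: unlike \p{Blank}, it also splits on line breaks. ManualSpaceSplit and splitOnSpaces are hypothetical names:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    class ManualSpaceSplit {
        // Assumed separator test (hypothetical choice): Java whitespace such as ' ' and tab,
        // plus Unicode space separators such as U+00A0.
        static boolean isSeparator(char c) {
            return Character.isWhitespace(c) || Character.isSpaceChar(c);
        }

        // Hypothetical helper: one pass over the chars, no regex, no empty tokens.
        static List<String> splitOnSpaces(String s) {
            List<String> tokens = new ArrayList<>();
            int start = -1; // start index of the current token, or -1 between tokens
            for (int i = 0; i < s.length(); i++) {
                if (isSeparator(s.charAt(i))) {
                    if (start >= 0) {
                        tokens.add(s.substring(start, i));
                        start = -1;
                    }
                } else if (start < 0) {
                    start = i;
                }
            }
            if (start >= 0) {
                tokens.add(s.substring(start)); // last token runs to the end of the string
            }
            return tokens;
        }

        public static void main(String[] args) {
            System.out.println(splitOnSpaces(" Hello I'm\u00A0your String\u00A0")); // [Hello, I'm, your, String]
        }
    }
    ```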
    
