What is the easiest/best/most correct way to iterate through the characters of a string in Java?

挽巷 2020-11-22 11:14

StringTokenizer? Convert the String to a char[] and iterate over that? Something else?

15 answers
  •  故里飘歌
    2020-11-22 11:37

    In Java 8, this can be done as follows:

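    String str = "xyz";
    // Prints each UTF-16 code unit (char) in order.
    str.chars().forEachOrdered(i -> System.out.print((char) i));
    // Prints each code point; Character.toChars avoids truncating supplementary characters.
    str.codePoints().forEachOrdered(i -> System.out.print(Character.toChars(i)));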
    

    The method chars() returns an IntStream, as described in the documentation:

    Returns a stream of int zero-extending the char values from this sequence. Any char which maps to a surrogate code point is passed through uninterpreted. If the sequence is mutated while the stream is being read, the result is undefined.

    The method codePoints() also returns an IntStream, as described in the documentation:

    Returns a stream of code point values from this sequence. Any surrogate pairs encountered in the sequence are combined as if by Character.toCodePoint and the result is passed to the stream. Any other code units, including ordinary BMP characters, unpaired surrogates, and undefined code units, are zero-extended to int values which are then passed to the stream.
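
    To make the two descriptions concrete, here is a minimal sketch comparing both streams on a string that contains one supplementary character. The class and method names (CharsVsCodePoints, countChars, countCodePoints, rebuild) are made up for illustration.

```java
class CharsVsCodePoints {
    // Number of UTF-16 code units, which is what chars() streams.
    static long countChars(String s) {
        return s.chars().count();
    }

    // Number of Unicode code points, which is what codePoints() streams.
    static long countCodePoints(String s) {
        return s.codePoints().count();
    }

    // Rebuilds the string from its code points without casting them to char.
    static String rebuild(String s) {
        return s.codePoints()
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (one surrogate pair), 'b'
        System.out.println(countChars(s));        // 4
        System.out.println(countCodePoints(s));   // 3
        System.out.println(rebuild(s).equals(s)); // true
    }
}
```

    For a string made only of BMP characters, such as "xyz", both counts are the same.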

    How are a char and a code point different? As mentioned in this article:

    Unicode 3.1 added supplementary characters, bringing the total number of characters to more than the 2^16 = 65536 characters that can be distinguished by a single 16-bit char. Therefore, a char value no longer has a one-to-one mapping to the fundamental semantic unit in Unicode. JDK 5 was updated to support the larger set of character values. Instead of changing the definition of the char type, some of the new supplementary characters are represented by a surrogate pair of two char values. To reduce naming confusion, a code point will be used to refer to the number that represents a particular Unicode character, including supplementary ones.
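
    If streams are not wanted, the same distinction can be handled with a plain index loop. The following is a sketch only, with a hypothetical class name (CodePointLoop) and helper (codePointsOf); it steps through the string by code point, advancing one or two char positions at a time.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointLoop {
    // Collects the code points of s using an index loop.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);    // combines a surrogate pair if one starts at i
            result.add(cp);
            i += Character.charCount(cp); // 1 for BMP characters, 2 for supplementary ones
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which is stored as two char values
        for (int cp : codePointsOf("a\uD83D\uDE00")) {
            System.out.println(Integer.toHexString(cp)); // 61, then 1f600
        }
    }
}
```

    A loop over charAt(i) or toCharArray() would instead visit three char values for this string, the last two being the halves of the surrogate pair.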

    Finally, why forEachOrdered and not forEach?

    The behaviour of forEach is explicitly nondeterministic, whereas forEachOrdered performs an action for each element of this stream in the encounter order of the stream, if the stream has a defined encounter order. So forEach does not guarantee that the order is kept; in practice the difference shows up with parallel streams. Also check this question for more.
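
    The ordering guarantee can be illustrated with a parallel stream. This is a sketch with hypothetical names (OrderedDemo, joinOrdered, joinUnordered); a StringBuffer is used because it is safe to append to from several threads.

```java
class OrderedDemo {
    // forEachOrdered keeps encounter order even when the stream is parallel.
    static String joinOrdered(String s) {
        StringBuffer sb = new StringBuffer();
        s.chars().parallel().forEachOrdered(c -> sb.append((char) c));
        return sb.toString();
    }

    // forEach on a parallel stream appends every char, but in no guaranteed order.
    static String joinUnordered(String s) {
        StringBuffer sb = new StringBuffer();
        s.chars().parallel().forEach(c -> sb.append((char) c));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinOrdered("abcdefghij"));   // abcdefghij
        System.out.println(joinUnordered("abcdefghij")); // same letters, order may vary
    }
}
```

    On a sequential stream, as in the answer's snippet, both methods usually print the characters in order, but only forEachOrdered promises it.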

    For the difference between a character, a code point, a glyph, and a grapheme, check this question.
