Producing histogram Map for IntStream raises compile-time-error

Submitted by 扶醉桌前 on 2019-12-06 10:24:41

Question


I'm interested in building a Huffman Coding prototype. To that end, I want to begin by producing a histogram of the characters that make up an input Java String. I've seen many solutions on SO and elsewhere (e.g., here) that depend on using the collect() methods for Streams, as well as static imports of Function.identity() and Collectors.counting(), in a very specific and intuitive way.

However, when using a piece of code eerily similar to the one I linked to above:

private List<HuffmanTrieNode> getCharsAndFreqs(String s){
        Map<Character, Long> freqs = s.chars().collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return null;
}

I receive a compile-time error from IntelliJ, which essentially tells me that no argument to collect conforms to a Supplier type, as required by its signature.

Unfortunately, I'm new to the Java 8 Stream hierarchy and I'm not entirely sure what the best course of action for me should be. In fact, going the Map way might be too much boilerplate for what I'm trying to do; please advise if so.


Answer 1:


The problem is that s.chars() returns an IntStream, a primitive specialization of Stream, and it does not have a collect that takes a single argument; its collect takes 3 arguments. You can call boxed() to transform that IntStream into a Stream<Integer>, which does accept a Collector:

Map<Integer, Long> map = yourString.codePoints()
          .boxed()
          .collect(Collectors.groupingBy(
                      Function.identity(), 
                      Collectors.counting()));

But now the problem is that you have counted code points and not chars. If you absolutely know that your String is made only of characters in the BMP, you can safely cast to char as shown in the other answer. If you are not sure, things get trickier.

In that case you need to treat each Unicode code point as a single character, but it might not fit into a Java char, which is 2 bytes wide; a code point outside the BMP takes two chars (4 bytes) in UTF-16.

In that case your map should be Map<String, Long> and not Map<Character, Long>.
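A minimal sketch of such a Map<String, Long>, assuming one entry per code point is enough (it does not merge combining marks into grapheme clusters). The class and method names are hypothetical; Character.toChars is used so the sketch also compiles on Java 8.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CodePointHistogram {
    // Hypothetical helper: counts each Unicode code point, keyed by its String form.
    static Map<String, Long> histogram(String s) {
        return s.codePoints()
                // A code point outside the BMP becomes a two-char String.
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String sample = "AA" + "\uD835\uDD0A" + "B";
        System.out.println(histogram(sample));
    }
}
```

Keying by String keeps a surrogate pair together as one entry instead of splitting it into two meaningless chars.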

In Java 9, which added regex support for \X (an extended grapheme cluster) and Scanner#findAll, this is fairly easy to do:

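    String sample = "A" + "\uD835\uDD0A" + "B" + "C";
    // The original snippet used an undeclared scan; a Scanner over the sample is assumed here.
    Scanner scan = new Scanner(sample);
    Map<String, Long> map = scan.findAll("\\X")
            .map(MatchResult::group)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

    System.out.println(map); // {A=1, B=1, C=1, 𝔊=1}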

In Java 8 this would be a bit more verbose:

    String sample = "AA" + "\uD835\uDD0A" + "B" + "C";
    Map<String, Long> map = new HashMap<>();

    Pattern p = Pattern.compile("\\P{M}\\p{M}*+");
    Matcher m = p.matcher(sample);

    while (m.find()) {
        map.merge(m.group(), 1L, Long::sum);
    }
    System.out.println(map); // {A=2, B=1, C=1, 𝔊=1}



Answer 2:


The String.chars() method returns an IntStream. You probably want to convert it to a Stream<Character> via:

s.chars().mapToObj(c -> (char)c)
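Put together with the collector from the question, that conversion might look like the sketch below. The class and method names are hypothetical, and it assumes the input contains only BMP characters, since the cast to char would split a surrogate pair into two separate keys.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CharHistogram {
    // Hypothetical helper: counts UTF-16 chars; assumes BMP-only input.
    static Map<Character, Long> histogram(String s) {
        return s.chars()
                // Each int is cast to char and autoboxed, giving a Stream<Character>.
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(histogram("hello"));
    }
}
```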



Answer 3:


As already pointed out, you could box the stream, converting its primitive values to object types.

s.chars().boxed()
 .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
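For comparison, boxing the stream is not strictly required: the three-argument collect that IntStream does provide (supplier, accumulator, combiner) can build the same histogram. The following is only a sketch with hypothetical class and method names; the keys are the int values of the UTF-16 chars.

```java
import java.util.HashMap;
import java.util.Map;

class PrimitiveCollectHistogram {
    // Hypothetical helper: uses IntStream.collect(supplier, accumulator, combiner).
    static Map<Integer, Long> histogram(String s) {
        return s.chars().collect(
                HashMap::new,                                // supplier: a fresh result map
                (map, c) -> map.merge(c, 1L, Long::sum),     // accumulator: count one char
                (left, right) -> right.forEach(              // combiner: merge partial maps
                        (k, v) -> left.merge(k, v, Long::sum)));
    }

    public static void main(String[] args) {
        System.out.println(histogram("hello"));
    }
}
```

The combiner only runs for parallel streams, but the signature requires it either way.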


Source: https://stackoverflow.com/questions/44838954/producing-histogram-map-for-intstream-raises-compile-time-error
