Collect stream with grouping, counting and filtering operations

别说谁变了你拦得住时间么 提交于 2019-12-05 16:55:47

问题


I'm trying to collect stream throwing away rarely used items like in this example:

import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;
import org.junit.Test;

@Test
public void shouldFilterCommonlyUsedWords() {
    // given
    List<String> allWords = Arrays.asList(
       "call", "feel", "call", "very", "call", "very", "feel", "very", "any");

    // when
    Set<String> commonlyUsed = allWords.stream()
            .collect(groupingBy(Function.identity(), counting()))
            .entrySet().stream().filter(e -> e.getValue() > 2)
            .map(Map.Entry::getKey).collect(toSet());

    // then
    assertThat(commonlyUsed, containsInAnyOrder("call", "very"));
}

I have a feeling that it is possible to do it much simpler - am I right?


回答1:


A while ago I wrote an experimental distinct(atLeast) method for my library:

public StreamEx<T> distinct(long atLeast) {
    if (atLeast <= 1)
        return distinct();
    AtomicLong nullCount = new AtomicLong();
    ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
    return filter(t -> {
        if (t == null) {
            return nullCount.incrementAndGet() == atLeast;
        }
        return map.merge(t, 1L, (u, v) -> (u + v)) == atLeast;
    });
}

So the idea was to use it like this:

Set<String> commonlyUsed = StreamEx.of(allWords).distinct(3).toSet();

This performs a stateful filtration, which looks a little bit ugly. I doubted whether such feature is useful thus I did not merge it into the master branch. Nevertheless it does the job in single stream pass. Probably I should revive it. Meanwhile you can copy this code into the static method and use it like this:

Set<String> commonlyUsed = distinct(allWords.stream(), 3).collect(Collectors.toSet());

Update (2015/05/31): I added the distinct(atLeast) method to the StreamEx 0.3.1. It's implemented using custom spliterator. Benchmarks showed that this implementation is significantly faster for sequential streams than stateful filtering described above and in many cases it's also faster than other solutions proposed in this topic. Also it works nicely if null is encountered in the stream (the groupingBy collector doesn't support null as class, thus groupingBy-based solutions will fail if null is encountered).




回答2:


There is no way around creating a Map, unless you want accept a very high CPU complexity.

However, you can remove the second collect operation:

Map<String,Long> map = allWords.stream()
    .collect(groupingBy(Function.identity(), HashMap::new, counting()));
map.values().removeIf(l -> l<=2);
Set<String> commonlyUsed=map.keySet();

Note that in Java 8, HashSet still wraps a HashMap, so using the keySet() of a HashMap, when you want a Set in the first place, doesn’t waste space given the current implementation.

Of course, you can hide the post-processing in a Collector if that feels more “streamy”:

Set<String> commonlyUsed = allWords.stream()
    .collect(collectingAndThen(
        groupingBy(Function.identity(), HashMap::new, counting()),
        map-> { map.values().removeIf(l -> l<=2); return map.keySet(); }));



回答3:


I personally prefer Holger's solution (+1), but, instead of removing elements from the groupingBy map, I would filter its entrySet and map the result to a Set in the finalizer (it feels even more streamy to me)

    Set<String> commonlyUsed = allWords.stream().collect(
            collectingAndThen(
                groupingBy(identity(), counting()), 
                (map) -> map.entrySet().stream().
                            filter(e -> e.getValue() > 2).
                            map(e -> e.getKey()).
                            collect(Collectors.toSet())));


来源:https://stackoverflow.com/questions/30515792/collect-stream-with-grouping-counting-and-filtering-operations

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!