How can I find the largest M numbers from N numbers in Java 8?

℡╲_俬逩灬. submitted on 2019-12-07 18:12:07

Question


IntStream may be the easiest way, but I can only pick up the smallest M numbers, as below:

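import java.util.Arrays;
import java.util.stream.IntStream;

public class Test {
    private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};

    public static void main(String[] args) throws Exception {
        // sorted() is ascending, so limit(5) keeps the five smallest values
        System.out.println(Arrays.asList(IntStream.of(arr).sorted().limit(5).boxed().toArray()));
    }
}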

By the way, considering algorithmic complexity and assuming N >> M, the "sorted + limit" approach has a complexity of O(N log(N)).

I think the best achievable complexity is O(N log(M)), but I do not know whether Java 8 has stream methods or collectors of this kind.


Answer 1:


If you must use Streams:

IntStream.of(arr).sorted().skip(N-M)

Otherwise use a PriorityQueue and write yourself an inverting Comparator. Insertion will be O(N log(N)) and removal of M elements will be O(M log(N)). Not what you asked for, but maybe close enough.
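A minimal sketch of that PriorityQueue approach, assuming Collections.reverseOrder() serves as the inverting Comparator. The class name LargestViaMaxHeap and the method largest are made up for illustration. Because every element goes into the heap, this stays at O(N log(N)) and does not reach the O(N log(M)) goal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

final class LargestViaMaxHeap {
    // Returns the m largest values of arr, largest first.
    static int[] largest(int[] arr, int m) {
        // The reversed comparator turns the JDK min-heap into a max-heap.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int value : arr) {
            heap.add(value);              // N insertions, O(log N) each
        }
        int count = Math.max(0, Math.min(m, heap.size()));
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = heap.poll();      // M removals, O(log N) each
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
        System.out.println(Arrays.toString(largest(arr, 3))); // [9, 8, 7]
    }
}
```

The heap holds all N elements here; bounding its size to M is what brings the cost per element down to O(log(M)).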




Answer 2:


EJP has it right. I tested it: it yields 8 and 9 when given an input of 2.

import java.util.stream.IntStream;
public class Test {
    private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};

    public static void main(String[] args) throws Exception { 
        int n = Integer.parseInt(args[0]);
        System.out.println("Finding "+n+" largest numbers in arr");
        IntStream.of(arr).sorted().skip(arr.length-n).boxed().forEach(big -> System.out.println(big));
    }
}



Answer 3:


If you are already using Google Guava in your project, you can take advantage of MinMaxPriorityQueue:

Collection<..> min5 = stream.collect(
    toCollection(MinMaxPriorityQueue.maximumSize(5)::create)
);



Answer 4:


It's possible to create a custom collector using the JDK PriorityQueue to solve your task:

public static <T> Collector<T, ?, List<T>> maxN(Comparator<? super T> comparator, 
                                                int limit) {
    BiConsumer<PriorityQueue<T>, T> accumulator = (queue, t) -> {
        queue.add(t);
        if (queue.size() > limit)
            queue.poll();
    };
    return Collector.of(() -> new PriorityQueue<>(limit + 1, comparator),
            accumulator, (q1, q2) -> {
                for (T t : q2) {
                    accumulator.accept(q1, t);
                }
                return q1;
            }, queue -> new ArrayList<>(queue));
}

Usage:

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.naturalOrder(), 2)));
// [8, 9]
System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.reverseOrder(), 3)));
// [3, 1, 2]

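It might be faster for big data sets and small limits because it does not sort the whole input; each element costs at most one insertion into, and one removal from, a queue of at most limit + 1 elements. If you want a sorted result, you can add the sorting step to the finisher.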
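One way the sorting finisher could look, as a sketch rather than the only option. The names SortedMaxN and sortedMaxN are hypothetical, and sorting largest-first is an assumption about the desired order. The accumulation is the same as in maxN; only the finisher differs, adding O(M log(M)) work at the end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collector;
import java.util.stream.IntStream;

final class SortedMaxN {
    // Like maxN, but the finisher sorts the retained elements, largest first.
    static <T> Collector<T, ?, List<T>> sortedMaxN(Comparator<? super T> comparator, int limit) {
        return Collector.<T, PriorityQueue<T>, List<T>>of(
                () -> new PriorityQueue<>(limit + 1, comparator),
                (queue, t) -> {
                    queue.add(t);
                    if (queue.size() > limit) queue.poll(); // evict the smallest retained element
                },
                (q1, q2) -> {
                    for (T t : q2) {
                        q1.add(t);
                        if (q1.size() > limit) q1.poll();
                    }
                    return q1;
                },
                queue -> {
                    List<T> result = new ArrayList<>(queue);
                    result.sort(comparator.reversed());    // at most `limit` elements to sort
                    return result;
                });
    }

    public static void main(String[] args) {
        int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
        Comparator<Integer> natural = Comparator.naturalOrder();
        System.out.println(IntStream.of(arr).boxed().collect(sortedMaxN(natural, 3))); // [9, 8, 7]
    }
}
```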




Answer 5:


You can achieve your complexity goal by creating a histogram of the values:

public static IntStream maxValues(IntStream source, int limit) {
    TreeMap<Integer,Integer> m=new TreeMap<>();
    source.forEachOrdered(new IntConsumer() {
        int size, min=Integer.MIN_VALUE;
        public void accept(int value) {
            if(value<min) return;
            m.merge(value, 1, Integer::sum);
            if(size<limit) size++;
            else m.compute(min=m.firstKey(), (k,count)->count==1? null: count-1);
        }
    });
    if(m.size()==limit)// no duplicates
        return m.keySet().stream().mapToInt(Integer::valueOf);
    return m.entrySet().stream().flatMapToInt(e->{
        int value = e.getKey(), count = e.getValue();
        return count==1? IntStream.of(value): IntStream.range(0, count).map(i->value);
    });
}

It creates a map from int values to their number of occurrences but limits its contents to the desired number of values. Hence, each update has O(log(M)) complexity (worst case, if there are no duplicates) and, since the update is performed once for each value, the overall complexity is O(N×log(M)), as you wished.

You may test it with your original array as

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
maxValues(Arrays.stream(arr), 3).forEach(System.out::println);

but to test some corner cases, you may use an array containing duplicates like

int[] arr = {8, 5, 3, 4, 2, 2, 9, 1, 7, 9, 8, 6};
// note that the stream of three max elements contains one of the two eights

If you strive for maximum performance, replacing the boxing TreeMap with an adequate data structure using primitive data types may be feasible, but that would be a minor performance optimization, as this solution already solves the complexity problem.
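One candidate for such a primitive data structure is a fixed-size binary min-heap stored in an int[]. The sketch below is an assumption about what "adequate" could mean, not part of the solution above; the class name IntTopM and its methods are hypothetical. It allocates limit ints up front, so it presumes M is small enough for that, and it returns an array instead of an IntStream.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class IntTopM {
    private final int[] heap; // min-heap: heap[0] is the smallest retained value
    private int size;

    IntTopM(int limit) {
        heap = new int[Math.max(0, limit)];
    }

    void add(int value) {
        if (heap.length == 0) return;
        if (size < heap.length) {
            // Heap not full yet: append and sift up.
            int i = size++;
            while (i > 0) {
                int parent = (i - 1) >>> 1;
                if (heap[parent] <= value) break;
                heap[i] = heap[parent];
                i = parent;
            }
            heap[i] = value;
        } else if (value > heap[0]) {
            // Heap full: replace the smallest retained value and sift down.
            int i = 0;
            int half = size >>> 1;
            while (i < half) {
                int child = 2 * i + 1;
                int right = child + 1;
                if (right < size && heap[right] < heap[child]) child = right;
                if (value <= heap[child]) break;
                heap[i] = heap[child];
                i = child;
            }
            heap[i] = value;
        }
    }

    // Retained values in ascending order.
    int[] toSortedArray() {
        int[] result = Arrays.copyOf(heap, size);
        Arrays.sort(result);
        return result;
    }

    static int[] maxValues(IntStream source, int limit) {
        IntTopM top = new IntTopM(limit);
        source.forEachOrdered(top::add); // O(log M) per element, no boxing
        return top.toSortedArray();
    }

    public static void main(String[] args) {
        int[] arr = {8, 5, 3, 4, 2, 2, 9, 1, 7, 9, 8, 6};
        System.out.println(Arrays.toString(maxValues(Arrays.stream(arr), 3))); // [8, 9, 9]
    }
}
```

Like the histogram, this handles duplicates and does not need to know N in advance.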

By the way, this solution works for arbitrary streams, i.e. it doesn’t need to know the value of N.



Source: https://stackoverflow.com/questions/30771314/how-can-i-find-the-largest-m-numbers-from-n-numbers-in-java-8
