Hey. I have a very large array and I want to find the Nth largest value. Trivially I can sort the array and then take the Nth element, but I'm only interested in one element.
You could try the Median of Medians method: it runs in O(N) time even in the worst case.
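For reference, here is a minimal sketch of that selection algorithm; the function names and the 1-based "k-th largest" convention are my own, not part of any standard library:

def kth_largest(arr, k):
    # The k-th largest (1-based) is the element at index len(arr) - k
    # of the ascending sort order.
    return _select(list(arr), len(arr) - k)

def _select(a, i):
    # Return the element that would sit at index i if a were sorted.
    if len(a) <= 5:
        return sorted(a)[i]
    # Median of each group of 5, then (recursively) the median of those
    # medians, gives a pivot guaranteed to split the data well.
    medians = [sorted(a[j:j + 5])[len(a[j:j + 5]) // 2]
               for j in range(0, len(a), 5)]
    pivot = _select(medians, len(medians) // 2)
    lows = [x for x in a if x < pivot]
    pivots = [x for x in a if x == pivot]
    highs = [x for x in a if x > pivot]
    if i < len(lows):
        return _select(lows, i)
    elif i < len(lows) + len(pivots):
        return pivot
    else:
        return _select(highs, i - len(lows) - len(pivots))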
A simple modified quicksort (often called quickselect) works very well in practice. Its average running time is proportional to N, though an unlucky worst case is O(N^2).
Proceed like a quicksort. Pick a pivot value at random, stream through your values, and put each into one of two bins depending on whether it is above or below the pivot. In quicksort you would then recursively sort each of those two bins, but to find the N-th highest value you only need to recurse into ONE of them: the population of each bin tells you which one holds your N-th highest value. For example, if you want the 125th highest value and your partition leaves 75 values in the "high" bin and 150 in the "low" bin, you can ignore the high bin and just look for the 125 - 75 = 50th highest value in the low bin alone.
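A minimal recursive sketch of that idea (the function name is my own, and it also handles values equal to the pivot, which the description above glosses over):

import random

def quickselect_nth_largest(values, n):
    # Return the n-th largest value (1-based). Average-case O(len(values)).
    pivot = random.choice(values)
    high = [x for x in values if x > pivot]     # strictly above the pivot
    low = [x for x in values if x < pivot]      # strictly below the pivot
    equal = len(values) - len(high) - len(low)  # copies of the pivot itself
    if n <= len(high):
        # The answer lies somewhere in the high bin.
        return quickselect_nth_largest(high, n)
    elif n <= len(high) + equal:
        # The answer is the pivot value itself.
        return pivot
    else:
        # Skip the high bin and the pivot copies, recurse into the low bin.
        return quickselect_nth_largest(low, n - len(high) - equal)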
A heap is the best data structure for this operation, and Python has an excellent built-in module to do just this, called heapq.
import heapq

def nth_largest(n, iterable):
    # nlargest returns the n largest values in descending order,
    # so the last element is the n-th largest.
    return heapq.nlargest(n, iterable)[-1]
Example Usage:
>>> import random
>>> data = [random.randint(0, 1000) for i in range(100)]
>>> n = 10
>>> nth_largest(n, data)
920
Confirm result by sorting:
>>> sorted(data)[-n]
920
One thing you should do if this is production code is test with samples of your own data. For example, you might consider arrays of 1,000 or 10,000 elements 'large', and code up a quickselect method from a recipe to compare against.
The compiled nature of sorted, and its somewhat hidden and constantly evolving optimizations, make it faster than a quickselect written in pure Python on small to medium-sized datasets (under 1,000,000 elements). You may also find that as the array grows beyond that, memory is handled more efficiently in native code and the advantage persists.
So even though quickselect is O(n) versus sorted's O(n log n), big-O doesn't account for how many machine instructions processing each element actually takes, the effects on pipelining and processor caches, or the other optimizations the creators and maintainers of sorted bake into its implementation.
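One quick way to run that test yourself, assuming the heapq-based nth_largest and the quickselect sketch from earlier answers are defined in the same session (the sizes and repeat counts here are arbitrary):

import random
import timeit

data = [random.random() for _ in range(1_000_000)]
n = 125

print("sorted:     ", timeit.timeit(lambda: sorted(data)[-n], number=5))
print("heapq:      ", timeit.timeit(lambda: nth_largest(n, data), number=5))
print("quickselect:", timeit.timeit(lambda: quickselect_nth_largest(data, n), number=5))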
You can iterate the entire sequence once while maintaining the N largest values seen so far (this is O(n)); see the sketch below. That being said, I think it would just be simpler to sort the list.
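A sketch of that single-pass approach, using a bounded min-heap rather than a plain list so that "replace the smallest of the current top N" is O(log N) (the function name is my own):

import heapq
from itertools import islice

def nth_largest_streaming(n, iterable):
    it = iter(iterable)
    # Seed a min-heap with the first n values; its root is then the
    # n-th largest seen so far.
    heap = list(islice(it, n))
    heapq.heapify(heap)
    for value in it:
        if value > heap[0]:
            # Evict the smallest of the current top n in favor of value.
            heapq.heappushpop(heap, value)
    return heap[0]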
Use heapsort. Building the heap only partially orders the list; elements get fully ordered only as you draw them out.
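A sketch of that with heapq (heapq builds min-heaps, so negate the values to simulate a max-heap; the function name is my own):

import heapq

def nth_largest_heap(n, values):
    # Building the heap is O(len(values)); popping n - 1 elements is
    # O(n log len(values)). The rest of the list never gets sorted.
    heap = [-v for v in values]
    heapq.heapify(heap)
    for _ in range(n - 1):
        heapq.heappop(heap)
    return -heap[0]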