How can I vectorize this Python count sort so it is absolutely as fast as it can be?


I am trying to write a count sort in Python to beat the built-in timsort in certain situations. Right now it beats the built-in sorted function, but only for very large arra

2 Answers

情话喂你

2021-01-03 09:58
Without thinking about your algorithm, this gets rid of most of your pure Python loops (which are quite slow) by turning them into comprehensions or generators (usually faster than regular for blocks). Also, if you have to make a list consisting of all the same element, the [x]*n syntax is probably the fastest way to go. The sum is used to flatten the list of lists.
```
from collections import defaultdict

def countsort(unsorted_list):
    # Range of values to scan; lmax is exclusive.
    lmin, lmax = min(unsorted_list), max(unsorted_list) + 1
    counts = defaultdict(int)
    for j in unsorted_list:
        counts[j] += 1
    # range replaces Python 2's xrange; the [] start value lets sum concatenate lists.
    return sum([[num]*counts[num] for num in range(lmin, lmax) if num in counts], [])
```
Note that this is not vectorized, nor does it use numpy.
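One caveat with flattening via sum: each addition copies the accumulated list, so the cost grows quadratically with the number of distinct values. As a minimal sketch of the same counting idea with a linear-time flatten, assuming integer input, the following uses `collections.Counter` and `itertools.chain`. The name `countsort_chain` is hypothetical and not part of the answer above.

```python
from collections import Counter
from itertools import chain, repeat

def countsort_chain(unsorted_list):
    """Counting sort for integers (illustrative sketch, hypothetical name)."""
    if not unsorted_list:
        return []
    # Counter tallies each value in a single pass.
    counts = Counter(unsorted_list)
    lmin, lmax = min(counts), max(counts)
    # repeat(num, k) yields num k times; chain joins the runs without
    # building intermediate lists. Missing values have a count of 0.
    runs = (repeat(num, counts[num]) for num in range(lmin, lmax + 1))
    return list(chain.from_iterable(runs))
```

Like the version above, this still scans every integer between the minimum and maximum, so it only pays off when the value range is small relative to the input size.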
我寻月下人不归

2021-01-03 10:00
With numpy, this function reduces to the following:
```
import numpy

def countsort(unsorted):
    # bincount requires non-negative integers.
    unsorted = numpy.asarray(unsorted)
    return numpy.repeat(numpy.arange(1+unsorted.max()), numpy.bincount(unsorted))
```
This ran about 40 times faster when I tried it on 100000 random ints from the interval [0, 10000). bincount does the counting, and repeat converts from counts to a sorted array.
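Because `numpy.bincount` only accepts non-negative integers, the function above fails on input containing negative values, and it raises on an empty array when calling `max`. A minimal sketch of one way to handle both cases, assuming integer input, is to shift the values by the minimum before counting. The name `countsort_offset` is hypothetical.

```python
import numpy as np

def countsort_offset(unsorted):
    """bincount/repeat counting sort that tolerates negative integers (sketch)."""
    arr = np.asarray(unsorted)
    if arr.size == 0:
        return arr.copy()
    lo = arr.min()
    # Shift so the smallest value maps to bin 0, then count occurrences.
    counts = np.bincount(arr - lo)
    # Rebuild the sorted array: each value lo + i appears counts[i] times.
    return np.repeat(np.arange(lo, lo + counts.size), counts)
```

As with the original, memory use scales with the spread between the smallest and largest value, so this suits dense integer ranges rather than sparse ones.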