Parallel in-place sort for numpy arrays

混江龙づ霸主 提交于 2019-12-04 18:59:01

问题


I often need to sort large numpy arrays (few billion elements), which became a bottleneck of my code. I am looking for a way to parallelize it.

Are there any parallel implementations for the ndarray.sort() function? Numexpr module provides parallel implementation for most math operations on numpy arrays, but lacks sorting capabilities.

Maybe, it is possible to make a simple wrapper around a C++ implementation of parallel sorting, and use it through Cython?


回答1:


I ended up wrapping GCC parallel sort. Here is the code:

parallelSort.pyx

# cython: wraparound = False
# cython: boundscheck = False
import numpy as np
cimport numpy as np
import cython
cimport cython 

ctypedef fused real:
    cython.char
    cython.uchar
    cython.short
    cython.ushort
    cython.int
    cython.uint
    cython.long
    cython.ulong
    cython.longlong
    cython.ulonglong
    cython.float
    cython.double

cdef extern from "<parallel/algorithm>" namespace "__gnu_parallel":
    cdef void sort[T](T first, T last) nogil 

def numpyParallelSort(real[:] a):
    "In-place parallel sort for numpy types"
    sort(&a[0], &a[a.shape[0]])

Extra compiler args: -fopenmp (compile) and -lgomp (linking)

This makefile will do it:

all:
    cython --cplus parallelSort.pyx  
    g++  -g -march=native -Ofast -fpic -c    parallelSort.cpp -o parallelSort.o -fopenmp `python-config --includes`
    g++  -g -march=native -Ofast -shared  -o parallelSort.so parallelSort.o `python-config --libs` -lgomp 

clean:
    rm -f parallelSort.cpp *.o *.so

And this shows that it works:

from parallelSort import numpyParallelSort
import numpy as np 
a = np.random.random(100000000)

numpyParallelSort(a) 
print a[:10]

edit: fixed bug noticed in the comment below




回答2:


Mergesort parallelizes quite naturally. Just have each worker pre-sort an arbitrary chunk, and then run a single merge pass on it. The final merging should require only O(N) operations, and its trivial to write a function for doing so in numba or somesuch.

Wikipedia agrees



来源:https://stackoverflow.com/questions/27551363/parallel-in-place-sort-for-numpy-arrays

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!