Why is the dot product in dask slower than in numpy?

后端 (backend) · 2 answers · 1501 views
萌比男神i 2021-01-13 21:03
A dot product in dask seems to run much slower than in numpy:

import numpy as np
x_np = np.random.normal(10, 0.1, size=(1000, 100))
y_np = x_np.transpose()
%timeit x_np.dot(y_np)
2 Answers
  • 2021-01-13 21:41

    Adjust chunk sizes

    The answer by @isternberg is correct that you should adjust chunk sizes. A good choice of chunk size follows the following rules

    1. A chunk should be small enough to fit comfortably in memory.
    2. A chunk must be large enough so that computations on that chunk take significantly more than the 1ms overhead per task that dask incurs (so 100ms-1s is a good number to shoot for).
    3. Chunks should align with the computation that you want to do. For example if you plan to frequently slice along a particular dimension then it's more efficient if your chunks are aligned so that you have to touch fewer chunks.

    I generally shoot for chunks that are 1-100 megabytes large. Anything smaller than that isn't helpful and usually creates enough tasks that scheduling overhead becomes our largest bottleneck.
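    A chunk's memory footprint is just its element count times the element size, so the 1-100 MB target is easy to check with back-of-the-envelope arithmetic (a small sketch, assuming float64 elements):

```python
# Rough chunk-size arithmetic for a (1000, 1000) chunk of float64 values
# (8 bytes per element) -- comfortably within the 1-100 MB target above.
rows, cols = 1000, 1000
bytes_per_element = 8  # float64
chunk_bytes = rows * cols * bytes_per_element
chunk_mb = chunk_bytes / 1e6
print(chunk_mb)  # 8.0
```

    By the same arithmetic, the tiny default chunks in the original question would each take well under a millisecond to process, so per-task overhead dominates.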

    Comments about the original question

    If your array is only of size (1000, 100) then there is no reason to use dask.array. Instead, use numpy and, if you really care about using multiple cores, make sure that your numpy library is linked against an efficient BLAS implementation like MKL or OpenBLAS.
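    One quick way to see which BLAS backend your numpy build is linked against (a diagnostic sketch, not part of the original answer):

```python
import numpy as np

# Print numpy's build configuration; the BLAS/LAPACK section will mention
# "mkl", "openblas", or a reference implementation.
np.show_config()
```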

    If you use a multi-threaded BLAS implementation you might actually want to turn dask threading off. The two systems will clobber each other and reduce performance. If this is the case then you can turn off dask threading with the following command.

    dask.set_options(get=dask.async.get_sync)
    

    To actually time the execution of a dask.array computation you'll have to add a .compute() call to the end of the computation, otherwise you're just timing how long it takes to create the task graph, not to execute it.
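    To illustrate the distinction, here is a small sketch (shapes chosen to run quickly; requires dask to be installed):

```python
import dask.array as da

# Building the expression is nearly instant: it only constructs a task graph.
x = da.random.normal(10, 0.1, size=(200, 50), chunks=(100, 50))
z = x.dot(x.T)       # lazy -- no computation happens here

# .compute() is what actually executes the graph and returns a numpy array.
result = z.compute()
print(result.shape)  # (200, 200)
```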

    Larger Example

    In [1]: import dask.array as da
    
    In [2]: x = da.random.normal(10, 0.1, size=(2000, 100000), chunks=(1000, 1000))  # larger example
    
    In [3]: %time z = x.dot(x.T)  # create task graph
    CPU times: user 12 ms, sys: 3.57 ms, total: 15.6 ms
    Wall time: 15.3 ms
    
    In [4]: %time _ = z.compute()  # actually do work
    CPU times: user 2min 41s, sys: 841 ms, total: 2min 42s
    Wall time: 21 s
    
  • 2021-01-13 22:03

    The calculation of the dot product in dask runs much faster when adjusting the chunks:

    import dask.array as da
    x_dask = da.random.normal(10, 0.1, size=(1000,100), chunks=1000)
    y_dask = x_dask.transpose()
    %timeit x_dask.dot(y_dask)
    # 1000 loops, best of 3: 330 µs per loop
    

    More about chunks in the dask docs.

    edit: As @MRocklin wrote, to really time the computation, one must call .compute() on the result.
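    Applying that correction to the snippet above gives a timing of the actual work rather than of graph construction (a sketch; the numbers will vary by machine):

```python
import dask.array as da

x_dask = da.random.normal(10, 0.1, size=(1000, 100), chunks=1000)
y_dask = x_dask.transpose()

# Without .compute() we only measure graph construction; with it, the work.
result = x_dask.dot(y_dask).compute()
print(result.shape)  # (1000, 1000)
```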
