Multiplication of large arrays in Python

Submitted by 血红的双手 on 2019-12-08 15:54:30

Performance for .dot strongly depends on the BLAS library to which your NumPy implementation is linked.
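A quick way to check which BLAS your build is linked against is NumPy's own configuration report:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build was compiled against;
# look for "openblas" or "mkl" in the output.
np.show_config()
```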

If you have a modern implementation like OpenBLAS or MKL then NumPy is already running at full speed using all of your cores. In this case dask.array will likely only get in the way, trying to add further parallelism when none is warranted, causing thread contention.
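For reference, a minimal timing sketch of a plain NumPy matrix product (the array sizes here are arbitrary, not taken from the question); with OpenBLAS or MKL this single call already spreads across all of your cores, with no dask involved:

```python
import time
import numpy as np

# A single large matrix product; a multithreaded BLAS (OpenBLAS, MKL)
# parallelizes this one call across all available cores on its own.
a = np.random.random((4000, 4000))
b = np.random.random((4000, 4000))

t0 = time.perf_counter()
c = a.dot(b)
print(f"a.dot(b) took {time.perf_counter() - t0:.2f} s")
```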

If you have installed NumPy through Anaconda then you likely already have OpenBLAS or MKL, so I would just be happy with the performance that you have and call it a day.

However, in your actual example you're using chunks that are far too small (chunks=(100,)). The dask task scheduler incurs about a millisecond of overhead per task, so you should choose a chunk size such that each task takes on the order of hundreds of milliseconds, which hides that overhead. A good rule of thumb is to aim for chunks that are above a megabyte in size. That per-task overhead, multiplied over many thousands of tiny chunks, is what is causing the large difference in performance that you're seeing.
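A minimal sketch of the chunk-size effect, assuming a 1-D array like the one in the question (the array length and the larger chunk size are illustrative choices, not values from the question):

```python
import numpy as np
import dask.array as da

x = np.random.random(1_000_000)

# chunks=100 turns this into ~10,000 tiny tasks; at roughly a
# millisecond of scheduler overhead per task, overhead dominates.
too_small = da.from_array(x, chunks=100)

# chunks=500_000 gives ~4 MB chunks (float64), comfortably above the
# megabyte rule of thumb, so each task does a meaningful amount of work.
reasonable = da.from_array(x, chunks=500_000)

too_small.dot(too_small).compute()    # slow: dominated by per-task overhead
reasonable.dot(reasonable).compute()  # close to plain NumPy speed
```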
