dask-distributed

Best practices in setting number of dask workers

半城伤御伤魂 submitted on 2019-12-04 00:09:41
I am a bit confused by the different terms used in dask and dask.distributed when setting up workers on a cluster. The terms I came across are: thread, process, processor, node, worker, scheduler. My question is how to set the number of each, and whether there is a strict or recommended relationship between any of them. For example:

- 1 worker per node, with n processes for the n cores on the node?
- Are threads and processes the same concept?
- In dask-mpi I have to set nthreads, but they show up as processes in the client.

Any other suggestions?

Answer: By "node" people typically mean a physical or virtual machine.
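
If it helps to see these terms in code, here is a minimal sketch using dask.distributed's LocalCluster on a single machine; the 4 x 2 layout below is only illustrative, not a recommendation:

    from dask.distributed import Client, LocalCluster

    # One machine ("node") hosting several worker processes; each worker
    # process runs a pool of threads. Here: 4 processes x 2 threads = 8 cores.
    cluster = LocalCluster(n_workers=4, threads_per_worker=2)
    client = Client(cluster)

    print(client)   # reports the number of workers, threads, and memory

    client.close()
    cluster.close()

The usual trade-off is that threads within one worker process share memory and suit numeric code that releases the GIL (numpy, pandas), while separate worker processes suit pure-Python workloads.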

Workaround for Item assignment not supported in dask

谁都会走 submitted on 2019-12-02 10:35:14
I am trying to convert my code base from numpy arrays to dask arrays because my numpy arrays no longer fit in memory (MemoryError). However, I have learned that mutable (in-place) assignment is not yet implemented for dask arrays, so I am getting:

    NotImplementedError: Item assignment with <class 'tuple'> not supported

Is there any workaround for my code below?

    for i, mask in enumerate(masks):
        bounds = find_boundaries(mask, mode='inner')
        X2, Y2 = np.nonzero(bounds)
        X2 = da.from_array(X2, 'auto')
        Y2 = da.from_array(Y2, 'auto')
        xSum = (X2.reshape(-1, 1) - X1.reshape(1, -1)) ** 2
        ySum = (Y2.reshape(-1, 1) - Y1.reshape(1, -1)) ** 2
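
A common workaround, shown here as a minimal sketch unrelated to the asker's full code, is to avoid in-place assignment altogether and build a new array instead, for example with da.where:

    import numpy as np
    import dask.array as da

    x = da.from_array(np.arange(10), chunks=5)

    # Item assignment such as  x[x > 5] = 0  raises NotImplementedError on
    # dask arrays, so construct a new array rather than mutating the old one:
    y = da.where(x > 5, 0, x)

    print(y.compute())   # [0 1 2 3 4 5 0 0 0 0]

Note that the broadcasting arithmetic shown in the question (the xSum and ySum lines) works fine on dask arrays as-is; the error is presumably raised wherever a result is later assigned into an existing array, and it is that step which needs to be rewritten in this style.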