dask-delayed

Dask scheduler empty / graph not showing

Submitted by £可爱£侵袭症+ on 2020-12-15 06:40:00
Question: I have a setup as follows:

    # etl.py
    from dask.distributed import Client
    import dask
    from tasks import task1, task2, task3

    def runall(**kwargs):
        print("done")

    def etl():
        client = Client()
        tasks = {}
        tasks['task1'] = dask.delayed(task1)(*args)
        tasks['task2'] = dask.delayed(task2)(*args)
        tasks['task3'] = dask.delayed(task3)(*args)
        out = dask.delayed(runall)(**tasks)
        out.compute()

This logic was borrowed from luigi and works nicely with if statements to control what tasks to run. However, some of …
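A runnable variant of that pattern, for reference (a sketch only: task1, task2, task3 and their arguments are stand-ins, since the real tasks.py is not shown):

    import dask
    from dask.distributed import Client

    # stand-ins for the imports from tasks.py
    def task1():
        return "t1"

    def task2():
        return "t2"

    def task3():
        return "t3"

    def runall(**kwargs):
        print("done")

    def etl():
        client = Client()
        tasks = {}
        tasks['task1'] = dask.delayed(task1)()
        tasks['task2'] = dask.delayed(task2)()
        tasks['task3'] = dask.delayed(task3)()
        # runall depends on every entry in tasks, so computing it
        # forces the whole graph to execute
        out = dask.delayed(runall)(**tasks)
        out.compute()

    if __name__ == "__main__":
        etl()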

How do the batching instructions of Dask delayed best practices work?

Submitted by 白昼怎懂夜的黑 on 2020-12-15 06:16:59
Question: I guess I'm missing something (still a Dask noob), but I'm trying the batching suggestion to avoid creating too many Dask tasks, from here: https://docs.dask.org/en/latest/delayed-best-practices.html and can't make it work. This is what I tried:

    import dask

    def f(x):
        return x * x

    def batch(seq):
        sub_results = []
        for x in seq:
            sub_results.append(f(x))
        return sub_results

    batches = []
    for i in range(0, 1000000000, 1000000):
        result_batch = dask.delayed(batch, range(i, i + 1000000))
        batches.append(result_batch)
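As written, the range ends up as the second positional argument of dask.delayed (its name parameter) and never reaches batch. The best-practices pattern wraps the function first and then calls it like a normal function; a sketch with a smaller range so it finishes quickly (the sizes here are illustrative):

    import dask

    def f(x):
        return x * x

    def batch(seq):
        # one task processes a whole batch, keeping the task count low
        return [f(x) for x in seq]

    batches = []
    for i in range(0, 10000000, 1000000):
        result_batch = dask.delayed(batch)(range(i, i + 1000000))
        batches.append(result_batch)

    # ten tasks instead of ten million; results come back as lists per batch
    results = dask.compute(*batches)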

Can I use dask.delayed on a function wrapped with ctypes?

Submitted by 最后都变了- on 2020-07-07 11:45:45
Question: The goal is to use dask.delayed to parallelize some 'embarrassingly parallel' sections of my code. The code involves calling a Python function which wraps a C function using ctypes. To understand the errors I was getting, I wrote a very basic example.

The C function:

    double zippy_sum(double x, double y) { return x + y; }

The Python:

    from dask.distributed import Client
    client = Client(n_workers=4)
    client

    import os
    import dask
    import ctypes

    current_dir = os.getcwd()  # os.path.abspath(os.path …
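One pattern that sidesteps pickling problems with ctypes is to load the library inside the task, so no unpicklable handle is captured in the graph. A sketch, assuming the C function above has been compiled into zippy.so (the filename and path are assumptions):

    import ctypes
    import dask
    from dask.distributed import Client

    def zippy_sum_py(x, y):
        # load the shared library inside the function: ctypes handles
        # cannot be pickled, so they must not ride along in the graph
        lib = ctypes.CDLL("./zippy.so")
        lib.zippy_sum.argtypes = [ctypes.c_double, ctypes.c_double]
        lib.zippy_sum.restype = ctypes.c_double
        return lib.zippy_sum(x, y)

    if __name__ == "__main__":
        client = Client(n_workers=4)
        out = dask.delayed(zippy_sum_py)(1.0, 2.0)
        print(out.compute())  # 3.0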

How can I get result of Dask compute on a different machine than the one that submitted it?

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-15 03:22:05
Question: I am using Dask behind a Django server, and the basic setup I have is summarised here: https://github.com/MoonVision/django-dask-demo/ where the Dask client can be found here: https://github.com/MoonVision/django-dask-demo/blob/master/demo/daskmanager/daskmanager.py I want to be able to separate the saving of a task from the server that submitted it, for robustness and scalability. I would also like more detailed information on the processing status of the task; right now the future status …
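One way to decouple submission from retrieval is to keep results alive on the cluster as named datasets, which any client attached to the same scheduler can fetch. A minimal sketch, assuming a shared scheduler address:

    from dask.distributed import Client

    def work(x):
        return x + 1

    # machine A: submit, publish, then go away
    client_a = Client("scheduler-address:8786")   # assumed address
    future = client_a.submit(work, 41)
    client_a.publish_dataset(my_result=future)    # scheduler keeps it alive

    # machine B: attach to the same scheduler and collect
    client_b = Client("scheduler-address:8786")
    future_b = client_b.get_dataset("my_result")
    print(future_b.result())  # 42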

Dask For Loop In Parallel

Submitted by 主宰稳场 on 2020-01-11 08:47:07
Question: I am trying to find the correct syntax for using a for loop with dask delayed. I have found several tutorials and other questions, but none fit my condition, which is extremely basic. First, is this the correct way to run a for loop in parallel?

    %%time
    from dask import delayed

    list_names = ['a', 'b', 'c', 'd']
    keep_return = []

    @delayed
    def loop_dummy(target):
        for i in range(1000000000):
            pass
        print('passed value is:' + target)
        return 1

    for i in list_names:
        c = loop_dummy(i)
        keep_return.append(c)

    total = delayed(sum)(keep_return)
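The shape of the loop matches the documented pattern; the key point is that nothing executes until compute() is called on the final node. A short sketch of the full round trip (the billion-iteration busy loop is dropped so it runs instantly):

    from dask import delayed

    @delayed
    def loop_dummy(target):
        return 1  # stand-in for the real per-item work

    keep_return = [loop_dummy(name) for name in ['a', 'b', 'c', 'd']]
    total = delayed(sum)(keep_return)
    print(total.compute())  # 4: the four calls run in parallel, then sum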

How does dask.delayed handle mutable inputs?

Submitted by 怎甘沉沦 on 2020-01-05 04:24:09
Question: If I have a mutable object, let's say for example a dict, how does Dask handle passing it as an input to delayed functions? Specifically, what if I make updates to the dict between delayed calls? I tried the following example, which seems to suggest that some copying is going on, but can you elaborate on what exactly Dask is doing?

    In [3]: from dask import delayed
    In [4]: x = {}
    In [5]: foo = delayed(print)
    In [6]: foo(x)
    Out[6]: Delayed('print-73930550-94a6-43f9-80ab-072bc88c2b88')
    In [7]: foo(x)
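One small experiment (a sketch, not an authoritative statement of Dask internals): with the default local schedulers the graph holds a reference to the dict rather than a copy, so a mutation made before compute() is visible when the task runs; copying enters the picture with a distributed Client, which serializes inputs when shipping tasks to workers.

    from dask import delayed

    def snapshot(d):
        return dict(d)  # copy inside the task so the returned value is stable

    x = {}
    task = delayed(snapshot)(x)   # the graph references x, it does not copy it
    x['key'] = 'added after the delayed call'

    # with the local threaded scheduler the mutation is visible at run time
    print(task.compute())  # {'key': 'added after the delayed call'}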

Using Dask compute causes execution to hang

Submitted by 大兔子大兔子 on 2019-12-12 04:57:15
Question: This is a follow-up question to a potential answer to one of my previous questions on using Dask compute to access one element in a large array. Why does using Dask compute cause the execution to hang below? Here's the working code snippet:

    # Suppose you created a scheduler at the IP address 111.111.11.11:8786
    from dask.distributed import Client
    import dask.array as da

    # client1
    client1 = Client("111.111.11.11:8786")
    x = da.ones(10000000, chunks=(100000,))  # 1e7 size array cut into 1e5 …
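For reference, a sketch of the two-client pattern this question builds on, assuming both clients can reach the scheduler address from the snippet: persist and publish on one client, then fetch from another.

    from dask.distributed import Client
    import dask.array as da

    # client1: build, persist and publish so the chunks live on the workers
    client1 = Client("111.111.11.11:8786")
    x = da.ones(10000000, chunks=(100000,))
    x = client1.persist(x)
    client1.publish_dataset(array1=x)

    # client2: attach to the same scheduler and pull a single element
    client2 = Client("111.111.11.11:8786")
    y = client2.get_dataset("array1")
    print(y[0].compute())  # 1.0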

Can we create a Dask cluster with both multiple CPU machines and multiple GPU machines?

Submitted by 不打扰是莪最后的温柔 on 2019-12-11 08:04:02
Question: Can we create a Dask cluster with some CPU and some GPU machines together? If yes, how can I control that a certain task must run only on a CPU machine, and some other type of task only on a GPU machine, while a task with no constraint picks whichever machine is free? Does Dask support this type of cluster, and what is the command that pins a task to a specific CPU/GPU machine?

Answer 1: You can specify that a Dask worker has certain abstract resources:

    dask-worker scheduler:8786 -…
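A sketch of the abstract-resources mechanism the answer refers to (the worker commands, scheduler address, and resource names are illustrative):

    # start workers declaring what they have, e.g. from the shell:
    #   dask-worker scheduler:8786 --resources "GPU=2"
    #   dask-worker scheduler:8786 --resources "CPU=4"

    from dask.distributed import Client

    client = Client("scheduler:8786")  # assumed scheduler address

    def train(data):
        return data  # stand-in for GPU work

    # this task may only run on workers advertising a GPU resource
    future = client.submit(train, "data", resources={"GPU": 1})

    # a submit() without resources= runs on whichever worker is free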

Access a single element in large published array with Dask

Submitted by 一个人想着一个人 on 2019-12-11 06:37:53
Question: Is there a faster way to retrieve only a single element of a large published array with Dask, without retrieving the entire array? In the example below, client.get_dataset('array1')[0] takes roughly the same time as client.get_dataset('array1').

    import distributed
    client = distributed.Client()
    data = [1] * 10000000

    payload = {'array1': data}
    client.publish_dataset(**payload)

    one_element = client.get_dataset('array1')[0]

Answer 1: Note that anything you publish goes to the scheduler, not to the workers, so …
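A sketch of a cheaper pattern, under the assumption that the data can live as a chunked dask array on the workers: publish a persisted dask.array, so indexing pulls only the chunk holding the element rather than the whole dataset through the scheduler.

    import dask.array as da
    from dask.distributed import Client

    client = Client()  # assumes a running cluster

    x = da.ones(10000000, chunks=100000)   # chunked, so pieces are independent
    x = client.persist(x)                  # the data now lives on the workers
    client.publish_dataset(array1=x)

    # later, possibly from another client on the same scheduler:
    one_element = client.get_dataset("array1")[0].compute()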

Creating a dask bag from a generator

Submitted by 孤者浪人 on 2019-12-11 00:23:42
Question: I would like to create a dask.Bag (or dask.Array) from a list of generators. The gotcha is that the generators (when evaluated) are too large for memory.

    delayed_array = [delayed(generator) for generator in list_of_generators]
    my_bag = db.from_delayed(delayed_array)

N.B. list_of_generators is exactly that: the generators haven't been consumed (yet).

My problem is that when creating delayed_array the generators are consumed and RAM is exhausted. Is there a way to get these long lists into the …
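One common workaround (a sketch, assuming each generator can be recreated from a function and its arguments): delay the function that produces the data instead of the generator object itself, so nothing is materialized until the task runs on a worker.

    import dask.bag as db
    from dask import delayed

    def make_part(n):
        # build the partition inside the task; the driver never holds it
        return list(range(n))  # stand-in for the real generator's output

    parts = [delayed(make_part)(1000000) for _ in range(8)]
    my_bag = db.from_delayed(parts)   # each partition materializes lazily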