dask-distributed

How to prevent dask client from dying on worker exception?

可紊 submitted on 2021-01-29 08:12:12
Question: I'm not understanding the resiliency model in dask distributed.

Problem
Exceptions raised by a worker kill an embarrassingly parallel dask operation. All workers and clients die if any worker encounters an exception.

Expected Behavior
Reading here: http://distributed.dask.org/en/latest/resilience.html#user-code-failures suggests that exceptions should be contained to workers and that subsequent tasks would go on without interruption: "When a function raises an error that error is kept and …"
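A minimal sketch of the behavior the docs describe, assuming the parallel work is submitted with client.map; the function flaky and the local Client() are illustrative, not from the question. The exception raised on a worker is stored with that task and re-raised on the client only when its result is gathered, so handling it per future lets the remaining tasks finish:

from dask.distributed import Client

def flaky(x):
    # one task raises; the others should still complete
    if x == 3:
        raise ValueError(f"bad input: {x}")
    return x * 2

if __name__ == "__main__":
    client = Client()                      # local cluster for illustration
    futures = client.map(flaky, range(6))  # embarrassingly parallel submission

    results = []
    for fut in futures:
        try:
            results.append(fut.result())   # remote exception re-raised here, per task
        except ValueError as exc:
            results.append(None)           # contain the failure instead of dying
            print(f"task failed: {exc}")

    print(results)  # [0, 2, 4, None, 8, 10]
    client.close()

client.gather(futures, errors="skip") is an alternative that simply drops the erred futures instead of handling them one by one.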

Dask scheduler empty / graph not showing

£可爱£侵袭症+ submitted on 2020-12-15 06:40:00
Question: I have a setup as follows:

# etl.py
from dask.distributed import Client
import dask
from tasks import task1, task2, task3

def runall(**kwargs):
    print("done")

def etl():
    client = Client()
    tasks = {}
    tasks['task1'] = dask.delayed(task1)(*args)
    tasks['task2'] = dask.delayed(task2)(*args)
    tasks['task3'] = dask.delayed(task3)(*args)
    out = dask.delayed(runall)(**tasks)
    out.compute()

This logic was borrowed from luigi and works nicely with if statements to control which tasks to run. However, some of …
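A minimal sketch of one way to keep the graph visible while it runs, assuming a local Client and that task1/task2/task3 take no arguments here (the question passes *args): submitting the delayed object through client.compute returns a future the scheduler keeps tracking, and progress() blocks while the dashboard (client.dashboard_link, usually http://localhost:8787) shows the task stream:

from dask.distributed import Client, progress
import dask
from tasks import task1, task2, task3  # as in the question

def runall(**kwargs):
    print("done")

def etl():
    client = Client()  # dashboard address is in client.dashboard_link
    tasks = {
        'task1': dask.delayed(task1)(),
        'task2': dask.delayed(task2)(),
        'task3': dask.delayed(task3)(),
    }
    out = dask.delayed(runall)(**tasks)
    future = client.compute(out)  # asynchronous: the scheduler now owns the graph
    progress(future)              # watch it run on the dashboard / progress bar
    return future.result()

if __name__ == "__main__":
    etl()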

Dask Dataframe Efficient Row Pair Generator?

走远了吗. submitted on 2020-07-23 06:23:07
Question: What exactly I want to achieve, in terms of input and output, is a cross join.

Input example:

df = pd.DataFrame(columns=['A', 'val'], data=[['a1', 23], ['a2', 29], ['a3', 39]])
print(df)

    A  val
0  a1   23
1  a2   29
2  a3   39

Output example:

df['key'] = 1
df.merge(df, how="outer", on="key")

   A_x  val_x  key A_y  val_y
0   a1     23    1  a1     23
1   a1     23    1  a2     29
2   a1     23    1  a3     39
3   a2     29    1  a1     23
4   a2     29    1  a2     29
5   a2     29    1  a3     39
6   a3     39    1  a1     23
7   a3     39    1  a2     29
8   a3     39    1  a3     39

How do I achieve this for a large dataset with …
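A minimal sketch of the same constant-key trick carried over to dask.dataframe, assuming the data arrives as a pandas frame and that npartitions=2 is just an illustrative choice; note the result still has len(df)**2 rows, so this distributes the blow-up rather than avoiding it:

import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame(columns=['A', 'val'], data=[['a1', 23], ['a2', 29], ['a3', 39]])
ddf = dd.from_pandas(df, npartitions=2)

# every row pairs with every row: join both sides on a constant key
pairs = ddf.assign(key=1).merge(ddf.assign(key=1), how='inner', on='key',
                                suffixes=('_x', '_y'))

print(pairs.compute())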