How to avoid an empty result with `Bag.take(n)` when using dask?

我们两清 submitted on 2019-12-10 18:23:14

Question


Context: The Dask documentation clearly states that Bag.take() only collects items from the first partition. However, after a filter, the first partition can end up empty while the other partitions still contain items, so take() returns nothing.
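A minimal sketch that reproduces the problem, assuming a recent Dask release where `dask.bag.from_sequence` splits ten items into two partitions of five. The names `b` and `big` are illustrative only. The last call uses the `npartitions` argument of `Bag.take`, which current Dask versions document (pass `-1` to draw from every partition); check the `Bag.take` docs for your installed version before relying on it.

```python
import dask.bag as db

# Ten numbers in two partitions: [0, 1, 2, 3, 4] and [5, 6, 7, 8, 9]
b = db.from_sequence(range(10), npartitions=2)

# After this filter, the first partition is empty
big = b.filter(lambda x: x >= 5)

# Default take() only looks at the first partition, so it finds nothing
# (Dask also emits a UserWarning about insufficient elements here)
first_only = big.take(3)

# npartitions=-1 asks take() to gather from all partitions
from_all = big.take(3, npartitions=-1)

print(first_only)  # ()
print(from_all)    # (5, 6, 7)
```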

Question: Is it possible to use Bag.take() so that it collects from as many partitions as needed to gather n items (or all available items, if there are fewer than n)?


Answer 1:


You could do something like the following:

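from toolz import concat, take

# Assumes n (number of items wanted) and b (a dask Bag) are already defined.
per_partition = lambda seq: list(take(n, seq))          # first n items of one partition
aggregate = lambda parts: list(take(n, concat(parts)))  # flatten the partial lists, keep the first n
result = b.reduction(per_partition, aggregate).compute()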

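This grabs the first n elements of each partition, concatenates those partial results, and then takes the first n elements of the combined list. Partitions that are empty after filtering contribute nothing, so you get n items, or every available item if there are fewer than n.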
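To see why the aggregate step must flatten its input, here is a plain-Python model of the per-partition/aggregate pattern, with no Dask required. It is a sketch under the assumption that partitions can be modeled as a list of lists; `first_n` and `take_across_partitions` are hypothetical helper names, not Dask or toolz APIs. The key point it models is that an aggregate function in a Bag reduction receives one result per partition, so it has to concatenate those lists before taking the first n items.

```python
from itertools import chain, islice


def first_n(n, seq):
    """Return up to the first n items of any iterable as a list."""
    return list(islice(seq, n))


def take_across_partitions(partitions, n):
    """Model of a 'take n' reduction over a list of partitions (hypothetical helper)."""
    # Per-partition step: at most n items from each partition
    partials = [first_n(n, part) for part in partitions]
    # Aggregate step: flatten the partial lists, then keep the first n overall
    return first_n(n, chain.from_iterable(partials))


# First partition is empty, as it can be after a filter
parts = [[], [5, 6, 7, 8, 9]]

print(take_across_partitions(parts, 3))   # [5, 6, 7]
print(take_across_partitions(parts, 10))  # [5, 6, 7, 8, 9]

# Without flattening, "first n" over the per-partition results picks lists, not items
print(first_n(3, [first_n(3, p) for p in parts]))  # [[], [5, 6, 7]]
```

Asking for more items than exist simply returns all of them, which matches the "or the maximum available" behaviour the question asks for.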



Source: https://stackoverflow.com/questions/38254247/how-to-avoid-an-empty-result-with-bag-taken-when-using-dask
