Strategy for partitioning dask dataframes efficiently

臣服心动 2020-12-28 16:32

The documentation for Dask talks about repartitioning to reduce overhead here.

However, they seem to indicate that you need some knowledge of what your dataframe will look like beforehand.

3 Answers
  •  忘掉有多难
    2020-12-28 16:57

    As of Dask 2.0.0 you may call .repartition(partition_size="100MB").

    This method performs an object-aware (.memory_usage(deep=True)) calculation of partition sizes. It will merge partitions that are too small and split partitions that have grown too large.

    Dask's Documentation also outlines the usage.
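
    A minimal sketch of the call above, using a small synthetic dataframe (the column names and sizes here are illustrative, not from the question):

    ```python
    import pandas as pd
    import dask.dataframe as dd

    # Build a small pandas dataframe and split it into many tiny partitions
    # to simulate an over-partitioned Dask dataframe.
    pdf = pd.DataFrame({"x": range(10_000), "y": range(10_000)})
    ddf = dd.from_pandas(pdf, npartitions=100)

    # Coalesce toward roughly 100 MB per partition. Dask measures each
    # partition with .memory_usage(deep=True) and merges small partitions
    # (or splits oversized ones) to approach the target size.
    ddf = ddf.repartition(partition_size="100MB")

    # For this tiny dataframe, far fewer than 100 partitions remain.
    print(ddf.npartitions)
    ```

    Note that choosing a target size (rather than a partition count) lets Dask decide the partitioning from the data's actual memory footprint, which is exactly what helps when you don't know the dataframe's shape in advance.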
