Question
I have a database-like object containing many dask dataframes. I would like to work with the data, save it, and reload it the next day to continue the analysis.
Therefore, I tried saving the dask dataframes (not the computation results, just the "plan of computation" itself) using pickle. Apparently it works (at least if I unpickle the objects on the exact same machine) ... but are there any pitfalls?
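For context, roughly the workflow I have in mind, sketched with made-up data and file names; only the lazy dask dataframes (the plans) are pickled, never their computed results:

```python
import pickle

import pandas as pd
import dask.dataframe as dd

# A small "database-like" object holding lazy dask dataframes
# (names and contents are just illustrative)
db = {
    "sales": dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2),
    "users": dd.from_pandas(pd.DataFrame({"y": range(5)}), npartitions=1),
}

# Save the plans for tomorrow -- no computation happens here
with open("db.pkl", "wb") as f:
    pickle.dump(db, f)

# ... next day, on the same machine ...
with open("db.pkl", "rb") as f:
    db2 = pickle.load(f)

# Computation is only triggered now
print(db2["sales"].x.sum().compute())
```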
Answer 1:
Generally speaking, this is safe. However, there are a few caveats:
- If your dask.dataframe contains custom functions, such as with df.apply(lambda x: x), then the internal function will not be pickleable. It will, however, still be serializable with cloudpickle (see the sketch after this list).
- If your dask.dataframe contains references to files that are only valid on your local computer then, while it will still be serializable, the deserialized version on another machine may no longer be useful.
- If your dask.dataframe contains dask.distributed Future objects, such as would occur if you use Executor.persist on a cluster, then these are not currently serializable.
- I recommend using a dask version >= 0.11.0.
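As a hedged illustration of the first caveat, the sketch below builds a dask series whose graph contains a lambda. Depending on the dask and Python versions, the standard pickle module may refuse the lambda, while cloudpickle can serialize it either way (the toy data and column name are assumptions, not from the original post):

```python
import pickle

import cloudpickle
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)
# The graph behind this series now contains a lambda
with_lambda = ddf.x.apply(lambda v: v + 1, meta=("x", "int64"))

try:
    pickle.dumps(with_lambda)  # plain pickle may fail on the lambda
except Exception as exc:
    print("pickle failed:", exc)

# cloudpickle serializes the lambda along with the rest of the plan
blob = cloudpickle.dumps(with_lambda)
restored = cloudpickle.loads(blob)
print(restored.sum().compute())
```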
Source: https://stackoverflow.com/questions/39147120/dask-is-it-safe-to-pickle-a-dataframe-for-later-use