Default pip installation of Dask gives “ImportError: No module named toolz”

依然范特西╮ submitted on 2019-12-05 01:01:39
TheDudeAbides

In order to use Dask's parallelized dataframes (built on top of pandas), you have to tell pip to install some "extras" (reference), as mentioned in the Dask installation documentation:

pip install "dask[dataframe]"

Or you could just do

pip install "dask[complete]"

to get the whole bag of tricks. NB: The double quotes may or may not be required in your shell; zsh, for example, treats unquoted square brackets as a glob pattern.

The justification for this is (or was) mentioned in the Dask documentation:

We do this so that users of the lightweight core dask scheduler aren’t required to download the more exotic dependencies of the collections (numpy, pandas, etc.)
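To see which extras an installed distribution actually declares, something like the following sketch may help. It assumes Python 3.8+ (for the standard-library importlib.metadata); the helper name declared_extras is made up for illustration, and the result depends on the Dask version you have installed, so no expected output is shown.

```python
from importlib import metadata


def declared_extras(dist_name):
    """Return the extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # "Provides-Extra" is the package metadata field that lists extras
    # such as "dataframe" or "complete".
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    print(declared_extras("dask"))
```

An empty list means either that the package is not installed in the current environment or that it declares no extras.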

As mentioned in Obinna's answer, you may wish to do this inside a virtualenv, or use pip install --user to put the libraries in your home directory if, say, you don't have admin privileges on the host OS.
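If you go the --user route, a quick way to see where those packages end up is to ask the standard-library site module. This is only a sketch; the function name user_install_location is hypothetical.

```python
import site


def user_install_location():
    """Return (path, enabled) for the per-user site-packages directory."""
    # getusersitepackages() is the directory `pip install --user` installs into.
    path = site.getusersitepackages()
    # ENABLE_USER_SITE is False or None when the user site is disabled,
    # for example in a virtual environment created without system site packages.
    return path, bool(site.ENABLE_USER_SITE)


if __name__ == "__main__":
    path, enabled = user_install_location()
    print("user site-packages:", path)
    print("user site enabled:", enabled)
```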

Extra details

In Dask 0.13.0 and below, dask/async.py required toolz's identity function. There is a closed pull request associated with GitHub issue #1849 to remove this dependency. If, for some reason, you are stuck with an older version of Dask, you can work around that particular issue by simply running pip install toolz.

But this wouldn't (completely) fix your problem with import dask.dataframe as dd anyway, because you'd still get this error:

>>> import dask.dataframe as dd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/staff_agbio/PhyloWeb/data/dask-test/venv/local/lib/python2.7/site-packages/dask/dataframe/__init__.py", line 3, in <module>
    from .core import (DataFrame, Series, Index, _Frame, map_partitions,
  File "/data/staff_agbio/PhyloWeb/data/dask-test/venv/local/lib/python2.7/site-packages/dask/dataframe/core.py", line 12, in <module>
    import pandas as pd
ImportError: No module named pandas

Or, if you already had pandas installed, you'd get ImportError: No module named cloudpickle. So pip install "dask[dataframe]" seems to be the way to go if you're in this situation.
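Rather than discovering the missing modules one traceback at a time, you could check for them up front. The sketch below only looks at the three modules named in the errors above (toolz, pandas, cloudpickle); it is not a full dependency list for dask[dataframe], and the names DATAFRAME_MODULES and missing_modules are made up for illustration.

```python
from importlib.util import find_spec

# Modules named in the errors above; adjust the tuple for other collections.
DATAFRAME_MODULES = ("toolz", "pandas", "cloudpickle")


def missing_modules(names=DATAFRAME_MODULES):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

find_spec only locates a module without importing it, so the check is cheap and has no side effects from the libraries themselves.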

I had this same issue and this was what fixed it for me.

  1. Create a virtual env for your project.
  2. cd into your project directory (not required if you're comfortable with directory navigation).
  3. Activate your virtual env.
  4. Run pip install "dask[complete]", which installs everything. If you only need a given component, such as dataframe, use pip install "dask[dataframe]" instead.

The bottom line was that I had to be in my virtual environment, so that dask is installed for that environment only.
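If you are not sure whether the environment is really active when you run pip, a small check like this one can tell you. It is a sketch that relies on the usual sys.prefix comparison; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # venv (and recent virtualenv) make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("packages install under:", sys.prefix)
```

Run it with the same python that your pip belongs to; if it reports False, the install would go to the system or user site rather than the project environment.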

A working requirements.txt:

awscli==1.16.69
botocore==1.13.0
boto3==1.9.79
numpy==1.16.2
dask[complete]

conda install dask
conda install dask-core

Solved the problem for me.
