Reading a CSV file from HDFS using dask and pyarrow
Question

We are trying out dask_yarn version 0.3.0 (with dask 0.18.2) because of the conflicts between the boost-cpp I'm running and pyarrow version 0.10.0.

We are trying to read a CSV file from HDFS. However, running dd.read_csv('hdfs:///path/to/file.csv') fails because it tries to use hdfs3:

ImportError: Can not find the shared library: libhdfs3.so

From the documentation, it seems there is an option to use pyarrow instead. What is the correct syntax/configuration to do so?

Answer 1:

Try
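For illustration, here is a minimal sketch of one way the pyarrow driver could be selected in dask releases from around 0.18. It assumes that release accepts a `driver` key in `storage_options` for `hdfs://` URLs, and that pyarrow can locate a working libhdfs and Hadoop classpath on the machine. The helper names `hdfs_storage_options` and `read_hdfs_csv` are hypothetical and exist only for this example. Newer dask releases route HDFS access through fsspec and may not accept this option.

```python
def hdfs_storage_options(driver="pyarrow"):
    """Build the storage_options dict naming the HDFS driver (hypothetical helper)."""
    # Assumption: dask ~0.18 reads the "driver" key to choose between
    # "hdfs3" and "pyarrow" when opening hdfs:// paths.
    return {"driver": driver}


def read_hdfs_csv(path, driver="pyarrow"):
    """Lazily read a CSV from HDFS into a dask DataFrame (hypothetical helper)."""
    # Imported inside the function so the module loads even without dask.
    import dask.dataframe as dd

    return dd.read_csv(path, storage_options=hdfs_storage_options(driver))


# Example call, using the path from the question (needs a reachable cluster):
# df = read_hdfs_csv("hdfs:///path/to/file.csv")
# print(df.head())
```

The same dask documentation of that period also appears to describe a global setting, `dask.config.set(hdfs_driver='pyarrow')`, which would avoid passing `storage_options` on every call; treat that as something to verify against the installed version.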