I'm using pydoop to read in a file from HDFS, and when I use:

import pydoop.hdfs as hd
with hd.open("/home/file.csv") as f:
    print(f.read())

it prints the file contents fine, but what I actually want is to read the file into a pandas dataframe.
Use read instead of open; it works:

import pandas as pd

with hd.read("/home/file.csv") as f:
    df = pd.read_csv(f)
I know next to nothing about hdfs, but I wonder if the following might work:

with hd.open("/home/file.csv") as f:
    df = pd.read_csv(f)
I assume read_csv works with a file handle, or in fact any iterable that will feed it lines. I know the numpy csv readers do.
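As a quick sanity check that read_csv really does accept a file-like object (a local sketch using an in-memory buffer, no hdfs involved):

import io
import pandas as pd

# read_csv accepts any object with a read() method, not just a path string.
buf = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
df = pd.read_csv(buf)
print(df)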
pd.read_csv("/home/file.csv")
would work if the regular Python file open
works - i.e. it reads the file a regular local file.
with open("/home/file.csv") as f:
print f.read()
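A quick way to rule that in or out (a minimal sketch using only the standard library):

import os

# False here means "/home/file.csv" is not visible as a local file,
# so plain open() and pd.read_csv("/home/file.csv") cannot find it.
print(os.path.exists("/home/file.csv"))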
But evidently hd.open is using some other location or protocol, so the file is not local. If my suggestion doesn't work, then you (or we) need to dig more into the hdfs documentation.
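If passing the hd.open handle straight to read_csv does not work either, one fallback sketch (assuming the handle's read() returns the raw file contents) is to read everything first and hand pandas an in-memory buffer:

import io
import pandas as pd
import pydoop.hdfs as hd

with hd.open("/home/file.csv") as f:
    data = f.read()

# Wrap the contents in a buffer that read_csv can treat like a file;
# use BytesIO for bytes, StringIO for str.
buf = io.BytesIO(data) if isinstance(data, bytes) else io.StringIO(data)
df = pd.read_csv(buf)
print(df.head())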