Reading in csv file as dataframe from hdfs

I'm using pydoop to read in a file from hdfs, and when I use:

```
import pydoop.hdfs as hd

# Open the file on HDFS and dump its contents to stdout
with hd.open("/home/file.csv") as f:
    print(f.read())
```

2 Answers

星月不相逢

2020-12-16 04:10
Use read instead of open; it works:
```
import pandas as pd
import pydoop.hdfs as hd

with hd.read("/home/file.csv") as f:
    df = pd.read_csv(f)
```

粉色の甜心

2020-12-16 04:15
I know next to nothing about hdfs, but I wonder if the following might work:
```
import pandas as pd
import pydoop.hdfs as hd

with hd.open("/home/file.csv") as f:
    df = pd.read_csv(f)
```
I assume read_csv works with a file handle, or in fact any iterable that will feed it lines. I know the numpy csv readers do.
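To check the file-handle assumption without touching hdfs, here is a minimal sketch that feeds read_csv an in-memory handle. The io.BytesIO object is only a stand-in for whatever hd.open returns, and the CSV content is made up for illustration. As far as I know, pandas documents read_csv as accepting a path or any object with a read() method, so this tests the handle part only and says nothing about how pydoop behaves.

```python
import io

import pandas as pd

# Stand-in for the handle returned by hd.open(...): any binary file-like
# object exposing read(). The CSV bytes below are invented sample data.
fake_handle = io.BytesIO(b"name,score\nalice,1\nbob,2\n")

# read_csv consumes the handle directly; no local path is involved.
df = pd.read_csv(fake_handle)
print(df)
```

If the real hdfs handle behaves like this in-memory one, the same read_csv call should work on it unchanged.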

pd.read_csv("/home/file.csv") would work if the regular Python file open works, i.e. if it can read the file as a regular local file.
```
with open("/home/file.csv") as f:
    print(f.read())
```
But evidently hd.open is using some other location or protocol, so the file is not local. If my suggestion doesn't work, then you (or we) need to dig more into the hdfs documentation.