Reading in csv file as dataframe from hdfs

北荒 2020-12-16 03:16

I'm using pydoop to read in a file from HDFS, and when I use:

import pydoop.hdfs as hd
with hd.open("/home/file.csv") as f:
    print(f.read())


        
2 answers
  • 2020-12-16 04:10

    Use read instead of open; it works:

    with hd.read("/home/file.csv") as f:
        df = pd.read_csv(f)
    
  • 2020-12-16 04:15

    I know next to nothing about hdfs, but I wonder if the following might work:

    with hd.open("/home/file.csv") as f:
        df = pd.read_csv(f)
    

    I assume read_csv works with a file handle, or in fact any iterable that will feed it lines. I know the NumPy CSV readers do.

    pd.read_csv("/home/file.csv") would work if the regular Python file open works, i.e. if it can read the file as a regular local file:

    with open("/home/file.csv") as f:
        print(f.read())
    

    But evidently hd.open is using some other location or protocol, so the file is not local. If my suggestion doesn't work, then you (or we) need to dig more into the hdfs documentation.
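
The idea in this answer, handing read_csv an already-open file object instead of a path, can be sketched as below. This is only a minimal sketch and has not been run against a real cluster: read_csv_from_handle and fake_open are hypothetical names introduced here, and fake_open returns an in-memory buffer as a stand-in for hd.open so the example runs without Hadoop.

```python
import io

import pandas as pd


def read_csv_from_handle(open_fn, path, **read_csv_kwargs):
    """Open `path` with `open_fn` and let pandas parse the resulting stream.

    `open_fn` is any callable that returns a file-like object usable as a
    context manager (the builtin `open`, or, by assumption, `hd.open`).
    """
    with open_fn(path) as f:
        # pandas accepts any object with a read() method here, not only a path
        return pd.read_csv(f, **read_csv_kwargs)


def fake_open(path):
    """Hypothetical stand-in for hd.open: ignores `path`, returns fixed CSV text."""
    return io.StringIO("a,b\n1,2\n3,4\n")


df = read_csv_from_handle(fake_open, "/home/file.csv")
print(df)
```

If hd.open returns a file-like object as assumed, passing hd.open in place of fake_open would be the HDFS equivalent; that part depends on the pydoop version and is worth checking against its documentation.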
