Importing text file: No columns to parse from file

我在风中等你 2020-12-11 01:49

I am trying to read input from sys.stdin. This is a MapReduce program for Hadoop. The input file is a plain-text (.txt) file. Preview of the data set:

196 242 3   88125094         


        
2 Answers
  • 2020-12-11 02:15

    You have to set delim_whitespace=True to use whitespace as the separator.

    import sys
    import pandas as pd

    if __name__ == '__main__':
        # Split columns on any run of spaces or tabs; the data has no header row.
        df = pd.read_csv(sys.stdin, header=None, delim_whitespace=True)
        print(df)
    
  • 2020-12-11 02:26

    Using try and except just lets you continue in spite of errors and handle them. It won't magically fix your errors.

    By default, read_csv expects comma-separated input, which yours is not. A quick look at the documentation:

    delim_whitespace : boolean, default False

    Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.

    This seems like the right argument. Use

    pandas.read_csv(filepath_or_buffer, delim_whitespace=True).
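
    As a rough illustration of what the quoted documentation describes, the sketch below parses the row previewed in the question with sep=r"\s+", which the documentation gives as the equivalent of delim_whitespace=True. The second row is made up for illustration and is not from the original data. As far as I know, newer pandas releases (2.2 onward) deprecate delim_whitespace in favour of this sep form, so it may be the safer spelling.

```python
import io
import pandas as pd

# First row is the preview from the question; the second row is a
# hypothetical tab-separated row added only for illustration.
sample = "196 242 3   88125094\n186\t302\t3\t891717742\n"

# sep=r"\s+" splits on any run of whitespace (spaces or tabs).
df = pd.read_csv(io.StringIO(sample), header=None, sep=r"\s+")
print(df)
```

    Both rows end up with four columns, even though one uses runs of spaces and the other uses tabs.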
    

    Using delimiter='\t' should also work, unless the tabs are expanded (replaced by spaces). As we can't really tell, delim_whitespace seems to be the better option.
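
    To make the difference concrete, here is a small sketch comparing the two options on the previewed row. The tab-separated variant of the row is an assumption about what the real file might contain.

```python
import io
import pandas as pd

spaces = "196 242 3   88125094\n"   # row as previewed in the question
tabs = "196\t242\t3\t88125094\n"    # same row with real tabs (assumed variant)

# A tab delimiter only splits where literal tab characters are present.
tab_on_spaces = pd.read_csv(io.StringIO(spaces), header=None, delimiter="\t")
tab_on_tabs = pd.read_csv(io.StringIO(tabs), header=None, delimiter="\t")

# Whitespace splitting handles the space-separated row as well.
ws_on_spaces = pd.read_csv(io.StringIO(spaces), header=None, sep=r"\s+")

print(tab_on_spaces.shape, tab_on_tabs.shape, ws_on_spaces.shape)
```

    With expanded tabs, the tab delimiter leaves the whole row in a single column, while whitespace splitting still yields four.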

    If this doesn't help, print out the contents of sys.stdin to check that the text is actually being passed to your script.
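
    One possible way to do that check is sketched below. The helper name read_table is hypothetical. It reads the whole stream first and reports on stderr when nothing arrived, since an empty input is a common cause of the "No columns to parse from file" error. The demo uses in-memory streams; in the real script you would pass sys.stdin instead.

```python
import io
import sys
import pandas as pd

def read_table(stream):
    """Parse whitespace-separated text from a stream; return None if it is empty."""
    text = stream.read()
    if not text.strip():
        # Report on stderr so the message does not end up in the piped output.
        sys.stderr.write("input was empty: nothing was piped in\n")
        return None
    return pd.read_csv(io.StringIO(text), header=None, sep=r"\s+")

# In-memory stand-ins for sys.stdin, used here only for demonstration.
empty_result = read_table(io.StringIO(""))
df = read_table(io.StringIO("196 242 3   88125094\n"))
print(df)
```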

    Edit: I just saw that you use

    cat /root/lab/u.data | python /root/lab/mid-1-mapper.py |python /root/lab/mid-1-reducer.py
    

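    Is this intended? This way, mid-1-reducer.py processes the output of mid-1-mapper.py, not the contents of u.data. If the mapper prints nothing, the reducer receives empty input. If you want to process the content of the file u.data, consider reading the file directly instead of sys.stdin.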
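
    A minimal sketch of reading the file directly, with sys.stdin as a fallback, might look like the following. The function name load_ratings is hypothetical, and a temporary file stands in for u.data so the example runs on its own.

```python
import os
import sys
import tempfile
import pandas as pd

def load_ratings(path=None):
    """Read whitespace-separated rows from path, or from stdin when path is None."""
    source = path if path is not None else sys.stdin
    return pd.read_csv(source, header=None, sep=r"\s+")

# A temporary file stands in for u.data in this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".data", delete=False) as tmp:
    tmp.write("196 242 3   88125094\n")
    demo_path = tmp.name

df = load_ratings(demo_path)
os.remove(demo_path)
print(df)
```

    In a script, the path could come from sys.argv[1] when an argument is given, leaving the stdin behaviour unchanged otherwise.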
