Pandas seems to ignore first column name when reading tab-delimited data, gives KeyError

[愿得一人] 2021-01-04 17:37

I am using pandas 0.12.0 in ipython3 on Ubuntu 13.10 to wrangle large tab-delimited datasets in txt files. Using read_table to create a DataFrame from the txt app

4 answers
  •  没有蜡笔的小新
    2021-01-04 18:05

    Sounds like you just need to conditionally remove the BOM from the start of your files. You can do this with a wrapper around the file like so:

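    import codecs
    import os

    import pandas as pd

    def remove_bom(filename):
        # Open in binary mode; expand '~' because open() does not do it
        fp = open(os.path.expanduser(filename), 'rb')
        # Skip a leading UTF-8 BOM (EF BB BF) if present, otherwise rewind
        if fp.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
            fp.seek(0, 0)
        return fp

    # read_table also accepts a file object, so we can remove the BOM first
    samples = pd.read_table(remove_bom('~/datafile.txt'))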
    
    print(samples['RECORDING_SESSION_LABEL'])
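
To see why a BOM would produce the KeyError, here is a minimal standard-library sketch, assuming the file is UTF-8 with a BOM. The sample bytes, the `VALUE` column, and the `read_header` helper are hypothetical and only stand in for the real data file. It decodes the same bytes once as `utf-8` and once as `utf-8-sig` and compares the first header name.

```python
import csv
import io

# Hypothetical in-memory stand-in for a tab-delimited file saved with a UTF-8 BOM.
raw = b'\xef\xbb\xbfRECORDING_SESSION_LABEL\tVALUE\ns01\t1\n'

def read_header(data, encoding):
    """Decode the bytes with the given codec and return the header names."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline='')
    return next(csv.reader(text, delimiter='\t'))

plain = read_header(raw, 'utf-8')      # BOM stays attached to the first name
clean = read_header(raw, 'utf-8-sig')  # BOM is stripped while decoding

print(repr(plain[0]))
print(repr(clean[0]))
```

With plain `utf-8` the first name carries an invisible U+FEFF prefix, so looking it up by its visible name fails. pandas readers also take an `encoding` argument, so `pd.read_table(path, encoding='utf-8-sig')` may be a shorter fix than the wrapper if the file really is UTF-8 with a BOM.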
    
