Pandas seems to ignore first column name when reading tab-delimited data, gives KeyError


[愿得一人] 2021-01-04 17:37

I am using pandas 0.12.0 in ipython3 on Ubuntu 13.10, in order to wrangle large tab-delimited datasets in txt files. Using read_table to create a DataFrame from the txt app

4 answers

野趣味 (original poster)

2021-01-04 18:22

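This seems to be (related to) a known issue, see GH #4793. The file appears to start with a UTF-8 byte order mark (BOM), and those three bytes ('\xef\xbb\xbf') end up attached to the front of the first column name, so looking the column up by its plain name raises a KeyError. Using 'utf-8-sig' as the encoding seems to work. Without it, we have: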

>>> df = pd.read_table("datafile.txt")
>>> df.columns
Index([u'RECORDING_SESSION_LABEL', u'LEFT_GAZE_X', u'LEFT_GAZE_Y', u'RIGHT_GAZE_X', u'RIGHT_GAZE_Y', u'VIDEO_FRAME_INDEX', u'VIDEO_NAME'], dtype='object')
>>> df.columns[0]
'\xef\xbb\xbfRECORDING_SESSION_LABEL'

but with it, we have

>>> df = pd.read_table("datafile.txt", encoding="utf-8-sig")
>>> df.columns
Index([u'RECORDING_SESSION_LABEL', u'LEFT_GAZE_X', u'LEFT_GAZE_Y', u'RIGHT_GAZE_X', u'RIGHT_GAZE_Y', u'VIDEO_FRAME_INDEX', u'VIDEO_NAME'], dtype='object')
>>> df.columns[0]
u'RECORDING_SESSION_LABEL'
>>> df["RECORDING_SESSION_LABEL"].max()
u'73_1'

(Used Python 2 for the above, but the same happens with Python 3.)
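To see the effect without the original data, here is a minimal standard-library sketch (Python 3, no pandas). It assumes the cause really is a UTF-8 BOM at the start of the file. The sample file contents and the helper name `read_header` are made up for illustration. It reads the header row of a tab-delimited file twice, once as 'utf-8' and once as 'utf-8-sig':

```python
import codecs
import csv
import os
import tempfile


def read_header(path, encoding):
    """Return the first row of a tab-delimited file decoded with `encoding`."""
    with open(path, newline="", encoding=encoding) as handle:
        return next(csv.reader(handle, delimiter="\t"))


# Build a small sample file that starts with a UTF-8 byte order mark (BOM).
# The contents are invented; only the leading BOM matters here.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(codecs.BOM_UTF8)
    handle.write(b"RECORDING_SESSION_LABEL\tLEFT_GAZE_X\n73_1\t1.5\n")

plain_header = read_header(sample_path, "utf-8")    # BOM stays in the first name
sig_header = read_header(sample_path, "utf-8-sig")  # BOM is dropped while decoding
os.remove(sample_path)

print(repr(plain_header[0]))
print(repr(sig_header[0]))
```

With plain 'utf-8' the first name comes back with a leading '\ufeff' character, which is why a lookup by "RECORDING_SESSION_LABEL" misses. With 'utf-8-sig' the name is clean, and the remaining column names are the same either way.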
