pandas failing with variable columns

Submitted by 拈花ヽ惹草 on 2019-12-21 22:04:20

Question


My file looks like this:

    4 7 a a
    s g 6 8 0 d
    g 6 2 1 f 7 9 
    f g 3 
    1 2 4 6 8 9 0

I am using pandas to load it into a pandas object (a DataFrame), but I get the following error:

    pandas.parser.CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 8

The code I used was:

    file = pd.read_csv("a.txt", dtype=None, delimiter=" ")

Can anyone suggest a way to read the file as it is?
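The numbers in the error message can be checked by splitting each line on a single space, the delimiter passed to read_csv. This sketch assumes the file contains exactly the lines shown above, including the trailing spaces at the ends of lines 3 and 4:

```python
# Contents of a.txt as shown in the question (assumed exact, including the
# trailing spaces at the ends of lines 3 and 4).
data = "4 7 a a\ns g 6 8 0 d\ng 6 2 1 f 7 9 \nf g 3 \n1 2 4 6 8 9 0\n"

# Split on a single space, the same delimiter passed to read_csv.
# A trailing space yields an extra, empty field at the end of the line.
field_counts = [len(line.split(" ")) for line in data.splitlines()]
print(field_counts)  # [4, 6, 8, 4, 7]
```

Lines 2 and 3 account for the 6 and 8 in the error message; the trailing space on line 3 is what turns its seven values into eight fields.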


Answer 1:


Here's one way, shown with a comma-separated copy of the file (temp.csv):

In [50]: !type temp.csv
4,7,a,a
s,g,6,8,0,d
g,6,2,1,f,7,9
f,g,3
1,2,4,6,8,9,0

Read the CSV into a list of lists, then convert that to a DataFrame:

In [51]: pd.DataFrame([line.strip().split(',') for line in open('temp.csv', 'r')])
Out[51]:
   0  1  2     3     4     5     6
0  4  7  a     a  None  None  None
1  s  g  6     8     0     d  None
2  g  6  2     1     f     7     9
3  f  g  3  None  None  None  None
4  1  2  4     6     8     9     0
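The same list-of-lists approach also works on the original space-separated a.txt. A minimal sketch, where the helper name ragged_to_frame is hypothetical and io.StringIO stands in for the real file:

```python
import io

import pandas as pd


def ragged_to_frame(lines):
    """Build a DataFrame from lines that may hold different numbers of fields."""
    # str.split() with no argument splits on runs of whitespace and drops
    # leading/trailing whitespace, so "f g 3 " becomes ["f", "g", "3"].
    # Blank lines are skipped so they don't become all-missing rows.
    rows = [line.split() for line in lines if line.strip()]
    # pandas pads the shorter rows with missing values so all rows are equally wide.
    return pd.DataFrame(rows)


# io.StringIO stands in for the real file; with a.txt on disk you would write:
#     with open("a.txt") as f:
#         df = ragged_to_frame(f)
sample = io.StringIO("4 7 a a\ns g 6 8 0 d\ng 6 2 1 f 7 9 \nf g 3 \n1 2 4 6 8 9 0\n")
df = ragged_to_frame(sample)
print(df)
```

Because str.split() with no argument ignores trailing whitespace, this version does not create the empty extra field that trips up read_csv. Every value is kept as a string.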



Answer 2:


pandas raises this error because read_csv expects a fixed number of fields per row, in this case 6, but line 3 has 8 (seven values plus an empty field created by the trailing space). One way to handle this is to skip the rows that have more fields than expected, using the error_bad_lines parameter. This is what the docs say about error_bad_lines:

error_bad_lines : boolean, default True
    Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned. (Only valid with C parser)

So you could do this:

>>> file = pd.read_csv("a.txt", dtype=None, delimiter=" ", error_bad_lines=False)
Skipping line 3: expected 6 fields, saw 8
Skipping line 5: expected 6 fields, saw 7

>>> file
     4    7    a  a.1
s g  6  8.0  0.0    d
f g  3  NaN  NaN  NaN
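The error_bad_lines parameter was deprecated in pandas 1.3 in favor of on_bad_lines and removed in pandas 2.0, so the call above fails on current versions. A minimal sketch of the equivalent call, assuming pandas 1.3 or later and using io.StringIO in place of a.txt:

```python
import io

import pandas as pd

# io.StringIO stands in for a.txt; contents copied from the question.
data = "4 7 a a\ns g 6 8 0 d\ng 6 2 1 f 7 9 \nf g 3 \n1 2 4 6 8 9 0\n"

# on_bad_lines="skip" (pandas >= 1.3) drops lines with too many fields,
# which is what error_bad_lines=False did in older versions.
# Use on_bad_lines="warn" instead to also be told which lines were skipped.
df = pd.read_csv(io.StringIO(data), delimiter=" ", on_bad_lines="skip")
print(df)
```

As with error_bad_lines=False, lines 3 and 5 are silently lost, so this only suits cases where dropping those rows is acceptable.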

Alternatively, you could use the skiprows parameter to skip specific rows. This is what the docs say about skiprows:

skiprows : list-like or integer, default None
    Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file
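A sketch of the skiprows alternative for the same file, again with io.StringIO standing in for a.txt. The line numbers [2, 4] are picked by hand for this particular file, so this approach assumes you already know which lines are malformed:

```python
import io

import pandas as pd

# io.StringIO stands in for a.txt; contents copied from the question.
data = "4 7 a a\ns g 6 8 0 d\ng 6 2 1 f 7 9 \nf g 3 \n1 2 4 6 8 9 0\n"

# skiprows takes 0-indexed line numbers: [2, 4] skips the third and fifth
# lines, the two lines with too many fields. These indices are specific to this file.
df = pd.read_csv(io.StringIO(data), delimiter=" ", skiprows=[2, 4])
print(df)
```

This should give the same two rows as the error_bad_lines example, without relying on a parameter that newer pandas versions no longer accept.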



Source: https://stackoverflow.com/questions/40880724/pandas-failing-with-variable-columns
