Question
My file looks like this:
4 7 a a
s g 6 8 0 d
g 6 2 1 f 7 9
f g 3
1 2 4 6 8 9 0
I was using pandas to load it into a pandas object, but I am getting the following error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 8
The code I used was:
file = pd.read_csv("a.txt", dtype=None, delimiter=" ")
Can anyone suggest a way to read the file as it is?
Answer 1:
Here's one way.
In [50]: !type temp.csv
4,7,a,a
s,g,6,8,0,d
g,6,2,1,f,7,9
f,g,3
1,2,4,6,8,9,0
Read the CSV into a list of lists, then convert that to a DataFrame.
In [51]: pd.DataFrame([line.strip().split(',') for line in open('temp.csv', 'r')])
Out[51]:
   0  1  2     3     4     5     6
0  4  7  a     a  None  None  None
1  s  g  6     8     0     d  None
2  g  6  2     1     f     7     9
3  f  g  3  None  None  None  None
4  1  2  4     6     8     9     0
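The same list-of-lists idea can be applied directly to the space-separated a.txt from the question. This is a minimal sketch, assuming fields are separated by spaces (or any run of whitespace); it writes the sample a.txt first so it runs on its own, and reads inside a with block so the file is closed afterwards.

```python
import pandas as pd

# Recreate the sample file from the question so the example is self-contained.
sample = """4 7 a a
s g 6 8 0 d
g 6 2 1 f 7 9
f g 3
1 2 4 6 8 9 0
"""
with open("a.txt", "w") as fh:
    fh.write(sample)

# str.split() with no argument splits on any run of whitespace and ignores
# leading/trailing whitespace, so stray trailing spaces don't add empty fields.
# The "if line.strip()" filter skips blank lines.
with open("a.txt") as fh:
    rows = [line.split() for line in fh if line.strip()]

# Rows shorter than the longest one are padded with missing values.
df = pd.DataFrame(rows)
print(df)
```

Every cell is read as a string here; convert columns afterwards (for example with pd.to_numeric) if you need numbers.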
Answer 2:
Pandas raises this error because read_csv expects every row to have a fixed number of fields, in this case 6, but when it reached line 3 it found 8. One way to handle this is to skip the rows that have more fields than the first row of data, using the error_bad_lines parameter. (In pandas 1.3 this parameter was deprecated in favor of on_bad_lines, and it was removed in pandas 2.0.) This is what the docs say about error_bad_lines:
error_bad_lines : boolean, default True Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned. (Only valid with C parser)
So you could do this:
>>> file = pd.read_csv("a.txt",dtype = None,delimiter = " ",error_bad_lines=False)
Skipping line 3: expected 6 fields, saw 8
Skipping line 5: expected 6 fields, saw 7
>>> file
     4    7    a  a.1
s g  6  8.0  0.0    d
f g  3  NaN  NaN  NaN
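A minimal sketch of the same call for pandas 1.3 or later, where on_bad_lines="skip" takes the place of error_bad_lines=False (use on_bad_lines="warn" to also print a message for each skipped line). It assumes a recent pandas and recreates the sample file so it runs standalone.

```python
import pandas as pd

# Recreate the sample file from the question (space-separated, ragged rows).
with open("a.txt", "w") as fh:
    fh.write("4 7 a a\ns g 6 8 0 d\ng 6 2 1 f 7 9\nf g 3\n1 2 4 6 8 9 0\n")

# Rows with more fields than expected are dropped silently;
# rows with fewer fields are kept and padded with NaN.
df = pd.read_csv("a.txt", sep=" ", on_bad_lines="skip")
print(df)
```

Because the header line has 4 names but the first data row has 6 fields, pandas treats the first two columns as the index, which is why the result above is indexed by pairs like "s g" and "f g".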
Or you could use the skiprows parameter to skip specific rows. This is what the docs say about skiprows:
skiprows : list-like or integer, default None Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file
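A minimal sketch of skiprows on the sample file. The line numbers [2, 4] are chosen for this particular file; unlike error_bad_lines / on_bad_lines, skiprows does not detect bad rows, so you have to know in advance which lines to drop.

```python
import pandas as pd

# Recreate the sample file from the question.
with open("a.txt", "w") as fh:
    fh.write("4 7 a a\ns g 6 8 0 d\ng 6 2 1 f 7 9\nf g 3\n1 2 4 6 8 9 0\n")

# skiprows takes 0-indexed line numbers of the file (the header is line 0),
# so [2, 4] skips "g 6 2 1 f 7 9" and "1 2 4 6 8 9 0".
df = pd.read_csv("a.txt", sep=" ", skiprows=[2, 4])
print(df)
```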
Source: https://stackoverflow.com/questions/40880724/pandas-failing-with-variable-columns