I want to read in a very large CSV (too big to open and edit easily in Excel), but somewhere around the 100,000th row there is a row with one extra column, causing the program to crash.
To get information about the rows that cause errors, try the combination of error_bad_lines=False and warn_bad_lines=True (note: these parameters were deprecated in pandas 1.3 and removed in 2.0 in favor of on_bad_lines):
import pandas as pd

dataframe = pd.read_csv(filePath, index_col=False, encoding='iso-8859-1', nrows=1000,
                        warn_bad_lines=True, error_bad_lines=False)
error_bad_lines=False skips the error-causing rows, while warn_bad_lines=True prints the error details and row numbers, like this:
'Skipping line 3: expected 4 fields, saw 3401\nSkipping line 4: expected 4 fields, saw 30...'
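On pandas 1.3 and newer, the same behavior is available through the single on_bad_lines parameter ('skip' drops malformed rows, 'warn' also emits a warning). A minimal sketch using an in-memory sample instead of your file:

```python
import io
import pandas as pd

# Sample data: the second data row has an extra (fourth) column.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines='skip' silently drops malformed rows;
# on_bad_lines='warn' drops them and prints a warning to stderr.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(len(df))  # 2 -- only the two well-formed rows survive
```

For your file you would swap io.StringIO(raw) for filePath and keep your encoding and index_col arguments unchanged.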
If you want to save the warning messages (e.g. for further processing), you can redirect them to a file with contextlib:
import contextlib
import pandas as pd

with open(r'D:\Temp\log.txt', 'w') as log:
    with contextlib.redirect_stderr(log):
        dataframe = pd.read_csv(filePath, index_col=False, encoding='iso-8859-1',
                                warn_bad_lines=True, error_bad_lines=False)
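On pandas 1.4+ there is a cleaner route than parsing a redirected log: pass a callable as on_bad_lines (this requires engine='python'). The callable receives each malformed row as a list of strings and returning None drops it. A sketch with an in-memory sample:

```python
import io
import pandas as pd

raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

bad_rows = []

def collect(line):
    # Called once per malformed row with its fields as a list of strings;
    # returning None excludes the row from the resulting DataFrame.
    bad_rows.append(line)
    return None

# A callable for on_bad_lines is only supported by the python engine.
df = pd.read_csv(io.StringIO(raw), engine="python", on_bad_lines=collect)
print(bad_rows)  # [['4', '5', '6', '7']]
```

This keeps the offending rows in a Python list you can inspect or re-parse directly, with no log file or stderr redirection involved.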