Not reading all rows while importing csv into pandas dataframe

Submitted by 点点圈 on 2020-05-29 02:40:32

Question


I am trying the Kaggle challenge here, and unfortunately I am stuck at a very basic step, probably because of my limited Python knowledge. I am trying to read the dataset into a pandas DataFrame with the following command:

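import pandas as pd

# note: DataFrame.from_csv uses the first column as the index by default; it was
# deprecated in pandas 0.21 and removed in 1.0 in favor of pd.read_csv
test = pd.DataFrame.from_csv("C:/Name/DataMining/hillary/data/output/emails.csv")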

The problem is that this file has over 300,000 records, but only 7,945 rows with 21 columns are read:

print (test.shape)
(7945, 21)

I have double-checked the file and cannot find anything special about line 7945. Any pointers as to why this could be happening? It seems like a very ordinary situation, so I hope someone who has run into this error can help me out.
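A likely cause, not confirmed in the original thread: when a quoted field such as an email body contains line breaks, one CSV record spans several physical lines, so counting lines in the file overstates the number of records a CSV parser returns. The sketch below uses a small made-up in-memory sample (not the Kaggle file) to show this with pd.read_csv and its default quoting.

```python
import io

import pandas as pd

# Hypothetical sample: the quoted Body field of record 1 spans three physical lines.
raw = (
    "Id,Subject,Body\n"
    '1,Hello,"first line\n'
    "second line\n"
    'third line"\n'
    "2,Re: Hello,short reply\n"
)

physical_lines = raw.count("\n")  # counts the header line too
df = pd.read_csv(io.StringIO(raw))  # default quoting keeps the multi-line field intact

print(physical_lines)  # 5
print(df.shape)        # (2, 3)
```

Five lines in the text, but only two data records: the line breaks inside the quotes belong to the field value, not to the row structure.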


Answer 1:


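I think it is better to use the function read_csv with the parameters quoting=csv.QUOTE_NONE and error_bad_lines=False (in pandas 1.3 and later, the latter is written on_bad_lines="skip"; error_bad_lines was removed in pandas 2.0).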

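import pandas as pd
import csv

# QUOTE_NONE treats quote characters as ordinary text;
# on_bad_lines="skip" drops rows with too many fields
# (pandas < 1.3: use error_bad_lines=False instead)
test = pd.read_csv("output/Emails.csv", quoting=csv.QUOTE_NONE, on_bad_lines="skip")

print(test.shape)
# (381422, 22)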

However, problematic rows (those with the wrong number of fields) are silently skipped, so some data will be lost.
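To see which rows get dropped, here is a minimal sketch on a hypothetical in-memory sample, assuming pandas 1.3 or later for on_bad_lines. With quoting=csv.QUOTE_NONE, a comma inside a quoted field acts as a separator, the row ends up with too many fields, and on_bad_lines="skip" discards it.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: row 2 has a comma inside a quoted field.
raw = (
    "Id,Subject,From\n"
    "1,Hello,alice\n"
    '2,"Hi, there",bob\n'
    "3,Bye,carol\n"
)

# QUOTE_NONE: row 2 splits into four fields instead of three and is skipped.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE, on_bad_lines="skip")
print(df["Id"].tolist())  # [1, 3]

# Default quoting: "Hi, there" stays one field, so all three rows are kept.
default_df = pd.read_csv(io.StringIO(raw))
print(default_df.shape)  # (3, 3)
```

Comparing both reads on the same file is a quick way to check how much data the skip option is throwing away.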

If you want to skip the email body data, you can use:

import pandas as pd
import csv

# header=None: the header line is read as data, so all 22 column names are given explicitly
test = pd.read_csv("output/Emails.csv", quoting=csv.QUOTE_NONE, sep=',', on_bad_lines="skip", header=None,
    names=["Id", "DocNumber", "MetadataSubject", "MetadataTo", "MetadataFrom",
           "SenderPersonId", "MetadataDateSent", "MetadataDateReleased",
           "MetadataPdfLink", "MetadataCaseNumber", "MetadataDocumentClass",
           "ExtractedSubject", "ExtractedTo", "ExtractedFrom", "ExtractedCc",
           "ExtractedDateSent", "ExtractedCaseNumber", "ExtractedDocNumber",
           "ExtractedDateReleased", "ExtractedReleaseInPartOrFull",
           "ExtractedBodyText", "RawText"])

print(test.shape)

# drop rows with NaN in column MetadataFrom
test = test.dropna(subset=['MetadataFrom'])
# drop the header line, which header=None reads in as a data row
test = test[test.MetadataFrom != 'MetadataFrom']
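If the aim is only to leave out the body columns, one possible alternative (not part of the original answer) is to keep the default quoting, so that multi-line quoted bodies are parsed as single fields, and select the wanted columns with usecols. This assumes the file's quoting is well-formed. The sample below is hypothetical and uses only a few of the column names above; with the real file you would pass its path instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical sample reusing a few column names from the answer;
# ExtractedBodyText holds a quoted, multi-line value.
raw = (
    "Id,MetadataSubject,MetadataFrom,ExtractedBodyText\n"
    '1,Meeting,alice,"See you at 5.\nThanks"\n'
    "2,Follow-up,bob,Done\n"
)

# Default quoting parses the multi-line body correctly; usecols leaves it out.
meta = pd.read_csv(io.StringIO(raw), usecols=["Id", "MetadataSubject", "MetadataFrom"])
print(meta.shape)          # (2, 3)
print(list(meta.columns))  # ['Id', 'MetadataSubject', 'MetadataFrom']
```

Unlike QUOTE_NONE with skipped lines, this keeps every record, since the body text is still parsed correctly and then simply not returned.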


Source: https://stackoverflow.com/questions/33161769/not-reading-all-rows-while-importing-csv-into-pandas-dataframe
