Question
I'm using this answer on how to read only a chunk of a CSV file with pandas.
The suggestion to use pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)
works well, but it returns a <class 'pandas.io.parsers.TextFileReader'>,
so I'm converting it to a DataFrame with pd.concat(pd.read_csv('./input/test.csv', iterator=True, chunksize=25)),
but that takes as much time as reading the whole file in the first place!
Any suggestions on how to read only a chunk of the file quickly?
Answer 1:
pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)
returns an iterator. You can use the built-in next
function to grab just the next chunk as a DataFrame:
reader = pd.read_csv('./input/test.csv', iterator=True, chunksize=1000)
next(reader)
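A minimal runnable sketch of this idea, using a small in-memory CSV in place of the original './input/test.csv' (the data here is made up for illustration):

```python
import io
import pandas as pd

# A small in-memory CSV stands in for './input/test.csv' (hypothetical data).
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

# Passing chunksize makes read_csv return a TextFileReader,
# an iterator that yields DataFrames of that many rows.
reader = pd.read_csv(csv_data, chunksize=4)

# next() parses only the first chunk; the rest of the file
# is left unread, which is what makes this fast.
first_chunk = next(reader)
print(first_chunk.shape)
```

The key point is that next() returns a plain DataFrame containing only the first chunksize rows, without touching the rest of the file, whereas pd.concat over the whole reader forces every chunk to be parsed.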
This is often used in a for loop to process one chunk at a time:
for df in pd.read_csv('./input/test.csv', iterator=True, chunksize=1000):
pass
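As an illustration of the loop pattern, here is a hedged sketch that aggregates a column chunk by chunk; it again uses an in-memory CSV with made-up data rather than the original file, so only one chunk is in memory at a time:

```python
import io
import pandas as pd

# Hypothetical data standing in for './input/test.csv'.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(100)))

# Each iteration yields a DataFrame of at most 25 rows,
# so memory use stays bounded regardless of file size.
total = 0
for chunk in pd.read_csv(csv_data, chunksize=25):
    total += chunk["value"].sum()

print(total)  # sum of 0..99 = 4950
```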
Source: https://stackoverflow.com/questions/50473327/how-to-read-only-a-chunk-of-csv-file-fast