I have a large .xlsx file with 1 million rows. I don't want to open the whole file in one go. I was wondering if I can read a chunk of the file, process it and then read th
Yes. Pandas supports chunked reading. You would read an Excel file like so:
import pandas as pd
xl = pd.ExcelFile("myfile.xlsx")
for sheet_name in xl.sheet_names:
    # Note: per the 2019-09-05 update below, chunksize is not honored for Excel files
    reader = xl.parse(sheet_name, chunksize=1000)
    for chunk in reader:
        # parse chunk here
        pass
UPDATE: 2019-09-05
The chunksize parameter has been deprecated, as it wasn't used by pd.read_excel(): because of the nature of the XLSX file format, the file is read into memory as a whole during parsing.
There are more details about that in this great SO answer...
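Since chunksize is not available for Excel files, one possible workaround is to stream the rows with openpyxl in read-only mode and build a small DataFrame for each batch. The following is a minimal sketch, not a pandas feature: the helper name iter_excel_chunks and its parameters are made up for illustration, and it assumes the first row of the sheet holds the column names. Read-only mode should keep memory use well below loading the whole sheet at once, though some parts of the workbook are still loaded up front.

```python
from itertools import islice

import pandas as pd
from openpyxl import load_workbook


def iter_excel_chunks(path, sheet_name=None, chunksize=1000):
    """Yield DataFrames of at most `chunksize` rows from one worksheet.

    Hypothetical helper: assumes the first row contains the column names.
    """
    # read_only=True makes openpyxl read rows lazily instead of
    # materialising every cell of the sheet up front.
    wb = load_workbook(path, read_only=True)
    try:
        # Fall back to the first worksheet when no name is given.
        ws = wb[sheet_name] if sheet_name else wb.worksheets[0]
        rows = ws.iter_rows(values_only=True)  # tuples of plain cell values
        header = next(rows, None)
        if header is None:  # empty sheet: nothing to yield
            return
        while True:
            block = list(islice(rows, chunksize))  # next batch of rows
            if not block:
                break
            yield pd.DataFrame(block, columns=list(header))
    finally:
        # Read-only workbooks keep the file handle open until closed.
        wb.close()
```

Usage would look like `for chunk in iter_excel_chunks("myfile.xlsx", chunksize=10**5): ...`, processing each chunk before the next one is read.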
OLD answer:
You can use the read_excel() method:
chunksize = 10**5
for chunk in pd.read_excel(filename, chunksize=chunksize):
    # process `chunk` DF
    pass
If your Excel file has multiple sheets, take a look at bpachev's solution.