Reading a portion of a large xlsx file with python

野性不改 2020-12-06 21:40

I have a large .xlsx file with 1 million rows. I don't want to open the whole file in one go. I was wondering if I can read a chunk of the file, process it and then read the next chunk?

2 Answers
  • 2020-12-06 22:21

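    Yes, though not through a `chunksize` argument: unlike `read_csv`, pandas' Excel readers do not support one. You can still page through each sheet with the `skiprows` and `nrows` arguments of `ExcelFile.parse`:

    import pandas as pd
    chunksize = 1000
    xl = pd.ExcelFile("myfile.xlsx")
    for sheet_name in xl.sheet_names:
        columns = xl.parse(sheet_name, nrows=1).columns  # read the header row only
        start = 1  # sheet row 0 holds the header
        while True:
            chunk = xl.parse(sheet_name, header=None, names=columns, skiprows=start, nrows=chunksize)
            if chunk.empty:
                break
            # parse chunk here
            start += chunksize

    Each call re-reads the sheet from the top, so this bounds the size of each DataFrame rather than the total parsing work.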
    
  • 2020-12-06 22:38

    UPDATE: 2019-09-05

    The `chunksize` parameter of `pd.read_excel()` has been deprecated because the function never actually used it: due to the nature of the XLSX file format, the file is read into memory as a whole during parsing.

    There are more details about that in this great SO answer...
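    If the goal is to avoid building the whole sheet in memory, one alternative that neither answer uses is to bypass pandas and stream rows with openpyxl, whose read-only mode is documented as keeping memory use near-constant. This is a minimal sketch assuming openpyxl is installed; `iter_chunks` and `iter_xlsx_chunks` are hypothetical helper names, not part of pandas or openpyxl.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def iter_chunks(rows: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items from any iterable of rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:  # the underlying iterator is exhausted
            return
        yield chunk


def iter_xlsx_chunks(
    path: str, size: int = 100_000, sheet_name: Optional[str] = None
) -> Iterator[List[Tuple[Any, ...]]]:
    """Stream one worksheet as lists of row tuples using openpyxl's read-only mode."""
    from openpyxl import load_workbook  # third-party dependency, imported lazily

    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name is not None else wb.active
        # values_only=True yields plain tuples of cell values instead of cell objects
        yield from iter_chunks(ws.iter_rows(values_only=True), size)
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

    For example, `for rows in iter_xlsx_chunks("myfile.xlsx", size=1000): ...` hands you lists of row tuples. If the sheet has a header row, it arrives as the first tuple of the first chunk; if you still want pandas for the processing step, each chunk can be wrapped with `pd.DataFrame(rows, columns=header)`, where `header` is that first tuple.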


    OLD answer:

    You can use the `read_excel()` method:

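    import pandas as pd
    chunksize = 10**5
    # Historical only (see UPDATE above); recent pandas versions no longer accept chunksize here.
    for chunk in pd.read_excel(filename, chunksize=chunksize):
        pass  # process `chunk` DF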
    

    If your Excel file has multiple sheets, take a look at bpachev's solution.
