How to open a huge Excel file efficiently

佛祖请我去吃肉 2021-01-30 21:29

I have a 150 MB single-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:

# using python
import xlrd
wb = xlrd.open_

11 Answers
  •  灰色年华
    2021-01-30 22:07

    I have created a sample Java program that loads the file in ~40 seconds on my laptop (Intel i7, 4 cores, 16 GB RAM).

    https://github.com/skadyan/largefile

    This program uses the Apache POI library to load the .xlsx file using the XSSF SAX API.
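To make the streaming idea concrete, here is a minimal sketch that uses only the JDK's built-in SAX parser, not Apache POI itself. It assumes a small inline XML string shaped like worksheet markup (`row`, `c` and `v` elements with an `r` reference attribute); the class and method names are hypothetical and are not taken from the linked repository. The point is that cells are handled as parse events arrive, so the whole sheet is never built in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: plain JDK SAX over worksheet-like XML (not the POI API).
class SheetSaxSketch {

    // Records "cellRef=value" for every <v> element as the events stream past.
    static class CellCollector extends DefaultHandler {
        final List<String> cells = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String currentRef;
        private boolean inValue;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("c".equals(qName)) {
                currentRef = attrs.getValue("r");   // cell reference such as "A1"
            } else if ("v".equals(qName)) {
                inValue = true;
                text.setLength(0);                  // reuse the buffer for each value
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                text.append(ch, start, length);     // text may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                inValue = false;
                cells.add(currentRef + "=" + text);
            }
        }
    }

    static List<String> parse(String sheetXml) throws Exception {
        CellCollector handler = new CellCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.cells;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2.5</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>3</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(parse(xml));
    }
}
```

A real .xlsx file is a zip archive, and string cells usually point into a separate shared-strings part, so this sketch covers only the event-driven reading of one sheet's numeric values.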

    An implementation of the callback interface com.stackoverlfow.largefile.RecordHandler can be used to process the data loaded from the Excel file. This interface defines only one method, which takes three arguments:

    • sheetname: String, the Excel sheet name
    • row number: int, the row number of the data
    • data map: Map, from Excel cell reference to Excel-formatted cell value

    The class com.stackoverlfow.largefile.Main demonstrates one basic implementation of this interface, which just prints the row number to the console.
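The callback described above might look roughly like the following sketch. The method name `onRecord`, the `Map<String, String>` type parameters, and the helper classes are assumptions made for illustration; the actual signature in the repository may differ. The loader is replaced by a stand-in that feeds two hand-built rows to the handler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconstruction of the one-method callback; names are assumed.
interface RecordHandler {
    void onRecord(String sheetName, int rowNumber, Map<String, String> cells);
}

// Prints each row number and keeps a running count of rows seen.
class RowCountingHandler implements RecordHandler {
    private int rowCount = 0;

    @Override
    public void onRecord(String sheetName, int rowNumber, Map<String, String> cells) {
        rowCount++;
        System.out.println(sheetName + " row " + rowNumber + ": " + cells.size() + " cells");
    }

    int getRowCount() {
        return rowCount;
    }
}

class RecordHandlerDemo {
    // Stand-in for the real loader: invokes the handler once per row.
    static int run() {
        RowCountingHandler handler = new RowCountingHandler();

        Map<String, String> row1 = new LinkedHashMap<>();
        row1.put("A1", "id");
        row1.put("B1", "name");
        handler.onRecord("Sheet1", 1, row1);

        Map<String, String> row2 = new LinkedHashMap<>();
        row2.put("A2", "1");
        row2.put("B2", "Alice");
        handler.onRecord("Sheet1", 2, row2);

        return handler.getRowCount();
    }

    public static void main(String[] args) {
        System.out.println("rows handled: " + run());
    }
}
```

Because the handler receives one row at a time, it can aggregate, filter, or write rows elsewhere without holding the whole sheet in memory.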

    Update

    The Woodstox parser seems to perform better than the standard SAXReader (the code in the repo has been updated).

    Also, to meet the desired performance requirement, you may consider re-implementing the org.apache.poi...XSSFSheetXMLHandler. Your own implementation can handle string/text values more efficiently and skip unnecessary text-formatting operations.
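As a rough illustration of skipping formatting, the sketch below reads the raw text of each `v` element and ignores the cell's style attribute. It uses the JDK's `javax.xml.stream` (StAX) API, which is the API Woodstox implements; whether the repository drives Woodstox through StAX or SAX is not stated here, so treat that choice as an assumption. The class and method names are hypothetical, and this is not a replacement for XSSFSheetXMLHandler.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: pull-parse worksheet-like XML and keep raw values only.
class RawValueStaxSketch {

    static Map<String, String> readRawCells(String sheetXml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(sheetXml));
        Map<String, String> cells = new LinkedHashMap<>();
        String currentRef = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("c".equals(name)) {
                        // Only the reference is read; the style attribute "s" is ignored,
                        // so no number or date formatting is applied.
                        currentRef = reader.getAttributeValue(null, "r");
                    } else if ("v".equals(name)) {
                        cells.put(currentRef, reader.getElementText());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return cells;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData><row r=\"1\">"
                + "<c r=\"A1\" s=\"1\"><v>43861</v></c>"
                + "<c r=\"B1\"><v>2.5</v></c>"
                + "</row></sheetData></worksheet>";
        System.out.println(readRawCells(xml));
    }
}
```

The trade-off is that callers get stored values rather than display strings, so any formatting that is actually needed has to be applied later, and only to the cells that need it.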
