Slow XSSFWorkbook and WorkbookFactory when reading xlsx files

若如初见. 提交于 2019-12-08 07:04:17

问题


I've seen developers have had this problem since a few years ago. I have studied many forums and the official POI documents. Nonetheless I haven't found an answer yet. So the problem is.. I have tried the following two snippets:

Workbook wb = WorkbookFactory.create(new File("spreadsheet.xlsx"));

and

File file = new File("C:\\spreadsheet.xlsx");
OPCPackage opcPackage = OPCPackage.open(file.getAbsolutePath());
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);

and either of the approaches takes about 5-6min (if the application doesn't run out of memory) to process a simple and fairly small spreadsheet.xlsx file (200KB).

What do I need to do to fix this? (I'm using Apache POI 3.9)

/*****************************/

The process takes a long time in the following location:

public class XSSFSheet extends POIXMLDocumentPart implements Sheet{
...
protected void read(InputStream is) throws IOException {
    try {
      -->>> worksheet = WorksheetDocument.Factory.parse(is).getWorksheet();
    } catch (XmlException e){
        throw new POIXMLException(e);
    }
}
...

I can't debug further. The VisualVM also says the same thing..!


回答1:


One factor that might be contributing to the load time is that the data has been pasted into the worksheet so that the used range includes every row, ie when you use the sheet.usedrange rows count it returns > 1,000,000 rows.. Not sure how this happens but I found that I needed to perform an intermediary step wherein prior to loading the workbook I 'cleaned' it by using some vba script. The workbook has around 20 sheets of around 5000 rows each, each of which are filled out by different parts o the business, and it takes a fairly long time (maybe 4 minutes) to load but that is acceptable in this case. Before I added the cleaning stage it ran for over 30 minutes, which was not acceptable....

A user runs the process I am referring to, bu pressing two buttons. The first cleans, the second does the rest. The first process is triggered using Runtime.getruntime.exec and creates an empty text file that the second process will not run unless the test file is there.



来源:https://stackoverflow.com/questions/22051802/slow-xssfworkbook-and-workbookfactory-when-reading-xlsx-files

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!