Apache POI Excel workbook creation taking a long time

我们两清 提交于 2019-12-10 16:15:56

问题


I've noticed that the workbook creation statement for xlsx files with Apache POI v3.10 e.g. `

Workbook wb = WorkbookFactory.create(inputStream) 

or

Workbook wb = new XSSFWorkbook(inputStream)

...is taking a long time (~30 seconds) and the file only has 72 rows with 10 columns (365KB).

It's not a problem, but it just seems a bit excessive. I'm wondering if I'm doing anything wrong or not doing something I should be doing. Instantiation of an xls file with the same data (but only 25KB) only takes 1 or 2 seconds. If this is normal, then could someone just let me know.

Edit:

This is the workbook creation code I'm using:

LOG.info("Loading Excel Workbook...");
Workbook workbook;
try {
    workbook = WorkbookFactory.create(dataStream);
} catch (InvalidFormatException e) {
    throw new IOException("Invalid file format ==> " + e.getMessage());
}
LOG.info("Workbook loaded.");

Just to be clear, dataStream is an InputStream. The 30 second delay occurs between the first and second log statements. As I said previously, I've tried replacing the factory with new XSSFWorkbook(dataStream) but the delay remains.

Edit-2:

I ran a standalone test which does nothing except the workbook initialization using 1) a File, and also 2) an InputStream where the source is the xlsx file I've been having trouble with. They both completed in ~2 seconds.

I should have added some background earlier. I'm using the Google App Engine. The input stream that I'm giving to POI is retrieved from a file upload to the server. App Engine doesn't support Servlet 3.0 (for handling file uploads) so I have to use Apache Commons FileUpload lib to retrieve the file data. Ultimately, the data I get is an InputStream retrieved from FileItemStream#openStream(). This is what I supply to POI.

So, I don't know if this is a problem with the App Engine, or if POI doesn't like the flavor of the InputStream that FileItemStream is returning. Incidentally, I cannot try the initialization with a File instead of a InputStream because App Engine doesn't allow writes to the file system.


回答1:


I would do some profiling using one of the available profiling tools, e.g. JVisualVM, Dynatrace, JProfiler, ..

Only then you know for sure where the time is spent in your code, it might be some unexpected place after all and you would be chasing after the wrong horse here.

I.e. you might receive the InputStream from somewhere else and it might be actually a download from some external content via the Internet and the line might be slow and thus all the reading just takes ages. Or it might be something with the disk-setup or memory shortage where lots of GC is running because you are near the limit, ...

One other option would be to extract the smallest possible snippet of code which reproduces this, then you can see what else you need to remove to make it run faster.




回答2:


Use csv format instead. We had the same issue using POI library and we are now downloading the csv file instead of an excel file, which is very light and is downloaded quickly.



来源:https://stackoverflow.com/questions/23330156/apache-poi-excel-workbook-creation-taking-a-long-time

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!