Get excel sheet names for large excel files using Apache POI

丶灬走出姿态 提交于 2019-12-17 21:32:25

问题


I have the following code that I use to get the sheet names of an excel file(.xlsx)

    XSSFWorkbook workBookXlsx = new XSSFWorkbook(new FileInputStream(pathToFile));
    ArrayList<String> sheetNames = new ArrayList<>();

    int numberOfSheets = workBookXlsx.getNumberOfSheets();
    for (int i = 0; i < numberOfSheets; i++) {
        sheetNames.add(workBookXlsx.getSheetAt(i).getSheetName());
    }

    workBookXlsx = null;

The issue I have with the above code is that it takes a lot of memory(~700MB) & a long time(5-6s) to create the XSSFWorkbook for a file of size 9MB. Even setting the workBookXlsx to null doesn't release the memory taken by the javaw(I know gc may or maynot be called & JVM wont release memory just because I have set a variable to null)

I did go through the documentation of Workbook, XSSFWorkbook & from what I understood, there is no method that will help me get the sheet names with low memory imprint.

The one solution I have found is to manually unzip the .xlsx file and read the contents of the .\xl\woorkbook.xml to get the sheet names and the r:id

Is there an API for getting the sheet names in an .xlsx file without large memory imprint?


回答1:


To show what @Gagravarr probably meant with his comment:

The XSSFReader contains a method XSSFReader.getSheetsData which "Returns an Iterator which will let you get at all the different Sheets in turn. Each sheet's InputStream is only opened when fetched from the Iterator. It's up to you to close the InputStreams when done with each one.". But as often this is not the whole truth. In truth it returns a XSSFReader.SheetIterator which has a method XSSFReader.SheetIterator.getSheetName to get the sheet names.

Example:

import java.io.InputStream;
import java.io.FileInputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;

import java.util.Iterator;

public class ExcelXSSFGetSheetNamesXSSFReader {

 public static void main(String[] args) throws Exception {

  OPCPackage pkg = OPCPackage.open(new FileInputStream("Example.xlsx"));
  XSSFReader r = new XSSFReader( pkg );
  Iterator<InputStream> sheets = r.getSheetsData();

  if (sheets instanceof XSSFReader.SheetIterator) {
   XSSFReader.SheetIterator sheetiterator = (XSSFReader.SheetIterator)sheets;

   while (sheetiterator.hasNext()) {
    InputStream dummy = sheetiterator.next();

    System.out.println(sheetiterator.getSheetName());

    dummy.close();
   }
  }

  pkg.close();
 }
}

Conclusion: Currently you cannot work with apache poi only by trusting the API documentation. Instead you must always have a look at the source code.



来源:https://stackoverflow.com/questions/47453102/get-excel-sheet-names-for-large-excel-files-using-apache-poi

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!