Excel Sheet POI Validation: Out Of Memory Error

强颜欢笑 posted on 2019-12-03 18:12:49

Question


I am trying to validate an Excel file using Java before dumping it into a database.

Here is the code snippet that causes the error.

    try {
        fis = new FileInputStream(file);
        wb = new XSSFWorkbook(fis);
        XSSFSheet sh = wb.getSheet("Sheet1");
        for (int i = 0; i < 44; i++) {
            XSSFCell a1 = sh.getRow(1).getCell(i);
            printXSSFCellType(a1);
        }
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

Here is the error I get:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(Unknown Source)
    at java.util.ArrayList.<init>(Unknown Source)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
    at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
    at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFCell.<init>(XSSFCell.java:105)
    at org.apache.poi.xssf.usermodel.XSSFRow.<init>(XSSFRow.java:70)
    at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:207)
    at com.xls.validate.ExcelValidator.main(ExcelValidator.java:79)

This works perfectly fine when the xlsx file is smaller than 1 MB.

I understand this happens because my xlsx file is around 5-10 MB and POI tries to load the entire sheet into JVM memory at once.

What would be a possible workaround?

Please help.

Thanks in Advance!


Answer 1:


There are two options available to you. Option #1 - increase the size of your JVM heap, so that Java has more memory available to it. Processing Excel files in POI using the UserModel code is DOM based, so the whole file (including its parsed form) needs to be buffered into memory. Try a question like this one for advice on how to increase the heap.
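Before raising the heap it can help to check how much the JVM is actually allowed to use. The following is a minimal sketch using only the JDK; the class name `HeapInfo` is made up for illustration.

```java
public class HeapInfo {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

The limit is set with the standard `-Xmx` option when the JVM is started, for example `java -Xmx2g com.xls.validate.ExcelValidator` (the class name is taken from the stack trace in the question; `2g` is only an example value).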

Option #2, which is more work - switch to event based (SAX) processing. This only processes part of the file at a time, so it needs much less memory. However, it requires more work from you, which is why you might be better off throwing a few more GB of memory at the problem - memory is cheap while programmers aren't! The SpreadSheet howto page has instructions on how to do SAX parsing of .xlsx files, and there are various example files provided by POI you can look at for advice.
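To give a feel for the event-based idea, here is a rough sketch that uses only the JDK's SAX parser rather than POI's own event API, so it is a stand-in for the technique and not the POI howto code. It relies on the fact that an .xlsx file is a ZIP package whose sheets are XML parts, and it assumes that each `row` element carries an `r` attribute. The class name `RowTypeScanner` and the sample XML are made up for illustration. It reports the type marker of each cell in one row, which is roughly what the loop over `sh.getRow(1)` in the question does (row index 1 is spreadsheet row 2).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RowTypeScanner {

    /**
     * Streams a worksheet XML part and returns "cellRef:type" for every cell
     * in the given 1-based row, without building a DOM of the sheet.
     */
    public static List<String> cellTypesInRow(InputStream sheetXml, int rowNumber) {
        List<String> found = new ArrayList<>();
        String wantedRow = String.valueOf(rowNumber);
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(sheetXml, new DefaultHandler() {
                private boolean inWantedRow;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes attrs) {
                    if ("row".equals(localName)) {
                        inWantedRow = wantedRow.equals(attrs.getValue("r"));
                    } else if (inWantedRow && "c".equals(localName)) {
                        // A missing "t" attribute means the default cell type, numeric ("n").
                        String type = attrs.getValue("t");
                        found.add(attrs.getValue("r") + ":" + (type == null ? "n" : type));
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if ("row".equals(localName)) {
                        inWantedRow = false;
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Could not scan sheet XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Tiny hand-written sheet part standing in for a real worksheet XML file.
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"s\"><v>0</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\" t=\"s\"><v>1</v></c><c r=\"B2\"><v>42</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(cellTypesInRow(in, 2));
    }
}
```

For a real workbook the stream would come from the sheet's entry inside the ZIP package (often `xl/worksheets/sheet1.xml`, though that path is an assumption and depends on the file). POI's event API additionally resolves shared strings and styles, which this sketch does not attempt.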

One more thing - you are opening the file through a FileInputStream, which costs extra memory because POI has to buffer the whole stream before it can parse it. See the POI Documentation for more on this, including instructions on how to work with the File directly.




Answer 2:


Use Event API (HSSF Only).

The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low level API structures. It's relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.




Answer 3:


Well, here's a link with some detailed info about your error, and how to fix it: http://javarevisited.blogspot.com/2011/09/javalangoutofmemoryerror-permgen-space.html?m=1.

Well, let me try to explain your error:

The java.lang.OutOfMemoryError has two common variants: one for the Java heap space, and the other for PermGen space.

Your error could be caused by a memory leak, a low amount of system RAM, or too little RAM allocated to the Java Virtual Machine.

The difference between the two variants is what each area stores. PermGen space holds class and method metadata and, in older JVMs, the pool of interned strings, whereas the Java heap holds the ordinary objects your program creates. So if your project loads a lot of classes or interns a lot of strings and PermGen is too small, you get the PermGen variant of the error; if your objects outgrow the heap, you get the heap space variant. The default PermGen size is often quoted as 64 MB, which is quite a small amount of memory. The linked article explains much more about this error and provides detailed information about how to fix it.
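To see which memory areas a particular JVM actually has, the standard `java.lang.management` API can list them. This is only a small sketch (the class name `MemoryPools` is made up for illustration). Pool names differ between JVM versions and garbage collectors; on Java 8 and later there is no PermGen pool at all, since it was replaced by Metaspace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {

    /** Returns the names of all memory pools of the given type (HEAP or NON_HEAP). */
    public static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```

An error message ending in "Java heap space", like the one in the question, points at the heap pools rather than the non-heap ones.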

Hope this helps!




Answer 4:


You can use the SXSSF workbook from POI for memory-related issues when writing large files. Refer here

I faced a similar issue while reading and merging multiple CSVs into a single XLSX file. I had a total of 3 CSV sheets, each with 30k rows, totalling 90k rows.

It was resolved by using SXSSF as below:

    // Needs Apache POI (SXSSFWorkbook, SXSSFSheet, Row, Cell) and a CSV library providing CSVReader;
    // `logger` is assumed to be a logger field of the enclosing class.
    public static void mergeCSVsToXLSX(Long jobExecutionId, Map<String, String> csvSheetNameAndFile, String xlsxFile) {
      // Keep 100 rows in memory; rows beyond that window are flushed to disk.
      try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) {
        wb.setCompressTempFiles(true); // compress the temporary files backing the workbook
        csvSheetNameAndFile.forEach((sheetName, csv) -> {
          try (CSVReader reader = new CSVReader(new FileReader(csv))) {
            SXSSFSheet sheet = wb.createSheet(sheetName);

            String[] nextLine;
            int r = 0;
            while ((nextLine = reader.readNext()) != null) {
              // No (short) cast here: it would overflow for sheets with more than 32767 rows.
              Row row = sheet.createRow(r++);
              for (int i = 0; i < nextLine.length; i++) {
                Cell cell = row.createCell(i);
                cell.setCellValue(nextLine[i]);
              }
            }
          } catch (IOException ioException) {
            logger.error("Error in reading CSV file {} for jobId {} with exception {}", csv, jobExecutionId,
                ioException.getMessage());
          }
        });

        // Close the output stream even if writing fails.
        try (FileOutputStream out = new FileOutputStream(xlsxFile)) {
          wb.write(out);
        }
        wb.dispose(); // delete the temporary files
      } catch (IOException ioException) {
        logger.error("Error in creating workbook for jobId {} with exception {}", jobExecutionId,
            ioException.getMessage());
      }
    }



Answer 5:


I too faced the same OOM issue while parsing an xlsx file. After two days of struggle, I finally found the code below, which worked perfectly.

This code is based on sjxlsx. It reads the xlsx file and stores it in an HSSF sheet.

    // Read the xlsx file with sjxlsx (unqualified Sheet, Cell and SheetRowReader are sjxlsx types)
    SimpleXLSXWorkbook workbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));

    HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
    org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();

    Sheet sheetToRead = workbook.getSheet(0, false);
    SheetRowReader reader = sheetToRead.newReader();
    Cell[] row;
    int rowPos = 0;
    // Copy the sheet row by row into the HSSF sheet
    while ((row = reader.readRow()) != null) {
        org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
        int cellPos = 0;
        for (Cell cell : row) {
            if (cell != null) {
                // Every non-empty cell is copied as a string cell
                org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
                hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
                hfsCell.setCellValue(cell.getValue());
            }
            cellPos++;
        }
        rowPos++;
    }
    return hsfSheet;


Source: https://stackoverflow.com/questions/18147585/excel-sheet-poi-validation-out-of-memory-error
