How to read specific rows using Apache POI Event API?

Posted by 爱⌒轻易说出口 on 2019-12-11 04:55:02

Question


I want to read a large xls or xlsx file (more than 30 MB, with 70,000+ rows). I could read small Excel files easily with Apache POI, but with larger files I get an OutOfMemory error.

Performance and memory usage are a concern for me. I read in many posts that, if memory footprint is an issue, for XSSF you can get at the underlying XML data and process it yourself using XSSF and SAX (Event API). I found that interesting, and I can now read an entire xlsx file without any issue. It uses far less memory (under 70 MB) than reading without the Event API, where usage climbed to about 1 GB with -Xmx set to 1024m and the program still hung.

But now I want to customize the read process so that only specific rows are read from an Excel file. Without the Event API I could easily do this using org.apache.poi.ss.usermodel.Sheet#getRow(int rownum). With the Event API, however, every row is read without interruption, and I find it difficult to read only specific rows, e.g. just row numbers 2, 3, 5, etc. Below is my entire code:

import java.io.InputStream;
import java.util.Iterator;
import java.util.Vector;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * XSSF and SAX (Event API)
 */
public class FromHowTo {
    public void processAllSheets(String filename) throws Exception {
        OPCPackage pkg = OPCPackage.open(filename);
        XSSFReader r = new XSSFReader( pkg );
        SharedStringsTable sst = r.getSharedStringsTable();

        XMLReader parser = fetchSheetParser(sst);

        Iterator<InputStream> sheets = r.getSheetsData();
        while(sheets.hasNext()) {
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
        }
    }

    public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    /** 
     * See org.xml.sax.helpers.DefaultHandler javadocs 
     */
    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;
        private final Vector<String> values = new Vector<String>(10);

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
            // c => cell

            if(name.equals("c")) {
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                //System.out.println(cellType);
                if(cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
            }
            // Clear contents cache
            lastContents = "";
        }

        public void endElement(String uri, String localName, String name) throws SAXException {
            // v => contents of a cell
            // Process here, not in characters(), as characters() may be called more than once
            if(name.equals("v")) {
                if(nextIsString) {
                    // The text of a shared-string cell is an index into the SST
                    int idx = Integer.parseInt(lastContents);
                    lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                    // Reset so later end tags (e.g. </c>) do not repeat the lookup
                    nextIsString = false;
                }
                values.add(lastContents);
            }

            // row => end of a row: output the collected cell values, then reset
            if(name.equals("row")) {
                System.out.println(values);
                values.removeAllElements();
            }
        }

        public void characters(char[] ch, int start, int length) throws SAXException {
            lastContents += new String(ch, start, length);
        }
    }

    public static void main(String[] args) throws Exception {
        FromHowTo howto = new FromHowTo();
        howto.processAllSheets(args[0]);
    }
}

I am using JRE7 with Apache POI 3.7. Can someone please help me get specific rows with the Event API?


Answer 1:


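Each row start element carries its (1-based) row number in the "r" attribute, and you can retrieve it from the attributes in startElement. Check that the element name is "row" first, because on a "c" element the "r" attribute holds the cell reference (such as "B2") instead.

long rowIndex = Long.parseLong(attributes.getValue("r"));

The event model still goes through all the rows, but once you have the index you can handle your data accordingly in endElement, keeping the rows you want and discarding the rest.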
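As a rough illustration of that idea, here is a minimal sketch that keeps only the requested rows. It is not code from the question or from POI: the names RowFilterDemo, RowFilterHandler, StopParsing, readRows and SAMPLE_SHEET are made up for this example, and it parses a small hand-written sheet XML with the JDK's own SAX parser so that it runs without POI on the classpath. It assumes every row element has an "r" attribute and that rows appear in ascending order (which is how Excel writes them), so the parse can be aborted by throwing an exception once the last wanted row has passed. The sample cells are plain numbers; for shared-string cells the SST lookup from the question's handler would still be needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RowFilterDemo {

    // Hand-written stand-in for a sheet part of an xlsx file (assumption, numeric cells only).
    static final String SAMPLE_SHEET =
          "<worksheet><sheetData>"
        + "<row r=\"1\"><c r=\"A1\"><v>10</v></c></row>"
        + "<row r=\"2\"><c r=\"A2\"><v>20</v></c><c r=\"B2\"><v>21</v></c></row>"
        + "<row r=\"3\"><c r=\"A3\"><v>30</v></c></row>"
        + "<row r=\"4\"><c r=\"A4\"><v>40</v></c></row>"
        + "<row r=\"5\"><c r=\"A5\"><v>50</v></c></row>"
        + "<row r=\"6\"><c r=\"A6\"><v>60</v></c></row>"
        + "</sheetData></worksheet>";

    /** Hypothetical marker exception, used only to abort the parse early. */
    static class StopParsing extends SAXException {
        StopParsing() { super("all wanted rows have been read"); }
    }

    /** Collects the <v> values of the wanted rows, keyed by 1-based row number. */
    static class RowFilterHandler extends DefaultHandler {
        private final TreeSet<Long> wanted;
        final Map<Long, List<String>> rows = new LinkedHashMap<>();
        private final StringBuilder text = new StringBuilder();
        private List<String> current; // null while inside a row that is skipped

        RowFilterHandler(Set<Long> wantedRows) {
            this.wanted = new TreeSet<>(wantedRows);
        }

        @Override
        public void startElement(String uri, String localName, String name, Attributes attrs)
                throws SAXException {
            if (name.equals("row")) {
                long rowIndex = Long.parseLong(attrs.getValue("r"));
                // Assumes ascending row order: past the largest wanted row, nothing is left.
                if (wanted.isEmpty() || rowIndex > wanted.last()) {
                    throw new StopParsing();
                }
                current = wanted.contains(rowIndex) ? new ArrayList<String>() : null;
                if (current != null) {
                    rows.put(rowIndex, current);
                }
            }
            text.setLength(0); // clear the text buffer at every start tag
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) {
                text.append(ch, start, length); // may be called several times per value
            }
        }

        @Override
        public void endElement(String uri, String localName, String name) {
            if (name.equals("v") && current != null) {
                current.add(text.toString());
            }
            if (name.equals("row")) {
                current = null;
            }
        }
    }

    static Map<Long, List<String>> readRows(String sheetXml, Set<Long> wanted) throws Exception {
        RowFilterHandler handler = new RowFilterHandler(wanted);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(sheetXml)), handler);
        } catch (StopParsing done) {
            // Expected: the handler stopped the parse after the last wanted row.
        }
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        Set<Long> wanted = new TreeSet<>(Arrays.asList(2L, 3L, 5L));
        System.out.println(readRows(SAMPLE_SHEET, wanted)); // {2=[20, 21], 3=[30], 5=[50]}
    }
}
```

In the question's code the same checks would go into SheetHandler, with the sheet InputStream from XSSFReader passed to the parser in place of the sample string. Skipped rows are still tokenized by the parser, so the saving is in the work done per row, plus whatever the early stop avoids at the end of the sheet.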



Source: https://stackoverflow.com/questions/10313716/how-to-read-specific-rows-using-apache-poi-event-api
