Question
I want to read a large xls or xlsx file (more than 30 MB, with 70,000+ rows). I could read small Excel files easily using Apache POI, but with large files I get an OutOfMemory error.
Performance and memory usage are a concern for me. I read in many posts that if memory footprint is an issue, then for XSSF you can get at the underlying XML data and process it yourself using XSSF and SAX (Event API). I tried this and can now read an entire xlsx file without any issue. It consumed much less memory (less than 70 MB), compared to almost 1 GB without the Event API (usage went up to 1 GB with -Xmx set to 1024m, and the program still hung).
But now I want to customize the read process and read only specific rows from an Excel file. Without the Event API I could easily do this using org.apache.poi.ss.usermodel.Sheet#getRow(int rownum). But the Event API reads all the rows without any interruption, and I find it difficult to read specific rows, e.g. just row numbers 2, 3, 5, etc. Below is my entire code:
import java.io.InputStream;
import java.util.Iterator;
import java.util.Vector;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * XSSF and SAX (Event API)
 */
public class FromHowTo {

    public void processAllSheets(String filename) throws Exception {
        OPCPackage pkg = OPCPackage.open(filename);
        XSSFReader r = new XSSFReader( pkg );
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);
        Iterator<InputStream> sheets = r.getSheetsData();
        while(sheets.hasNext()) {
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
        }
    }

    public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        ContentHandler handler = new SheetHandler(sst);
        parser.setContentHandler(handler);
        return parser;
    }

    /**
     * See org.xml.sax.helpers.DefaultHandler javadocs
     */
    private static class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;
        Vector values = new Vector(10);

        private SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
            // c => cell
            if(name.equals("c")) {
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                //System.out.println(cellType);
                if(cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
            }
            // Clear contents cache
            lastContents = "";
        }

        public void endElement(String uri, String localName, String name) throws SAXException {
            // Process the last contents as required.
            // Do now, as characters() may be called more than once
            if(nextIsString) {
                try {
                    int idx = Integer.parseInt(lastContents);
                    lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                } catch (NumberFormatException e) {
                }
            }
            // v => contents of a cell
            // Output after we've seen the string contents
            if(name.equals("v")) {
                values.add(lastContents);
            }
            if(name.equals("row")) {
                System.out.println(values);
                values.removeAllElements();
            }
        }

        public void characters(char[] ch, int start, int length) throws SAXException {
            lastContents += new String(ch, start, length);
        }
    }

    public static void main(String[] args) throws Exception {
        FromHowTo howto = new FromHowTo();
        howto.processAllSheets(args[0]);
    }
}
I am using JRE 7 with Apache POI 3.7. Can someone please help me get specific rows with the Event API?
Answer 1:
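Each row start element carries its row number in the r attribute. You can read it from the attributes passed to startElement when name is "row" (cell elements also have an r attribute, but there it holds a cell reference such as A1):
long rowIndex = Long.parseLong(attributes.getValue("r"));
The event model will still go through all rows, but you can get the index and handle or skip the data accordingly in endElement.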
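To make the idea concrete, here is a minimal sketch of a handler that remembers the current row number and keeps cell values only for a chosen set of rows. It is not the code from the question. The names RowFilterSketch, RowFilterHandler and readRows are made up for illustration, it parses a sheet XML string with the JDK's own SAX parser so that it runs without POI, and it stores the raw text of each v element (the shared string lookup from the question's handler is left out). It also assumes every row element has an r attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RowFilterSketch {

    /** Hypothetical handler: keeps raw <v> values only for the requested 1-based row numbers. */
    static class RowFilterHandler extends DefaultHandler {
        private final Set<Integer> wantedRows;
        private final Map<Integer, List<String>> result = new LinkedHashMap<>();
        private final StringBuilder contents = new StringBuilder();
        private boolean keepRow;
        private int currentRow;

        RowFilterHandler(Set<Integer> wantedRows) {
            this.wantedRows = wantedRows;
        }

        Map<Integer, List<String>> getResult() {
            return result;
        }

        @Override
        public void startElement(String uri, String localName, String name, Attributes attributes) {
            if (name.equals("row")) {
                // Assumption: the row element carries an "r" attribute with its row number.
                currentRow = Integer.parseInt(attributes.getValue("r"));
                keepRow = wantedRows.contains(currentRow);
                if (keepRow) {
                    result.put(currentRow, new ArrayList<String>());
                }
            }
            // Clear the text buffer at the start of every element.
            contents.setLength(0);
        }

        @Override
        public void endElement(String uri, String localName, String name) {
            if (keepRow && name.equals("v")) {
                result.get(currentRow).add(contents.toString());
            }
            if (name.equals("row")) {
                keepRow = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called more than once per element, so append.
            if (keepRow) {
                contents.append(ch, start, length);
            }
        }
    }

    /** Parses the given sheet XML and returns the values of the wanted rows, keyed by row number. */
    static Map<Integer, List<String>> readRows(String sheetXml, Set<Integer> wantedRows) throws Exception {
        RowFilterHandler handler = new RowFilterHandler(wantedRows);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(sheetXml)), handler);
        return handler.getResult();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>h1</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>a</v></c><c r=\"B2\"><v>b</v></c></row>"
                + "<row r=\"3\"><c r=\"A3\"><v>c</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.println(readRows(xml, new HashSet<>(Arrays.asList(2, 3, 5))));
    }
}
```

In the question's code the same check would go into SheetHandler: set a flag in startElement when name is "row", and only add to values or print in endElement while that flag is set. The parser still visits every row, so this saves work in the handler but not the cost of scanning the sheet.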
Source: https://stackoverflow.com/questions/10313716/how-to-read-specific-rows-using-apache-poi-event-api