On September 28, 2009, the Apache POI project released version 3.5, which officially supports the OOXML formats introduced in Office 2007, such as DOCX and XLSX.
This is more generic, since ExtractorFactory picks an extractor that matches the input format:
POITextExtractor poitex = ExtractorFactory.createExtractor(in);
return poitex.getText();
This worked for me. Make sure you add the required jars (upgrade xmlbeans, etc.)
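import java.io.InputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Reads a .docx stream and returns its plain text.
public String extractText(InputStream in) throws Exception {
    XWPFDocument doc = new XWPFDocument(in);
    XWPFWordExtractor ex = new XWPFWordExtractor(doc);
    String text = ex.getText();
    return text;
}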
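
For the required jars mentioned above, one option is to let a build tool resolve them. This is only a sketch, assuming the project uses Maven (the answer does not say which build tool, if any). The poi-ooxml artifact should pull in the core poi jar and xmlbeans as transitive dependencies. The poi.version property is a placeholder that you would define in your own POM for whichever POI release you use.

```xml
<!-- Assumption: a Maven build. poi.version is a placeholder property, not a real version number. -->
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>${poi.version}</version>
</dependency>
```

If you manage jars by hand instead, the same set applies: poi, poi-ooxml, the OOXML schemas jar, and a matching xmlbeans.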