How to extract plain text from a DOCX file using the new OOXML support in Apache POI 3.5?

时光取名叫无心 2021-01-01 21:08

On September 28, 2009, the Apache POI project released version 3.5, which officially supports the OOXML formats introduced in Office 2007, such as DOCX and XLSX.

Please p

2 Answers
  • 2021-01-01 21:45

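    This is more generic: ExtractorFactory inspects the input and returns an extractor that matches the file type, so the same code handles DOCX as well as the older binary formats.

    // in: an InputStream over the document
    POITextExtractor poitex = ExtractorFactory.createExtractor(in);

    return poitex.getText();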

  • 2021-01-01 21:58

    This worked for me. Make sure you add the required jars (upgrade xmlbeans, etc.)

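    import java.io.InputStream;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public String extractText(InputStream in) throws Exception {
        // Parse the DOCX package, then let the extractor collect its text
        XWPFDocument doc = new XWPFDocument(in);
        XWPFWordExtractor ex = new XWPFWordExtractor(doc);
        return ex.getText();
    }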
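
For anyone curious what an extractor like this has to do, here is a rough sketch that uses only the JDK and not POI. It assumes a DOCX is a ZIP archive whose main body usually lives in word/document.xml, with the visible text in w:t elements grouped into w:p paragraphs. The class name DocxTextSketch is made up for illustration. It ignores headers, footers, footnotes, tabs and line breaks, and it would repeat text from paragraphs nested inside text boxes, so treat it as an illustration of the file layout and not a replacement for XWPFWordExtractor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
public class DocxTextSketch {
    // Namespace of WordprocessingML elements such as w:p and w:t.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumption: the main document part is stored as word/document.xml.
    public static String extractText(InputStream in) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return textOf(readAll(zip));
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a DOCX file?");
    }

    // Copy the current ZIP entry into memory so the XML parser cannot close the archive stream.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Concatenate the w:t runs of each w:p paragraph, one paragraph per line.
    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the file may come from an untrusted source.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling DocxTextSketch.extractText(new FileInputStream("some.docx")) would return the body text with one line per paragraph, where the file name is just a placeholder.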
    