How to extract plain text from a DOCX file using the new OOXML support in Apache POI 3.5?

时光取名叫无心 2021-01-01 21:08

On September 28, 2009, the Apache POI project released version 3.5, which officially supports the OOXML formats introduced in Office 2007, such as DOCX and XLSX.

Please p

2 answers
  •  庸人自扰
    2021-01-01 21:58

    This worked for me. Make sure you add the required JARs (upgrade XMLBeans, etc.).

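    import java.io.InputStream;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    // Reads a .docx stream and returns its plain text.
    public String extractText(InputStream in) throws Exception {
        XWPFDocument doc = new XWPFDocument(in);  // parse the OOXML package
        XWPFWordExtractor ex = new XWPFWordExtractor(doc);
        return ex.getText();
    }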
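
If adding the extra JARs is a problem, it may help to see roughly what such an extraction involves. A DOCX file is a ZIP archive whose main body lives in `word/document.xml`, with paragraphs in `w:p` elements and text in `w:t` elements. The sketch below uses only the JDK and is not how POI implements `XWPFWordExtractor`. The class name `DocxTextSketch` is hypothetical, and the code assumes a simple document: it ignores headers, footers, footnotes, tabs and line breaks, and text nested in text boxes may be emitted more than once.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Apache POI.
final class DocxTextSketch {
    // WordprocessingML namespace used by w:p (paragraph) and w:t (text).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extractText(InputStream in) throws Exception {
        // Step 1: locate word/document.xml inside the ZIP and copy its bytes.
        byte[] xmlBytes = null;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    xmlBytes = buf.toByteArray();
                    break;
                }
            }
        }
        if (xmlBytes == null) {
            throw new IllegalArgumentException("no word/document.xml entry found");
        }

        // Step 2: parse the XML with namespace support enabled.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document xml = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));

        // Step 3: join the w:t text of each w:p, one paragraph per line.
        StringBuilder text = new StringBuilder();
        NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```

Calling `DocxTextSketch.extractText(new FileInputStream("report.docx"))` (the file name is just an example) returns the body text with a newline after each paragraph. For anything beyond simple documents, the POI extractor above is the safer choice.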
    
