How to extract plain text from a DOCX file using the new OOXML support in Apache POI 3.5?

帅比萌擦擦* submitted on 2019-11-30 11:39:01

This worked for me. Make sure the required jars are on the classpath: the OOXML support needs poi-ooxml and its dependencies, so you may have to upgrade xmlbeans, etc.

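import java.io.InputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Reads a .docx stream and returns its text content.
public String extractText(InputStream in) throws Exception {
    XWPFDocument doc = new XWPFDocument(in);
    XWPFWordExtractor ex = new XWPFWordExtractor(doc);
    return ex.getText();
}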

A more generic approach lets POI detect the file format and choose the matching extractor:

// Works for .doc, .docx and other formats POI has an extractor for.
POITextExtractor poitex = ExtractorFactory.createExtractor(in);
return poitex.getText();
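
If you cannot add the POI jars, or just want to see roughly what such an extractor has to do, here is a minimal sketch using only the JDK. It is not POI's implementation. It relies on two facts about the format: a .docx file is a ZIP archive whose main body lives in word/document.xml, and the visible text sits in w:t elements grouped under w:p paragraphs. The class name DocxPlainText and the method extract are made up for this example. It ignores headers, footers, footnotes, tabs and line breaks, and text inside text boxes may show up twice, so treat it as an illustration rather than a replacement for XWPFWordExtractor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Apache POI.
final class DocxPlainText {
    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extract(InputStream in) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return textOf(readEntry(zip));
                }
            }
        }
        throw new IOException("word/document.xml not found; not a DOCX file?");
    }

    // Copies the current ZIP entry into memory so the XML parser
    // cannot close the underlying ZIP stream.
    private static byte[] readEntry(InputStream zip) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = zip.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Concatenates the w:t text of each w:p paragraph, one paragraph per line.
    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input may be untrusted.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder sb = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Calling DocxPlainText.extract(new FileInputStream("report.docx")) (the file name is a placeholder) returns the body text with a newline after each paragraph. For anything beyond a quick look at the text, the POI extractors above are the better choice.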
