Convert Word to HTML with Apache POI

孤街浪徒 2020-12-01 02:26

I see that there is a converter called WordToHtmlConverter but the process method is not exposed. How should I pass a doc file and get HTML file (or Outp

1 Answer
  • 2020-12-01 03:13

    This code is now working for me!

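        import java.io.ByteArrayOutputStream;
        import java.io.FileInputStream;
        import java.io.InputStream;
        import java.nio.charset.StandardCharsets;
        import javax.xml.parsers.DocumentBuilderFactory;
        import javax.xml.transform.OutputKeys;
        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.dom.DOMSource;
        import javax.xml.transform.stream.StreamResult;
        import org.apache.poi.hwpf.HWPFDocumentCore;
        import org.apache.poi.hwpf.converter.WordToHtmlConverter;
        import org.apache.poi.hwpf.converter.WordToHtmlUtils;
        import org.w3c.dom.Document;
    
        // Load the binary .doc file; try-with-resources closes the stream.
        HWPFDocumentCore wordDocument;
        try (InputStream in = new FileInputStream("D:\\temp\\seo\\1.doc")) {
            wordDocument = WordToHtmlUtils.loadDoc(in);
        }
    
        // Convert into an empty DOM document that receives the HTML tree.
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .newDocument());
        wordToHtmlConverter.processDocument(wordDocument);
        Document htmlDocument = wordToHtmlConverter.getDocument();
    
        // Serialize the DOM tree to HTML bytes in memory.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
    
        // Decode with the same charset the serializer wrote, not the platform default.
        String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(result);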
    