Using WordToHtmlConverter in Apache POI

北战南征 posted on 2019-12-11 15:25:43

Question


I am trying to use the WordToHtmlConverter class to convert a Word document to HTML, but the documentation is not clear.

WordToHtmlConverter has a constructor that takes an org.w3c.dom.Document, but I don't think that is the Word document itself.

Does anyone have a sample program that loads a Word document and converts it to HTML?


Answer 1:


Your best bet for now is probably to look at the unit tests, e.g. TestWordToHtmlConverter. They show how to do it.

In general, though, you pass in an empty XML (DOM) document to be populated, have WordToHtmlConverter generate the HTML into it from the Word document, then transform that DOM document into the output you want (indentation, new lines, etc.). So the org.w3c.dom.Document passed to the constructor is the output target, not the Word file.

Your code would want to look something like:

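    // Classes used: org.apache.poi.hwpf.HWPFDocument,
    // org.apache.poi.hwpf.converter.WordToHtmlConverter, org.w3c.dom.Document,
    // java.io.*, javax.xml.parsers.*, javax.xml.transform.* (plus .dom and .stream)

    // Load the Word file; "input.doc" is a placeholder path (binary .doc format)
    HWPFDocument hwpfDocument;
    try ( InputStream in = new FileInputStream( "input.doc" ) ) {
        hwpfDocument = new HWPFDocument( in );
    }

    // Empty DOM document that the converter will populate with HTML nodes
    Document newDocument = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            newDocument );

    // Walk the Word document and build the HTML tree inside newDocument
    wordToHtmlConverter.processDocument( hwpfDocument );

    // Serialize the DOM tree to an HTML string
    StringWriter stringWriter = new StringWriter();
    Transformer transformer = TransformerFactory.newInstance()
            .newTransformer();
    transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
    transformer.setOutputProperty( OutputKeys.ENCODING, "utf-8" );
    transformer.setOutputProperty( OutputKeys.METHOD, "html" );
    transformer.transform(
            new DOMSource( wordToHtmlConverter.getDocument() ),
            new StreamResult( stringWriter ) );

    String html = stringWriter.toString();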
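
The final serialization step does not depend on POI, so it can be tried on its own. The following is a minimal sketch using only JDK classes: a small hand-built DOM tree stands in for the document that WordToHtmlConverter would populate. The class name DomToHtmlSketch and the helper names buildSampleDocument and toHtml are made up for illustration and are not part of POI.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only (not part of Apache POI).
final class DomToHtmlSketch {

    // Builds a tiny DOM tree standing in for what the converter would populate.
    static Document buildSampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello from a DOM tree");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    // Serializes a DOM document to a string using the "html" output method,
    // with the same output properties as the answer's snippet.
    static String toHtml(Document doc) throws TransformerException {
        StringWriter out = new StringWriter();
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(buildSampleDocument()));
    }
}
```

With POI on the classpath, the same toHtml helper could presumably be given wordToHtmlConverter.getDocument() instead of the sample tree.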


Source: https://stackoverflow.com/questions/8242407/using-wordtohtmlconverter-converter-in-apache-poi
