How to use Tika's XWPFWordExtractorDecorator class?

Posted by ◇◆丶佛笑我妖孽 on 2019-12-09 23:37:01

Question


Someone told me that Tika's XWPFWordExtractorDecorator class is used to convert docx into HTML, but I am not sure how to use this class to get the HTML from a docx file. Suggestions for any other library that does the same job are also appreciated.


Answer 1:


You shouldn't use it directly.

Instead, call Tika in the usual way, and it will call the appropriate code for you.

If you want XHTML from parsing a file, the code looks something like this:

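    import java.io.File;
    import java.io.InputStream;
    import java.io.StringWriter;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;

    // Either of these will work, the latter is recommended
    //InputStream input = new FileInputStream("test.docx");
    InputStream input = TikaInputStream.get(new File("test.docx"));

    // AutoDetect is normally best, unless you know the best parser for the type
    Parser parser = new AutoDetectParser();

    // Handler that serializes the parser's SAX events as indented XHTML
    StringWriter sw = new StringWriter();
    SAXTransformerFactory factory = (SAXTransformerFactory)
             SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    handler.setResult(new StreamResult(sw));

    // Call the Tika Parser, closing the input even if parsing fails
    try {
        Metadata metadata = new Metadata();
        parser.parse(input, handler, metadata, new ParseContext());
    } finally {
        input.close();
    }

    // The StringWriter now holds the XHTML produced from the document
    String xml = sw.toString();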
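
To see what the handler part of that code does on its own, here is a minimal sketch that uses only JDK classes and no Tika at all. It sets up the same TransformerHandler, then feeds it a few SAX events by hand, roughly standing in for the events a Tika parser would emit. The class name XhtmlHandlerDemo, the render method, and the tiny html/body/p document are made up for illustration; the exact whitespace of the indented output may vary between JDK versions.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class: shows the serializing handler in isolation.
class XhtmlHandlerDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Builds a one-paragraph XHTML document from hand-written SAX events
    // and returns the serialized text.
    static String render(String text) throws Exception {
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory = (SAXTransformerFactory)
                SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(sw));

        // These calls stand in for what a parser would send to the handler.
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", XHTML);
        handler.startElement(XHTML, "html", "html", noAttrs);
        handler.startElement(XHTML, "body", "body", noAttrs);
        handler.startElement(XHTML, "p", "p", noAttrs);
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();

        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Hello from a hand-written SAX stream"));
    }
}
```

If HTML rather than XHTML serialization is wanted, setting OutputKeys.METHOD to "html" instead of "xml" is the standard JAXP switch for that; whether the result suits a given docx is something to check against real output.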


Source: https://stackoverflow.com/questions/9051183/how-to-use-tikas-xwpfwordextractordecorator-class
