How to edit MS Word documents using Java?

野趣味 2020-12-21 20:43

I have a few Word templates, and my requirement is to replace some of the words/placeholders in the document based on user input, using Java. I tried a lot of libraries

4 answers
  • For docx (a zip containing XML and other files), one may use a Java zip file system together with XML or text processing.

    URI docxUri = ,,, // "jar:file:/C:/... .docx"
    Map<String, String> zipProperties = new HashMap<>();
    zipProperties.put("encoding", "UTF-8");
    try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
        Path documentXmlPath = zipFS.getPath("/word/document.xml");
    

    When using XML:

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(Files.newInputStream(documentXmlPath));
        //Element root = doc.getDocumentElement();
    

    You can then use XPath to find the places, and write the XML back again.
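
    The XPath step could look roughly like the sketch below. It is only an illustration, not the answer's own code: the class name DocxXPathReplace and the method replaceInTextRuns are made up for this example, and it assumes the placeholder sits entirely inside a single w:t element. It works on the XML as a string so it can be tried without a real .docx.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.StringWriter;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    // Hypothetical helper, not part of any library.
    class DocxXPathReplace {

        // Namespace of the WordprocessingML "w" prefix used in word/document.xml.
        static final String W_NS =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        /** Replaces placeholder with value in every w:t text element and returns the new XML. */
        static String replaceInTextRuns(String documentXml, String placeholder, String value)
                throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));

            // local-name()/namespace-uri() avoid having to register a NamespaceContext.
            String expr = "//*[local-name()='t' and namespace-uri()='" + W_NS + "']";
            NodeList textElements = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);

            for (int i = 0; i < textElements.getLength(); i++) {
                Node t = textElements.item(i);
                String text = t.getTextContent();
                if (text.contains(placeholder)) {
                    // No manual escaping here: the serializer escapes & and < on output.
                    t.setTextContent(text.replace(placeholder, value));
                }
            }

            // Write the DOM back out as XML.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        }
    }
    ```

    Inside the zip file system, the same Transformer call can write straight to the entry by passing new StreamResult(Files.newOutputStream(documentXmlPath)) instead of a StringWriter.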

    It might even be that you do not need XML processing at all and can simply replace the placeholders in the text:

        byte[] content = Files.readAllBytes(documentXmlPath);
        String xml = new String(content, StandardCharsets.UTF_8);
        xml = xml.replace("#DATE#", "2014-09-24");
        xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));
        ...
        content = xml.getBytes(StandardCharsets.UTF_8);
        Files.delete(documentXmlPath);
        Files.write(documentXmlPath, content);
    

    For fast development, rename a copy of the .docx so that it has the .zip file extension, and inspect the files inside.

    Files.write should already apply StandardOpenOption.TRUNCATE_EXISTING, but I have added Files.delete because some error occurred. See comments.
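
    Put together, the text-replacement variant might look like the following sketch. The class DocxPlaceholders and its methods fill and escapeXml are hypothetical names for this example; escapeXml is a small stand-in for StringEscapeUtils.escapeXml so that the snippet needs only the JDK, and the document path is taken from the command line because the original path is elided.

    ```java
    import java.io.IOException;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.FileSystem;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical helper, not part of any library.
    class DocxPlaceholders {

        /** Escapes the five XML special characters (stand-in for StringEscapeUtils.escapeXml). */
        static String escapeXml(String text) {
            return text.replace("&", "&amp;")   // must come first
                       .replace("<", "&lt;")
                       .replace(">", "&gt;")
                       .replace("\"", "&quot;")
                       .replace("'", "&apos;");
        }

        /** Replaces each key of values found in /word/document.xml with its escaped value. */
        static void fill(Path docx, Map<String, String> values) throws IOException {
            URI docxUri = URI.create("jar:" + docx.toUri());
            Map<String, String> zipProperties = new HashMap<>();
            zipProperties.put("encoding", "UTF-8");

            // The changes reach the .docx only when the FileSystem is closed,
            // which try-with-resources does at the end of this block.
            try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
                Path documentXmlPath = zipFS.getPath("/word/document.xml");
                String xml = new String(Files.readAllBytes(documentXmlPath), StandardCharsets.UTF_8);
                for (Map.Entry<String, String> entry : values.entrySet()) {
                    xml = xml.replace(entry.getKey(), escapeXml(entry.getValue()));
                }
                // Default options are CREATE, TRUNCATE_EXISTING and WRITE.
                Files.write(documentXmlPath, xml.getBytes(StandardCharsets.UTF_8));
            }
        }

        public static void main(String[] args) throws IOException {
            Map<String, String> values = new HashMap<>();
            values.put("#DATE#", "2014-09-24");
            values.put("#NAME#", "Sniper");
            fill(Paths.get(args[0]), values); // args[0]: path to a copy of the template
        }
    }
    ```

    This only works when the placeholder is stored as one contiguous string in document.xml. Word sometimes splits typed text across several runs, so it is worth checking the XML of the template (see the .zip tip above) before relying on plain string replacement.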

  • 2020-12-21 21:26

    Try Apache POI. POI can work with both doc and docx, but docx is better documented and therefore better supported.

    UPD: You can use XDocReport, which uses POI. I also recommend using xlsx for templates, because it is more suitable and better documented.

  • 2020-12-21 21:27

    I spent a few days on this issue before I found that what makes the difference is the try-with-resources on the FileSystem instance, which appears in Joop Eggen's snippet but not in the question's snippet:

        try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties))

    Without such a try-with-resources block, the FileSystem resource is not closed (as explained in the Java tutorial), and the Word document is not modified.

  • 2020-12-21 21:28

    Stepping back a bit, there are about 4 different approaches for editing words/placeholders:

    • MERGEFIELD or DOCPROPERTY fields (if you are having problems with this in docx4j, then you have probably not set up your input docx correctly)
    • content control databinding
    • variable replacement on the document surface (either at the DOM/SAX level, or using a library)
    • do stuff as XHTML, then import that

    Before choosing one, you should decide whether you also need to be able to handle:

    • repeating data (eg adding table rows)
    • conditional content (eg entire paragraphs which will either be present or absent)
    • adding images

    If you need these, then MERGEFIELD or DOCPROPERTY fields are probably out (though you can also use IF fields, if you can find a library which supports them). And adding images makes DOM/SAX manipulation, as advocated in one of the other answers, messier and more error-prone.

    The other things to consider are:

    • your authors: how technical are they? What does that imply for the authoring UI?
    • the "user input" you mention for variable replacement, is this given, or is obtaining it part of the problem you are solving?