I do have few Word templates, and my requirement is to replace some of the words/place holders in the document based on the user input, using Java. I tried lot of libraries
One may use for docx (a zip with XML and other files) a java zip file system and XML or text processing.
URI docxUri = ,,, // "jar:file:/C:/... .docx"
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
Path documentXmlPath = zipFS.getPath("/word/document.xml");
When using XML:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Files.newInputStream(documentXmlPath));
//Element root = doc.getDocumentElement();
You can then use XPath to find the places, and write the XML back again.
It even might be that you do not need XML but could replace place holders:
byte[] content = Files.readAllBytes(documentXmlPath);
String xml = new String(content, StandardCharsets.UTF_8);
xml = xml.replace("#DATE#", "2014-09-24");
xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper")));
...
content = xml.getBytes(StandardCharsets.UTF_8);
Files.delete(documentXmlPath);
Files.write(documentXmlPath, content);
For a fast development, rename a copy of the .docx to a name with the .zip file extension, and inspect the files.
File.write
should already apply StandardOpenOption.TRUNCATE_EXISTING, but I have added Files.delete
as some error occured. See comments.
Try Apache POI. POI
can work with doc
and docx
, but docx
is more documented therefore support of it better.
UPD: You can use XDocReport, which use POI. Also I recomend to use xlsx
for templates because it more suitable and more documented
I have spent a few days on this issue, until I found that what makes the difference is the try-with-resources
on the FileSystem instance, appearing in Joop Eggen's snippet but not in question snippet:
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties))
Without such try-with-resources
block, the FileSystem
resource will not be closed (as explained in Java tutorial), and the word document not modified.
Stepping back a bit, there are about 4 different approaches for editing words/placeholders:
Before choosing one, you should decide whether you also need to be able to handle:
If you need these, then MERGEFIELD or DOCPROPERTY fields are probably out (though you can also use IF fields, if you can find a library which supports them). And adding images makes DOM/SAX manipulation as advocated in one of the other answers, messier and error prone.
The other things to consider are: