I just found Apache POI library very useful for editing Word files using Java. Specifically, I want to edit a DOCX file using Apache POI\'s XWPF classes. I
The answer accepted here needs one more update along with Justin Skiles update. r.setText(text, 0); Reason: If not updating setText with pos variable, the output will be the combination of old string and replace string.
I suggest my solution for replacing text between #, for example: This #bookmark# should be replaced. It is replace in:
Also, it takes into account situations, when symbol # and bookmark are in the separated runs (replace variable between different runs).
Here link to the code: https://gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda
There is the replaceParagraph
implementation that replaces ${key}
with value
(the fieldsForReport
parameter) and saves format by merging runs
contents ${key}
.
private void replaceParagraph(XWPFParagraph paragraph, Map<String, String> fieldsForReport) throws POIXMLException {
String find, text, runsText;
List<XWPFRun> runs;
XWPFRun run, nextRun;
for (String key : fieldsForReport.keySet()) {
text = paragraph.getText();
if (!text.contains("${"))
return;
find = "${" + key + "}";
if (!text.contains(find))
continue;
runs = paragraph.getRuns();
for (int i = 0; i < runs.size(); i++) {
run = runs.get(i);
runsText = run.getText(0);
if (runsText.contains("${") || (runsText.contains("$") && runs.get(i + 1).getText(0).substring(0, 1).equals("{"))) {
//As the next run may has a closed tag and an open tag at
//the same time, we have to be sure that our building string
//has a fully completed tags
while (!openTagCountIsEqualCloseTagCount(runsText))) {
nextRun = runs.get(i + 1);
runsText = runsText + nextRun.getText(0);
paragraph.removeRun(i + 1);
}
run.setText(runsText.contains(find) ?
runsText.replace(find, fieldsForReport.get(key)) :
runsText, 0);
}
}
}
}
private boolean openTagCountIsEqualCloseTagCount(String runText) {
int openTagCount = runText.split("\\$\\{", -1).length - 1;
int closeTagCount = runText.split("}", -1).length - 1;
return openTagCount == closeTagCount;
}
Implementation replaceParagraph
Unit test
The method you need is XWPFRun.setText(String). Simply work your way through the file until you find the XWPFRun of interest, work out what you want the new text to be, and replace it. (A run is a sequence of text with the same formatting)
You should be able to do something like:
XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));
for (XWPFParagraph p : doc.getParagraphs()) {
List<XWPFRun> runs = p.getRuns();
if (runs != null) {
for (XWPFRun r : runs) {
String text = r.getText(0);
if (text != null && text.contains("needle")) {
text = text.replace("needle", "haystack");
r.setText(text, 0);
}
}
}
}
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
if (text != null && text.contains("needle")) {
text = text.replace("needle", "haystack");
r.setText(text,0);
}
}
}
}
}
}
doc.write(new FileOutputStream("output.docx"));
my task was to replace texts of the format ${key} with values of a map within a word docx document. The above solutions were a good starting point but did not take into account all cases: ${key} can be spread not only across multiple runs but also across multiple texts within a run. I therefore ended up with the following code:
private void replace(String inFile, Map<String, String> data, OutputStream out) throws Exception, IOException {
XWPFDocument doc = new XWPFDocument(OPCPackage.open(inFile));
for (XWPFParagraph p : doc.getParagraphs()) {
replace2(p, data);
}
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
replace2(p, data);
}
}
}
}
doc.write(out);
}
private void replace2(XWPFParagraph p, Map<String, String> data) {
String pText = p.getText(); // complete paragraph as string
if (pText.contains("${")) { // if paragraph does not include our pattern, ignore
TreeMap<Integer, XWPFRun> posRuns = getPosToRuns(p);
Pattern pat = Pattern.compile("\\$\\{(.+?)\\}");
Matcher m = pat.matcher(pText);
while (m.find()) { // for all patterns in the paragraph
String g = m.group(1); // extract key start and end pos
int s = m.start(1);
int e = m.end(1);
String key = g;
String x = data.get(key);
if (x == null)
x = "";
SortedMap<Integer, XWPFRun> range = posRuns.subMap(s - 2, true, e + 1, true); // get runs which contain the pattern
boolean found1 = false; // found $
boolean found2 = false; // found {
boolean found3 = false; // found }
XWPFRun prevRun = null; // previous run handled in the loop
XWPFRun found2Run = null; // run in which { was found
int found2Pos = -1; // pos of { within above run
for (XWPFRun r : range.values())
{
if (r == prevRun)
continue; // this run has already been handled
if (found3)
break; // done working on current key pattern
prevRun = r;
for (int k = 0;; k++) { // iterate over texts of run r
if (found3)
break;
String txt = null;
try {
txt = r.getText(k); // note: should return null, but throws exception if the text does not exist
} catch (Exception ex) {
}
if (txt == null)
break; // no more texts in the run, exit loop
if (txt.contains("$") && !found1) { // found $, replace it with value from data map
txt = txt.replaceFirst("\\$", x);
found1 = true;
}
if (txt.contains("{") && !found2 && found1) {
found2Run = r; // found { replace it with empty string and remember location
found2Pos = txt.indexOf('{');
txt = txt.replaceFirst("\\{", "");
found2 = true;
}
if (found1 && found2 && !found3) { // find } and set all chars between { and } to blank
if (txt.contains("}"))
{
if (r == found2Run)
{ // complete pattern was within a single run
txt = txt.substring(0, found2Pos)+txt.substring(txt.indexOf('}'));
}
else // pattern spread across multiple runs
txt = txt.substring(txt.indexOf('}'));
}
else if (r == found2Run) // same run as { but no }, remove all text starting at {
txt = txt.substring(0, found2Pos);
else
txt = ""; // run between { and }, set text to blank
}
if (txt.contains("}") && !found3) {
txt = txt.replaceFirst("\\}", "");
found3 = true;
}
r.setText(txt, k);
}
}
}
System.out.println(p.getText());
}
}
private TreeMap<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
int pos = 0;
TreeMap<Integer, XWPFRun> map = new TreeMap<Integer, XWPFRun>();
for (XWPFRun run : paragraph.getRuns()) {
String runText = run.text();
if (runText != null && runText.length() > 0) {
for (int i = 0; i < runText.length(); i++) {
map.put(pos + i, run);
}
pos += runText.length();
}
}
return map;
}
As of the date of writing, none of the answers replace properly.
Gagravars answer does not include cases where words to replace are split in runs; Thierry Boduins solution sometimes left words to replace blank when they were after other words to replace, also it does not check tables.
Using Gagtavars answer as base I have also checked the run before current run if the text of both runs contain the word to replace, adding else block. My addition in kotlin:
if (text != null) {
if (text.contains(findText)) {
text = text.replace(findText, replaceText)
r.setText(text, 0)
} else if (i > 0 && p.runs[i - 1].getText(0).plus(text).contains(findText)) {
val pos = p.runs[i - 1].getText(0).indexOf('$')
text = textOfNotFullSecondRun(text, findText)
r.setText(text, 0)
val findTextLengthInFirstRun = findTextPartInFirstRun(p.runs[i - 1].getText(0), findText)
val prevRunText = p.runs[i - 1].getText(0).replaceRange(pos, findTextLengthInFirstRun, replaceText)
p.runs[i - 1].setText(prevRunText, 0)
}
}
private fun textOfNotFullSecondRun(text: String, findText: String): String {
return if (!text.contains(findText)) {
textOfNotFullSecondRun(text, findText.drop(1))
} else {
text.replace(findText, "")
}
}
private fun findTextPartInFirstRun(text: String, findText: String): Int {
return if (text.contains(findText)) {
findText.length
} else {
findTextPartInFirstRun(text, findText.dropLast(1))
}
}
it is the list of runs in a paragraph. Same with the search block in the table. With this solution I did not have any issues yet. All formatting is intact.
Edit: I made a java lib for replacing, check it out: https://github.com/deividasstr/docx-word-replacer