Replacing a text in Apache POI XWPF

前端未结


I just found the Apache POI library very useful for editing Word files using Java. Specifically, I want to edit a DOCX file using Apache POI's XWPF classes.


10 Answers

刺人心

2020-11-29 18:33

The accepted answer needs one more update, on top of Justin Skiles' update: call r.setText(text, 0). Reason: if setText is called without the pos argument, the new text is appended, so the output is the old string followed by the replacement string.
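To illustrate that reason, here is a toy model in plain Java, not Apache POI itself. It assumes that a run holds a list of text elements, that the one-argument setText adds a new element, and that setText(text, 0) overwrites the first one. ToyRun and its methods are hypothetical stand-ins for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a run: a list of text elements (hypothetical, not the POI class).
class ToyRun {
    private final List<String> texts = new ArrayList<>();

    // Assumed to mirror setText(String): adds a new text element at the end.
    void setText(String value) {
        texts.add(value);
    }

    // Assumed to mirror setText(String, int): overwrites the element at pos,
    // or adds one when pos equals the current number of elements.
    void setText(String value, int pos) {
        if (pos == texts.size()) {
            texts.add(value);
        } else {
            texts.set(pos, value);
        }
    }

    // Concatenation of all text elements, i.e. what a reader of the document sees.
    String text() {
        return String.join("", texts);
    }
}
```

In this model, calling setText("new") on a run that already holds "old" leaves both elements in place, while setText("new", 0) leaves only the new text.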

陌清茗

2020-11-29 18:36
I suggest my solution for replacing text between # characters, for example: This #bookmark# should be replaced. It replaces text in:
- paragraphs;
- tables;
- footers.
It also handles the case where the # symbol and the bookmark name sit in separate runs (a variable split across different runs).

Here is a link to the code: https://gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda
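As a rough illustration of the #bookmark# idea (this is not the code from the gist), the sketch below replaces #name# tokens in a plain string, such as the joined text of a paragraph. The class HashTokenSketch, the method replaceTokens, and the rule that a name contains no whitespace are assumptions made for this example; writing the result back into runs is left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashTokenSketch {
    // Matches #name# where name contains no '#' and no whitespace (an assumption).
    private static final Pattern TOKEN = Pattern.compile("#([^#\\s]+)#");

    // Replaces each #name# whose name is a key in values; unknown tokens are kept.
    static String replaceTokens(String text, Map<String, String> values) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a map from "bookmark" to "text", the example sentence above becomes "This text should be replaced."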

挽巷

2020-11-29 18:38

Here is a replaceParagraph implementation that replaces ${key} with its value (taken from the fieldsForReport parameter) and preserves formatting by merging the runs that together contain ${key}.

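import java.util.List;
import java.util.Map;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

private void replaceParagraph(XWPFParagraph paragraph, Map<String, String> fieldsForReport) {
    for (Map.Entry<String, String> field : fieldsForReport.entrySet()) {
        String text = paragraph.getText();
        if (!text.contains("${"))
            return; // no placeholders left in this paragraph
        String find = "${" + field.getKey() + "}";
        if (!text.contains(find))
            continue;
        List<XWPFRun> runs = paragraph.getRuns();
        for (int i = 0; i < runs.size(); i++) {
            XWPFRun run = runs.get(i);
            String runsText = textOf(run);
            // The opening tag itself may be split: "$" ends this run, "{" starts the next
            boolean splitOpenTag = runsText.endsWith("$") && i + 1 < runs.size()
                    && textOf(runs.get(i + 1)).startsWith("{");
            if (runsText.contains("${") || splitOpenTag) {
                // As the next run may have a closing tag and an opening tag at
                // the same time, keep merging until every opened tag is closed
                // (or until the paragraph has no more runs)
                while (i + 1 < runs.size()
                        && (splitOpenTag || !openTagCountIsEqualCloseTagCount(runsText))) {
                    runsText = runsText + textOf(runs.get(i + 1));
                    paragraph.removeRun(i + 1);
                    splitOpenTag = false;
                }
                // replace() returns the text unchanged when find is not in it
                run.setText(runsText.replace(find, field.getValue()), 0);
            }
        }
    }
}

// getText(0) returns null for a run without a text element
private String textOf(XWPFRun run) {
    String t = run.getText(0);
    return t == null ? "" : t;
}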

private boolean openTagCountIsEqualCloseTagCount(String runText) {
    int openTagCount = runText.split("\\$\\{", -1).length - 1;
    int closeTagCount = runText.split("}", -1).length - 1;
    return openTagCount == closeTagCount;
}
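The merging step can be tried out without a Word document. The sketch below models a paragraph as a list of run texts and joins neighbours until no ${...} placeholder straddles two entries. RunMergeSketch and its methods are hypothetical names, and instead of the tag counting above it uses a last-index check (the last "${" has no "}" after it), which is a swapped-in variant. In a real document the merged text would take on the formatting of the first run.

```java
import java.util.ArrayList;
import java.util.List;

class RunMergeSketch {
    // True when current must absorb the next run text before it can be searched.
    static boolean needsNext(String current, String next) {
        // The last "${" has no "}" after it, so a placeholder is still open
        boolean placeholderOpen = current.lastIndexOf("${") > current.lastIndexOf("}");
        // The opening tag itself is split: "$" ends this text, "{" starts the next
        boolean splitOpenTag = current.endsWith("$") && next.startsWith("{");
        return placeholderOpen || splitOpenTag;
    }

    // Joins neighbouring run texts so that no placeholder spans two entries.
    static List<String> mergeSplitPlaceholders(List<String> runTexts) {
        List<String> merged = new ArrayList<>();
        int i = 0;
        while (i < runTexts.size()) {
            String current = runTexts.get(i++);
            while (i < runTexts.size() && needsNext(current, runTexts.get(i))) {
                current += runTexts.get(i++);
            }
            merged.add(current);
        }
        return merged;
    }
}
```

For the run texts "Hello $", "{na", "me}", "!" this yields two entries, "Hello ${name}" and "!", after which a plain String.replace on each entry is enough.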



名媛妹妹

2020-11-29 18:43

The method you need is XWPFRun.setText(String, int). Simply work your way through the file until you find the XWPFRun of interest, work out what you want the new text to be, and replace it. (A run is a sequence of text with the same formatting.)

You should be able to do something like:

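import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));
// Paragraphs in the document body
for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            if (text != null && text.contains("needle")) {
                text = text.replace("needle", "haystack");
                r.setText(text, 0);
            }
        }
    }
}
// Paragraphs inside table cells
for (XWPFTable tbl : doc.getTables()) {
    for (XWPFTableRow row : tbl.getRows()) {
        for (XWPFTableCell cell : row.getTableCells()) {
            for (XWPFParagraph p : cell.getParagraphs()) {
                for (XWPFRun r : p.getRuns()) {
                    String text = r.getText(0);
                    if (text != null && text.contains("needle")) {
                        text = text.replace("needle", "haystack");
                        r.setText(text, 0);
                    }
                }
            }
        }
    }
}
// Close the output stream once the document has been written
try (OutputStream out = new FileOutputStream("output.docx")) {
    doc.write(out);
}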


无人及你

2020-11-29 18:44

My task was to replace text of the form ${key} with values from a map in a Word DOCX document. The above solutions were a good starting point but did not cover all cases: ${key} can be spread not only across multiple runs but also across multiple text elements within a run. I therefore ended up with the following code:

private void replace(String inFile, Map<String, String> data, OutputStream out) throws Exception {
    XWPFDocument doc = new XWPFDocument(OPCPackage.open(inFile));
    for (XWPFParagraph p : doc.getParagraphs()) {
        replace2(p, data);
    }
    for (XWPFTable tbl : doc.getTables()) {
        for (XWPFTableRow row : tbl.getRows()) {
            for (XWPFTableCell cell : row.getTableCells()) {
                for (XWPFParagraph p : cell.getParagraphs()) {
                    replace2(p, data);
                }
            }
        }
    }
    doc.write(out);
}

private void replace2(XWPFParagraph p, Map<String, String> data) {
    String pText = p.getText(); // complete paragraph as string
    if (pText.contains("${")) { // if paragraph does not include our pattern, ignore
        TreeMap<Integer, XWPFRun> posRuns = getPosToRuns(p);
        Pattern pat = Pattern.compile("\\$\\{(.+?)\\}");
        Matcher m = pat.matcher(pText);
        while (m.find()) { // for all patterns in the paragraph
            String g = m.group(1);  // extract key start and end pos
            int s = m.start(1);
            int e = m.end(1);
            String key = g;
            String x = data.get(key);
            if (x == null)
                x = "";
            SortedMap<Integer, XWPFRun> range = posRuns.subMap(s - 2, true, e + 1, true); // get runs which contain the pattern
            boolean found1 = false; // found $
            boolean found2 = false; // found {
            boolean found3 = false; // found }
            XWPFRun prevRun = null; // previous run handled in the loop
            XWPFRun found2Run = null; // run in which { was found
            int found2Pos = -1; // pos of { within above run
            for (XWPFRun r : range.values())
            {
                if (r == prevRun)
                    continue; // this run has already been handled
                if (found3)
                    break; // done working on current key pattern
                prevRun = r;
                for (int k = 0;; k++) { // iterate over texts of run r
                    if (found3)
                        break;
                    String txt = null;
                    try {
                        txt = r.getText(k); // note: should return null, but throws exception if the text does not exist
                    } catch (Exception ex) {
                        // no text element at index k; txt stays null and the loop ends below
                    }
                    if (txt == null)
                        break; // no more texts in the run, exit loop
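                    if (txt.contains("$") && !found1) {  // found $, replace it with value from data map
                        // quoteReplacement keeps '$' and '\' in the value from being read as regex replacement syntax
                        txt = txt.replaceFirst("\\$", Matcher.quoteReplacement(x));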
                        found1 = true;
                    }
                    if (txt.contains("{") && !found2 && found1) {
                        found2Run = r; // found { replace it with empty string and remember location
                        found2Pos = txt.indexOf('{');
                        txt = txt.replaceFirst("\\{", "");
                        found2 = true;
                    }
                    if (found1 && found2 && !found3) { // find } and set all chars between { and } to blank
                        if (txt.contains("}"))
                        {
                            if (r == found2Run)
                            { // complete pattern was within a single run
                                txt = txt.substring(0, found2Pos)+txt.substring(txt.indexOf('}'));
                            }
                            else // pattern spread across multiple runs
                                txt = txt.substring(txt.indexOf('}'));
                        }
                        else if (r == found2Run) // same run as { but no }, remove all text starting at {
                            txt = txt.substring(0,  found2Pos);
                        else
                            txt = ""; // run between { and }, set text to blank
                    }
                    if (txt.contains("}") && !found3) {
                        txt = txt.replaceFirst("\\}", "");
                        found3 = true;
                    }
                    r.setText(txt, k);
                }
            }
        }
        System.out.println(p.getText());

    }

}

private TreeMap<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
    int pos = 0;
    TreeMap<Integer, XWPFRun> map = new TreeMap<Integer, XWPFRun>();
    for (XWPFRun run : paragraph.getRuns()) {
        String runText = run.text();
        if (runText != null && runText.length() > 0) {
            for (int i = 0; i < runText.length(); i++) {
                map.put(pos + i, run);
            }
            pos += runText.length();
        }

    }
    return map;
}
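The position-to-run lookup can be explored on plain strings. The sketch below is a variant of getPosToRuns, not the same structure: it stores only the start offset of each run text instead of one entry per character, and uses floorKey plus subMap to find the runs that overlap a match. RunOffsetSketch and its method names are hypothetical, and the offsets are assumed to lie within the joined paragraph text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

class RunOffsetSketch {
    // Maps the start offset of each non-empty run text (within the joined text) to its index.
    static TreeMap<Integer, Integer> startOffsets(List<String> runTexts) {
        TreeMap<Integer, Integer> starts = new TreeMap<>();
        int pos = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            String t = runTexts.get(i);
            if (!t.isEmpty()) {
                starts.put(pos, i);
                pos += t.length();
            }
        }
        return starts;
    }

    // Returns the indices of the runs that overlap the range [from, to) of the joined text.
    static List<Integer> runsCovering(TreeMap<Integer, Integer> starts, int from, int to) {
        Integer first = starts.floorKey(from); // start of the run that contains 'from'
        if (first == null || from >= to) {
            return Collections.emptyList();
        }
        return new ArrayList<>(starts.subMap(first, true, to, false).values());
    }
}
```

For the run texts "Dear $", "{na", "me}", ", hi" the placeholder ${name} occupies offsets 5 to 12 of the joined text, and runsCovering returns the first three run indices.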


醉梦人生

2020-11-29 18:48

As of the date of writing, none of the answers replace properly.

Gagravarr's answer does not handle cases where the words to replace are split across runs; Thierry Boduin's solution sometimes left words to replace blank when they came after other words to replace, and it does not check tables.

Using Gagravarr's answer as a base, I also check the run before the current run, in case the text of both runs together contains the word to replace, by adding an else block. My addition in Kotlin:

if (text != null) {
    if (text.contains(findText)) {
        text = text.replace(findText, replaceText)
        r.setText(text, 0)
    } else if (i > 0 && p.runs[i - 1].getText(0).plus(text).contains(findText)) {
        val pos = p.runs[i - 1].getText(0).indexOf('$')
        text = textOfNotFullSecondRun(text, findText)
        r.setText(text, 0)
        val findTextLengthInFirstRun = findTextPartInFirstRun(p.runs[i - 1].getText(0), findText)
        // replaceRange takes an end index, not a length, so offset the length by pos
        val prevRunText = p.runs[i - 1].getText(0).replaceRange(pos, pos + findTextLengthInFirstRun, replaceText)
        p.runs[i - 1].setText(prevRunText, 0)
    }
}

private fun textOfNotFullSecondRun(text: String, findText: String): String {
    return if (!text.contains(findText)) {
        textOfNotFullSecondRun(text, findText.drop(1))
    } else {
        text.replace(findText, "")
    }
}

private fun findTextPartInFirstRun(text: String, findText: String): Int {
    return if (text.contains(findText)) {
        findText.length
    } else {
        findTextPartInFirstRun(text, findText.dropLast(1))
    }
}

This runs over the list of runs in a paragraph; the search block for tables works the same way. With this solution I have not had any issues so far, and all formatting stays intact.

Edit: I made a Java library for replacing; check it out: https://github.com/deividasstr/docx-word-replacer
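For readers who want the two-run case in Java, here is a minimal sketch on plain strings. It is stricter than the Kotlin helpers above: it requires the start of the search text to sit at the very end of the previous run text and the remainder at the very start of the current one. TwoRunSplitSketch, splitLength, and replaceAcross are hypothetical names, and matches spanning three or more runs are not covered.

```java
class TwoRunSplitSketch {
    // Returns how many leading characters of find sit at the end of prev when the
    // rest of find opens cur, or -1 when find does not straddle the two texts.
    static int splitLength(String prev, String cur, String find) {
        for (int k = 1; k < find.length(); k++) {
            if (prev.endsWith(find.substring(0, k)) && cur.startsWith(find.substring(k))) {
                return k;
            }
        }
        return -1;
    }

    // Returns the two texts after replacement, or the inputs unchanged if nothing straddles.
    static String[] replaceAcross(String prev, String cur, String find, String replacement) {
        int k = splitLength(prev, cur, find);
        if (k < 0) {
            return new String[] {prev, cur};
        }
        // The replacement goes into the previous text, so it keeps that run's formatting
        String newPrev = prev.substring(0, prev.length() - k) + replacement;
        String newCur = cur.substring(find.length() - k);
        return new String[] {newPrev, newCur};
    }
}
```

Calling replaceAcross("Hello ${na", "me}!", "${name}", "Bob") gives "Hello Bob" for the previous run and "!" for the current one.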
