Replacing text in Apache POI XWPF

鱼传尺愫 2020-11-29 18:18

I just found the Apache POI library very useful for editing Word files using Java. Specifically, I want to edit a DOCX file using Apache POI's XWPF classes. I …

10 answers
  • 2020-11-29 18:33

    The accepted answer needs one more update, in addition to Justin Skiles's update: call r.setText(text, 0) with the position argument. Reason: without the pos argument, setText appends a new text to the run, so the output is the old string followed by the replacement string.
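    Here is a stdlib-only sketch of the difference. It assumes only the behaviour described above: a run holds a list of text elements, the one-argument setText adds a new element, and setText(text, pos) overwrites the element at pos. MiniRun is a hypothetical stand-in and not a POI class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for XWPFRun: models only the run's list of text elements.
final class MiniRun {
    private final List<String> texts = new ArrayList<>();

    MiniRun(String initial) {
        texts.add(initial);
    }

    // Mimics the one-argument setText: adds a new text element at the end.
    void setText(String value) {
        texts.add(value);
    }

    // Mimics setText(value, pos): overwrites element pos, or appends when pos == size.
    // List.set throws IndexOutOfBoundsException if pos is beyond that.
    void setText(String value, int pos) {
        if (pos == texts.size()) {
            texts.add(value);
        } else {
            texts.set(pos, value);
        }
    }

    // The visible text of the run is the concatenation of all its text elements.
    String text() {
        return String.join("", texts);
    }
}
```

    Starting from a run containing "Hello ${name}", setText(replaced) leaves "Hello ${name}Hello Bob", while setText(replaced, 0) leaves "Hello Bob".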

  • 2020-11-29 18:36

    Here is my solution for replacing text between # characters, for example: This #bookmark# should be replaced. It replaces text in:

    • paragraphs;
    • tables;
    • footers.

    It also handles cases where the # symbol and the bookmark name are in separate runs, i.e. it replaces a variable that is split across different runs.

    Here is a link to the code: https://gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda
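    The linked gist is not reproduced here. The following is a rough, stdlib-only sketch of the general cross-run technique, and it is not the gist's code. Each run is treated as a plain string. The sketch finds #key# markers in the joined paragraph text, writes the value into the first run the marker touches, and deletes the marker's remaining characters from the following runs. In POI this would let each run keep its own formatting. HashMarkerReplacer and the #key# regex (the key contains no #) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashMarkerReplacer {
    // Assumed marker syntax: #key#, where the key itself contains no '#'.
    private static final Pattern MARKER = Pattern.compile("#([^#]+)#");

    // Each input string stands for the text of one run, in paragraph order.
    static List<String> replace(List<String> runs, Map<String, String> values) {
        StringBuilder joined = new StringBuilder();
        List<StringBuilder> out = new ArrayList<>();
        for (String run : runs) {
            joined.append(run);
            out.add(new StringBuilder(run));
        }
        // Record every marker in paragraph coordinates before changing anything.
        List<int[]> spans = new ArrayList<>();
        List<String> keys = new ArrayList<>();
        Matcher m = MARKER.matcher(joined);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
            keys.add(m.group(1));
        }
        // Work right to left: an edit never shifts characters of markers further left,
        // so offsets computed from the original run lengths stay valid.
        for (int j = spans.size() - 1; j >= 0; j--) {
            String value = values.get(keys.get(j));
            if (value == null) {
                continue; // unknown key: leave the marker untouched
            }
            int start = spans.get(j)[0];
            int end = spans.get(j)[1];
            int offset = 0;
            boolean placed = false;
            for (int i = 0; i < runs.size(); i++) {
                int len = runs.get(i).length(); // original length of run i
                int from = Math.max(start, offset);
                int to = Math.min(end, offset + len);
                if (from < to) {
                    // The first run touched receives the value; later runs only lose characters.
                    out.get(i).replace(from - offset, to - offset, placed ? "" : value);
                    placed = true;
                }
                offset += len;
            }
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : out) {
            result.add(sb.toString());
        }
        return result;
    }
}
```

    With the runs "This #book", "mark# should be " and "replaced", the result is "This value", " should be ", "replaced". A run emptied by the replacement stays as an empty string rather than being removed.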

  • 2020-11-29 18:38

    Here is a replaceParagraph implementation that replaces each ${key} with its value from the fieldsForReport map. It keeps the formatting of the run where ${key} starts by merging the contents of all runs that together make up ${key}.

    import java.util.List;
    import java.util.Map;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;
    import org.apache.poi.xwpf.usermodel.XWPFRun;

    private void replaceParagraph(XWPFParagraph paragraph, Map<String, String> fieldsForReport) {
        for (Map.Entry<String, String> field : fieldsForReport.entrySet()) {
            String text = paragraph.getText();
            if (!text.contains("${"))
                return; // no placeholders left in this paragraph
            String find = "${" + field.getKey() + "}";
            if (!text.contains(find))
                continue;
            // getRuns() is a read-only view, so it reflects the removeRun() calls below
            List<XWPFRun> runs = paragraph.getRuns();
            for (int i = 0; i < runs.size(); i++) {
                XWPFRun run = runs.get(i);
                String runsText = run.getText(0);
                if (runsText == null)
                    continue;
                String nextText = i + 1 < runs.size() ? runs.get(i + 1).getText(0) : null;
                // "$" at the end of this run and "{" at the start of the next one
                boolean dollarSplit = runsText.endsWith("$") && nextText != null && nextText.startsWith("{");
                if (runsText.contains("${") || dollarSplit) {
                    // As the next run may have a closing tag and an opening tag at
                    // the same time, keep merging following runs into this one until
                    // the built string contains only complete tags
                    boolean mergeOnce = dollarSplit; // a lone "$" already counts as balanced
                    while (i + 1 < runs.size() && (mergeOnce || !openTagCountIsEqualCloseTagCount(runsText))) {
                        String next = runs.get(i + 1).getText(0);
                        runsText = runsText + (next == null ? "" : next);
                        paragraph.removeRun(i + 1);
                        mergeOnce = false;
                    }
                    run.setText(runsText.replace(find, field.getValue()), 0);
                }
            }
        }
    }

    private boolean openTagCountIsEqualCloseTagCount(String runText) {
        int openTagCount = runText.split("\\$\\{", -1).length - 1;
        int closeTagCount = runText.split("\\}", -1).length - 1;
        return openTagCount == closeTagCount;
    }
    

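    To try the merging step without a .docx file, here is a stdlib-only model of the same idea, with each run's text as a plain string. TagMerger is a hypothetical name. Like replaceParagraph, it starts at a run containing "${" (or a run ending in "$" whose next run starts with "{"). It then merges the following runs into that run until the opening and closing tag counts match, and finally replaces the key inside the merged text.

```java
import java.util.ArrayList;
import java.util.List;

final class TagMerger {
    // Counts "${" and "}" the same way openTagCountIsEqualCloseTagCount does.
    static boolean balanced(String s) {
        int open = s.split("\\$\\{", -1).length - 1;
        int close = s.split("\\}", -1).length - 1;
        return open == close;
    }

    static List<String> replace(List<String> input, String key, String value) {
        List<String> runs = new ArrayList<>(input);
        String find = "${" + key + "}";
        for (int i = 0; i < runs.size(); i++) {
            String text = runs.get(i);
            boolean dollarSplit = text.endsWith("$") && i + 1 < runs.size()
                    && runs.get(i + 1).startsWith("{");
            if (!text.contains("${") && !dollarSplit) {
                continue;
            }
            boolean mergeOnce = dollarSplit;
            // Pull following runs into run i until every "${" has its "}".
            while (i + 1 < runs.size() && (mergeOnce || !balanced(text))) {
                text = text + runs.remove(i + 1);
                mergeOnce = false;
            }
            runs.set(i, text.replace(find, value));
        }
        return runs;
    }
}
```

    For the runs "Dear $", "{na", "me}" and ", hello", the first three runs are merged into "Dear Ann" and ", hello" is left alone. As with the POI version, the merged run keeps only the first run's formatting.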

  • 2020-11-29 18:43

    The method you need is XWPFRun.setText(String, int). Work your way through the file until you find the XWPFRun of interest, work out what you want the new text to be, and replace it. Pass position 0 so the run's first text is overwritten; the one-argument setText(String) appends instead. (A run is a sequence of text with the same formatting.)

    You should be able to do something like:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.poi.xwpf.usermodel.*;

    // inside a method that declares throws IOException
    try (InputStream in = new FileInputStream("input.docx");
         XWPFDocument doc = new XWPFDocument(in);
         OutputStream out = new FileOutputStream("output.docx")) {
        // paragraphs in the document body
        for (XWPFParagraph p : doc.getParagraphs()) {
            for (XWPFRun r : p.getRuns()) {
                String text = r.getText(0);
                if (text != null && text.contains("needle")) {
                    // position 0 overwrites the run's first text instead of appending
                    r.setText(text.replace("needle", "haystack"), 0);
                }
            }
        }
        // paragraphs inside table cells
        for (XWPFTable tbl : doc.getTables()) {
            for (XWPFTableRow row : tbl.getRows()) {
                for (XWPFTableCell cell : row.getTableCells()) {
                    for (XWPFParagraph p : cell.getParagraphs()) {
                        for (XWPFRun r : p.getRuns()) {
                            String text = r.getText(0);
                            if (text != null && text.contains("needle")) {
                                r.setText(text.replace("needle", "haystack"), 0);
                            }
                        }
                    }
                }
            }
        }
        doc.write(out);
    }
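    As the other answers in this thread point out, the text you search for can be split across runs. Here is a stdlib-only model of the per-run loop above, with each run's text as a plain string (PerRunReplace is a hypothetical name). It shows the consequence: a needle stored in one run is replaced, but the same needle split over two runs is not, because neither run contains the whole word.

```java
import java.util.ArrayList;
import java.util.List;

// Models the per-run loop with plain strings: one string per run.
final class PerRunReplace {
    static List<String> replaceEachRun(List<String> runs, String needle, String replacement) {
        List<String> out = new ArrayList<>();
        for (String run : runs) {
            // Each run is checked on its own, like r.getText(0).contains(needle).
            out.add(run.contains(needle) ? run.replace(needle, replacement) : run);
        }
        return out;
    }
}
```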
    
  • 2020-11-29 18:44

    My task was to replace texts of the form ${key} with values from a map in a Word .docx document. The solutions above were a good starting point but did not cover all cases: ${key} can be spread not only across multiple runs but also across multiple texts within a run. I therefore ended up with the following code:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Map;
    import java.util.SortedMap;
    import java.util.TreeMap;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.poi.xwpf.usermodel.*;

    private void replace(String inFile, Map<String, String> data, OutputStream out) throws IOException {
        try (InputStream in = new FileInputStream(inFile);
             XWPFDocument doc = new XWPFDocument(in)) {
            for (XWPFParagraph p : doc.getParagraphs()) {
                replace2(p, data);
            }
            for (XWPFTable tbl : doc.getTables()) {
                for (XWPFTableRow row : tbl.getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph p : cell.getParagraphs()) {
                            replace2(p, data);
                        }
                    }
                }
            }
            doc.write(out);
        }
    }

    private void replace2(XWPFParagraph p, Map<String, String> data) {
        String pText = p.getText(); // complete paragraph as string
        if (!pText.contains("${"))
            return; // paragraph does not include our pattern, ignore
        TreeMap<Integer, XWPFRun> posRuns = getPosToRuns(p);
        Pattern pat = Pattern.compile("\\$\\{(.+?)\\}");
        Matcher m = pat.matcher(pText);
        while (m.find()) { // for all patterns in the paragraph
            String key = m.group(1);
            int s = m.start(1); // key start; "$" is at s - 2
            int e = m.end(1);   // key end; "}" is at e
            String x = data.getOrDefault(key, "");
            // runs which contain the pattern, from "$" to "}"
            SortedMap<Integer, XWPFRun> range = posRuns.subMap(s - 2, true, e, true);
            boolean found1 = false;   // found $
            boolean found2 = false;   // found {
            boolean found3 = false;   // found }
            XWPFRun prevRun = null;   // previous run handled in the loop
            XWPFRun found2Run = null; // run in which { was found
            int found2Text = -1;      // index of the text within that run
            int found2Pos = -1;       // position of { within that text
            for (XWPFRun r : range.values()) {
                if (r == prevRun)
                    continue; // this run has already been handled
                if (found3)
                    break; // done working on current key pattern
                prevRun = r;
                // iterate over the texts of run r; bounding k by sizeOfTArray()
                // avoids the exception getText(k) throws for a missing index
                int textCount = r.getCTR().sizeOfTArray();
                for (int k = 0; k < textCount && !found3; k++) {
                    String txt = r.getText(k);
                    if (txt == null)
                        continue;
                    int from = 0; // where to scan for "{", skips an inserted value
                    if (!found1 && txt.indexOf('$') >= 0) { // replace $ with the value
                        int d = txt.indexOf('$');
                        // plain concatenation: the value is not treated as a regex replacement
                        txt = txt.substring(0, d) + x + txt.substring(d + 1);
                        from = d + x.length();
                        found1 = true;
                    }
                    if (found1 && !found2 && txt.indexOf('{', from) >= 0) {
                        // remove { and remember its location
                        found2Run = r;
                        found2Text = k;
                        found2Pos = txt.indexOf('{', from);
                        txt = txt.substring(0, found2Pos) + txt.substring(found2Pos + 1);
                        found2 = true;
                    }
                    if (found2) {
                        // blank all chars between { and }, then remove } itself
                        boolean sameText = r == found2Run && k == found2Text;
                        int start = sameText ? found2Pos : 0;
                        int close = txt.indexOf('}', start);
                        if (close >= 0) {
                            txt = txt.substring(0, start) + txt.substring(close + 1);
                            found3 = true;
                        } else {
                            txt = txt.substring(0, start); // key continues in the next text or run
                        }
                    }
                    r.setText(txt, k);
                }
            }
        }
    }

    private TreeMap<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
        int pos = 0;
        TreeMap<Integer, XWPFRun> map = new TreeMap<>();
        for (XWPFRun run : paragraph.getRuns()) {
            String runText = run.text();
            if (runText != null && runText.length() > 0) {
                for (int i = 0; i < runText.length(); i++) {
                    map.put(pos + i, run); // character position -> run holding it
                }
                pos += runText.length();
            }
        }
        return map;
    }
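    The getPosToRuns/subMap idea can be tried without POI. In this sketch, PosToRuns is a hypothetical name and run indices stand in for XWPFRun objects. Every character position of the joined paragraph text maps to the run that holds it, and TreeMap.subMap returns the runs that a placeholder's characters fall into.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PosToRuns {
    // Maps every character position of the joined paragraph text to its run index.
    static TreeMap<Integer, Integer> build(List<String> runTexts) {
        TreeMap<Integer, Integer> map = new TreeMap<>();
        int pos = 0;
        for (int r = 0; r < runTexts.size(); r++) {
            String text = runTexts.get(r);
            for (int i = 0; i < text.length(); i++) {
                map.put(pos + i, r);
            }
            pos += text.length();
        }
        return map;
    }

    // Distinct run indices, in paragraph order, covering positions [start, end).
    static List<Integer> runsCovering(TreeMap<Integer, Integer> map, int start, int end) {
        NavigableMap<Integer, Integer> range = map.subMap(start, true, end, false);
        return new ArrayList<>(new LinkedHashSet<>(range.values()));
    }
}
```

    With the runs "Hello $", "{na" and "me}!", the placeholder ${name} starts at position 6 and covers run indices 0, 1 and 2.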
    
  • 2020-11-29 18:48

    At the time of writing, none of the answers replaced text correctly.

    Gagravar's answer does not handle cases where the words to replace are split across runs. Thierry Boduin's solution sometimes left words blank when they came after other words to be replaced, and it does not check tables.

    Using Gagravar's answer as a base, I added an else branch that also checks the run before the current one: if the texts of both runs together contain the word to replace, it is replaced across the two runs. My addition, in Kotlin:

    if (text != null) {
        if (text.contains(findText)) {
            text = text.replace(findText, replaceText)
            r.setText(text, 0)
        } else if (i > 0 && p.runs[i - 1].getText(0).plus(text).contains(findText)) {
            // findText starts in the previous run and ends in this one
            val pos = p.runs[i - 1].getText(0).indexOf('$')
            text = textOfNotFullSecondRun(text, findText)
            r.setText(text, 0)
            val findTextLengthInFirstRun = findTextPartInFirstRun(p.runs[i - 1].getText(0), findText)
            // replaceRange takes an end index, not a length
            val prevRunText = p.runs[i - 1].getText(0)
                .replaceRange(pos, pos + findTextLengthInFirstRun, replaceText)
            p.runs[i - 1].setText(prevRunText, 0)
        }
    }
    
    private fun textOfNotFullSecondRun(text: String, findText: String): String {
        return if (!text.contains(findText)) {
            textOfNotFullSecondRun(text, findText.drop(1))
        } else {
            text.replace(findText, "")
        }
    }
    
    private fun findTextPartInFirstRun(text: String, findText: String): Int {
        return if (text.contains(findText)) {
            findText.length
        } else {
            findTextPartInFirstRun(text, findText.dropLast(1))
        }
    }
    

    Here p.runs is the list of runs in the paragraph, and i is the index of the current run r. The same block is used when searching tables. I have not had any issues with this solution so far, and all formatting stays intact.
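    Here is a stdlib-only Java transcription of the two Kotlin helpers and the two-run fallback, with run texts passed as plain strings. TwoRunReplace, prev and cur are hypothetical names. It uses the end index pos + findTextLengthInFirstRun for the previous run. Like the Kotlin code, it assumes the placeholder starts with '$' and that this is the first '$' in the previous run.

```java
// Java port of the Kotlin helpers; prev and cur stand for the texts of the
// previous and the current run.
final class TwoRunReplace {
    // Length of the longest prefix of findText that occurs somewhere in text.
    static int findTextPartInFirstRun(String text, String findText) {
        while (!text.contains(findText)) {
            findText = findText.substring(0, findText.length() - 1);
        }
        return findText.length();
    }

    // Removes every occurrence of the longest suffix of findText that occurs in text.
    static String textOfNotFullSecondRun(String text, String findText) {
        while (!text.contains(findText)) {
            findText = findText.substring(1);
        }
        return text.replace(findText, "");
    }

    // Returns {new previous-run text, new current-run text}.
    static String[] replaceAcrossTwoRuns(String prev, String cur, String findText, String replaceText) {
        int pos = prev.indexOf('$'); // as in the Kotlin code: placeholder starts with '$'
        int lengthInPrev = findTextPartInFirstRun(prev, findText);
        // Replace characters [pos, pos + lengthInPrev): an end index, not a length.
        String newPrev = prev.substring(0, pos) + replaceText + prev.substring(pos + lengthInPrev);
        return new String[] {newPrev, textOfNotFullSecondRun(cur, findText)};
    }
}
```

    For the runs "Hello ${na" and "me}!" with findText "${name}" and replaceText "Bob", the first helper returns 4. The two runs become "Hello Bob" and "!".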

    Edit: I made a Java library for this kind of replacement; check it out: https://github.com/deividasstr/docx-word-replacer
