How to accept revisions / track changes (ins/del) in a docx?

Submitted by Deadly on 2019-12-24 19:17:10

Question


In MS Word 2010, File > Info offers an option to check the document for issues before sharing it. This makes it possible to resolve all tracked changes at once (leaving the newest version of the text) and to remove all comments and annotations from the document.

Is this functionality available in docx4j as well, or do I need to investigate the corresponding JAXB objects and write a traversal finder? Doing that manually could be a lot of work, since I would have to move the runs inside each RunIns (w:ins) up into the surrounding content as ordinary R (w:r) elements, and remove each RunDel (w:del). I have also seen a w:del inside a w:ins once, so I don't know whether the reverse also occurs, or whether deeper nesting is possible.
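Nesting is less of a problem for the traversal route if the tree is processed bottom-up: resolve an element's children first, then drop the element if it is a w:del, or replace it with its children if it is a w:ins. The sketch below shows that idea on the raw XML of one part (such as document.xml) using only the JDK's DOM API, not docx4j's JAXB classes; the class and method names are hypothetical. Like the XSLT mentioned in this thread, it only knows about w:ins and w:del, so it also strips the paragraph-mark marker (w:pPr/w:rPr/w:del) without removing the paragraph itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AcceptInsDel {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Accepts all w:ins/w:del revisions in the XML of one part, e.g. document.xml. */
    static String accept(String partXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(partXml)));
            acceptBelow(doc.getDocumentElement());
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not process part XML", e);
        }
    }

    // Bottom-up: every child is resolved before its parent is looked at, so a w:del
    // inside a w:ins (or the other way round) is handled at any nesting depth.
    private static void acceptBelow(Element parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before the tree changes
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                acceptBelow(el);
                if (isW(el, "del")) {
                    parent.removeChild(el);              // deleted content disappears
                } else if (isW(el, "ins")) {
                    while (el.getFirstChild() != null) { // inserted content moves up a level
                        parent.insertBefore(el.getFirstChild(), el);
                    }
                    parent.removeChild(el);
                }
            }
            child = next;
        }
    }

    private static boolean isW(Element el, String localName) {
        return W.equals(el.getNamespaceURI()) && localName.equals(el.getLocalName());
    }
}
```

Because the children of a w:ins are already resolved when it is unwrapped, a w:del nested inside it has been removed by then, and nested w:ins wrappers collapse one level at a time.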

Further research turned up this XSLT: https://github.com/plutext/docx4all/blob/master/docx4all/src/main/java/org/docx4all/util/ApplyRemoteChanges.xslt I was not able to run it within docx4j; I could only apply it by manually unzipping the docx and extracting document.xml. After applying the XSLT to the plain document.xml, I put it back into the docx container and opened the result in MS Word. The result was not the same as accepting the revisions in MS Word itself. More concretely: the XSLT removed the text marked as deleted (in a table), but not the bullet in front of it. This occurs quite often in my document.
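Since a .docx is an ordinary ZIP package, the manual unzip, transform, rezip round trip can be automated with the JDK alone. The sketch below (hypothetical class DocxPartTransformer; Java 9+ for readAllBytes) copies every entry in its original order and runs an XSLT over just one named part. It does not explain the bullet difference; it only removes the manual steps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DocxPartTransformer {

    /** Returns a copy of the .docx in which one part has been run through the XSLT. */
    static byte[] transformPart(byte[] docx, String partName, String xslt) {
        try {
            Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
                 ZipOutputStream out = new ZipOutputStream(buffer)) {
                for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                    byte[] data = in.readAllBytes(); // bytes of the current entry only
                    if (entry.getName().equals(partName)) {
                        ByteArrayOutputStream transformed = new ByteArrayOutputStream();
                        templates.newTransformer().transform(
                            new StreamSource(new ByteArrayInputStream(data)), new StreamResult(transformed));
                        data = transformed.toByteArray();
                    }
                    // Same order as the input, so [Content_Types].xml stays first.
                    out.putNextEntry(new ZipEntry(entry.getName()));
                    out.write(data);
                    out.closeEntry();
                }
            } // closing the ZipOutputStream writes the ZIP central directory
            return buffer.toByteArray();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException("Could not transform " + partName, e);
        }
    }

    /** Reads one entry of a ZIP as UTF-8 text, or returns null if it is absent. */
    static String readEntry(byte[] zip, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                if (entry.getName().equals(name)) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new IllegalStateException("Could not read " + name, e);
        }
    }
}
```

For a file on disk this would be called as transformPart(Files.readAllBytes(Paths.get("in.docx")), "word/document.xml", xsltText), with the returned bytes written back out via Files.write.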

If this cannot be solved easily, I will relax the constraints. It would be sufficient for me to have a method that returns all the text of a ContentAccessor as a String. The ContentAccessor could be a P or a Tc, and the text I need is in R elements there or in a RunIns (with R elements inside it). I have a partial solution for this below; the interesting part starts at the line else if (child instanceof RunIns) {. But, as mentioned above, I'm not sure how nested del/ins elements can appear and whether this handles them correctly. Also, the results are still not the same as when I prepare the document in MS Word first.

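import java.util.List;

import org.docx4j.XmlUtils;
import org.docx4j.wml.Br;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.R;
import org.docx4j.wml.RunDel;
import org.docx4j.wml.RunIns;
import org.docx4j.wml.Text;

//Similar to:
//http://www.docx4java.org/forums/docx-java-f6/how-to-get-all-text-element-of-a-paragraph-with-docx4j-t2028.html
private String getAllTextfromParagraph(ContentAccessor ca) {
    StringBuilder result = new StringBuilder();
    appendText(ca.getContent(), result);
    return result.toString().trim();
}

// Walks a content list: text inside RunIns counts, text inside RunDel is skipped.
private void appendText(List<Object> children, StringBuilder result) {
    for (Object child : children) {
        child = XmlUtils.unwrap(child);
        if (child instanceof Text) {
            result.append(((Text) child).getValue());
        } else if (child instanceof R) {
            result.append(getTextFromRun((R) child));
        } else if (child instanceof RunIns) {
            // Inserted content: recurse, so runs and nested RunIns/RunDel are handled
            appendText(((RunIns) child).getCustomXmlOrSmartTagOrSdt(), result);
        } else if (child instanceof RunDel) {
            // Deleted content: ignored, as if the deletion had been accepted
        } else if (child instanceof ContentAccessor) {
            // e.g. the P children of a Tc, or a hyperlink inside a P
            appendText(((ContentAccessor) child).getContent(), result);
        }
    }
}

private String getTextFromRun(R run) {
    StringBuilder result = new StringBuilder();
    for (Object o : run.getContent()) {
        o = XmlUtils.unwrap(o);
        if (o instanceof R.Tab) {
            result.append('\t');
        } else if (o instanceof R.SoftHyphen) {
            result.append('\u00AD');
        } else if (o instanceof Br) {
            result.append(' '); // a line break is treated as a space
        } else if (o instanceof Text) {
            result.append(((Text) o).getValue());
        }
    }
    return result.toString();
}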
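For comparison, the same extraction can be sketched against the raw XML of a single w:p or w:tc with only the JDK's DOM API (the class AcceptedText is hypothetical, not part of docx4j). Recursing into every container and skipping any w:del or w:moveFrom subtree handles ins/del nesting at any depth, and property elements such as w:pPr are skipped so that tab-stop definitions (w:tabs/w:tab) are not mistaken for tab characters. The mappings follow the code above: tab to \t, br to a space, softHyphen to U+00AD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AcceptedText {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Text of a w:p or w:tc (given as an XML string) as it reads with all changes accepted. */
    static String of(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            StringBuilder sb = new StringBuilder();
            collect(root, sb);
            return sb.toString().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    private static void collect(Node parent, StringBuilder sb) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Only WordprocessingML elements are visited; text nodes and other namespaces
            // (e.g. mc:AlternateContent, whose branches would duplicate content) are ignored.
            if (n.getNodeType() != Node.ELEMENT_NODE || !W.equals(n.getNamespaceURI())) {
                continue;
            }
            switch (n.getLocalName()) {
                case "del": case "moveFrom":         // deleted or moved-away content
                case "pPr": case "rPr": case "tcPr": // properties, not text
                    break;
                case "t": sb.append(n.getTextContent()); break;
                case "tab": sb.append('\t'); break;
                case "br": sb.append(' '); break;
                case "softHyphen": sb.append('\u00AD'); break;
                default: collect(n, sb);             // w:r, w:ins, w:hyperlink, w:p, ...
            }
        }
    }
}
```

Field instructions (w:instrText) are not collected, since only w:t contributes text.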

Answer 1:


https://github.com/plutext/docx4j/commit/309a8e4008553452ebe675e81def30aab97542a2?w=1 adds a method for transforming just one Part, and sample code to use it to accept changes.

The XSLT is just what you found (relicensed as Apache 2):

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:o="urn:schemas-microsoft-com:office:office"
  xmlns:v="urn:schemas-microsoft-com:vml"
  xmlns:WX="http://schemas.microsoft.com/office/word/2003/auxHint"
  xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
  xmlns:w10="urn:schemas-microsoft-com:office:word"
  xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  xmlns:ext="http://www.xmllab.net/wordml2html/ext"
  xmlns:java="http://xml.apache.org/xalan/java"
  xmlns:xml="http://www.w3.org/XML/1998/namespace"
  version="1.0"
  exclude-result-prefixes="java msxsl ext o v WX aml w10">


  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no" indent="yes" />


  <xsl:template match="/ | @*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="w:del" />

  <xsl:template match="w:ins" >
    <xsl:apply-templates select="*"/>
  </xsl:template>

</xsl:stylesheet>

You'll need to add support for the other elements identified in the MSDN link. If you do that, I'd be happy to get a pull request.
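One likely reason for the leftover bullet described in the question: when a whole paragraph is deleted with tracking on, the paragraph mark itself is marked as deleted by a w:del inside w:pPr/w:rPr. The stylesheet above removes that marker together with the other w:del elements but keeps the paragraph, so its numbering (w:numPr), and with it the bullet, survives. The sketch below embeds an extended XSLT 1.0 stylesheet that runs on the JDK's built-in processor (the class name is hypothetical). Based on my reading of the OOXML revision elements, not on the MSDN list the answer refers to, it also drops w:moveFrom content, unwraps w:moveTo, removes the *PrChange records of earlier formatting, drops rows marked deleted via w:trPr/w:del, and drops a paragraph whose mark is deleted only when nothing is left in it and the next sibling is a paragraph. Cell-level changes (w:cellIns, w:cellDel) and merging of non-empty paragraphs are not handled.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AcceptAllRevisions {
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>",
        // Identity copy, as in the original stylesheet
        "  <xsl:template match='@*|node()'>",
        "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
        "  </xsl:template>",
        // Deleted or moved-away content, plus the move range markers: drop
        "  <xsl:template match='w:del | w:moveFrom | w:moveFromRangeStart | w:moveFromRangeEnd | w:moveToRangeStart | w:moveToRangeEnd'/>",
        // Inserted or moved-in content: keep the children, drop the wrapper
        "  <xsl:template match='w:ins | w:moveTo'><xsl:apply-templates/></xsl:template>",
        // Records of previous formatting: the current properties stay, the records go
        "  <xsl:template match='w:rPrChange | w:pPrChange | w:sectPrChange | w:tblPrChange | w:tblPrExChange | w:trPrChange | w:tcPrChange | w:tblGridChange'/>",
        // A table row marked as deleted: drop the whole row
        "  <xsl:template match='w:tr[w:trPr/w:del]'/>",
        // Deleted paragraph mark, nothing left in the paragraph, next sibling is a paragraph
        // (e.g. a fully deleted list item): drop it together with its numbering
        "  <xsl:template match='w:p[w:pPr/w:rPr/w:del][following-sibling::*[1][self::w:p]][not(.//w:r[not(ancestor::w:del or ancestor::w:moveFrom)])]'/>",
        "</xsl:stylesheet>");

    /** Applies the extended stylesheet to the XML of one part, e.g. document.xml. */
    static String accept(String partXml) {
        try {
            Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                new StreamSource(new StringReader(partXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT failed", e);
        }
    }
}
```

Whether Word gives the merged paragraph the same properties in every case is an assumption here; the empty-paragraph rule only targets the case from the question, where a list item disappears completely.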



Source: https://stackoverflow.com/questions/45544974/how-to-accept-revisions-track-changes-ins-del-in-a-docx
