Obtain Details of docx4j Comparison

若如初见. Submitted on 2019-12-04 06:21:18

Question


I took the suggestion for comparing docx files from here: OutOfMemoryError while doing docx comparison using docx4j

However, this line:

Body newBody = (Body) org.docx4j.XmlUtils.unmarshalString(contentStr);

triggers a number of JAXB warnings such as:

WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 80 - [ERROR] : unexpected element (uri:"", local:"ins"). Expected elements are <{[?]}text>
INFO org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 106 - continuing (with possible element/attribute loss)

That is understandable, given that org.docx4j.wml.Text does not declare any nested elements, while the string written by Docx4jDriver.diff() contains:

<w:t dfx:insert="true" xml:space="preserve"><ins>This</ins><ins> </ins><ins>first</ins><ins> </ins><ins>line</ins><ins> </ins><ins>has</ins><ins> </ins><ins>a</ins><ins> </ins></w:t>

Consequently, Text.getValue() returns an empty String for every text element whose content is wrapped in <ins> tags.
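To illustrate the point above, here is a minimal sketch that reads the inserted words straight from the raw diff string instead of unmarshalling it into org.docx4j.wml.Text. It uses only the JDK's DOM parser, not any docx4j API, and the class and method names (DiffMarkupReader, textOf) are hypothetical. It assumes the diff string is well-formed XML apart from possibly undeclared prefixes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: not part of docx4j.
final class DiffMarkupReader {

    /** Returns the text of every element called tagName, in document order. */
    static List<String> textOf(String xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness stays off so that an isolated fragment with
            // undeclared prefixes (w:, dfx:) still parses; a prefix is then
            // simply part of the element or attribute name.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse diff markup", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the w:t element quoted in the question.
        String fragment = "<w:t dfx:insert=\"true\" xml:space=\"preserve\">"
                + "<ins>This</ins><ins> </ins><ins>first</ins><ins> </ins><ins>line</ins>"
                + "</w:t>";
        // prints: This first line
        System.out.println(String.join("", textOf(fragment, "ins")));
    }
}
```

This only inspects the intermediate diff markup; it does not turn it into valid WordML.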

I'm attempting to programmatically determine the differences between two docx files (the original and the result of round-tripping it through a docx transformation process) using the suggested approach plus the following code:

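import javax.xml.bind.JAXBElement;
import org.docx4j.wml.Body;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.Text;

// contentStr holds the XML string written by Docx4jDriver.diff()
Body newBody = (Body) org.docx4j.XmlUtils.unmarshalString(contentStr);

// Walk body -> paragraphs -> runs -> text elements and print each text value
for ( Object bodyPart : newBody.getContent() ) {
  if ( bodyPart instanceof P ) {
    P bodyPartInCast = (P)bodyPart;
    for ( Object currentPContent : bodyPartInCast.getContent() ) {
      if ( currentPContent instanceof R ) {
        R pContentCast = (R)currentPContent;
        for ( Object currentRContent : pContentCast.getContent() ) {
          // Run content such as w:t arrives wrapped in a JAXBElement
          if ( currentRContent instanceof JAXBElement ) {
            JAXBElement<?> rContentCast = (JAXBElement<?>)currentRContent;
            Object jaxbValue = rContentCast.getValue();
            if ( jaxbValue instanceof Text ) {
              Text textValue = (Text)jaxbValue;
              System.out.println( "Text: --> " + textValue.getValue() );
            }
          }
        }
      }
    }
  }
}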

So, the question is: if this isn't the correct approach for processing the details of the differences between two files, what is?

I'm using docx4j version 2.8.0 and the two docx files being compared are:

  1. Document 1 (input)
  2. Document 2 (output)

Answer 1:


Disclosure: I work on docx4j

Have a look at CompareDocuments which uses Differencer to convert the diff result back to valid WordML content.
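Once the diff has been converted back to valid WordML, the changes can be read from the markup itself. The sketch below assumes the converted result marks changes with the standard WordprocessingML tracked-change elements (w:ins wrapping w:t, and w:del wrapping w:delText); that is an assumption about the output, not something stated in the answer. It uses only the JDK's DOM parser rather than docx4j, and TrackedChangeReader with its methods is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: not part of docx4j.
final class TrackedChangeReader {

    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** One string per changeTag element, built from its textTag descendants. */
    static List<String> changes(String wordMl, String changeTag, String textTag) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // valid WordML declares its prefixes
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wordMl)));

            NodeList changeNodes = doc.getElementsByTagNameNS(W_NS, changeTag);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < changeNodes.getLength(); i++) {
                Element change = (Element) changeNodes.item(i);
                NodeList texts = change.getElementsByTagNameNS(W_NS, textTag);
                StringBuilder text = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) {
                    text.append(texts.item(j).getTextContent());
                }
                result.add(text.toString());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse WordML", e);
        }
    }

    /** Text of each tracked insertion (w:ins containing w:t). */
    static List<String> inserted(String wordMl) {
        return changes(wordMl, "ins", "t");
    }

    /** Text of each tracked deletion (w:del containing w:delText). */
    static List<String> deleted(String wordMl) {
        return changes(wordMl, "del", "delText");
    }
}
```

For example, given a body whose paragraph contains one w:ins run with the text "first" and one w:del run with the text "second", inserted(...) would return a list holding "first" and deleted(...) a list holding "second".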



Source: https://stackoverflow.com/questions/11347961/obtain-details-of-docx4j-comparison
