How to add comments to a .docx XML

依然范特西╮ 提交于 2019-12-11 05:06:10

问题


At work, we have a word document that we have to edit all the time to pass on to another team, to tell them how to perform some tasks. Since I don't like mindlessly filling out data, and I always look for ways to simplify the tasks I have to do, I decided I would automate this process. After considering a few methods (such as generating a word document from scratch or editing an existing document), I decided to edit the document in-place.

I have inserted special tags into the document (specifically, they take the form [SOME_NAME_HERE]), and I will then parse the document for those special tags and replace them with the value I actually need. I then extract the .docx to a folder with all of the XML documents inside of it, and parse the document.xml file, replacing the values.

During this process, depending on what is actually needed, there are sections of the document that will have to be removed from it. So my first thought was to add comments to the document.xml file. For example:

<!-- INITIAL BUILD ONLY -->
      <w:p w:rsidR="00202319" w:rsidRPr="00D00FF5" w:rsidRDefault="00202319" w:rsidP="00AC0192">
        <w:r w:rsidR="00E548A2" w:rsidRPr="00D00FF5">
          <w:rPr>
            <w:rStyle w:val="emcfontstrong"/>
          </w:rPr>
          <w:t>Some text here</w:t>
        </w:r>
      </w:p>
<!-- END INITIAL BUILD ONLY -->

Then, when I go to generate the output word document, I would simply remove all of the sections that were "INITIAL BUILD ONLY" (unless, of course, it is the initial build).

However, the issue I am running in to is that when you convert the document back to a Word document, open in Word and save it, it will "cleanup" the document, and remove all of the comments I've added to it.

So, my question is, is there any way to preserve the comments in the document, or is there any special tags I could add to the XML that would not be visible during standard view/edit of the document, but would not be removed by Word upon a save?


回答1:


Choosing to edit the document is a good choice imo!

Word does a lot of changes when opening a docx and saving it again, so I would'nt trust this. I don't even know where it would be possible to store hidden and persistent data inside document.xml.

Here's an idea to get what you need using an other technique.

Hello {name}

with name="edi9999" will be replaced by Hello edi9999

{#names}
Hello {name}
{/names}

with names=[{name:"John"},{name:"Mary"},{name:"Jane"}]

will be replaced by:

Hello John
Hello Mary
Hello Jane

Now the trick to comment out a section is to use an empty array.

if names=[]

the output will be an empty string. If you want to uncomment it, use an array with one element.

Inspired by Mustache

How do I build this ?

I have created an implementation of this for Javascript (works on Node and in Browser): https://github.com/edi9999/docxgenjs

There's a demo Here:

http://javascript-ninja.fr/docxgenjs/examples/demo.html




回答2:


To echo the previous poster, I've found that Microsoft Word does a lot of behind-the-scenes magic when opening/saving files, and wouldn't suggest storing metadata in any xml files.

However, if you aren't above a bit of scripting, it shouldn't be too difficult to accomplish using any number of modules. Here's a basic Python implementation using oodocx, a module that I'm currently developing.

from oodocx import oodocx
from lxml import etree

d = oodocx.Docx('template.docx')
body = d.get_body()
paragraph_to_remove = d.search('Some text here', result_type='paragraph')
body.remove(paragraph_to_remove)
d.save('new document.docx')


来源:https://stackoverflow.com/questions/18797129/how-to-add-comments-to-a-docx-xml

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!