Question
I'm trying to replace words in a .docx file, as described here:
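    using System.IO;
    using System.Text.RegularExpressions;
    using DocumentFormat.OpenXml.Packaging;

    public static void SearchAndReplace(string document)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
        {
            // Read the raw XML of the main document part into a string.
            string docText = null;
            using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
            {
                docText = sr.ReadToEnd();
            }

            // Replace the text directly in the raw XML.
            Regex regexText = new Regex("Hello world!");
            docText = regexText.Replace(docText, "Hi Everyone!");

            // Write the modified XML back to the main document part.
            using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
            {
                sw.Write(docText);
            }
        }
    }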
That works fine, except that sometimes the text SomeTest in a document is stored as something like this:
        <w:t>
            Some
        </w:t>
    </w:r>
    <w:r w:rsidR="009E5AFA">
        <w:rPr>
            <w:b/>
            <w:color w:val="365F91"/>
            <w:sz w:val="22"/>
        </w:rPr>
        <w:t>
            Test
        </w:t>
    </w:r>
And of course the replacement fails, because the word is split across two runs. Is there a workaround to make some words unbreakable in a docx file? Or am I doing the replacement wrong?
Answer 1:
One way to solve this is to normalize the XML of your document before doing any transformations. You can use Open XML PowerTools to do this.
Sample code to normalize the XML:
    using DocumentFormat.OpenXml.Packaging;
    using OpenXmlPowerTools;

    using (WordprocessingDocument doc =
        WordprocessingDocument.Open("Test.docx", true))
    {
        SimplifyMarkupSettings settings = new SimplifyMarkupSettings
        {
            NormalizeXml = true, // Merges runs in a paragraph that have similar formatting
            // Additional settings if required
            AcceptRevisions = true,
            RemoveBookmarks = true,
            RemoveComments = true,
            RemoveGoBackBookmark = true,
            RemoveWebHidden = true,
            RemoveContentControls = true,
            RemoveEndAndFootNotes = true,
            RemoveFieldCodes = true,
            RemoveLastRenderedPageBreak = true,
            RemovePermissions = true,
            RemoveProof = true,
            RemoveRsidInfo = true,
            RemoveSmartTags = true,
            RemoveSoftHyphens = true,
            ReplaceTabsWithSpaces = true
        };
        MarkupSimplifier.SimplifyMarkup(doc, settings);
    }
This simplifies the markup of the Open XML document, which makes further programmatic transformations easier. I always run it before working with an Open XML document programmatically.
More information about using these tools can be found here, and there is a good blog article here.
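To illustrate what merging runs means, here is a minimal sketch that uses only System.Xml.Linq. It is not the PowerTools implementation: RunMergeSketch and MergeAdjacentRuns are hypothetical names, and the sketch assumes each run holds only an optional <w:rPr> followed by a single <w:t>. It strips the rsid attributes first, since they differ between editing sessions, and then merges neighbouring runs whose formatting is identical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RunMergeSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper (not part of any library): merges directly adjacent
    // runs of one paragraph when their <w:rPr> formatting is identical.
    public static void MergeAdjacentRuns(XElement paragraph)
    {
        // Revision ids (rsid*) would otherwise make equal runs look different.
        foreach (XElement run in paragraph.Elements(W + "r"))
        {
            run.Attributes()
               .Where(a => a.Name.LocalName.StartsWith("rsid", StringComparison.Ordinal))
               .Remove();
        }

        XElement previous = null;
        foreach (XElement child in paragraph.Elements().ToList())
        {
            if (child.Name != W + "r")
            {
                previous = null; // anything between two runs blocks a merge
                continue;
            }

            if (previous != null && IsSimpleRun(previous) && IsSimpleRun(child)
                && SameFormatting(previous, child))
            {
                XElement text = previous.Element(W + "t");
                text.Value += child.Element(W + "t").Value;
                // Keep leading and trailing spaces of the merged text.
                text.SetAttributeValue(XNamespace.Xml + "space", "preserve");
                child.Remove();
            }
            else
            {
                previous = child;
            }
        }
    }

    // A "simple" run contains exactly one <w:t> and nothing but <w:rPr> and <w:t>.
    static bool IsSimpleRun(XElement run) =>
        run.Elements(W + "t").Count() == 1 &&
        run.Elements().All(e => e.Name == W + "rPr" || e.Name == W + "t");

    static bool SameFormatting(XElement a, XElement b)
    {
        XElement left = a.Element(W + "rPr");
        XElement right = b.Element(W + "rPr");
        if (left == null || right == null)
            return left == null && right == null;
        return XNode.DeepEquals(left, right);
    }

    public static void Main()
    {
        XElement p = XElement.Parse(
            @"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
                <w:r><w:rPr><w:b/></w:rPr><w:t>Some</w:t></w:r>
                <w:r w:rsidR=""009E5AFA""><w:rPr><w:b/></w:rPr><w:t>Test</w:t></w:r>
              </w:p>");

        MergeAdjacentRuns(p);

        Console.WriteLine(p.Elements(W + "r").Count());        // 1
        Console.WriteLine(p.Descendants(W + "t").First().Value); // SomeTest
    }
}
```

Once the two runs are merged, SomeTest sits in a single <w:t> element, so a plain text replacement can find it. Runs whose formatting really differs stay separate in this sketch, so a word that is styled in two different ways would still be split.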
Source: https://stackoverflow.com/questions/15790999/docx-unbreakable-words