docx unbreakable words

你离开我真会死。 Submitted on 2019-12-02 06:29:47

Question


I'm trying to replace words in a docx file as described here:

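using System.IO;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

public static void SearchAndReplace(string document)
{
    // Open the package for editing (second argument: isEditable).
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        // Read the raw XML of the main document part into a string.
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        // This matches only where the phrase is contiguous in the raw XML.
        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        // Overwrite the main document part with the modified XML.
        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}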

That works fine, except that sometimes a word such as SomeTest in the document is split across two runs, so the XML looks something like this:

    <w:t>
        Some
    </w:t>
</w:r>

<w:r w:rsidR="009E5AFA">
    <w:rPr>
        <w:b/>
        <w:color w:val="365F91"/>
        <w:sz w:val="22"/>
    </w:rPr>
    <w:t>
        Test
    </w:t>
</w:r>

And of course the replacement fails, because the word never appears as one contiguous string. Is there a workaround to make some words unbreakable in docx? Or am I doing the replacement wrong?


Answer 1:


One way to solve this is to normalize the XML of your document before doing any transformations. You can use Open XML PowerTools to do this.

Sample code to normalize the XML:

using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", true))
{
    SimplifyMarkupSettings settings = new SimplifyMarkupSettings
    {
        NormalizeXml = true, // Merges runs in a paragraph with similar formatting
        // Additional settings if required
        AcceptRevisions = true,
        RemoveBookmarks = true,
        RemoveComments = true,
        RemoveGoBackBookmark = true,
        RemoveWebHidden = true,
        RemoveContentControls = true,
        RemoveEndAndFootNotes = true,
        RemoveFieldCodes = true,
        RemoveLastRenderedPageBreak = true,
        RemovePermissions = true,
        RemoveProof = true,
        RemoveRsidInfo = true,
        RemoveSmartTags = true,
        RemoveSoftHyphens = true,
        ReplaceTabsWithSpaces = true
    };
    // Rewrites the document's markup in place according to the settings.
    MarkupSimplifier.SimplifyMarkup(doc, settings);
}

This simplifies the markup of the Open XML document, which makes further programmatic transformations easier. I always run it before working with an Open XML document programmatically.

More information about using these tools can be found here, and there is a good blog article here.
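
If simplifying the whole document up front is not an option, the split-run case from the question can also be handled at replacement time. The following is only a minimal sketch under stated assumptions, not the method of the answer above and not part of the PowerTools API: the class name SplitRunDemo and the method ReplaceAcrossRuns are hypothetical, and the code uses nothing beyond System.Xml.Linq. It joins the text of all w:t elements in each paragraph, replaces within the joined string, and writes the result back into the first w:t. The cost is that a paragraph containing a match takes on the formatting of its first run, and paragraphs nested inside other paragraphs (for example in text boxes) are not treated specially.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class SplitRunDemo
{
    // Main WordprocessingML namespace (the "w:" prefix).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces 'search' with 'replacement' in every paragraph of the given
    // document XML, even when the match is split across several runs.
    public static string ReplaceAcrossRuns(string documentXml, string search, string replacement)
    {
        XDocument xml = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);

        // Materialize the list first so the tree is not modified while it is being walked.
        foreach (XElement paragraph in xml.Descendants(W + "p").ToList())
        {
            var texts = paragraph.Descendants(W + "t").ToList();
            if (texts.Count == 0) continue;

            // Text of the paragraph as the reader sees it, ignoring run boundaries.
            string joined = string.Concat(texts.Select(t => t.Value));
            if (!joined.Contains(search)) continue;

            // Move all text into the first <w:t> and empty the others.
            texts[0].Value = joined.Replace(search, replacement);
            texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
            foreach (XElement rest in texts.Skip(1))
            {
                rest.Value = string.Empty;
            }
        }

        // XDocument.ToString() omits the XML declaration, so add it back if there was one.
        string declaration = xml.Declaration == null ? "" : xml.Declaration.ToString();
        return declaration + xml.ToString(SaveOptions.DisableFormatting);
    }
}
```

In the SearchAndReplace method from the question, this would stand in for the regex step, for example docText = SplitRunDemo.ReplaceAcrossRuns(docText, "SomeTest", "Hi Everyone!"); the reading and writing of the main document part stay the same.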



Source: https://stackoverflow.com/questions/15790999/docx-unbreakable-words
