How to remove HTML tags from Word content?

Submitted by 别说谁变了你拦得住时间么 on 2019-12-13 04:33:34

Question


I know there are a couple of threads about this that suggest simply using

Regex.Replace(input, "<.*?>", String.Empty);

but I can't use it on text that has already been written to a Word document. My code looks like this:

Microsoft.Office.Interop.Word.Document wBelge = oWord.Documents.Add(ref oMissing,
    ref oMissing, ref oMissing, ref oMissing);
Microsoft.Office.Interop.Word.Paragraph paragraf2;
paragraf2 = wBelge.Paragraphs.Add(ref oMissing);
paragraf2.Range.Text ="some long text";

I can remove a single tag with find and replace, like this:

Word.Find findObject = oWord.Selection.Find;
findObject.ClearFormatting();
findObject.Text = "<strong>";
findObject.Replacement.Text = "";
findObject.Replacement.ClearFormatting();               

object replaceAllc = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAllc, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

Do I need to do this for every HTML tag?


Answer 1:


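Try the following:

Convert the text containing the HTML markup to a plain string using

string unFormatted = paragraf2.ToString(SaveOptions.DisableFormatting);

and then replace the content of paragraf2 with the unFormatted string. Note that SaveOptions.DisableFormatting comes from System.Xml.Linq, and this ToString overload exists on XNode rather than on the Interop Paragraph type, so the text has to be available as an XML node for this line to compile.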
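The "get a plain string, then write it back" idea can also be sketched with the regex quoted in the question, assuming the text is available as an ordinary string. The class name HtmlTagStripper and the method StripTags are hypothetical names used only for illustration, and RegexOptions.Singleline is an addition to the question's pattern so that a tag spanning a line break is still matched.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any Word or .NET API.
public static class HtmlTagStripper
{
    // Pattern from the question: "<", then as few characters as possible, then ">".
    // Singleline (an assumption added here) lets "." also match newlines.
    private static readonly Regex TagPattern =
        new Regex("<.*?>", RegexOptions.Singleline);

    // Returns the input with every "<...>" run removed; null is treated as empty.
    public static string StripTags(string input)
    {
        return TagPattern.Replace(input ?? string.Empty, string.Empty);
    }
}
```

With this in place, the cleaned string could be assigned back with something like paragraf2.Range.Text = HtmlTagStripper.StripTags(rawText), where rawText stands for whatever string held the HTML. This is a plain text substitution, not an HTML parser, so a literal "<" or ">" inside the text would also be affected.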




Answer 2:


With some help from the comments, I arrived at the following working solution:

findObject.ClearFormatting();
findObject.Text = @"\<*\>";
findObject.MatchWildcards = true;
findObject.Replacement.ClearFormatting();
findObject.Replacement.Text = "";

object replaceAll = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAll, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

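The search pattern \<*\> uses Word's wildcard syntax, so findObject.MatchWildcards must be set to true. The * matches any run of characters, and the backslashes are needed because < and > are themselves special characters (start and end of a word) in wildcard mode.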
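The same wildcard replacement can be wrapped in a small helper. This is a minimal sketch, assuming C# 4 or later (which allows named arguments and omitting ref on COM interop calls) and a reference to the Word interop assembly. The class WordTagCleaner and the method RemoveTags are hypothetical names, and searching doc.Content instead of oWord.Selection is a choice made here, not something taken from the answer.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper for illustration only.
public static class WordTagCleaner
{
    // Removes every "<...>" run from the main body of the given document.
    public static void RemoveTags(Word.Document doc)
    {
        // Search the document body rather than the current selection (assumption).
        Word.Find find = doc.Content.Find;
        find.ClearFormatting();
        find.Replacement.ClearFormatting();

        // Named arguments stand in for the long list of "ref oMissing" values;
        // parameters that are not named keep their default (missing) value.
        find.Execute(
            FindText: @"\<*\>",
            MatchWildcards: true,
            ReplaceWith: "",
            Replace: Word.WdReplace.wdReplaceAll);
    }
}
```

With the variables from the question, the call would be WordTagCleaner.RemoveTags(wBelge) after the paragraph text has been assigned.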



Source: https://stackoverflow.com/questions/24480564/how-to-remove-html-tags-from-word-content
