HtmlAgilityPack is not removing all HTML tags. How can I solve this efficiently?

Submitted by 巧了我就是萌 on 2019-12-06 20:25:32

Question


I am using the following method to strip all HTML from a string:

public static string StripHtmlTags(string html)
{
    if (String.IsNullOrEmpty(html)) return "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    return doc.DocumentNode.InnerText;
}

But it seems to ignore the following tag: […]

So the returned string is basically:

> A hungry thief who stole a rack of pork ribs from a grocery store has
> been sentenced to spend 50 years in prison. Willie Smith Ward felt the
> full force of the law after being convicted of the crime in Waco,
> Texas, on Wednesday. The 43-year-old may feel slightly aggrieved over
> the severity of the […]

How can I make sure that these kinds of tags get stripped?

Any kind of help is appreciated, thanks.


Answer 1:


Try HttpUtility.HtmlDecode

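// Needs "using System.Web;" for HttpUtility, plus the HtmlAgilityPack package.
public static string StripHtmlTags(string html)
{
    if (String.IsNullOrEmpty(html)) return "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    // InnerText drops the tags; HtmlDecode then turns entities into characters.
    return HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
}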

HtmlDecode will convert […] to […]
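
The leftover text is most likely an HTML character reference (an entity) rather than a tag, which would explain why InnerText leaves it in place and why a decode step is needed. If the project has no reference to System.Web, System.Net.WebUtility.HtmlDecode performs the same kind of decoding. Here is a minimal sketch of that step on its own; the EntityDemo class, the DecodeEntities name and the sample strings are hypothetical and not part of the original answer:

```csharp
using System;
using System.Net;

public static class EntityDemo
{
    // Decodes HTML character references (for example "&amp;" or "&#8230;")
    // into the characters they stand for. This does NOT remove tags; it is
    // meant to run on text that was already extracted, such as InnerText.
    public static string DecodeEntities(string text)
    {
        if (string.IsNullOrEmpty(text)) return "";
        return WebUtility.HtmlDecode(text);
    }
}
```

For example, EntityDemo.DecodeEntities("Fish &amp; chips") returns "Fish & chips". In StripHtmlTags, the call would wrap doc.DocumentNode.InnerText in the same way HttpUtility.HtmlDecode does above.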



Source: https://stackoverflow.com/questions/16875549/htmlagilitypack-is-not-removing-all-html-tags-how-can-i-solve-this-efficiently
