Converting HTML entities to Unicode Characters in C#

心在旅途 2020-12-15 15:23

I found similar questions and answers for Python and JavaScript, but not for C# or any other WinRT-compatible language.

The reason I think I need it is that I'…

6 answers
  •  星月不相逢
    2020-12-15 16:08

    This worked for me. It replaces both named entities (such as &quot;) and numeric ones (such as &#39;).

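    using System.Net;                       // WebUtility: available in WinRT, unlike System.Web.HttpUtility
    using System.Text.RegularExpressions;
    using NUnit.Framework;

    public static class HtmlEntityExtensions
    {
        // Group 1 is "#" for numeric entities; group 2 is the entity name or the digits.
        private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]*);");

        public static string HtmlDecode(this string html)
        {
            if (string.IsNullOrEmpty(html)) return html;
            return HtmlEntityRegex.Replace(html, x => x.Groups[1].Value == "#"
                // Numeric entity such as &#39;: decimal only; a hex form like &#x27; makes int.Parse throw.
                ? ((char)int.Parse(x.Groups[2].Value)).ToString()
                // Named entity such as &quot;
                : WebUtility.HtmlDecode(x.Groups[0].Value));
        }
    }

    [TestFixture]
    public class HtmlDecodeTests
    {
        [TestCase(null, null)]
        [TestCase("", "")]
        [TestCase("&#39;fark&#39;", "'fark'")]
        [TestCase("&quot;fark&quot;", "\"fark\"")]
        public void should_remove_html_entities(string html, string expected)
        {
            Assert.That(html.HtmlDecode(), Is.EqualTo(expected));
        }
    }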
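
    A possible extension, sketched under the assumption that hexadecimal entities and characters outside the Basic Multilingual Plane should also be handled: the int.Parse call above throws on a hex entity such as &#x27;, and casting the result to char silently truncates any code point above U+FFFF. This variant parses with int.TryParse (NumberStyles.AllowHexSpecifier for the &#x27; form) and converts with char.ConvertFromUtf32, which produces a surrogate pair where needed. The names HtmlEntityDecoder and DecodeNumeric are hypothetical, and System.Net.WebUtility.HtmlDecode is used for named entities.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical variant of the HtmlDecode extension above; class and helper names are illustrative.
public static class HtmlEntityDecoder
{
    // Group 1 is "#" for numeric entities; group 2 is the entity name or the number.
    private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]*);");

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;
        return HtmlEntityRegex.Replace(html, m =>
            m.Groups[1].Value == "#"
                ? DecodeNumeric(m.Groups[2].Value) ?? m.Value   // leave malformed numbers untouched
                : WebUtility.HtmlDecode(m.Value));             // named entity, e.g. &quot;
    }

    // Parses "39" (decimal) or "x27"/"X27" (hex); returns null if it is not a valid code point.
    private static string DecodeNumeric(string digits)
    {
        bool isHex = digits.StartsWith("x", StringComparison.OrdinalIgnoreCase);
        string number = isHex ? digits.Substring(1) : digits;
        NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;
        if (!int.TryParse(number, style, CultureInfo.InvariantCulture, out int codePoint))
            return null;
        // Hex "FFFFFFFF" parses to -1, so reject negatives as well as surrogate halves
        // and values past U+10FFFF, all of which ConvertFromUtf32 would throw on.
        bool valid = codePoint >= 0 && codePoint <= 0x10FFFF
                     && (codePoint < 0xD800 || codePoint > 0xDFFF);
        // For code points above U+FFFF this returns a two-char surrogate pair.
        return valid ? char.ConvertFromUtf32(codePoint) : null;
    }
}
```

    Returning the original entity text for a malformed or out-of-range number, rather than throwing, keeps one bad entity from aborting the whole decode. For example, HtmlEntityDecoder.Decode("&#x1F600; &quot;hi&quot;") yields the grinning-face emoji followed by a space and "hi" in double quotes.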
    
