Converting HTML entities to Unicode Characters in C#

心在旅途 2020-12-15 15:23

I found similar questions and answers for Python and JavaScript, but not for C# or any other WinRT-compatible language.

The reason I think I need it, is because I\'

6 answers
  •  眼角桃花
    2020-12-15 16:14

    An improved version of Zumey's method (I can't comment there). The longest entity name is &exclamation; (11 characters). Upper-case letters also occur in entity names, e.g. &Agrave; (À). (Source: wiki)

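    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using System.Web;

    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        // Named entities: '&', then 2 to 11 letters of either case, then ';'
        var regex = new Regex("(&[a-zA-Z]{2,11};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) {
                var unicode = HttpUtility.HtmlDecode(match.Value);
                // Only map entities that decode to a single UTF-16 code unit
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        // Swap each named entity for its numeric character reference
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }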
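
Note that the method above rewrites named entities as numeric references (for example &Agrave; becomes &#192;) and does not produce the characters themselves. If the characters are what you need, a minimal sketch could look like the following. It assumes System.Net.WebUtility is available on your target platform, which is worth checking for WinRT. The class name EntityDecoder and the method name Decode are made up for illustration.

```csharp
using System;
using System.Net;

public static class EntityDecoder
{
    // Hypothetical helper (not from the answer above): decodes named and
    // numeric HTML entities directly into their Unicode characters.
    public static string Decode(string html)
    {
        // WebUtility lives in System.Net, so System.Web is not required.
        return WebUtility.HtmlDecode(html);
    }
}
```

Calling EntityDecoder.Decode("caf&eacute; &amp; cr&egrave;me") should give "café & crème", and numeric references such as &#192; are decoded the same way.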
    
