Remove non-utf8 characters from string

心在旅途 2020-11-22 11:56

I'm having a problem removing non-UTF-8 characters from a string; they are not displayed properly. The characters look like this (hex representation): 0x97 0x61 0x6C 0x6F

18 answers
  •  耶瑟儿~
    2020-11-22 12:48

    This is slightly different from the question, but what I do is run the string through HtmlEncode(string), strip the numeric entities that produces, and decode the result again.

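    Pseudocode (C#-style; HtmlEncode and HtmlDecode stand for whatever HTML helpers your platform provides, for example System.Net.WebUtility in .NET):

    // Characters the encoder does not keep literally come out as numeric entities (&#NNN;).
    var encoded = HtmlEncode(input);
    // Drop every decimal numeric entity (verbatim string so that \d reaches the regex engine).
    encoded = Regex.Replace(encoded, @"&#\d+;", "");
    // Turn the remaining named entities (&lt;, &amp;, ...) back into plain text.
    var result = HtmlDecode(encoded);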
    

    Input, then output:

    "Headlight\x007E Bracket, { Cafe Racer<> Style, Stainless Steel 中文呢?"
    "Headlight~ Bracket, { Cafe Racer<> Style, Stainless Steel 中文呢?"
    

    I know it's not perfect, but it does the job for me.
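
For the byte sequence in the question, a more direct route may be to decode the raw bytes as UTF-8 and drop whatever does not form a valid sequence. The following is only a minimal Python sketch, assuming the input is available as raw bytes rather than an already-decoded string; the helper name `strip_invalid_utf8` is hypothetical and does not come from the thread.

```python
def strip_invalid_utf8(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, silently dropping bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore")


# The bytes from the question: 0x97 is a lone continuation byte, so it is dropped.
sample = bytes([0x97, 0x61, 0x6C, 0x6F])
print(strip_invalid_utf8(sample))  # alo

# Valid multi-byte sequences are kept intact; only the stray byte is removed.
print(strip_invalid_utf8("中文".encode("utf-8") + b"\x97"))  # 中文
```

If you would rather see where bytes were dropped, `errors="replace"` substitutes U+FFFD for each invalid sequence instead of discarding it.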
