C# regex to remove non-printable characters and control characters from text that mixes many different languages (Unicode letters)

天命终不由人 2020-12-03 07:45

I would appreciate your help with this, since I do not know which range of characters to use, or whether there is a character class like the [[:cntrl:]] that I found in Ruby.

3 answers
  •  猫巷女王i
    2020-12-03 08:16

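    You can try the following, which removes every character outside the ASCII range (so it also strips letters such as ä, ö and å):

    // requires: using System.Text.RegularExpressions;
    string s = "Täkörgåsmrgås";
    s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);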
    


    Updated answer after comments:

    Documentation on non-printable characters: https://en.wikipedia.org/wiki/Control_character

    Char.IsControl Method:

    https://msdn.microsoft.com/en-us/library/system.char.iscontrol.aspx

    Maybe you can try:

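    // requires: using System.Linq;
    string input = "abc\u0007def"; // illustrative value only; replace with your input string
    // Keep every character that is not a control character (Unicode category Cc).
    string output = new string(input.Where(c => !char.IsControl(c)).ToArray());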
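
Since the question asks for a regex, here is a minimal sketch of the same filter written with a Unicode category class. In .NET, `\p{Cc}` matches the Unicode "Control" category, which is roughly the counterpart of Ruby's `[[:cntrl:]]` and the same set that `char.IsControl` reports. The class name `TextCleaner` and the method name `RemoveControlChars` are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not part of any library.
public static class TextCleaner
{
    // \p{Cc} = Unicode general category "Control" (U+0000-U+001F and U+007F-U+009F).
    private static readonly Regex ControlChars =
        new Regex(@"\p{Cc}+", RegexOptions.Compiled);

    // Returns the input with all control characters removed.
    // Letters from any script (Latin, CJK, Cyrillic, ...) are left untouched.
    public static string RemoveControlChars(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return ControlChars.Replace(input, string.Empty);
    }
}
```

Call it as `TextCleaner.RemoveControlChars(text)`. Note that tab, carriage return and line feed are control characters too, so they are removed as well. If those should be kept, a pattern such as `[^\P{Cc}\t\r\n]+` is one way to exclude them. The broader class `\p{C}` is deliberately avoided here because it also matches surrogates, which would break characters outside the Basic Multilingual Plane.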
    
