Regex to replace invalid characters

甜味超标 2020-12-07 02:16

I don't have much experience with RegEx, so I am using many chained String.Replace() calls to remove unwanted characters. Is there a RegEx I can write to streamline this?

3 Answers
  • 2020-12-07 02:55

    Try this regex:

    Regex regex = new Regex(@"[\s,:.;/\\]+");
    string cleanText = regex.Replace(messyText, "").ToUpper();
    

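    \s is a shorthand character class that matches any whitespace character. It covers [ \t\r\n], and in .NET it also matches form feed, vertical tab, and Unicode separator characters.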


    If you just want to preserve alphanumeric characters, instead of adding every non-alphanumeric character in existence to the character class, you could do this:

    Regex regex = new Regex(@"[\W_]+");
    string cleanText = regex.Replace(messyText, "").ToUpper();
    

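    Here \W matches any non-word character, roughly [^a-zA-Z0-9_]. Because the underscore counts as a word character, it is added to the class explicitly. Note that in .NET \w is Unicode-aware by default, so letters and digits outside ASCII are preserved as well.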
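
    To illustrate what [\W_]+ removes in practice, here is a minimal sketch. The class and method names (WordCharDemo, StripNonWord, StripNonAsciiWord) and the sample string are made up for this example. It assumes you may want ASCII-only output, in which case RegexOptions.ECMAScript narrows \w to [a-zA-Z0-9_]:

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class WordCharDemo
    {
        // Default .NET behavior: \w is Unicode-aware, so accented letters survive.
        public static string StripNonWord(string input) =>
            Regex.Replace(input, @"[\W_]+", "");

        // With RegexOptions.ECMAScript, \w is limited to [a-zA-Z0-9_],
        // so any character outside ASCII letters and digits is removed.
        public static string StripNonAsciiWord(string input) =>
            Regex.Replace(input, @"[\W_]+", "", RegexOptions.ECMAScript);

        public static void Main()
        {
            // Hypothetical input: "café_au lait, 2:1"
            string messyText = "caf\u00E9_au lait, 2:1";

            string unicodeAware = StripNonWord(messyText);      // "caféaulait21"
            string asciiOnly = StripNonAsciiWord(messyText);    // "cafaulait21"

            Console.WriteLine(unicodeAware.ToUpper());
            Console.WriteLine(asciiOnly.ToUpper());
        }
    }
    ```

    Which variant fits depends on whether non-ASCII letters count as "invalid" in your data.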

  • 2020-12-07 02:56

    Character classes to the rescue!

    string messyText = GetText();
    string cleanText = Regex.Replace(messyText.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "");
    
  • 2020-12-07 03:01

    You probably want a whitelist approach: there is an ocean of unusual characters, and the effect of their combinations is not easy to predict.

    A simple regex that removes everything but the allowed characters could look like this:

    messyText = Regex.Replace(messyText, @"[^a-zA-Z0-9\x7C\x2C\x2E_]", "");
    

    The ^ inverts the selection. Apart from the alphanumeric characters, this regex allows the pipe (\x7C), the comma (\x2C), the period (\x2E) and the underscore. You can add and remove characters and character sets as needed.
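
    As a rough sketch, the whitelist could be wrapped in a small reusable helper. The names Sanitizer and KeepAllowed are hypothetical, and the class is written with literal characters instead of hex escapes, which is equivalent because |, , and . have no special meaning inside a character class:

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class Sanitizer
    {
        // Matches every character that is NOT an ASCII letter, a digit,
        // '|', ',', '.' or '_'. Built once and reused across calls.
        private static readonly Regex Disallowed =
            new Regex(@"[^a-zA-Z0-9|,._]", RegexOptions.Compiled);

        // Returns the input with all non-whitelisted characters removed.
        public static string KeepAllowed(string messyText) =>
            Disallowed.Replace(messyText, "");

        public static void Main()
        {
            // Space, '!', tab and '#' are not on the whitelist and are dropped.
            string cleanText = KeepAllowed("a|b, c.d_e!\t#");
            Console.WriteLine(cleanText); // a|b,c.d_e
        }
    }
    ```

    Unlike \w, the explicit a-zA-Z0-9 ranges are ASCII-only, so accented letters are removed too unless you add them to the class.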
