How can I strip punctuation from a string?

天命终不由人 2020-12-04 18:47

For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.

But in the general case, what's the best way to strip punctuation...

15 answers
  • 2020-12-04 19:24

    Why not simply:

    string s = "sxrdct?fvzguh,bij.";
    var sb = new StringBuilder();
    
    foreach (char c in s)
    {
       if (!char.IsPunctuation(c))
          sb.Append(c);
    }
    
    s = sb.ToString();
    

    Regex is usually slower than simple char operations, and those LINQ operations look like overkill to me. Besides, you can't use LINQ in .NET 2.0.

  • 2020-12-04 19:24

    Describes intent, easiest to read (IMHO) and best performing:

     s = s.StripPunctuation();
    

    to implement:

    public static class StringExtension
    {
        public static string StripPunctuation(this string s)
        {
            var sb = new StringBuilder();
            foreach (char c in s)
            {
                if (!char.IsPunctuation(c))
                    sb.Append(c);
            }
            return sb.ToString();
        }
    }
    

    This uses Hades32's algorithm, which was the best performing of the bunch posted.
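
    One caveat worth sketching: char.IsPunctuation follows the Unicode punctuation categories, so characters such as '$', '+', '<' and '^' are classified as symbols and survive the filter. If those should go as well, a variant along these lines might do. The name StripPunctuationAndSymbols is hypothetical, chosen here only for illustration:

    ```csharp
    using System;
    using System.Text;

    public static class StringSymbolExtension
    {
        // Hypothetical helper (not from the answer above): removes characters
        // that char.IsPunctuation OR char.IsSymbol reports as true.
        public static string StripPunctuationAndSymbols(this string s)
        {
            var sb = new StringBuilder(s.Length);
            foreach (char c in s)
            {
                if (!char.IsPunctuation(c) && !char.IsSymbol(c))
                    sb.Append(c);
            }
            return sb.ToString();
        }
    }
    ```

    Whitespace is left alone, so stripping a mark that sat between two spaces leaves both spaces in place.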

  • 2020-12-04 19:30

    You can use the Regex.Replace method:

     Regex.Replace(yourString, regularExpressionWithPunctuationMarks, "")
    

    Since this returns a string, your method will look something like this:

     string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");
    

    You can replace "[?!]" with something more sophisticated if you want:

    (\p{P})
    

    This should match any Unicode punctuation character.
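
    Put together, a minimal sketch of the \p{P} version could look like the following. The class name RegexPunctuation and the method name Strip are assumptions made for the example, not part of any library:

    ```csharp
    using System.Text.RegularExpressions;

    public static class RegexPunctuation
    {
        // \p{P} matches any character in a Unicode punctuation category.
        // The pattern is built once and reused across calls.
        private static readonly Regex Punctuation =
            new Regex(@"\p{P}", RegexOptions.Compiled);

        // Hypothetical helper: returns input with every punctuation match removed.
        public static string Strip(string input)
        {
            return Punctuation.Replace(input, string.Empty);
        }
    }
    ```

    For example, RegexPunctuation.Strip("Hello!?!?!?!") gives "Hello".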

  • 2020-12-04 19:34

    Here's a slightly different approach using LINQ. I like AviewAnew's, but this avoids the Aggregate.

            string myStr = "Hello there..';,]';';., Get rid of Punctuation";
    
            var s = from ch in myStr
                    where !Char.IsPunctuation(ch)
                    select ch;
    
            // Build the string directly; an ASCII round trip would turn non-ASCII characters into '?'.
            var stringResult = new string(s.ToArray());
    
  • 2020-12-04 19:35

    This thread is so old, but I'd be remiss not to post a more elegant (IMO) solution.

    string inputSansPunc = input.Where(c => !char.IsPunctuation(c)).Aggregate("", (current, c) => current + c);
    

    It's LINQ sans WTF.
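
    One thing to keep in mind with the Aggregate call above: current + c allocates a new string for every character kept, so the cost grows roughly quadratically with the input length. A sketch that keeps the one-liner feel but lets the framework do the joining, assuming .NET 4 or later for string.Concat over an IEnumerable<char> (the names LinqPunctuation and Strip are made up for the example):

    ```csharp
    using System.Linq;

    public static class LinqPunctuation
    {
        // Hypothetical helper: filters with LINQ, then joins the surviving
        // chars in one call instead of concatenating string by string.
        public static string Strip(string input)
        {
            return string.Concat(input.Where(c => !char.IsPunctuation(c)));
        }
    }
    ```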

  • 2020-12-04 19:35

    The most braindead simple way of doing it would be using string.Replace, once for each punctuation mark.

    The other way I would imagine is Regex.Replace, with a regular expression that has all the appropriate punctuation marks in it.
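
    A minimal sketch of the string.Replace route, assuming a short, hand-picked list of marks (the list, the class name ReplacePunctuation and the method name Strip are all illustrative):

    ```csharp
    public static class ReplacePunctuation
    {
        // Assumed list for illustration; extend it to cover the marks you care about.
        private static readonly string[] Marks = { ".", ",", "!", "?", ";", ":" };

        // Hypothetical helper: one Replace pass per mark.
        public static string Strip(string input)
        {
            foreach (string mark in Marks)
                input = input.Replace(mark, string.Empty);
            return input;
        }
    }
    ```

    Each pass scans the whole string, so this is only reasonable when the list of marks is short and known up front.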
