How can I strip punctuation from a string?

天命终不由人 2020-12-04 18:47

For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.

But in the general case, what's the best way to strip punctuati

15 Answers
  • 2020-12-04 19:15

    Assuming "best" means "simplest" I suggest using something like this:

    String stripped = input.replaceAll("\\p{Punct}+", "");
    

    This example is for Java, but all sufficiently modern Regex engines should support this (or something similar).

    Edit: the Unicode-Aware version would be this:

    String stripped = input.replaceAll("\\p{P}+", "");
    

    The first version only looks at punctuation characters contained in ASCII.
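
To make the difference between the two patterns concrete, here is a minimal Java sketch (the class and method names are made up for illustration). One thing worth knowing: without the `UNICODE_CHARACTER_CLASS` flag, `\p{Punct}` is the POSIX ASCII class, which also covers ASCII symbols such as `$` and `+`, while `\p{P}` is the Unicode punctuation category and leaves those symbols alone.

```java
import java.util.regex.Pattern;

final class PunctClassDemo {
    // POSIX class: ASCII punctuation only, but it also covers ASCII symbols such as $ and +.
    private static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}+");
    // Unicode general category P: punctuation in any script, but not symbols such as $ or +.
    private static final Pattern UNICODE_PUNCT = Pattern.compile("\\p{P}+");

    static String stripAscii(String input) {
        return ASCII_PUNCT.matcher(input).replaceAll("");
    }

    static String stripUnicode(String input) {
        return UNICODE_PUNCT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00AB and U+00BB are guillemets, U+2026 is a horizontal ellipsis.
        String sample = "Hello, \u00ABworld\u00BB\u2026 $5!";
        System.out.println(stripAscii(sample));
        System.out.println(stripUnicode(sample));
    }
}
```

With that sample, `stripAscii` keeps the guillemets and the ellipsis but drops the `$`, whereas `stripUnicode` removes the guillemets and the ellipsis and keeps the `$`. Precompiling the patterns is optional; it just avoids recompiling the regex on every call.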

  • 2020-12-04 19:19

    Based off GWLlosa's idea, I was able to come up with the supremely ugly, but working:

    string s = "cat!";
    // The accumulator parameter is named acc: reusing the outer local name s here does not compile.
    s = s.ToCharArray().ToList<char>()
          .Where<char>(x => !char.IsPunctuation(x))
          .Aggregate<char, string>(string.Empty, new Func<string, char, string>(
                 delegate(string acc, char c) { return acc + c; }));
    
  • 2020-12-04 19:19

    If you want to use this for tokenizing text you can use:

    new string(myText.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray())
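
The same tokenizing idea, sketched in Java under a couple of assumptions: `PunctTokenizer` and `tokenize` are hypothetical names, and "punctuation" is taken to mean the Unicode `\p{P}` category. Replacing punctuation with a space (rather than deleting it) keeps words such as `one,two` from fusing into `onetwo`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PunctTokenizer {
    static List<String> tokenize(String text) {
        // Swap each punctuation character for a space so neighbouring words stay separated.
        String spaced = text.replaceAll("\\p{P}", " ").trim();
        if (spaced.isEmpty()) {
            // Input was empty or consisted only of punctuation and whitespace.
            return Collections.emptyList();
        }
        // Split on runs of whitespace, which also absorbs the spaces introduced above.
        return Arrays.asList(spaced.split("\\s+"));
    }
}
```

For example, `PunctTokenizer.tokenize("one,two;three. four")` yields the four tokens `one`, `two`, `three`, `four`. Apostrophes count as punctuation here, so a word like `it's` would be split in two; whether that is acceptable depends on the tokenizer you need.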
    
  • 2020-12-04 19:19

    For anyone who would like to do this via RegEx:

    This code shows the full Regex replace process. The sample pattern keeps only the letters a-z (in either case), the digits 0-9, and spaces, replacing ALL other characters with an empty string, so accented and non-Latin letters are removed as well:

    // Regex matching every run of characters other than a-z, 0-9, and space
    System.Text.RegularExpressions.Regex TitleRegex =
        new System.Text.RegularExpressions.Regex("[^a-z0-9 ]+",
            System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    
    string ParsedString = TitleRegex.Replace(stringToParse, String.Empty);
    
    return ParsedString;
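
If the text is not plain English, the allow-list can be widened to Unicode letters and numbers. A rough Java sketch of both variants follows; `AllowListDemo` and its method names are assumptions for illustration, and the Unicode pattern is my own variation rather than anything from the answer above.

```java
import java.util.regex.Pattern;

final class AllowListDemo {
    // Same idea as the answer's pattern: keep a-z (either case), 0-9 and spaces.
    private static final Pattern NOT_ASCII_ALNUM =
            Pattern.compile("[^a-z0-9 ]+", Pattern.CASE_INSENSITIVE);
    // Hypothetical Unicode-aware variant: keep any letter, any number and spaces.
    private static final Pattern NOT_UNICODE_ALNUM =
            Pattern.compile("[^\\p{L}\\p{N} ]+");

    static String keepAscii(String s) {
        return NOT_ASCII_ALNUM.matcher(s).replaceAll("");
    }

    static String keepUnicode(String s) {
        return NOT_UNICODE_ALNUM.matcher(s).replaceAll("");
    }
}
```

Given a string containing the accented letters U+00E8, U+00FB and U+00E9, `keepAscii` drops them along with the punctuation, while `keepUnicode` drops only the punctuation.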
    
  • 2020-12-04 19:19
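    #include <cctype>
    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string strOne = "H,e.l/l!o W#o@r^l&d!!!";
        int punct_count = 0;

        cout << "before : " << strOne << endl;
        for (string::size_type ix = 0; ix < strOne.size();) {
            // Cast to unsigned char: passing a negative char to ispunct is undefined behavior.
            if (ispunct(static_cast<unsigned char>(strOne[ix]))) {
                ++punct_count;
                strOne.erase(ix, 1);  // do not advance: the next character has slid into position ix
            } else {
                ++ix;
            }
        }
        cout << "after : " << strOne << endl;
        return 0;
    }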
    
  • 2020-12-04 19:20

    For long strings I use this:

    var normalized = input
                    .Where(c => !char.IsPunctuation(c))
                    .Aggregate(new StringBuilder(),
                               (current, next) => current.Append(next), sb => sb.ToString());
    

    This performs much better than repeated string concatenation (though I agree it's less intuitive).
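
For comparison, the same single-pass, builder-based approach sketched in Java. Java has no direct `char.IsPunctuation`, so the hypothetical `isPunctuation` helper below approximates it by checking the seven Unicode punctuation categories via `Character.getType`; treat that mapping as my assumption rather than a guaranteed match for .NET's behaviour.

```java
final class PunctuationFilter {
    // Rough counterpart of C#'s char.IsPunctuation: true for the Unicode P* categories.
    static boolean isPunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    static String strip(CharSequence input) {
        // One builder, appended to in place, instead of a new string per character.
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(cp -> !isPunctuation(cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Note that characters such as `^`, `$` and `+` are classified as symbols, not punctuation, so `PunctuationFilter.strip("H,e.l/l!o W#o@r^l&d!!!")` gives `Hello Wor^ld`. Iterating over code points rather than `char` values keeps characters outside the Basic Multilingual Plane intact.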
