How to split text into words?

-上瘾入骨i 2020-12-04 22:39

How to split text into words?

Example text:

'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'

7 Answers
  •  粉色の甜心
    2020-12-04 22:55

    You could try using a regex to remove the apostrophes that aren't surrounded by letters (i.e. single quotes) and then using the Char static methods to strip all the other characters. By calling the regex first you can keep the contraction apostrophes (e.g. can't) but remove the single quotes like in 'Oh.

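    // Written as C# 9+ top-level statements.
    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    string myText = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";

    // Remove quote characters that are NOT between two letters (quotation marks),
    // keeping apostrophes inside contractions such as "can't".
    // A verbatim string (@"...") is used because "\b" in a regular string is a
    // backspace character, and \b['"]\b would match only quotes between letters.
    Regex reg = new Regex(@"(?<!\p{L})[""']|[""'](?!\p{L})");
    myText = reg.Replace(myText, "");

    string[] listOfWords = RemoveCharacters(myText);
    Console.WriteLine(string.Join(", ", listOfWords));

    // Keeps letters, whitespace and the remaining apostrophes, then splits on whitespace.
    static string[] RemoveCharacters(string input)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in input)
        {
            if (Char.IsLetter(c) || Char.IsWhiteSpace(c) || c == '\'')
                sb.Append(c);
        }

        // A null separator splits on any whitespace; empty entries are dropped.
        return sb.ToString().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }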
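
    A different way to get the same result, sketched here as an alternative rather than as part of the approach above, is to match the words directly instead of stripping everything else. The sketch assumes a "word" is a run of letters that may contain apostrophes between letters; the names WordSplitter, WordPattern and SplitIntoWords are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitter
{
    // Assumption for this sketch: a word is one or more letters, optionally
    // followed by groups of "apostrophe + letters" (so "can't" stays whole).
    private static readonly Regex WordPattern = new Regex(@"\p{L}+(?:'\p{L}+)*");

    public static string[] SplitIntoWords(string text)
    {
        // Regex.Matches returns all non-overlapping matches, left to right.
        return WordPattern.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
        string[] words = WordSplitter.SplitIntoWords(text);

        // Prints: Oh|you|can't|help|that|said|the|Cat|we're|all|mad|here|I'm|mad|You're|mad
        Console.WriteLine(string.Join("|", words));
    }
}
```

    Because quotation marks and punctuation never match the pattern, no separate clean-up pass is needed. Note that, under this assumption, digits and hyphenated words are not treated as words; the pattern would need to be widened if those matter.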
    
