How to split text into words?
Example text:
'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'
You could try using a regex to remove the apostrophes that aren't surrounded by letters (i.e. those used as single quotes) and then using the static Char methods to strip all the other characters. Running the regex first lets you keep the apostrophes in contractions (e.g. can't) while removing the single quotes, as in 'Oh.
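To make the "not surrounded by letters" rule concrete, here is a minimal sketch of the regex step on its own, applied to a few fragments of the example text. The pattern (?<!\p{L})'|'(?!\p{L}) is one possible way to express the rule with a lookbehind and a lookahead, and StripQuoteMarks is a hypothetical helper name; neither is taken verbatim from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: an apostrophe counts as a quote mark (and is removed)
// unless it has a letter immediately before it AND a letter immediately after it.
static string StripQuoteMarks(string s) =>
    Regex.Replace(s, @"(?<!\p{L})'|'(?!\p{L})", "");

string[] samples = { "'Oh,", "can't", "that,'", "'we're", "mad.'" };
foreach (string s in samples)
{
    Console.WriteLine($"{s} -> {StripQuoteMarks(s)}");
}
// 'Oh, -> Oh,
// can't -> can't
// that,' -> that,
// 'we're -> we're
// mad.' -> mad.
```

Only apostrophes are touched here; the commas and full stops are left for the character-filtering step.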
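using System;
using System.Text;
using System.Text.RegularExpressions;

string myText = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";

// Remove apostrophes that do not have a letter on both sides (quote marks),
// keeping the ones inside contractions such as can't and we're.
// The verbatim string (@"...") stops C# from treating the backslashes as escapes.
Regex reg = new Regex(@"(?<!\p{L})'|'(?!\p{L})");
myText = reg.Replace(myText, "");
string[] listOfWords = RemoveCharacters(myText);

// Keep only letters, whitespace and the remaining apostrophes, then split on
// whitespace, dropping any empty entries (e.g. from consecutive spaces).
static string[] RemoveCharacters(string input)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in input)
    {
        if (Char.IsLetter(c) || Char.IsWhiteSpace(c) || c == '\'')
            sb.Append(c);
    }
    return sb.ToString().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
}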
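A different technique, not the one this answer uses, is to match the words directly instead of deleting unwanted characters and splitting. This is a minimal sketch assuming a "word" is a run of letters optionally joined to further letters by single apostrophes; the pattern and the ExtractWords helper are my own illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: a word is a run of letters, optionally followed by
// "'letters" groups, so can't and we're stay whole, while a quote mark next to
// punctuation or whitespace can never be part of a match.
static string[] ExtractWords(string input) =>
    Regex.Matches(input, @"\p{L}+(?:'\p{L}+)*")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string quote = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
Console.WriteLine(string.Join(" | ", ExtractWords(quote)));
// Oh | you | can't | help | that | said | the | Cat | we're | all | mad | here | I'm | mad | You're | mad
```

Because the pattern only ever consumes letters and inner apostrophes, no separate clean-up pass is needed. The trade-off is that anything outside the definition (digits, hyphenated words such as "mad-hatter") is split or dropped, so the pattern would need widening for such text.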