I\'m doing simple string input parsing and I am in need of a string tokenizer. I am new to C# but have programmed Java, and it seems natural that C# should have a string tok
_words = new List<string>(YourText.ToLower().Trim('\n', '\r').Split(' ').
Select(x => new string(x.Where(Char.IsLetter).ToArray())));
Or
_words = new List<string>(YourText.Trim('\n', '\r').Split(' ').
Select(x => new string(x.Where(Char.IsLetterOrDigit).ToArray())));
If you're trying to do something like splitting command line arguments in a .NET Console app, you're going to have issues because .NET is either broken or is trying to be clever (which means it's as good as broken). I needed to be able to split arguments by the space character, preserving any literals that were quoted so they didn't get split in the middle. This is the code I wrote to do the job:
private static List<String> Tokenise(string value, char seperator)
{
List<string> result = new List<string>();
value = value.Replace(" ", " ").Replace(" ", " ").Trim();
StringBuilder sb = new StringBuilder();
bool insideQuote = false;
foreach(char c in value.ToCharArray())
{
if(c == '"')
{
insideQuote = !insideQuote;
}
if((c == seperator) && !insideQuote)
{
if (sb.ToString().Trim().Length > 0)
{
result.Add(sb.ToString().Trim());
sb.Clear();
}
}
else
{
sb.Append(c);
}
}
if (sb.ToString().Trim().Length > 0)
{
result.Add(sb.ToString().Trim());
}
return result;
}
You could use String.Split method.
class ExampleClass
{
public ExampleClass()
{
string exampleString = "there is a cat";
// Split string on spaces. This will separate all the words in a string
string[] words = exampleString.Split(' ');
foreach (string word in words)
{
Console.WriteLine(word);
// there
// is
// a
// cat
}
}
}
For more information see Sam Allen's article about splitting strings in c# (Performance, Regex)
If you are using C# 3.5 you could write an extension method to System.String that does the splitting you need. You then can then use syntax:
string.SplitByMyTokens();
More info and a useful example from MS here http://msdn.microsoft.com/en-us/library/bb383977.aspx
The split method of a string is what you need. In fact the tokenizer class in Java is deprecated in favor of Java's string split method.