I'm doing simple string input parsing and I am in need of a string tokenizer. I am new to C# but have programmed in Java, and it seems natural that C# should have a string tokenizer.
Read this: the Split function has an overload that takes an array of separators: http://msdn.microsoft.com/en-us/library/system.stringsplitoptions.aspx
For complex splitting you could use a regex to create a match collection.
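For instance, a minimal sketch (the pattern and input here are purely illustrative) that collects word-like tokens into a MatchCollection:

using System;
using System.Text.RegularExpressions;

string input = "name = value; count = 42";      // illustrative input
// Match runs of word characters; each Match becomes one token.
MatchCollection matches = Regex.Matches(input, @"\w+");
foreach (Match m in matches)
    Console.WriteLine(m.Value);                  // name, value, count, 42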
I think the nearest equivalent in the .NET Framework is
string.Split()
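For example (a minimal sketch; the input is just an illustration), splitting on spaces gives you the tokens directly:

string line = "the quick brown fox";    // illustrative input
string[] tokens = line.Split(' ');      // one token per word
// tokens is { "the", "quick", "brown", "fox" }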
The method most similar to Java's is:
Regex.Split(string, pattern);
where
string - the text you need to split
pattern - a string pattern (a regular expression) that splits the text
For example, use Regex.Split(string, "#|#");
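As a hedged sketch (the input and delimiters are only examples), alternation in the pattern lets you split on several delimiters at once:

using System.Text.RegularExpressions;

string text = "one;two,three";               // illustrative input
// ";|," means "split wherever either ';' or ',' occurs"
string[] parts = Regex.Split(text, ";|,");
// parts is { "one", "two", "three" }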
I just want to highlight the power of C#'s Split method and give a more detailed comparison, particularly from someone who comes from a Java background.
Whereas StringTokenizer in Java only allows a single delimiter, we can actually split on multiple delimiters, making regular expressions less necessary (although if one needs regex, use regex by all means!). Take this, for example:
str.Split(new char[] { ' ', '.', '?' })
This splits on three different delimiters, returning an array of tokens. We can also remove empty entries by passing StringSplitOptions as a second parameter to the example above:
str.Split(new char[] { ' ', '.', '?' }, StringSplitOptions.RemoveEmptyEntries)
One thing Java's StringTokenizer does have that I believe C# is lacking (at least Java 7 has this feature) is the ability to keep the delimiter(s) as tokens. C#'s Split will discard the delimiters. This could be important in, say, some NLP applications, but for more general-purpose applications it might not be a problem.
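That said, you can approximate this with Regex.Split: if the pattern contains a capturing group, the captured delimiter text is included in the resulting array. A minimal sketch (the input is illustrative):

using System.Text.RegularExpressions;

string sentence = "Hello world. How are you?";   // illustrative input
// The capturing group around the delimiter class keeps the delimiters in the output.
string[] parts = Regex.Split(sentence, @"([ .?])");
// parts contains the words *and* the " ", "." and "?" delimiters,
// plus some empty entries you may want to filter out.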