My input consists of user-posted strings.
I want to build a dictionary of words and how often each one has been used. This means I want to parse a string and extract the words from it.
Using the following code:
    using System;
    using System.Text.RegularExpressions;

    var pattern = new Regex(
        @"( [^\W_\d]                # starting with a letter
                                    # followed by a run of either...
            ( [^\W_\d] |            # more letters or
              [-'\d](?=[^\W_\d])    # ', -, or digit followed by a letter
            )*
            [^\W_\d]                # and finishing with a letter
          )",
        RegexOptions.IgnorePatternWhitespace);

    var input = "#@!@LOLOLOL YOU'VE BEEN *PWN3D* ! :') !!!1einszwei drei foo--bar!";

    // Print each captured word in brackets, one per line.
    foreach (Match m in pattern.Matches(input))
        Console.WriteLine("[{0}]", m.Groups[1].Value);
produces this output:
    [LOLOLOL]
    [YOU'VE]
    [BEEN]
    [PWN3D]
    [einszwei]
    [drei]
    [foo]
    [bar]
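
Getting from these matches to the word-frequency dictionary could look roughly like the sketch below. It reuses the same pattern; the `CountWords` helper, the sample posts, and the choice to ignore case via `StringComparer.OrdinalIgnoreCase` are my own assumptions, not something stated above, so adjust them as needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as above.
var pattern = new Regex(
    @"( [^\W_\d]                # starting with a letter
        ( [^\W_\d] |            # more letters or
          [-'\d](?=[^\W_\d])    # ', -, or digit followed by a letter
        )*
        [^\W_\d]                # and finishing with a letter
      )",
    RegexOptions.IgnorePatternWhitespace);

// Hypothetical helper: tallies every matched word across all posts.
// Assumption: "Foo" and "foo" should count as the same word.
Dictionary<string, int> CountWords(IEnumerable<string> posts)
{
    var tally = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    foreach (var post in posts)
    {
        foreach (Match m in pattern.Matches(post))
        {
            var word = m.Groups[1].Value;
            tally.TryGetValue(word, out var seen); // seen stays 0 for a new word
            tally[word] = seen + 1;
        }
    }
    return tally;
}

// Made-up sample posts, for illustration only.
var counts = CountWords(new[] { "foo--bar! Foo drei", "drei DREI" });
foreach (var pair in counts)
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
```

With these sample posts, "foo" ends up with a count of 2, "bar" with 1, and "drei" with 3. Note that the pattern requires both a starting and a finishing letter, so one-letter words such as "I" or "a" are never matched and will not appear in the dictionary.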