My input consists of user-posted strings.
What I want to do is create a dictionary of words and how often each has been used. This means I want to parse a string …
My gut feeling would be not to use regular expressions, but just to write a loop or two.
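Iterate over each char in the string; if it is not a valid char, replace it with a space. Then use String.Split() to split on spaces, and pass StringSplitOptions.RemoveEmptyEntries so that runs of consecutive spaces don't leave empty strings in the result.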
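A minimal sketch of that replace-then-split loop, written in Python for illustration. It assumes a "valid char" means a letter (whatever `str.isalpha()` accepts); adjust the test if digits or other characters should count.

```python
def to_words(text: str) -> list[str]:
    """Split text into words, treating every non-letter as a separator."""
    # Assumption: a "valid char" is anything str.isalpha() accepts.
    # Every other character is replaced with a space.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text)
    # str.split() with no argument splits on runs of whitespace and
    # drops empty strings, so "!!" does not produce empty "words".
    return cleaned.split()


print(to_words("Hello, world!! It's 2024..."))
# ['Hello', 'world', 'It', 's']
```

Note how "It's" comes out as 'It' and 's': the apostrophe was treated as junk. In .NET, the equivalent of Python's argument-less `split()` is `Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)`.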
Apostrophes and hyphens are trickier, because you have to decide whether each one is junk or a legitimate part of a word. But if you are using a for loop to iterate over the string, looking backwards and forwards from the current character should help: an apostrophe or hyphen with a letter on both sides (as in "it's" or "well-known") is probably part of the word.
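Here is one way to do the look-backwards/look-forwards check, again as a Python sketch. The rule (keep `'` or `-` only when both neighbours are letters) is an assumption, not the only reasonable one: it drops the trailing apostrophe of possessive plurals like "dogs'", and it only checks the ASCII apostrophe, not the typographic ’.

```python
def to_words_keep_inner(text: str) -> list[str]:
    """Like a plain letter split, but keeps ' and - inside words."""
    chars = []
    for i, ch in enumerate(text):
        if ch.isalpha():
            chars.append(ch)
        elif ch in "'-":
            # Look one character back and one forward: keep the apostrophe
            # or hyphen only if it sits between two letters ("it's",
            # "well-known"); otherwise treat it as junk.
            prev_ok = i > 0 and text[i - 1].isalpha()
            next_ok = i + 1 < len(text) and text[i + 1].isalpha()
            chars.append(ch if prev_ok and next_ok else " ")
        else:
            chars.append(" ")
    # Split on runs of whitespace, dropping empty strings.
    return "".join(chars).split()


print(to_words_keep_inner("It's a well-known 'quote' -- ok"))
# ["It's", 'a', 'well-known', 'quote', 'ok']
```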
Then you will have a list of words; for each of these words, check whether it appears in your dictionary of valid words. If you want this to be fast, keep that dictionary sorted and perform a binary search. But just to get it working, a linear search would be easier to start with.
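Both lookups, sketched in Python with the standard `bisect` module. `valid_words` is a hypothetical word list you would load yourself, and the binary-search version only gives correct answers if that list is sorted.

```python
from bisect import bisect_left


def is_valid_linear(word: str, valid_words: list[str]) -> bool:
    # Easiest to get working: compare against every entry, O(n) per lookup.
    for candidate in valid_words:
        if candidate == word:
            return True
    return False


def is_valid_binary(word: str, sorted_words: list[str]) -> bool:
    # O(log n) per lookup; sorted_words MUST be sorted in ascending order.
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


# Hypothetical word list; in practice you would load a real dictionary file.
valid_words = sorted(["apple", "banana", "cherry"])
candidates = ["banana", "asdfasdf", "Cherry"]
# Lower-case before the lookup, assuming the word list is all lower case.
print([w for w in candidates if is_valid_binary(w.lower(), valid_words)])
# ['banana', 'Cherry']
```

If you don't need the list sorted for anything else, a hash-based set (`set` in Python, `HashSet<string>` in .NET) gives average constant-time lookups and is simpler still.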
EDIT: I only mentioned the dictionary check because I thought you might be interested only in legitimate words, i.e. not "asdfasdf"; ignore that last part if that's not what you need.
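Whether or not you filter against a word list, the last step toward the goal in the question (a dictionary of words and how often each is used) is a frequency count. A minimal Python sketch using `collections.Counter`, which is a `dict` subclass; in C#, a `Dictionary<string, int>` incremented in a loop does the same job. Lower-casing every word is an assumption here.

```python
from collections import Counter


def count_words(words: list[str]) -> Counter:
    # Lower-case so "The" and "the" share one entry (an assumption;
    # drop .lower() if case should matter).
    return Counter(word.lower() for word in words)


counts = count_words(["The", "cat", "and", "the", "hat"])
print(counts["the"])          # 2
print(counts.most_common(1))  # [('the', 2)]
```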