I am about to start a project which will be taking blocks of text, parsing a lot of data out of them into some sort of object which can then be serialized, stored, and have statistics generated from it.
I don't know what kind of processing you're doing here, but if you're talking hundreds of thousands of strings per day, that's a pretty small number. Let's assume you get 1 million new strings to process every day, and you can fully task 10 of those 12 Xeon cores. That's 100,000 strings per core per day. There are 86,400 seconds in a day, so we're talking 0.864 seconds per string. That's a lot of time to spend parsing a single string.
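For concreteness, here is that arithmetic as a quick back-of-the-envelope sketch. The 1-million-strings and 10-core figures are just the assumptions above; plug in your own numbers:

```csharp
using System;

class ThroughputBudget
{
    static void Main()
    {
        // Assumed figures from above: 1M strings/day spread across 10 usable cores.
        const double stringsPerDay = 1_000_000;
        const double usableCores = 10;
        const double secondsPerDay = 86_400;

        double stringsPerCorePerDay = stringsPerDay / usableCores;      // 100,000
        double secondsPerString = secondsPerDay / stringsPerCorePerDay; // 0.864

        Console.WriteLine("Budget: {0:F3} seconds of CPU time per string",
                          secondsPerString);
    }
}
```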
I'll echo the recommendations made by @Pieter, especially where he suggests making measurements to see how long it takes to do your processing. Your best bet is to get something up and working, then figure out how to make it faster if you need to. I think you'll be surprised at how often you don't need to do any optimization. (I know that's heresy to the optimization wizards, but processor time is cheap and programmer time is expensive.)
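If you want to make those measurements, a minimal sketch with System.Diagnostics.Stopwatch looks like this. ParseBlock is a hypothetical stand-in for whatever your real processing step turns out to be:

```csharp
using System;
using System.Diagnostics;

class ParserTiming
{
    // Hypothetical stand-in for your real parsing routine.
    static string[] ParseBlock(string block)
    {
        return block.Split(' ');
    }

    static void Main()
    {
        string sample = "a representative block of text to parse";
        const int iterations = 100_000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            ParseBlock(sample);
        }
        sw.Stop();

        Console.WriteLine("Average: {0:F4} ms per block",
                          sw.Elapsed.TotalMilliseconds / iterations);
    }
}
```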
How much slower are regexes compared to substring searches?
That depends entirely on how complex your regexes are. As @Pieter said, if you're looking for a single constant string, String.Contains will probably be faster. You might also consider String.IndexOfAny if you're scanning for any of a set of constant characters. Regular expressions aren't necessary unless you're looking for patterns that can't be represented as constant strings.
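To make the distinction concrete, here's a sketch of the three approaches side by side. Note that String.IndexOfAny matches any of a set of characters, not strings; the sample text and date pattern are purely illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

class SearchComparison
{
    static void Main()
    {
        string text = "ERROR 2024-01-01 disk quota exceeded";

        // Single constant string: String.Contains is the cheapest test.
        bool hasError = text.Contains("ERROR");

        // Any of a set of characters (IndexOfAny takes a char[], not strings).
        int firstDigit = text.IndexOfAny(new[] { '0', '1', '2', '3', '4',
                                                 '5', '6', '7', '8', '9' });

        // A true pattern (e.g., a date) is where a regex earns its cost;
        // compiling it once and reusing it avoids re-parsing the pattern.
        var datePattern = new Regex(@"\d{4}-\d{2}-\d{2}", RegexOptions.Compiled);
        bool hasDate = datePattern.IsMatch(text);

        Console.WriteLine("{0} {1} {2}", hasError, firstDigit, hasDate);
    }
}
```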
Is .NET going to be significantly slower than other languages?
In processor-intensive applications, .NET can be slower than native code. When it is, the difference is typically in the range of 5 to 20 percent, and most often between 7 and 12 percent. And that's just the code executing in isolation; you also have to account for how long it takes to build the program in the other language and how difficult it is to share data between the native app and the rest of your system.