If you really want to get it right from the beginning, you should look at ICU (International Components for Unicode), i.e. proper Unicode support, unless you are sure your strings will never hold anything but plain 7-bit ASCII. Searching, regular expressions, and tokenization are all in there.
Of course, going C++ would make things much easier, but even then my recommendation of ICU would stand.
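
If you are not sure whether the "plain ASCII only" assumption holds for your data, a quick check like the one below can tell you. This is only a minimal sketch in plain C: `is_ascii7` is a hypothetical helper name of my own, not an ICU function, and it assumes NUL-terminated byte strings.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of ICU): returns true if every byte of the
 * NUL-terminated string s lies in the 7-bit ASCII range 0x00..0x7F. */
bool is_ascii7(const char *s)
{
    /* Go through unsigned char so bytes >= 0x80 are not seen as negative. */
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; ++p) {
        if (*p > 0x7F) {
            return false;   /* found a non-ASCII byte */
        }
    }
    return true;
}
```

If this ever returns false on real input (for example on a UTF-8 encoded "naïve"), the plain `<string.h>` and `<ctype.h>` routines will start giving wrong answers for things like case conversion or character counting, and that is the point where ICU pays off.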