I wrote a permutation algorithm recently. It works on a vector of type T (it's a template) instead of a string, and it's not super fast because it is recursive and does a lot of copying. But perhaps you can draw some inspiration from the code. You can find the code here.
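To give a rough idea of the approach, here is a minimal sketch of a recursive, copy-heavy permutation generator over a `std::vector<T>`. It is not the linked code: the function name `permutations` and its signature are made up for illustration, and I'm assuming the goal is simply to collect every ordering of the input.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the linked code): returns every ordering of `items`.
// Deliberately recursive and copy-heavy, so simple rather than fast.
template <typename T>
std::vector<std::vector<T>> permutations(const std::vector<T>& items) {
    // Base case: an empty vector has exactly one permutation, the empty one.
    if (items.empty()) {
        return std::vector<std::vector<T>>(1);
    }

    std::vector<std::vector<T>> result;
    for (std::size_t i = 0; i < items.size(); ++i) {
        // Copy the input with element i left out.
        std::vector<T> rest;
        rest.reserve(items.size() - 1);
        for (std::size_t j = 0; j < items.size(); ++j) {
            if (j != i) {
                rest.push_back(items[j]);
            }
        }

        // Put items[i] in front of each permutation of the remaining elements.
        for (const std::vector<T>& tail : permutations(rest)) {
            std::vector<T> perm;
            perm.reserve(items.size());
            perm.push_back(items[i]);
            perm.insert(perm.end(), tail.begin(), tail.end());
            result.push_back(perm);
        }
    }
    return result;
}
```

Calling `permutations(std::vector<int>{1, 2, 3})` gives six vectors, starting with `{1, 2, 3}` and ending with `{3, 2, 1}`. Duplicate elements are treated as distinct here, so repeated values produce repeated orderings. If speed matters, `std::next_permutation` from `<algorithm>` walks the orderings in place without the recursion or the copies.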