Sorry, I don't have any references for you, but I do have another anecdote to add to the pile.
I had a rather large std::map that I was building with Microsoft's CString as the key type. Performance was unacceptable. Since all of my strings were identical in length, I wrote a small class wrapping an old-fashioned fixed-size array of chars that emulated the parts of the CString interface I needed. Unfortunately I can't remember the exact speedup, but it was significant, and the resulting performance was more than adequate.
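For illustration, here is a minimal sketch of what such a wrapper might look like. It is not the original code: the names `FixedKey` and `demo_lookup` and the template parameter `N` are hypothetical, and it provides only the comparison operators that `std::map` needs rather than anything like the full CString interface. The idea is that the characters live inside the key object itself, so a comparison is a single `memcmp` over a known length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>

// Hypothetical fixed-length key: exactly N characters stored inline.
template <std::size_t N>
class FixedKey {
public:
    FixedKey() { std::memset(data_, 0, sizeof data_); }

    explicit FixedKey(const char* s) {
        // Assumption: every key has exactly N characters, as in the anecdote.
        assert(std::strlen(s) == N);
        std::memset(data_, 0, sizeof data_);
        std::strncpy(data_, s, N);  // copies at most N chars; data_[N] stays '\0'
    }

    const char* c_str() const { return data_; }

    // std::map only requires a strict weak ordering on the key.
    friend bool operator<(const FixedKey& a, const FixedKey& b) {
        return std::memcmp(a.data_, b.data_, N) < 0;
    }

    friend bool operator==(const FixedKey& a, const FixedKey& b) {
        return std::memcmp(a.data_, b.data_, N) == 0;
    }

private:
    char data_[N + 1];  // N characters plus a terminating '\0'
};

// Small usage example: insert two 4-character keys and look one up.
int demo_lookup() {
    std::map<FixedKey<4>, int> m;
    m[FixedKey<4>("beta")] = 2;
    m[FixedKey<4>("alfa")] = 1;
    return m.find(FixedKey<4>("beta"))->second;
}
```

Whether this helps in a given program depends on the key length and the access pattern, so it is worth measuring before and after.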
Sometimes you need to know a little about the library constructs you rely upon.