Here\'s a bit of code that is a considerable bottleneck after doing some measuring:
//-----------------------------------------------------------------------
A proper implementation of the IO library would cache the data for you, avoiding excessive disk accesses and system calls. I recommend that you use a system-call level tool (e.g. strace if you're under Linux) to check what actually happens with your IO.
Obviously, dict.insert(xxx) could also be a nuisance if it doesn't allow O(1) insertion.