I have converted this simple method from C# to C++. It reads a path table and populates a list of lists of ints (or a vector of vectors of ints).
A sample line from
Here are a few things that I haven't seen anyone else mention. They are somewhat vague, but being unable to reproduce things makes it hard to go into specifics on all of it.
Poor man's profiling.
While the code is running, just keep interrupting it. Usually you'll see the same stack frame over and over.
Start commenting stuff out. If you comment out your splitting and it completes instantly, then its pretty clear where to start.
Some of the code is dependent, but you could read the full file into memory then do the parsing to create an obvious separation on where its spending its time. If both finish quickly independently, then it's probably interaction.
Buffering.
I don't seen any buffering on your reads. This becomes especially important if you are writing anything to disk. The arm on your disk will jump back and forth between your read location, then write location, etc.
While it doesn't look like you are writing here, your main program may have more memory being used. It is possible that after you reach your high water, the OS starts paging some of the memory to disk. You'll thrash when you are reading line by line while the paging is happening.
Usually, I'll set up a simple iterator interface to verify everything is working. Then write a decorator around it to read 500 lines at a time. The standard streams have some buffering options built in as well, and those may be better to use. I'm going to guess that their buffering defaults are pretty conservative.
Reserve.
std::vector::push_back works best when you also use std::vector::reserve. If you can make most of the memory is available before entering a tight loop, you win. You don't even have to know exactly how much, just guess.
You can actually beat std::vector::resize performance with this as well, because std::vector::resize uses alloc and std::vector::push_back will use realloc
That last bit is contested, though I've read otherwise. I have no reason to doubt that I'm wrong, though I will have to do more research to confirm or deny.
Nevertheless, push_back can run faster if you use reserve with it.
String splitting.
I've never seen a C++ iterator solution that was performant when it comes to dealing with gb+ files. I haven't tried that one specifically, though. My guess at why is that they tend to make a lot of small allocations.
Here is a reference with what I usually use.
Split array of chars into two arrays of chars
Advice on std::vector::reserve applies here.
I prefer boost::lexical_cast to stream implementations for maintenance concerns, though I can't say its more or less performant than stream implementations. I will say it is exceedingly rare to actually see correct error checking on stream usage.
STL shenanigans.
I'm intentionally vague on these, sorry. I usually write code that avoids the conditions, though I do remember some of the trials and tribulations that co-workers have told me about. Using STLPort avoids a good chunk of these entirely.
On some platforms, using stream operations have some weird thread safety enabled by default. So I've seen minor std::cout usage absolutely destroy an algorithm's performance. You don't have anything here, but if you had logging going on in another thread that could pose problems. I see a 8% _Mutex::Mutex in another comment, which may speak to its existence.
It's plausible that a degenerate STL implementation could even have the above issue with the lexical parsing stream operations.
There are odd performance characteristics on some of the containers. I don't I ever had problems with vector, but I really have no idea what istream_iterator uses internally. In the past, I've traced through an misbehaving algorithm to find a std::list::size call doing full traversal of the list with GCC, for instance. I don't know if newer versions are less inane.
The usual stupid SECURE_CRT stupidity should stupidly be taken care of. I wonder if this is what microsoft thinks we want to spend our time doing?