Besides the fact that the standard defines it to be contiguous, why is std::vector contiguous?
If it runs out of space, it needs to reallocate a new block and copy t
If std::vector didn't guarantee contiguity, a new container would be invented which did.
The contiguity guarantee makes it easier to inter-operate with existing code that expects a contiguous array, and also gives very good performance because it is cache-friendly. (Inserting/deleting in the middle is in practice very fast for moderate sizes because of this.)
Copying the array on expansion is surprisingly cheap because the capacity grows geometrically: if you append a million elements to a vector one at a time, each element will have been copied, on average, around once.
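To make the "copied around once" figure concrete, here is a minimal sketch that counts the element copies caused by reallocation. The helper name count_reallocation_copies is made up for this example, and the exact total depends on the growth factor your standard library uses (commonly 2 or 1.5), so treat the result as an order of magnitude rather than a fixed number.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many existing elements have to be moved
// to a new block while appending n ints one at a time.
std::size_t count_reallocation_copies(std::size_t n) {
    std::vector<int> v;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // When the vector is full, the next push_back allocates a larger
        // block and relocates every element currently stored.
        if (v.size() == v.capacity()) {
            copies += v.size();
        }
        v.push_back(static_cast<int>(i));
    }
    return copies;
}
```

Dividing count_reallocation_copies(1000000) by 1000000 gives the average number of copies per element; with geometric growth it stays a small constant no matter how large n gets.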
There are a few reasons for this:
First, iteration over a contiguous container is a lot faster than over a non-contiguous one due to two factors. The first is branch prediction: the processor doesn't need to throw away its pipeline every time you finish reading one of the sub-containers, and fewer pipeline resets means faster code. The second is that it's much easier to fully cache a contiguous block of memory than a bunch of assorted small blocks, making it much more likely that your array is cached in its entirety.
Second, there's a lot of C++ code being written out there that has to interact with C code, and a lot of that code will expect a contiguous region of memory when receiving an array/buffer, because that's the representation that depends least on any particular data structure implementation. When you're constantly interacting with code that expects buffers/arrays, the overhead of converting your std::deque into an array takes its toll compared to the practically instantaneous passing of a std::vector as an array (which can be basically just giving a pointer to the internal array).
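As a rough illustration of that difference, the sketch below passes both containers to a C-style function. c_sum is a hypothetical stand-in for whatever C API you actually have to call, and the two wrapper names are made up as well.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a C API that expects a contiguous buffer.
extern "C" long c_sum(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

long sum_vector_via_c_api(const std::vector<int>& v) {
    // No conversion and no copy: data() points at the vector's own storage.
    return c_sum(v.data(), v.size());
}

long sum_deque_via_c_api(const std::deque<int>& d) {
    // A deque has no single underlying array, so the elements must be
    // copied into a contiguous temporary before the call.
    std::vector<int> tmp(d.begin(), d.end());
    return c_sum(tmp.data(), tmp.size());
}
```

The deque version pays for an allocation and a full copy on every call, which is the overhead described above.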
All of this justifies the existence of a contiguous container. As others have said, when you need neither fast iteration nor contiguous memory, you can always use a std::deque.
As a complement to the other answers (they are quite complete), there is one situation where you do prefer vectors not to be contiguous: when you need to grow a vector concurrently. That is why Intel Threading Building Blocks (TBB) provides tbb::concurrent_vector, which is more or less what you said you would expect:
"When the storage fills up, it would just allocate a new block and keep the old block. When accessing through an iterator, it would do simple >, < checks to see which block the index is in and return it."
A comparison between tbb::concurrent_vector and std::vector would then give you a better understanding of the advantages (speed) and disadvantages (std::vector cannot be grown concurrently) of contiguous memory. I expect tbb::concurrent_vector to be better optimized than std::deque, which is why tbb::concurrent_vector vs. std::vector is a fairer comparison.
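For illustration only, here is a minimal single-threaded sketch of the block-based idea from the quote. It is not how tbb::concurrent_vector is implemented and it is not thread-safe; SegmentedVector and its fixed BlockSize are assumptions made up for this example, and T must be default-constructible.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical segmented container: storage grows by adding blocks,
// and existing elements are never moved.
template <typename T, std::size_t BlockSize = 4>
class SegmentedVector {
public:
    void push_back(const T& value) {
        if (size_ == blocks_.size() * BlockSize) {
            // Storage is full: allocate a new block and keep the old ones.
            blocks_.push_back(std::make_unique<T[]>(BlockSize));
        }
        blocks_[size_ / BlockSize][size_ % BlockSize] = value;
        ++size_;
    }

    T& operator[](std::size_t i) {
        // Extra step compared to std::vector: locate the block first,
        // then the element inside it.
        return blocks_[i / BlockSize][i % BlockSize];
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<T[]>> blocks_;  // table of block pointers
    std::size_t size_ = 0;
};
```

Only the small table of block pointers is ever reallocated, so the address of an element stays valid as the container grows. The price is the extra indirection on every access and the loss of one contiguous buffer.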
By making std::vector contiguous, it can be treated much like an array. However, it's also resizable: its size is definable at runtime rather than at compile time. Also, a vector can be used to allocate memory for functions that require a buffer. The advantage of this is that the memory will be freed by the vector when it goes out of scope. For example, when using ReadFile, a vector can be used to create the buffer:
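DWORD bytesRead = 0;  // ReadFile expects a pointer to a DWORD here
std::vector<char> buffer(fileSize);
// open file, etc.
// The length parameter is a DWORD, so the size_t from size() is cast explicitly.
ReadFile(hFileIn, buffer.data(), static_cast<DWORD>(buffer.size()), &bytesRead, nullptr);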
Note that data() is new in C++11. In older code you will probably see the equivalent &(buffer.at(0)) or &(buffer[0]), which returns the address of the first element (and so requires a non-empty vector).
A std::deque would be a better fit for what you're describing.
The standard C++ library defines a non-contiguous array-like container, too: std::deque<T>. Iterating over a std::deque<T> is much slower than iterating over a std::vector<T>. If the operation is fairly trivial, it may be something like 5 times slower: that is what I actually measured when summing a sequence of integers. This is the cost you are paying for a non-contiguous representation!
The reason for this fairly steep slowdown is that gcc knows how to vectorize the loop over a std::vector<int> but not the one over a std::deque<int>. Even without vectorization the iteration is about 30% slower. That is, the fairly small cost of std::vector<T>'s reallocations doesn't really matter that much!
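If you want to repeat this kind of measurement yourself, a minimal timing sketch could look like the following. The helper timed_sum is a made-up name, and this is not the benchmark behind the numbers above; results will vary with compiler, optimization flags, element count, and hardware.

```cpp
#include <chrono>
#include <deque>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: returns {sum, elapsed microseconds} for one pass
// over any container of integers.
template <typename Container>
std::pair<long long, long long> timed_sum(const Container& c) {
    const auto start = std::chrono::steady_clock::now();
    const long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return {sum, static_cast<long long>(elapsed.count())};
}
```

Fill a std::vector<int> and a std::deque<int> with the same few million values, call timed_sum on each several times, and compare the second member of the results. Use the returned sum (for example, print it) so the optimizer cannot discard the loop.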