Question
When the numbers are small, it's quick to grow an array list from 2 to 4 memory addresses. But what about when the allocation gets close to the maximum space allowed in an array list (close to the 2MB limit)? Would it be more efficient if, past some point, the array grew by only a fraction of its size instead of doubling? Obviously growing from 1MB to 2MB isn't really a big deal nowadays. However, if you had 50,000 people per hour running something that doubled the size of an array like this, I'm curious whether that would be a good enough reason to alter how this works, not to mention that it would cut down on unneeded memory (in theory).
A small graphical representation of what I mean: ArrayList a has 4 elements in it, and that is its current maximum size.
||||
Now let's add another item to the ArrayList. The internal code will double the size of the array even though we're only adding one thing to it. The ArrayList now has room for 8 elements:
||||||||
At these sizes I doubt it makes any difference, but when you're growing from 1MB to 2MB every time someone adds something of around 1.25MB, such as a file, to an ArrayList, there's 0.75MB of unneeded space allocated.
To give you more of an idea, here is the code currently run in C# by the List<T> class in System.Collections.Generic. The way it works now is that it doubles the size of the array list (read: array) every time a user tries to add something to an array that is too small. Doubling the size is a good solution and makes sense, until you're essentially growing it far bigger than you technically need it to be.
Here's the source for this particular part of the class:
    private void EnsureCapacity(int min)
    {
        if (this._items.Length >= min)
            return;
        // This is what I'm referring to
        int num = this._items.Length == 0 ? 4 : this._items.Length * 2;
        if ((uint) num > 2146435071U)
            num = 2146435071;
        if (num < min)
            num = min;
        this.Capacity = num;
    }
I'm going to guess that this is how memory management is handled in many programming languages, so this has probably been considered many times before. I'm just wondering whether this is the kind of efficiency saver that could save a large amount of system resources on a massive scale.
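The doubling described above can be observed from outside the class through the public Capacity property. The following is a minimal sketch, assuming a runtime whose List<T> grows the way the quoted EnsureCapacity does; CapacityDemo and ObserveCapacities are hypothetical names used only for illustration. It also shows the two standard ways to avoid the slack when the final size is known: passing a capacity to the constructor, or calling TrimExcess afterwards.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds `count` integers to an empty list and records every distinct
    // Capacity value seen along the way (hypothetical helper).
    public static List<int> ObserveCapacities(int count)
    {
        var list = new List<int>();
        var seen = new List<int> { list.Capacity };
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != seen[seen.Count - 1])
                seen.Add(list.Capacity);
        }
        return seen;
    }

    public static void Main()
    {
        // Capacity steps observed while adding 20 items.
        Console.WriteLine(string.Join(", ", ObserveCapacities(20)));

        // If the final size is known up front, reserve it once.
        var presized = new List<int>(5);
        for (int i = 0; i < 5; i++) presized.Add(i);
        Console.WriteLine(presized.Capacity);

        // Otherwise, release the unused slack after the list is built.
        var grown = new List<int>();
        for (int i = 0; i < 5; i++) grown.Add(i);
        grown.TrimExcess();
        Console.WriteLine(grown.Capacity);
    }
}
```

TrimExcess reallocates and copies, so it is probably only worth calling on lists that will stay at their final size for a while.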
Answer 1:
As the size of the collection gets larger, so does the cost of creating a new buffer, since you need to copy over all of the existing elements. The fact that the number of these copies is inversely proportional to the expense of each copy is exactly why the amortized cost of adding items to a List<T> is O(1). If the size of the buffer increases linearly, then the amortized cost of adding an item to a List<T> actually becomes O(n).
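The two bounds can be checked with a short sum. This is a sketch under stated assumptions: the buffer starts empty, c_0 is the first non-zero capacity (4 in the quoted code), K is a hypothetical fixed increment for the linear scheme, and N is the number of items added.

```latex
% Doubling: choose k so that 2^{k-1} c_0 < N \le 2^k c_0.
% A reallocation at capacity 2^i c_0 copies 2^i c_0 elements.
\text{copies}_{\text{double}}(N) = \sum_{i=0}^{k-1} 2^i c_0 = (2^k - 1)\,c_0 < 2N
\quad\Rightarrow\quad \text{amortized } O(1) \text{ per add}

% Fixed increment K: m = \lceil N/K \rceil buffers are allocated,
% and the j-th reallocation copies jK elements.
\text{copies}_{\text{linear}}(N) = \sum_{j=0}^{m-1} jK = \frac{K\,m(m-1)}{2} \approx \frac{N^2}{2K}
\quad\Rightarrow\quad \text{amortized } O(N) \text{ per add}
```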
With linear growth you save on memory, allowing the "wasted" memory to go from being O(n) to being O(1). As with virtually all performance and algorithm decisions, we're once again faced with the quintessential choice of exchanging memory for speed. We can save on memory and have slower additions (because of more copying), or we can use more memory to get faster additions. Of course there is no one universally right answer. Some people really would prefer slower additions in exchange for less wasted memory. The particular resource that is going to run out first will vary based on the program, the system that it's running on, and so forth. People in the situation where memory is the scarcer resource may not be able to use List<T>, which is designed to be as widely applicable as possible, even though it can't be universally the best option.
Answer 2:
The idea behind the exponential growth factor for dynamic arrays such as List<T> is that:
The amount of wasted space is always merely proportional to the amount of data in the array. Thus you are never wasting resources on a more massive scale than you are properly using.
Even with many, many reallocations, the total potential time spent copying while creating an array of size N is O(N), or amortized O(1) for a single element.
Access time is extremely fast at O(1) with a small coefficient.
This makes List<T> very appropriate for arrays of, say, in-memory tables of references to database objects, for which near-instant access is required but the array elements themselves are small.
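The first two points above (bounded waste, linear total copying) can be checked with a small simulation. This is only a sketch: GrowthSimulation, Simulate, Doubling and AddFour are hypothetical names, no real list is allocated, and the fixed increment of 4 is an arbitrary choice for the linear policy.

```csharp
using System;

public static class GrowthSimulation
{
    // Simulates n appends to an initially empty buffer. Returns the total number
    // of element copies caused by reallocations and the final capacity.
    public static (long Copies, int Capacity) Simulate(int n, Func<int, int> nextCapacity)
    {
        long copies = 0;
        int capacity = 0;
        for (int count = 0; count < n; count++)
        {
            if (count == capacity)
            {
                copies += count;                    // existing elements move to the new buffer
                capacity = nextCapacity(capacity);
            }
        }
        return (copies, capacity);
    }

    // Same rule as the quoted EnsureCapacity, ignoring the upper limit.
    public static int Doubling(int capacity) => capacity == 0 ? 4 : capacity * 2;

    // Hypothetical linear policy: always add four slots.
    public static int AddFour(int capacity) => capacity + 4;

    public static void Main()
    {
        foreach (int n in new[] { 1_000, 10_000, 100_000 })
        {
            var d = Simulate(n, Doubling);
            var l = Simulate(n, AddFour);
            Console.WriteLine($"n={n}: doubling copies={d.Copies}, capacity={d.Capacity}; " +
                              $"+4 copies={l.Copies}, capacity={l.Capacity}");
        }
    }
}
```

With doubling, the copy count stays below 2n and the final capacity stays below 2n as well. With the fixed increment, the capacity matches n closely but the copy count grows roughly with the square of n.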
Conversely, linear growth of dynamic arrays can result in n-squared memory wastage. This happens in the following situation:
You add something to the array, expanding it to size N for large N, freeing the previous memory block (possibly quite large) of size N-K for small K.
You allocate a few objects. The memory manager puts some in the large memory block just vacated, because why not?
You add something else to the array, expanding it to size N+K for some small K. Because the previously freed memory block now is sparsely occupied, the memory manager does not have a large enough contiguous free memory block and must request more virtual memory from the OS.
Thus virtual memory committed grows quadratically despite the measured size of objects created growing linearly.
This isn't a theoretical possibility. I actually had to fix an n-squared memory leak that arose because somebody had manually coded a linearly-growing dynamic array of integers. The fix was to throw away the manual code and use the library of geometrically-growing arrays that had been created for that purpose.
That being said, I have also seen problems with the exponential reallocation of List<T> (as well as the similarly growing memory buffer in Dictionary<TKey,TValue>) in 32-bit processes when the total memory required needs to grow past 128 MB. In this case the List<T> or Dictionary<TKey,TValue> will frequently be unable to allocate a 256 MB contiguous range of memory, even if there is more than sufficient virtual address space left. The application will then report an out-of-memory error to the user. In my case, customers complained about this because Task Manager was reporting that VM use never went over, say, 1.5 GB. If I were Microsoft, I would damp the growth of List<T> (and the similar memory buffer in Dictionary<TKey,TValue>) to 1% of total virtual address space.
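A damped policy along these lines might look like the sketch below. It is not how List<T> behaves: DampedGrowth, NextCapacity and the maxStep parameter are hypothetical, and choosing maxStep (for example, the element count that corresponds to 1% of the address space) is left to the caller. The only value taken from the source is the 2146435071 upper limit in the quoted EnsureCapacity.

```csharp
using System;

public static class DampedGrowth
{
    // Upper limit taken from the quoted EnsureCapacity code.
    private const int MaxArrayLength = 2146435071;

    // Hypothetical policy: double while the buffer is small, but never grow
    // by more than maxStep elements in a single reallocation.
    public static int NextCapacity(int current, int min, int maxStep)
    {
        long doubled = current == 0 ? 4 : (long)current * 2;
        long damped = Math.Min(doubled, (long)current + maxStep);
        long result = Math.Max(damped, min);        // always satisfy the request
        return (int)Math.Min(result, MaxArrayLength);
    }

    public static void Main()
    {
        int capacity = 0;
        for (int i = 0; i < 12; i++)
        {
            capacity = NextCapacity(capacity, capacity + 1, maxStep: 256);
            Console.Write(capacity + " ");
        }
        Console.WriteLine();
    }
}
```

Once the step cap takes over, growth is linear again, so additions past that point lose the amortized O(1) cost discussed in Answer 1. That is the trade-off a damped policy accepts to avoid requesting one very large contiguous block.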
Source: https://stackoverflow.com/questions/24831998/lists-double-their-space-in-c-sharp-when-they-need-more-room-at-some-point-does