Question
I work on an iPad application that has a sync process that uses web services and Core Data in a tight loop. To reduce the memory footprint, per Apple's recommendation, I allocate and drain an NSAutoreleasePool periodically. This currently works great, and there are no memory issues with the current application. However, I plan on moving to ARC, where NSAutoreleasePool is no longer valid, and I would like to maintain the same kind of performance. I created a few examples and timed them, and I am wondering what the best approach is, using ARC, to achieve the same kind of performance while maintaining code readability.
For testing purposes I came up with three scenarios; each creates a string for every number between 1 and 10,000,000. I ran each example three times to determine how long it took, using a 64-bit Mac application built with the Apple LLVM 3.0 compiler (w/o gdb, -O0) and Xcode 4.2. I also ran each example through Instruments to see roughly what the peak memory was.
Each of the examples below is contained within the following code block:
#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{
    @autoreleasepool {
        NSDate *now = [NSDate date];
        //Code Example ...
        NSTimeInterval interval = -[now timeIntervalSinceNow]; // negate so the elapsed time is positive
        printf("Duration: %f\n", interval);
    }
    return 0;
}
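All of the examples reference a MAX_ALLOCATIONS constant that is not shown; given the range described above (1 to 10,000,000), it is presumably defined along these lines:

// Assumed definition, based on the "1 to 10,000,000" range described above:
static const uint32_t MAX_ALLOCATIONS = 10000000;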
NSAutoreleasePool Batch [Original Pre-ARC] (Peak Memory: ~116 KB)
static const NSUInteger BATCH_SIZE = 1500;
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
{
    NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
    [text class];
    if((count + 1) % BATCH_SIZE == 0)
    {
        [pool drain];
        pool = [[NSAutoreleasePool alloc] init];
    }
}
[pool drain];
Run Times:
10.928158
10.912849
11.084716
Outer @autoreleasepool (Peak Memory: ~382 MB)
@autoreleasepool {
    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
    }
}
Run Times:
11.489350
11.310462
11.344662
Inner @autoreleasepool (Peak Memory: ~61.2 KB)
for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
{
    @autoreleasepool {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
    }
}
Run Times:
14.031112
14.284014
14.099625
@autoreleasepool w/ goto (Peak Memory: ~115 KB)
static const NSUInteger BATCH_SIZE = 1500;
uint32_t count = 0;
next_batch:
@autoreleasepool {
    for(; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
        if((count + 1) % BATCH_SIZE == 0)
        {
            count++; //Increment count manually
            goto next_batch;
        }
    }
}
Run Times:
10.908756
10.960189
11.018382
The goto statement offered the closest performance, but it uses a goto. Any thoughts?
Update:
Note: The goto statement is a normal exit for an @autoreleasepool, as stated in the documentation, and will not leak memory:
On entry, an autorelease pool is pushed. On normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if exit is due to an exception, the autorelease pool is not popped.
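As an illustration of what a "normal exit" looks like in practice (this sketch is not from the question; firstWithPrefix is a hypothetical helper), returning from inside the block still pops the pool on the way out:

// Returning from inside the @autoreleasepool block is a "normal exit", so the
// pool is popped and the temporary uppercase strings are released before the
// function returns.
static NSString *firstWithPrefix(NSArray *items, NSString *prefix)
{
    for (NSString *item in items) {
        @autoreleasepool {
            NSString *upper = [item uppercaseString];
            if ([upper hasPrefix:prefix]) {
                return item; // pool is popped here before returning
            }
        }
    }
    return nil;
}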
Answer 1:
The following should achieve the same thing as the goto answer without the goto:
for (NSUInteger count = 0; count < MAX_ALLOCATIONS;)
{
    @autoreleasepool
    {
        for (NSUInteger j = 0; j < BATCH_SIZE && count < MAX_ALLOCATIONS; j++, count++)
        {
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(count + 1)];
            [text class];
        }
    }
}
Answer 2:
Note that ARC enables significant optimizations which are not enabled at -O0. If you're going to measure performance under ARC, you must test with optimizations enabled. Otherwise, you'll be measuring your hand-tuned retain/release placement against ARC's "naive mode".
Run your tests again with optimizations and see what happens.
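For reference, a minimal harness along these lines could be used to rerun each variant under a Release build (the timeVariant helper is hypothetical, not from this answer); it logs results in the same format quoted below:

#import <Foundation/Foundation.h>

// Hypothetical helper: times a block and logs the elapsed seconds, producing
// output in the same style as the results below (e.g. "outer: 8.1259").
static void timeVariant(NSString *label, void (^variant)(void))
{
    NSDate *start = [NSDate date];
    variant();
    NSLog(@"%@: %.4f", label, -[start timeIntervalSinceNow]);
}

// Usage: wrap each loop variant in a block.
// timeVariant(@"outer", ^{ /* outer @autoreleasepool variant */ });
// timeVariant(@"inner", ^{ /* inner @autoreleasepool variant */ });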
Update: I was curious, so I ran it myself. These are the runtime results in Release mode (-Os), with 7,000,000 allocations.
arc-perf[43645:f803] outer: 8.1259
arc-perf[43645:f803] outer: 8.2089
arc-perf[43645:f803] outer: 9.1104
arc-perf[43645:f803] inner: 8.4817
arc-perf[43645:f803] inner: 8.3687
arc-perf[43645:f803] inner: 8.5470
arc-perf[43645:f803] withGoto: 7.6133
arc-perf[43645:f803] withGoto: 7.7465
arc-perf[43645:f803] withGoto: 7.7007
arc-perf[43645:f803] non-ARC: 7.3443
arc-perf[43645:f803] non-ARC: 7.3188
arc-perf[43645:f803] non-ARC: 7.3098
And the memory peaks (only run with 100,000 allocations, because Instruments was taking forever):
Outer: 2.55 MB
Inner: 723 KB
withGoto: ~747 KB
Non-ARC: ~748 KB
These results surprise me a little. Well, the memory peak results don't; they're exactly what you'd expect. But the run-time difference between inner and withGoto, even with optimizations enabled, is higher than I would anticipate.
Of course, this is somewhat of a pathological micro-test, which is very unlikely to model the real-world performance of any application. The takeaway here is that ARC may indeed add some amount of overhead, but you should always measure your actual application before making assumptions.
(Also, I tested @ipmcc's answer using nested for loops; it behaved almost exactly like the goto version.)
Source: https://stackoverflow.com/questions/9675355/reduce-peak-memory-usage-with-autoreleasepool