iOS Concurrency - Not reaching anywhere near theoretical maximum

臣服心动 2020-12-22 03:31

I'm new to Grand Central Dispatch and have been running some tests with it, doing some processing on an image. Basically I'm running a grayscale algorithm both sequentially and concurrently …

2 Answers
  •  被撕碎了的回忆
     2020-12-22 04:05

    In my tests, I found that if I just focused on the concurrent B&W conversion, I achieved something close to the "twice the speed" that you were expecting (the parallel rendition took 53% as long as the serial rendition). When I also included the ancillary portions of the conversion (not only the conversion, but also the retrieval of the image, preparation of the output pixel buffer, and creation of the new image, etc.), then the resulting performance improvement was less spectacular, where elapsed time was 79% as long as the serial rendition.

    In terms of why you might not achieve an absolute doubling of performance, even if you focus only on the portion that can enjoy concurrency, Apple attributes this to the overhead of scheduling code for execution. In the discussion of dispatch_apply under "Performing Loop Iterations Concurrently" in the Concurrency Programming Guide, they weigh the performance gain of concurrent tasks against the overhead that each dispatched block entails:

    You should make sure that your task code does a reasonable amount of work through each iteration. As with any block or function you dispatch to a queue, there is overhead to scheduling that code for execution. If each iteration of your loop performs only a small amount of work, the overhead of scheduling the code may outweigh the performance benefits you might achieve from dispatching it to a queue. If you find this is true during your testing, you can use striding to increase the amount of work performed during each loop iteration. With striding, you group together multiple iterations of your original loop into a single block and reduce the iteration count proportionately. For example, if you perform 100 iterations initially but decide to use a stride of 4, you now perform 4 loop iterations from each block and your iteration count is 25. For an example of how to implement striding, see “Improving on Loop Code.”
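    The arithmetic in that quoted passage can be sketched in plain C. The numbers here (100 iterations, stride of 4, 25 blocks) are the guide's example figures, and the helper names are mine, not anything from the poster's code:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Number of dispatched blocks when `total` iterations are grouped by `stride`. */
    static size_t block_count(size_t total, size_t stride)
    {
        return (total + stride - 1) / stride; /* ceiling division */
    }

    /* Iterations actually executed when looping block-by-block with striding. */
    static size_t iterations_executed(size_t total, size_t stride)
    {
        size_t done = 0;
        for (size_t idx = 0; idx < block_count(total, stride); idx++) {
            size_t start = idx * stride;
            size_t stop  = start + stride < total ? start + stride : total;
            for (size_t i = start; i < stop; i++)   /* the grouped "inner" iterations */
                done++;
        }
        return done;
    }

    int main(void)
    {
        /* The guide's example: 100 iterations with a stride of 4 -> 25 blocks. */
        assert(block_count(100, 4) == 25);
        /* Striding changes the grouping, never the total amount of work performed. */
        assert(iterations_executed(100, 4) == 100);
        return 0;
    }
    ```

    The point of the trade-off: each dispatched block pays a fixed scheduling cost, so fewer, larger blocks amortize that cost better, as long as there are still at least as many blocks as cores.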

    As an aside, I think it might be worth considering creating your own concurrent queue and using dispatch_apply, which is designed for precisely this purpose: parallelizing loops whose iterations can run concurrently.


    Here is the code I used for benchmarking:

    - (UIImage *)convertImage:(UIImage *)image algorithm:(NSString *)algorithm
    {
        CGImageRef imageRef = image.CGImage;
        NSAssert(imageRef, @"Unable to get CGImageRef");
    
        CGDataProviderRef provider = CGImageGetDataProvider(imageRef);
        NSAssert(provider, @"Unable to get provider");
    
        NSData *data = CFBridgingRelease(CGDataProviderCopyData(provider));
        NSAssert(data, @"Unable to copy image data");
    
        NSInteger       bitsPerComponent = CGImageGetBitsPerComponent(imageRef);
        NSInteger       bitsPerPixel     = CGImageGetBitsPerPixel(imageRef);
        CGBitmapInfo    bitmapInfo       = CGImageGetBitmapInfo(imageRef);
        NSInteger       bytesPerRow      = CGImageGetBytesPerRow(imageRef);
        NSInteger       width            = CGImageGetWidth(imageRef);
        NSInteger       height           = CGImageGetHeight(imageRef);
        CGColorSpaceRef colorspace       = CGImageGetColorSpace(imageRef);
    
        void *outputBuffer = malloc(width * height * bitsPerPixel / 8);
        NSAssert(outputBuffer, @"Unable to allocate buffer");
    
        uint8_t *buffer = (uint8_t *)[data bytes];
    
        CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    
        if ([algorithm isEqualToString:kImageAlgorithmSimple]) {
            [self convertToBWSimpleFromBuffer:buffer toBuffer:outputBuffer width:width height:height];
        } else if ([algorithm isEqualToString:kImageAlgorithmDispatchApply]) {
            [self convertToBWConcurrentFromBuffer:buffer toBuffer:outputBuffer width:width height:height count:2];
        } else if ([algorithm isEqualToString:kImageAlgorithmDispatchApply4]) {
            [self convertToBWConcurrentFromBuffer:buffer toBuffer:outputBuffer width:width height:height count:4];
        } else if ([algorithm isEqualToString:kImageAlgorithmDispatchApply8]) {
            [self convertToBWConcurrentFromBuffer:buffer toBuffer:outputBuffer width:width height:height count:8];
        }
    
        NSLog(@"%@: %.2f", algorithm, CFAbsoluteTimeGetCurrent() - start);
    
        CGDataProviderRef outputProvider = CGDataProviderCreateWithData(NULL, outputBuffer, width * height * bitsPerPixel / 8, releaseData);
    
        CGImageRef outputImageRef = CGImageCreate(width,
                                                  height,
                                                  bitsPerComponent,
                                                  bitsPerPixel,
                                                  bytesPerRow,
                                                  colorspace,
                                                  bitmapInfo,
                                                  outputProvider,
                                                  NULL,
                                                  NO,
                                                  kCGRenderingIntentDefault);
    
        UIImage *outputImage = [UIImage imageWithCGImage:outputImageRef];
    
        CGImageRelease(outputImageRef);
        CGDataProviderRelease(outputProvider);
    
        return outputImage;
    }
    
    /** Convert the image to B&W as a single (non-parallel) task.
     *
     * This assumes the pixel buffer is in RGBA, 8 bits per pixel format.
     *
     * @param inputBuffer  The input pixel buffer.
     * @param outputBuffer The output pixel buffer.
     * @param width        The image width in pixels.
     * @param height       The image height in pixels.
     */
    - (void)convertToBWSimpleFromBuffer:(uint8_t *)inputBuffer toBuffer:(uint8_t *)outputBuffer width:(NSInteger)width height:(NSInteger)height
    {
        for (NSInteger row = 0; row < height; row++) {
    
            for (NSInteger col = 0; col < width; col++) {
    
                NSUInteger offset = (col + row * width) * 4;
                uint8_t *rgba = inputBuffer + offset;
    
                uint8_t red   = rgba[0];
                uint8_t green = rgba[1];
                uint8_t blue  = rgba[2];
                uint8_t alpha = rgba[3];
    
                uint8_t gray = 0.2126 * red + 0.7152 * green + 0.0722 * blue;
    
                outputBuffer[offset]     = gray;
                outputBuffer[offset + 1] = gray;
                outputBuffer[offset + 2] = gray;
                outputBuffer[offset + 3] = alpha;
            }
        }
    }
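    The per-pixel math above is the Rec. 709 luma formula (0.2126 R + 0.7152 G + 0.0722 B). A minimal standalone C version, with an `rgb_to_gray` helper of my own naming rather than anything from the code above, behaves like this:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Rec. 709 luma: the same weights the conversion methods above use.
       The cast truncates, matching the implicit conversion in the Objective-C code. */
    static uint8_t rgb_to_gray(uint8_t red, uint8_t green, uint8_t blue)
    {
        return (uint8_t)(0.2126 * red + 0.7152 * green + 0.0722 * blue);
    }

    int main(void)
    {
        /* A mid-tone pixel: 0.2126*100 + 0.7152*200 + 0.0722*50 = 167.91 -> 167. */
        assert(rgb_to_gray(100, 200, 50) == 167);
        /* Green dominates the weighting, as its 0.7152 coefficient suggests. */
        assert(rgb_to_gray(0, 255, 0) == (uint8_t)(0.7152 * 255));
        return 0;
    }
    ```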
    
    /** Convert the image to B&W, using GCD to split the conversion into several concurrent GCD tasks.
     *
     * This assumes the pixel buffer is in RGBA, 8 bits per pixel format.
     *
     * @param inputBuffer  The input pixel buffer.
     * @param outputBuffer The output pixel buffer.
     * @param width        The image width in pixels.
     * @param height       The image height in pixels.
     * @param count        The number of GCD tasks the conversion should be split into.
     */
    - (void)convertToBWConcurrentFromBuffer:(uint8_t *)inputBuffer toBuffer:(uint8_t *)outputBuffer width:(NSInteger)width height:(NSInteger)height count:(NSInteger)count
    {
        dispatch_queue_t queue = dispatch_queue_create("com.domain.app", DISPATCH_QUEUE_CONCURRENT);
        NSInteger stride = height / count;
        NSInteger iterations = (height + stride - 1) / stride; // round up so trailing rows aren't dropped when height isn't evenly divisible
    
        dispatch_apply(iterations, queue, ^(size_t idx) {
    
            size_t j = idx * stride;
            size_t j_stop = MIN(j + stride, height);
    
            for (NSInteger row = j; row < j_stop; row++) {
    
                for (NSInteger col = 0; col < width; col++) {
    
                    NSUInteger offset = (col + row * width) * 4;
                    uint8_t *rgba = inputBuffer + offset;
    
                    uint8_t red   = rgba[0];
                    uint8_t green = rgba[1];
                    uint8_t blue  = rgba[2];
                    uint8_t alpha = rgba[3];
    
                    uint8_t gray = 0.2126 * red + 0.7152 * green + 0.0722 * blue;
    
                    outputBuffer[offset]     = gray;
                    outputBuffer[offset + 1] = gray;
                    outputBuffer[offset + 2] = gray;
                    outputBuffer[offset + 3] = alpha;
                }
            }
        });
    
    }
    
    void releaseData(void *info, const void *data, size_t size)
    {
        free((void *)data);
    }
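    One subtlety in the row-splitting scheme: `height / count` truncates, so the number of dispatch_apply iterations has to round up, or trailing rows are silently skipped whenever the height isn't a multiple of the chunk size. A small plain-C check of that partitioning logic (the helper names are mine, not from the code above):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* dispatch_apply iterations needed to cover `height` rows in chunks of
       `stride` rows: ceiling division. */
    static size_t chunk_count(size_t height, size_t stride)
    {
        return (height + stride - 1) / stride;
    }

    /* Total rows visited by the chunked loops (mirrors the j / j_stop logic). */
    static size_t rows_covered(size_t height, size_t stride)
    {
        size_t covered = 0;
        for (size_t idx = 0; idx < chunk_count(height, stride); idx++) {
            size_t j = idx * stride;
            size_t j_stop = j + stride < height ? j + stride : height; /* MIN(j + stride, height) */
            covered += j_stop - j;
        }
        return covered;
    }

    int main(void)
    {
        /* Even split: 4912 rows in 2 chunks of 2456 each. */
        assert(chunk_count(4912, 4912 / 2) == 2);
        assert(rows_covered(4912, 4912 / 2) == 4912);
        /* Uneven split: 11 rows in chunks of 2 needs 6 iterations, not 5. */
        assert(chunk_count(11, 2) == 6);
        assert(rows_covered(11, 2) == 11);
        return 0;
    }
    ```

    With `height / stride` (floor) as the iteration count, the uneven case above would visit only 10 of the 11 rows.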
    

    On an iPhone 5, this took 2.24 seconds to convert a 7360 × 4912 image with the simple, serial method, and took 1.18 seconds when I used dispatch_apply with two loops. When I tried 4 or 8 dispatch_apply loops, I saw no further performance gain.
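    Those timings are consistent with Amdahl's law on a two-core device (the iPhone 5's A6 is dual-core, which would also explain why 4 or 8 chunks bought nothing: there are no extra cores for them). Working backward from 2.24 s serial versus 1.18 s with two chunks suggests roughly 95% of the work parallelized; a quick sketch of that back-of-the-envelope calculation, using the measured numbers above:

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double t_serial   = 2.24; /* measured serial time (s) */
        const double t_parallel = 1.18; /* measured time with 2 chunks (s) */
        const int    cores      = 2;    /* iPhone 5's A6 is dual-core */

        /* Amdahl: t_parallel / t_serial = (1 - p) + p / cores.
           Solve for p, the fraction of the work that ran in parallel. */
        const double ratio = t_parallel / t_serial;
        const double p = (1.0 - ratio) * cores / (cores - 1);

        printf("parallel fraction ~ %.3f\n", p);
        assert(p > 0.94 && p < 0.95);

        /* With only 2 cores, adding more chunks cannot beat the 2-core bound. */
        const double best_possible = (1.0 - p) + p / cores;
        assert(fabs(best_possible - ratio) < 1e-9);
        return 0;
    }
    ```

    In other words, once two chunks saturate both cores, splitting into 4 or 8 only adds scheduling overhead, which matches the observation of no further gain.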
