How to Speed Up Metal Code for iOS/Mac OS

轮回少年 2020-12-31 20:18

I'm trying to implement code in Metal that performs a 1D convolution between two vectors with lengths. I've implemented the following, which works correctly



        
2 Answers
  •  悲哀的现实
    2020-12-31 20:48

    The following code shows how to render encoded commands in parallel on the GPU using the Objective-C Metal API. (The threading code above only divides rendering of the output into grid sections for parallel processing; the calculations themselves are still not performed in parallel.) This is the technique you refer to in your question, even though it is not exactly what you want. I've posted this answer for anyone who stumbles upon this question expecting an answer about parallel rendering, which the question does not actually cover:

        - (void)drawInMTKView:(MTKView *)view
        {
            dispatch_async(((AppDelegate *)UIApplication.sharedApplication.delegate).cameraViewQueue, ^{
                id<CAMetalDrawable> drawable = [view currentDrawable]; // or: [(CAMetalLayer *)view.layer nextDrawable]
                MTLRenderPassDescriptor *renderPassDesc = [view currentRenderPassDescriptor];
                if (renderPassDesc != nil)
                {
                    // Configure the render pass: clear to opaque black and draw into the drawable's texture
                    renderPassDesc.colorAttachments[0].loadAction = MTLLoadActionClear;
                    renderPassDesc.colorAttachments[0].clearColor = MTLClearColorMake(0.0, 0.0, 0.0, 1.0);
                    renderPassDesc.renderTargetWidth = self.texture.width;
                    renderPassDesc.renderTargetHeight = self.texture.height;
                    renderPassDesc.colorAttachments[0].texture = drawable.texture;

                    // Limit the number of frames in flight
                    dispatch_semaphore_wait(self._inflight_semaphore, DISPATCH_TIME_FOREVER);
                    id<MTLCommandBuffer> commandBuffer = [self.metalContext.commandQueue commandBuffer];
                    [commandBuffer enqueue];

                    // START PARALLEL RENDERING OPERATIONS HERE
                    id<MTLParallelRenderCommandEncoder> parallelRCE = [commandBuffer parallelRenderCommandEncoderWithDescriptor:renderPassDesc];

                    // FIRST PARALLEL RENDERING OPERATION
                    id<MTLRenderCommandEncoder> renderEncoder = [parallelRCE renderCommandEncoder];

                    [renderEncoder setRenderPipelineState:self.metalContext.renderPipelineState];

                    [renderEncoder setVertexBuffer:self.metalContext.vertexBuffer offset:0 atIndex:0];
                    [renderEncoder setVertexBuffer:self.metalContext.uniformBuffer offset:0 atIndex:1];

                    [renderEncoder setFragmentBuffer:self.metalContext.uniformBuffer offset:0 atIndex:0];

                    [renderEncoder setFragmentTexture:self.texture
                                              atIndex:0];

                    // Draw a full-screen quad as a four-vertex triangle strip
                    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangleStrip
                                      vertexStart:0
                                      vertexCount:4
                                    instanceCount:1];

                    [renderEncoder endEncoding];

                    // ADD SECOND, THIRD, ETC. PARALLEL RENDERING OPERATION HERE
                    // ...

                    // SUBMIT ALL RENDERING OPERATIONS IN PARALLEL HERE
                    [parallelRCE endEncoding];

                    // Release the in-flight slot once the GPU has finished this command buffer
                    __block dispatch_semaphore_t block_sema = self._inflight_semaphore;
                    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
                        dispatch_semaphore_signal(block_sema);
                    }];

                    if (drawable)
                        [commandBuffer presentDrawable:drawable];
                    [commandBuffer commit];
                    [commandBuffer waitUntilScheduled];
                }
            });
        }
    

    In the above example, you would duplicate the renderEncoder-related code for each calculation you want to perform in parallel. I do not see how this would benefit your code example, as each operation appears to depend on another. The best you could hope for, then, is probably the code provided to you by warrenm, even though that doesn't really qualify as parallel rendering.
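
    As a rough illustration of what duplicating the renderEncoder-related code could look like, here is a minimal sketch that creates several subordinate encoders from one `MTLParallelRenderCommandEncoder` and fills each on its own thread. The function name `EncodeQuadsInParallel` and its parameters are hypothetical and not part of the code above; the caller is assumed to supply a valid command buffer, render pass descriptor, pipeline state, and vertex buffer. Note that, as far as I understand the API, this spreads the CPU-side encoding work across threads, while the GPU still executes the subordinate encoders' commands in the order the encoders were created.

    ```objective-c
    #import <Metal/Metal.h>
    #import <dispatch/dispatch.h>

    // Hypothetical helper: encodes the same quad draw on `encoderCount`
    // subordinate encoders, one per dispatch_apply iteration.
    static void EncodeQuadsInParallel(id<MTLCommandBuffer> commandBuffer,
                                      MTLRenderPassDescriptor *renderPassDesc,
                                      id<MTLRenderPipelineState> pipelineState,
                                      id<MTLBuffer> vertexBuffer,
                                      NSUInteger encoderCount)
    {
        id<MTLParallelRenderCommandEncoder> parallelRCE =
            [commandBuffer parallelRenderCommandEncoderWithDescriptor:renderPassDesc];

        // Create every subordinate encoder up front on this thread, so that
        // their creation order (and therefore GPU execution order) is fixed.
        NSMutableArray<id<MTLRenderCommandEncoder>> *encoders =
            [NSMutableArray arrayWithCapacity:encoderCount];
        for (NSUInteger i = 0; i < encoderCount; i++) {
            [encoders addObject:[parallelRCE renderCommandEncoder]];
        }

        // Fill each encoder on a separate thread. dispatch_apply returns only
        // after all iterations finish, so every subordinate encoder has ended
        // before the parent encoder is ended below.
        dispatch_apply(encoderCount, dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^(size_t i) {
            id<MTLRenderCommandEncoder> encoder = encoders[i];
            [encoder setRenderPipelineState:pipelineState];
            [encoder setVertexBuffer:vertexBuffer offset:0 atIndex:0];
            [encoder drawPrimitives:MTLPrimitiveTypeTriangleStrip
                        vertexStart:0
                        vertexCount:4
                      instanceCount:1];
            [encoder endEncoding];
        });

        [parallelRCE endEncoding];
    }
    ```

    In a real renderer each iteration would presumably bind different buffers or textures; the identical draw call here only keeps the sketch short.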
