Memory write performance - GPU CPU Shared Memory

Submitted by 北战南征 on 2019-11-28 22:44:28

Reduce the size of the type

Originally, I was pre-converting signed 16-bit integers to 32-bit floats, since that is ultimately how they will be used. This is a case where performance forces you to store the values as 16-bit integers instead, which cuts the data size in half.
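To make the size argument concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The function names are hypothetical and not from the original answer, and it assumes the usual 4-byte float. Where the widening to float actually happens (CPU side or in a shader) is not stated above, so the loop below only illustrates deferring the conversion until the values are consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store `count` samples as signed 16-bit integers. */
size_t bytes_as_int16(size_t count) {
    return count * sizeof(int16_t);
}

/* Bytes needed to store the same samples pre-converted to float
   (assumed to be 32 bits wide, as on all common platforms). */
size_t bytes_as_float(size_t count) {
    return count * sizeof(float);
}

/* Hypothetical helper: widen 16-bit samples to float only at the point
   of use, so the stored and transferred buffer stays half the size. */
void widen_int16_to_float(const int16_t *src, float *dst, size_t count) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = (float)src[i];  /* every int16_t value is exactly representable */
    }
}
```

For 1024 samples this means 2048 bytes written instead of 4096, which is the halving described above.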

Objective-C over Swift

For code that moves data, you might choose Objective-C over Swift (a recommendation from Warren Moore). Swift's performance in these special situations still isn't up to scratch. You can also try calling out to memcpy or similar functions. I've seen a couple of examples that copy through buffer pointers in a for-loop, and in my experiments this performed slowly.
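The two copying styles mentioned above can be sketched side by side in plain C, which can be dropped into an Objective-C file. The function names are hypothetical. In a real Metal setup the destination would presumably be the pointer returned by the shared buffer's `contents`, but that wiring is an assumption and is left out here. Note also that an optimizing C compiler may turn the loop into a bulk copy on its own; the slowdown reported above was observed with the Swift for-loop version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element-by-element copy: the for-loop pattern the answer found slow
   when written with Swift buffer pointers. */
void copy_samples_loop(int16_t *dst, const int16_t *src, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}

/* Bulk copy: one memcpy call for the whole block.
   The source and destination ranges must not overlap. */
void copy_samples_memcpy(int16_t *dst, const int16_t *src, size_t count) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, count * sizeof(int16_t));
}
```

Both functions produce the same bytes in the destination; only the cost of getting them there differs, so it is worth timing each on the actual device.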

Difficulty of testing

I really wanted to run some of the experiments on the various copying methods in a playground on my machine, and unfortunately this was useless. The iOS device versions of the same experiments performed completely differently. One might think that the relative performance would be similar, but I found this to be an invalid assumption as well. It would be really convenient if a playground could use the iOS device as its interpreter.

You might get a large speedup by encoding your data with Huffman codes and decoding it on the GPU; see MetalHuffman. Whether this helps depends on your data, though.
