How to check if a vDSP function runs scalar or SIMD on NEON

夙愿已清 submitted on 2019-12-14 03:55:53

Question


I'm currently using some functions from the vDSP framework, especially vDSP_conv, and I'm wondering whether there is any way to check if a function falls back to scalar mode or is processed with SIMD on the NEON unit.
The documentation of the function mentions some criteria for the PowerPC architecture that have to be fulfilled, otherwise scalar mode is invoked. I know neither whether these criteria apply to the iPhone as well, nor how to check whether my call invokes scalar mode or runs properly on NEON.

Is there a way to check this?
Thanks!


Answer 1:


The vDSP_conv implementation does contain NEON code. It is used in some cases and not in others.

We (the Vector and Numerics Group, which produces vDSP) are not publishing criteria about which functions use NEON in part because there are a number of complicating factors: specifics about each call (strides, lengths, and alignments of multiple parameters), processor model that the code is executed on, and software version.

If you have a question about a specific case, I may be able to investigate it.

Are you asking out of curiosity, or is the performance not what you expected? Generally, the underlying concern is how fast an implementation performs and whether it could be better. SIMD may be a proxy for some of that, but it is not the actual goal.

Updated to address a comment below:

Surveying the source code for recent iOS, it looks like all you need to get SIMD code when doing correlation is to execute on a processor with NEON and set all the strides to 1. However, the code is specialized to use alignment hints if addresses are aligned, so you may get better performance on certain processor models if you arrange for the signal, filter, and output addresses to be multiples of 16 bytes. If you can, use multiples of eight for the number of filter elements, but multiples of four are good too.
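As an illustration of the advice above, here is a minimal C sketch for preparing buffers: it allocates 16-byte-aligned float arrays, checks alignment, and rounds a filter length up to a multiple of eight. The helper names (`is_aligned16`, `round_up_to_8`, `alloc_aligned_floats`) are hypothetical and not part of vDSP, and whether this changes performance on a given processor model is something you would have to measure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if ptr is a multiple of 16 bytes. */
int is_aligned16(const void *ptr) {
    return ((uintptr_t)ptr % 16u) == 0;
}

/* Hypothetical helper: round an element count up to the next multiple of 8. */
size_t round_up_to_8(size_t count) {
    return (count + 7u) & ~(size_t)7u;
}

/* Hypothetical helper: allocate `count` floats on a 16-byte boundary,
 * zero-initialized. Returns NULL on failure; release with free().
 * C11 aligned_alloc wants the size to be a multiple of the alignment,
 * so the byte count is rounded up to a multiple of 16. */
float *alloc_aligned_floats(size_t count) {
    size_t bytes = count * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u;
    if (bytes == 0) {
        bytes = 16;
    }
    float *p = aligned_alloc(16, bytes);
    if (p != NULL) {
        memset(p, 0, bytes);
    }
    return p;
}
```

If you pad a filter with zeros to reach a multiple of eight, keep in mind that the signal buffer must then be correspondingly longer, and the extra signal elements must be finite values so that multiplying them by the zero taps leaves the results unchanged.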

Unfortunately, the code is not O(n·log(n)); it uses direct arithmetic and not an FFT implementation, so it is O(n²). Generally, it is designed for shorter lengths, where direct arithmetic is suitable. If an FFT algorithm for correlation would help you, please file a feature request at https://bugreport.apple.com.
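To make "direct arithmetic" concrete, here is a minimal scalar C sketch of correlation with all strides equal to 1. It is not the vDSP source; the function name `correlate_direct` is hypothetical, and the formula assumes the usual correlation definition out[n] = Σ signal[n + p] · filter[p]. The two nested loops show where the quadratic cost comes from: the work grows as out_len × filter_len.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, not the vDSP implementation.
 * Computes, with unit strides:
 *   out[n] = sum over p in [0, filter_len) of signal[n + p] * filter[p]
 * for n in [0, out_len).
 * `signal` must hold at least out_len + filter_len - 1 elements. */
void correlate_direct(const float *signal, const float *filter, float *out,
                      size_t out_len, size_t filter_len) {
    assert(signal != NULL && filter != NULL && out != NULL);
    for (size_t n = 0; n < out_len; ++n) {
        float acc = 0.0f;
        for (size_t p = 0; p < filter_len; ++p) {
            acc += signal[n + p] * filter[p];
        }
        out[n] = acc;
    }
}
```

A scalar reference like this can also serve as a baseline: comparing its output and run time against the library call on your own data tells you more about real performance than knowing whether a SIMD path was taken.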

Regardless of the algorithm used, shorter lengths are not better if you want the same information independent of length. That is because, if you process shorter lengths, you would have to process more of them, in various combinations, to get the same information. I expect the design would be to figure out what length you need so that the correlation produces the information you require, then use that length without subdividing it.



Source: https://stackoverflow.com/questions/13809552/how-to-check-if-vdsp-function-runs-scalar-or-simd-on-neon
