C++ (and maths) : fast approximation of a trigonometric function

后端 未结 7 1288
野的像风
野的像风 2020-12-30 04:52

I know this is a recurring question, but I haven\'t really found a useful answer yet. I\'m basically looking for a fast approximation of the function acos in C+

7条回答
  •  借酒劲吻你
    2020-12-30 05:25

    Significant gains can be made by aligning memory and streaming in the data to your kernel. Most often this dwarfs the gains that can be made by recreating the math functions. Think of how you can improve memory access to/from your kernel operator.

    Memory access can be improved by using buffering techniques. This depends on your hardware platform. If you are running this on a DSP, you could DMA your data onto an L2 cache and schedule the instructions so that multiplier units are fully occupied.

    If you are on general purpose CPU, most you can do is to use aligned data, feed the cache lines by prefetching. If you have nested loops, then the inner most loop should go back and forth (i.e. iterate forward and then iterate backward) so that cache lines are utilised, etc.

    You could also think of ways to parallelize the computation using multiple cores. If you can use a GPU this could significantly improve performance (albeit with a lesser precision).

提交回复
热议问题