Apple Accelerate Framework scale and normalize a vector

Submitted by 浪子不回头ぞ on 2020-05-12 02:48:58

Question


What functions can I use in Accelerate.framework to scale a vector by a scalar, and to normalize a vector? I found one in the documentation that I think might work for scaling, but I am confused about its operation.

vDSP_vsma
Vector scalar multiply and vector add; single precision.

void vDSP_vsma (
   const float *__vDSP_A,
   vDSP_Stride __vDSP_I,
   const float *__vDSP_B,
   const float *__vDSP_C,
   vDSP_Stride __vDSP_K,
   float *__vDSP_D,
   vDSP_Stride __vDSP_L,
   vDSP_Length __vDSP_N
);

Answer 1:


The easiest way to normalize a vector in-place is something like

int n = 3;
float v[3] = {1, 2, 3};
cblas_sscal(n, 1.0 / cblas_snrm2(n, v, 1), v, 1);

You'll need to

#include <cblas.h>

or

#include <vblas.h>

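(or both). On Apple platforms the umbrella header <Accelerate/Accelerate.h> pulls in both. Note that the documentation files several of these functions under the "matrix" section even though they operate on vectors.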

If you want to use the vDSP functions, see the Vector-Scalar Division section. There are several things you can do:

  • vDSP_dotpr(), sqrt(), and vDSP_vsdiv()
  • vDSP_dotpr(), vrsqrte_f32(), and vDSP_vsmul() (vrsqrte_f32() is a NEON GCC built-in, though, so you need to check you're compiling for armv7).
  • vDSP_rmsqv(), multiply by sqrt(n), and vDSP_vsdiv()

There isn't a vector-normalization function because the "vector" in vDSP means "lots of things at once" (up to around 4096/8192) and not necessarily the "vector" from linear algebra. It's pretty meaningless to normalize a 1024-element vector, and a quick function for normalizing a 3-element vector isn't something that will make your app significantly faster.

The intended usage of vDSP is more like normalizing 1024 2- or 3-element vectors. I can spot a handful of ways to do this:

  • Use vDSP_vdist() to get a vector of lengths, followed by vDSP_vdiv(). You have to use vDSP_vdist() multiple times for vectors of length greater than 2, though.
  • Use vDSP_vsq() to square all the inputs, vDSP_vadd() multiple times to add all of them, the equivalent of vDSP_vsqrt() or vDSP_vrsqrt(), and vDSP_vmul() or vDSP_vdiv() as appropriate. It shouldn't be too hard to write the equivalent of vDSP_vsqrt() or vDSP_vrsqrt().
  • Various ways which pretend your input is a complex vector. Not likely to be faster.

Of course, if you don't have 1024 vectors to normalize, don't overcomplicate things.

Notes:

  1. I avoid the terms "2-vector" and "3-vector" so as not to invite confusion with the "four-vector" from relativity.
  2. A good choice of n is one that nearly fills your L1 data cache. It's not difficult; they've been relatively fixed at 32K for around a decade or more (they may be shared between virtual cores in a hyperthreaded CPU and some older/cheaper processors might have 16K), so the most you should do is around 8192 for in-place operation on floats. You might want to subtract a little for stack space, and if you're doing several sequential operations you probably want to keep it all in cache; 1024 or 2048 seem pretty sensible and any more will probably hit diminishing returns. If you care, measure performance...
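The arithmetic behind note 2 is 32768 bytes / 4 bytes per float = 8192 floats. A minimal sketch of the chunking idea follows, assuming a hypothetical per-chunk routine `process_chunk` that stands in for whatever sequence of vDSP calls is applied to each block. The 32K cache size and the 1024-element chunk are the note's figures, not measured values.

```c
#include <assert.h>
#include <stddef.h>

enum {
    L1_BYTES = 32 * 1024,                              /* assumed L1 data cache size */
    MAX_FLOATS_IN_L1 = L1_BYTES / (int)sizeof(float),  /* 8192 with 4-byte floats */
    CHUNK = 1024                                       /* chunk length suggested in note 2 */
};

/* Hypothetical per-chunk work: two sequential passes over the same n floats,
   standing in for e.g. vDSP_vsmul() followed by vDSP_vsadd(). */
void process_chunk(float *v, size_t n, float k, float c) {
    for (size_t i = 0; i < n; ++i) v[i] *= k;  /* pass 1: scale  */
    for (size_t i = 0; i < n; ++i) v[i] += c;  /* pass 2: offset */
}

/* Walk a long buffer in chunk-sized pieces; the last piece may be shorter.
   Returns the number of chunks processed. */
size_t process_in_chunks(float *v, size_t total, float k, float c) {
    assert(v != NULL || total == 0);
    size_t chunks = 0;
    for (size_t off = 0; off < total; off += CHUNK) {
        size_t n = (total - off < CHUNK) ? (total - off) : CHUNK;
        process_chunk(v + off, n, k, c);
        ++chunks;
    }
    return chunks;
}
```

Chunking only pays off when several passes run over each block before moving on, since the second pass then finds its data still in cache.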


Source: https://stackoverflow.com/questions/4251716/apple-accelerate-framework-scale-and-normalize-a-vector
