Faster approximate reciprocal square root of an array
Question

How can I compute the approximate reciprocal square root of an array faster on a CPU with popcnt and SSE4.2?

The input is non-negative integers (ranging from 0 to about 200,000) stored in an array of floats. The output is an array of floats. Both arrays are correctly aligned for SSE.

The code below uses only one xmm register, runs on Linux, and can be compiled with

    gcc -O3 code.cpp -lrt -msse4.2

Thank you.

    #include <iostream>
    #include <emmintrin.h>
    #include <time.h>
    using namespace std;
    void print
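
For reference, the kind of loop described above can be sketched as follows. This is a minimal sketch, not the original code from the question: the function name `rsqrt_array` and its signature are hypothetical, and it assumes both pointers are 16-byte aligned, as stated above. It uses the SSE intrinsic `_mm_rsqrt_ps`, which gives roughly 12 bits of precision, and handles two vectors per iteration instead of one. Whether that is actually faster would have to be measured on the target CPU.

```cpp
#include <cmath>        // std::sqrt for the scalar tail
#include <cstddef>      // std::size_t
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_rsqrt_ps, _mm_store_ps

// Hypothetical helper: the name and signature are not from the question.
// Assumes `in` and `out` are 16-byte aligned, as the question states.
// Note: an input of 0 produces +infinity in the output.
void rsqrt_array(const float* in, float* out, std::size_t n) {
    std::size_t i = 0;

    // Two independent vectors (8 floats) per iteration, so the work on one
    // vector does not have to wait for the other.
    for (; i + 8 <= n; i += 8) {
        __m128 a = _mm_load_ps(in + i);
        __m128 b = _mm_load_ps(in + i + 4);
        _mm_store_ps(out + i, _mm_rsqrt_ps(a));
        _mm_store_ps(out + i + 4, _mm_rsqrt_ps(b));
    }

    // At most one remaining full vector of 4 floats.
    for (; i + 4 <= n; i += 4) {
        _mm_store_ps(out + i, _mm_rsqrt_ps(_mm_load_ps(in + i)));
    }

    // Scalar tail (fewer than 4 elements): computed exactly, not approximately.
    for (; i < n; ++i) {
        out[i] = 1.0f / std::sqrt(in[i]);
    }
}
```

If more precision is needed, one Newton-Raphson step on the result of `_mm_rsqrt_ps` is the usual refinement, at the cost of a few extra multiplies per vector.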