将32位循环计数器替换为64位会在Intel CPU上使用_mm_popcnt_u64引起疯狂的性能偏差
问题: I was looking for the fastest way to popcount large arrays of data. 我一直在寻找最快的方法来 popcount 大量数据的数量。 I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. 我遇到了一个 非常奇怪的 效果:将循环变量从 unsigned 更改为 uint64_t 使PC上的性能下降了50%。 The Benchmark 基准测试 #include <iostream> #include <chrono> #include <x86intrin.h> int main(int argc, char* argv[]) { using namespace std; if (argc != 2) { cerr << "usage: array_size in MB" << endl; return -1; } uint64_t size = atol(argv[1])<<20; uint64_t* buffer = new uint64_t[size/8]; char* charbuffer =