Why does GCC __builtin_prefetch not improve performance?
I'm writing a program to analyze a social network graph, so it has to make a lot of random memory accesses. It seems to me that prefetching should help. Here is a small piece of the code, which reads values from the neighbors of a vertex:

    for (size_t i = 0; i < v.get_num_edges(); i++) {
        unsigned int id = v.neighbors[i];
        res += neigh_vals[id];
    }

I transformed the code above into the version below, which prefetches the values of a vertex's neighbors:

    int *neigh_vals = new int[num_vertices];
    for (size