Math functions take more cycles after running any Intel AVX function [duplicate]

Submitted by 一个人想着一个人 on 2019-12-07 03:19:38

Question


I've noticed that math functions (such as ceil, round, ...) take more CPU cycles after any Intel AVX function has run.

See the following example:

#include <stdio.h>
#include <math.h>
#include <immintrin.h>


static unsigned long int get_rdtsc(void)
{
        unsigned int a, d;
        asm volatile("rdtsc" : "=a" (a), "=d" (d));
        return (((unsigned long int)a) | (((unsigned long int)d) << 32));
}

#define NUM_ITERATIONS 10000000

void run_round(void)
{
    unsigned long int t1, t2, res = 0, i;
    double d = 3.2;

    /* Time NUM_ITERATIONS calls to round() with the TSC. */
    t1 = get_rdtsc();
    for (i = 0 ; i < NUM_ITERATIONS ; ++i) {
        res = round(d*i);
    }
    t2 = get_rdtsc();

    printf("round res %lu total cycles %lu CPI %lu\n", res, t2 - t1, (t2 - t1) / NUM_ITERATIONS);
}

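int main(void)
{
    __m256d a;

    /* Baseline: no AVX instruction has executed yet. */
    run_round();

    /* Execute a single 256-bit AVX operation. */
    a = _mm256_set1_pd(1);
    (void)a;

    /* Same loop again, now after the AVX operation. */
    run_round();

    return 0;
}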

Compile with: gcc -Wall -mavx foo.c -lm

The output is:

round res 31999997 total cycles 224725952 CPI 22

round res 31999997 total cycles 1900864520 CPI 190

Please advise.


Answer 1:


Disassemble the generated code and compare what executes in round() before and after the AVX call.

My guess is that some additional register saving/restoring is going on, or something similar.
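One thing that may be worth trying while looking at the disassembly: Intel documents a penalty for mixing 256-bit AVX instructions with legacy (non-VEX) SSE instructions while the upper halves of the YMM registers are in use, and recommends issuing VZEROUPPER before running SSE code. Whether that is what happens here is an assumption, not something confirmed in this thread; it would apply only if the libm round() on the asker's system is built with legacy SSE instructions. A minimal sketch of the idea, assuming GCC or Clang on x86-64 (touch_avx_then_zeroupper and avx_broadcast_sum are hypothetical helper names, not part of the original program):

```c
#include <immintrin.h>

/*
 * Hypothetical helper (not from the original post).
 * Performs one 256-bit AVX operation, similar to the
 * `a = _mm256_set1_pd(1);` line in main(), then clears the upper
 * halves of all YMM registers before returning.
 * The target attribute lets this compile even without -mavx.
 */
__attribute__((target("avx")))
static double avx_broadcast_sum(double x)
{
    double out[4];
    __m256d v = _mm256_set1_pd(x);  /* 256-bit op: upper YMM halves in use */

    _mm256_storeu_pd(out, v);       /* keep the result so the op is not dead */
    _mm256_zeroupper();             /* emits VZEROUPPER */

    return out[0] + out[1] + out[2] + out[3];
}

/*
 * Hypothetical wrapper: only runs the AVX path when the CPU reports
 * AVX support; otherwise there is no upper state to clear.
 */
static double touch_avx_then_zeroupper(double x)
{
    if (__builtin_cpu_supports("avx"))
        return avx_broadcast_sum(x);
    return 4.0 * x;
}
```

In the program from the question, this would mean calling touch_avx_then_zeroupper(1) in place of the _mm256_set1_pd line and checking whether the second run_round() comes back close to the first one. If the cycle count does not change, the cause is probably something else and the disassembly should show it.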



Source: https://stackoverflow.com/questions/20545539/math-functions-takes-more-cycles-after-running-any-intel-avx-function
