How slow (how many cycles) is calculating a square root?

后端 未结 3 1649
礼貌的吻别
礼貌的吻别 2020-11-30 08:40

How slow (how many cycles) is calculating a square root? This came up in a molecular dynamics course where efficiency is important and taking unnecessary square roots had a

3条回答
  •  天涯浪人
    2020-11-30 09:06

    From Agner Fog's Instruction Tables:

    On Core2 65nm, FSQRT takes 9 to 69 cc's (with almost equal reciprocal throughput), depending on the value and precision bits. For comparison, FDIV takes 9 to 38 cc's (with almost equal reciprocal throughput), FMUL takes 5 (recipthroughput = 2) and FADD takes 3 (recipthroughput = 1). SSE performance is about equal, but looks faster because it can't do 80bit math. SSE has a super fast approximate reciprocal and approximate reciprocal sqrt though.

    On Core2 45nm, division and square root got faster; FSQRT takes 6 to 20 cc's, FDIV takes 6 to 21 cc's, FADD and FMUL haven't changed. Once again SSE performance is about the same.

    You can get the documents with this information from his website.

提交回复
热议问题