flops

How to compare performance of two pieces of codes

纵然是瞬间 submitted on 2019-11-29 14:04:06
Question: I have a friendly competition with a couple of guys in the field of programming, and recently we have become very interested in writing efficient code. Our challenge was to optimize the code (in terms of CPU time and complexity) at any cost (readability, reusability, etc.). The problem is that we now need to compare our code and see which approach is better than the others, but we don't know any tools for this purpose. My question is, are there some (any!) tools that take a piece of
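As a rough illustration of the kind of comparison asked about above, here is a minimal sketch that times two candidate implementations with Python's standard `timeit` module. The two functions and the `best_time` helper are hypothetical stand-ins, not anything from the original question, and wall-clock timing is only one of several possible measures.

```python
import timeit


def sum_squares_loop(n):
    """Hypothetical candidate A: explicit loop over 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_formula(n):
    """Hypothetical candidate B: closed form for 0^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


def best_time(func, arg, repeat=5, number=100):
    """Return the best observed per-call time in seconds.

    Taking the minimum over several repeats reduces noise from other
    processes; it is a common convention, not the only valid one.
    """
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    n = 10_000
    # Check that both candidates agree before comparing their speed.
    assert sum_squares_loop(n) == sum_squares_formula(n)
    for candidate in (sum_squares_loop, sum_squares_formula):
        print(candidate.__name__, best_time(candidate, n))
```

Checking that both candidates return the same result before timing them avoids comparing a fast but wrong version against a slow but correct one.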

What is FLOP/s and is it a good measure of performance?

廉价感情. submitted on 2019-11-27 17:15:15
I've been asked to measure the performance of a Fortran program that solves differential equations on a multi-CPU system. My employer insists that I measure FLOP/s (floating-point operations per second) and compare the results with benchmarks (LINPACK), but I am not convinced that it's the way to go, simply because no one can explain to me what a FLOP is. I did some research on what exactly a FLOP is and I got some pretty contradictory answers. One of the most popular answers I got was '1 FLOP = An addition and a multiplication operation'. Is that true? If so, again, physically, what exactly does
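To make the quantity in the question above concrete, the sketch below estimates FLOP/s for a simple dot product by dividing an operation count by the elapsed time. It assumes the common convention of counting each floating-point add and each multiply as one operation (so 2 per element here); other conventions exist, which is part of the ambiguity the question describes. The function names are hypothetical, and pure-Python timings say little about what compiled Fortran would achieve.

```python
import time


def dot(x, y):
    """Dot product with one multiply and one add per element."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def measured_flops(n, repeats=10):
    """Estimate FLOP/s for `repeats` dot products of length n.

    Assumed counting convention: 2 floating-point operations per
    element (one multiply, one add).
    """
    x = [1.0] * n
    y = [2.0] * n
    start = time.perf_counter()
    for _ in range(repeats):
        dot(x, y)
    elapsed = time.perf_counter() - start
    flop_count = 2 * n * repeats
    return flop_count / elapsed


if __name__ == "__main__":
    print(f"{measured_flops(100_000):.3e} FLOP/s (interpreter overhead included)")
```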

What's the relative speed of floating point add vs. floating point multiply

半世苍凉 submitted on 2019-11-27 08:01:09
A decade or two ago, it was worthwhile to write numerical code to avoid using multiplies and divides and use addition and subtraction instead. A good example is using forward differences to evaluate a polynomial curve instead of computing the polynomial directly. Is this still the case, or have modern computer architectures advanced to the point where *,/ are no longer many times slower than +,- ? To be specific, I'm interested in compiled C/C++ code running on modern typical x86 chips with extensive on-board floating point hardware, not a small micro trying to do FP in software. I realize
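The forward-difference technique mentioned above can be sketched as follows: after a one-time setup that evaluates the polynomial directly at a few points, every further sample on a uniform grid costs only additions. This is a minimal sketch in Python rather than C/C++, with hypothetical function names; with floating-point inputs the repeated additions accumulate rounding error, so the example uses integers.

```python
def horner(coeffs, x):
    """Evaluate a polynomial directly; coeffs are highest degree first."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def forward_difference_table(coeffs, x0, h):
    """Return [p(x0), Δp(x0), Δ²p(x0), ...] for step size h."""
    degree = len(coeffs) - 1
    values = [horner(coeffs, x0 + i * h) for i in range(degree + 1)]
    diffs = []
    while values:
        diffs.append(values[0])
        values = [b - a for a, b in zip(values, values[1:])]
    return diffs


def evaluate_by_forward_differences(coeffs, x0, h, count):
    """Return p(x0), p(x0 + h), ... (count values) using only additions
    after the initial setup."""
    diffs = forward_difference_table(coeffs, x0, h)
    out = []
    for _ in range(count):
        out.append(diffs[0])
        # Advance one step: each difference absorbs the next-higher one.
        for k in range(len(diffs) - 1):
            diffs[k] += diffs[k + 1]
    return out


if __name__ == "__main__":
    cubic = [1, 0, -2, 1]  # x^3 - 2x + 1
    print(evaluate_by_forward_differences(cubic, 0, 1, 5))
```

For a polynomial of degree d, each new sample costs d additions, compared with d multiplies and d additions for direct Horner evaluation.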

What's the relative speed of floating point add vs. floating point multiply

烂漫一生 submitted on 2019-11-26 22:16:57
Question: A decade or two ago, it was worthwhile to write numerical code to avoid using multiplies and divides and use addition and subtraction instead. A good example is using forward differences to evaluate a polynomial curve instead of computing the polynomial directly. Is this still the case, or have modern computer architectures advanced to the point where *,/ are no longer many times slower than +,- ? To be specific, I'm interested in compiled C/C++ code running on modern typical x86 chips with

FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2

a 夏天 submitted on 2019-11-26 12:03:23
I'm confused about how many flops per cycle per core can be done with Sandy-Bridge and Haswell. As I understand it, it should be 4 flops per cycle per core for SSE and 8 flops per cycle per core for AVX/AVX2. This seems to be verified here, How do I achieve the theoretical maximum of 4 FLOPs per cycle?, and here, Sandy-Bridge CPU specification. However, the link below seems to indicate that Sandy-Bridge can do 16 flops per cycle per core and Haswell 32 flops per cycle per core: http://www.extremetech.com/computing/136219-intels-haswell-is-an-unprecedented-threat-to-nvidia-amd . Can someone
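The figures quoted above can be reproduced with simple arithmetic: peak flops per cycle is the number of vector lanes times the number of floating-point units times the operations each lane performs per instruction. The sketch below assumes the commonly cited configurations (Sandy Bridge issuing one vector add and one vector multiply per cycle, Haswell issuing two fused multiply-adds per cycle, with an FMA counted as two flops). The helper name and its parameters are hypothetical, and these are theoretical peaks, not measured results.

```python
def peak_flops_per_cycle(vector_bits, element_bits, units, flops_per_lane):
    """Theoretical peak flops per cycle per core.

    vector_bits:    SIMD register width (128 for SSE, 256 for AVX)
    element_bits:   32 for single precision, 64 for double precision
    units:          vector floating-point instructions issued per cycle (assumed)
    flops_per_lane: 1 for a plain add or multiply, 2 for a fused multiply-add
    """
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_lane


if __name__ == "__main__":
    # Sandy Bridge, assumed: one add unit + one multiply unit, no FMA.
    print("SNB SSE double:", peak_flops_per_cycle(128, 64, 2, 1))
    print("SNB AVX double:", peak_flops_per_cycle(256, 64, 2, 1))
    print("SNB AVX single:", peak_flops_per_cycle(256, 32, 2, 1))
    # Haswell, assumed: two FMA units, each FMA counted as 2 flops.
    print("HSW FMA double:", peak_flops_per_cycle(256, 64, 2, 2))
    print("HSW FMA single:", peak_flops_per_cycle(256, 32, 2, 2))
```

Under these assumptions, the 4 and 8 figures correspond to double precision and the 16 and 32 figures to single precision, so the two sets of numbers need not contradict each other.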

What is FLOP/s and is it a good measure of performance?

て烟熏妆下的殇ゞ submitted on 2019-11-26 11:56:52
Question: I've been asked to measure the performance of a Fortran program that solves differential equations on a multi-CPU system. My employer insists that I measure FLOP/s (floating-point operations per second) and compare the results with benchmarks (LINPACK), but I am not convinced that it's the way to go, simply because no one can explain to me what a FLOP is. I did some research on what exactly a FLOP is and I got some pretty contradictory answers. One of the most popular answers I got was '1 FLOP =

FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2

安稳与你 submitted on 2019-11-26 02:42:04
Question: I'm confused about how many flops per cycle per core can be done with Sandy-Bridge and Haswell. As I understand it, it should be 4 flops per cycle per core for SSE and 8 flops per cycle per core for AVX/AVX2. This seems to be verified here, How do I achieve the theoretical maximum of 4 FLOPs per cycle?, and here, Sandy-Bridge CPU specification. However, the link below seems to indicate that Sandy-Bridge can do 16 flops per cycle per core and Haswell 32 flops per cycle per core http://www