compiler-optimization

Do compilers optimize switches differently than long if-then-else chains?

橙三吉。 submitted on 2021-02-07 14:53:04
Question: Suppose I have N different integral values known at compile time, V_1 through V_N. Consider the following structures: const int x = foo(); switch(x) { case V_1: { /* commands for V_1 which don't change x */ } break; case V_2: { /* commands for V_2 which don't change x */ } break; /* ... */ case V_N: { /* commands for V_N which don't change x */ } break; } versus const int x = foo(); if (x == V_1) { /* commands for V_1 which don't change x */ } else if (x == V_2) { /* commands for V_2 which
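
To make the comparison in this question concrete, here is a minimal sketch of the two forms side by side. The constants V_1 to V_3 and the return values are hypothetical stand-ins, not taken from the question; the point is only that both functions have the same observable behavior, so a compiler is free to lower them to the same machine code (jump table, binary search, or compare chain).

```cpp
// Hypothetical stand-ins for the compile-time constants V_1 .. V_N.
constexpr int V_1 = 10;
constexpr int V_2 = 20;
constexpr int V_3 = 30;

// Dispatch written as a switch.
int via_switch(int x) {
    switch (x) {
        case V_1: return 1;
        case V_2: return 2;
        case V_3: return 3;
        default:  return 0;
    }
}

// The same dispatch written as an if / else-if chain.
int via_if_chain(int x) {
    if (x == V_1) return 1;
    else if (x == V_2) return 2;
    else if (x == V_3) return 3;
    return 0;
}
```

Whether a particular compiler emits identical code for both is best checked by inspecting its assembly output (for example with -O2 -S on GCC or Clang) rather than assumed.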

Which AVX and march should be specified on a cluster with different architectures?

北慕城南 submitted on 2021-02-07 14:40:39
Question: I'm currently trying to compile software for use on an HPC cluster using Intel compilers. The login node, which is where I compile and prepare the computations, uses Intel Xeon Gold 6148 processors, while the compute nodes use either Haswell (Intel Xeon E5-2660 v3 / Intel Xeon Processor E5-2680 v3) or Skylake processors (Intel Xeon Gold 6138). As far as I understand from the links above, my login node supports Intel SSE4.2, Intel AVX, Intel AVX2, as well as Intel AVX-512 but my compute
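
When a binary is built on one node type and run on another, it can help to check which instruction sets the build actually targets. The sketch below is one way to do that, assuming the compiler defines the usual predefined macros (__SSE4_2__, __AVX__, __AVX2__, __AVX512F__), as GCC and Clang do for the selected target; the function name compiled_simd_targets is hypothetical.

```cpp
#include <string>

// Reports the SIMD extensions this translation unit was compiled for,
// based on the compiler's predefined target macros.
std::string compiled_simd_targets() {
    std::string out;
#if defined(__SSE4_2__)
    out += "SSE4.2 ";
#endif
#if defined(__AVX__)
    out += "AVX ";
#endif
#if defined(__AVX2__)
    out += "AVX2 ";
#endif
#if defined(__AVX512F__)
    out += "AVX-512F ";
#endif
    if (out.empty()) out = "baseline";  // none of the macros above is defined
    return out;
}
```

If the output lists AVX-512F, the build assumes AVX-512 everywhere and is not safe to run on the Haswell nodes, which support AVX2 but not AVX-512.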

Allowing struct field to overflow to the next field

送分小仙女□ submitted on 2021-02-07 06:16:48
Question: Consider the following simple example: struct __attribute__ ((__packed__)) { int code[1]; int place_holder[100]; } s; void test(int n) { int i; for (i = 0; i < n; i++) { s.code[i] = 1; } } The for-loop is writing to the field code, which is of size 1. The next field after code is place_holder. I would expect that in case of n > 1, the write to the code array would overflow and 1 would be written to place_holder. However, when compiling with -O2 (on gcc 4.9.4 but probably on other versions as
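
Indexing code[i] with i >= 1 goes past the end of an array of size 1, which is undefined behavior, so an optimizer may assume it never happens. The sketch below shows one way to get the intended "spill into the next field" effect with defined behavior, by using a single backing array. The names Storage, storage, kCodeLen, kPlaceHolderLen and write_cells are hypothetical and not from the question.

```cpp
#include <cstddef>

constexpr std::size_t kCodeLen = 1;           // size of the original `code` field
constexpr std::size_t kPlaceHolderLen = 100;  // size of the original `place_holder` field

// One backing array replaces the two adjacent fields, so indexing
// across the former field boundary stays inside a single array.
struct Storage {
    int cells[kCodeLen + kPlaceHolderLen];
};

Storage storage{};  // zero-initialized

// Writes 1 to the first n cells; n is clamped to the array size.
void write_cells(int n) {
    const int total = static_cast<int>(kCodeLen + kPlaceHolderLen);
    const int limit = n < total ? n : total;
    for (int i = 0; i < limit; ++i) {
        storage.cells[i] = 1;
    }
}

// View of the region that used to be `place_holder`.
int* place_holder() { return storage.cells + kCodeLen; }
```

After write_cells(3), cells 0 to 2 hold 1 and the rest stay 0; the former place_holder region starts at cells[kCodeLen].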

Does clang offer anything similar to GCC 6.x's function multi-versioning (target_clones)?

扶醉桌前 submitted on 2021-02-07 05:46:19
Question: I've read this LWN article with great interest. Executive summary: GCC 6.x supports something called function multi-versioning, which builds multiple versions of the same function, optimized for different instruction sets. Let's say you have a machine with AVX2 support and one without. It's possible to run the same binary on both, with function foo() existing in two versions, one of which uses AVX2 instructions. The function with the AVX2 instructions is, however, only called if the CPU
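
The idea described here can also be approximated by hand. The sketch below is not target_clones itself but a manual dispatch, assuming a GCC- or Clang-compatible compiler on x86 that provides __attribute__((target("avx2"))) and __builtin_cpu_supports; on other platforms it falls back to the generic version. The function names sum, sum_generic and sum_avx2 are hypothetical.

```cpp
#include <cstddef>

// Baseline version, compiled for the default target.
static long sum_generic(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
// Same source, but the compiler may use AVX2 instructions in this function.
__attribute__((target("avx2")))
static long sum_avx2(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Picks a version at run time based on what the CPU reports.
long sum(const int* data, std::size_t n) {
    if (__builtin_cpu_supports("avx2")) return sum_avx2(data, n);
    return sum_generic(data, n);
}
#else
// Assumption: no x86 dispatch available, so always use the baseline.
long sum(const int* data, std::size_t n) { return sum_generic(data, n); }
#endif
```

Unlike multi-versioning done by the compiler, this check runs on every call; caching the choice in a function pointer is a common refinement.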

Will my compiler ignore useless code?

大兔子大兔子 submitted on 2021-02-06 09:36:26
Question: I've read several questions on this subject, but none of them answers mine: they either concern another language or answer only partially (dead code is not the same as useless code). So here is my question: does the compiler ignore useless code, whether explicit or not? For example, in this code: double[] TestRunTime = SomeFunctionThatReturnDoubles; // A bit of code skipped int i = 0; for (int j = 0; j < TestRunTime.Length; j++) { } double prevSpec_OilCons = 0; will the for

Can the compiler optimize from heap to stack allocation?

断了今生、忘了曾经 submitted on 2021-02-05 14:24:01
Question: As far as compiler optimizations go, is it legal and/or possible to change a heap allocation to a stack allocation? Or would that break the as-if rule? For example, say this is the original version of the code: { Foo* f = new Foo(); f->do_something(); delete f; } Would a compiler be able to change this to the following? { Foo f{}; f.do_something(); } I wouldn't think so, because that would have implications if the original version was relying on things like custom allocators. Does the standard
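
The two variants from this question can be written out as complete functions to compare them. This is a minimal sketch with a hypothetical Foo (the question does not define it); both functions return the same value, which is the kind of equivalence the as-if rule is concerned with. C++14 and later permit an implementation to omit calls to replaceable global allocation functions made by new-expressions, but whether a given compiler actually does so here depends on its version and flags.

```cpp
// Hypothetical Foo; the question leaves its definition open.
struct Foo {
    int value = 0;
    void do_something() { value += 42; }
};

// Original form: object lives on the heap.
int with_heap() {
    Foo* f = new Foo();
    f->do_something();
    const int result = f->value;
    delete f;
    return result;
}

// Transformed form: object has automatic storage.
int with_stack() {
    Foo f{};
    f.do_something();
    return f.value;
}
```

Comparing the optimized assembly of with_heap and with_stack shows whether the allocation was actually removed in a given build.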