compiler-optimization

Why isn't string assignment optimised when the length is known to the compiler?

戏子无情 submitted on 2019-12-11 06:16:31
Question: I was playing around today with some timing code and discovered that assigning a string literal to std::string was around 10% faster (with a short 12-character string, so the difference is likely even bigger for large strings) when the literal's length is passed explicitly (using the sizeof operator) than when it is not. (Only tested with the VC9 compiler, so I guess other compilers may do better.)
    std::string a("Hello World!");
    std::string b("Hello World!", sizeof("Hello World!")); // 10% faster in my tests
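One detail worth checking in the second constructor call of the excerpt: sizeof applied to a string literal counts the terminating '\0', so the two strings are not the same length. The sketch below illustrates this; the helper function names are made up for illustration and are not from the original question.

```cpp
#include <string>

// Hypothetical helpers for illustration; not part of the original question.

// The length is found at run time by scanning for the terminating '\0'.
inline std::string from_literal() {
    return std::string("Hello World!");
}

// sizeof("Hello World!") is 13: twelve characters plus the terminator,
// so the resulting string carries a '\0' as its 13th character.
inline std::string from_literal_sizeof() {
    return std::string("Hello World!", sizeof("Hello World!"));
}

// Subtracting 1 excludes the terminator and yields the same 12-character
// string as from_literal(), with the length known at compile time.
inline std::string from_literal_sizeof_minus_one() {
    return std::string("Hello World!", sizeof("Hello World!") - 1);
}
```

Any timing comparison between the two forms should probably use the `sizeof(...) - 1` variant, so that both calls build the same string.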

Jump in the middle of basic block

拥有回忆 submitted on 2019-12-11 05:49:08
Question: A basic block is defined as a sequence of (non-jump) instructions ending with a jump (direct or indirect) instruction. The jump target address should be the start of another basic block. Suppose I have the following assembly code:
    106ac: ba00000f blt 106f0 <main+0xb8>
    106b0: e3099410 movw r9, #37904 ; 0x9410
    106b4: e3409001 movt r9, #1
    106b8: e79f9009 ldr r9, [pc, r9]
    106bc: e3a06000 mov r6, #0
    106c0: e1a0a008 mov sl, r8
    106c4: e30993fc movw r9, #37884 ; 0x93fc
    106c8: e3409001 movt r9, #1
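The definition in the excerpt implies that a jump landing in the middle of a straight-line run forces that run to be split at the target. Below is a minimal sketch of the usual leader-based partitioning. The `Insn` record and `find_leaders` function are hypothetical, not a real disassembler API, and indirect jumps are not modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical instruction record for illustration only.
struct Insn {
    std::uint32_t addr;    // address of the instruction
    bool is_direct_jump;   // true for direct jumps and branches
    std::uint32_t target;  // jump target, meaningful only if is_direct_jump
};

// Returns the start addresses ("leaders") of all basic blocks in `code`.
// A leader is the first instruction, any direct jump target that lies
// inside the listing, and the instruction that follows a jump.
inline std::set<std::uint32_t> find_leaders(const std::vector<Insn>& code) {
    std::set<std::uint32_t> known;
    for (const Insn& insn : code) {
        known.insert(insn.addr);
    }

    std::set<std::uint32_t> leaders;
    if (code.empty()) {
        return leaders;
    }
    leaders.insert(code.front().addr);

    for (std::size_t k = 0; k < code.size(); ++k) {
        if (!code[k].is_direct_jump) {
            continue;
        }
        if (known.count(code[k].target) != 0) {
            leaders.insert(code[k].target);     // target starts a block
        }
        if (k + 1 < code.size()) {
            leaders.insert(code[k + 1].addr);   // fall-through starts a block
        }
    }
    return leaders;
}
```

With this scheme, a branch whose target falls inside what looked like one block simply adds another leader, so the block is split rather than entered mid-way.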

Strange gcc6.1 -O2 compiling behaviour

▼魔方 西西 submitted on 2019-12-11 05:47:22
Question: I am compiling the same benchmark using the gcc -O2 -march=native flags. Interestingly, when I look at the objdump output, it contains instructions such as vxorpd, which I thought should only appear when -ftree-vectorize is enabled (and -O2 should not enable that by default?). If I add the -m32 flag to compile to 32-bit instructions, these packed instructions disappear. Has anyone run into a similar situation and can explain it? Thanks. Answer 1: XORPD is the classic SSE2

How to optimize R performance

◇◆丶佛笑我妖孽 submitted on 2019-12-11 04:41:40
Question: We have a recent performance benchmark that I am trying to understand. We have a large script that appears to run 50% slower on a Red Hat Linux machine than on a Windows 7 laptop with comparable specs. The Linux machine is virtualized using KVM and has 4 cores assigned to it along with 16 GB of memory. The script is not I/O-intensive but has quite a few for loops. Mainly I am wondering if there are any R compile options that I can use to optimize or any kernel compiler options that

How to tell GCC to set structure size boundary to 4 bytes through compiler settings (not pragma's)?

不羁的心 submitted on 2019-12-10 22:52:20
Question: I want my C++ program compiled under GCC to have a maximum alignment of 4 bytes (for members of structures). I can indeed do this through the #pragma pack directive. However, that is inconvenient in my case because the project is quite big, and I would need to make a single header with #pragma pack that has to be included everywhere. Now, the gcc compiler has an option -mstructure-size-boundary=n documented here http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options , it says "Permissible
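To make concrete what "maximum alignment of 4 bytes" means for members, here is a small sketch of the #pragma pack approach that the question wants to avoid. The struct names are hypothetical. GCC also documents a -fpack-struct=n option that is meant to apply such a cap globally, though it changes the ABI of every structure in the translation unit, so treat it as something to verify against your toolchain rather than a drop-in fix.

```cpp
#include <cstddef>

// Hypothetical structs for illustration.
struct DefaultLayout {
    char tag;
    double value;  // usually aligned more strictly than 4, so padding follows tag
};

#pragma pack(push, 4)  // cap member alignment at 4 bytes
struct Packed4 {
    char tag;
    double value;      // placed no later than the next 4-byte boundary
};
#pragma pack(pop)      // restore the previous packing

static_assert(alignof(Packed4) <= 4, "pack(4) caps the struct alignment");
static_assert(offsetof(Packed4, value) <= 4, "value starts within 4 bytes");
static_assert(sizeof(Packed4) <= sizeof(DefaultLayout),
              "packing never grows the struct");
```

On a typical x86-64 build the default layout occupies 16 bytes and the packed one 12, but the exact numbers are platform-dependent, which is why the assertions above only state bounds.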

Compiler instruction reordering optimizations in C++ (and what inhibits them)

戏子无情 submitted on 2019-12-10 19:28:10
Question: I've reduced my code down to the following, which is as simple as I could make it whilst retaining the compiler output that interests me.
    void foo(const uint64_t used)
    {
        uint64_t ar[100];
        for(int i = 0; i < 100; ++i)
        {
            ar[i] = some_global_array[i];
        }
        const uint64_t mask = ar[0];
        if((used & mask) != 0)
        {
            return;
        }
        bar(ar); // Not inlined
    }
Using VC10 with /O2 and /Ob1, the generated assembly pretty much reflects the order of instructions in the above C++ code. Since the local array ar is only
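For comparison, this is roughly what the reordering would look like if done by hand: the mask test is hoisted above the copy so the early return skips the 100-element loop. It is only a sketch. The definitions of `some_global_array` and `bar` are stand-ins (the excerpt does not show them), and the two versions are only equivalent if nothing else modifies the global array while foo runs.

```cpp
#include <cstdint>

// Stand-ins for names used in the excerpt; these definitions are assumptions.
std::uint64_t some_global_array[100];
int bar_calls = 0;
void bar(const std::uint64_t* /*ar*/) { ++bar_calls; }  // placeholder body

// Hand-reordered variant: test the mask before paying for the copy.
void foo_early_exit(const std::uint64_t used) {
    const std::uint64_t mask = some_global_array[0];
    if ((used & mask) != 0) {
        return;  // early exit, no copy performed
    }
    std::uint64_t ar[100];
    for (int i = 0; i < 100; ++i) {
        ar[i] = some_global_array[i];
    }
    bar(ar);
}
```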

Compiler choice of not using REP MOVSB instruction for a byte array move

余生颓废 submitted on 2019-12-10 18:50:47
Question: I'm checking the Release build of my project, built with the latest version of the VS 2017 C++ compiler, and I'm curious why the compiler chose to compile the following code snippet:
    //ncbSzBuffDataUsed of type INT32
    UINT8* pDst = (UINT8*)(pMXB + 1);
    UINT8* pSrc = (UINT8*)pDPE;
    for(size_t i = 0; i < (size_t)ncbSzBuffDataUsed; i++)
    {
        pDst[i] = pSrc[i];
    }
as such:
    UINT8* pDst = (UINT8*)(pMXB + 1);
    UINT8* pSrc = (UINT8*)pDPE;
    for(size_t i = 0; i < (size_t)ncbSzBuffDataUsed; i++)
    00007FF66441251E 4C

Do any current C++ compilers ever emit “rep movsb/w/d”?

混江龙づ霸主 submitted on 2019-12-10 18:18:52
Question: This question made me wonder whether current compilers ever emit the REP MOVSB/W/D instructions. Based on this discussion, it seems that using REP MOVSB/W/D could be beneficial on current CPUs. But no matter what I tried, I could not get any of the current compilers (GCC 8, Clang 7, MSVC 2017 and ICC 18) to emit this instruction. For this simple code, it would be reasonable to emit REP MOVSB:
    void fn(char *dst, const char *src, int l) {
        for (int i=0; i<l; i++) {
            dst[i] = src[i];
        }
    }
But compilers
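One reason a compiler has to be careful with this loop is that dst and src may overlap, in which case the byte-by-byte loop and a block copy are not interchangeable. The sketch below puts the loop from the excerpt next to a hypothetical `fn_memcpy` variant that states the no-overlap intent directly. It makes no claim about which instructions any particular compiler emits for either version.

```cpp
#include <cstddef>
#include <cstring>

// The byte-copy loop from the excerpt, unchanged in behaviour.
void fn(char* dst, const char* src, int l) {
    for (int i = 0; i < l; i++) {
        dst[i] = src[i];
    }
}

// Hypothetical alternative: express the copy with std::memcpy.
// Only valid when the two ranges do not overlap; a non-positive
// length copies nothing, matching the loop above.
void fn_memcpy(char* dst, const char* src, int l) {
    if (l > 0) {
        std::memcpy(dst, src, static_cast<std::size_t>(l));
    }
}
```

Comparing the generated assembly of the two functions under the same flags is a simple way to see how a given compiler and library treat the block-copy case.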

Complex compiler output for simple constructor

允我心安 submitted on 2019-12-10 18:18:41
Question: I have a struct X with two 64-bit integer members, and a constructor:
    struct X {
        X(uint64_t a, uint64_t b) { a_ = a; b_ = b; }
        uint64_t a_, b_;
    };
When I look at the compiler output (x86-64 gcc 8.3 and x86-64 clang 8.0.0, on 64-bit Linux), with no optimizations enabled, I see the following code for the constructor. x86-64 gcc 8.3:
    X::X(unsigned long, unsigned long):
        push rbp
        mov rbp, rsp
        mov QWORD PTR [rbp-8], rdi
        mov QWORD PTR [rbp-16], rsi
        mov QWORD PTR [rbp-24], rdx
        mov rax, QWORD PTR [rbp

Compiler optimization or my misunderstanding

狂风中的少年 submitted on 2019-12-10 17:14:41
Question: Recently I was testing some deep and dark corners of C++ and I got confused about one subtle point. My test is actually very simple:
    // problem 1
    // no constructor call at all; g++ treats this as a function declaration
    // g++ turns (howmany()) into (howmany(*)())
    howmany t(howmany());
    // problem 2
    // only one constructor call
    howmany t = howmany();
My expectation for the line above was that the first howmany() constructor call would produce one temporary object, and then the compiler would use that
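Both observations in the excerpt can be reproduced with a small stand-in class. The `howmany` definition below is hypothetical (the excerpt does not show the real one) and simply counts constructor calls. "Problem 1" is the declaration-versus-expression ambiguity often called the most vexing parse; "problem 2" is copy elision, which is guaranteed since C++17 and commonly performed by compilers before that.

```cpp
#include <type_traits>

// Hypothetical stand-in for the howmany class in the excerpt.
struct howmany {
    static int default_ctor_calls;
    static int copy_ctor_calls;
    howmany() { ++default_ctor_calls; }
    howmany(const howmany&) { ++copy_ctor_calls; }
};
int howmany::default_ctor_calls = 0;
int howmany::copy_ctor_calls = 0;

// "Problem 1": this declares a function t1 that returns howmany and takes
// a pointer to a function returning howmany. No object is created.
howmany t1(howmany());
static_assert(std::is_function<decltype(t1)>::value,
              "t1 is a function declaration, not an object");

// Extra parentheses force the argument to be parsed as an expression.
inline void make_objects() {
    howmany t2((howmany()));  // an object this time
    howmany t3 = howmany();   // "problem 2": with elision, one constructor call
    (void)t2;
    (void)t3;
}
```

Calling `make_objects()` runs the default constructor twice, once per object; whether the copy constructor also runs depends on the language standard and compiler settings in use.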