How to get `gcc` to generate `bts` instruction for x86-64 from standard C?

倾然丶 夕夏残阳落幕 submitted on 2019-11-29 01:23:52

As the first answer at the first link points out, the real question is how much this matters in the grand scheme of things. The only places where you test bits are:

  • Low-level drivers. However, if you are writing one you probably know assembly, the code is already tied closely to the system, and most of the delay is probably in I/O anyway.
  • Testing for flags. This usually happens either at initialisation (once, at startup) or as part of some shared computation that takes much more time.

The overall impact on the performance of applications and macrobenchmarks is likely to be minimal, even if microbenchmarks show an improvement.

To the Edit part: using bts alone does not guarantee that the operation is atomic. All it guarantees is that the operation is atomic on this core (the same is true of an or performed on memory). On multi-processor systems (uncommon) or multi-core systems (very common) you still have to synchronize with the other processors.

As synchronization is much more expensive, I believe that the difference between:

asm("lock bts %1, %0" : "+m" (*array) : "r" (bit));

and

asm("lock or %1, %0" : "+m" (*array) : "r" (1 << bit));

is minimal. And the second form:

  • Can set several flags at once
  • Has a nice __sync_fetch_and_or(array, 1 << bit) form (which works with gcc and the Intel compiler, as far as I remember).
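As a rough sketch of the builtin form, the two helpers below wrap __sync_fetch_and_or. The names atomic_test_and_set_bit, atomic_set_flags and BITS_PER_WORD are made up for illustration, and the bitmap is assumed to be an array of unsigned long. Which instruction the compiler actually emits (lock or, lock bts, or something else) is up to the compiler and the target.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper macro: number of bits in one bitmap word. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit `bit` in the bitmap `array` and report whether it
 * was already set. Assumes `array` has enough words to hold `bit`. */
static bool atomic_test_and_set_bit(unsigned long *array, size_t bit)
{
    unsigned long mask = 1UL << (bit % BITS_PER_WORD);
    /* __sync_fetch_and_or returns the word's value from before the OR. */
    unsigned long old = __sync_fetch_and_or(&array[bit / BITS_PER_WORD], mask);
    return (old & mask) != 0;
}

/* Atomically set several flags in one word with a single OR and return
 * the previous contents of the word. */
static unsigned long atomic_set_flags(unsigned long *word, unsigned long flags)
{
    return __sync_fetch_and_or(word, flags);
}
```

A caller would use atomic_test_and_set_bit(map, n) to claim slot n in a shared bitmap and treat a false return as "I got it first".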

I use the gcc atomic builtins such as __sync_lock_test_and_set (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html). Changing the -march flag directly affects the code that is generated. I'm using it with i686 right now, but http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options shows all the possibilities.

I realize it's not exactly what you are asking for, but I found those two web pages very useful when I was looking for mechanisms like that.
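To give an idea of how __sync_lock_test_and_set is typically used, here is a minimal spinlock sketch built on it and on __sync_lock_release, both of which are documented on the Atomic-Builtins page. The demo_spinlock type and the demo_* function names are invented for this example, and a real lock would want backoff or a pause hint in the wait loop.

```c
/* Hypothetical lock type for illustration: 0 = unlocked, 1 = locked. */
typedef volatile int demo_spinlock;

/* Try to take the lock once; returns 1 on success, 0 if it was held. */
static int demo_spin_trylock(demo_spinlock *lock)
{
    /* Atomically stores 1 and returns the previous value (acquire barrier). */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

/* Spin until the lock is acquired. */
static void demo_spin_lock(demo_spinlock *lock)
{
    while (!demo_spin_trylock(lock)) {
        /* busy-wait until the holder releases the lock */
    }
}

/* Release the lock: stores 0 with release semantics. */
static void demo_spin_unlock(demo_spinlock *lock)
{
    __sync_lock_release(lock);
}
```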

I believe (but am not certain) that neither the C++ nor the C standard has any mechanisms for these types of synchronization yet. Support for higher-level synchronization mechanisms is in various states of standardization, but I don't think even one of those would give you access to the type of primitive you're after.

Are you programming lock-free data structures where locks are insufficient?

You probably want to just go ahead and use gcc's non-standard extensions and/or operating system or library provided synchronization primitives. I would bet there's a library that might provide the type of portability you're looking for if you're concerned about using compiler intrinsics. (Though really, I think most people just bite the bullet and use gcc-specific code when they need it. Not ideal, but the standards haven't really been keeping up.)
