x86-64

How can I change the device on which OpenCL code will be executed with UMat in OpenCV?

Submitted by 邮差的信 on 2019-11-29 05:22:33
As is known, OpenCV 3.0 supports the new class cv::UMat, which provides a Transparent API (TAPI) that uses OpenCL automatically when it can: http://code.opencv.org/projects/opencv/wiki/Opencv3#tapi There are two introductions to cv::UMat and TAPI: Intel: https://software.intel.com/en-us/articles/opencv-30-architecture-guide-for-intel-inde-opencv AMD: http://developer.amd.com/community/blog/2014/10/15/opencv-3-0-transparent-api-opencl-acceleration/ But what if I have: an Intel Core i5 CPU (Haswell) with 4 cores (OpenCL on Intel CPUs with SSE 4.1, SSE 4.2 or AVX support) and Intel Integrated HD Graphics, which supports OpenCL …
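The usual answer is that the device is bound before the first OpenCL call, typically via the OPENCV_OPENCL_DEVICE environment variable (e.g. ":GPU:0" or "Intel:CPU:0"). A minimal sketch for checking which device the T-API ended up with — the environment-variable format is as documented for OpenCV 3.x, so verify it against your build:

    #include <opencv2/core.hpp>
    #include <opencv2/core/ocl.hpp>
    #include <iostream>

    int main() {
        // Set OPENCV_OPENCL_DEVICE (e.g. ":GPU:0") in the environment
        // before this point; the first OpenCL call binds the device.
        if (!cv::ocl::haveOpenCL()) {
            std::cout << "OpenCL is not available\n";
            return 0;
        }
        cv::ocl::setUseOpenCL(true);  // enable the transparent API
        cv::ocl::Device dev = cv::ocl::Device::getDefault();
        std::cout << "Using: " << dev.name()
                  << " (" << dev.vendorName() << ")\n";
        return 0;
    }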

Why does printf print a random value with a float argument and an integer format specifier?

Submitted by 心不动则不痛 on 2019-11-29 05:08:48
I wrote a simple program on a 64-bit machine: int main() { printf("%d", 2.443); } So, this is how the compiler will behave. It will identify the second argument to be a double, hence it will push 8 bytes on the stack, or possibly just use registers across calls to access the variables. %d expects a 4-byte integer value, hence it prints some garbage value. What is interesting is that the value printed changes every time I execute this program. So what is happening? I expected it to print the same garbage value every time, not a different one on each run. It's undefined behaviour, of course, to pass …
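On x86-64 the run-to-run variation has a concrete mechanism: in the System V calling convention a double argument travels in an XMM register, while %d makes printf read a general-purpose register, so it prints whatever stale value happens to sit there (which shifts with ASLR and environment layout). A sketch of the well-defined alternatives:

    #include <cstdio>

    int main() {
        double d = 2.443;
        // std::printf("%d", d);      // undefined: %d reads an integer
        //                            // register, but d went in %xmm0
        std::printf("%f\n", d);       // match the specifier to the type
        std::printf("%d\n", (int)d);  // or convert explicitly first
        return 0;
    }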

How to get a “backtrace” (like gdb) using only ptrace (Linux, x86/x86_64)

Submitted by 一个人想着一个人 on 2019-11-29 05:06:31
I want to get backtrace-like output, as gdb produces, but I want to do this via ptrace() directly. My platform is Linux, x86 (and, later, x86_64). For now I only want to read return addresses from the stack, without converting them into symbol names. So, for a test program compiled with -O0 by gcc-4.5:

    int g() { kill(getpid(), SIGALRM); }
    int f() { int a; int b; a = g(); b = a; return a+b; }
    int e() { int c; c = f(); }
    main() { return e(); }

I will start my program and attach with ptrace to the test program at the very beginning. Then I will do PTRACE_CONT and wait for the signal. When the test program does …
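At -O0 every function keeps a frame pointer, so a minimal sketch (error handling abbreviated) is to read RIP/RBP after the tracee stops, then follow the saved-RBP chain, fetching the return address stored just above each saved frame pointer:

    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <cerrno>
    #include <cstdio>

    void print_backtrace(pid_t pid) {
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, 0, &regs);
        std::printf("pc: %#llx\n", (unsigned long long)regs.rip);

        unsigned long long rbp = regs.rbp;
        while (rbp != 0) {
            // [rbp] = caller's saved rbp, [rbp+8] = return address (x86_64)
            errno = 0;
            long ret  = ptrace(PTRACE_PEEKDATA, pid, (void *)(rbp + 8), 0);
            long next = ptrace(PTRACE_PEEKDATA, pid, (void *)rbp, 0);
            if (errno != 0) break;        // walked off the mapped stack
            std::printf("return address: %#lx\n", ret);
            rbp = (unsigned long long)next;
        }
    }

On 32-bit x86 the offsets are 4 instead of 8 and the registers are eip/ebp.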

What is long double on x86-64?

Submitted by 心不动则不痛 on 2019-11-29 04:26:37
Someone told me that: "Under x86-64, FP arithmetic is done with SSE, and therefore long double is 64 bits." But the x86-64 ABI says:

    C type      | sizeof | alignment | AMD64 architecture
    long double | 16     | 16        | 80-bit extended (IEEE-754)

See: amd64-abi.pdf. And gcc says sizeof(long double) is 16 and gives DBL_MAX = 1.79769e+308 and LDBL_MAX = 1.18973e+4932. So I'm confused: how is long double 64 bits? I thought it was an 80-bit representation. "Under x86-64, FP arithmetic is done with SSE, and therefore long double is 64 bits." That's what usually happens under x86-64 (where the presence of SSE …
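The two claims reconcile like this: float and double arithmetic does go through SSE, but long double is still the 80-bit x87 extended type, stored in 16 bytes purely for alignment (6 bytes are padding). A quick check:

    #include <cfloat>
    #include <cstdio>

    int main() {
        // 80-bit x87 extended format: 64-bit mantissa, padded to 16 bytes
        std::printf("sizeof(long double) = %zu\n", sizeof(long double));
        std::printf("LDBL_MANT_DIG = %d (64 => x87 extended)\n", LDBL_MANT_DIG);
        std::printf("DBL_MAX  = %e\n",  DBL_MAX);
        std::printf("LDBL_MAX = %Le\n", LDBL_MAX);
        return 0;
    }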

x86_64: Is it possible to “in-line substitute” PLT/GOT references?

Submitted by 风格不统一 on 2019-11-29 02:26:02
I'm not sure what a good subject line for this question is, but here we go... In order to force code locality / compactness for a critical section of code, I'm looking for a way to call a function in an external (dynamically-loaded) library through a "jump slot" (an ELF R_X86_64_JUMP_SLOT relocation) directly at the call site - what the linker ordinarily puts into the PLT / GOT, but inlined right at the call site. If I emulate the call like:

    #include <stdio.h>
    int main(int argc, char **argv) {
        asm ("push $1f\n\t"
             "jmp *0f\n\t"
             "0: .quad %P0\n"
             "1:\n\t"
             : : "i"(printf), "D"("Hello, …
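For what it's worth, toolchains now have a supported switch for much of this: building with -fno-plt makes gcc/clang call through the GOT entry at each call site instead of bouncing through a PLT stub. A sketch (the exact assembly shown is my reading of gcc's output and may vary by version):

    // $ g++ -O2 -fno-plt -S noplt.cpp
    // the call then looks like:
    //     call *printf@GOTPCREL(%rip)   // instead of: call printf@PLT
    #include <cstdio>

    int main() {
        std::printf("Hello, world\n");
        return 0;
    }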

Python ctypes and function calls

Submitted by 主宰稳场 on 2019-11-29 02:20:15
My friend produced a small proof-of-concept assembler that worked on x86. I decided to port it to x86_64 as well, but I immediately hit a problem. I wrote a small program in C, then compiled and objdumped the code. After that I inserted it into my Python script, so the x86_64 code should be correct:

    from ctypes import cast, CFUNCTYPE, c_char_p, c_long
    buffer = ''.join(map(chr, [
        # 0000000000000000 <add>:
        0x55,                    # push %rbp
        0x48, 0x89, 0xe5,        # mov  %rsp,%rbp
        0x48, 0x89, 0x7d, 0xf8,  # mov  %rdi,-0x8(%rbp)
        0x48, 0x8b, 0x45, 0xf8,  # mov  -0x8(%rbp),%rax
        0x48, 0x83, 0xc0, 0x0a,  # add  $0xa,%rax
        …
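The usual failure mode with this approach is that the bytes sit in ordinary non-executable memory, so the call faults once NX is enforced; the fix is to place the code in a page mapped with PROT_EXEC. A sketch of the same technique written directly against mmap — the trailing pop/ret epilogue is my assumed completion of the truncated dump:

    #include <cstdio>
    #include <cstring>
    #include <sys/mman.h>

    typedef long (*add_fn)(long);

    int main() {
        unsigned char code[] = {
            0x55,                    // push %rbp
            0x48, 0x89, 0xe5,        // mov  %rsp,%rbp
            0x48, 0x89, 0x7d, 0xf8,  // mov  %rdi,-0x8(%rbp)
            0x48, 0x8b, 0x45, 0xf8,  // mov  -0x8(%rbp),%rax
            0x48, 0x83, 0xc0, 0x0a,  // add  $0xa,%rax
            0x5d,                    // pop  %rbp   (assumed epilogue)
            0xc3                     // ret
        };
        void *mem = mmap(NULL, sizeof(code),
                         PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return 1;
        std::memcpy(mem, code, sizeof(code));
        add_fn add = (add_fn)mem;
        std::printf("add(32) = %ld\n", add(32));  // prints 42
        munmap(mem, sizeof(code));
        return 0;
    }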

Is incrementing an integer atomic on x86? [duplicate]

Submitted by 本秂侑毒 on 2019-11-29 02:18:28
Question: This question already has answers here: Can num++ be atomic for 'int num'? (13 answers). Closed 3 years ago. On a multicore x86 machine, say a thread executing on core 1 increments an integer variable a at the same time a thread on core 2 also increments it. Given that the initial value of a was 0, would it always be 2 in the end, or could it have some other value? Assume that a is declared volatile and that we are not using atomic variables (such as C++'s atomic<> or the built-in atomic operations) …
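The short answer is that it can end up as 1: a++ is a load-modify-store, so both cores can read 0 and both write back 1; volatile does not change that, only a lock-prefixed read-modify-write does. A sketch of the guaranteed-atomic version:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    int main() {
        // std::atomic forces a lock-prefixed RMW (e.g. lock xadd) on x86,
        // so concurrent increments cannot be lost.
        std::atomic<int> a{0};
        std::thread t1([&] { a.fetch_add(1); });
        std::thread t2([&] { a.fetch_add(1); });
        t1.join(); t2.join();
        std::printf("a = %d\n", a.load());  // always 2
        return 0;
    }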

How to get `gcc` to generate `bts` instruction for x86-64 from standard C?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-11-29 01:23:52
Inspired by a recent question, I'd like to know if anyone knows how to get gcc to generate the x86-64 bts instruction (bit test and set) on Linux x86-64 platforms, without resorting to inline assembly or to nonstandard compiler intrinsics. Related questions: Why doesn't gcc do this for a simple |= operation where the right-hand side has exactly 1 bit set? How to get bts using compiler intrinsics or the asm directive. Portability is more important to me than bts, so I won't use an asm directive, and if there's another solution, I'd prefer not to use compiler intrinsics. EDIT: The C source …
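One pattern that tends to produce bts is masking the shift count so the standard-C expression has exactly the semantics of the instruction. With a reasonably recent gcc at -O2 this compiles to a register bts on x86-64 in my experience, but it is worth verifying with gcc -O2 -S, since older versions emit shift+or instead (which is what prompted the question):

    #include <cstdint>

    // word | (1 << n) with the count masked to 0..63, matching bts semantics
    uint64_t set_bit(uint64_t word, unsigned n) {
        return word | (1ULL << (n & 63));
    }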

Basic OS X Assembly and the Mach-O format

Submitted by 六月ゝ 毕业季﹏ on 2019-11-29 00:39:34
I am interested in programming in x86-64 assembly on the Mac OS X platform. I came across this page about creating a 248B Mach-O program, which led me to Apple's own Mach-O format reference. After that I thought I'd build that same simple C program in Xcode and check out the generated assembly. This was the code:

    int main(int argc, const char * argv[]) {
        return 42;
    }

But the generated assembly was 334 lines, containing (judging by the 248B model) a lot of excess content. Firstly, why is so much DWARF debug info included in the Release build of a C executable? Secondly, I notice the Mach-O …
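Most of those 334 lines are assembler directives and debug metadata rather than instructions. A quick way to see the difference is to compile without debug info and strip the result (the flags below are a sketch; exact output and sizes depend on the toolchain):

    // $ clang -Os -fno-asynchronous-unwind-tables -S main.c   // far shorter .s
    // $ clang -Os main.c -o main && strip main                // drop symbols
    int main(int argc, const char *argv[]) {
        return 42;  // the same program from the question
    }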

Find which assembly instruction caused an Illegal Instruction error without debugging

Submitted by 送分小仙女□ on 2019-11-28 21:05:27
Question: While running a program I've written in assembly, I get an Illegal Instruction error. Is there a way to know which instruction is causing the error without debugging? The machine I'm running on does not have a debugger or any development system. In other words, I compile on one machine and run on another. I cannot test my program on the machine I compile on, because it doesn't support SSE4.2; the machine I'm running the program on does support SSE4.2 instructions.
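Two debugger-free options: enable core dumps (ulimit -c unlimited) and inspect the dump later on the build machine, or install a SIGILL handler that reports the faulting address, which can then be matched against an objdump -d listing. A minimal sketch of the handler approach:

    #include <csignal>
    #include <cstdio>
    #include <cstdlib>

    static void on_sigill(int, siginfo_t *info, void *) {
        // si_addr is the address of the offending instruction; look it
        // up in `objdump -d ./program` to find the exact opcode.
        std::fprintf(stderr, "SIGILL at %p\n", info->si_addr);
        std::_Exit(1);
    }

    int main() {
        struct sigaction sa = {};
        sa.sa_sigaction = on_sigill;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGILL, &sa, nullptr);
        // ... run the suspect (e.g. SSE4.2) code here ...
        return 0;
    }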