x86-64

x86-64 usage of LFENCE

拈花ヽ惹草 posted on 2019-12-01 05:30:19
I'm trying to understand the right way to use fences when measuring time with RDTSC/RDTSCP. Several questions on SO related to this have already been answered elaborately. I have gone through a few of them. I have also gone through this really helpful article on the same topic: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf However, in another online blog, there's an example of using LFENCE instead of CPUID on x86. I was wondering how LFENCE prevents earlier stores from contaminating the RDTSC measurements. E.g. <Instr A>
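The fencing pattern this question asks about might look like the sketch below. It assumes GCC or Clang on an x86 target, where `<x86intrin.h>` provides `_mm_lfence`, `__rdtsc` and `__rdtscp`; the helper names `tsc_begin` and `tsc_end` are hypothetical. It also assumes LFENCE behaves as Intel documents it: it does not execute until all earlier instructions have completed locally, and no later instruction starts until it completes. An earlier store therefore only has to retire before the timestamp is taken; LFENCE does not wait for that store to become globally visible.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>   /* assumption: GCC/Clang targeting x86 or x86-64 */

/* Hypothetical helper: timestamp taken after all earlier instructions
 * have completed locally. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();             /* wait for earlier instructions to complete */
    uint64_t t = __rdtsc();
    _mm_lfence();             /* keep the timed code from starting before RDTSC */
    return t;
}

/* Hypothetical helper: timestamp taken after the timed code has executed. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;             /* receives IA32_TSC_AUX, unused here */
    uint64_t t = __rdtscp(&aux);  /* RDTSCP waits for earlier instructions */
    _mm_lfence();                 /* keep later code from starting before RDTSCP */
    assert(t != 0);               /* sanity check only */
    return t;
}
```

A measurement would then be `tsc_end() - tsc_begin()` around the region of interest, minus the overhead of an empty region measured the same way.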

Was there a P4 model with double-pumped 64-bit operations?

只愿长相守 posted on 2019-12-01 05:27:20
Question: I recall that one of the interesting features of the initial P4 micro-architecture was its double-pumped ALU. I think Intel called it something like the Rapid Execution Unit, but basically it meant that each execution unit in the ALU was effectively running at twice the frequency, and could handle two simple ALU operations in a single cycle, even if they were dependent. This feature disappeared at some point (before or at the same time as the P4), but was there ever a 64-bit P4 with a

Meaning of REX.w prefix before AMD64 jmp (FF25)

最后都变了- posted on 2019-12-01 04:56:31
While solving a bug I came across a difference between the import jump tables of two Win64 DLLs. The 64-bit version of kernel32.dll uses a plain FF25 jmp instruction in its import jump tables. The 64-bit version of advapi32.dll, on the other hand, uses 48FF25, which indicates a REX.W=1 prefix before the jmp opcode. However, both seem to have a 32-bit operand specifying a RIP+offset address. Is there any meaning to the REX.W prefix on this specific opcode? I don't work with machine code often, so please excuse any factual mistakes. The REX.W prefix is ignored. In 64-bit mode the FF /4 opcode always has a 64-bit
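To make the two encodings concrete, here is a minimal sketch in C that decodes `FF 25 disp32` with an optional REX prefix and computes the address of the 8-byte slot the jump reads its target from. The function name `decode_rip_jmp` and the addresses used with it are hypothetical, and the sketch recognizes only this one instruction form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes only `jmp qword ptr [rip+disp32]`
 * (opcode FF, ModRM 25), optionally preceded by one REX prefix.
 * Returns the instruction length in bytes, or 0 if the bytes do not match.
 * On success, *slot receives the address of the pointer that is loaded. */
static size_t decode_rip_jmp(const uint8_t *code, size_t avail,
                             uint64_t addr, uint64_t *slot)
{
    size_t i = 0;
    assert(code != NULL && slot != NULL);

    /* REX prefixes occupy 0x40..0x4F; 0x48 is REX with W=1. */
    if (avail > 0 && (code[0] & 0xF0) == 0x40)
        i = 1;
    if (avail < i + 6 || code[i] != 0xFF || code[i + 1] != 0x25)
        return 0;

    /* disp32 is little-endian and sign-extended to 64 bits. */
    uint32_t raw = (uint32_t)code[i + 2]
                 | (uint32_t)code[i + 3] << 8
                 | (uint32_t)code[i + 4] << 16
                 | (uint32_t)code[i + 5] << 24;
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                       : (int64_t)raw;

    /* RIP-relative addresses are relative to the NEXT instruction. */
    size_t len = i + 6;
    *slot = addr + len + (uint64_t)disp;
    return len;
}
```

The only effect of the prefix visible here is the instruction length: the 48FF25 form is one byte longer, so the same disp32 resolves one byte further on. The width of the loaded pointer is the same in both cases.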

OS X - x64: stack not 16 byte aligned error

前提是你 posted on 2019-12-01 04:48:30
Question: I know that OS X requires a 16-byte-aligned stack, but I don't really understand why it is causing an error here. All I am doing is passing an object size (which is 24) in %rdi and calling malloc. Does this error mean I have to ask for 32 bytes? The error message is:
    libdyld.dylib`stack_not_16_byte_aligned_error:
    ->  0x7fffc12da2fa <+0>: movdqa %xmm0, (%rsp)
        0x7fffc12da2ff <+5>: int3
    libdyld.dylib`_dyld_func_lookup:
        0x7fffc12da300 <+0>: pushq %rbp
        0x7fffc12da301 <+1>: movq %rsp, %rbp
Here is the
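The alignment rule concerns %rsp at the moment of the `call`, not the size passed to malloc. As a minimal sketch of the arithmetic, assuming the System V AMD64 calling convention that x86-64 OS X follows, the hypothetical helper `pad_before_call` below computes how many extra bytes a function must subtract from %rsp before its next call.

```c
#include <assert.h>

/* Hypothetical helper: bytes_since_entry is how far the function has moved
 * %rsp down since it was entered (pushes plus any `sub`). Returns the extra
 * bytes to subtract so that %rsp is a multiple of 16 at the next `call`. */
static unsigned pad_before_call(unsigned bytes_since_entry)
{
    /* On entry %rsp is 8 bytes off a 16-byte boundary: the caller was
     * aligned, then `call` pushed an 8-byte return address. */
    unsigned misalign = (8u + bytes_since_entry) % 16u;
    unsigned pad = (16u - misalign) % 16u;
    assert(pad < 16u);
    return pad;
}
```

For example, a function that calls malloc without pushing anything needs 8 bytes of padding, while one that has done a single `pushq %rbp` needs none.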

GCC doesn't make use of inc

隐身守侯 posted on 2019-12-01 04:46:11
The GCC compiler
    $ gcc --version
    gcc (GCC) 4.8.2 ...
doesn't generate an inc assembly instruction where it could actually be useful, as in this C program:
    int main(int argc, char **argv)
    {
        int sum = 0;
        int i;
        for(i = 0; i < 1000000000L; i++)   <---- that "i++"
            sum += i;
        return sum;
    }
Instead, it generates an add instruction:
    0000000000000000 <main>:
       0: 31 d2                xor    %edx,%edx
       2: 31 c0                xor    %eax,%eax
       4: 0f 1f 40 00          nopl   0x0(%rax)
       8: 01 d0                add    %edx,%eax
       a: 83 c2 01             add    $0x1,%edx   <---- HERE
       d: 81 fa 00 ca 9a 3b    cmp    $0x3b9aca00,%edx
      13: 75 f3                jne    8 <main+0x8>
      15: f3 c3                repz retq
Why does it do this?

How to avoid stdin input that does not fit in buffer be sent to the shell in Linux 64-bit Intel (x86-64) assembly

不问归期 posted on 2019-12-01 04:25:57
Question: Edit: Title changed, as @Gunner pointed out that this is not a buffer overflow. When reading user input from stdin with NR_read in Linux 64-bit Intel assembly, how can I prevent input that does not fit in the input buffer from being sent to the Linux shell, e.g. bash? In this example program I have defined an input buffer of 255 bytes (the buffer size can be anything >= 1). The rest of an input longer than 255 bytes is sent to bash (if running from bash) and this is
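One common way to handle this is to keep reading and discarding bytes until the newline that ends the over-long line has been consumed. The sketch below shows that logic in C rather than assembly; `read_line_discard_rest` is a hypothetical name, and the same sequence of `read` system calls could be issued from assembly. It assumes line-oriented input, as on a terminal in canonical mode.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to cap bytes of one line from fd into buf.
 * If the buffer filled up before the newline arrived, the rest of the line
 * is read and thrown away so it is not left behind for the next reader.
 * Returns the number of bytes stored, 0 at end of input, or -1 on error. */
static ssize_t read_line_discard_rest(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap >= 1);

    ssize_t n = read(fd, buf, cap);
    if (n <= 0)
        return n;

    /* A full buffer without a trailing newline means part of the line
     * is still pending. */
    if ((size_t)n == cap && buf[n - 1] != '\n') {
        char c;
        do {
            if (read(fd, &c, 1) != 1)
                break;          /* end of input or error: stop draining */
        } while (c != '\n');
    }
    return n;
}
```

Draining one byte at a time is slow but never consumes anything beyond the newline, which matters when the descriptor is a pipe rather than a terminal.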

How to build 64-bit Python on OS X 10.6 — ONLY 64 bit, no Universal nonsense

微笑、不失礼 posted on 2019-12-01 04:25:26
I just want to build this on my development machine -- the binary install from Python.org is still 32-bit, and installing extensions (MySQLdb, for example) is driving me nuts as I try to figure out the proper flags for each and every extension. Clarification: I did NOT replace the system Python; I just installed the Python.org binary into its normal place at /Library/..., not /System/Library/.... Everything else seems to build 64-bit by default, and the default Python 2.6.1 was 64-bit (before I replaced it with the Python.org build, figuring it was a direct replacement). I just want a 64 bit

Is it possible to detect the CPU architecture from machine code?

為{幸葍}努か posted on 2019-12-01 04:24:30
Question: Let's say that there are 2 possible architectures, ARM and x86. Is there a way to detect what system the code is running on, to achieve something like this from assembly/machine code?
    if (isArm) jmp to arm machine code
    if (isX86) jmp to x86 machine code
I know that ARM machine code differs significantly from x86 machine code. What I'm thinking about is some well-crafted assembly instructions that would result in the same binary machine code.
Answer 1: Assuming you have already taken care of all

Xcode 5.1 and compiling error for architecture x86_64

☆樱花仙子☆ posted on 2019-12-01 04:16:52
Question: Yesterday I had a project working without problems in Xcode 5. Today, after updating to Xcode 5.1, I have 6 errors and the project does not compile.
    Undefined symbols for architecture x86_64:
      "_OBJC_CLASS_$_PayPal", referenced from:
          objc-class-ref in SUAppDelegate.o
          objc-class-ref in SUTViewController.o
      "_OBJC_CLASS_$_PayPalAdvancedPayment", referenced from:
          objc-class-ref in SUTViewController.o
      "_OBJC_CLASS_$_PayPalInvoiceData", referenced from:
          objc-class-ref in SUTViewController.o
      "_OBJC

Understanding %rip register in intel assembly

。_饼干妹妹 posted on 2019-12-01 04:13:16
Consider the following small piece of code, which was used in another post about structure size and the ways to align data correctly:
    struct { char Data1; short Data2; int Data3; char Data4; } x;
    unsigned fun ( void )
    {
        x.Data1=1;
        x.Data2=2;
        x.Data3=3;
        x.Data4=4;
        return(sizeof(x));
    }
I get the corresponding disassembly (64-bit build):
    0000000000000000 <fun>:
       0: 55                      push   %rbp
       1: 48 89 e5                mov    %rsp,%rbp
       4: c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # b <fun+0xb>
       b: 66 c7 05 00 00 00 00    movw   $0x2,0x0(%rip)        # 14 <fun+0x14>
      12: 02 00
      14: c7 05 00 00 00 00 03    movl   $0x3,0x0(%rip)        # 1e
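The layout of the structure being stored to can be inspected directly with `offsetof` and `sizeof`. This is a minimal sketch: the tag `struct sample` and the helper `padding_bytes` are hypothetical names added for illustration, and the figures in the comments assume a typical ABI with a 2-byte `short` and a 4-byte `int`, each aligned to its own size (as on x86-64 Linux).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same members as the struct in the question; the tag name is hypothetical. */
struct sample {
    char  Data1;   /* offset 0, then 1 byte of padding */
    short Data2;   /* offset 2 */
    int   Data3;   /* offset 4 */
    char  Data4;   /* offset 8, then 3 bytes of tail padding */
};

/* Hypothetical helper: bytes of the struct not occupied by any member. */
static size_t padding_bytes(void)
{
    size_t members = sizeof(char) + sizeof(short) + sizeof(int) + sizeof(char);
    assert(sizeof(struct sample) >= members);
    return sizeof(struct sample) - members;
}

/* Prints each member's offset and the total size. */
static void print_layout(void)
{
    printf("Data1 at %zu\n", offsetof(struct sample, Data1));
    printf("Data2 at %zu\n", offsetof(struct sample, Data2));
    printf("Data3 at %zu\n", offsetof(struct sample, Data3));
    printf("Data4 at %zu\n", offsetof(struct sample, Data4));
    printf("size %zu, padding %zu\n", sizeof(struct sample), padding_bytes());
}
```

Under those assumptions the four stores in `fun` target offsets 0, 2, 4 and 8 of `x`, and `sizeof(x)` is 12.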