x86-64

SSE “denormals are zeros” option

Deadly, posted on 2019-12-10 15:23:01
Question: I just experimented with the SSE "denormals are zeros" (DAZ) option by setting it with _mm_setcsr( _mm_getcsr() | 0x40 ). I found an interesting thing: this doesn't prevent SSE from generating denormals when both operands are non-denormal! It only makes SSE treat denormal operands as if they were zeros. As I explained, I know what this option does. But what is this option good for? Addendum: I just read the Intel article linked by user nucleon. And I was curious about the
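The behaviour described in the question can be reproduced with a minimal sketch like the one below. It assumes an x86-64 build with GCC or Clang (where float arithmetic uses SSE), and it explicitly clears the flush-to-zero bit so that only DAZ is active. The names `mul`, `daz_demo`, `MXCSR_DAZ` and `MXCSR_FTZ` are illustrative and do not come from the original post.

```c
#include <float.h>
#include <xmmintrin.h>

#define MXCSR_DAZ 0x0040u  /* bit 6: denormal operands are read as zero */
#define MXCSR_FTZ 0x8000u  /* bit 15: denormal results are flushed to zero */

/* Multiply through volatile variables so the compiler cannot fold the
   product at compile time, where MXCSR would have no effect. */
static float mul(float a, float b) {
    volatile float x = a, y = b;
    volatile float r = x * y;
    return r;
}

/* Returns 1 if, with DAZ set (and FTZ clear), a product of two normal
   operands is still denormal, while that same denormal is read as zero
   when it is later used as an operand. */
static int daz_demo(void) {
    unsigned int saved = _mm_getcsr();
    volatile float tiny, back;

    _mm_setcsr((saved | MXCSR_DAZ) & ~MXCSR_FTZ);
    tiny = mul(FLT_MIN, 0.5f);  /* normal * normal -> denormal result */
    back = mul(tiny, 2.0f);     /* denormal operand is treated as 0 */
    _mm_setcsr(saved);          /* restore the caller's MXCSR */

    /* Compare only after DAZ is off again, otherwise the comparison
       itself would also read the denormal as zero. */
    float t = tiny, b = back;
    return t != 0.0f && t < FLT_MIN && b == 0.0f;
}
```

Calling `daz_demo()` from `main` should return 1 on hardware that implements DAZ. Setting the FTZ bit as well is what would stop denormal results from being produced in the first place.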

Minimal opcode size x86-64 strlen implementation

守給你的承諾、, posted on 2019-12-10 15:22:54
Question: I'm investigating a minimal opcode size x86-64 strlen implementation for my code golfing / binary executable that must not exceed a certain size (think of the demoscene for simplicity). The general idea comes from here, size optimization ideas from here and here. The input string address is in rdi, and the maximum length should not exceed Int32:

    xor eax,eax    ; 2 bytes
    or ecx,-1      ; 3 bytes
    repne scasb    ; 2 bytes
    not ecx        ; 2 bytes
    dec ecx        ; 2 bytes

The final result is in ecx, in 11 bytes total. The question is
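To check that the five-instruction sequence above really computes the length, it can be wrapped in GCC/Clang extended inline assembly and called from C. This is only a test-harness sketch: the wrapper name `strlen_scasb` is hypothetical, the assembly is written in AT&T syntax, and it relies on the direction flag being clear, as the System V ABI guarantees on function entry.

```c
#include <assert.h>
#include <stddef.h>

/* Wraps the 11-byte sequence from the question. The string pointer goes
   in rdi ("D" constraint) and the result comes back in ecx ("c"). */
static unsigned strlen_scasb(const char *s) {
    unsigned len;
    assert(s != NULL);
    __asm__ volatile (
        "xor %%eax, %%eax\n\t"   /* al = 0, the byte to search for      */
        "or $-1, %%ecx\n\t"      /* ecx = 0xFFFFFFFF, rcx upper half 0  */
        "repne scasb\n\t"        /* scan until [rdi] == al              */
        "not %%ecx\n\t"          /* ecx = bytes scanned, incl. the NUL  */
        "dec %%ecx\n\t"          /* drop the terminator from the count  */
        : "=c"(len), "+D"(s)
        :
        : "eax", "cc", "memory");
    return len;
}
```

For example, `strlen_scasb("hello")` returns 5 and `strlen_scasb("")` returns 0. Because `or ecx,-1` zero-extends into rcx, the scan is bounded by 2^32 - 1 bytes, which matches the length limit stated in the question.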

Detect if the processor is 64-bit under 32 bit OS

纵然是瞬间, posted on 2019-12-10 13:53:14
Question: Normally, the x86-64 architecture offers compatibility with x86. A 32-bit Windows (or other OS) can run on an x86-64 processor. (Correct me if I am wrong.) I would like to know whether it is possible (in C++) for a 32-bit Windows to detect that the underlying processor is 64-bit. For example, if 32-bit Windows 7 is running on a Core i5, we should be able to tell that the processor is 64-bit (even though 32-bit Windows 7 is running). You may question the requirement that even if processor is 64 bit and OS is 32 bit,
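One way to approach this is to query CPUID directly, since the instruction is available to 32-bit code as well. The sketch below assumes GCC or Clang and their `<cpuid.h>` helper `__get_cpuid`; on MSVC the `__cpuid` intrinsic from `<intrin.h>` would be used instead. The function name `cpu_supports_long_mode` is illustrative. It reads extended leaf 0x80000001 and tests EDX bit 29, the long-mode (LM) flag.

```c
#include <cpuid.h>

/* Returns 1 if the processor reports long mode (64-bit) support,
   0 if it does not or if the extended CPUID leaf is unavailable. */
static int cpu_supports_long_mode(void) {
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((edx >> 29) & 1u);  /* EDX bit 29 = LM */
}
```

The result describes the processor, not the operating system, so the same function gives the same answer whether it is compiled as a 32-bit or a 64-bit program.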

Why does the ia32/x64 opcode map document 0x66 and 0xF2 as a double mandatory prefix for opcode 0x0F38F1 (CRC32)?

给你一囗甜甜゛, posted on 2019-12-10 13:49:23
Question: In the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2C (Order Number 326018-045US, January 2013), row F of Table A-4 in Appendix A.3 is unique in that it has a prefix sub-row for a combination of two prefixes: 0x66 and 0xF2. The only opcode for which this is relevant is 0x0F38F1 (CRC32). For the prefix 0xF2 alone, the source operand is Ey (memory or general purpose register; 32 bit or 64 bit), and for the prefixes 0x66 and 0xF2 together, the source operand is Ew (memory or

X86 64-bits Assembly Linux 'Hello World' linking issue

血红的双手。, posted on 2019-12-10 13:38:20
Question: I am attempting to follow up on this thread, which unfortunately does not quite solve my problem. The code I am trying to run is as follows:

    ; File hello.asm
            section .data
    msg:    db      "Hello World!",0x0a,0

            section .text
            global main
            extern printf
    main:
            push    rbp
            mov     rbp, rsp
            lea     rdi, [msg]      ; parameter 1 for printf
            xor     eax, eax        ; 0 floating point parameter
            call    printf
            xor     eax, eax        ; returns 0
            pop     rbp
            ret

My system is Debian stretch: $ uname -a Linux <host> 4.8.0-1-amd64 #1 SMP Debian 4.8.7-1 (2016-11

Can “mov eax, 0x1” always be used instead of “mov rax, 0x1”?

喜夏-厌秋, posted on 2019-12-10 13:36:01
Question: When assembling this code with nasm:

    BITS 64
    mov eax, 0x1
    mov rax, 0x1

I get this output: b8 01 00 00 00 b8 01 00 00 00, which is the opcode for mov eax, 0x1 repeated twice. Does this mean that mov rax, 0x1 can always be replaced by mov eax, 0x1, or is it just in this case? If this is correct, wouldn't it be better to use it than:

    xor rax, rax
    inc rax

as that becomes 6 bytes when assembled, while mov eax, 0x1 is only 5 bytes?

Answer 1: Always. Most (if not all) 32-bit MOVs and ALU operations clear bits
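The zero-extension that makes the two encodings equivalent can be observed from C with a short sketch, assuming GCC or Clang extended inline assembly on x86-64. The helper names `write_low32` and `write_low16` are hypothetical. The `%k0` and `%w0` operand modifiers select the 32-bit and 16-bit names of whichever register the compiler picks.

```c
#include <stdint.h>

/* Writes 1 to the low 32 bits of a 64-bit register. In 64-bit mode a
   32-bit destination write zeroes bits 63:32 of the full register. */
static uint64_t write_low32(uint64_t r) {
    __asm__ ("movl $1, %k0" : "+r"(r));
    return r;
}

/* Writes 1 to the low 16 bits only. 16-bit (and 8-bit) writes leave
   the remaining upper bits of the register unchanged. */
static uint64_t write_low16(uint64_t r) {
    __asm__ ("movw $1, %w0" : "+r"(r));
    return r;
}
```

Starting from `UINT64_MAX`, `write_low32` yields 1, while `write_low16` yields `0xFFFFFFFFFFFF0001`. That contrast is why the 32-bit form is a safe replacement for the 64-bit one whenever the immediate fits in an unsigned 32-bit value.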

Is it possible to decode x86-64 instructions in reverse?

不想你离开。, posted on 2019-12-10 13:24:43
Question: I was wondering if it is possible to decode x86-64 instructions in reverse? I need this for a runtime disassembler. Users can point to a random location in memory and should then be able to scroll upwards and see which instructions came before the specified address. I want to do this by reverse decoding. Answer 1: The basic format of x86 instructions is like this. Modern CPUs can support VEX and EVEX prefixes. In x86-64 there might also be the REX prefix at the beginning. Looking at the format it can

command to compile c files with .a files

假如想象, posted on 2019-12-10 12:45:13
Question: I have several .c files and one .a object file. What gcc command should I use to compile them into one exe file? If we use a makefile, what will it look like? Answer 1: The .a file is a library, already compiled. You compile your .c file to a .o, then you use the linker to link your .o with the .a to produce an executable. Answer 2: For simple cases you can probably do this: gcc -o maybe.exe something.c useful.a Makefiles for non-trivial projects usually first invoke gcc to compile each .c file to a

Why doesn't time() from time.h have a syscall to sys_time?

不羁的心, posted on 2019-12-10 12:45:07
Question: I wrote a very simple program which calls time() to illustrate the use of strace, but I'm having a problem: the time() call doesn't seem to actually produce a syscall! I ended up stepping into the time() function in GDB and now I'm more confused than ever. From the disassembly of the time() function:

    0x7ffff7ffad90 <time>:    push rbp
    0x7ffff7ffad91 <time+1>:  test rdi,rdi
    0x7ffff7ffad94 <time+4>:  mov rax,QWORD PTR [rip+0xffffffffffffd30d] # 0x7ffff7ff80a8
    0x7ffff7ffad9b <time+11>: mov rbp,rsp
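On x86-64 Linux, the C library's time() is normally served from the vDSO, a small kernel-provided code page mapped into every process, so the call reads the time from shared memory without entering the kernel. The sketch below is one way to get a traceable call for an strace demonstration, assuming Linux on x86-64 with glibc, where `SYS_time` is defined. The helper name `time_gap` is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Fetches the time twice: once through the library call (typically
   handled in user space by the vDSO) and once through an explicit
   system call, which strace can see. Returns the difference in seconds. */
static long time_gap(void) {
    time_t from_library = time(NULL);          /* usually no syscall   */
    long from_syscall = syscall(SYS_time, 0);  /* always enters kernel */
    return from_syscall - (long)from_library;
}
```

Running a program that calls `time_gap()` under strace should show a single `time(NULL)` entry, produced by the explicit `syscall` line rather than by the library call. The returned difference is normally 0, or 1 if a second boundary falls between the two reads.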

MESI cache protocol

坚强是说给别人听的谎言, posted on 2019-12-10 12:44:45
Question: I was reading about the MESI snooping cache coherence protocol, which I guess is the protocol used in modern multicore x86 processors (please correct me if I'm wrong). At one point the article says: A cache that holds a line in the Modified state must snoop (intercept) all attempted reads (from all of the other caches in the system) of the corresponding main memory location and insert the data that it holds. This is typically done by forcing the read to back off (i.e. retry
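The rule quoted above can be pictured as a small state-transition function. This is a textbook-style sketch of plain MESI, not a description of how any particular x86 processor implements coherence. The type and function names (`mesi_state`, `snoop_result`, `snoop_remote_read`) are invented for illustration.

```c
#include <assert.h>

typedef enum {
    MESI_INVALID,
    MESI_SHARED,
    MESI_EXCLUSIVE,
    MESI_MODIFIED
} mesi_state;

typedef struct {
    mesi_state next;       /* state of our copy after the snoop        */
    int must_supply_data;  /* 1 if we hold the only up-to-date copy    */
} snoop_result;

/* What a cache does with its copy of a line when it snoops a read of
   that line issued by another cache. */
static snoop_result snoop_remote_read(mesi_state current) {
    snoop_result r = { MESI_INVALID, 0 };
    assert(current >= MESI_INVALID && current <= MESI_MODIFIED);

    switch (current) {
    case MESI_MODIFIED:
        /* Memory is stale: intercept the read, provide (or write back)
           the dirty data, and keep a shared copy. */
        r.next = MESI_SHARED;
        r.must_supply_data = 1;
        break;
    case MESI_EXCLUSIVE:
    case MESI_SHARED:
        /* Memory is up to date; the line is simply shared from now on. */
        r.next = MESI_SHARED;
        break;
    case MESI_INVALID:
        /* We hold no copy, so there is nothing to do. */
        break;
    }
    return r;
}
```

Only the Modified case sets `must_supply_data`, which corresponds to the "snoop and insert the data" requirement in the quoted passage.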