x86-64

Is it allowed to access memory that spans the zero boundary in x86?

一笑奈何 submitted on 2019-12-01 04:11:07
Question: Is it allowed for a single access to span the boundary between 0 and 0xFFFFFF... in x86 [1]? For example, given that eax (rax in 64-bit) is zero, is the following access allowed:
    mov ebx, DWORD [eax - 2]
I'm interested in both x86 (32-bit) and x86-64, in case the answers are different.
[1] Of course, given that the region is mapped in your process, etc.
Answer 1: I just tested with this EFI program. (And it worked, as expected.) If you want to reproduce this result, you would need an implementation of
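As a rough illustration of the address arithmetic in the question (not of whether the hardware permits the access), the following C sketch models 32-bit effective-address wraparound. The function names are hypothetical and do not come from the original post; the sketch only shows which byte addresses a 4-byte access at eax - 2 would touch when eax is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Address of byte `i` of an access that starts at base + disp, computed
   modulo 2^32 like a 32-bit effective address. Hypothetical helper. */
uint32_t wrapped_byte_addr32(uint32_t base, int32_t disp, uint32_t i)
{
    /* Unsigned arithmetic in C wraps modulo 2^32, which is the point here. */
    return base + (uint32_t)disp + i;
}

/* Returns 1 if a `size`-byte access starting at base + disp runs past
   0xFFFFFFFF and continues at address 0, otherwise 0. Assumes size is
   far smaller than 2^32. Hypothetical helper. */
int spans_zero_boundary32(uint32_t base, int32_t disp, uint32_t size)
{
    assert(size > 0);
    uint32_t first = wrapped_byte_addr32(base, disp, 0);
    uint32_t last  = wrapped_byte_addr32(base, disp, size - 1);
    return last < first;   /* the last byte wrapped around below the first */
}
```

With base 0 and displacement -2, the four bytes land at 0xFFFFFFFE, 0xFFFFFFFF, 0x00000000 and 0x00000001, so spans_zero_boundary32(0, -2, 4) reports a wrap.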

Intel x86-64 XSAVE/XRSTOR

妖精的绣舞 submitted on 2019-12-01 03:57:43
I'm a CS student writing in Intel x86-64 assembly, compiling with nasm, and running on a Core i7 processor with Ubuntu 12.04 as the guest OS. Does anyone have an example of how to use XSAVE and XRSTOR? I've read the section on XSAVE in the Intel Architectures Software Developer's Manual several times. I tried to implement xsave in C++ and then disassemble the binary to get an understanding of what it's doing. And of course I've scoured the Internet for examples. Any suggestions would be much appreciated. Finally, an answer to this question. Thanks to user harold, who helped answer the question for

Efficient way to set first N or last N bits of __m256i to 1, the rest to 0

本小妞迷上赌 submitted on 2019-12-01 03:53:38
How can I efficiently use AVX2 to set the first N bits or the last N bits of an __m256i to 1, and the rest to 0? These are 2 separate operations for the tail and the head of a bit range, when the range may start and end in the middle of an __m256i value. The part of the range occupying full __m256i values is processed with all-0 or all-1 masks. The AVX2 shift instructions vpsllvd and vpsrlvd have the nice property that shift counts greater than or equal to 32 lead to zero integers within the ymm register. In other words: the shift counts are not masked, in contrast to the shift counts for the x86 scalar shift
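The property described above can be modelled in portable C, one 32-bit lane at a time. This is a scalar sketch of the idea rather than AVX2 code: srlv_lane and sllv_lane are hypothetical helpers that imitate a single lane of vpsrlvd and vpsllvd, and clamping the shift count at zero (a saturating subtraction) is an assumption made here so that fully covered lanes stay all ones.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 32-bit lane of vpsrlvd: counts >= 32 give 0, because the
   count is not masked. (In plain C, shifting by >= 32 would be undefined.) */
static uint32_t srlv_lane(uint32_t x, uint32_t count)
{
    return count >= 32 ? 0u : x >> count;
}

/* Model of one 32-bit lane of vpsllvd, with the same unmasked-count rule. */
static uint32_t sllv_lane(uint32_t x, uint32_t count)
{
    return count >= 32 ? 0u : x << count;
}

/* mask[0] holds bits 0..31, mask[7] holds bits 224..255.
   Sets the lowest n bits of the 256-bit value to 1, the rest to 0. */
void mask_low_bits(uint32_t mask[8], unsigned n)
{
    assert(n <= 256);
    for (unsigned lane = 0; lane < 8; ++lane) {
        unsigned top = 32u * (lane + 1);           /* first bit above this lane */
        unsigned count = top > n ? top - n : 0u;   /* saturating subtraction */
        mask[lane] = srlv_lane(0xFFFFFFFFu, count);
    }
}

/* Sets the highest n bits of the 256-bit value to 1, the rest to 0. */
void mask_high_bits(uint32_t mask[8], unsigned n)
{
    assert(n <= 256);
    unsigned start = 256u - n;                     /* lowest bit that is set */
    for (unsigned lane = 0; lane < 8; ++lane) {
        unsigned bottom = 32u * lane;              /* lowest bit of this lane */
        unsigned count = start > bottom ? start - bottom : 0u;
        mask[lane] = sllv_lane(0xFFFFFFFFu, count);
    }
}
```

For n = 40, mask_low_bits fills lane 0 with ones, leaves 8 bits in lane 1, and zeroes the remaining lanes; the lanes that fall entirely outside the range get a count of 32 or more and therefore become zero without any extra branch.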

Precompiled headers and compiling universal objects on OSX

青春壹個敷衍的年華 submitted on 2019-12-01 03:53:12
We are using precompiled headers with GCC for our project and build them like this:
    gcc $(CFLAGS) precompiledcommonlib.h
Now I'm building the project on OSX 10.6 and trying to use the nifty feature of building for all architectures at the same time, like this:
    gcc $(CFLAGS) -c -arch i386 -arch x86_64 commonlib.c
However, it seems this does not work for the precompiled headers:
    gcc $(CFLAGS) -arch i386 -arch x86_64 precompiledcommonlib.h
    Undefined symbols for architecture i386: "_main", referenced from: start in crt1.10.6.o
    ld: symbol(s) not found for architecture i386
    collect2: ld returned 1

Constraining r10 register in gcc inline x86_64 assembly

痞子三分冷 submitted on 2019-12-01 03:30:47
I'm having a go at writing a very lightweight libc replacement library so that I can better understand the kernel-application interface. The first task is clearly getting some system call wrappers in place. I've successfully got 1 to 3 argument wrappers working, but I'm struggling with a 4 argument variant. Here's my starting point:
    long _syscall4(long type, long a1, long a2, long a3, long a4)
    {
        long ret;
        asm ( "syscall"
              : "=a"(ret)
              : "a"(type), "D"(a1), "S"(a2), "d"(a3), "r10"(a4)
              : "c", "r11" );
        return ret;
    }
The compiler gives me the following error: error: matching constraint references
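GCC has no dedicated constraint letter for r10, so one commonly used approach is to pin the value to r10 with a local register variable and pass it with the generic "r" constraint. The sketch below assumes x86-64 Linux and GCC or Clang; the name sketch_syscall4 is hypothetical, and the clobber list ("rcx", "r11", "memory") is a choice made here, not something taken from the excerpt.

```c
/* x86-64 Linux, GCC or Clang only. Hypothetical name; illustrative sketch. */
long sketch_syscall4(long nr, long a1, long a2, long a3, long a4)
{
    long ret;
    /* Local register variable: asks the compiler to keep a4 in r10
       when it is used as an operand of the asm statement below. */
    register long r10 __asm__("r10") = a4;

    __asm__ volatile (
        "syscall"
        : "=a"(ret)                        /* result comes back in rax */
        : "a"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
        : "rcx", "r11", "memory"           /* syscall overwrites rcx and r11 */
    );
    return ret;
}
```

As a usage example, rt_sigprocmask takes its fourth argument (the signal set size) in r10: on x86-64 Linux, sketch_syscall4(14, 0, 0, 0, 8) queries nothing and changes nothing, while passing a wrong size as the fourth argument makes the kernel return a negative error code.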

Determine 32/64 bit architecture in assembly

空扰寡人 submitted on 2019-12-01 02:55:55
Question: I was reading over this question and wondered if the accepted answer might also be a way to determine the architecture. For instance, in asm, could I push a WORD onto the stack and then check SP? Compare the new SP to the old SP:
    Diff of 4 means 32 bit
    Diff of 8 means 64 bit
Am I correct in this thinking?
Answer 1: No, because the size of your stack is based on what mode you are running in (real, protected, long/64, vm86, smm, etc), not on the architecture. If your assembly is running in protected

How to build 64-bit Python on OS X 10.6 — ONLY 64 bit, no Universal nonsense

こ雲淡風輕ζ submitted on 2019-12-01 02:47:02
Question: I just want to build this on my development machine -- the binary install from Python.org is still 32 bits, and installing extensions (MySQLdb, for example) is driving me nuts with trying to figure out the proper flags for each and every extension. Clarification: I did NOT replace the system Python, I just installed the Python.org binary into its normal place at /Library/..., not /System/Library/.... Everything else seems to build 64 bit by default, and the default Python 2.6.1 was 64 bit

What is callq instruction?

↘锁芯ラ submitted on 2019-12-01 02:43:41
I have some GNU assembler code for the x86_64 architecture generated by a tool, and there are these instructions:
    movq %rsp, %rbp
    leaq str(%rip), %rdi
    callq puts
    movl $0, %eax
I cannot find actual documentation on the "callq" instruction. I have looked at http://support.amd.com/TechDocs/24594.pdf which is "AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions", but they only describe CALL near and far instructions. I have looked at the documentation for the GNU assembler, https://sourceware.org/binutils/docs/as/index.html, but could not find the section detailing the

x86-64 usage of LFENCE

北城余情 submitted on 2019-12-01 02:20:44
Question: I'm trying to understand the right way to use fences when measuring time with RDTSC/RDTSCP. Several questions on SO related to this have already been answered in detail. I have gone through a few of them. I have also gone through this really helpful article on the same topic: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf However, in another online blog, there's an example of using LFENCE instead of CPUID on x86. I was
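For context, a pattern often seen for this kind of measurement brackets RDTSC with LFENCE, so that the timestamp read is not reordered with the surrounding instructions. The C sketch below uses the _mm_lfence and __rdtsc intrinsics from <x86intrin.h> (GCC or Clang on x86). The function name tsc_fenced is hypothetical, and this is not necessarily the sequence recommended by the linked paper or by the blog mentioned in the question.

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_lfence, __rdtsc (GCC/Clang, x86 only) */

/* Reads the time-stamp counter with an LFENCE on each side.
   Hypothetical helper, a sketch under the assumptions stated above. */
static inline uint64_t tsc_fenced(void)
{
    _mm_lfence();                /* wait for earlier instructions to complete */
    uint64_t t = __rdtsc();      /* read the time-stamp counter */
    _mm_lfence();                /* keep later instructions from starting early */
    return t;
}
```

A measurement would then take one reading before and one after the code under test and subtract the two values; the result is in TSC ticks, not in core clock cycles or nanoseconds.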

NEON, SSE and interleaving loads vs shuffles

吃可爱长大的小学妹 submitted on 2019-12-01 01:48:26
I'm trying to understand the comment made by "Iwillnotexist Idonotexist" at "SIMD optimization of cvtColor using ARM NEON intrinsics": "... why you don't use the ARM NEON intrinsics that map to the VLD3 instruction? That spares you all of the shuffling, both simplifying and speeding up the code. The Intel SSE implementation requires shuffles because it lacks 2/3/4-way deinterleaving load instructions, but you shouldn't pass on them when they are available." The trouble I am having is that the solution offers code that is non-interleaved, and it performs fused multiplies on floating-point values. I'm trying