cpu-architecture

Regarding instruction ordering in executions of cache-miss loads before cache-hit stores on x86

天涯浪子 posted on 2019-12-01 23:29:42
Given the small program shown below (handcrafted to look the same from a sequential consistency / TSO perspective), and assuming it is run by a superscalar out-of-order x86 CPU:
    Load A          <-- A is in main memory
    Load B          <-- B is in L2
    Store C, 123    <-- C is in L1
I have a few questions: Assuming a big enough instruction window, will the three instructions be fetched, decoded, and executed at the same time? I assume not, as that would break execution in program order. The first load is going to take longer to fetch A from memory than the second takes to fetch B. Will the latter have to wait until the first is fully executed? Will

Why can't segmentation be completely disabled?

旧时模样 posted on 2019-12-01 22:39:26
According to the AMD manual, segmentation cannot be disabled. My question is: why is that impossible? Another question: the manual says that 64-bit mode disables it. What does that mean? Is segmentation completely disabled in 64-bit mode? AMD Manual: https://s7.postimg.cc/hk15o6swr/Capture.png
Introduction: In 64-bit mode, whenever a non-null segment selector is loaded into any of the segment registers, the processor automatically loads the corresponding segment descriptor into the hidden part of the segment register, just like in protected/compatibility mode. However, the segment descriptors selected by the DS,

What happens when you use a memory override prefix but all the operands are registers?

社会主义新天地 posted on 2019-12-01 22:10:17
Let's say you code mov eax, ebx or add eax, ebx, the default address size is 32-bit, and you add a 67h override. How does the processor handle that situation?
The Intel Software Developer's Manual, volume 2, section 2.1, details the behavior of each instruction prefix. It says use of the address-size prefix (67h) with an instruction that doesn't have a memory operand is reserved and may cause unpredictable behavior. The operand-size prefix (66h) may be used to switch between 16- and 32-bit operand sizes and also
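
To make the situation in the question concrete, here is a minimal sketch that builds the byte sequence for mov eax, ebx with and without the 67h prefix. The class and method names (PrefixDemo, modRm, hex) are made up for illustration; the encoding assumed is the MOV r/m32, r32 form (opcode 89h followed by a ModRM byte).

```java
final class PrefixDemo {
    static final int ADDRESS_SIZE_PREFIX = 0x67;

    /** Packs the three ModRM fields (mod: 2 bits, reg: 3 bits, rm: 3 bits) into one byte. */
    static int modRm(int mod, int reg, int rm) {
        return ((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7);
    }

    /** Formats byte values as space-separated two-digit hex. */
    static String hex(int... bytes) {
        StringBuilder sb = new StringBuilder();
        for (int b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        final int eax = 0;                 // register number of EAX in ModRM
        final int ebx = 3;                 // register number of EBX in ModRM
        final int movRm32R32 = 0x89;       // assumed form: MOV r/m32, r32
        final int registerDirect = 0b11;   // mod = 11: r/m names a register, not memory

        int modrm = modRm(registerDirect, ebx, eax);
        System.out.println("mov eax, ebx       : " + hex(movRm32R32, modrm));
        System.out.println("67h + mov eax, ebx : " + hex(ADDRESS_SIZE_PREFIX, movRm32R32, modrm));
    }
}
```

With mod = 11 the ModRM byte selects a register, so there is no effective address for the prefix to resize. The sketch only shows the bytes involved; it makes no claim about how a particular CPU treats the prefixed form.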

Is it possible to perform some computations within the RAM?

蓝咒 posted on 2019-12-01 21:17:38
Theoretically, is there any way to perform computations within the RAM itself, using memory-related instructions such as move, clflush, or whatever? For example, an XOR between two adjacent rows? With my limited knowledge of RAM and assembly, I can't think of any such possibility.
No, any computation is done in the CPU (or GPU, or other system devices that can load/store to RAM). Even the Turing-complete mov stuff that @PaulR linked in a comment is just using the CPU's address-generation hardware with data in registers to do calculations. The memory still just sees 64B burst-loads and

Why do 32-bit applications work on 64-bit x86 CPUs?

生来就可爱ヽ(ⅴ<●) posted on 2019-12-01 20:53:45
32-bit application executables contain machine code for a 32-bit CPU, but the assembly and internal architecture (number of registers, register width, calling convention) of 32-bit and 64-bit Intel CPUs differ, so how can a 32-bit exe run on a 64-bit machine? Wikipedia's x86-64 article says: x86-64 is fully backwards compatible with 16-bit and 32-bit x86 code. Because the full x86 16-bit and 32-bit instruction sets remain implemented in hardware without any intervening emulation, existing x86 executables run with no compatibility or performance penalties, whereas existing applications that

How do data caches route the object in this example?

时间秒杀一切 posted on 2019-12-01 20:04:17
Consider the diagrammed data cache architecture. (ASCII art follows.)
--------------------------------------
| CPU core A | CPU core B |          |
|------------|------------| Devices  |
| Cache A1   | Cache B1   | with DMA |
|-------------------------|          |
| Cache 2                 |          |
|------------------------------------|
| RAM                                |
--------------------------------------
Suppose that an object is shadowed on a dirty line of Cache A1, an older version of the same object is shadowed on a clean line of Cache 2, and the newest version of the same object has recently been written to RAM via DMA. Diagram: -------------------------

How does MIPS I forward from EX to ID for branches without stalling?

有些话、适合烂在心里 posted on 2019-12-01 19:51:56
    addiu $6,$6,5
    bltz  $6,$L5
    nop
    ...
$L5:
Is that safe on MIPS I? If so, how? Original MIPS I is a classic 5-stage RISC IF ID EX MEM WB design that hides all of its branch latency with a single branch-delay slot by checking branch conditions early, in the ID stage. (Which is why it's limited to equal/not-equal, or sign-bit checks like lt or ge zero, not lt between two registers that would need carry-propagation through an adder.) Doesn't this mean that branches need their input ready a cycle earlier than ALU instructions? The bltz enters the ID stage in the same cycle that addiu enters EX. MIPS I
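
The timing claim in the question (bltz is in ID in the same cycle that addiu is in EX) can be checked with a small table. The following is a sketch only, assuming an idealized 5-stage pipeline that fetches one instruction per cycle and never stalls; PipelineTimeline and stageAt are hypothetical names, and nothing here models forwarding hardware.

```java
final class PipelineTimeline {
    static final String[] STAGES = {"IF", "ID", "EX", "MEM", "WB"};

    /**
     * Stage occupied in the given cycle by the instruction fetched in fetchCycle,
     * assuming no stalls; "--" when the instruction is not in the pipeline.
     */
    static String stageAt(int fetchCycle, int cycle) {
        int index = cycle - fetchCycle;
        return (index >= 0 && index < STAGES.length) ? STAGES[index] : "--";
    }

    public static void main(String[] args) {
        String[] program = {"addiu $6,$6,5", "bltz  $6,$L5", "nop"};
        int lastCycle = (program.length - 1) + (STAGES.length - 1);
        for (int i = 0; i < program.length; i++) {
            // Instruction i is fetched in cycle i (one fetch per cycle).
            StringBuilder row = new StringBuilder(String.format("%-14s", program[i]));
            for (int cycle = 0; cycle <= lastCycle; cycle++) {
                row.append(String.format(" %-3s", stageAt(i, cycle)));
            }
            System.out.println(row);
        }
    }
}
```

In the printed table, the column for cycle 2 shows addiu in EX and bltz in ID, which is exactly the overlap the question is asking about.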

Unable to disable Hardware prefetcher in Core i7

☆樱花仙子☆ posted on 2019-12-01 18:01:06
I am getting an error while trying to disable the hardware prefetcher on my Core i7 system. I am following the method from the linked question "How do I programmatically disable hardware prefetching?" On my system:
    grep -i msr /boot/config-$(uname -r)
    CONFIG_X86_DEBUGCTLMSR=y
    CONFIG_X86_MSR=y
    CONFIG_SCSI_ARCMSR=m
Here is my error message:
    root@ ./rdmsr 0x1a0
    850089
    [root@ ./wrmsr -p 0 0x1a0 0x850289    (to disable hardware prefetcher in Core i7)
    wrmsr:pwrite: Input/output error
I get the same error when disabling the adjacent cache line prefetcher. Any idea how to resolve this problem? Thanks in advance. MSR
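
It can help to see exactly which bit the attempted write changes. The sketch below compares the value read by rdmsr in the post (0x850089) with the value passed to wrmsr (0x850289); MsrBitDiff and changedBits are hypothetical names used only for this illustration.

```java
final class MsrBitDiff {
    /** Returns a mask of the bits that differ between two register values. */
    static long changedBits(long before, long after) {
        return before ^ after;
    }

    public static void main(String[] args) {
        long before = 0x850089L; // value reported by rdmsr 0x1a0 in the post
        long after = 0x850289L;  // value the post tries to write with wrmsr
        long mask = changedBits(before, after);
        System.out.printf("changed mask = 0x%x%n", mask);
        for (int bit = 0; bit < 64; bit++) {
            if (((mask >>> bit) & 1L) != 0) {
                boolean set = ((after >>> bit) & 1L) != 0;
                System.out.printf("bit %d is being %s%n", bit, set ? "set" : "cleared");
            }
        }
    }
}
```

The two values differ only in bit 9. Whether that bit is writable, or controls the prefetcher at all, on a given microarchitecture has to be checked against the MSR listing for that CPU; the sketch does not answer that.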

Is it possible to detect processor architecture in Java? [duplicate]

坚强是说给别人听的谎言 posted on 2019-12-01 16:22:26
Is it possible to detect the processor architecture in Java, such as x86 or Sun SPARC? If so, how would I go about doing it?
System.getProperty("os.arch"); on my PC returns amd64.
CloudyMarble
You can try System.getenv() to read environment variables; use the PROCESSOR_ARCHITECTURE key to get the CPU architecture: System.out.println(System.getenv("PROCESSOR_ARCHITECTURE")); or in case of 64 bit: System.out.println(System.getenv("PROCESSOR_ARCHITEW6432")); The other way would be to use the "os.arch" system
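
The approaches mentioned in the answers can be combined into one small probe. This is a minimal sketch; ArchProbe and envOrUnset are hypothetical names, and the two environment variables are Windows-specific, so on other systems they are typically unset.

```java
final class ArchProbe {
    /** Returns the value of an environment variable, or a placeholder when it is unset. */
    static String envOrUnset(String name) {
        String value = System.getenv(name);
        return value != null ? value : "(unset)";
    }

    public static void main(String[] args) {
        // Standard system property, available on every platform.
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        // Windows environment variables named in the answer above.
        System.out.println("PROCESSOR_ARCHITECTURE = " + envOrUnset("PROCESSOR_ARCHITECTURE"));
        System.out.println("PROCESSOR_ARCHITEW6432 = " + envOrUnset("PROCESSOR_ARCHITEW6432"));
    }
}
```

Note that os.arch commonly reflects the architecture of the running JVM rather than of the physical CPU, so a 32-bit JVM on a 64-bit machine may report a 32-bit value.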

Why did Intel change the static branch prediction mechanism over these years?

馋奶兔 posted on 2019-12-01 15:47:54
From here I know Intel has implemented several static branch prediction mechanisms over the years:
    80486 era: always not taken
    Pentium 4 era: backwards taken / forwards not taken
    Newer CPUs like Ivy Bridge and Haswell: increasingly intangible; see Matt G's experiment here.
Intel doesn't seem to want to talk about it any more, because the latest material I found in Intel's documentation was written about ten years ago. I know static branch prediction is (far?) less important than dynamic prediction, but in quite a few situations the CPU will be completely lost, and programmers (with the compiler) are usually the best
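
The two named policies are simple enough to write down. The following is an illustrative sketch, not a model of any real Intel predictor: StaticPredictor and its methods are hypothetical names, and a branch is treated as "backward" when its displacement is negative.

```java
final class StaticPredictor {
    /** Always-not-taken: every branch is predicted to fall through. */
    static boolean alwaysNotTaken(long displacement) {
        return false;
    }

    /** BTFNT: backward branches (negative displacement) are predicted taken, forward ones not taken. */
    static boolean btfnt(long displacement) {
        return displacement < 0;
    }

    public static void main(String[] args) {
        long[] displacements = {-32, 16}; // e.g. a loop back-edge and a forward skip
        for (long d : displacements) {
            System.out.printf("displacement %+d: always-not-taken=%b, BTFNT=%b%n",
                    d, alwaysNotTaken(d), btfnt(d));
        }
    }
}
```

BTFNT favors loop back-edges, which are usually taken, while always-not-taken mispredicts them on every iteration but the last.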