cpu-architecture

Was there a P4 model with double-pumped 64-bit operations?

不想你离开。 Posted on 2019-12-01 06:54:18
I recall that one of the interesting features of the initial P4 micro-architecture was its double-pumped ALU. I think Intel called it something like the Rapid Execution Unit, but basically it meant that each execution unit in the ALU was effectively running at twice the frequency and could handle two simple ALU operations in a single cycle, even if they were dependent. This feature disappeared at some point (before or at the same time as the P4), but was there ever a 64-bit P4 with a double-pumped ALU? The 64-bit variants of the P4 came out in 2004, about four years after the initial 32

The inner workings of Spectre (v2)

99封情书 Posted on 2019-12-01 06:36:18
I have done some reading about Spectre v2, and obviously you get the non-technical explanations. Peter Cordes has a more in-depth explanation, but it doesn't fully address a few details. Note: I have never performed a Spectre v2 attack, so I do not have hands-on experience. I have only read up about the theory. My understanding of Spectre v2 is that you make an indirect branch mispredict, for instance if (input < data.size). If the Indirect Target Array (which I'm not too sure of the details of -- i.e. why it is separate from the BTB structure) -- which is rechecked at decode for RIPs of

x86-64 usage of LFENCE

拈花ヽ惹草 Posted on 2019-12-01 05:30:19
I'm trying to understand the right way to use fences when measuring time with RDTSC/RDTSCP. Several questions on SO related to this have already been answered elaborately. I have gone through a few of them. I have also gone through this really helpful article on the same topic: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf However, in another online blog, there's an example of using LFENCE instead of CPUID on x86. I was wondering how LFENCE prevents earlier stores from contaminating the RDTSC measurements. E.g. <Instr A>
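
The LFENCE-around-RDTSC pattern mentioned in this excerpt can be sketched as follows. This is only an illustration, not the code from the blog the asker refers to: it assumes GCC or Clang on x86-64 with the `<x86intrin.h>` intrinsics `_mm_lfence()` and `__rdtsc()`, and the name `fenced_rdtsc` is made up for this example. As far as Intel's documentation describes it, LFENCE waits for earlier instructions to complete locally, but it may complete before earlier stores have become globally visible, which is the gap the question is about.

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_lfence, __rdtsc (GCC/Clang, x86 only) */

/* Hypothetical helper: read the time-stamp counter between two LFENCEs.
 * The first fence keeps RDTSC from executing before earlier instructions
 * have completed locally; the second keeps later instructions from
 * starting before the counter has been read. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();
    uint64_t tsc = __rdtsc();
    _mm_lfence();
    return tsc;
}
```

A measurement would call `fenced_rdtsc()` before and after the region of interest and subtract the two values; the result is in TSC ticks, which are not necessarily core clock cycles.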

Was there a P4 model with double-pumped 64-bit operations?

只愿长相守 Posted on 2019-12-01 05:27:20
Question: I recall that one of the interesting features of the initial P4 micro-architecture was its double-pumped ALU. I think Intel called it something like the Rapid Execution Unit, but basically it meant that each execution unit in the ALU was effectively running at twice the frequency and could handle two simple ALU operations in a single cycle, even if they were dependent. This feature disappeared at some point (before or at the same time as the P4), but was there ever a 64-bit P4 with a

How is load->store reordering possible with in-order commit?

孤街醉人 Posted on 2019-12-01 05:19:22
ARM allows reordering loads with subsequent stores, so that the following pseudocode:

    // CPU 0      | // CPU 1
    temp0 = x;    | temp1 = y;
    y = 1;        | x = 1;

can result in temp0 == temp1 == 1 (and this is observable in practice as well). I'm having trouble understanding how this occurs; it seems like in-order commit would prevent it (which, as I understand it, is present in pretty much all OOO processors). My reasoning goes "the load must have its value before it commits, it commits before the store, and the store's value can't become visible to other processors until it commits." I'm guessing

Right way to detect cpu architecture?

一曲冷凌霜 Posted on 2019-12-01 04:26:45
Question: I'm attempting to detect the right CPU architecture for installing either an x86 msi or an x64 msi file. If I'm right, for the msi I need the OS CPU architecture. I'm not totally sure if my way is right because I can't test it. What do you think?

    private static string GetOSArchitecture()
    {
        string arch = System.Environment.GetEnvironmentVariable("PROCESSOR_ARCHITECTURE");
        string archWOW = System.Environment.GetEnvironmentVariable("PROCESSOR_ARCHITEW6432");
        if(archWOW != null && archWOW != "" &&
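
The snippet above is cut off, so the following is not the asker's code. It is a minimal C sketch of the selection rule the two environment variables support, assuming the documented WOW64 behavior: `PROCESSOR_ARCHITEW6432` is set only for a 32-bit process running on 64-bit Windows and then holds the native architecture. The function names `native_arch` and `wants_x64_msi` are hypothetical, and the values are passed in as parameters (in a real program they would come from `getenv`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pick the OS architecture from the two variables.
 * A non-empty PROCESSOR_ARCHITEW6432 value wins, because it is only
 * present when a 32-bit process runs under WOW64. */
static inline const char *native_arch(const char *arch, const char *arch_wow)
{
    if (arch_wow != NULL && arch_wow[0] != '\0')
        return arch_wow;
    if (arch != NULL && arch[0] != '\0')
        return arch;
    return "unknown";
}

/* Hypothetical helper: choose the x64 installer only for AMD64. */
static inline int wants_x64_msi(const char *native)
{
    return strcmp(native, "AMD64") == 0;
}
```

For example, a 32-bit process on 64-bit Windows would see `x86` and `AMD64` in the two variables and select the x64 installer, while a 32-bit OS leaves the second variable unset.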

Avoid stalling pipeline by calculating conditional early

不打扰是莪最后的温柔 Posted on 2019-12-01 04:18:49
When talking about the performance of ifs, we usually talk about how mispredictions can stall the pipeline. The recommended solutions I see are:

- Trust the branch predictor for conditions that usually have one result; or
- Avoid branching with a little bit of bit-magic if reasonably possible; or
- Conditional moves where possible.

What I couldn't find was whether or not we can calculate the condition early to help where possible. So, instead of:

    ... work
    if (a > b) {
        ... more work
    }

Do something like this:

    bool aGreaterThanB = a > b;
    ... work
    if (aGreaterThanB) {
        ... more work
    }

Could something
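
The two shapes being compared can be written out as complete C functions. This is a sketch only: `do_work` and the `+ 10` step are placeholders standing in for "... work" and "... more work", and nothing here shows that one version is faster. Both functions are semantically equivalent, so an optimizing compiler is free to schedule the comparison at the same point in either of them.

```c
#include <stdbool.h>

/* Placeholder for "... work": independent of a and b. */
static inline int do_work(int x)
{
    return x * 3 + 1;
}

/* Condition evaluated after the independent work. */
static inline int late_condition(int a, int b, int x)
{
    int r = do_work(x);
    if (a > b)
        r += 10;            /* placeholder for "... more work" */
    return r;
}

/* Condition evaluated first and kept in a local variable. */
static inline int early_condition(int a, int b, int x)
{
    bool a_greater_than_b = a > b;
    int r = do_work(x);
    if (a_greater_than_b)
        r += 10;            /* placeholder for "... more work" */
    return r;
}
```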

Is it allowed to access memory that spans the zero boundary in x86?

一笑奈何 Posted on 2019-12-01 04:11:07
Question: Is it allowed for a single access to span the boundary between 0 and 0xFFFFFF... in x86 [1]? For example, given that eax (rax in 64-bit) is zero, is the following access allowed: mov ebx, DWORD [eax - 2] I'm interested in both x86 (32-bit) and x86-64 in case the answers are different.
[1] Of course, given that the region is mapped in your process, etc.
Answer 1: I just tested with this EFI program. (And it worked, as expected.) If you want to reproduce this result, you would need an implementation of
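
To make the address arithmetic in the question concrete, here is a small C sketch for the 32-bit case. It only models the wraparound of the effective address and says nothing about whether the hardware or the OS permits the access. The function names are hypothetical.

```c
#include <stdint.h>

/* Effective address of [base + disp] with a 32-bit address size:
 * the addition wraps modulo 2^32. */
static inline uint32_t effective_address32(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* Nonzero if an access of `size` bytes starting at `addr` runs past
 * 0xFFFFFFFF and continues at address 0. */
static inline int wraps_past_zero32(uint32_t addr, uint32_t size)
{
    return size > 0 && (uint32_t)(addr + (size - 1)) < addr;
}
```

With eax equal to zero, `[eax - 2]` gives the effective address 0xFFFFFFFE, so a DWORD access covers 0xFFFFFFFE, 0xFFFFFFFF, 0x00000000 and 0x00000001.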

How does mtune actually work?

孤街浪徒 Posted on 2019-12-01 03:21:10
There's this related question: GCC: how is march different from mtune? However, the existing answers don't go much further than the GCC manual itself. At most, we get: "If you use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated." and "The -mtune=Y option tunes the generated code to run faster on Y than on other CPUs it might run on." But exactly how does GCC favor one specific architecture when building, while still being capable of running the build on other (usually older)

I get 'A 32 bit processes cannot access modules of a 64 bit process.' exception invoking Process.Start()

和自甴很熟 Posted on 2019-12-01 03:19:39
Here is the code sample:

    var startInfo = new ProcessStartInfo
    {
        Arguments = commandStr,
        FileName = @"C:\Windows\SysWOW64\logman.exe",
    };
    using (var createCounterProc = new Process { StartInfo = startInfo })
    {
        createCounterProc.Start();
        createCounterProc.WaitForExit();
    }

After running the code I get the message "A 32 bit processes cannot access modules of a 64 bit process." in MainModule (NativeErrorCode: 299). My solution is configured for AnyCPU. I've tried both the 64-bit and 32-bit versions of logman.exe (C:\Windows\SysWOW64\logman.exe and C:\Windows\System32\logman.exe), but I still get the same error.