cpu-architecture

Unable to disable Hardware prefetcher in Core i7

删除回忆录丶 submitted on 2019-12-30 19:01:31
Question: I am getting an error while trying to disable the hardware prefetcher on my Core i7 system. I am following the method from the link How do I programmatically disable hardware prefetching? On my system:

grep -i msr /boot/config-$(uname -r)
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_X86_MSR=y
CONFIG_SCSI_ARCMSR=m

Here is my error message:

./rdmsr 0x1a0
850089
./wrmsr -p 0 0x1a0 0x850289    (to disable the hardware prefetcher on Core i7)
wrmsr: pwrite: Input/output error

I am getting the same error for disabling

Is there a compiler flag to indicate lack of armv7s architecture

扶醉桌前 submitted on 2019-12-30 03:19:06
Question: With the iPhone 5 and other armv7s devices now appearing, there are compatibility problems with existing (closed-source) third-party frameworks, such as Flurry, which are built without this newer architecture. One option is to wait until they release a new build, but I was hoping there might be a compiler flag or something I could use in my Xcode project that would let the linker know not to expect the armv7s architecture from this framework and to use armv7 instead. Does anything like this exist?

How to target multiple architectures using NDK?

三世轮回 submitted on 2019-12-30 01:37:11
Question: Background: I've recently started developing with the NDK, and I've thought of a possible portability problem. The problem: since the NDK produces native code, it needs to be compiled per CPU architecture. This is a problem because the app needs to run no matter which CPU the device has. Possible solutions I've found so far: I've noticed I can modify the file "jni/Application.mk" and use: APP_ABI := armeabi armeabi-v7a x86 However, I don't
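For reference, the Application.mk route the question mentions can be sketched as a config fragment like this; the exact ABI list depends on which devices you target, and `APP_ABI := all` is the documented shorthand that builds every ABI the installed NDK supports, at the cost of APK size:

```makefile
# jni/Application.mk - the native code is compiled once per listed ABI,
# and the right .so is picked at install/run time on the device.
APP_ABI := armeabi armeabi-v7a x86

# Alternatively, build for every ABI the NDK knows about:
# APP_ABI := all
```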

Does each Floating point operation take the same time?

 ̄綄美尐妖づ submitted on 2019-12-29 09:15:11
Question: I believe integer addition and subtraction always take the same time, no matter how big the operands are. The time needed for the ALU output to stabilize may vary with the input operands, but the CPU component that consumes the ALU output will wait long enough that any integer operation is processed in the SAME number of cycles. (The cycle counts for ADD, SUB, MUL, and DIV differ, but ADD will take the same number of cycles regardless of its input operands, I think.) Is this true for floating point operations,

Does a memory barrier act both as a marker and as an instruction?

佐手、 submitted on 2019-12-29 08:15:20
Question: I have read different things about how a memory barrier works. For example, the user Johan's answer in this question says that a memory barrier is an instruction that the CPU executes, while the user Peter Cordes's comment in this question says the following about how the CPU reorders instructions: It reads faster than it can execute, so it can see a window of upcoming instructions. For details, see some of the links in the x86 tag wiki, like Agner Fog's microarch pdf, and also David Kanter

Why does a misaligned address access incur 2 or more accesses?

时光毁灭记忆、已成空白 submitted on 2019-12-28 16:14:26
Question: The usual answers for why data should be aligned are that aligned access is more efficient and that it simplifies the design of the CPU. A relevant question and its answers are here, and another source is here. But neither resolves my question. Suppose a CPU has an access granularity of 4 bytes; that means the CPU reads 4 bytes at a time. The material I listed above says that if I access misaligned data, say at address 0x1, then the CPU has to do 2 accesses (one from addresses 0x0, 0x1, 0x2 and 0x3, one from

Why isn't RDTSC a serializing instruction?

萝らか妹 submitted on 2019-12-28 12:14:07
Question: The Intel manuals for the RDTSC instruction warn that out-of-order execution can change when RDTSC is actually executed, so they recommend inserting a CPUID instruction in front of it, because CPUID serializes the instruction stream (CPUID is never executed out of order). My question is simple: if they had the ability to make instructions serializing, why didn't they make RDTSC serializing? The entire point of it appears to be getting cycle-accurate timings. Is there a situation under which

What specifically marks an x86 cache line as dirty - any write, or is an explicit change required?

青春壹個敷衍的年華 submitted on 2019-12-28 03:05:27
Question: This question is specifically aimed at modern x86-64 cache-coherent architectures - I appreciate the answer can be different on other CPUs. If I write to memory, the MESI protocol requires that the cache line first be read into the cache and then modified in the cache (the value is written to the cache line, which is then marked dirty). In older write-through microarchitectures, this would then trigger the cache line being flushed; under write-back, the flush of the cache line can be delayed for some

what is a store buffer?

╄→гoц情女王★ submitted on 2019-12-27 19:11:37
Question: Can anyone explain what a load buffer is and how it differs from invalidation queues? And also the difference between store buffers and write-combining buffers? The paper by Paul E. McKenney (http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.07.23a.pdf) explains store buffers and invalidation queues very nicely, but unfortunately doesn't talk about write-combining buffers. Answer 1: An invalidate queue is more like a store buffer, but it's part of the memory system, not the CPU.
