cpu-architecture

If I don't use fences, how long could it take a core to see another core's writes?

淺唱寂寞╮ submitted on 2019-11-27 15:09:10
Question: I have been trying to Google my question, but I honestly don't know how to state it succinctly. Suppose I have two threads in a multi-core Intel system, running on the same NUMA node. Suppose thread 1 writes to X once, then only reads it occasionally from then on. Suppose further that, among other things, thread 2 reads X continuously. If I don't use a memory fence, how long could it be between thread 1 writing X and thread 2 seeing the updated value? I understand
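
The scenario in this question can be sketched in C++ as follows. This is only an illustration under stated assumptions: the names `X`, `poll_until_written`, and `write_once_and_observe` are hypothetical, and relaxed `std::atomic` accesses stand in for "no fence" (on x86 they compile to ordinary loads and stores). The sketch shows that the reader does eventually observe the write; it does not measure or bound the delay. The C++ standard only says that atomic stores should become visible within a reasonable amount of time.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for the shared variable X from the question.
std::atomic<int> X{0};

// Thread 2: polls X with relaxed loads (no fence instruction on x86).
// Returns the first non-zero value it observes.
int poll_until_written() {
    int seen = 0;
    while ((seen = X.load(std::memory_order_relaxed)) == 0) {
        // Spin; a real program might pause or yield here.
    }
    return seen;
}

// Thread 1: writes X once with a relaxed store, then waits for the
// reader to report what it saw. `value` must be non-zero.
int write_once_and_observe(int value) {
    X.store(0, std::memory_order_relaxed);
    int observed = 0;
    std::thread reader([&observed] { observed = poll_until_written(); });
    X.store(value, std::memory_order_relaxed);  // no fence after the store
    reader.join();  // join() synchronizes, so reading `observed` is safe
    return observed;
}
```

Calling `write_once_and_observe(42)` returns once the reader has seen the new value. Replacing the relaxed store with a sequentially consistent one would change ordering guarantees relative to other accesses, which is a separate matter from how soon the store becomes visible.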

What branch misprediction does the Branch Target Buffer detect?

我是研究僧i submitted on 2019-11-27 15:00:21
Question: I am currently looking at the various parts of the CPU pipeline that can detect branch mispredictions. I have found these are:
1. Branch Target Buffer (BPU CLEAR)
2. Branch Address Calculator (BA CLEAR)
3. Jump Execution Unit (not sure of the signal name here)
I know what 2 and 3 detect, but I do not understand what misprediction is detected within the BTB. The BAC detects where the BTB has erroneously predicted a branch for a non-branch instruction, where the BTB has failed to detect a branch, or

How much memory can be accessed by a 32-bit machine?

北城以北 submitted on 2019-11-27 10:43:52
Question: What is meant by a 32-bit or 64-bit machine? It's the processor architecture: a 32-bit machine can read and write 32 bits of data at a time, and likewise a 64-bit machine 64 bits. What's the maximum memory that a 32-bit machine can access? It is 2^32 = 4 Gb (4 gigabits = 0.5 gigabytes). Does that mean 4 Gb of RAM? If I reason the same way for a 64-bit machine, then I can have 16 exbibytes of RAM. Is that possible? Are my concepts right?
Answer 1: Yes, a 32-bit architecture is limited to addressing a maximum of 4 gigabytes of memory. Depending on the operating system, this number can be cut down even further due to reserved address
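
The arithmetic behind the 4-gigabyte figure can be checked with a short C++ sketch. It assumes a flat address space in which every address names one byte, which is why 2^32 addresses give 4 gibibytes rather than 4 gigabits. The helper names `addressable_bytes` and `addressable_gib` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;  // bytes in one GiB

// Bytes reachable with `bits` address bits on a byte-addressable machine.
// Valid for bits < 64; 2^64 itself does not fit in a 64-bit integer.
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// The same quantity expressed in GiB, valid for 30 <= bits <= 64.
constexpr std::uint64_t addressable_gib(unsigned bits) {
    return std::uint64_t{1} << (bits - 30);
}

// 32 address bits: 2^32 bytes = 4 GiB.
static_assert(addressable_bytes(32) == 4 * kGiB, "2^32 bytes is 4 GiB");

// 64 address bits: 2^64 bytes = 2^34 GiB = 16 * 2^30 GiB = 16 EiB.
static_assert(addressable_gib(64) == (std::uint64_t{16} << 30),
              "2^64 bytes is 16 EiB");
```

The 64-bit figure is a property of the address width only; how much of that range a given processor or operating system actually supports is a separate question.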

Atomic operation cost

谁说我不能喝 submitted on 2019-11-27 10:21:09
Question: What is the cost of an atomic operation (compare-and-swap or atomic add/decrement)? How many cycles does it consume? Will it pause other processors on SMP or NUMA, or will it block memory accesses? Will it flush the reorder buffer in an out-of-order CPU? What effect will it have on the cache? I'm interested in modern, popular CPUs: x86, x86_64, PowerPC, SPARC, Itanium.
Answer 1 (Blaisorblade): I have looked for actual data for the past days, and found nothing. However, I did some research, which compares the cost of atomic ops with the costs of cache misses. The cost of the x86 LOCK prefix, or CAS, before
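
One way to get a rough number on a particular machine is to time the operation directly. The C++ sketch below is a minimal, single-threaded microbenchmark under stated assumptions: `Timing`, `time_atomic_adds`, and `time_plain_adds` are hypothetical names, the counter is uncontended and stays in one core's cache, and wall-clock time per iteration is only a proxy for cycle cost. It says nothing about contended or cross-socket cases.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

struct Timing {
    std::uint64_t final_count;  // counter value after the loop
    double ns_per_op;           // average wall-clock nanoseconds per increment
};

// Times n uncontended atomic increments on one thread (n must be > 0).
// On x86, compilers typically emit a LOCK-prefixed instruction here.
Timing time_atomic_adds(std::uint64_t n) {
    std::atomic<std::uint64_t> counter{0};
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        counter.fetch_add(1);  // sequentially consistent by default
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return Timing{counter.load(), ns / static_cast<double>(n)};
}

// Baseline: the same loop with a non-atomic counter. `volatile` only keeps
// the compiler from folding the loop away; it adds no atomicity.
Timing time_plain_adds(std::uint64_t n) {
    volatile std::uint64_t counter = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        counter = counter + 1;
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return Timing{counter, ns / static_cast<double>(n)};
}
```

Comparing `time_atomic_adds(n).ns_per_op` with `time_plain_adds(n).ns_per_op` for a large `n` gives a machine-specific estimate of the overhead; the result depends on the CPU, compiler flags, and timer resolution.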

What is the difference between x64 and IA-64?

青春壹個敷衍的年華 submitted on 2019-11-27 10:06:19
Question: I was on Microsoft's website and noticed two different installers, one for x64 and one for IA-64. Reference: Installing the .NET Framework 4.5, 4.5.1. My understanding is that IA-64 is a subclass of x64, so I'm curious why it would have a separate installer.
Answer 1: x64 is used as shorthand for the 64-bit extensions of the "classical" x86 architecture; almost any "normal" PC produced in recent years has a processor based on this architecture. AMD invented the AMD64 extensions; Intel was more

Cycles/cost for L1 Cache hit vs. Register on x86?

隐身守侯 submitted on 2019-11-27 09:53:24
Question: I remember assuming that an L1 cache hit is 1 cycle (i.e. identical to register access time) in my architecture class, but is that actually true on modern x86 processors? How many cycles does an L1 cache hit take? How does it compare to register access?
Answer 1: Here's a great article on the subject: http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/1 To answer your question: yes, a cache hit has approximately the same cost as a register access. And of course a cache miss is quite

What is locality of reference?

人盡茶涼 submitted on 2019-11-27 09:52:30
Question: I am having trouble understanding locality of reference. Can anyone please help me understand what it means, and what the following are?
- Spatial locality of reference
- Temporal locality of reference
Answer 1: This would not matter if your computer was filled with super-fast memory. But unfortunately that's not the case, and computer memory looks something like this [1]:
+----------+
|   CPU    |  <<-- Our beloved CPU, superfast and always hungry for more data.
+----------+
|L1 - Cache|  <<-- works at 100% of
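
Both kinds of locality can be seen in a small C++ sketch. It assumes a matrix stored row-major in one contiguous `std::vector`; the function names `sum_row_major` and `sum_col_major` are hypothetical. The two functions compute the same sum, but the first walks memory sequentially (good spatial locality), while the second jumps `cols` elements between accesses. The accumulator `total` is reused on every iteration, which is an instance of temporal locality.

```cpp
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored row-major, visiting elements in the
// order they sit in memory: consecutive accesses touch adjacent addresses.
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;  // reused every iteration (temporal locality)
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Same sum, but walking down each column first: consecutive accesses are
// `cols` elements apart, so large matrices use each cache line poorly.
long long sum_col_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

For matrices much larger than the cache, the row-major traversal is usually the faster of the two, although the size of the gap depends on the hardware.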

Parallel programming using Haswell architecture [closed]

▼魔方 西西 submitted on 2019-11-27 09:42:16
Question: I want to learn about parallel programming using Intel's Haswell CPU microarchitecture, in particular using SIMD (SSE4.2, AVX2) in asm/C/C++ or any other language. Can you recommend books, tutorials, internet resources, courses? Thanks!
Answer 1: It sounds to me like you need to learn about parallel programming on the CPU in general. I started looking into this about 10 months ago, before I had ever used SSE, OpenMP, or intrinsics, so let me give a brief summary of some important concepts I have learned and some

Which CPU architectures support Compare And Swap (CAS)?

瘦欲@ submitted on 2019-11-27 09:41:57
Question: Just curious to know which CPU architectures support compare-and-swap atomic primitives.
Answer 1: PowerPC has more powerful primitives available: "lwarx" and "stwcx". lwarx loads a value from memory but remembers the location. Any other thread or CPU that touches that location will cause the "stwcx", a conditional store instruction, to fail. So the lwarx/stwcx combo allows you to implement atomic increment/decrement, compare-and-swap, and more powerful atomic operations like "atomic increment
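
In portable C++, the retry pattern that such primitives support is usually written as a compare-and-swap loop. The sketch below is an illustration only: `increment_if_below` is a hypothetical helper, not something from the answer, and how `compare_exchange_weak` is lowered (a single CAS instruction, or a load-reserved/store-conditional pair on architectures that provide one) is up to the compiler and target.

```cpp
#include <atomic>

// Atomically increments `value` unless it has already reached `limit`.
// Returns true if this call performed the increment.
bool increment_if_below(std::atomic<int>& value, int limit) {
    int expected = value.load(std::memory_order_relaxed);
    while (expected < limit) {
        // On failure (including spurious failure), compare_exchange_weak
        // writes the current value into `expected`, so the loop re-checks
        // the limit against fresh data before retrying.
        if (value.compare_exchange_weak(expected, expected + 1)) {
            return true;
        }
    }
    return false;
}
```

The weak form is used because it is allowed to fail spuriously, which matches the behaviour of a conditional store that can fail whenever the reserved location is disturbed; the loop absorbs those failures.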

What's the difference between a word and byte?

只愿长相守 submitted on 2019-11-27 09:00:09
Question: I've done some research. A byte is 8 bits and a word is the smallest unit that can be addressed in memory. The exact length of a word varies. What I don't understand is: what's the point of having a byte? Why not say 8 bits? I asked a prof this question and he said most machines these days are byte-addressable, but what would that make a word?
Answer 1: Byte: Today, a byte is almost always 8 bits. However, that wasn't always the case, and there's no "standard" or something that dictates this. Since
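
The distinction can be made concrete on a given platform with a few lines of C++. This is a sketch under assumptions: C++ defines the byte as its smallest addressable unit and exposes its width as `CHAR_BIT`, but it has no direct notion of a machine word, so the pointer width is used here only as a rough proxy. The function names are hypothetical.

```cpp
#include <climits>
#include <cstddef>

// Bits in one byte, the smallest addressable unit in C++.
// The standard guarantees this is at least 8.
constexpr int bits_per_byte() {
    return CHAR_BIT;
}

// Width of a pointer in bits. Assumption: this is used as a stand-in for
// the machine word size, which holds on many platforms but not all.
constexpr std::size_t pointer_width_bits() {
    return sizeof(void*) * CHAR_BIT;
}
```

On a typical x86-64 build these report 8 and 64, but neither value is fixed by the language, which is the reason both terms exist alongside "bit".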