Understanding CPU cache and cache line

Posted by 柔情痞子 on 2019-11-28 15:55:10

A cache consists of data RAM and tag RAM, arranged as a compromise between access time, efficiency, and physical layout. You're missing an important stat: the number of ways (the associativity). You rarely see 1-way (direct-mapped) caches, because they perform pathologically badly with simple access patterns. Anyway:

1) Yes, tags take extra space. This is part of the design compromise: you don't want tags to be a large fraction of the total area, which is why the line size isn't just 1 byte or 1 word. Also, all tags for an index are accessed simultaneously, and that can affect efficiency and layout if there's a large number of ways. The size is slightly bigger than your estimate, because there are usually a few extra bits per line to mark validity and sometimes hints. More ways and smaller lines need a larger fraction taken up by tags, so generally lines are large (32+ bytes) and the number of ways is small (4-16).
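To put rough numbers on the tag overhead, here is a minimal sketch. It assumes a 32-bit address, power-of-two sizes, and one extra state bit per line (a valid bit); the function name and the 32 KiB example are made up for illustration, and real designs keep more per-line state than this.

```python
def tag_overhead(cache_bytes, line_bytes, ways, addr_bits=32, extra_bits=1):
    """Return (tag_bits_per_line, metadata_bits / data_bits).

    Assumes all sizes are powers of two. extra_bits models per-line
    state such as a valid bit (an assumed value, not a real design).
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    meta_bits = lines * (tag_bits + extra_bits)
    data_bits = cache_bytes * 8
    return tag_bits, meta_bits / data_bits


# Hypothetical 32 KiB, 4-way cache with different line sizes.
for line_bytes in (1, 4, 32, 64):
    tag_bits, fraction = tag_overhead(32 * 1024, line_bytes, ways=4)
    print(f"{line_bytes:2d}-byte lines: {tag_bits} tag bits, overhead {fraction:.1%}")
```

With 1-byte lines the metadata comes out larger than the data it describes, which is the reason lines are tens of bytes long.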

2) Yes. Some caches also do a "critical word first" fetch, where they start with the word that caused the line fill, then fetch the rest. This reduces the number of cycles the CPU spends waiting for the data it actually asked for. Some caches will "write through" and not allocate a line if you miss on a write, which avoids having to read the entire cache line before writing to it (this isn't always a win).
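A rough sketch of what "critical word first" means for the fetch order. The wrap-around order, the 32-byte line, and the 4-byte word are assumptions for illustration; actual hardware may use a different burst order.

```python
def critical_word_first_order(miss_addr, line_bytes=32, word_bytes=4):
    """List word addresses in the order an assumed wrap-around fill fetches them."""
    base = miss_addr & ~(line_bytes - 1)         # start of the aligned line
    words = line_bytes // word_bytes
    first = (miss_addr - base) // word_bytes     # word that caused the miss
    return [base + ((first + i) % words) * word_bytes for i in range(words)]


# The missed word comes first, then the rest of the line, wrapping around.
print([hex(a) for a in critical_word_first_order(0xFFFF0008)])
```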

3) The tags won't store the lower 5 bits, as they're not needed to match a cache line. Those bits only select a byte within an individual line.
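As a sketch of how an address gets split, assuming 32-byte lines (the 5 low bits mentioned above) and a made-up 256 sets; the constants and the function name are hypothetical:

```python
LINE_BYTES = 32   # assumed: gives 5 offset bits
NUM_SETS = 256    # assumed: gives 8 index bits


def split_address(addr):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset_bits = LINE_BYTES.bit_length() - 1
    index_bits = NUM_SETS.bit_length() - 1
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (NUM_SETS - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Two bytes of the same line share tag and index; only the offset differs,
# so the offset never needs to be stored in the tag RAM.
print([hex(x) for x in split_address(0xFFFF0008)])
print([hex(x) for x in split_address(0xFFFF001F)])
```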

Wikipedia has a pretty good, if a bit intense, write-up on caches: http://en.wikipedia.org/wiki/CPU_cache - see "Implementation". There's a diagram of how data and tags are split. Me, I think everyone should learn this stuff because you really can improve performance of code when you know what the underlying machine is actually capable of.

  1. The cache metadata is typically not counted as a part of the cache itself. It might not even be stored in the same part of the CPU (it could be in another cache, implemented using special CPU registers, etc).
  2. This depends on whether your CPU will fetch unaligned addresses. If it will only fetch aligned addresses, then the example you gave would be correct. If the CPU fetches unaligned addresses, then it might fetch the range 0xFFFF0008 to 0xFFFF0027.
  3. The low-order offset bits are still useful, even when cache access is aligned. They give the CPU a shorthand for referencing a byte within a cache line that it can use in its internal bookkeeping. You could get the same information by knowing the address associated with the cache line and the address associated with the byte, but that's a whole lot more information to carry around.
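The two cases in point 2 can be sketched as follows. This assumes a 32-byte line and a requested address of 0xFFFF0008, which is inferred from the range quoted above; the function names are made up.

```python
def aligned_line_range(addr, line_bytes=32):
    """First and last byte fetched if lines are always aligned to line_bytes."""
    base = addr & ~(line_bytes - 1)
    return base, base + line_bytes - 1


def unaligned_range(addr, line_bytes=32):
    """First and last byte fetched if the fetch starts at addr itself."""
    return addr, addr + line_bytes - 1


print([hex(a) for a in aligned_line_range(0xFFFF0008)])
print([hex(a) for a in unaligned_range(0xFFFF0008)])
```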

Different CPUs implement caching very differently. For the best answer to your question, please give some additional details about the particular CPU (type, model, etc) that you are talking about.

This is based on my vague memory; you should read books like "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. Great book.

Assuming a 32-bit CPU... (otherwise your figures would need more than 4 bytes for the address, though possibly fewer than 8, since some/most 64-bit CPUs don't use all 64 address bits).

1) I believe it's at least 4*32 bytes. Depending on the CPU, the chip architects may have decided to keep track of other info besides the full address. But it's usually not considered part of the cache.

2) Yes, but how that mapping is done differs. See the associativity section of Wikipedia's CPU cache article. There's the simple direct-mapped cache and the more complex associative cache. You want to avoid the case where some code needs two pieces of information but the two addresses map to the exact same cache line.
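To see why that case hurts, here is a toy simulation. It is only a sketch: it assumes LRU replacement, 32-byte lines, and an 8 KiB cache, and the function name is made up. It counts misses when code alternates between two addresses that land on the same index.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_bytes=32):
    """Count misses in a toy set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)          # hit: mark most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)   # evict least recently used
            entries[tag] = True
    return misses


# Two addresses 8 KiB apart, touched alternately ten times each.
pattern = [0x0000, 0x2000] * 10
print(count_misses(pattern, num_sets=256, ways=1))  # 8 KiB direct-mapped
print(count_misses(pattern, num_sets=128, ways=2))  # 8 KiB 2-way
```

In the direct-mapped configuration the two lines keep evicting each other, so every access misses. With two ways and the same total capacity, only the first touch of each line misses.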
