Edit: This answer did not stand up to testing.
I have no way to test this right now (no multicore CPU in this machine), but here is a theory: the `Foo` instances might not share a cache line, but perhaps the `Reader` instances do.

If so, the slowdown would be explained by the write to `bar`, not the read of `foo`: writing to `bar` invalidates that cache line for the other core and causes a lot of traffic between the caches. Commenting out the write to `bar` (which is the only write to a field of `Reader` in the loop) stops the slowdown, which is consistent with this explanation.
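Since the original program is not reproduced here, this is a minimal sketch of the setup being described (the class and field names `Foo`, `Reader`, `foo`, and `bar` come from the discussion above; the loop count and thread wiring are my assumptions). Each thread reads the same shared `Foo` through its own `Reader` and stores into that `Reader`'s `bar` field, which is the write the theory blames:

```java
// Minimal false-sharing sketch. Names follow the discussion above;
// everything else (loop count, thread setup) is assumed for illustration.
class Foo {
    int value = 1; // shared object, only ever read by the workers
}

class Reader implements Runnable {
    private final Foo foo;
    int sum;    // result, written once after the loop
    Object bar; // the only field of Reader written inside the loop

    Reader(Foo foo) { this.foo = foo; }

    public void run() {
        int s = 0; // accumulate in a local so the loop's only field write is to bar
        for (int i = 0; i < 10_000_000; i++) {
            s += foo.value; // the read of foo
            bar = this;     // the write to bar; commenting this out
                            // is what stopped the slowdown
        }
        sum = s;
    }
}

public class FalseSharingDemo {
    static int[] runDemo() throws InterruptedException {
        Foo shared = new Foo();
        Reader r1 = new Reader(shared);
        Reader r2 = new Reader(shared);
        Thread t1 = new Thread(r1), t2 = new Thread(r2);
        t1.start(); t2.start();
        t1.join(); t2.join(); // join establishes happens-before for the sums
        return new int[] { r1.sum, r2.sum };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sums = runDemo();
        System.out.println("sums: " + sums[0] + ", " + sums[1]);
    }
}
```

The results are always correct either way; only the wall-clock time changes, and only if the two `Reader` instances really do land on the same cache line.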
Edit: According to this article, the memory layout of objects is such that the `bar` reference would be the last field in the layout of the `Reader` object. That makes it likely to land in the same cache line as the next object on the heap. Since I am not sure about the order in which new objects are allocated on the heap, I suggested in the comment below padding both "hot" object types with reference fields, which should be effective in separating the objects (at least, I hope it will, but it depends on how fields of the same type are ordered in memory).