I understand that the processor brings data into the cache via cache lines, which - for instance, on my Atom processor - brings in about 64 bytes at a time, whatever the siz
Processors may have multi-level caches (L1, L2, L3), and these differ on size and speed.
Yet, to understand what exactly goes into each cache you'll have to study the branch predictor used by that specific processor, and how the instructions/data of your program behave against it.
Read about branch predictor, CPU cache and replacement policies.
This is not an easy task. If at the end of the day all you want is a performance test, you can use a tool like Cachegrind. However, as this is a simulation, its result may differ at some degree.