I am working on a system, written in C++, running on a Xeon on Linux, that needs to run as fast as possible. There is a large data structure (basically an array of structs)
Old SO question that has some info that might be of use to you (in particular the first answer where to look for Linux CPU info - responder doesn't mention line size proper, but 'other info' on top of associativity etc). Question is for x86, but answers are more general. Worth a look.
Where is the L1 memory cache of Intel x86 processors documented?