I am working on a system, written in C++, running on a Xeon on Linux, that needs to run as fast as possible. There is a large data structure (basically an array of structs)
You might want to head over to http://agner.org/optimize/ and grab the optimization PDFs available there. They contain a lot of good low-level information. They are mostly focused on the assembly-language level, but there are lessons to be learned for C/C++ programmers as well.
Volume 3, "The microarchitecture of Intel, AMD and VIA CPUs", should be of particular interest :-)