I am working on a system, written in C++, running on a Xeon on Linux, that needs to run as fast as possible. There is a large data structure (basically an array of structs)
A cache line for any current Xeon processor is 64 bytes. One other thing that you might want to think about is the TLB. If you are really doing random accesses across 10GB of memory then you are likely to have a lot of TLB misses which can potentially be as costly as cache misses. You can get work around with with large pages, but it's something to keep in mind.